org.creativecommons.nutch
Class CCParseFilter
java.lang.Object
org.creativecommons.nutch.CCParseFilter
All Implemented Interfaces:
HtmlParseFilter
public class CCParseFilter
extends Object
implements HtmlParseFilter
Adds metadata identifying the Creative Commons license used, if any.
Nested Class Summary
static class CCParseFilter.Walker
Walks DOM tree, looking for RDF in comments and licenses in anchors.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final Logger LOG
CCParseFilter
public CCParseFilter()
filter
public Parse filter(Content content,
Parse parse,
DocumentFragment doc)
throws ParseException
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
Specified by:
filter in interface HtmlParseFilter
Throws:
ParseException
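The kind of traversal described for CCParseFilter.Walker (looking for RDF in comments and licenses in anchors) can be illustrated with a minimal sketch that uses only the JDK's DOM classes. This is not the Nutch implementation: the class LicenseWalkSketch, the holder Found, the methods walk and scan, and the URL prefix used to recognize a license link are all assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Comment;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch; none of these names belong to the Nutch API.
class LicenseWalkSketch {

    /** Hypothetical holder for what the walk finds. */
    static final class Found {
        String anchorLicense; // first matching href seen in an <a> element
        String rdfComment;    // first comment whose text contains "<rdf:RDF"
    }

    // Assumed prefix for recognizing a Creative Commons license URL.
    static final String LICENSE_PREFIX = "http://creativecommons.org/licenses/";

    /** Depth-first walk over the tree rooted at node, recording the first hits. */
    static void walk(Node node, Found found) {
        if (node.getNodeType() == Node.COMMENT_NODE) {
            String text = ((Comment) node).getData();
            if (found.rdfComment == null && text.contains("<rdf:RDF")) {
                found.rdfComment = text.trim();
            }
        } else if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element element = (Element) node;
            if (found.anchorLicense == null
                    && "a".equalsIgnoreCase(element.getNodeName())) {
                // getAttribute returns "" when the attribute is absent.
                String href = element.getAttribute("href");
                if (href.startsWith(LICENSE_PREFIX)) {
                    found.anchorLicense = href;
                }
            }
        }
        for (Node child = node.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            walk(child, found);
        }
    }

    /** Parses well-formed XHTML into a DocumentFragment and walks it. */
    static Found scan(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document =
                    builder.parse(new InputSource(new StringReader(xhtml)));
            DocumentFragment fragment = document.createDocumentFragment();
            fragment.appendChild(document.getDocumentElement().cloneNode(true));
            Found found = new Found();
            walk(fragment, found);
            return found;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body>"
                + "<!-- <rdf:RDF><License/></rdf:RDF> -->"
                + "<a href=\"http://creativecommons.org/licenses/by/2.0/\">license</a>"
                + "</body></html>";
        Found found = scan(page);
        System.out.println("anchor license: " + found.anchorLicense);
        System.out.println("RDF in comment: " + (found.rdfComment != null));
    }
}
```

In this sketch the walk only records what it finds. A real filter would go on to attach the result to the Parse returned by filter, and how CCParseFilter does that is not shown on this page.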
Copyright © 2004 The Nutch Organization.