net.nutch.parse
Interface HtmlParseFilter
- All Known Implementing Classes:
- CCParseFilter, HTMLLanguageParser
- public interface HtmlParseFilter
Extension point for DOM-based HTML parsers. Permits plugins to add
metadata to HTML parses. All plugins that implement this extension
point are run sequentially on the parse.
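The sequential behaviour described above can be pictured with a small sketch. It does not use the Nutch classes: `SimpleParse`, `SimpleFilter` and `runAll` are hypothetical stand-ins invented for illustration, and the sketch assumes that each filter receives the parse returned by the previous one, which the description does not state explicitly.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FilterChainSketch {

    /** Hypothetical stand-in for a parse result: only a metadata map. */
    static final class SimpleParse {
        final Map<String, String> metadata = new LinkedHashMap<>();
    }

    /** Hypothetical stand-in for a filter: takes a parse, returns a parse. */
    interface SimpleFilter {
        SimpleParse filter(SimpleParse parse);
    }

    /**
     * Runs each filter in list order. Assumption: the parse returned by one
     * filter is the input to the next.
     */
    static SimpleParse runAll(List<SimpleFilter> filters, SimpleParse parse) {
        for (SimpleFilter f : filters) {
            parse = f.filter(parse);
        }
        return parse;
    }

    public static void main(String[] args) {
        List<SimpleFilter> filters = new ArrayList<>();
        // Two illustrative filters, each adding one metadata entry.
        filters.add(p -> { p.metadata.put("lang", "en"); return p; });
        filters.add(p -> { p.metadata.put("license", "unknown"); return p; });

        SimpleParse result = runAll(filters, new SimpleParse());
        System.out.println(result.metadata);
    }
}
```

Because the filters run in order, a later filter in this sketch can see metadata written by an earlier one.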
X_POINT_ID
public static final String X_POINT_ID
- The name of the extension point.
filter
public Parse filter(Content content,
Parse parse,
DocumentFragment doc)
throws ParseException
- Adds metadata or otherwise modifies a parse of HTML content, given
the DOM tree of a page.
- Throws:
ParseException
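As a rough illustration of what a `filter` implementation might do with the DOM tree, the sketch below counts anchor elements in a `DocumentFragment` and records the count as metadata. Only `org.w3c.dom` and `javax.xml.parsers` from the JDK are real APIs here. `SimpleParse`, the `linkCount` key and the two-argument `filter` method are hypothetical stand-ins, since the Nutch `Content`, `Parse` and `ParseException` types are not reproduced.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class LinkCountFilterSketch {

    /** Hypothetical stand-in for a parse result; not the Nutch Parse type. */
    static final class SimpleParse {
        final Map<String, String> metadata = new LinkedHashMap<>();
    }

    /** Recursively counts elements named "a" (case-insensitive) under a node. */
    static int countAnchors(Node node) {
        int count = 0;
        if (node.getNodeType() == Node.ELEMENT_NODE
                && "a".equalsIgnoreCase(node.getNodeName())) {
            count++;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            count += countAnchors(c);
        }
        return count;
    }

    /** Mirrors the shape of filter(...): read the DOM, add metadata, return the parse. */
    static SimpleParse filter(SimpleParse parse, DocumentFragment doc) {
        parse.metadata.put("linkCount", Integer.toString(countAnchors(doc)));
        return parse;
    }

    /** Builds a small fragment by hand: body containing an a and a p with a nested a. */
    static DocumentFragment sampleFragment() {
        try {
            Document d = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            DocumentFragment frag = d.createDocumentFragment();
            Element body = d.createElement("body");
            body.appendChild(d.createElement("a"));
            Element p = d.createElement("p");
            p.appendChild(d.createElement("a"));
            body.appendChild(p);
            frag.appendChild(body);
            return frag;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SimpleParse result = filter(new SimpleParse(), sampleFragment());
        System.out.println(result.metadata);
    }
}
```

A real plugin would implement `HtmlParseFilter` itself and could signal failure through `ParseException`. The sketch leaves both out to stay self-contained.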
Copyright © 2004 The Nutch Organization.