net.nutch.analysis.lang
Class HTMLLanguageParser
java.lang.Object
net.nutch.analysis.lang.HTMLLanguageParser
- All Implemented Interfaces:
- HtmlParseFilter
- public class HTMLLanguageParser
- extends Object
- implements HtmlParseFilter
Adds metadata identifying the language of the document, if one is found.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final Logger LOG
HTMLLanguageParser
public HTMLLanguageParser()
filter
public Parse filter(Content content,
Parse parse,
DocumentFragment doc)
throws ParseException
- Adds metadata to, or otherwise modifies, the parse of an HTML document, given
the DOM tree of the page.
Statistical analysis could also be run here, but it would then miss all other document formats.
- Specified by:
filter
in interface HtmlParseFilter
- Throws:
ParseException
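The following is a minimal sketch of how a language hint might be read from a DOM tree like the one passed to filter. It is not the Nutch implementation. The class LanguageHintSketch and its methods findLanguage and parseFragment are hypothetical names, and the two hints it checks (the lang attribute on the html element and a meta http-equiv="Content-Language" element) are assumptions about where a page might declare its language. It uses only the standard org.w3c.dom and javax.xml.parsers APIs, and it leaves out the step of storing the result in the parse metadata.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Nutch: looks for a declared language in a DOM tree.
final class LanguageHintSketch {

    // Depth-first search for the first non-empty language hint; returns null if none is found.
    static String findLanguage(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            String name = el.getNodeName();
            String hint = null;
            if ("html".equalsIgnoreCase(name)) {
                // Assumed hint 1: <html lang="...">
                hint = el.getAttribute("lang");
            } else if ("meta".equalsIgnoreCase(name)
                    && "content-language".equalsIgnoreCase(el.getAttribute("http-equiv"))) {
                // Assumed hint 2: <meta http-equiv="Content-Language" content="...">
                hint = el.getAttribute("content");
            }
            // getAttribute returns "" when the attribute is absent, so test for blank.
            if (hint != null && !hint.trim().isEmpty()) {
                return hint.trim().toLowerCase(Locale.ROOT);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            String found = findLanguage(child);
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    // Builds a DocumentFragment from well-formed sample markup, for demonstration only.
    static DocumentFragment parseFragment(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            DocumentFragment fragment = doc.createDocumentFragment();
            fragment.appendChild(doc.getDocumentElement().cloneNode(true));
            return fragment;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample markup", e);
        }
    }

    public static void main(String[] args) {
        DocumentFragment doc = parseFragment("<html lang=\"EN\"><head/><body/></html>");
        System.out.println(findLanguage(doc));
    }
}
```

A filter built along these lines would only see what the page declares about itself. Pages with no declaration yield null here, which is the case where statistical analysis of the text would be needed.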
Copyright © 2004 The Nutch Organization.