net.nutch.analysis.lang
Class HTMLLanguageParser

java.lang.Object
  extended by net.nutch.analysis.lang.HTMLLanguageParser
All Implemented Interfaces:
HtmlParseFilter

public class HTMLLanguageParser
extends Object
implements HtmlParseFilter

Adds metadata identifying the language of the document, if one is found.


Field Summary
static Logger LOG
           
 
Fields inherited from interface net.nutch.parse.HtmlParseFilter
X_POINT_ID
 
Constructor Summary
HTMLLanguageParser()
           
 
Method Summary
 Parse filter(Content content, Parse parse, DocumentFragment doc)
          Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final Logger LOG

Constructor Detail

HTMLLanguageParser

public HTMLLanguageParser()

Method Detail

filter

public Parse filter(Content content,
                    Parse parse,
                    DocumentFragment doc)
             throws ParseException
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page. Statistical language identification could also be run here, but because this filter only sees HTML documents, it would miss all other formats.

Specified by:
filter in interface HtmlParseFilter
Throws:
ParseException
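
This page does not say how the language is found, so the following is only a minimal sketch of one plausible approach: walking the DOM tree for a declared language hint, either a lang attribute on the html element or a Content-Language meta tag. It uses only the standard org.w3c.dom API and does not touch the Nutch Content or Parse types. The names LanguageHintSketch, findLanguage, and sampleFragment are hypothetical and are not part of Nutch.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Nutch: looks for a declared language in a DOM tree.
class LanguageHintSketch {

    /** Returns the first declared language found in document order, or null if none. */
    static String findLanguage(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element el = (Element) node;
            String name = el.getNodeName();
            // Case 1: <html lang="...">. getAttribute returns "" when the attribute is absent.
            if ("html".equalsIgnoreCase(name) && !el.getAttribute("lang").isEmpty()) {
                return el.getAttribute("lang");
            }
            // Case 2: <meta http-equiv="Content-Language" content="...">
            if ("meta".equalsIgnoreCase(name)
                    && "content-language".equalsIgnoreCase(el.getAttribute("http-equiv"))
                    && !el.getAttribute("content").isEmpty()) {
                return el.getAttribute("content");
            }
        }
        // Depth-first walk over the children.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            String lang = findLanguage(child);
            if (lang != null) {
                return lang;
            }
        }
        return null;
    }

    /** Builds a tiny html/head fragment for demonstration; either argument may be null. */
    static DocumentFragment sampleFragment(String htmlLang, String metaLang) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            DocumentFragment frag = doc.createDocumentFragment();
            Element html = doc.createElement("html");
            if (htmlLang != null) {
                html.setAttribute("lang", htmlLang);
            }
            Element head = doc.createElement("head");
            if (metaLang != null) {
                Element meta = doc.createElement("meta");
                meta.setAttribute("http-equiv", "Content-Language");
                meta.setAttribute("content", metaLang);
                head.appendChild(meta);
            }
            html.appendChild(head);
            frag.appendChild(html);
            return frag;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(findLanguage(sampleFragment("en", null)));   // prints "en"
        System.out.println(findLanguage(sampleFragment(null, "fr")));   // prints "fr"
        System.out.println(findLanguage(sampleFragment(null, null)));   // prints "null"
    }
}
```

In an actual HtmlParseFilter, a value found this way would presumably be recorded in the parse metadata before the Parse is returned; how that is done depends on the Nutch Parse API, which this page does not describe.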


Copyright © 2004 The Nutch Organization.