Package net.nutch.parse.html

An HTML document parsing plugin.


Class Summary
DOMContentUtils                          A collection of methods for extracting content from DOM trees.
DOMContentUtils.LinkParams
HtmlParser
RobotsMetaProcessor                      Class for parsing META directives from DOM trees.
RobotsMetaProcessor.RobotsMetaIndicator  Utility class with indicators for the robots directives "noindex" and "nofollow", and for the HTTP-EQUIV "no-cache" directive.
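The following is a minimal sketch of the kind of scan a robots META processor performs. It is not the actual RobotsMetaProcessor or RobotsMetaIndicator API: the class RobotsMetaSketch, its Indicators holder, and the scan and parse methods are hypothetical. It assumes robots directives arrive as comma-separated values in <meta name="robots" content="...">, and that the no-cache indicator comes from <meta http-equiv="Pragma" content="no-cache">. To stay self-contained it builds the DOM with the JDK's XML parser, which stands in for an HTML parser and therefore needs well-formed input.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: these names are illustrative, not the Nutch RobotsMetaProcessor API.
public class RobotsMetaSketch {

    // Flags analogous to the "noindex", "nofollow" and no-cache indicators.
    static final class Indicators {
        boolean noIndex;
        boolean noFollow;
        boolean noCache;
    }

    // Assumption: well-formed markup, parsed with the JDK XML parser as a stand-in for an HTML parser.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    // Scans every <meta> element in the tree and sets the matching flags.
    static Indicators scan(Document doc) {
        Indicators ind = new Indicators();
        NodeList metas = doc.getElementsByTagName("meta");
        for (int i = 0; i < metas.getLength(); i++) {
            Element meta = (Element) metas.item(i);
            // getAttribute returns "" when the attribute is absent.
            String name = meta.getAttribute("name");
            String httpEquiv = meta.getAttribute("http-equiv");
            String content = meta.getAttribute("content").toLowerCase();
            if (name.equalsIgnoreCase("robots")) {
                // Directives are comma-separated, e.g. "noindex, follow".
                for (String directive : content.split(",")) {
                    switch (directive.trim()) {
                        case "none":      // shorthand for noindex + nofollow
                            ind.noIndex = true;
                            ind.noFollow = true;
                            break;
                        case "noindex":
                            ind.noIndex = true;
                            break;
                        case "nofollow":
                            ind.noFollow = true;
                            break;
                        default:          // "index", "follow", "all" or unknown values
                            break;
                    }
                }
            } else if (httpEquiv.equalsIgnoreCase("pragma") && content.contains("no-cache")) {
                // Assumption: no-cache is signalled via <meta http-equiv="Pragma" content="no-cache">.
                ind.noCache = true;
            }
        }
        return ind;
    }

    public static void main(String[] args) {
        String page = "<html><head>"
            + "<meta name=\"robots\" content=\"noindex, follow\"/>"
            + "<meta http-equiv=\"Pragma\" content=\"no-cache\"/>"
            + "</head><body/></html>";
        Indicators ind = scan(parse(page));
        System.out.println(ind.noIndex + " " + ind.noFollow + " " + ind.noCache);
    }
}
```

Running main prints "true false true": "noindex" sets the first flag, "follow" leaves noFollow unset, and the Pragma tag sets noCache. Names and directive values are compared case-insensitively because page authors commonly write them in upper case.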

Package net.nutch.parse.html Description

An HTML document parsing plugin.

This package relies on NekoHTML to parse HTML documents into DOM trees.
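Once a page has been turned into a DOM tree, extracting content needs only the standard org.w3c.dom interfaces. The sketch below illustrates the kind of work DOMContentUtils is described as doing: it collects visible text (skipping script and style) and resolves <a href> targets against the page URL. DomTextSketch, getText, collectText, collectLinks and parse are hypothetical names, not this package's API. NekoHTML tolerates malformed HTML, but the JDK's XML parser used here as a stand-in does not, so the sample input must be well-formed.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: these names are illustrative, not the DOMContentUtils API.
public class DomTextSketch {

    // Assumption: the JDK XML parser stands in for NekoHTML, so input must be well-formed.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse markup", e);
        }
    }

    // Depth-first walk that appends non-blank text nodes, separated by single spaces,
    // and skips everything inside <script> and <style> elements.
    static void collectText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String tag = node.getNodeName().toLowerCase();
            if (tag.equals("script") || tag.equals("style")) {
                return;
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    static String getText(Document doc) {
        StringBuilder out = new StringBuilder();
        collectText(doc, out);
        return out.toString();
    }

    // Resolves the href of every <a> element against the page's own URL;
    // hrefs that are not valid URI references are skipped.
    static List<String> collectLinks(Document doc, URI base) {
        List<String> links = new ArrayList<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            String href = ((Element) anchors.item(i)).getAttribute("href");
            if (href.isEmpty()) {
                continue;
            }
            try {
                links.add(base.resolve(href).toString());
            } catch (IllegalArgumentException malformed) {
                // ignore hrefs that URI cannot parse
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "<html><head><title>Demo</title><script>var x = 1;</script></head>"
            + "<body><p>Hello <b>world</b></p>"
            + "<a href=\"/about\">About</a> <a href=\"page2.html\">Next</a></body></html>";
        Document doc = parse(page);
        System.out.println(getText(doc));
        for (String link : collectLinks(doc, URI.create("http://example.com/docs/index.html"))) {
            System.out.println(link);
        }
    }
}
```

With the sample page, main prints "Demo Hello world About Next", then "http://example.com/about" and "http://example.com/docs/page2.html". Joining text nodes with single spaces is a simplification: it also inserts a space where inline markup splits a word, as in foo<b>bar</b>.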



Copyright © 2004 The Nutch Organization.