Nutch 0.5 API

Nutch is the open-source search engine.

Core
net.nutch.analysis Tokenizer for documents and query parser.
net.nutch.db Web database: tracks page fetches and link structure.
net.nutch.fetcher The Nutch robot.
net.nutch.fs  
net.nutch.html  
net.nutch.indexer Maintains Lucene full-text indexes.
net.nutch.io Generic I/O code for use when reading and writing data to the network, to databases, and to files.
net.nutch.ipc Client/server code used by distributed search.
net.nutch.linkdb  
net.nutch.net  
net.nutch.net.protocols  
net.nutch.pagedb  
net.nutch.parse  
net.nutch.plugin  
net.nutch.protocol  
net.nutch.quality.dynamic  
net.nutch.searcher Search API
net.nutch.tools  
net.nutch.util

Plugins
net.nutch.analysis.lang Text document language identifier.
net.nutch.indexer.basic A basic indexing plugin.
net.nutch.parse.html An HTML document parsing plugin.
net.nutch.parse.msword A Word document parsing plugin.
net.nutch.parse.msword.chp  
net.nutch.parse.pdf A PDF parsing plugin.
net.nutch.parse.text A plain text parsing plugin.
net.nutch.protocol.file Protocol plugin which supports retrieving local file resources.
net.nutch.protocol.ftp Protocol plugin which supports retrieving documents via the FTP protocol.
net.nutch.protocol.http Protocol plugin which supports retrieving documents via the HTTP protocol.
org.creativecommons.nutch Sample plugins that parse and index Creative Commons metadata.


Copyright © 2004 The Nutch Organization.