A

ACRONYM - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
AFTER_EQUALS - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
ANCHOR_ANALYZER - Static variable in class net.nutch.analysis.NutchDocumentAnalyzer
Analyzer used to analyze anchors.
APOSTROPHE - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
ATSIGN - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
ArrayFile - class net.nutch.io.ArrayFile.
A dense file-based mapping from integers to values.
ArrayFile() - Constructor for class net.nutch.io.ArrayFile
 
ArrayFile.Reader - class net.nutch.io.ArrayFile.Reader.
Provide access to an existing array file.
ArrayFile.Reader(String) - Constructor for class net.nutch.io.ArrayFile.Reader
Construct an array reader for the named file.
ArrayFile.Writer - class net.nutch.io.ArrayFile.Writer.
Write a new array file.
ArrayFile.Writer(String, Class) - Constructor for class net.nutch.io.ArrayFile.Writer
Create the named file for values of the named class.
ArrayWritable - class net.nutch.io.ArrayWritable.
A Writable for arrays containing instances of a class.
ArrayWritable(Class) - Constructor for class net.nutch.io.ArrayWritable
 
ArrayWritable(Class, Writable[]) - Constructor for class net.nutch.io.ArrayWritable
 
ArrayWritable(String[]) - Constructor for class net.nutch.io.ArrayWritable
 
add(Summary.Fragment) - Method in class net.nutch.searcher.Summary
Adds a fragment to a summary.
add(Object, int) - Method in class net.nutch.util.FibonacciHeap
Adds the Object item, with the supplied priority.
addAttribute(String, String) - Method in class net.nutch.plugin.Extension
Adds an attribute; used only until model creation at plugin-system startup.
addBlock(Block) - Method in class net.nutch.fs.DatanodeInfo
 
addConfResource(String) - Static method in class net.nutch.util.NutchConf
Adds a resource name to the chain of resources read.
addDependency(String) - Method in class net.nutch.plugin.PluginDescriptor
Adds a dependency
addEscapes(String) - Static method in class net.nutch.quality.dynamic.TokenMgrError
Replaces unprintable characters with their escaped (or Unicode-escaped) equivalents in the given string.
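The escaping described for `addEscapes` (and for `ParseException.add_escapes`) can be sketched as follows. This is a minimal sketch modeled on the helper that JavaCC typically generates; the class name `EscapeSketch` is hypothetical, and the exact set of characters the Nutch-generated code escapes is an assumption.

```java
final class EscapeSketch {
    /** Returns a copy of s with quotes, backslashes and non-printable characters escaped. */
    static String addEscapes(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            switch (ch) {
                case '\b': out.append("\\b"); break;
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                case '"':  out.append("\\\""); break;
                case '\'': out.append("\\'"); break;
                case '\\': out.append("\\\\"); break;
                default:
                    if (ch < 0x20 || ch > 0x7e) {
                        // Outside printable ASCII: emit a backslash-u escape with four hex digits.
                        out.append(String.format("\\u%04x", (int) ch));
                    } else {
                        out.append(ch);
                    }
            }
        }
        return out.toString();
    }
}
```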
addExportedLibRelative(String) - Method in class net.nutch.plugin.PluginDescriptor
Adds an exported library with a path relative to the plugin directory.
addExtension(Extension) - Method in class net.nutch.plugin.ExtensionPoint
Installs a corresponding extension in this extension point.
addExtension(Extension) - Method in class net.nutch.plugin.PluginDescriptor
Adds an extension.
addExtensionPoint(ExtensionPoint) - Method in class net.nutch.plugin.PluginDescriptor
Adds an extension point.
addFile(UTF8, Block[]) - Method in class net.nutch.fs.FSDirectory
Add the given filename to the fs.
addFinalizationListener(SoftHashMap.FinalizationListener) - Method in interface net.nutch.util.SoftHashMap.FinalizationNotifier
Registers a SoftHashMap.FinalizationListener for this object.
addJob(Runnable) - Method in class net.nutch.util.ThreadPool
Post a Runnable to the queue.
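A minimal sketch of the queue-backed pattern that `addJob` suggests: a fixed number of worker threads block on a shared queue and run whatever `Runnable` is posted. The class `SimpleThreadPool` is hypothetical and is not the actual `net.nutch.util.ThreadPool` implementation.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class SimpleThreadPool {
    private final BlockingQueue<Runnable> jobs = new LinkedBlockingQueue<>();

    SimpleThreadPool(int numThreads) {
        for (int i = 0; i < numThreads; i++) {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        Runnable job = jobs.take(); // block until a job is posted
                        try {
                            job.run();
                        } catch (RuntimeException e) {
                            // A failing job must not kill the worker thread.
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // exit quietly when interrupted
                }
            });
            worker.setDaemon(true); // idle workers should not keep the JVM alive
            worker.start();
        }
    }

    /** Post a Runnable to the queue; the next idle worker picks it up. */
    void addJob(Runnable job) {
        jobs.add(job);
    }
}
```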
addLink(Link) - Method in class net.nutch.db.DistributedWebDBWriter
Add a link to the link database
addLink(Link) - Method in interface net.nutch.db.IWebDBWriter
addLink(Link) will add the given Link to the webdb.
addLink(Link) - Method in class net.nutch.db.WebDBWriter
Add a link to the link database
addNGrams(StringBuffer) - Method in class net.nutch.analysis.lang.NGramProfile
Adds the n-grams of a single word to the table.
addNGrams(StringBuffer, int) - Method in class net.nutch.analysis.lang.NGramProfile
Adds the n-grams of length n from a word.
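A sketch of what adding the n-grams of one word to a frequency table might look like. `NGramSketch` is hypothetical; the real `NGramProfile` may also pad words at their boundaries or loop over several values of n, which this sketch does not do.

```java
import java.util.Map;

final class NGramSketch {
    /** Counts every n-character substring of word into the given table. */
    static void addNGrams(Map<String, Integer> table, CharSequence word, int n) {
        for (int start = 0; start + n <= word.length(); start++) {
            String gram = word.subSequence(start, start + n).toString();
            table.merge(gram, 1, Integer::sum); // increment, starting at 1
        }
    }
}
```

For example, `addNGrams(table, "abab", 2)` leaves `ab` counted twice and `ba` once.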
addName(Class, String) - Static method in class net.nutch.io.WritableName
Add an alternate name for a class.
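Alternate names let a serialized class name resolve to a class other than the one that wrote it, presumably so that files written before a class was renamed remain readable. A hypothetical registry showing the idea (`NameRegistry`, `setName`, `getName` and `resolve` are illustrative names, not the `WritableName` API):

```java
import java.util.HashMap;
import java.util.Map;

final class NameRegistry {
    private static final Map<String, Class<?>> NAME_TO_CLASS = new HashMap<>();
    private static final Map<Class<?>, String> CLASS_TO_NAME = new HashMap<>();

    /** Set the canonical name that is written to files for a class. */
    static synchronized void setName(Class<?> c, String name) {
        CLASS_TO_NAME.put(c, name);
        NAME_TO_CLASS.put(name, c);
    }

    /** Add an alternate name: it resolves to c when reading, but is never written. */
    static synchronized void addName(Class<?> c, String name) {
        NAME_TO_CLASS.put(name, c);
    }

    static synchronized String getName(Class<?> c) {
        String name = CLASS_TO_NAME.get(c);
        return name != null ? name : c.getName(); // default to the Java class name
    }

    static synchronized Class<?> resolve(String name) {
        Class<?> c = NAME_TO_CLASS.get(name);
        if (c != null) return c;
        try {
            return Class.forName(name); // fall back to loading by class name
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("unknown class name: " + name, e);
        }
    }
}
```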
addNotExportedLibRelative(String) - Method in class net.nutch.plugin.PluginDescriptor
Adds a non-exported library with a path relative to the plugin directory.
addPage(Page) - Method in class net.nutch.db.DistributedWebDBWriter
Add a page to the page database
addPage(Page) - Method in interface net.nutch.db.IWebDBWriter
addPage(Page page) will insert a Page object into the webdb.
addPage(Page) - Method in class net.nutch.db.WebDBWriter
Add a page to the page database
addPageIfNotPresent(Page) - Method in class net.nutch.db.DistributedWebDBWriter
Don't replace the one in the database, if there is one.
addPageIfNotPresent(Page, Link) - Method in class net.nutch.db.DistributedWebDBWriter
Don't replace the one in the database, if there is one.
addPageIfNotPresent(Page) - Method in interface net.nutch.db.IWebDBWriter
addPageIfNotPresent(Page) works just like addPage(), except that the insertion will not take place if there is already a Page with that URL in the webdb.
addPageIfNotPresent(Page, Link) - Method in interface net.nutch.db.IWebDBWriter
addPageIfNotPresent(Page, Link) works just like the above addPage(), except that a Link is also conditionally added to the webdb.
addPageIfNotPresent(Page) - Method in class net.nutch.db.WebDBWriter
Don't replace the one in the database, if there is one.
addPageIfNotPresent(Page, Link) - Method in class net.nutch.db.WebDBWriter
Don't replace the one in the database, if there is one.
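The insert-if-absent contract of `addPageIfNotPresent` can be illustrated with an in-memory map keyed by URL. Everything here is hypothetical: `InMemoryWebDB` and its nested `Page` and `Link` stand-ins are not the Nutch classes, the boolean return values are added for illustration, and recording the link only when the page is inserted is one assumed reading of "conditionally".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class InMemoryWebDB {
    // Hypothetical stand-ins for net.nutch.db.Page and Link with only the fields used here.
    static final class Page {
        final String url;
        final float score;
        Page(String url, float score) { this.url = url; this.score = score; }
    }

    static final class Link {
        final String fromUrl;
        final String toUrl;
        Link(String fromUrl, String toUrl) { this.fromUrl = fromUrl; this.toUrl = toUrl; }
    }

    private final Map<String, Page> pagesByUrl = new HashMap<>();
    private final List<Link> links = new ArrayList<>();

    /** Insert or replace the page stored under page.url. */
    void addPage(Page page) { pagesByUrl.put(page.url, page); }

    /** Like addPage, but keeps any existing page with the same URL. True if inserted. */
    boolean addPageIfNotPresent(Page page) {
        return pagesByUrl.putIfAbsent(page.url, page) == null;
    }

    /** Assumption: the link is recorded only when the page itself was inserted. */
    boolean addPageIfNotPresent(Page page, Link link) {
        boolean inserted = addPageIfNotPresent(page);
        if (inserted) links.add(link);
        return inserted;
    }

    Page getPage(String url) { return pagesByUrl.get(url); }

    int linkCount() { return links.size(); }
}
```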
addPageWithScore(Page) - Method in class net.nutch.db.DistributedWebDBWriter
Add a page to the page database, with a brand-new score
addPageWithScore(Page) - Method in interface net.nutch.db.IWebDBWriter
addPageWithScore(Page page) inserts a Page into the webdb.
addPageWithScore(Page) - Method in class net.nutch.db.WebDBWriter
Add a page to the page database, with a brand-new score
addPatternBackward(String) - Method in class net.nutch.util.TrieStringMatcher
Adds any necessary nodes to the trie so that the given String can be decoded in reverse and the first character is represented by a terminal node.
addPatternForward(String) - Method in class net.nutch.util.TrieStringMatcher
Adds any necessary nodes to the trie so that the given String can be decoded and the last character is represented by a terminal node.
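A sketch of the two trie-building directions described above, plus the lookups they enable: forward patterns support prefix matching and backward patterns support suffix matching. `TrieSketch` and its `longestPrefixMatch` and `longestSuffixMatch` methods are hypothetical; each trie instance is meant to be built in one direction only.

```java
import java.util.HashMap;
import java.util.Map;

final class TrieSketch {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if some pattern ends at this node
    }

    private final Node root = new Node();

    /** Add s so it is walked front to back; its last character lands on a terminal node. */
    void addPatternForward(String s) {
        Node node = root;
        for (int i = 0; i < s.length(); i++) {
            node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    /** Add s so it is walked back to front; its first character lands on a terminal node. */
    void addPatternBackward(String s) {
        Node node = root;
        for (int i = s.length() - 1; i >= 0; i--) {
            node = node.children.computeIfAbsent(s.charAt(i), c -> new Node());
        }
        node.terminal = true;
    }

    /** Longest forward-added pattern that is a prefix of input, or null. */
    String longestPrefixMatch(String input) {
        Node node = root;
        String best = null;
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) break;
            if (node.terminal) best = input.substring(0, i + 1);
        }
        return best;
    }

    /** Longest backward-added pattern that is a suffix of input, or null. */
    String longestSuffixMatch(String input) {
        Node node = root;
        String best = null;
        for (int i = input.length() - 1; i >= 0; i--) {
            node = node.children.get(input.charAt(i));
            if (node == null) break;
            if (node.terminal) best = input.substring(i);
        }
        return best;
    }
}
```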
addProhibitedPhrase(String[]) - Method in class net.nutch.searcher.Query
Add a prohibited phrase in the default field.
addProhibitedPhrase(String[], String) - Method in class net.nutch.searcher.Query
Add a prohibited phrase in the specified field.
addProhibitedTerm(String) - Method in class net.nutch.searcher.Query
Add a prohibited term in the default field.
addProhibitedTerm(String, String) - Method in class net.nutch.searcher.Query
Add a prohibited term in the specified field.
addRequiredPhrase(String[]) - Method in class net.nutch.searcher.Query
Add a required phrase in the default field.
addRequiredPhrase(String[], String) - Method in class net.nutch.searcher.Query
Add a required phrase in the specified field.
addRequiredTerm(String) - Method in class net.nutch.searcher.Query
Add a required term in the default field.
addRequiredTerm(String, String) - Method in class net.nutch.searcher.Query
Add a required term in a specified field.
addScore(float) - Method in class net.nutch.util.ScoreStats
Increment the counter in the right place.
addToken(Token) - Method in class net.nutch.analysis.lang.NGramProfile
Adds a token to this profile.
addUrlFeatures(Document, String) - Method in class org.creativecommons.nutch.CCIndexingFilter
Add the features represented by a license URL.
add_escapes(String) - Method in class net.nutch.quality.dynamic.ParseException
Used to convert raw characters to their escaped versions when the raw versions cannot be used as part of an ASCII string literal.
adjustBeginLineColumn(int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
Method to adjust line and column numbers for the start of a token.
analyze(StringBuffer) - Method in class net.nutch.analysis.lang.NGramProfile
Analyzes a piece of text.
append(WritableComparable, Writable) - Method in class net.nutch.db.EditSectionGroupWriter
Add an instruction and append it.
append(WritableComparable, Writable) - Method in class net.nutch.db.EditSectionWriter
Add a key/val pair
append(Writable) - Method in class net.nutch.io.ArrayFile.Writer
Append a value to the file.
append(WritableComparable, Writable) - Method in class net.nutch.io.MapFile.Writer
Append a key/value pair to the map.
append(Writable, Writable) - Method in class net.nutch.io.SequenceFile.Writer
Append a key/value pair.
append(byte[], int, int, int) - Method in class net.nutch.io.SequenceFile.Writer
Append a key/value pair.
append(WritableComparable) - Method in class net.nutch.io.SetFile.Writer
Append a key to a set.
append(String) - Method in class net.nutch.parse.msword.WordTextBuffer
 
appendInstructionInfo(EditSectionGroupWriter, Link, int, Writable) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstructionWriter
Append the LinkInstruction info to the indicated SequenceFile and keep the LI for later reuse.
appendInstructionInfo(EditSectionGroupWriter, Page, int, Writable) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
appendInstructionInfo(EditSectionGroupWriter, Page, Link, int, Writable) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
appendInstructionInfo(SequenceFile.Writer, Link, int, Writable) - Method in class net.nutch.db.WebDBWriter.LinkInstructionWriter
Append the LinkInstruction info to the indicated SequenceFile and keep the LI for later reuse.
appendInstructionInfo(SequenceFile.Writer, Page, int, Writable) - Method in class net.nutch.db.WebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
appendInstructionInfo(SequenceFile.Writer, Page, Link, int, Writable) - Method in class net.nutch.db.WebDBWriter.PageInstructionWriter
Append the PageInstruction info to the indicated SequenceFile, and keep the PI for later reuse.
attrName - Variable in class net.nutch.parse.html.DOMContentUtils.LinkParams
 

B

BLOCKREPORT_INTERVAL - Static variable in interface net.nutch.fs.FSConstants
 
BLOCK_SIZE - Static variable in interface net.nutch.fs.FSConstants
 
BasicIndexingFilter - class net.nutch.indexer.basic.BasicIndexingFilter.
Adds basic searchable fields to a document.
BasicIndexingFilter() - Constructor for class net.nutch.indexer.basic.BasicIndexingFilter
 
BeginToken() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
Block - class net.nutch.fs.Block.
A Block is a Nutch FS primitive, identified by a long.
Block() - Constructor for class net.nutch.fs.Block
 
Block(long) - Constructor for class net.nutch.fs.Block
 
Block(File) - Constructor for class net.nutch.fs.Block
Find the blockid from the given filename
BooleanWritable - class net.nutch.io.BooleanWritable.
A WritableComparable for booleans.
BooleanWritable() - Constructor for class net.nutch.io.BooleanWritable
 
BooleanWritable(boolean) - Constructor for class net.nutch.io.BooleanWritable
 
BooleanWritable.Comparator - class net.nutch.io.BooleanWritable.Comparator.
A Comparator optimized for BooleanWritable.
BooleanWritable.Comparator() - Constructor for class net.nutch.io.BooleanWritable.Comparator
 
BytesWritable - class net.nutch.io.BytesWritable.
A Writable for byte arrays.
BytesWritable() - Constructor for class net.nutch.io.BytesWritable
 
BytesWritable(byte[]) - Constructor for class net.nutch.io.BytesWritable
 
backup(int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
beginColumn - Variable in class net.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
beginLine - Variable in class net.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
blockReceived(Block, UTF8) - Method in class net.nutch.fs.FSNamesystem
The given node is reporting that it received a certain block.
bufcolumn - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
buffer - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
bufline - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
bufpos - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 

C

CCDeleteUnlicensedTool - class org.creativecommons.nutch.CCDeleteUnlicensedTool.
Deletes documents in a set of Lucene indexes that do not have a Creative Commons license.
CCDeleteUnlicensedTool(IndexReader[]) - Constructor for class org.creativecommons.nutch.CCDeleteUnlicensedTool
Constructs a tool that deletes unlicensed documents from the provided indexes.
CCIndexingFilter - class org.creativecommons.nutch.CCIndexingFilter.
Adds Creative Commons license fields (such as the "cc" field) to a document.
CCIndexingFilter() - Constructor for class org.creativecommons.nutch.CCIndexingFilter
 
CCParseFilter - class org.creativecommons.nutch.CCParseFilter.
Adds metadata identifying the Creative Commons license used, if any.
CCParseFilter() - Constructor for class org.creativecommons.nutch.CCParseFilter
 
CCParseFilter.Walker - class org.creativecommons.nutch.CCParseFilter.Walker.
Walks DOM tree, looking for RDF in comments and licenses in anchors.
CCQueryFilter - class org.creativecommons.nutch.CCQueryFilter.
Handles "cc:" query clauses, causing them to search the "cc" field indexed by CCIndexingFilter.
CCQueryFilter() - Constructor for class org.creativecommons.nutch.CCQueryFilter
 
CHUNKED_ENCODING - Static variable in interface net.nutch.fs.FSConstants
 
CJK - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
COLON - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
CONTENT_ANALYZER - Static variable in class net.nutch.analysis.NutchDocumentAnalyzer
Analyzer used to index textual content.
Client - class net.nutch.ipc.Client.
A client for an IPC service.
Client(Class) - Constructor for class net.nutch.ipc.Client
Construct an IPC client whose values are of the given Writable class.
Client - class net.nutch.protocol.ftp.Client.
Client encapsulates the functionality Nutch needs to list directories and retrieve files from an FTP server.
Client() - Constructor for class net.nutch.protocol.ftp.Client
 
CommandRunner - class net.nutch.util.CommandRunner.
 
CommandRunner() - Constructor for class net.nutch.util.CommandRunner
 
CommonGrams - class net.nutch.analysis.CommonGrams.
Construct n-grams for frequently occurring terms and phrases while indexing.
Content - class net.nutch.protocol.Content.
 
Content() - Constructor for class net.nutch.protocol.Content
 
Content(String, String, byte[], String, Properties) - Constructor for class net.nutch.protocol.Content
 
CrawlTool - class net.nutch.tools.CrawlTool.
 
CrawlTool() - Constructor for class net.nutch.tools.CrawlTool
 
call(Writable) - Method in class net.nutch.fs.NDFS.NameNode
This method implements the call invoked by client.
call(Writable, InetSocketAddress) - Method in class net.nutch.ipc.Client
Make a call, passing param, to the IPC server running at address, returning the value.
call(Writable[], InetSocketAddress[]) - Method in class net.nutch.ipc.Client
Makes a set of calls in parallel.
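The parallel `call(Writable[], InetSocketAddress[])` can be approximated with an `ExecutorService`. In this generic sketch the network call is replaced by a caller-supplied function, and a failed call leaves `null` in its slot; both are assumptions of the sketch rather than documented `Client` behavior. `ParallelCalls` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.BiFunction;

final class ParallelCalls {
    /** Run call(params[i], addresses[i]) for every i concurrently; results keep input order. */
    static <P, A, R> List<R> callAll(List<P> params, List<A> addresses, BiFunction<P, A, R> call) {
        if (params.size() != addresses.size()) {
            throw new IllegalArgumentException("params and addresses must have the same length");
        }
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, params.size()));
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (int i = 0; i < params.size(); i++) {
                P p = params.get(i);
                A a = addresses.get(i);
                futures.add(pool.submit(() -> call.apply(p, a))); // all calls start at once
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                try {
                    results.add(f.get()); // wait for each call in turn
                } catch (ExecutionException e) {
                    results.add(null); // this call failed; record a null result
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for calls", e);
                }
            }
            return results;
        } finally {
            pool.shutdown(); // let worker threads exit once all calls are done
        }
    }
}
```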
call(Writable) - Method in class net.nutch.ipc.Server
Called for each call.
call(Writable) - Method in class net.nutch.searcher.DistributedSearch.Server
 
childLen - Variable in class net.nutch.parse.html.DOMContentUtils.LinkParams
 
children - Variable in class net.nutch.util.TrieStringMatcher.TrieNode
 
childrenList - Variable in class net.nutch.util.TrieStringMatcher.TrieNode
 
clear() - Method in class net.nutch.util.SoftHashMap
 
clone() - Method in class net.nutch.db.Page
 
clone() - Method in class net.nutch.pagedb.FetchListEntry
 
clone() - Method in class net.nutch.searcher.Query.Clause
 
clone() - Method in class net.nutch.searcher.Query
 
close() - Method in class net.nutch.db.DBSectionReader
 
close() - Method in class net.nutch.db.DistributedWebDBReader
Shutdown
close() - Method in class net.nutch.db.DistributedWebDBWriter
Shutdown
close() - Method in class net.nutch.db.EditSectionGroupWriter
Close down the writers
close() - Method in class net.nutch.db.EditSectionWriter
Close down the EditSectionWriter.
close() - Method in interface net.nutch.db.IWebDBReader
Done reading.
close() - Method in interface net.nutch.db.IWebDBWriter
Flush and complete all writes to the db.
close() - Method in class net.nutch.db.WebDBInjector
Close dbWriter and save changes
close() - Method in class net.nutch.db.WebDBReader
Shutdown
close() - Method in class net.nutch.db.WebDBWriter
Shutdown
close() - Method in class net.nutch.fs.FSDirectory
Shuts down the filestore.
close() - Method in class net.nutch.fs.FSNamesystem
 
close() - Method in class net.nutch.indexer.DeleteDuplicates
Closes the indexes, saving changes.
close() - Method in class net.nutch.io.MapFile.Reader
Close the map.
close() - Method in class net.nutch.io.MapFile.Writer
Close the map.
close() - Method in class net.nutch.io.SequenceFile.Reader
Close the file.
close() - Method in class net.nutch.io.SequenceFile.Writer
Close the file.
close() - Method in class net.nutch.tools.UpdateDatabaseTool
Shut everything down.
close() - Method in interface net.nutch.util.NutchFileSystem
Close down the fs.
close() - Method in class net.nutch.util.NutchGenericFileSystem
Close down the Generic File System
close() - Method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Closes the indexes, saving changes.
column - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction.PageComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator
We need to sort by ordered URLs.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.Link.MD5Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.Link.MD5Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.Link.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.Link.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.Page.Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.Page.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.Page.UrlComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.WebDBWriter.LinkInstruction.UrlComparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.WebDBWriter.LinkInstruction.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.WebDBWriter.PageInstruction.PageComparator
Optimized comparator.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.db.WebDBWriter.PageInstruction.UrlComparator
We need to sort by ordered URLs.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.db.WebDBWriter.PageInstruction.UrlComparator
Optimized comparator.
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashDoc
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashScore
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.BooleanWritable.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.IntWritable.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.LongWritable.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.MD5Hash.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.UTF8.Comparator
 
compare(byte[], int, int, byte[], int, int) - Method in class net.nutch.io.WritableComparator
Optimization hook.
compare(WritableComparable, WritableComparable) - Method in class net.nutch.io.WritableComparator
Compare two WritableComparables.
compareBytes(byte[], int, int, byte[], int, int) - Static method in class net.nutch.io.WritableComparator
Lexicographic order of binary data.
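`compareBytes` and the per-type raw comparators let keys be sorted without deserializing them. The sketch below treats bytes as unsigned for lexicographic order, and decodes a big-endian int (the layout `DataOutput.writeInt` produces) for an `IntWritable`-style comparison. `RawCompare` is a hypothetical class, and it is an assumption that the Nutch comparators use exactly this logic.

```java
final class RawCompare {
    /** Lexicographic order of two byte ranges, with each byte treated as unsigned. */
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int a = b1[s1 + i] & 0xff;
            int b = b2[s2 + i] & 0xff;
            if (a != b) return a - b;
        }
        return l1 - l2; // on a common prefix, the shorter range sorts first
    }

    /** Decode a big-endian int starting at the given offset. */
    static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24) | ((b[start + 1] & 0xff) << 16)
             | ((b[start + 2] & 0xff) << 8) | (b[start + 3] & 0xff);
    }

    /** Compare two serialized ints by value, not by raw bytes. */
    static int compareRawInts(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }
}
```

Decoding first matters for signed values: bytewise, the encoding of -5 (leading byte 0xff) would sort after that of 3.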
compareTo(Object) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
compareTo(Object) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
compareTo(Object) - Method in class net.nutch.db.Link
 
compareTo(Object) - Method in class net.nutch.db.Page
Compare to another Page object
compareTo(Object) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
compareTo(Object) - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
compareTo(Object) - Method in class net.nutch.fs.Block
 
compareTo(Object) - Method in class net.nutch.fs.DatanodeInfo
 
compareTo(Object) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc
 
compareTo(Object) - Method in class net.nutch.io.BooleanWritable
 
compareTo(Object) - Method in class net.nutch.io.IntWritable
Compares two IntWritables.
compareTo(Object) - Method in class net.nutch.io.LongWritable
Compares two LongWritables.
compareTo(Object) - Method in class net.nutch.io.MD5Hash
Compares this object with the specified object for order.
compareTo(Object) - Method in class net.nutch.io.UTF8
Compare two UTF8s.
compareTo(Object) - Method in class net.nutch.searcher.Hit
 
compareTo(Object) - Method in class net.nutch.tools.FetchListTool.SortableScore
Sort them in descending order!
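Sorting "in descending order" through `compareTo` amounts to swapping the operands of the natural comparison. A minimal sketch with a hypothetical `SortableScoreSketch` class (the original `compareTo` takes an `Object`; generics are used here for clarity):

```java
final class SortableScoreSketch implements Comparable<SortableScoreSketch> {
    final float score;

    SortableScoreSketch(float score) { this.score = score; }

    @Override
    public int compareTo(SortableScoreSketch other) {
        // Operands swapped relative to natural order, so higher scores sort first.
        return Float.compare(other.score, this.score);
    }
}
```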
compareTo(Object) - Method in class net.nutch.util.TrieStringMatcher.TrieNode
 
completeDir(NutchFile) - Method in interface net.nutch.util.NutchFileSystem
Sometimes the NutchFileSystem user constructs a directory of many subparts, often built slowly over time.
completeDir(NutchFile) - Method in class net.nutch.util.NutchGenericFileSystem
Complete the given directory
completeFile(UTF8) - Method in class net.nutch.fs.FSNamesystem
Finalize the created file and make it world-accessible.
completeRound(File, File) - Method in class net.nutch.tools.DistributedAnalysisTool
This method collates and executes all the instructions computed by the many executors of computeRound().
compound(String) - Method in class net.nutch.analysis.NutchAnalysis
Parse a compound term that is interpreted as an implicit phrase query.
computeDomainID() - Method in class net.nutch.db.Page
Compute domain ID from URL
computeRound(int, File) - Method in class net.nutch.tools.DistributedAnalysisTool
This method is invoked by one of the many processes involved in LinkAnalysis.
contains(Object) - Method in class net.nutch.util.FibonacciHeap
Returns true if item exists in this FibonacciHeap, false otherwise.
containsKey(Object) - Method in class net.nutch.util.SoftHashMap
Returns true if this map contains a mapping for the specified key.
containsValue(Object) - Method in class net.nutch.util.SoftHashMap
Not implemented. Note that the finalizer may invalidate any result an implementation would return.
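A minimal sketch of a map whose values are softly reachable, showing why `containsKey` can only report what is true at the moment of the call. `SoftValueMap` is hypothetical and omits the finalization-listener support that `SoftHashMap` provides.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

final class SoftValueMap<K, V> {
    // Each value is held softly; the reference remembers its key so it can be purged.
    private static final class ValueRef<K, V> extends SoftReference<V> {
        final K key;
        ValueRef(K key, V value, ReferenceQueue<? super V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, ValueRef<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    /** Drop entries whose values the garbage collector has already cleared. */
    @SuppressWarnings("unchecked")
    private void purge() {
        ValueRef<K, V> ref;
        while ((ref = (ValueRef<K, V>) queue.poll()) != null) {
            map.remove(ref.key, ref); // only if the map still holds this stale reference
        }
    }

    V put(K key, V value) {
        purge();
        ValueRef<K, V> old = map.put(key, new ValueRef<>(key, value, queue));
        return old == null ? null : old.get();
    }

    V get(K key) {
        purge();
        ValueRef<K, V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    /** True only if the value is still reachable now; the collector may clear it right after. */
    boolean containsKey(K key) {
        return get(key) != null;
    }

    void clear() {
        map.clear();
        while (queue.poll() != null) { } // discard pending notifications
    }
}
```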
coord(int, int) - Method in class net.nutch.indexer.NutchSimilarity
 
copyContents(File, File, boolean) - Static method in class net.nutch.util.FileUtil
Copy a file's contents to a new location.
copyFile(File, String, String, String, boolean) - Method in class net.nutch.util.NutchGenericFileSystem
To be implemented by subclasses
copyFile(File, String, String, String, boolean) - Method in class net.nutch.util.NutchNFSFileSystem
Copy a file to the right place in the local dir, which assumes NFS-connectivity.
copyFile(File, String, String, String, boolean) - Method in class net.nutch.util.NutchRemoteFileSystem
Copy a file from one place to another.
create(UTF8) - Method in class net.nutch.fs.NDFSClient
Create an output stream that writes to all the right places.
createDB(NutchFileSystem, String, int) - Static method in class net.nutch.db.DistributedWebDBWriter
Method useful for the first time we create a distributed db project.
createEditGroup(NutchFileSystem, String, String, int, int) - Static method in class net.nutch.db.EditSectionGroupWriter
Initialize an EditSectionGroup.
createNgramProfile(String, InputStream) - Static method in class net.nutch.analysis.lang.NGramProfile
Creates a new language profile from a (preferably quite large) text file.
createSocketAddr(String) - Static method in class net.nutch.fs.NDFS
Utility method that builds a socket address from a string.
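A sketch of parsing a `host:port` string into a socket address. The name `SocketAddrUtil` is hypothetical; whether the Nutch method resolves the host immediately is not stated, so this version defers DNS lookup with `InetSocketAddress.createUnresolved`. Bracketed IPv6 literals are not handled.

```java
import java.net.InetSocketAddress;

final class SocketAddrUtil {
    /** Build an unresolved socket address from a "host:port" string. */
    static InetSocketAddress createSocketAddr(String target) {
        int colon = target.lastIndexOf(':');
        if (colon <= 0 || colon == target.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got: " + target);
        }
        String host = target.substring(0, colon);
        int port = Integer.parseInt(target.substring(colon + 1)); // NumberFormatException if not numeric
        // createUnresolved skips the DNS lookup; new InetSocketAddress(host, port) would resolve now.
        return InetSocketAddress.createUnresolved(host, port);
    }
}
```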
createWebDB(File) - Static method in class net.nutch.db.WebDBWriter
Create the WebDB for the first time.
curChar - Variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
curChar - Variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
currentToken - Variable in class net.nutch.quality.dynamic.ParseException
This is the last token that has been consumed successfully.

D

DATANODE_STARTUP_PERIOD - Static variable in interface net.nutch.fs.FSConstants
 
DATA_FILE_NAME - Static variable in class net.nutch.io.MapFile
The name of the data file.
DBKeyDivision - class net.nutch.db.DBKeyDivision.
DBKeyDivision exists for other DB classes to figure out how to find the right distributed-DB section.
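The index does not say how keys map to sections. Assuming a simple hash-modulo scheme, the essential property (every reader and writer computes the same section for the same key) can be sketched as below; `KeyDivisionSketch` is hypothetical, and the real division may well work differently, for example by key ranges.

```java
final class KeyDivisionSketch {
    /** Map a key to one of numSections sections; the same key always lands in the same section. */
    static int findSection(Object key, int numSections) {
        if (numSections <= 0) throw new IllegalArgumentException("numSections must be positive");
        // floorMod keeps the result in [0, numSections) even for negative hash codes.
        return Math.floorMod(key.hashCode(), numSections);
    }
}
```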
DBKeyDivision() - Constructor for class net.nutch.db.DBKeyDivision
 
DBSectionReader - class net.nutch.db.DBSectionReader.
DBSectionReader reads a discrete portion of a WebDB.
DBSectionReader(File, WritableComparator) - Constructor for class net.nutch.db.DBSectionReader
Right now we assume we're getting a File that is a MapFile.Reader directory.
DEFAULT - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
DEFAULT - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
DEFAULT_FIELD - Static variable in class net.nutch.searcher.Query.Clause
 
DIGIT - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
DIR_NAME - Static variable in class net.nutch.fetcher.FetcherOutput
 
DIR_NAME - Static variable in class net.nutch.pagedb.FetchListEntry
 
DIR_NAME - Static variable in class net.nutch.parse.ParseData
 
DIR_NAME - Static variable in class net.nutch.parse.ParseText
 
DIR_NAME - Static variable in class net.nutch.protocol.Content
 
DOMContentUtils - class net.nutch.parse.html.DOMContentUtils.
A collection of methods for extracting content from DOM trees.
DOMContentUtils() - Constructor for class net.nutch.parse.html.DOMContentUtils
 
DOMContentUtils.LinkParams - class net.nutch.parse.html.DOMContentUtils.LinkParams.
 
DOMContentUtils.LinkParams(String, String, int) - Constructor for class net.nutch.parse.html.DOMContentUtils.LinkParams
 
DONE_NAME - Static variable in class net.nutch.fetcher.FetcherOutput
 
DONE_NAME - Static variable in class net.nutch.indexer.IndexMerger
 
DONE_NAME - Static variable in class net.nutch.indexer.IndexOptimizer
 
DONE_NAME - Static variable in class net.nutch.indexer.IndexSegment
 
DOT - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
DataInputBuffer - class net.nutch.io.DataInputBuffer.
A reusable DataInput implementation that reads from an in-memory buffer.
DataInputBuffer() - Constructor for class net.nutch.io.DataInputBuffer
Constructs a new empty buffer.
DataOutputBuffer - class net.nutch.io.DataOutputBuffer.
A reusable DataOutput implementation that writes to an in-memory buffer.
DataOutputBuffer() - Constructor for class net.nutch.io.DataOutputBuffer
Constructs a new empty buffer.
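The standard library offers a close analogue of these reusable in-memory buffers: a `ByteArrayOutputStream` that is `reset()` between records, wrapped in a `DataOutputStream`, and read back through a `DataInputStream`. `BufferRoundTrip` is hypothetical; note that `toByteArray()` copies, a cost a dedicated reusable buffer class would presumably avoid.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class BufferRoundTrip {
    /** Serialize an (int, string) record into a reused buffer, then read it back. */
    static String roundTrip(ByteArrayOutputStream bytes, int id, String name) {
        try {
            bytes.reset(); // reuse the same backing array instead of allocating a new one
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(id);    // 4 bytes, big-endian
            out.writeUTF(name);  // 2-byte length followed by modified UTF-8
            out.flush();

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return in.readInt() + ":" + in.readUTF();
        } catch (IOException e) {
            // In-memory streams do not fail in practice, but the API declares IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```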
DatanodeInfo - class net.nutch.fs.DatanodeInfo.
DatanodeInfo tracks stats on a given node
DatanodeInfo() - Constructor for class net.nutch.fs.DatanodeInfo
 
DatanodeInfo(UTF8) - Constructor for class net.nutch.fs.DatanodeInfo
 
DatanodeInfo(UTF8, UTF8, int, long, long) - Constructor for class net.nutch.fs.DatanodeInfo
 
DeleteDuplicates - class net.nutch.indexer.DeleteDuplicates.
Deletes duplicate documents in a set of Lucene indexes.
DeleteDuplicates(IndexReader[], String) - Constructor for class net.nutch.indexer.DeleteDuplicates
Constructs a duplicate detector for the provided indexes.
DeleteDuplicates.IndexedDoc - class net.nutch.indexer.DeleteDuplicates.IndexedDoc.
The key used in sorting for duplicates.
DeleteDuplicates.IndexedDoc() - Constructor for class net.nutch.indexer.DeleteDuplicates.IndexedDoc
 
DeleteDuplicates.IndexedDoc.ByHashDoc - class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashDoc.
Order equal hashes by decreasing index and document.
DeleteDuplicates.IndexedDoc.ByHashDoc() - Constructor for class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashDoc
 
DeleteDuplicates.IndexedDoc.ByHashScore - class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashScore.
Order equal hashes by decreasing score and increasing urlLen.
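The `ByHashScore` ordering can be written as a chained `Comparator`: group by hash, then highest score first, then shortest URL. `DedupKey` and its fields are hypothetical stand-ins for `IndexedDoc`, with the hash shown as a string for readability. After such a sort, the first entry of each hash group is the natural candidate to keep.

```java
import java.util.Comparator;

final class DedupKey {
    final String hash;  // content hash, e.g. hex MD5 (hypothetical representation)
    final float score;
    final int urlLen;

    DedupKey(String hash, float score, int urlLen) {
        this.hash = hash;
        this.score = score;
        this.urlLen = urlLen;
    }

    /** Equal hashes grouped together; within a group, highest score first, then shortest URL. */
    static final Comparator<DedupKey> BY_HASH_SCORE =
        Comparator.comparing((DedupKey d) -> d.hash)
                  .thenComparing(Comparator.comparingDouble((DedupKey d) -> d.score).reversed())
                  .thenComparingInt(d -> d.urlLen);
}
```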
DeleteDuplicates.IndexedDoc.ByHashScore() - Constructor for class net.nutch.indexer.DeleteDuplicates.IndexedDoc.ByHashScore
 
DistributedAnalysisTool - class net.nutch.tools.DistributedAnalysisTool.
DistributedAnalysisTool performs link analysis by reading exclusively from an IWebDBReader, and writing to an IWebDBWriter.
DistributedAnalysisTool(File) - Constructor for class net.nutch.tools.DistributedAnalysisTool
Give the pagedb and linkdb files and their cache sizes
DistributedSearch - class net.nutch.searcher.DistributedSearch.
Implements the search API over IPC connections.
DistributedSearch.Client - class net.nutch.searcher.DistributedSearch.Client.
The search client.
DistributedSearch.Client(File) - Constructor for class net.nutch.searcher.DistributedSearch.Client
Construct a client talking to servers listed in the named file.
DistributedSearch.Client(InetSocketAddress[]) - Constructor for class net.nutch.searcher.DistributedSearch.Client
Construct a client talking to the named servers.
DistributedSearch.Param - class net.nutch.searcher.DistributedSearch.Param.
The parameter passed with IPC requests.
DistributedSearch.Param() - Constructor for class net.nutch.searcher.DistributedSearch.Param
 
DistributedSearch.Result - class net.nutch.searcher.DistributedSearch.Result.
The result returned with IPC responses.
DistributedSearch.Result() - Constructor for class net.nutch.searcher.DistributedSearch.Result
 
DistributedSearch.Server - class net.nutch.searcher.DistributedSearch.Server.
The search server.
DistributedSearch.Server(File, int) - Constructor for class net.nutch.searcher.DistributedSearch.Server
Construct a search server on the index and segments in the named directory, listening on the named port.
DistributedWebDBReader - class net.nutch.db.DistributedWebDBReader.
The WebDBReader implements all the read-only parts of accessing our web database.
DistributedWebDBReader(NutchFileSystem, String) - Constructor for class net.nutch.db.DistributedWebDBReader
Open a web db reader for the named directory.
DistributedWebDBWriter - class net.nutch.db.DistributedWebDBWriter.
This is a wrapper class that allows us to reorder write operations to the linkdb and pagedb.
DistributedWebDBWriter(NutchFileSystem, String, int) - Constructor for class net.nutch.db.DistributedWebDBWriter
Open the db files.
DistributedWebDBWriter.LinkInstruction - class net.nutch.db.DistributedWebDBWriter.LinkInstruction.
Holds an instruction over a Link.
DistributedWebDBWriter.LinkInstruction() - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
DistributedWebDBWriter.LinkInstruction(Link, int) - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
DistributedWebDBWriter.LinkInstruction.MD5Comparator - class net.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator.
Sorts the instruction first by MD5, then by opcode.
DistributedWebDBWriter.LinkInstruction.MD5Comparator() - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstruction.MD5Comparator
 
DistributedWebDBWriter.LinkInstruction.UrlComparator - class net.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
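The comparators in this group order instructions by a primary key (URL, MD5, or Page) and break ties by opcode. A minimal sketch of the URL-then-opcode ordering, using a hypothetical `UrlInstruction` holder rather than Nutch's own `LinkInstruction` (whose fields and opcode values this index does not show):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a link instruction: only the two fields the
// comparator looks at. This is not Nutch's LinkInstruction class.
final class UrlInstruction {
    final String url;
    final int opcode;

    UrlInstruction(String url, int opcode) {
        this.url = url;
        this.opcode = opcode;
    }

    @Override
    public String toString() {
        return url + "#" + opcode;
    }
}

final class UrlThenOpcode {
    // Primary key: the URL string; ties are broken by the opcode.
    static final Comparator<UrlInstruction> ORDER =
        Comparator.comparing((UrlInstruction i) -> i.url)
                  .thenComparingInt(i -> i.opcode);

    // Return a sorted copy, leaving the input list untouched.
    static List<UrlInstruction> sorted(List<UrlInstruction> in) {
        List<UrlInstruction> out = new ArrayList<>(in);
        out.sort(ORDER);
        return out;
    }
}
```

Sorting on the key first groups all instructions for one URL together, so a single pass over the sorted stream can apply them in opcode order.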
DistributedWebDBWriter.LinkInstruction.UrlComparator() - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstruction.UrlComparator
 
DistributedWebDBWriter.LinkInstructionWriter - class net.nutch.db.DistributedWebDBWriter.LinkInstructionWriter.
LinkInstructionWriter very efficiently writes a LinkInstruction to an EditSectionGroupWriter.
DistributedWebDBWriter.LinkInstructionWriter() - Constructor for class net.nutch.db.DistributedWebDBWriter.LinkInstructionWriter
 
DistributedWebDBWriter.PageInstruction - class net.nutch.db.DistributedWebDBWriter.PageInstruction.
PageInstruction holds an operation over a Page.
DistributedWebDBWriter.PageInstruction() - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
DistributedWebDBWriter.PageInstruction(Page, int) - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
DistributedWebDBWriter.PageInstruction(Page, Link, int) - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
DistributedWebDBWriter.PageInstruction.PageComparator - class net.nutch.db.DistributedWebDBWriter.PageInstruction.PageComparator.
Sorts the instruction first by Page, then by opcode.
DistributedWebDBWriter.PageInstruction.PageComparator() - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction.PageComparator
 
DistributedWebDBWriter.PageInstruction.UrlComparator - class net.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator.
Sorts the instruction first by url, then by opcode.
DistributedWebDBWriter.PageInstruction.UrlComparator() - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstruction.UrlComparator
 
DistributedWebDBWriter.PageInstructionWriter - class net.nutch.db.DistributedWebDBWriter.PageInstructionWriter.
PageInstructionWriter very efficiently writes a PageInstruction to an EditSectionGroupWriter.
DistributedWebDBWriter.PageInstructionWriter() - Constructor for class net.nutch.db.DistributedWebDBWriter.PageInstructionWriter
 
Done() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
DumpSegment - class net.nutch.tools.DumpSegment.
Dump FetcherOutput, ParseData and ParseText for every record in one segment.
DumpSegment(String) - Constructor for class net.nutch.tools.DumpSegment
 
debugStream - Variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
debugStream - Variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
decreaseKey(Object, int) - Method in class net.nutch.util.FibonacciHeap
Decreases the priority value associated with item.
delete() - Method in class net.nutch.db.EditSectionGroupReader
Get rid of the edits encapsulated by this file.
delete(UTF8) - Method in class net.nutch.fs.FSDirectory
Remove the file from management and return its blocks.
delete(UTF8) - Method in class net.nutch.fs.FSNamesystem
Remove the indicated filename from the namespace.
delete(UTF8) - Method in class net.nutch.fs.NDFSClient
Make a direct connection to the namenode and manipulate structures there.
delete(String) - Method in class net.nutch.fs.TestClient
Delete an NDFS file
delete(String) - Static method in class net.nutch.io.MapFile
Deletes the named map file.
delete(NutchFile) - Method in interface net.nutch.util.NutchFileSystem
Delete the given NutchFile and everything below it.
delete(NutchFile) - Method in class net.nutch.util.NutchGenericFileSystem
Take the file out of the NutchFileSystem.
deleteContentDuplicates() - Method in class net.nutch.indexer.DeleteDuplicates
Delete pages with duplicate content hashes.
deleteFile(String, String, String) - Method in class net.nutch.util.NutchGenericFileSystem
 
deleteFile(String, String, String) - Method in class net.nutch.util.NutchNFSFileSystem
Remove a file from its current location.
deleteFile(String, String, String) - Method in class net.nutch.util.NutchRemoteFileSystem
Remove a file from the given location.
deleteLink(MD5Hash) - Method in class net.nutch.db.WebDBWriter
Remove links with the given MD5 from the db.
deletePage(String) - Method in class net.nutch.db.DistributedWebDBWriter
Remove a page from the page database.
deletePage(String) - Method in interface net.nutch.db.IWebDBWriter
deletePage(url) will remove a Page object from the db with the given URL.
deletePage(String) - Method in class net.nutch.db.WebDBWriter
Remove a page from the page database.
deleteUnlicensed() - Method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Delete pages without CC licenses.
deleteUrlDuplicates() - Method in class net.nutch.indexer.DeleteDuplicates
Delete pages with duplicate URLs.
digest(byte[]) - Static method in class net.nutch.io.MD5Hash
Construct a hash value for a byte array.
digest(byte[], int, int) - Static method in class net.nutch.io.MD5Hash
Construct a hash value for a range of bytes within a byte array.
digest(String) - Static method in class net.nutch.io.MD5Hash
Construct a hash value for a String.
digest(UTF8) - Static method in class net.nutch.io.MD5Hash
Construct a hash value for a UTF8 string.
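Assuming the `MD5Hash.digest` overloads above wrap the standard MD5 algorithm, and assuming String input is encoded as UTF-8 before hashing (this index does not say which encoding is used), the JDK's `MessageDigest` produces the same kind of 16-byte value. The `Md5Sketch` class below is illustrative only, not Nutch's implementation:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative MD5 helpers built on the JDK; not net.nutch.io.MD5Hash.
final class Md5Sketch {
    // Hash a slice of a byte array, mirroring a (byte[], int, int) overload.
    static byte[] digest(byte[] data, int start, int len) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(data, start, len);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // Hash a String by first encoding it as UTF-8 (an assumed encoding).
    static byte[] digest(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        return digest(bytes, 0, bytes.length);
    }

    // Render the 16-byte digest as 32 lowercase hex characters.
    static String hex(byte[] digest) {
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

For example, `Md5Sketch.hex(Md5Sketch.digest("abc"))` yields the well-known MD5 test vector `900150983cd24fb0d6963f7d28e17f72`.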
disable_tracing() - Method in class net.nutch.analysis.NutchAnalysis
 
disable_tracing() - Method in class net.nutch.quality.dynamic.PageDescription
 
disconnect() - Method in class net.nutch.protocol.ftp.Client
Closes the connection to the FTP server and restores connection parameters to the default values.
displayByteArray(byte[]) - Static method in class net.nutch.io.WritableUtils
 
dump() - Method in class net.nutch.tools.DumpSegment
 

E

EDITS_PREFIX - Static variable in class net.nutch.db.EditSectionWriter
 
EOF - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
EOF - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
EQUALS - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
ERROR_NAME - Static variable in class net.nutch.fetcher.FetcherOutput
 
EditSectionGroupReader - class net.nutch.db.EditSectionGroupReader.
The EditSectionGroupReader will read in an edits-file that was built in a distributed way.
EditSectionGroupReader(NutchFileSystem, String, String, int, int) - Constructor for class net.nutch.db.EditSectionGroupReader
Open the EditSectionGroupReader for the appropriate file.
EditSectionGroupWriter - class net.nutch.db.EditSectionGroupWriter.
The EditSectionGroupWriter maintains a set of EditSectionWriter objects.
EditSectionGroupWriter(NutchFileSystem, String, int, int, String, Class, Class, EditSectionGroupWriter.KeyExtractor) - Constructor for class net.nutch.db.EditSectionGroupWriter
Start an EditSectionGroupWriter at the indicated location, for a single emitter.
EditSectionGroupWriter.KeyExtractor - class net.nutch.db.EditSectionGroupWriter.KeyExtractor.
Edit instructions are Comparable, but they also have an "inner" key like MD5Hash or URL that is also Comparable.
EditSectionGroupWriter.KeyExtractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.KeyExtractor
 
EditSectionGroupWriter.LinkMD5Extractor - class net.nutch.db.EditSectionGroupWriter.LinkMD5Extractor.
Get the MD5 from a LinkInstruction
EditSectionGroupWriter.LinkMD5Extractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.LinkMD5Extractor
 
EditSectionGroupWriter.LinkURLExtractor - class net.nutch.db.EditSectionGroupWriter.LinkURLExtractor.
Get the URL from a LinkInstruction
EditSectionGroupWriter.LinkURLExtractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.LinkURLExtractor
 
EditSectionGroupWriter.PageMD5Extractor - class net.nutch.db.EditSectionGroupWriter.PageMD5Extractor.
Get the MD5 from a PageInstruction
EditSectionGroupWriter.PageMD5Extractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.PageMD5Extractor
 
EditSectionGroupWriter.PageURLExtractor - class net.nutch.db.EditSectionGroupWriter.PageURLExtractor.
Get the URL from a PageInstruction
EditSectionGroupWriter.PageURLExtractor() - Constructor for class net.nutch.db.EditSectionGroupWriter.PageURLExtractor
 
EditSectionWriter - class net.nutch.db.EditSectionWriter.
EditSectionWriter writes a discrete portion of a WebDB.
EditSectionWriter(NutchFileSystem, String, String, int, int, Class, Class) - Constructor for class net.nutch.db.EditSectionWriter
Make an EditSectionWriter for the appropriate file.
Entities - class net.nutch.html.Entities.
 
Entities() - Constructor for class net.nutch.html.Entities
 
ExpandBuff(boolean) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
Extension - class net.nutch.plugin.Extension.
An Extension is a kind of listener descriptor that is installed on a concrete ExtensionPoint, which acts as a kind of publisher.
Extension(PluginDescriptor, String, String, String) - Constructor for class net.nutch.plugin.Extension
 
ExtensionPoint - class net.nutch.plugin.ExtensionPoint.
The ExtensionPoint provides meta-information about an extension point.
ExtensionPoint(String, String, String) - Constructor for class net.nutch.plugin.ExtensionPoint
Constructor
elName - Variable in class net.nutch.parse.html.DOMContentUtils.LinkParams
 
element() - Method in class net.nutch.quality.dynamic.PageDescription
 
emitDistribution(PrintStream) - Method in class net.nutch.util.ScoreStats
Print out the distribution, with finer granularity for the 90th through 100th percentiles.
emitFetchList(File, long, long) - Method in class net.nutch.tools.FetchListTool
Spit out the fetchlist to a BDB at the indicated filename.
emitMultipleLists(File, int, long, long) - Method in class net.nutch.tools.FetchListTool
Spit out several fetchlists, so that we can fetch across several machines.
emitTopK(int) - Method in class net.nutch.tools.WebDBAdminTool
Emit the top K-rated Pages.
enable_tracing() - Method in class net.nutch.analysis.NutchAnalysis
 
enable_tracing() - Method in class net.nutch.quality.dynamic.PageDescription
 
encode(String) - Static method in class net.nutch.html.Entities
 
endColumn - Variable in class net.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
endLine - Variable in class net.nutch.quality.dynamic.Token
beginLine and beginColumn describe the position of the first character of this token; endLine and endColumn describe the position of the last character of this token.
entrySet() - Method in class net.nutch.util.SoftHashMap
Not Implemented
eol - Variable in class net.nutch.quality.dynamic.ParseException
The end of line string for this machine.
equals(Object) - Method in class net.nutch.db.Page
 
equals(Object) - Method in class net.nutch.fetcher.FetcherOutput
 
equals(Object) - Method in class net.nutch.io.BooleanWritable
 
equals(Object) - Method in class net.nutch.io.IntWritable
Returns true iff o is an IntWritable with the same value.
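IntWritable's equals is documented as value equality. A minimal sketch of that contract, using a hypothetical `IntBox` class (not Nutch's IntWritable, and without its serialization methods); the matching `hashCode` override is what lets value-equal instances work as hash-map keys:

```java
// Hypothetical int wrapper illustrating value equality; not Nutch's class.
final class IntBox implements Comparable<IntBox> {
    private final int value;

    IntBox(int value) {
        this.value = value;
    }

    int get() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        // True iff o is an IntBox holding the same int.
        if (!(o instanceof IntBox)) {
            return false;
        }
        return this.value == ((IntBox) o).value;
    }

    @Override
    public int hashCode() {
        // Equal values must produce equal hashes, so hash the value itself.
        return value;
    }

    @Override
    public int compareTo(IntBox other) {
        return Integer.compare(value, other.value);
    }
}
```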
equals(Object) - Method in class net.nutch.io.LongWritable
Returns true iff o is a LongWritable with the same value.
equals(Object) - Method in class net.nutch.io.MD5Hash
Returns true iff o is an MD5Hash whose digest contains the same values.
equals(Object) - Method in class net.nutch.io.UTF8
Returns true iff o is a UTF8 with the same contents.
equals(Object) - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
equals(Object) - Method in class net.nutch.pagedb.FetchListEntry
 
equals(Object) - Method in class net.nutch.parse.Outlink
 
equals(Object) - Method in class net.nutch.parse.ParseData
 
equals(Object) - Method in class net.nutch.parse.ParseText
 
equals(Object) - Method in class net.nutch.protocol.Content
 
equals(Object) - Method in class net.nutch.searcher.Hit
 
equals(Object) - Method in class net.nutch.searcher.Query.Clause
 
equals(Object) - Method in class net.nutch.searcher.Query.Phrase
 
equals(Object) - Method in class net.nutch.searcher.Query.Term
 
equals(Object) - Method in class net.nutch.searcher.Query
 
evaluate() - Method in class net.nutch.util.CommandRunner
 
expectedTokenSequences - Variable in class net.nutch.quality.dynamic.ParseException
Each entry in this array is an array of integers.
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.KeyExtractor
 
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.LinkMD5Extractor
 
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.LinkURLExtractor
 
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.PageMD5Extractor
 
extractInnerKey(WritableComparable) - Method in class net.nutch.db.EditSectionGroupWriter.PageURLExtractor
 
extractText(InputStream) - Method in class net.nutch.parse.msword.WordExtractor
Gets the text from a Word document.

F

FIELD - Static variable in class org.creativecommons.nutch.CCIndexingFilter
The name of the document field we use.
FILE_COMPLETE_FAILED - Static variable in interface net.nutch.fs.FSConstants
 
FILE_COMPLETE_ONGOING - Static variable in interface net.nutch.fs.FSConstants
 
FILE_COMPLETE_SUCCESS - Static variable in interface net.nutch.fs.FSConstants
 
FSConstants - interface net.nutch.fs.FSConstants.
Some handy constants
FSDataset - class net.nutch.fs.FSDataset.
FSDataset manages a set of data blocks.
FSDataset(File, long) - Constructor for class net.nutch.fs.FSDataset
An FSDataset has a directory where it loads its data files.
FSDirectory - class net.nutch.fs.FSDirectory.
FSDirectory stores the filesystem directory state.
FSDirectory(File) - Constructor for class net.nutch.fs.FSDirectory
Create a FileSystem directory, and load its info from the indicated place.
FSNamesystem - class net.nutch.fs.FSNamesystem.
The FSNamesystem tracks several important tables.
FSNamesystem(File) - Constructor for class net.nutch.fs.FSNamesystem
dir is where the filesystem directory state is stored
FSParam - class net.nutch.fs.FSParam.
IPC param
FSParam() - Constructor for class net.nutch.fs.FSParam
 
FSParam(byte) - Constructor for class net.nutch.fs.FSParam
 
FSResults - class net.nutch.fs.FSResults.
The result of an NFS IPC call.
FSResults() - Constructor for class net.nutch.fs.FSResults
 
FSResults(byte) - Constructor for class net.nutch.fs.FSResults
 
FSResults(byte, Writable) - Constructor for class net.nutch.fs.FSResults
 
FSResults(byte, Writable, Writable) - Constructor for class net.nutch.fs.FSResults
 
FastSavedException - exception net.nutch.parse.msword.FastSavedException.
FastSavedException(String) - Constructor for class net.nutch.parse.msword.FastSavedException
 
FetchListEntry - class net.nutch.pagedb.FetchListEntry.
 
FetchListEntry() - Constructor for class net.nutch.pagedb.FetchListEntry
 
FetchListEntry(boolean, Page, String[]) - Constructor for class net.nutch.pagedb.FetchListEntry
 
FetchListTool - class net.nutch.tools.FetchListTool.
This class takes an IWebDBReader, computes a relevant subset, and then emits the subset.
FetchListTool(File, boolean, boolean, float, int) - Constructor for class net.nutch.tools.FetchListTool
FetchListTool takes a page db, and emits a RECNO-based subset of it.
FetchListTool.SortableScore - class net.nutch.tools.FetchListTool.SortableScore.
SortableScore is just a WritableComparable Float!
FetchListTool.SortableScore() - Constructor for class net.nutch.tools.FetchListTool.SortableScore
 
FetchedSegments - class net.nutch.searcher.FetchedSegments.
Implements HitSummarizer and HitContent for a set of fetched segments.
FetchedSegments(String) - Constructor for class net.nutch.searcher.FetchedSegments
Construct given a directory containing fetcher output.
Fetcher - class net.nutch.fetcher.Fetcher.
The fetcher.
Fetcher(String) - Constructor for class net.nutch.fetcher.Fetcher
 
FetcherOutput - class net.nutch.fetcher.FetcherOutput.
An entry in the fetcher's output.
FetcherOutput() - Constructor for class net.nutch.fetcher.FetcherOutput
 
FetcherOutput(FetchListEntry, MD5Hash, int) - Constructor for class net.nutch.fetcher.FetcherOutput
 
FibonacciHeap - class net.nutch.util.FibonacciHeap.
A Fibonacci Heap, as described in Introduction to Algorithms by Charles E.
FibonacciHeap() - Constructor for class net.nutch.util.FibonacciHeap
Creates a new FibonacciHeap.
File - class net.nutch.protocol.file.File.
File.java handles the file: scheme.
File() - Constructor for class net.nutch.protocol.file.File
 
FileError - exception net.nutch.protocol.file.FileError.
Thrown for File error codes.
FileError(int) - Constructor for class net.nutch.protocol.file.FileError
 
FileException - exception net.nutch.protocol.file.FileException.
 
FileException() - Constructor for class net.nutch.protocol.file.FileException
 
FileException(String) - Constructor for class net.nutch.protocol.file.FileException
 
FileException(String, Throwable) - Constructor for class net.nutch.protocol.file.FileException
 
FileException(Throwable) - Constructor for class net.nutch.protocol.file.FileException
 
FileResponse - class net.nutch.protocol.file.FileResponse.
FileResponse.java mimics file replies as HTTP responses.
FileResponse(URL, File) - Constructor for class net.nutch.protocol.file.FileResponse
 
FileResponse(String, URL, File) - Constructor for class net.nutch.protocol.file.FileResponse
 
FileUtil - class net.nutch.util.FileUtil.
A collection of file-processing util methods
FileUtil() - Constructor for class net.nutch.util.FileUtil
 
FillBuff() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
Ftp - class net.nutch.protocol.ftp.Ftp.
Ftp.java handles the ftp: scheme.
Ftp() - Constructor for class net.nutch.protocol.ftp.Ftp
 
FtpError - exception net.nutch.protocol.ftp.FtpError.
Thrown for Ftp error codes.
FtpError(int) - Constructor for class net.nutch.protocol.ftp.FtpError
 
FtpException - exception net.nutch.protocol.ftp.FtpException.
Superclass for important exceptions thrown during FTP talk, that must be handled with care.
FtpException() - Constructor for class net.nutch.protocol.ftp.FtpException
 
FtpException(String) - Constructor for class net.nutch.protocol.ftp.FtpException
 
FtpException(String, Throwable) - Constructor for class net.nutch.protocol.ftp.FtpException
 
FtpException(Throwable) - Constructor for class net.nutch.protocol.ftp.FtpException
 
FtpExceptionBadSystResponse - exception net.nutch.protocol.ftp.FtpExceptionBadSystResponse.
Exception indicating a bad reply to the SYST command.
FtpExceptionCanNotHaveDataConnection - exception net.nutch.protocol.ftp.FtpExceptionCanNotHaveDataConnection.
Exception indicating a failure to open a data connection.
FtpExceptionControlClosedByForcedDataClose - exception net.nutch.protocol.ftp.FtpExceptionControlClosedByForcedDataClose.
Exception indicating that the server closed the control channel because the client (our) end forced the data channel closed.
FtpExceptionUnknownForcedDataClose - exception net.nutch.protocol.ftp.FtpExceptionUnknownForcedDataClose.
Exception indicating an unrecognizable reply from the server after the client (our) side forced the data channel closed.
FtpResponse - class net.nutch.protocol.ftp.FtpResponse.
FtpResponse.java mimics FTP replies as HTTP responses.
FtpResponse(URL, Ftp) - Constructor for class net.nutch.protocol.ftp.FtpResponse
 
FtpResponse(String, URL, Ftp) - Constructor for class net.nutch.protocol.ftp.FtpResponse
 
filter(Content, Parse, DocumentFragment) - Method in class net.nutch.analysis.lang.HTMLLanguageParser
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
filter(Document, Parse, FetcherOutput) - Method in class net.nutch.analysis.lang.LanguageIdentifier
 
filter(Document, Parse, FetcherOutput) - Method in interface net.nutch.indexer.IndexingFilter
Adds fields or otherwise modifies the document that will be indexed for a parse.
filter(Document, Parse, FetcherOutput) - Static method in class net.nutch.indexer.IndexingFilters
Run all defined filters.
filter(Document, Parse, FetcherOutput) - Method in class net.nutch.indexer.basic.BasicIndexingFilter
 
filter(String) - Method in class net.nutch.net.PrefixURLFilter
 
filter(String) - Method in class net.nutch.net.RegexURLFilter
 
filter(String) - Method in interface net.nutch.net.URLFilter
 
filter(Content, Parse, DocumentFragment) - Method in interface net.nutch.parse.HtmlParseFilter
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
filter(Content, Parse, DocumentFragment) - Static method in class net.nutch.parse.HtmlParseFilters
Run all defined filters.
filter(Query, BooleanQuery) - Method in interface net.nutch.searcher.QueryFilter
Adds clauses or otherwise modifies the BooleanQuery that will be searched.
filter(Query) - Static method in class net.nutch.searcher.QueryFilters
Run all defined filters.
filter(Query, BooleanQuery) - Method in class net.nutch.searcher.RawFieldQueryFilter
 
filter(Document, Parse, FetcherOutput) - Method in class org.creativecommons.nutch.CCIndexingFilter
 
filter(Content, Parse, DocumentFragment) - Method in class org.creativecommons.nutch.CCParseFilter
Adds metadata or otherwise modifies a parse of an HTML document, given the DOM tree of a page.
finalizationOccurring() - Method in interface net.nutch.util.SoftHashMap.FinalizationListener
Called when a SoftHashMap.FinalizationNotifier that this object is registered with is being finalized.
finalize() - Method in class net.nutch.plugin.Plugin
 
finalize() - Method in class net.nutch.plugin.PluginRepository
 
finalize() - Method in class net.nutch.protocol.ftp.Ftp
 
finalizeBlock(Block) - Method in class net.nutch.fs.FSDataset
Complete the block write!
findMD5Section(MD5Hash, int) - Static method in class net.nutch.db.DBKeyDivision
Find the right section index for the given MD5, based on the total number of sections in the db.
findURLSection(String, int) - Static method in class net.nutch.db.DBKeyDivision
Find the right section index for the given URL, based on the total number of sections in the db.
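The two DBKeyDivision methods map a key to one of a fixed number of db sections. This index does not describe the actual scheme, so the sketch below assumes a simple hash-modulo rule; the `SectionSketch` class and its method names are hypothetical, and Nutch's real division may differ:

```java
// Hypothetical key-to-section mapping; DBKeyDivision's real scheme may differ.
final class SectionSketch {
    // Map any int hash onto [0, numSections).
    private static int bucket(int hash, int numSections) {
        if (numSections <= 0) {
            throw new IllegalArgumentException("numSections must be positive");
        }
        // Clear the sign bit so the remainder is never negative.
        return (hash & Integer.MAX_VALUE) % numSections;
    }

    static int findUrlSection(String url, int numSections) {
        return bucket(url.hashCode(), numSections);
    }

    static int findMd5Section(byte[] md5, int numSections) {
        if (md5.length < 4) {
            throw new IllegalArgumentException("digest too short");
        }
        // Use the first four digest bytes as a big-endian int.
        int h = ((md5[0] & 0xff) << 24) | ((md5[1] & 0xff) << 16)
              | ((md5[2] & 0xff) << 8) | (md5[3] & 0xff);
        return bucket(h, numSections);
    }
}
```

Whatever the exact rule, the property that matters is determinism: every writer computes the same section for the same key, so all edits for a key land in the same section.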
first - Variable in class net.nutch.fs.FSParam
 
first - Variable in class net.nutch.fs.FSResults
 
format - Static variable in class net.nutch.net.protocols.HttpDateFormat
 
format(LogRecord) - Method in class net.nutch.util.LogFormatter
Format the given LogRecord.
fullyDelete(File) - Static method in class net.nutch.util.FileUtil
Delete a directory and all its contents.
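A recursive delete like `FileUtil.fullyDelete` has to empty a directory before the directory itself can be removed. A minimal sketch with `java.io.File`; the `DeleteSketch` class is hypothetical, not Nutch's code, and it deliberately does not descend into symbolic links so that a link cannot cause files outside the tree to be deleted:

```java
import java.io.File;
import java.nio.file.Files;

// Hypothetical recursive delete; not net.nutch.util.FileUtil.
final class DeleteSketch {
    // Delete dir and everything beneath it.
    // Returns true if dir no longer exists afterwards.
    static boolean fullyDelete(File dir) {
        File[] children = dir.listFiles(); // null if dir is a plain file or unreadable
        if (children != null) {
            for (File child : children) {
                // Recurse into real directories only; a symlink is removed as an entry.
                if (child.isDirectory() && !Files.isSymbolicLink(child.toPath())) {
                    fullyDelete(child);
                } else {
                    child.delete();
                }
            }
        }
        return dir.delete() || !dir.exists();
    }
}
```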

G

GROUP_METAINFO - Static variable in class net.nutch.db.EditSectionGroupWriter
 
GZIPUtils - class net.nutch.util.GZIPUtils.
A collection of utility methods for working on gzipped data.
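This section does not list GZIPUtils' individual methods, so the sketch below only shows the kind of round trip such utilities perform, using `java.util.zip`. The `GzipSketch` class and its `zip`/`unzip` names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical gzip helpers; not net.nutch.util.GZIPUtils.
final class GzipSketch {
    // Compress a byte array into gzip format.
    static byte[] zip(byte[] in) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(in);
        } // closing the stream finishes the deflate data and writes the gzip trailer
        return bytes.toByteArray();
    }

    // Decompress a complete gzip byte array.
    static byte[] unzip(byte[] in) throws IOException {
        try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(in))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        }
    }
}
```

Reading the result only after `GZIPOutputStream` is closed matters: before that, the compressed bytes and trailer may not have been flushed to the underlying buffer.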
GZIPUtils() - Constructor for class net.nutch.util.GZIPUtils
 
GetImage() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
GetSuffix(int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
generateParseException() - Method in class net.nutch.analysis.NutchAnalysis
 
generateParseException() - Method in class net.nutch.quality.dynamic.PageDescription
 
get(long, Writable) - Method in class net.nutch.io.ArrayFile.Reader
Return the nth value in the file.
get() - Method in class net.nutch.io.ArrayWritable
 
get() - Method in class net.nutch.io.BooleanWritable
Returns the value of the BooleanWritable
get() - Method in class net.nutch.io.BytesWritable
 
get() - Method in class net.nutch.io.IntWritable
Return the value of this IntWritable.
get() - Method in class net.nutch.io.LongWritable
Return the value of this LongWritable.
get(WritableComparable, Writable) - Method in class net.nutch.io.MapFile.Reader
Return the value for the named key, or null if none exists.
get() - Static method in class net.nutch.io.NullWritable
Returns the single instance of this class.
get(WritableComparable) - Method in class net.nutch.io.SetFile.Reader
Read the matching key from a set into key.
get() - Method in class net.nutch.io.TwoDArrayWritable
 
get(String) - Method in class net.nutch.parse.ParseData
Return the value of a metadata property.
get(String) - Method in class net.nutch.protocol.Content
Return the value of a metadata property.
get(ServletContext) - Static method in class net.nutch.searcher.NutchBean
Cache in servlet context.
get(String) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property, or null if no such property exists.
get(String, String) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property.
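The two NutchConf.get entries differ only in the second argument. Assuming that argument is a default returned when the property is unset (this index does not say so explicitly), the lookup behaves like `java.util.Properties.getProperty(key, defaultValue)`. The `ConfSketch` class, its boolean-parsing rule, and the property names in the test are hypothetical:

```java
import java.util.Properties;

// Hypothetical configuration lookup; not net.nutch.util.NutchConf.
final class ConfSketch {
    private final Properties props = new Properties();

    void set(String name, String value) {
        props.setProperty(name, value);
    }

    // Returns the value of name, or null if it is not set.
    String get(String name) {
        return props.getProperty(name);
    }

    // Returns the value of name, or defaultValue if it is not set.
    String get(String name, String defaultValue) {
        return props.getProperty(name, defaultValue);
    }

    // Assumed rule: "true"/"false" parse; anything else yields defaultValue.
    boolean getBoolean(String name, boolean defaultValue) {
        String v = get(name);
        if ("true".equals(v)) {
            return true;
        }
        if ("false".equals(v)) {
            return false;
        }
        return defaultValue;
    }
}
```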
get(NutchFile) - Method in interface net.nutch.util.NutchFileSystem
Obtains the indicated NutchFile, whether remote or local.
get(NutchFile, long) - Method in interface net.nutch.util.NutchFileSystem
Like get(NutchFile), but expires after the given number of milliseconds, returning null.
get(NutchFile) - Method in class net.nutch.util.NutchGenericFileSystem
Wait for a NutchFile from somewhere in NutchSpace.
get(NutchFile, long) - Method in class net.nutch.util.NutchGenericFileSystem
Wait for a NutchFile for the specified amount of time.
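The timed get(NutchFile, long) entries describe waiting for a file and giving up with null after a deadline. A minimal polling sketch of that contract on a local `java.io.File`; the `WaitSketch` class and its poll interval parameter are hypothetical, and Nutch's filesystem classes may wait by other means:

```java
import java.io.File;

// Hypothetical timed wait; not NutchGenericFileSystem's implementation.
final class WaitSketch {
    // Poll until f exists or timeoutMs elapses; return f, or null on timeout.
    static File waitFor(File f, long timeoutMs, long pollMs) throws InterruptedException {
        // nanoTime is monotonic, so wall-clock changes cannot shorten or extend the wait.
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (f.exists()) {
                return f;
            }
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMs <= 0) {
                return null;
            }
            Thread.sleep(Math.min(pollMs, remainingMs));
        }
    }
}
```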
get(Object) - Method in class net.nutch.util.SoftHashMap
 
getAdditionalBlock(UTF8) - Method in class net.nutch.fs.FSNamesystem
The client would like to obtain an additional block for the indicated filename (which is being written-to).
getAnchor() - Method in class net.nutch.parse.Outlink
 
getAnchorText() - Method in class net.nutch.db.Link
 
getAnchors() - Method in class net.nutch.fetcher.FetcherOutput
 
getAnchors() - Method in class net.nutch.pagedb.FetchListEntry
 
getAnchors(HitDetails) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getAnchors(HitDetails) - Method in class net.nutch.searcher.FetchedSegments
 
getAnchors(HitDetails) - Method in interface net.nutch.searcher.HitContent
Returns the anchors of a hit document.
getAnchors(HitDetails) - Method in class net.nutch.searcher.NutchBean
 
getAttribute(String) - Method in class net.nutch.plugin.Extension
Returns an attribute value that is set up in the manifest file and defined by the extension point's XML schema.
getBaseHref() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Returns the baseHref, if set, or null otherwise.
getBaseUrl() - Method in class net.nutch.protocol.Content
The base url for relative links contained in the content.
getBeginColumn() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
getBeginLine() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
getBlockData(Block) - Method in class net.nutch.fs.FSDataset
Get a stream of data from the indicated block.
getBlockId() - Method in class net.nutch.fs.Block
 
getBlockName() - Method in class net.nutch.fs.Block
 
getBlockReport() - Method in class net.nutch.fs.FSDataset
Return a table of block data
getBlocks() - Method in class net.nutch.fs.DatanodeInfo
 
getBoolean(String, boolean) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as a boolean.
getBytes() - Method in class net.nutch.io.UTF8
The raw bytes.
getBytes(String) - Static method in class net.nutch.io.UTF8
Convert a string to a UTF-8 encoded byte array.
getCapacity() - Method in class net.nutch.fs.DatanodeInfo
 
getCapacity() - Method in class net.nutch.fs.FSDataset
Return total capacity, used and unused
getCapacity() - Method in class net.nutch.fs.HeartbeatData
 
getClass(String) - Static method in class net.nutch.io.WritableName
Return the class for a name.
getClassLoader() - Method in class net.nutch.plugin.PluginDescriptor
Returns a cached classloader for a plugin.
getClauses() - Method in class net.nutch.searcher.Query
Return all clauses.
getClazz() - Method in class net.nutch.plugin.Extension
Returns the full class name of the extension point implementation
getCode() - Method in interface net.nutch.net.protocols.Response
Returns the response code.
getCode(int) - Method in class net.nutch.protocol.file.FileError
 
getCode() - Method in class net.nutch.protocol.file.FileResponse
Returns the response code.
getCode(int) - Method in class net.nutch.protocol.ftp.FtpError
 
getCode() - Method in class net.nutch.protocol.ftp.FtpResponse
Returns the response code.
getCode(int) - Method in class net.nutch.protocol.http.HttpError
 
getCode() - Method in class net.nutch.protocol.http.HttpResponse
Returns the response code.
getColumn() - Method in class net.nutch.quality.dynamic.SimpleCharStream
Deprecated.  
getCommand() - Method in class net.nutch.util.CommandRunner
 
getCompleteFlagName() - Method in class net.nutch.util.NutchFile
Get the almost-fully-qualified name for this NutchFile's 'completed' flag file.
getCompressedContent() - Method in interface net.nutch.net.protocols.Response
Returns the compressed version of the content if the server transmitted a compressed version, or null otherwise.
getConfResourceAsInputStream(String) - Static method in class net.nutch.util.NutchConf
Returns an input stream attached to the configuration resource with the given name.
getConfResourceAsReader(String) - Static method in class net.nutch.util.NutchConf
Returns a reader attached to the configuration resource with the given name.
getContent() - Method in interface net.nutch.net.protocols.Response
Returns the full content of the response.
getContent() - Method in class net.nutch.protocol.Content
The binary content retrieved.
getContent(String) - Method in interface net.nutch.protocol.Protocol
Returns the Content for a url.
getContent(String) - Method in class net.nutch.protocol.file.File
 
getContent() - Method in class net.nutch.protocol.file.FileResponse
 
getContent(String) - Method in class net.nutch.protocol.ftp.Ftp
 
getContent() - Method in class net.nutch.protocol.ftp.FtpResponse
 
getContent(String) - Method in class net.nutch.protocol.http.Http
 
getContent() - Method in class net.nutch.protocol.http.HttpResponse
 
getContent(HitDetails) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getContent(HitDetails) - Method in class net.nutch.searcher.FetchedSegments
 
getContent(HitDetails) - Method in interface net.nutch.searcher.HitContent
Returns the content of a hit document.
getContent(HitDetails) - Method in class net.nutch.searcher.NutchBean
 
getContentType() - Method in class net.nutch.parse.ParserNotFound
 
getContentType() - Method in class net.nutch.protocol.Content
The media type of the retrieved content.
getDBName() - Method in class net.nutch.util.NutchFile
DB Name the NutchFile lives in.
getData() - Method in class net.nutch.io.DataOutputBuffer
Returns the current contents of the buffer.
getData() - Method in interface net.nutch.parse.Parse
Other data extracted from the page.
getData() - Method in class net.nutch.parse.ParseImpl
 
getDependencies() - Method in class net.nutch.plugin.PluginDescriptor
Returns an array of plugin IDs.
getDescriptor() - Method in class net.nutch.plugin.Plugin
Returns the plugin descriptor
getDestroyOnTimeout() - Method in class net.nutch.util.CommandRunner
 
getDetails(Hit) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getDetails(Hit[]) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getDetails(Hit) - Method in interface net.nutch.searcher.HitDetailer
Returns the details for a hit document.
getDetails(Hit[]) - Method in interface net.nutch.searcher.HitDetailer
Returns the details for a set of hits.
getDetails(Hit) - Method in class net.nutch.searcher.IndexSearcher
 
getDetails(Hit[]) - Method in class net.nutch.searcher.IndexSearcher
 
getDetails(Hit) - Method in class net.nutch.searcher.NutchBean
 
getDetails(Hit[]) - Method in class net.nutch.searcher.NutchBean
 
getDigest() - Method in class net.nutch.io.MD5Hash
Returns the digest bytes.
getDiscriptor() - Method in class net.nutch.plugin.Extension
Returns the plugin descriptor.
getDomainID() - Method in class net.nutch.db.Link
 
getEndColumn() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
getEndLine() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
getExitValue() - Method in class net.nutch.util.CommandRunner
 
getExpireTime() - Method in class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet
Returns the expiration time.
getExplanation(Query, Hit) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getExplanation(Query, Hit) - Method in class net.nutch.searcher.IndexSearcher
 
getExplanation(Query, Hit) - Method in class net.nutch.searcher.NutchBean
 
getExplanation(Query, Hit) - Method in interface net.nutch.searcher.Searcher
Return an HTML-formatted explanation of how a query scored.
getExportedLibUrls() - Method in class net.nutch.plugin.PluginDescriptor
Returns an array of exported libraries as URLs.
getExtensionInstance() - Method in class net.nutch.plugin.Extension
Returns an instance of the extension implementation.
getExtensionPoint(String) - Method in class net.nutch.plugin.PluginRepository
Returns an extension point identified by an extension point id.
getExtensions() - Method in class net.nutch.plugin.PluginDescriptor
Returns an array of extensions.
getExtenstionPoints() - Method in class net.nutch.plugin.PluginDescriptor
Returns an array of extension points.
getExtentens() - Method in class net.nutch.plugin.ExtensionPoint
Returns an array of extensions that listen to this extension point.
getFS() - Method in class net.nutch.util.NutchFile
Returns a handle to the NutchFileSystem.
getFactor() - Method in class net.nutch.io.SequenceFile.Sorter
Get the number of streams to merge at once.
getFetch() - Method in class net.nutch.pagedb.FetchListEntry
 
getFetchDate() - Method in class net.nutch.fetcher.FetcherOutput
 
getFetchInterval() - Method in class net.nutch.db.Page
 
getFetchListEntry() - Method in class net.nutch.fetcher.FetcherOutput
 
getField(int) - Method in class net.nutch.searcher.HitDetails
Returns the name of the ith field.
getField() - Method in class net.nutch.searcher.Query.Clause
 
getFile(UTF8) - Method in class net.nutch.fs.FSDirectory
Get the blocks associated with the file
getFilename() - Method in class net.nutch.util.NutchFile
Get the almost-fully-qualified name for this NutchFile.
getFilter(TokenStream, String) - Static method in class net.nutch.analysis.CommonGrams
Construct a token filter that inserts n-grams for common terms.
getFilter() - Static method in class net.nutch.net.URLFilterFactory
Return the default URLFilter implementation.
getFloat() - Method in class net.nutch.tools.FetchListTool.SortableScore
 
getFloat(String, float) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as a float.
getFragments() - Method in class net.nutch.searcher.Summary
Returns an array of all of this summary's fragments.
getFromID() - Method in class net.nutch.db.Link
 
getHeader(String) - Method in interface net.nutch.net.protocols.Response
Returns the value of a named header.
getHeader(String) - Method in class net.nutch.protocol.file.FileResponse
Returns the value of a named header.
getHeader(String) - Method in class net.nutch.protocol.ftp.FtpResponse
Returns the value of a named header.
getHeader(String) - Method in class net.nutch.protocol.http.HttpResponse
Returns the value of a named header.
getHit(int) - Method in class net.nutch.searcher.Hits
Returns the ith hit in this list.
getHits(int, int) - Method in class net.nutch.searcher.Hits
Returns a subset of the hit objects.
getId() - Method in class net.nutch.plugin.Extension
Return the unique id of the extension.
getId() - Method in class net.nutch.plugin.ExtensionPoint
Returns the unique id of the extension point.
getIndexDocNo() - Method in class net.nutch.searcher.Hit
Return the document number of this hit within an index.
getIndexInterval() - Method in class net.nutch.io.MapFile.Writer
The number of entries that are added before an index entry is added.
getIndexNo() - Method in class net.nutch.searcher.Hit
Return the index number that this hit came from.
getInputs() - Method in class net.nutch.quality.dynamic.PageDescription
 
getInstance() - Static method in class net.nutch.analysis.lang.LanguageIdentifier
Returns a handle to the singleton instance.
getInstance() - Static method in class net.nutch.plugin.PluginRepository
Returns the singleton instance of the PluginRepository.
getInstruction() - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
getInstruction() - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
getInstruction() - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
getInstruction() - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
getInt(String, int) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as an integer.
getInterprets() - Method in class net.nutch.quality.dynamic.PageDescription
 
getKeyClass() - Method in class net.nutch.io.MapFile.Reader
Returns the class of keys in this file.
getKeyClass() - Method in class net.nutch.io.SequenceFile.Reader
Returns the class of keys in this file.
getKeyClass() - Method in class net.nutch.io.SequenceFile.Writer
Returns the class of keys in this file.
getKeyClass() - Method in class net.nutch.io.WritableComparator
Returns the WritableComparable implementation class.
getLength(Block) - Method in class net.nutch.fs.FSDataset
Find the block's on-disk length
getLength() - Method in class net.nutch.io.DataOutputBuffer
Returns the length of the valid data currently in the buffer.
getLength() - Method in class net.nutch.io.SequenceFile.Writer
Returns the current length of the output file.
getLength() - Method in class net.nutch.io.UTF8
The number of bytes in the encoded string.
getLength() - Method in class net.nutch.searcher.HitDetails
Returns the number of fields contained in this HitDetails object.
getLength() - Method in class net.nutch.searcher.Hits
Returns the number of hits included in this current listing.
getLine() - Method in class net.nutch.quality.dynamic.SimpleCharStream
Deprecated.  
getLink() - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
getLink() - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
getLink() - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
getLink() - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
getLinks(UTF8) - Method in class net.nutch.db.DBSectionReader
Get all the hyperlinks that link TO the indicated URL.
getLinks(MD5Hash) - Method in class net.nutch.db.DBSectionReader
Gets all the links originating from the document with the given MD5 hash.
getLinks(UTF8) - Method in class net.nutch.db.DistributedWebDBReader
Get all the hyperlinks that link TO the indicated URL.
getLinks(MD5Hash) - Method in class net.nutch.db.DistributedWebDBReader
Gets all the links originating from the document with the given MD5 hash.
getLinks(UTF8) - Method in interface net.nutch.db.IWebDBReader
Return any Link objects that point to the given URL.
getLinks(MD5Hash) - Method in interface net.nutch.db.IWebDBReader
Return all the Link objects that originate from a document with the given MD5 checksum.
getLinks(UTF8) - Method in class net.nutch.db.WebDBReader
Get all the hyperlinks that link TO the indicated URL.
getLinks(MD5Hash) - Method in class net.nutch.db.WebDBReader
Gets all the links originating from the document with the given MD5 hash.
getListing(UTF8) - Method in class net.nutch.fs.FSDirectory
Get a listing of files given path 'src'. This function is admittedly very inefficient right now.
getListing(UTF8) - Method in class net.nutch.fs.FSNamesystem
Get a listing of all files at 'src'.
getLocations() - Method in class net.nutch.util.ShareGroup
Locations for the ShareGroup (machinename:path)
getLogStream(Logger, Level) - Static method in class net.nutch.util.LogFormatter
Returns a stream that, when written to, adds log lines.
getLogger(String) - Static method in class net.nutch.util.LogFormatter
Gets a logger and, as a side effect, installs this as the default formatter.
getLong(String, long) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as a long.
getMD5() - Method in class net.nutch.db.Page
 
getMD5Hash() - Method in class net.nutch.fetcher.FetcherOutput
 
getMachineName() - Method in class net.nutch.fs.DatanodeInfo
 
getMachineName() - Method in class net.nutch.fs.HeartbeatData
 
getMemory() - Method in class net.nutch.io.SequenceFile.Sorter
Get the total amount of buffer memory, in bytes.
getMessage() - Method in class net.nutch.quality.dynamic.ParseException
This method has the standard behavior when this object has been created using the standard constructors.
getMessage() - Method in class net.nutch.quality.dynamic.TokenMgrError
You can also modify the body of this method to customize your error messages.
getMetadata() - Method in class net.nutch.parse.ParseData
Other page properties.
getMetadata() - Method in class net.nutch.protocol.Content
Other protocol-specific data.
getName() - Method in class net.nutch.analysis.lang.NGramProfile
 
getName() - Method in class net.nutch.fs.DatanodeInfo
 
getName() - Method in class net.nutch.fs.HeartbeatData
 
getName(Class) - Static method in class net.nutch.io.WritableName
Return the name for a class.
getName() - Method in class net.nutch.plugin.ExtensionPoint
Returns the name of the extension point.
getName() - Method in class net.nutch.plugin.PluginDescriptor
Returns the name of the plugin.
getName() - Method in class net.nutch.util.NutchFile
Terminating filename for the NutchFile.
getName() - Method in class net.nutch.util.ShareGroup
ShareGroup name.
getNewUrl() - Method in class net.nutch.protocol.ResourceMoved
 
getNextFetchTime() - Method in class net.nutch.db.Page
 
getNextScore() - Method in class net.nutch.db.Page
 
getNextToken() - Method in class net.nutch.analysis.NutchAnalysis
 
getNextToken() - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
getNextToken() - Method in class net.nutch.quality.dynamic.PageDescription
 
getNextToken() - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
getNoCache() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Returns the current value of noCache.
getNoFollow() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Returns the current value of noFollow.
getNoIndex() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Returns the current value of noIndex.
getNotExportedLibUrls() - Method in class net.nutch.plugin.PluginDescriptor
Returns a array of libraries as URLs that are not exported by the plugin.
getNumContinues() - Method in interface net.nutch.net.protocols.Response
Returns the number of 100/Continue headers encountered
getNumOutlinks() - Method in class net.nutch.db.Page
 
getOldUrl() - Method in class net.nutch.protocol.ResourceMoved
 
getOutlinks() - Method in class net.nutch.parse.ParseData
The outlinks of the page.
getOutlinks(URL, ArrayList, Node) - Static method in class net.nutch.parse.html.DOMContentUtils
This method finds all anchors below the supplied DOM node, and creates appropriate Outlink records for each (relative to the supplied base URL), and adds them to the outlinks ArrayList.
getPage(UTF8, Page) - Method in class net.nutch.db.DBSectionReader
Fetch a Page with the given URL, and fill it into the pre-allocated Page 'p'.
getPage(String) - Method in class net.nutch.db.DistributedWebDBReader
Get Page from the pagedb with the given URL.
getPage() - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
getPage(String) - Method in interface net.nutch.db.IWebDBReader
Return a Page object with the given URL, if any.
getPage(String) - Method in class net.nutch.db.WebDBReader
Get Page from the pagedb with the given URL
getPage() - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
getPage() - Method in class net.nutch.pagedb.FetchListEntry
 
getPages(MD5Hash) - Method in class net.nutch.db.DBSectionReader
Get Pages from the db according to their content hash.
getPages(MD5Hash) - Method in class net.nutch.db.DistributedWebDBReader
Get all the Pages according to their content hash.
getPages(MD5Hash) - Method in interface net.nutch.db.IWebDBReader
Return any Pages with the given MD5 checksum.
getPages(MD5Hash) - Method in class net.nutch.db.WebDBReader
Get Pages from the pagedb according to their content hash.
getParse(Content) - Method in interface net.nutch.parse.Parser
Creates the parse for some content.
getParse(Content) - Method in class net.nutch.parse.html.HtmlParser
 
getParse(Content) - Method in class net.nutch.parse.msword.MSWordParser
 
getParse(Content) - Method in class net.nutch.parse.pdf.PdfParser
 
getParse(Content) - Method in class net.nutch.parse.text.TextParser
 
getParseData(HitDetails) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getParseData(HitDetails) - Method in class net.nutch.searcher.FetchedSegments
 
getParseData(HitDetails) - Method in interface net.nutch.searcher.HitContent
Returns the ParseData of a hit document.
getParseData(HitDetails) - Method in class net.nutch.searcher.NutchBean
 
getParser(String, String) - Static method in class net.nutch.parse.ParserFactory
Returns the appropriate Parser implementation given a content type and url.
getPhrase() - Method in class net.nutch.searcher.Query.Clause
 
getPluginClass() - Method in class net.nutch.plugin.PluginDescriptor
Returns the fully qualified name of the class which implements the abstract Plugin class.
getPluginDescriptor(String) - Method in class net.nutch.plugin.PluginRepository
Returns the descriptor of one plugin identified by a plugin id.
getPluginDescriptors() - Method in class net.nutch.plugin.PluginRepository
Returns all registered plugin descriptors.
getPluginId() - Method in class net.nutch.plugin.PluginDescriptor
Returns the unique identifier of the plug-in or null.
getPluginInstance(PluginDescriptor) - Method in class net.nutch.plugin.PluginRepository
Returns an instance of a plugin.
getPluginPath() - Method in class net.nutch.plugin.PluginDescriptor
Returns the directory path of the plugin.
getPort() - Method in class net.nutch.fs.DatanodeInfo
 
getPort() - Method in class net.nutch.fs.HeartbeatData
 
getPosition() - Method in class net.nutch.io.DataInputBuffer
Returns the current position in the input.
getPosition() - Method in class net.nutch.io.SequenceFile.Reader
Return the current byte position in the input file.
getProtocol(String) - Static method in class net.nutch.protocol.ProtocolFactory
Returns the appropriate Protocol implementation for a url.
getRemaining() - Method in class net.nutch.fs.DatanodeInfo
 
getRemaining() - Method in class net.nutch.fs.FSDataset
Return how many bytes can still be stored in the FSDataset
getRemaining() - Method in class net.nutch.fs.HeartbeatData
 
getResourceString(String, Locale) - Method in class net.nutch.plugin.PluginDescriptor
Returns an internationalizable resource string.
getRetriesSinceFetch() - Method in class net.nutch.db.Page
 
getRobotsMetaDirectives(RobotsMetaProcessor.RobotsMetaIndicator, Node, URL) - Static method in class net.nutch.parse.html.RobotsMetaProcessor
Sets the indicators in robotsMeta to appropriate values, based on any META tags found under the given node.
getSchema() - Method in class net.nutch.plugin.ExtensionPoint
Returns a path to the XML schema of an extension point.
getScore() - Method in class net.nutch.db.Page
 
getScore() - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
getScore() - Method in class net.nutch.searcher.Hit
Return the degree to which this document matched the query.
getSegmentNames() - Method in class net.nutch.searcher.DistributedSearch.Client
Return the names of segments searched.
getSegmentNames() - Method in class net.nutch.searcher.FetchedSegments
 
getSegmentNames() - Method in class net.nutch.searcher.NutchBean
 
getShareGroupName() - Method in class net.nutch.util.NutchFile
Get the name of the sharegroup this file belongs to.
getSimilarity(NGramProfile) - Method in class net.nutch.analysis.lang.NGramProfile
Calculates a score of how well two models compare. This is just an experimental implementation; feel free to enhance it.
getSite() - Method in class net.nutch.searcher.Hit
Return the name of this document's website.
getSorted() - Method in class net.nutch.analysis.lang.NGramProfile
Returns a vector of n-grams sorted by count.
getStatus() - Method in class net.nutch.fetcher.FetcherOutput
 
getStrings(String) - Static method in class net.nutch.util.NutchConf
Returns the value of the name property as an array of strings.
getSummary(HitDetails, Query) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getSummary(HitDetails[], Query) - Method in class net.nutch.searcher.DistributedSearch.Client
 
getSummary(HitDetails, Query) - Method in class net.nutch.searcher.FetchedSegments
 
getSummary(HitDetails[], Query) - Method in class net.nutch.searcher.FetchedSegments
 
getSummary(HitDetails, Query) - Method in interface net.nutch.searcher.HitSummarizer
Returns a summary for the given hit details.
getSummary(HitDetails[], Query) - Method in interface net.nutch.searcher.HitSummarizer
Returns summaries for a set of details.
getSummary(HitDetails, Query) - Method in class net.nutch.searcher.NutchBean
 
getSummary(HitDetails[], Query) - Method in class net.nutch.searcher.NutchBean
 
getSummary(String, Query) - Method in class net.nutch.searcher.Summarizer
Returns a summary for the given pre-tokenized text.
getSystemName() - Method in class net.nutch.protocol.ftp.Client
Fetches the system type name from the server and returns the string.
getTargetPoint() - Method in class net.nutch.plugin.Extension
Returns the id of the extension point that is implemented by this extension.
getTerm() - Method in class net.nutch.searcher.Query.Clause
 
getTerms() - Method in class net.nutch.searcher.Query.Phrase
 
getTerms() - Method in class net.nutch.searcher.Query
Flattens a query into the set of text terms that it contains.
getText() - Method in interface net.nutch.parse.Parse
The textual content of the page.
getText() - Method in class net.nutch.parse.ParseImpl
 
getText() - Method in class net.nutch.parse.ParseText
 
getText(StringBuffer, Node, boolean) - Static method in class net.nutch.parse.html.DOMContentUtils
This method takes a StringBuffer and a DOM Node, and will append all the content text found beneath the DOM node to the StringBuffer.
getText(StringBuffer, Node) - Static method in class net.nutch.parse.html.DOMContentUtils
This is a convenience method, equivalent to getText(sb, node, false).
getText() - Method in class net.nutch.searcher.Summary.Fragment
Returns the text of this fragment.
getTextRuns() - Method in class net.nutch.parse.msword.chp.Word6CHPBinTable
 
getThrownError() - Method in class net.nutch.util.CommandRunner
 
getTimeout() - Method in class net.nutch.util.CommandRunner
 
getTitle() - Method in class net.nutch.parse.ParseData
The title of the page.
getTitle(StringBuffer, Node) - Static method in class net.nutch.parse.html.DOMContentUtils
This method takes a StringBuffer and a DOM Node, and will append the content text found beneath the first title node to the StringBuffer.
getToUrl() - Method in class net.nutch.parse.Outlink
 
getToken(int) - Method in class net.nutch.analysis.NutchAnalysis
 
getToken(int) - Method in class net.nutch.quality.dynamic.PageDescription
 
getTotal() - Method in class net.nutch.searcher.Hits
Returns the total number of hits for this query.
getURL() - Method in class net.nutch.db.Link
 
getURL() - Method in class net.nutch.db.Page
 
getUrl() - Method in class net.nutch.fetcher.FetcherOutput
 
getUrl() - Method in interface net.nutch.net.protocols.Response
Returns the URL used to retrieve this response.
getUrl() - Method in class net.nutch.pagedb.FetchListEntry
 
getUrl() - Method in class net.nutch.parse.ParserNotFound
 
getUrl() - Method in class net.nutch.protocol.Content
The url fetched.
getUrl() - Method in class net.nutch.protocol.ProtocolNotFound
 
getUrl() - Method in class net.nutch.protocol.ResourceGone
 
getUrl() - Method in class net.nutch.protocol.RetryLater
 
getValue(int) - Method in class net.nutch.searcher.HitDetails
Returns the value of the ith field.
getValue(String) - Method in class net.nutch.searcher.HitDetails
Returns the value of the first field with the specified name.
getValueClass() - Method in class net.nutch.io.MapFile.Reader
Returns the class of values in this file.
getValueClass() - Method in class net.nutch.io.SequenceFile.Reader
Returns the class of values in this file.
getValueClass() - Method in class net.nutch.io.SequenceFile.Writer
Returns the class of values in this file.
getValues() - Method in class net.nutch.quality.dynamic.PageDescription
 
getVersion() - Method in class net.nutch.fetcher.FetcherOutput
 
getVersion() - Method in class net.nutch.io.VersionedWritable
Return the version number of the current implementation.
getVersion() - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
getVersion() - Method in class net.nutch.parse.ParseData
 
getVersion() - Method in class net.nutch.parse.ParseText
 
getVersion() - Method in class net.nutch.protocol.Content
 
getWaitForExit() - Method in class net.nutch.util.CommandRunner
 
getWeight() - Method in class net.nutch.searcher.Query.Clause
 
getWorkingFile() - Method in interface net.nutch.util.NutchFileSystem
Get a real File for a name that's not yet under NutchFS control.
getWorkingFile() - Method in class net.nutch.util.NutchGenericFileSystem
Acquire a real File for a name that's not yet under NutchFS control.
gotHeartbeat(UTF8, UTF8, int, long, long) - Method in class net.nutch.fs.FSNamesystem
The given node has reported in.
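The DOMContentUtils entries above describe two recursive DOM walks. getText(StringBuffer, Node, boolean) appends all text found beneath a node, and getOutlinks(URL, ArrayList, Node) collects the anchors below a node, resolved against a base URL. The following is a minimal sketch of those two walks that uses only the JDK DOM API. It is not Nutch's implementation. The class DomWalkSketch is hypothetical. The sketch does not model the boolean parameter of getText, it resolves links with java.net.URI rather than URL, and it returns outlinks as plain strings instead of Outlink records.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical sketch of two DOM walks; not Nutch's DOMContentUtils. */
final class DomWalkSketch {

  /** Appends the value of every text node beneath node to sb, in document order. */
  static void getText(StringBuffer sb, Node node) {
    if (node.getNodeType() == Node.TEXT_NODE) {
      sb.append(node.getNodeValue());
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      getText(sb, children.item(i));
    }
  }

  /** Resolves the href of every anchor element beneath node against base and adds it to outlinks. */
  static void getOutlinks(URI base, List<String> outlinks, Node node) {
    if (node.getNodeType() == Node.ELEMENT_NODE && "a".equalsIgnoreCase(node.getNodeName())) {
      String href = ((Element) node).getAttribute("href"); // "" when the attribute is absent
      if (!href.isEmpty()) {
        try {
          outlinks.add(base.resolve(href).toString());
        } catch (IllegalArgumentException e) {
          // Skip hrefs that are not valid URI references.
        }
      }
    }
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      getOutlinks(base, outlinks, children.item(i));
    }
  }

  /** Parses well-formed XHTML only; real-world HTML needs a lenient HTML parser instead. */
  static Document parse(String xhtml) {
    try {
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    } catch (Exception e) {
      throw new IllegalArgumentException("not well-formed XML", e);
    }
  }
}
```

For a page at http://example.com/dir/index.html, the relative href page2.html resolves to http://example.com/dir/page2.html and /about resolves to http://example.com/about. Both walks visit nodes in document order, so text and links come out in the order they appear on the page.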

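MapFile.Writer.getIndexInterval() is described as the number of entries added before an index entry is added. This describes a sparse index. Only every N-th key is recorded together with its position, so a lookup binary-searches the small index and then scans forward over at most N entries. The sketch below illustrates that trade-off in memory. The class SparseIndexSketch and its methods are hypothetical, and positions are list offsets rather than byte offsets into a file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical in-memory illustration of a sparse key index; not Nutch's MapFile. */
final class SparseIndexSketch {
  private final int indexInterval;
  private final List<String> keys = new ArrayList<>();       // all keys, ascending
  private final List<String> values = new ArrayList<>();
  private final List<String> indexKeys = new ArrayList<>();  // every indexInterval-th key
  private final List<Integer> indexPositions = new ArrayList<>();

  SparseIndexSketch(int indexInterval) {
    this.indexInterval = indexInterval;
  }

  /** Appends an entry; keys must arrive in ascending order, as in a sorted map file. */
  void append(String key, String value) {
    if (!keys.isEmpty() && key.compareTo(keys.get(keys.size() - 1)) <= 0) {
      throw new IllegalArgumentException("key out of order: " + key);
    }
    if (keys.size() % indexInterval == 0) {
      // Index the first key and then every indexInterval-th key after it.
      indexKeys.add(key);
      indexPositions.add(keys.size());
    }
    keys.add(key);
    values.add(value);
  }

  /** Returns the value for key, or null, scanning at most indexInterval entries. */
  String get(String key) {
    int i = Collections.binarySearch(indexKeys, key);
    if (i < 0) {
      i = -i - 2; // index entry just before the insertion point
      if (i < 0) {
        return null; // key sorts before the first indexed key
      }
    }
    int start = indexPositions.get(i);
    int end = Math.min(start + indexInterval, keys.size());
    for (int pos = start; pos < end; pos++) {
      int cmp = keys.get(pos).compareTo(key);
      if (cmp == 0) {
        return values.get(pos);
      }
      if (cmp > 0) {
        break; // passed the place where key would be
      }
    }
    return null;
  }

  /** Number of keys held in the sparse index. */
  int indexSize() {
    return indexKeys.size();
  }
}
```

With an interval of 3 and keys "a" through "g", only "a", "d", and "g" are indexed, and a lookup for "e" starts scanning at "d". A larger interval shrinks the index but lengthens the worst-case scan.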
H

HEARTBEAT_INTERVAL - Static variable in interface net.nutch.fs.FSConstants
 
HTMLLanguageParser - class net.nutch.analysis.lang.HTMLLanguageParser.
Adds metadata identifying the language of the document, if found.
HTMLLanguageParser() - Constructor for class net.nutch.analysis.lang.HTMLLanguageParser
 
HeartbeatData - class net.nutch.fs.HeartbeatData.
Heartbeat data
HeartbeatData() - Constructor for class net.nutch.fs.HeartbeatData
 
HeartbeatData(String, String, int, long, long) - Constructor for class net.nutch.fs.HeartbeatData
 
HighFreqTerms - class net.nutch.indexer.HighFreqTerms.
Lists the most frequent terms in an index.
HighFreqTerms() - Constructor for class net.nutch.indexer.HighFreqTerms
 
Hit - class net.nutch.searcher.Hit.
A document which matched a query in an index.
Hit() - Constructor for class net.nutch.searcher.Hit
 
Hit(int, int, float, String) - Constructor for class net.nutch.searcher.Hit
 
Hit(int, float, String) - Constructor for class net.nutch.searcher.Hit
 
HitContent - interface net.nutch.searcher.HitContent.
Service that returns the content of a hit.
HitDetailer - interface net.nutch.searcher.HitDetailer.
Service that returns details of a hit within an index.
HitDetails - class net.nutch.searcher.HitDetails.
Data stored in the index for a hit.
HitDetails() - Constructor for class net.nutch.searcher.HitDetails
 
HitDetails(String[], String[]) - Constructor for class net.nutch.searcher.HitDetails
Construct from field names and values arrays.
HitDetails(String, String) - Constructor for class net.nutch.searcher.HitDetails
Construct minimal details from a segment name and document number.
HitSummarizer - interface net.nutch.searcher.HitSummarizer.
Service that builds a summary for a hit on a query.
Hits - class net.nutch.searcher.Hits.
A set of hits matching a query.
Hits() - Constructor for class net.nutch.searcher.Hits
 
Hits(long, Hit[]) - Constructor for class net.nutch.searcher.Hits
 
HtmlParseFilter - interface net.nutch.parse.HtmlParseFilter.
Extension point for DOM-based HTML parsers.
HtmlParseFilters - class net.nutch.parse.HtmlParseFilters.
Creates and caches HtmlParseFilter implementing plugins.
HtmlParser - class net.nutch.parse.html.HtmlParser.
 
HtmlParser() - Constructor for class net.nutch.parse.html.HtmlParser
 
Http - class net.nutch.protocol.http.Http.
An implementation of the Http protocol.
Http() - Constructor for class net.nutch.protocol.http.Http
 
HttpDateFormat - class net.nutch.net.protocols.HttpDateFormat.
Class to handle HTTP dates.
HttpDateFormat() - Constructor for class net.nutch.net.protocols.HttpDateFormat
 
HttpError - exception net.nutch.protocol.http.HttpError.
Thrown for HTTP error codes.
HttpError(int) - Constructor for class net.nutch.protocol.http.HttpError
 
HttpException - exception net.nutch.protocol.http.HttpException.
 
HttpException() - Constructor for class net.nutch.protocol.http.HttpException
 
HttpException(String) - Constructor for class net.nutch.protocol.http.HttpException
 
HttpException(String, Throwable) - Constructor for class net.nutch.protocol.http.HttpException
 
HttpException(Throwable) - Constructor for class net.nutch.protocol.http.HttpException
 
HttpResponse - class net.nutch.protocol.http.HttpResponse.
An HTTP response.
HttpResponse(URL) - Constructor for class net.nutch.protocol.http.HttpResponse
 
HttpResponse(String, URL) - Constructor for class net.nutch.protocol.http.HttpResponse
 
halfDigest() - Method in class net.nutch.io.MD5Hash
Construct a half-sized version of this MD5.
hasLoggedSevere() - Static method in class net.nutch.util.LogFormatter
Returns true if this LogFormatter has logged something at Level.SEVERE
hashCode() - Method in class net.nutch.db.Page
 
hashCode() - Method in class net.nutch.io.IntWritable
 
hashCode() - Method in class net.nutch.io.LongWritable
 
hashCode() - Method in class net.nutch.io.MD5Hash
Returns a hash code value for this object.
hashCode() - Method in class net.nutch.searcher.Hit
 
hashCode() - Method in class net.nutch.searcher.Query.Clause
 
hashCode() - Method in class net.nutch.searcher.Query.Phrase
 
hashCode() - Method in class net.nutch.searcher.Query.Term
 
hashCode() - Method in class net.nutch.searcher.Query
 
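MD5Hash.halfDigest() is listed as constructing a half-sized version of the 128-bit MD5. The minimal sketch below assumes the half digest is the first 8 of the 16 digest bytes, packed most-significant-byte first into a long. That byte layout is an assumption, so check the MD5Hash source for the actual one. The class HalfDigestSketch is hypothetical and uses java.security.MessageDigest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical sketch of folding an MD5 digest into 64 bits. */
final class HalfDigestSketch {

  /** Returns the full 16-byte MD5 digest of the UTF-8 bytes of s. */
  static byte[] md5(String s) {
    try {
      return MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 must be available on every Java platform", e);
    }
  }

  /** Packs the first 8 digest bytes into a long, most significant byte first (assumed layout). */
  static long halfDigest(byte[] digest) {
    long value = 0;
    for (int i = 0; i < 8; i++) {
      value |= (digest[i] & 0xffL) << (8 * (7 - i));
    }
    return value;
  }
}
```

For the empty string, whose MD5 is d41d8cd98f00b204e9800998ecf8427e, this yields 0xd41d8cd98f00b204L. Halving the digest gives a compact key. By the birthday bound, however, collisions among 64-bit values become likely once the number of digests approaches about 2^32.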

I

IGNORE_INTERNAL_LINKS - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
INDEX_FILE_NAME - Static variable in class net.nutch.io.MapFile
The name of the index file.
INTER_ANCHOR_GAP - Static variable in class net.nutch.analysis.NutchDocumentAnalyzer
The number of unused term positions between anchors in the anchor field.
IWebDBReader - interface net.nutch.db.IWebDBReader.
IWebDBReader is an interface to the consolidated page/link database.
IWebDBWriter - interface net.nutch.db.IWebDBWriter.
IWebDBWriter is an interface to the consolidated page/link database.
IndexMerger - class net.nutch.indexer.IndexMerger.
Creates an index for the output corresponding to a single fetcher run.
IndexMerger(File, File[]) - Constructor for class net.nutch.indexer.IndexMerger
 
IndexOptimizer - class net.nutch.indexer.IndexOptimizer.
 
IndexOptimizer(File) - Constructor for class net.nutch.indexer.IndexOptimizer
 
IndexSearcher - class net.nutch.searcher.IndexSearcher.
Implements Searcher and HitDetailer for either a single merged index, or for a set of individual segment indexes.
IndexSearcher(File[]) - Constructor for class net.nutch.searcher.IndexSearcher
Construct given a number of indexed segments.
IndexSearcher(String) - Constructor for class net.nutch.searcher.IndexSearcher
Construct given a directory containing fetched segments, and a separate directory naming their merged index.
IndexSegment - class net.nutch.indexer.IndexSegment.
Creates an index for the output corresponding to a single fetcher run.
IndexSegment() - Constructor for class net.nutch.indexer.IndexSegment
 
IndexingException - exception net.nutch.indexer.IndexingException.
 
IndexingException() - Constructor for class net.nutch.indexer.IndexingException
 
IndexingException(String) - Constructor for class net.nutch.indexer.IndexingException
 
IndexingException(String, Throwable) - Constructor for class net.nutch.indexer.IndexingException
 
IndexingException(Throwable) - Constructor for class net.nutch.indexer.IndexingException
 
IndexingFilter - interface net.nutch.indexer.IndexingFilter.
Extension point for indexing.
IndexingFilters - class net.nutch.indexer.IndexingFilters.
Creates and caches IndexingFilter implementing plugins.
IntWritable - class net.nutch.io.IntWritable.
A WritableComparable for ints.
IntWritable() - Constructor for class net.nutch.io.IntWritable
 
IntWritable(int) - Constructor for class net.nutch.io.IntWritable
 
IntWritable.Comparator - class net.nutch.io.IntWritable.Comparator.
A Comparator optimized for IntWritable.
IntWritable.Comparator() - Constructor for class net.nutch.io.IntWritable.Comparator
 
identify(String) - Method in class net.nutch.analysis.lang.LanguageIdentifier
Identify language based on submitted content
identify(StringBuffer) - Method in class net.nutch.analysis.lang.LanguageIdentifier
 
identify(InputStream) - Method in class net.nutch.analysis.lang.LanguageIdentifier
Identify language from an input stream.
image - Variable in class net.nutch.quality.dynamic.Token
The string image of the token.
inBuf - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
infix() - Method in class net.nutch.analysis.NutchAnalysis
Characters which can be used to form compound terms.
initRound(int, File) - Method in class net.nutch.tools.DistributedAnalysisTool
This method prepares the ground for a set of processes to distribute a round of LinkAnalysis work.
injectDmozFile(File, int, boolean, boolean, int, Pattern) - Method in class net.nutch.db.WebDBInjector
Iterate through all the items in this structured DMOZ file.
injectURLFile(File) - Method in class net.nutch.db.WebDBInjector
Iterate through all the items in this flat text file and add them to the db.
inputItem(HashMap) - Method in class net.nutch.quality.dynamic.PageDescription
 
inputStream - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
input_stream - Variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
input_stream - Variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
invalidate(Block[]) - Method in class net.nutch.fs.FSDataset
We're informed that a block is no longer valid.
isAllowed(String) - Method in class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet
Returns false if the robots.txt file prohibits us from accessing the given path, or true otherwise.
isAllowed(URL) - Static method in class net.nutch.protocol.http.RobotRulesParser
 
isBlockFilename(File) - Static method in class net.nutch.fs.Block
 
isDir(UTF8) - Method in class net.nutch.fs.FSDirectory
Check whether it's a directory
isEllipsis() - Method in class net.nutch.searcher.Summary.Ellipsis
Returns true.
isEllipsis() - Method in class net.nutch.searcher.Summary.Fragment
Returns true iff this fragment is an ellipsis.
isEmpty() - Method in class net.nutch.util.SoftHashMap
 
isField(String) - Static method in class net.nutch.searcher.QueryFilters
 
isHighlight() - Method in class net.nutch.searcher.Summary.Fragment
Returns true iff this fragment is to be highlighted.
isHighlight() - Method in class net.nutch.searcher.Summary.Highlight
Returns true.
isPhrase() - Method in class net.nutch.searcher.Query.Clause
 
isProhibited() - Method in class net.nutch.searcher.Query.Clause
 
isRawField(String) - Static method in class net.nutch.searcher.QueryFilters
 
isRemoteVerificationEnabled() - Method in class net.nutch.protocol.ftp.Client
Return whether or not verification of the remote host participating in data connections is enabled.
isRequired() - Method in class net.nutch.searcher.Query.Clause
 
isStopWord(String) - Static method in class net.nutch.analysis.NutchAnalysis
True iff word is a stop word.
isValidBlock(Block) - Method in class net.nutch.fs.FSDataset
Check whether the given block is a valid one.
isValidBlock(Block) - Method in class net.nutch.fs.FSDirectory
Returns whether the given block is one pointed-to by a file.
isValidToCreate(UTF8) - Method in class net.nutch.fs.FSDirectory
Checks whether the given file path can be created.
iterate(int, File) - Method in class net.nutch.tools.LinkAnalysisTool
Do a single-process iteration over the database.

J

jjFillToken() - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
jjFillToken() - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
jj_nt - Variable in class net.nutch.analysis.NutchAnalysis
 
jj_nt - Variable in class net.nutch.quality.dynamic.PageDescription
 
jjnewLexState - Static variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
jjstrLiteralImages - Static variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
jjstrLiteralImages - Static variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
join() - Method in class net.nutch.ipc.Server
Wait for the server to be stopped.

K

KEYWORD - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
key() - Method in class net.nutch.io.ArrayFile.Reader
Returns the key associated with the most recent call to ArrayFile.Reader.seek(long), ArrayFile.Reader.next(Writable), or ArrayFile.Reader.get(long,Writable).
keySet() - Method in class net.nutch.util.SoftHashMap
 
kind - Variable in class net.nutch.quality.dynamic.Token
An integer that describes the kind of this token.

L

LETTER - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
LOG - Static variable in class net.nutch.analysis.lang.HTMLLanguageParser
 
LOG - Static variable in class net.nutch.analysis.lang.LanguageIdentifier
 
LOG - Static variable in class net.nutch.db.WebDBInjector
 
LOG - Static variable in class net.nutch.fetcher.Fetcher
 
LOG - Static variable in class net.nutch.indexer.IndexMerger
 
LOG - Static variable in class net.nutch.indexer.IndexSegment
 
LOG - Static variable in class net.nutch.indexer.basic.BasicIndexingFilter
 
LOG - Static variable in class net.nutch.io.SequenceFile
 
LOG - Static variable in class net.nutch.ipc.Client
 
LOG - Static variable in class net.nutch.ipc.Server
 
LOG - Static variable in class net.nutch.net.UrlNormalizer
 
LOG - Static variable in class net.nutch.parse.ParserChecker
 
LOG - Static variable in class net.nutch.parse.ParserFactory
 
LOG - Static variable in class net.nutch.parse.html.HtmlParser
 
LOG - Static variable in class net.nutch.parse.pdf.PdfParser
 
LOG - Static variable in class net.nutch.plugin.PluginDescriptor
 
LOG - Static variable in class net.nutch.plugin.PluginManifestParser
 
LOG - Static variable in class net.nutch.plugin.PluginRepository
 
LOG - Static variable in class net.nutch.protocol.ProtocolFactory
 
LOG - Static variable in class net.nutch.protocol.file.File
 
LOG - Static variable in class net.nutch.protocol.ftp.Ftp
 
LOG - Static variable in class net.nutch.protocol.http.Http
 
LOG - Static variable in class net.nutch.protocol.http.RobotRulesParser
 
LOG - Static variable in class net.nutch.searcher.DistributedSearch
 
LOG - Static variable in class net.nutch.searcher.NutchBean
 
LOG - Static variable in class net.nutch.searcher.Query
 
LOG - Static variable in class net.nutch.tools.CrawlTool
 
LOG - Static variable in class net.nutch.tools.DistributedAnalysisTool
 
LOG - Static variable in class net.nutch.tools.DumpSegment
 
LOG - Static variable in class net.nutch.tools.FetchListTool
 
LOG - Static variable in class net.nutch.tools.SegmentMergeTool
 
LOG - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
LOG - Static variable in class net.nutch.tools.WebDBAdminTool
 
LOG - Static variable in class org.creativecommons.nutch.CCIndexingFilter
 
LOG - Static variable in class org.creativecommons.nutch.CCParseFilter
 
LanguageIdentifier - class net.nutch.analysis.lang.LanguageIdentifier.
 
LanguageIdentifier() - Constructor for class net.nutch.analysis.lang.LanguageIdentifier
 
LanguageQueryFilter - class net.nutch.analysis.lang.LanguageQueryFilter.
Handles "lang:" query clauses, causing them to search the "lang" field indexed by LanguageIdentifier.
LanguageQueryFilter() - Constructor for class net.nutch.analysis.lang.LanguageQueryFilter
 
LexicalError(boolean, int, int, int, String, char) - Static method in class net.nutch.quality.dynamic.TokenMgrError
Returns a detailed message for the Error when it is thrown by the token manager to indicate a lexical error.
Link - class net.nutch.db.Link.
A row in the Link Database.
Link() - Constructor for class net.nutch.db.Link
Creates a Link with no data.
Link(MD5Hash, long, String, String) - Constructor for class net.nutch.db.Link
Creates a Link record from the given values.
Link.MD5Comparator - class net.nutch.db.Link.MD5Comparator.
Compares links by MD5 first, then by URL (the opposite order from Link.UrlComparator).
Link.MD5Comparator() - Constructor for class net.nutch.db.Link.MD5Comparator
 
Link.UrlComparator - class net.nutch.db.Link.UrlComparator.
Compares links using the standard ordering, in which the URL is compared first.
Link.UrlComparator() - Constructor for class net.nutch.db.Link.UrlComparator
 
LinkAnalysisEntry - class net.nutch.linkdb.LinkAnalysisEntry.
An entry in the LinkAnalysisTool's output.
LinkAnalysisEntry() - Constructor for class net.nutch.linkdb.LinkAnalysisEntry
 
LinkAnalysisTool - class net.nutch.tools.LinkAnalysisTool.
LinkAnalysisTool performs link-analysis by using the DistributedAnalysisTool.
LinkAnalysisTool(File) - Constructor for class net.nutch.tools.LinkAnalysisTool
Constructs a LinkAnalysisTool; a DistributedAnalysisTool is required to do the actual work.
LogFormatter - class net.nutch.util.LogFormatter.
Prints just the date and the log message.
LogFormatter() - Constructor for class net.nutch.util.LogFormatter
 
LongWritable - class net.nutch.io.LongWritable.
A WritableComparable for longs.
LongWritable() - Constructor for class net.nutch.io.LongWritable
 
LongWritable(long) - Constructor for class net.nutch.io.LongWritable
 
LongWritable.Comparator - class net.nutch.io.LongWritable.Comparator.
A Comparator optimized for LongWritable.
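A comparator optimized for a fixed-width key can compare the serialized bytes directly instead of deserializing two objects first. The sketch below assumes that a long is serialized as 8 big-endian bytes, the same layout `DataOutputStream.writeLong` produces. `RawLongCompareSketch` and its methods are hypothetical and are not the LongWritable.Comparator API.

```java
import java.nio.ByteBuffer;

class RawLongCompareSketch {
    // Serializes a long as 8 big-endian bytes (ByteBuffer's default byte order).
    static byte[] serialize(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    // Compares two serialized longs in place, reading 8 bytes at each offset.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        long v1 = ByteBuffer.wrap(b1, s1, 8).getLong();
        long v2 = ByteBuffer.wrap(b2, s2, 8).getLong();
        return Long.compare(v1, v2);
    }
}
```

Decoding each value before comparing keeps negative numbers in the right order; a plain unsigned byte-by-byte comparison of two's-complement encodings would sort -5 after 42.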
LongWritable.Comparator() - Constructor for class net.nutch.io.LongWritable.Comparator
 
lastUpdate() - Method in class net.nutch.fs.DatanodeInfo
 
leftPad(String, int) - Static method in class net.nutch.util.StringUtil
Returns a copy of s padded with leading spaces so that its length is length.
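The padding rule can be sketched as follows. The behavior when s is already at least length characters long is not stated above; this sketch assumes s is returned unchanged in that case. `LeftPadSketch` is a hypothetical name, not StringUtil itself.

```java
class LeftPadSketch {
    // Prepends spaces to s until it is `length` characters long.
    // Assumption: strings already at least `length` long are returned as-is.
    static String leftPad(String s, int length) {
        StringBuilder sb = new StringBuilder(Math.max(length, s.length()));
        for (int i = s.length(); i < length; i++) {
            sb.append(' ');
        }
        return sb.append(s).toString();
    }
}
```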
lengthNorm(String, int) - Method in class net.nutch.indexer.NutchSimilarity
Normalize field by length.
lexStateNames - Static variable in class net.nutch.analysis.NutchAnalysisTokenManager
 
lexStateNames - Static variable in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
line - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
linkParams - Static variable in class net.nutch.parse.html.DOMContentUtils
 
links() - Method in class net.nutch.db.DBSectionReader
Returns all the links, sorted by target URL.
links() - Method in class net.nutch.db.DistributedWebDBReader
Returns all the links, sorted by target URL.
links() - Method in interface net.nutch.db.IWebDBReader
Obtain an Enumeration of all Link objects, sorted by target URL.
links() - Method in class net.nutch.db.WebDBReader
Returns all the links, sorted by target URL.
listing(UTF8) - Method in class net.nutch.fs.NDFSClient
 
load(InputStream) - Method in class net.nutch.analysis.lang.NGramProfile
Loads an n-gram profile from the given InputStream.
lock(NutchFile, boolean) - Method in interface net.nutch.util.NutchFileSystem
Obtain a lock with the given NutchFile as the lock object
lock(NutchFile, boolean) - Method in class net.nutch.util.NutchGenericFileSystem
Obtain a lock with the given NutchFile.
lockFile(String, String, String, boolean) - Method in class net.nutch.util.NutchGenericFileSystem
 
lockFile(String, String, String, boolean) - Method in class net.nutch.util.NutchNFSFileSystem
Obtain a lock with the given info.
lockFile(String, String, String, boolean) - Method in class net.nutch.util.NutchRemoteFileSystem
Currently unimplemented
login(String, String) - Method in class net.nutch.protocol.ftp.Client
Log in to the FTP server using the provided username and password.
logout() - Method in class net.nutch.protocol.ftp.Client
Log out of the FTP server by sending the QUIT command.
longestMatch(String) - Method in class net.nutch.util.PrefixStringMatcher
Returns the longest prefix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class net.nutch.util.SuffixStringMatcher
Returns the longest suffix of input that is matched, or null if no match exists.
longestMatch(String) - Method in class net.nutch.util.TrieStringMatcher
Returns the longest substring of input that is matched by a pattern in the trie, or null if no match exists.
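The prefix-matching behavior of `matches` and `longestMatch` can be illustrated with a small character trie. This is a hypothetical sketch, not PrefixStringMatcher or TrieStringMatcher; it assumes that "the longest prefix of input that is matched" means the longest stored pattern that the input starts with.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixTrieSketch {
    // One trie node: children keyed by character, plus a flag marking the end of a stored prefix.
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    void add(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.terminal = true;
    }

    // Walks the trie along the input and remembers the last terminal node reached.
    // Returns the longest stored prefix that input starts with, or null if none matches.
    String longestMatch(String input) {
        Node node = root;
        String best = root.terminal ? "" : null;  // an empty stored prefix matches every input
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                break;
            }
            if (node.terminal) {
                best = input.substring(0, i + 1);
            }
        }
        return best;
    }

    boolean matches(String input) {
        return longestMatch(input) != null;
    }
}
```

A suffix matcher can reuse the same structure by inserting patterns and reading input in reverse.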
lookingAhead - Variable in class net.nutch.analysis.NutchAnalysis
 
ls(String) - Method in class net.nutch.fs.TestClient
Get a listing of all files in NDFS at the indicated name

M

MAX_ANCHOR_LENGTH - Static variable in class net.nutch.db.Link
 
MAX_OUTLINKS_PER_PAGE - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
MAX_SECTIONS - Static variable in class net.nutch.db.DBKeyDivision
 
MD5Hash - class net.nutch.io.MD5Hash.
A Writable for MD5 hash values.
MD5Hash() - Constructor for class net.nutch.io.MD5Hash
Constructs an MD5Hash.
MD5Hash(String) - Constructor for class net.nutch.io.MD5Hash
Constructs an MD5Hash from a hex string.
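An MD5 digest is 16 bytes, conventionally written as 32 hexadecimal characters. The following sketch uses the JDK's `MessageDigest` to compute a digest and shows the hex round trip that a constructor taking a hex string implies. `Md5HexSketch` and its methods are hypothetical and are not part of net.nutch.io.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5HexSketch {
    // Computes the 16-byte MD5 digest of the UTF-8 encoding of text.
    static byte[] digest(String text) {
        try {
            return MessageDigest.getInstance("MD5").digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Writes each byte as two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Parses pairs of hex digits back into bytes.
    static byte[] fromHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }
}
```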
MD5Hash(byte[]) - Constructor for class net.nutch.io.MD5Hash
Constructs an MD5Hash with a specified value.
MD5Hash.Comparator - class net.nutch.io.MD5Hash.Comparator.
A WritableComparator optimized for MD5Hash keys.
MD5Hash.Comparator() - Constructor for class net.nutch.io.MD5Hash.Comparator
 
MD5_KEYSPACE - Static variable in class net.nutch.db.EditSectionGroupWriter
 
MD5_KEYSPACE_DIVIDERS - Static variable in class net.nutch.db.DBKeyDivision
 
MD5_LEN - Static variable in class net.nutch.io.MD5Hash
 
MINUS - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
MSWordParser - class net.nutch.parse.msword.MSWordParser.
Parser for MIME type application/msword.
MSWordParser() - Constructor for class net.nutch.parse.msword.MSWordParser
 
MapFile - class net.nutch.io.MapFile.
A file-based map from keys to values.
MapFile() - Constructor for class net.nutch.io.MapFile
 
MapFile.Reader - class net.nutch.io.MapFile.Reader.
Provide access to an existing map.
MapFile.Reader(String) - Constructor for class net.nutch.io.MapFile.Reader
Construct a map reader for the named map.
MapFile.Reader(String, WritableComparator) - Constructor for class net.nutch.io.MapFile.Reader
Construct a map reader for the named map using the named comparator.
MapFile.Writer - class net.nutch.io.MapFile.Writer.
Writes a new map.
MapFile.Writer(String, Class, Class) - Constructor for class net.nutch.io.MapFile.Writer
Create the named map for keys of the named class.
MapFile.Writer(String, WritableComparator, Class) - Constructor for class net.nutch.io.MapFile.Writer
Create the named map using the named key comparator.
main(String[]) - Static method in class net.nutch.analysis.CommonGrams
For debugging.
main(String[]) - Static method in class net.nutch.analysis.NutchAnalysis
For debugging.
main(String[]) - Static method in class net.nutch.analysis.NutchDocumentTokenizer
For debugging.
main(String[]) - Static method in class net.nutch.analysis.lang.LanguageIdentifier
Main method used for testing.
main(String[]) - Static method in class net.nutch.analysis.lang.NGramProfile
Main method used for testing only.
main(String[]) - Static method in class net.nutch.db.DistributedWebDBReader
The DistributedWebDBReader.main() provides some handy utility methods for looking through the contents of the webdb.
main(String[]) - Static method in class net.nutch.db.DistributedWebDBWriter
The DistributedWebDBWriter.main() provides some handy methods for testing the WebDB.
main(String[]) - Static method in class net.nutch.db.WebDBInjector
Command-line access.
main(String[]) - Static method in class net.nutch.db.WebDBReader
The WebDBReader.main() provides some handy utility methods for looking through the contents of the webdb.
main(String[]) - Static method in class net.nutch.db.WebDBWriter
The WebDBWriter.main() provides some handy methods for testing the WebDB.
main(String[]) - Static method in class net.nutch.fetcher.Fetcher
Run the fetcher.
main(String[]) - Static method in class net.nutch.fetcher.FetcherOutput
 
main(String[]) - Static method in class net.nutch.fs.NDFS.DataNode
 
main(String[]) - Static method in class net.nutch.fs.NDFS.NameNode
 
main(String[]) - Static method in class net.nutch.fs.TestClient
main() has some simple utility methods
main(String[]) - Static method in class net.nutch.indexer.DeleteDuplicates
Delete duplicates in the indexes in the named directory.
main(String[]) - Static method in class net.nutch.indexer.HighFreqTerms
 
main(String[]) - Static method in class net.nutch.indexer.IndexMerger
Create an index for the input files in the named directory.
main(String[]) - Static method in class net.nutch.indexer.IndexOptimizer
 
main(String[]) - Static method in class net.nutch.indexer.IndexSegment
Create an index for the input files in the named directory.
main(String[]) - Static method in class net.nutch.io.MapFile
 
main(String[]) - Static method in class net.nutch.net.PrefixURLFilter
 
main(String[]) - Static method in class net.nutch.net.RegexURLFilter
 
main(String[]) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
main(String[]) - Static method in class net.nutch.pagedb.FetchListEntry
 
main(String[]) - Static method in class net.nutch.parse.ParseData
 
main(String[]) - Static method in class net.nutch.parse.ParseText
 
main(String[]) - Static method in class net.nutch.parse.ParserChecker
 
main(String[]) - Static method in class net.nutch.protocol.Content
 
main(String[]) - Static method in class net.nutch.protocol.file.File
For debugging.
main(String[]) - Static method in class net.nutch.protocol.ftp.Ftp
For debugging.
main(String[]) - Static method in class net.nutch.protocol.http.Http
For debugging.
main(String[]) - Static method in class net.nutch.protocol.http.RobotRulesParser
command-line main for testing
main(String[]) - Static method in class net.nutch.quality.dynamic.PageDescription
Tests Sherlock parsing.
main(String[]) - Static method in class net.nutch.searcher.DistributedSearch.Client
 
main(String[]) - Static method in class net.nutch.searcher.DistributedSearch.Server
Runs a search server.
main(String[]) - Static method in class net.nutch.searcher.NutchBean
For debugging.
main(String[]) - Static method in class net.nutch.searcher.Query
For debugging.
main(String[]) - Static method in class net.nutch.searcher.Summarizer
Tests Summary-generation.
main(String[]) - Static method in class net.nutch.tools.CrawlTool
 
main(String[]) - Static method in class net.nutch.tools.DistributedAnalysisTool
Kick off the link analysis.
main(String[]) - Static method in class net.nutch.tools.DumpSegment
 
main(String[]) - Static method in class net.nutch.tools.FetchListTool
Generate a fetchlist from the pagedb and linkdb
main(String[]) - Static method in class net.nutch.tools.LinkAnalysisTool
Kick off the link analysis.
main(String[]) - Static method in class net.nutch.tools.SegmentMergeTool
 
main(String[]) - Static method in class net.nutch.tools.UpdateDatabaseTool
Create the UpdateDatabaseTool, and pass in a WebDBWriter.
main(String[]) - Static method in class net.nutch.tools.WebDBAdminTool
This tool performs a number of generic db management tasks.
main(String[]) - Static method in class net.nutch.util.CommandRunner
 
main(String[]) - Static method in class net.nutch.util.NutchConf
For debugging.
main(String[]) - Static method in class net.nutch.util.PrefixStringMatcher
 
main(String[]) - Static method in class net.nutch.util.ScoreStats
 
main(String[]) - Static method in class net.nutch.util.StringUtil
 
main(String[]) - Static method in class net.nutch.util.SuffixStringMatcher
 
main(String[]) - Static method in class org.creativecommons.nutch.CCDeleteUnlicensedTool
Delete duplicates in the indexes in the named directory.
matchChar(TrieStringMatcher.TrieNode, String, int) - Method in class net.nutch.util.TrieStringMatcher
Returns the next TrieStringMatcher.TrieNode visited, given that you are at node, and the next character in the input is the idx'th character of s.
matchItem(HashMap) - Method in class net.nutch.quality.dynamic.PageDescription
 
matches(String) - Method in class net.nutch.util.PrefixStringMatcher
Returns true if the given String is matched by a prefix in the trie
matches(String) - Method in class net.nutch.util.SuffixStringMatcher
Returns true if the given String is matched by a suffix in the trie
matches(String) - Method in class net.nutch.util.TrieStringMatcher
Returns true if the given String is matched by a pattern in the trie
maxNextCharInd - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
md5Compare(Object) - Method in class net.nutch.db.Link
Compare MD5s, then compare URLs.
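The two-level ordering (MD5 first, URL only as a tie-breaker) maps naturally onto `Comparator.comparing(...).thenComparing(...)`. The `LinkKeySketch` class below is hypothetical and stores the MD5 as a lowercase hex string for simplicity; it is not the Link class or its comparators.

```java
import java.util.Comparator;

class LinkKeySketch {
    // Assumption: fixed-length lowercase hex, so string order equals unsigned byte order.
    final String md5Hex;
    final String url;

    LinkKeySketch(String md5Hex, String url) {
        this.md5Hex = md5Hex;
        this.url = url;
    }

    // Orders by MD5 first; URLs only break ties between equal MD5s.
    static final Comparator<LinkKeySketch> MD5_THEN_URL =
        Comparator.comparing((LinkKeySketch k) -> k.md5Hex)
                  .thenComparing(k -> k.url);
}
```

Swapping the two key extractors gives the URL-first ordering.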
merge(String[], String) - Method in class net.nutch.io.SequenceFile.Sorter
Merge the provided files.
mergeSectionComponents() - Method in class net.nutch.db.EditSectionGroupReader
Merge all the components of the Section into a single file and return the location.
moreFromSiteExcluded() - Method in class net.nutch.searcher.Hit
True iff other, lower-scoring, hits from the same site have been excluded from the list which contains this hit.

N

NDFS - class net.nutch.fs.NDFS.
The NDFS class holds the NDFS client and server.
NDFS.DataNode - class net.nutch.fs.NDFS.DataNode.
DataNode controls just one critical table: block -> BLOCK_SIZE stream of bytes. This info is stored on disk (the NameNode is responsible for asking other machines to replicate the data).
NDFS.DataNode(String, File, String, int, InetSocketAddress) - Constructor for class net.nutch.fs.NDFS.DataNode
Needs a directory to find its data (and config info)
NDFS.NameNode - class net.nutch.fs.NDFS.NameNode.
NameNode controls two critical tables: (1) filename -> block sequence, version; (2) block -> machine list. The first table is stored on disk and is very precious.
NDFS.NameNode(File, int) - Constructor for class net.nutch.fs.NDFS.NameNode
Create a NameNode at the specified location
NDFSClient - class net.nutch.fs.NDFSClient.
NDFSClient does what's necessary to connect to a Nutch Filesystem and perform basic file tasks.
NDFSClient(InetSocketAddress) - Constructor for class net.nutch.fs.NDFSClient
 
NEW_EXTERNAL_LINK_FACTOR - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
NEW_INTERNAL_LINK_FACTOR - Static variable in class net.nutch.tools.UpdateDatabaseTool
 
NGramProfile - class net.nutch.analysis.lang.NGramProfile.
This class runs an n-gram analysis over submitted text; the results can be used for automatic language identification.
NGramProfile(String) - Constructor for class net.nutch.analysis.lang.NGramProfile
Constructs a new n-gram profile.
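The core of an n-gram profile is a frequency count of character n-grams. The sketch below counts overlapping character n-grams of a single fixed size n; it is a simplified illustration under that assumption, not the NGramProfile algorithm, which may treat word boundaries, n-gram sizes, and normalization differently. `NGramSketch` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

class NGramSketch {
    // Counts every overlapping substring of length n in text, in sorted key order.
    static Map<String, Integer> profile(String text, int n) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }
}
```

For example, `profile("banana", 2)` yields `{an=2, ba=1, na=2}`. Comparing such a profile against reference profiles built for each language is one common basis for n-gram language identification.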
NOT_FOUND - Static variable in class net.nutch.fetcher.FetcherOutput
 
NullWritable - class net.nutch.io.NullWritable.
Singleton Writable with no data.
NutchAnalysis - class net.nutch.analysis.NutchAnalysis.
The JavaCC-generated Nutch lexical analyzer and query parser.
NutchAnalysis(CharStream) - Constructor for class net.nutch.analysis.NutchAnalysis
 
NutchAnalysis(NutchAnalysisTokenManager) - Constructor for class net.nutch.analysis.NutchAnalysis
 
NutchAnalysisConstants - interface net.nutch.analysis.NutchAnalysisConstants.
 
NutchAnalysisTokenManager - class net.nutch.analysis.NutchAnalysisTokenManager.
 
NutchAnalysisTokenManager(Reader) - Constructor for class net.nutch.analysis.NutchAnalysisTokenManager
Constructs a token manager for the provided Reader.
NutchAnalysisTokenManager(CharStream) - Constructor for class net.nutch.analysis.NutchAnalysisTokenManager
 
NutchAnalysisTokenManager(CharStream, int) - Constructor for class net.nutch.analysis.NutchAnalysisTokenManager
 
NutchBean - class net.nutch.searcher.NutchBean.
One stop shopping for search-related functionality.
NutchBean() - Constructor for class net.nutch.searcher.NutchBean
Construct reading from connected directory.
NutchBean(File) - Constructor for class net.nutch.searcher.NutchBean
Construct in a named directory.
NutchConf - class net.nutch.util.NutchConf.
Provides access to Nutch configuration parameters.
NutchConf() - Constructor for class net.nutch.util.NutchConf
 
NutchDocumentAnalyzer - class net.nutch.analysis.NutchDocumentAnalyzer.
The analyzer used for Nutch documents.
NutchDocumentAnalyzer() - Constructor for class net.nutch.analysis.NutchDocumentAnalyzer
 
NutchDocumentTokenizer - class net.nutch.analysis.NutchDocumentTokenizer.
The tokenizer used for Nutch document text.
NutchDocumentTokenizer(Reader) - Constructor for class net.nutch.analysis.NutchDocumentTokenizer
Construct a tokenizer for the text in a Reader.
NutchFile - class net.nutch.util.NutchFile.
A class that names a file in the "NutchFileSpace".
NutchFile(NutchFileSystem, String, String, File) - Constructor for class net.nutch.util.NutchFile
A NutchFile contains: dbName, which labels the cooperating NutchFileSystem it belongs to.
NutchFile(NutchFile, String) - Constructor for class net.nutch.util.NutchFile
Create a NutchFile from a previous one that is a directory.
NutchFileSystem - interface net.nutch.util.NutchFileSystem.
NutchFileSystem is an interface for a fairly simple distributed file system.
NutchGenericFileSystem - class net.nutch.util.NutchGenericFileSystem.
NutchGenericFileSystem implements the NutchFileSystem interface and adds some generic utility methods for subclasses to use.
NutchGenericFileSystem(File, ShareSet, boolean) - Constructor for class net.nutch.util.NutchGenericFileSystem
Create a Nutch Filesystem at the indicated mounted directory.
NutchNFSFileSystem - class net.nutch.util.NutchNFSFileSystem.
NutchNFSFileSystem implements NutchFileSystem over the Network File System.
NutchNFSFileSystem(File, boolean) - Constructor for class net.nutch.util.NutchNFSFileSystem
Create the ShareSet automatically, and then go on to the regular constructor.
NutchNFSFileSystem(File, ShareSet, boolean) - Constructor for class net.nutch.util.NutchNFSFileSystem
Create a Nutch Filesystem at the indicated mounted directory.
NutchRemoteFileSystem - class net.nutch.util.NutchRemoteFileSystem.
NutchRemoteFileSystem implements the NutchFileSystem over machines that can be linked via some set of command-line args.
NutchRemoteFileSystem(File, String, String, String) - Constructor for class net.nutch.util.NutchRemoteFileSystem
Create the ShareSet automatically, then invoke the regular constructor.
NutchRemoteFileSystem(File, ShareSet, String, String, String) - Constructor for class net.nutch.util.NutchRemoteFileSystem
The NutchRemoteFileSystem takes template-strings for its various needed commands, which may differ among installations.
NutchSimilarity - class net.nutch.indexer.NutchSimilarity.
Similarity implementation used by Nutch indexing and search.
NutchSimilarity() - Constructor for class net.nutch.indexer.NutchSimilarity
 
net.nutch.analysis - package net.nutch.analysis
Tokenizer for documents and query parser.
net.nutch.analysis.lang - package net.nutch.analysis.lang
Text document language identifier.
net.nutch.db - package net.nutch.db
Web database: tracks page fetches and link structure.
net.nutch.fetcher - package net.nutch.fetcher
The Nutch robot.
net.nutch.fs - package net.nutch.fs
 
net.nutch.html - package net.nutch.html
 
net.nutch.indexer - package net.nutch.indexer
Maintain Lucene full-text indexes.
net.nutch.indexer.basic - package net.nutch.indexer.basic
A basic indexing plugin.
net.nutch.io - package net.nutch.io
Generic i/o code for use when reading and writing data to the network, to databases, and to files.
net.nutch.ipc - package net.nutch.ipc
Client/Server code used by distributed search.
net.nutch.linkdb - package net.nutch.linkdb
 
net.nutch.net - package net.nutch.net
 
net.nutch.net.protocols - package net.nutch.net.protocols
 
net.nutch.pagedb - package net.nutch.pagedb
 
net.nutch.parse - package net.nutch.parse
 
net.nutch.parse.html - package net.nutch.parse.html
An HTML document parsing plugin.
net.nutch.parse.msword - package net.nutch.parse.msword
A Word document parsing plugin.
net.nutch.parse.msword.chp - package net.nutch.parse.msword.chp
 
net.nutch.parse.pdf - package net.nutch.parse.pdf
A PDF parsing plugin.
net.nutch.parse.text - package net.nutch.parse.text
A plain text parsing plugin.
net.nutch.plugin - package net.nutch.plugin
 
net.nutch.protocol - package net.nutch.protocol
 
net.nutch.protocol.file - package net.nutch.protocol.file
Protocol plugin which supports retrieving local file resources.
net.nutch.protocol.ftp - package net.nutch.protocol.ftp
Protocol plugin which supports retrieving documents via the FTP protocol.
net.nutch.protocol.http - package net.nutch.protocol.http
Protocol plugin which supports retrieving documents via the HTTP protocol.
net.nutch.quality.dynamic - package net.nutch.quality.dynamic
 
net.nutch.searcher - package net.nutch.searcher
Search API
net.nutch.tools - package net.nutch.tools
 
net.nutch.util - package net.nutch.util
 
newKey() - Method in class net.nutch.io.WritableComparator
Construct a new WritableComparable instance.
newToken(int) - Static method in class net.nutch.quality.dynamic.Token
Returns a new Token object, by default.
next() - Method in class net.nutch.analysis.NutchDocumentTokenizer
Returns the next token in the stream, or null at EOF.
next(Writable) - Method in class net.nutch.io.ArrayFile.Reader
Read and return the next value in the file.
next(WritableComparable, Writable) - Method in class net.nutch.io.MapFile.Reader
Read the next key/value pair in the map into key and val.
next(Writable) - Method in class net.nutch.io.SequenceFile.Reader
Read the next key in the file into key, skipping its value.
next(Writable, Writable) - Method in class net.nutch.io.SequenceFile.Reader
Read the next key/value pair in the file into key and val.
next(DataOutputBuffer) - Method in class net.nutch.io.SequenceFile.Reader
Read the next key/value pair in the file into buffer.
next(WritableComparable) - Method in class net.nutch.io.SetFile.Reader
Read the next key in a set into key.
next - Variable in class net.nutch.quality.dynamic.Token
A reference to the next regular (non-special) token from the input stream.
nodeChar - Variable in class net.nutch.util.TrieStringMatcher.TrieNode
 
nonOpInfix() - Method in class net.nutch.analysis.NutchAnalysis
Parse infix characters except plus and minus.
nonOpOrTerm() - Method in class net.nutch.analysis.NutchAnalysis
Parse anything but a term or an operator (plus or minus or quote).
nonTerm() - Method in class net.nutch.analysis.NutchAnalysis
Parse anything but a term or a quote.
normalize(String) - Static method in class net.nutch.net.UrlNormalizer
 
numEdits() - Method in class net.nutch.db.EditSectionGroupReader
Return how many edits there are in this section.
numLinks() - Method in class net.nutch.db.DistributedWebDBReader
Return the number of links in our db.
numLinks() - Method in interface net.nutch.db.IWebDBReader
Simple count of all Link objects in db.
numLinks() - Method in class net.nutch.db.WebDBReader
Return the number of links in our db.
numMachines() - Method in class net.nutch.db.DistributedWebDBReader
How many sections (machines) there are in this distributed db.
numPages() - Method in class net.nutch.db.DistributedWebDBReader
Return the number of pages we're dealing with.
numPages() - Method in interface net.nutch.db.IWebDBReader
Simple count of all Page objects in db.
numPages() - Method in class net.nutch.db.WebDBReader
Return the number of pages we're dealing with
numTerms - Static variable in class net.nutch.indexer.HighFreqTerms
 

O

OP_ACK - Static variable in interface net.nutch.fs.FSConstants
 
OP_BLOCKRECEIVED - Static variable in interface net.nutch.fs.FSConstants
 
OP_BLOCKREPORT - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_ADDBLOCK - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_ADDBLOCK_ACK - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_COMPLETEFILE - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_COMPLETEFILE_ACK - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_DELETE - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_DELETE_ACK - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_LISTING - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_LISTING_ACK - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_OPEN - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_OPEN_ACK - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_RENAMETO - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_RENAMETO_ACK - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_STARTFILE - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_STARTFILE_ACK - Static variable in interface net.nutch.fs.FSConstants
 
OP_CLIENT_TRYAGAIN - Static variable in interface net.nutch.fs.FSConstants
 
OP_ERROR - Static variable in interface net.nutch.fs.FSConstants
 
OP_FAILURE - Static variable in interface net.nutch.fs.FSConstants
 
OP_HEARTBEAT - Static variable in interface net.nutch.fs.FSConstants
 
OP_INVALIDATE_BLOCKS - Static variable in interface net.nutch.fs.FSConstants
 
OP_READ_BLOCK - Static variable in interface net.nutch.fs.FSConstants
 
OP_TRANSFERBLOCKS - Static variable in interface net.nutch.fs.FSConstants
 
OP_TRANSFERDATA - Static variable in interface net.nutch.fs.FSConstants
 
OP_WRITE_BLOCK - Static variable in interface net.nutch.fs.FSConstants
 
Outlink - class net.nutch.parse.Outlink.
 
Outlink() - Constructor for class net.nutch.parse.Outlink
 
Outlink(String, String) - Constructor for class net.nutch.parse.Outlink
 
offerService() - Method in class net.nutch.fs.NDFS.DataNode
Main loop for the DataNode.
op - Variable in class net.nutch.fs.FSParam
 
op - Variable in class net.nutch.fs.FSResults
 
open(UTF8) - Method in class net.nutch.fs.FSNamesystem
The client wants to open the given filename.
open(UTF8) - Method in class net.nutch.fs.NDFSClient
Create an input stream that obtains a nodelist from the namenode, and then reads from all the right places.
optimize() - Method in class net.nutch.indexer.IndexOptimizer
 
optimizePhrase(Query.Phrase, String) - Static method in class net.nutch.analysis.CommonGrams
Optimizes phrase queries to use n-grams when possible.
org.creativecommons.nutch - package org.creativecommons.nutch
Sample plugins that parse and index Creative Commons metadata.

P

PLUS - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
Page - class net.nutch.db.Page.
A row in the Page Database.
Page() - Constructor for class net.nutch.db.Page
Construct a page ready to be read by Page.readFields(DataInput).
Page(String, MD5Hash) - Constructor for class net.nutch.db.Page
Construct a new, default page, due to be fetched.
Page(String, float, float, long) - Constructor for class net.nutch.db.Page
Construct a new, default page, due to be fetched.
Page(String, float, float) - Constructor for class net.nutch.db.Page
Construct a new, default page, due to be fetched.
Page.Comparator - class net.nutch.db.Page.Comparator.
Compares pages by MD5, then by URL.
Page.Comparator() - Constructor for class net.nutch.db.Page.Comparator
 
Page.UrlComparator - class net.nutch.db.Page.UrlComparator.
Compares pages by URL only.
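A minimal sketch of the two orderings described above, written with java.util.Comparator. PageKey and PageComparators are hypothetical names, not Nutch classes. Ordering digests byte by byte as unsigned values is an assumption of this sketch.

```java
import java.util.Comparator;

// Hypothetical stand-in for a Page row: only the two fields the comparators need.
final class PageKey {
    final byte[] md5;   // content digest
    final String url;

    PageKey(byte[] md5, String url) {
        this.md5 = md5;
        this.url = url;
    }
}

class PageComparators {
    // Assumption: digests are compared byte by byte, treating each byte as unsigned.
    static int compareDigests(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    // In the spirit of Page.Comparator: MD5 first, URL breaks ties.
    static final Comparator<PageKey> BY_MD5_THEN_URL =
        Comparator.<PageKey, byte[]>comparing(p -> p.md5, PageComparators::compareDigests)
                  .thenComparing(p -> p.url);

    // In the spirit of Page.UrlComparator: URL only.
    static final Comparator<PageKey> BY_URL = Comparator.comparing(p -> p.url);
}
```

Sorting by MD5 first groups pages with identical content next to each other, which is what makes duplicate detection a linear scan.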
Page.UrlComparator() - Constructor for class net.nutch.db.Page.UrlComparator
 
PageDescription - class net.nutch.quality.dynamic.PageDescription.
PageDescription gives the URL and the textual description for a target page.
PageDescription(InputStream) - Constructor for class net.nutch.quality.dynamic.PageDescription
 
PageDescription(Reader) - Constructor for class net.nutch.quality.dynamic.PageDescription
 
PageDescription(PageDescriptionTokenManager) - Constructor for class net.nutch.quality.dynamic.PageDescription
 
PageDescriptionConstants - interface net.nutch.quality.dynamic.PageDescriptionConstants.
 
PageDescriptionTokenManager - class net.nutch.quality.dynamic.PageDescriptionTokenManager.
 
PageDescriptionTokenManager(SimpleCharStream) - Constructor for class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
PageDescriptionTokenManager(SimpleCharStream, int) - Constructor for class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
Parse - interface net.nutch.parse.Parse.
The result of parsing a page's raw content.
ParseData - class net.nutch.parse.ParseData.
Data extracted from a page's content.
ParseData() - Constructor for class net.nutch.parse.ParseData
 
ParseData(String, Outlink[], Properties) - Constructor for class net.nutch.parse.ParseData
 
ParseException - exception net.nutch.parse.ParseException.
 
ParseException() - Constructor for class net.nutch.parse.ParseException
 
ParseException(String) - Constructor for class net.nutch.parse.ParseException
 
ParseException(String, Throwable) - Constructor for class net.nutch.parse.ParseException
 
ParseException(Throwable) - Constructor for class net.nutch.parse.ParseException
 
ParseException - exception net.nutch.quality.dynamic.ParseException.
This exception is thrown when parse errors are encountered.
ParseException(Token, int[][], String[]) - Constructor for class net.nutch.quality.dynamic.ParseException
This constructor is used by the method "generateParseException" in the generated parser.
ParseException() - Constructor for class net.nutch.quality.dynamic.ParseException
General-purpose constructor, for use outside the generated parser.
ParseException(String) - Constructor for class net.nutch.quality.dynamic.ParseException
 
ParseImpl - class net.nutch.parse.ParseImpl.
The result of parsing a page's raw content.
ParseImpl(String, ParseData) - Constructor for class net.nutch.parse.ParseImpl
 
ParseText - class net.nutch.parse.ParseText.
 
ParseText() - Constructor for class net.nutch.parse.ParseText
 
ParseText(String) - Constructor for class net.nutch.parse.ParseText
 
Parser - interface net.nutch.parse.Parser.
A parser for content generated by a Protocol implementation.
ParserChecker - class net.nutch.parse.ParserChecker.
Parser checker, useful for testing parsers.
ParserChecker() - Constructor for class net.nutch.parse.ParserChecker
 
ParserFactory - class net.nutch.parse.ParserFactory.
Creates and caches Parser plugins.
ParserNotFound - exception net.nutch.parse.ParserNotFound.
 
ParserNotFound(String, String) - Constructor for class net.nutch.parse.ParserNotFound
 
ParserNotFound(String, String, String) - Constructor for class net.nutch.parse.ParserNotFound
 
PasswordProtectedException - exception net.nutch.parse.msword.PasswordProtectedException.
 
PasswordProtectedException(String) - Constructor for class net.nutch.parse.msword.PasswordProtectedException
 
PdfParser - class net.nutch.parse.pdf.PdfParser.
Parser for MIME type application/pdf.
PdfParser() - Constructor for class net.nutch.parse.pdf.PdfParser
 
Plugin - class net.nutch.plugin.Plugin.
A Nutch plugin is a container for a set of custom logic that provides extensions to the Nutch core functionality or to another plugin that provides an API for extension.
Plugin(PluginDescriptor) - Constructor for class net.nutch.plugin.Plugin
Constructor
PluginClassLoader - class net.nutch.plugin.PluginClassLoader.
The PluginClassLoader contains only the classes of the runtime libraries set up in the plugin manifest file and the exported libraries of the plugins that this plugin requires.
PluginClassLoader(URL[], ClassLoader) - Constructor for class net.nutch.plugin.PluginClassLoader
Constructor
PluginDescriptor - class net.nutch.plugin.PluginDescriptor.
The PluginDescriptor provides access to all meta information of a Nutch plugin, as well as to the internationalizable resources and the plugin's own classloader.
PluginDescriptor(String, String, String, String, String, String) - Constructor for class net.nutch.plugin.PluginDescriptor
Constructor
PluginManifestParser - class net.nutch.plugin.PluginManifestParser.
The PluginManifestParser parses the manifest file in each plugin directory.
PluginManifestParser() - Constructor for class net.nutch.plugin.PluginManifestParser
 
PluginRepository - class net.nutch.plugin.PluginRepository.
The plugin repository is a registry of all plugins.
PluginRuntimeException - exception net.nutch.plugin.PluginRuntimeException.
PluginRuntimeException is thrown when an exception occurs in the plugin management.
PluginRuntimeException(Throwable) - Constructor for class net.nutch.plugin.PluginRuntimeException
 
PluginRuntimeException(String) - Constructor for class net.nutch.plugin.PluginRuntimeException
 
PrefixStringMatcher - class net.nutch.util.PrefixStringMatcher.
A class for efficiently matching Strings against a set of prefixes.
PrefixStringMatcher(String[]) - Constructor for class net.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match Strings with any prefix in the supplied array.
PrefixStringMatcher(Collection) - Constructor for class net.nutch.util.PrefixStringMatcher
Creates a new PrefixStringMatcher which will match Strings with any prefix in the supplied Collection.
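One common way to match a string against many prefixes efficiently is a character trie, so the cost of a lookup depends on the input length rather than on the number of prefixes. The sketch below assumes that approach; PrefixTrie and its method names are hypothetical and are not the PrefixStringMatcher API.

```java
import java.util.HashMap;
import java.util.Map;

class PrefixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true if one of the prefixes ends at this node
    }

    private final Node root = new Node();

    PrefixTrie(String[] prefixes) {
        for (String p : prefixes) {
            Node n = root;
            for (int i = 0; i < p.length(); i++) {
                n = n.children.computeIfAbsent(p.charAt(i), c -> new Node());
            }
            n.terminal = true;
        }
    }

    /** True if some prefix in the set is a prefix of input. */
    boolean matches(String input) {
        return shortestMatch(input) != null;
    }

    /** The shortest prefix of input that is in the set, or null if there is none. */
    String shortestMatch(String input) {
        Node n = root;
        if (n.terminal) return "";           // the empty prefix matches everything
        for (int i = 0; i < input.length(); i++) {
            n = n.children.get(input.charAt(i));
            if (n == null) return null;      // walked off the trie: no prefix matches
            if (n.terminal) return input.substring(0, i + 1);
        }
        return null;
    }
}
```

Usage: `new PrefixTrie(new String[]{"http://", "ftp://"}).matches("ftp://host/")` is true, while `"file:/tmp"` does not match.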
PrefixURLFilter - class net.nutch.net.PrefixURLFilter.
Filters URLs based on a file of URL prefixes.
PrefixURLFilter() - Constructor for class net.nutch.net.PrefixURLFilter
 
PrefixURLFilter(String) - Constructor for class net.nutch.net.PrefixURLFilter
 
PrintCommandListener - class net.nutch.protocol.ftp.PrintCommandListener.
This is a support class for logging all FTP command/reply traffic.
PrintCommandListener(Logger) - Constructor for class net.nutch.protocol.ftp.PrintCommandListener
 
Protocol - interface net.nutch.protocol.Protocol.
A retriever of URL content.
ProtocolException - exception net.nutch.net.protocols.ProtocolException.
Base exception for all protocol handlers
ProtocolException() - Constructor for class net.nutch.net.protocols.ProtocolException
 
ProtocolException(String) - Constructor for class net.nutch.net.protocols.ProtocolException
 
ProtocolException(String, Throwable) - Constructor for class net.nutch.net.protocols.ProtocolException
 
ProtocolException(Throwable) - Constructor for class net.nutch.net.protocols.ProtocolException
 
ProtocolException - exception net.nutch.protocol.ProtocolException.
Thrown by Protocol.getContent(String).
ProtocolException() - Constructor for class net.nutch.protocol.ProtocolException
 
ProtocolException(String) - Constructor for class net.nutch.protocol.ProtocolException
 
ProtocolException(String, Throwable) - Constructor for class net.nutch.protocol.ProtocolException
 
ProtocolException(Throwable) - Constructor for class net.nutch.protocol.ProtocolException
 
ProtocolFactory - class net.nutch.protocol.ProtocolFactory.
Creates and caches Protocol plugins.
ProtocolNotFound - exception net.nutch.protocol.ProtocolNotFound.
 
ProtocolNotFound(String) - Constructor for class net.nutch.protocol.ProtocolNotFound
 
ProtocolNotFound(String, String) - Constructor for class net.nutch.protocol.ProtocolNotFound
 
pageExists(MD5Hash) - Method in class net.nutch.db.DBSectionReader
Test whether a certain piece of content is in the db, but don't bother returning it.
pageExists(MD5Hash) - Method in class net.nutch.db.DistributedWebDBReader
Test whether a certain piece of content is in the database, but don't bother returning the Page(s) itself.
pageExists(MD5Hash) - Method in interface net.nutch.db.IWebDBReader
Returns whether a Page with the given MD5 checksum is in the db.
pageExists(MD5Hash) - Method in class net.nutch.db.WebDBReader
Test whether a certain piece of content is in the database, but don't bother returning the Page(s) itself.
pages() - Method in class net.nutch.db.DBSectionReader
Iterate through all the Pages, sorted by URL
pages() - Method in class net.nutch.db.DistributedWebDBReader
Iterate through all the Pages, sorted by URL.
pages() - Method in interface net.nutch.db.IWebDBReader
Obtain an Enumeration of all Page objects, sorted by URL
pages() - Method in class net.nutch.db.WebDBReader
Iterate through all the Pages, sorted by URL
pagesByMD5() - Method in class net.nutch.db.DBSectionReader
Iterate through all the Pages, sorted by MD5
pagesByMD5() - Method in class net.nutch.db.DistributedWebDBReader
Iterate through all the Pages, sorted by MD5.
pagesByMD5() - Method in interface net.nutch.db.IWebDBReader
Obtain an Enumeration of all Page objects, sorted by MD5.
pagesByMD5() - Method in class net.nutch.db.WebDBReader
Iterate through all the Pages, sorted by MD5
param() - Method in class net.nutch.quality.dynamic.PageDescription
 
parse() - Method in class net.nutch.analysis.NutchAnalysis
Parse a query.
parse() - Method in class net.nutch.quality.dynamic.PageDescription
 
parse(String) - Static method in class net.nutch.searcher.Query
Parse a query from a string.
parseCharacterEncoding(String) - Static method in class net.nutch.util.StringUtil
Parse the character encoding from the specified content type header.
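A minimal sketch of extracting the charset parameter from a Content-Type header value such as `text/html; charset=ISO-8859-1`. CharsetSniffer is a hypothetical class; Nutch's StringUtil may handle more cases (for example, normalizing or resolving aliases), so treat this as an illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetSniffer {
    // "charset=value", with the value optionally double-quoted; the name is matched case-insensitively.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in a Content-Type header value, or null if there is none. */
    static String parseCharacterEncoding(String contentType) {
        if (contentType == null) return null;
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }
}
```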
parsePluginFolder() - Static method in class net.nutch.plugin.PluginManifestParser
Returns a list with plugin descriptors.
parseQuery(String) - Static method in class net.nutch.analysis.NutchAnalysis
Parse a query from the given query string.
peekMin() - Method in class net.nutch.util.FibonacciHeap
Returns the same Object that FibonacciHeap.popMin() would, without removing it.
pendingTransfers(DatanodeInfo) - Method in class net.nutch.fs.FSNamesystem
Returns a list of Block/DatanodeInfo sets indicating where various Blocks should be copied as soon as possible.
phrase(String) - Method in class net.nutch.analysis.NutchAnalysis
Parse an explicitly quoted phrase query.
popMin() - Method in class net.nutch.util.FibonacciHeap
Returns the object which has the lowest priority in the heap.
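The difference between peekMin() and popMin() can be illustrated with java.util.PriorityQueue, whose peek() and poll() have the same look-versus-remove split. Note that PriorityQueue is a binary heap, not a Fibonacci heap, so this sketch shows only the contract, not FibonacciHeap's amortized costs. MinQueue is a hypothetical name, and returning null on an empty queue is this sketch's choice.

```java
import java.util.PriorityQueue;

class MinQueue {
    // Pairs an item with its priority; a lower priority value is served first.
    private static final class Entry {
        final String item;
        final int priority;
        Entry(String item, int priority) { this.item = item; this.priority = priority; }
    }

    private final PriorityQueue<Entry> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.priority, b.priority));

    void add(String item, int priority) { heap.add(new Entry(item, priority)); }

    /** Like peekMin(): the lowest-priority item, left in the queue (null if empty). */
    String peekMin() {
        Entry e = heap.peek();
        return e == null ? null : e.item;
    }

    /** Like popMin(): the lowest-priority item, removed from the queue (null if empty). */
    String popMin() {
        Entry e = heap.poll();
        return e == null ? null : e.item;
    }

    int size() { return heap.size(); }
}
```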
prevCharIsCR - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
prevCharIsLF - Variable in class net.nutch.quality.dynamic.SimpleCharStream
 
printStatus() - Method in class net.nutch.db.WebDBInjector
Utility to present performance stats
printStatusBar(int, int) - Method in class net.nutch.db.WebDBInjector
Utility to present small status bar
processReport(Block[], UTF8) - Method in class net.nutch.fs.FSNamesystem
The given node is reporting all its blocks.
protocolCommandSent(ProtocolCommandEvent) - Method in class net.nutch.protocol.ftp.PrintCommandListener
 
protocolReplyReceived(ProtocolCommandEvent) - Method in class net.nutch.protocol.ftp.PrintCommandListener
 
purgeQueuedKeys() - Method in class net.nutch.util.SoftHashMap
 
put(NutchFile, File, boolean) - Method in interface net.nutch.util.NutchFileSystem
Associates a NutchFile with a given real-fs File.
put(NutchFile, File, boolean) - Method in class net.nutch.util.NutchGenericFileSystem
Add a single file or a directory of files to the filesystem.
put(Object, Object) - Method in class net.nutch.util.SoftHashMap
Associates the specified value with the specified key in this map.

Q

QUOTE - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
QUOTED_VALUE - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
Query - class net.nutch.searcher.Query.
A Nutch query.
Query() - Constructor for class net.nutch.searcher.Query
 
Query.Clause - class net.nutch.searcher.Query.Clause.
A query clause.
Query.Clause(Query.Term, String, boolean, boolean) - Constructor for class net.nutch.searcher.Query.Clause
 
Query.Clause(Query.Term, boolean, boolean) - Constructor for class net.nutch.searcher.Query.Clause
 
Query.Clause(Query.Phrase, String, boolean, boolean) - Constructor for class net.nutch.searcher.Query.Clause
 
Query.Clause(Query.Phrase, boolean, boolean) - Constructor for class net.nutch.searcher.Query.Clause
 
Query.Phrase - class net.nutch.searcher.Query.Phrase.
A phrase query clause.
Query.Phrase(Query.Term[]) - Constructor for class net.nutch.searcher.Query.Phrase
 
Query.Phrase(String[]) - Constructor for class net.nutch.searcher.Query.Phrase
 
Query.Term - class net.nutch.searcher.Query.Term.
A single-term query clause.
Query.Term(String) - Constructor for class net.nutch.searcher.Query.Term
 
QueryException - exception net.nutch.searcher.QueryException.
 
QueryException(String) - Constructor for class net.nutch.searcher.QueryException
 
QueryFilter - interface net.nutch.searcher.QueryFilter.
Extension point for query translation.
QueryFilters - class net.nutch.searcher.QueryFilters.
Creates and caches plugins that implement QueryFilter.
queueKeyForDeletion(Object) - Method in class net.nutch.util.SoftHashMap
 

R

RETRY - Static variable in class net.nutch.fetcher.FetcherOutput
 
RUNLENGTH_ENCODING - Static variable in interface net.nutch.fs.FSConstants
 
RawFieldQueryFilter - class net.nutch.searcher.RawFieldQueryFilter.
Translate raw query fields to search the same-named field, as indexed by an IndexingFilter.
RawFieldQueryFilter(String) - Constructor for class net.nutch.searcher.RawFieldQueryFilter
Construct for the named field, lowercasing query values.
RawFieldQueryFilter(String, boolean) - Constructor for class net.nutch.searcher.RawFieldQueryFilter
Construct for the named field, potentially lowercasing query values.
ReInit(CharStream) - Method in class net.nutch.analysis.NutchAnalysis
 
ReInit(NutchAnalysisTokenManager) - Method in class net.nutch.analysis.NutchAnalysis
 
ReInit(CharStream) - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
ReInit(CharStream, int) - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
ReInit(InputStream) - Method in class net.nutch.quality.dynamic.PageDescription
 
ReInit(Reader) - Method in class net.nutch.quality.dynamic.PageDescription
 
ReInit(PageDescriptionTokenManager) - Method in class net.nutch.quality.dynamic.PageDescription
 
ReInit(SimpleCharStream) - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
ReInit(SimpleCharStream, int) - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
ReInit(Reader, int, int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(Reader, int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(Reader) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(InputStream, int, int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(InputStream) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
ReInit(InputStream, int, int) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
RegexURLFilter - class net.nutch.net.RegexURLFilter.
Filters URLs based on a file of regular expressions.
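A minimal sketch of regular-expression URL filtering, assuming each rule line is `+` (accept) or `-` (reject) followed by a Java regex, that the first rule matching anywhere in the URL decides, and that a URL no rule matches is rejected. These semantics and the RegexRuleFilter name are assumptions of the sketch; check the actual rule file format before relying on them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class RegexRuleFilter {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;
        Rule(boolean accept, Pattern pattern) { this.accept = accept; this.pattern = pattern; }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Each line is "+regex" or "-regex"; blank lines and '#' comments are skipped. */
    RegexRuleFilter(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** Returns the URL if the first matching rule accepts it; otherwise null. */
    String filter(String url) {
        for (Rule r : rules) {
            if (r.pattern.matcher(url).find()) {
                return r.accept ? url : null;
            }
        }
        return null; // no rule matched: reject
    }
}
```

Because the first match wins, rule order matters: put the narrow rejections (such as image suffixes) before the broad accept rules.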
RegexURLFilter() - Constructor for class net.nutch.net.RegexURLFilter
 
RegexURLFilter(String) - Constructor for class net.nutch.net.RegexURLFilter
 
ResourceGone - exception net.nutch.protocol.ResourceGone.
Thrown by Protocol.getContent(String) when a URL is invalid.
ResourceGone(URL, String) - Constructor for class net.nutch.protocol.ResourceGone
 
ResourceMoved - exception net.nutch.protocol.ResourceMoved.
Thrown by Protocol.getContent(String) when a URL no longer exists.
ResourceMoved(URL, URL, String) - Constructor for class net.nutch.protocol.ResourceMoved
 
Response - interface net.nutch.net.protocols.Response.
A response interface.
RetryLater - exception net.nutch.protocol.RetryLater.
Thrown by Protocol.getContent(String) when a URL should be retried later.
RetryLater(URL, String) - Constructor for class net.nutch.protocol.RetryLater
 
RobotRulesParser - class net.nutch.protocol.http.RobotRulesParser.
This class handles the parsing of robots.txt files.
RobotRulesParser() - Constructor for class net.nutch.protocol.http.RobotRulesParser
 
RobotRulesParser(String[]) - Constructor for class net.nutch.protocol.http.RobotRulesParser
Creates a new RobotRulesParser which will use the supplied robotNames when choosing which stanza to follow in robots.txt files.
RobotRulesParser.RobotRuleSet - class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet.
This class holds the rules which were parsed from a robots.txt file, and can test paths against those rules.
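A minimal sketch of testing paths against the Allow/Disallow rules of one robots.txt stanza. It assumes first-match evaluation in file order, as in the original 1996 robots.txt draft; crawlers following RFC 9309 use the longest matching rule instead. PathRuleSet is a hypothetical name, and parsing and stanza selection are left out.

```java
import java.util.ArrayList;
import java.util.List;

class PathRuleSet {
    private final List<String> prefixes = new ArrayList<>();
    private final List<Boolean> allows = new ArrayList<>();

    /** Adds an Allow (allow = true) or Disallow (allow = false) rule for a path prefix. */
    void addRule(String prefix, boolean allow) {
        // An empty "Disallow:" value disallows nothing, so it is not stored.
        if (prefix.isEmpty() && !allow) return;
        prefixes.add(prefix);
        allows.add(allow);
    }

    /** The first rule whose prefix matches decides; paths that no rule matches are allowed. */
    boolean isAllowed(String path) {
        for (int i = 0; i < prefixes.size(); i++) {
            if (path.startsWith(prefixes.get(i))) return allows.get(i);
        }
        return true;
    }
}
```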
RobotsMetaProcessor - class net.nutch.parse.html.RobotsMetaProcessor.
Class for parsing META Directives from DOM trees.
RobotsMetaProcessor() - Constructor for class net.nutch.parse.html.RobotsMetaProcessor
 
RobotsMetaProcessor.RobotsMetaIndicator - class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator.
Utility class with indicators for the robots directives "noindex" and "nofollow", and HTTP-EQUIV/no-cache
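A minimal sketch of turning a robots META content value such as `noindex, nofollow` into indicator flags, where `none` sets both. RobotsMetaFlags is a hypothetical class. It omits the HTTP-EQUIV no-cache indicator and ignores any other directive.

```java
import java.util.Locale;

class RobotsMetaFlags {
    boolean noIndex;
    boolean noFollow;

    /** Clears both flags, in the spirit of RobotsMetaIndicator.reset(). */
    void reset() {
        noIndex = false;
        noFollow = false;
    }

    /** Applies a comma-separated robots META content value; matching is case-insensitive. */
    void apply(String content) {
        for (String token : content.toLowerCase(Locale.ROOT).split(",")) {
            switch (token.trim()) {
                case "none":     noIndex = true; noFollow = true; break;
                case "noindex":  noIndex = true; break;
                case "nofollow": noFollow = true; break;
                default:         break; // other directives are ignored in this sketch
            }
        }
    }
}
```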
RobotsMetaProcessor.RobotsMetaIndicator() - Constructor for class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
 
read(DataInput) - Static method in class net.nutch.db.Link
 
read(DataInput) - Static method in class net.nutch.db.Page
 
read(DataInput) - Static method in class net.nutch.fetcher.FetcherOutput
 
read(DataInput) - Static method in class net.nutch.io.MD5Hash
Constructs, reads and returns an instance.
read(DataInput) - Static method in class net.nutch.linkdb.LinkAnalysisEntry
 
read(DataInput) - Static method in class net.nutch.pagedb.FetchListEntry
 
read(DataInput) - Static method in class net.nutch.parse.Outlink
 
read(DataInput) - Static method in class net.nutch.parse.ParseData
 
read(DataInput) - Static method in class net.nutch.parse.ParseText
 
read(DataInput) - Static method in class net.nutch.protocol.Content
 
read(DataInput) - Static method in class net.nutch.searcher.HitDetails
Constructs, reads and returns an instance.
read(DataInput) - Static method in class net.nutch.searcher.Query.Clause
 
read(DataInput) - Static method in class net.nutch.searcher.Query.Phrase
 
read(DataInput) - Static method in class net.nutch.searcher.Query.Term
 
read(DataInput) - Static method in class net.nutch.searcher.Query
 
readChar() - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
readCompressedByteArray(DataInput) - Static method in class net.nutch.io.WritableUtils
 
readCompressedString(DataInput) - Static method in class net.nutch.io.WritableUtils
 
readFields(DataInput) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
readFields(DataInput) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
readFields(DataInput) - Method in class net.nutch.db.Link
Read in fields from a bytestream
readFields(DataInput) - Method in class net.nutch.db.Page
 
readFields(DataInput) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
readFields(DataInput) - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
readFields(DataInput) - Method in class net.nutch.fetcher.FetcherOutput
 
readFields(DataInput) - Method in class net.nutch.fs.Block
 
readFields(DataInput) - Method in class net.nutch.fs.DatanodeInfo
 
readFields(DataInput) - Method in class net.nutch.fs.FSParam
Deserialize the opcode and the args
readFields(DataInput) - Method in class net.nutch.fs.FSResults
 
readFields(DataInput) - Method in class net.nutch.fs.HeartbeatData
 
readFields(DataInput) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc
 
readFields(DataInput) - Method in class net.nutch.io.ArrayWritable
 
readFields(DataInput) - Method in class net.nutch.io.BooleanWritable
 
readFields(DataInput) - Method in class net.nutch.io.BytesWritable
 
readFields(DataInput) - Method in class net.nutch.io.IntWritable
 
readFields(DataInput) - Method in class net.nutch.io.LongWritable
 
readFields(DataInput) - Method in class net.nutch.io.MD5Hash
 
readFields(DataInput) - Method in class net.nutch.io.NullWritable
 
readFields(DataInput) - Method in class net.nutch.io.TwoDArrayWritable
 
readFields(DataInput) - Method in class net.nutch.io.UTF8
 
readFields(DataInput) - Method in class net.nutch.io.VersionedWritable
 
readFields(DataInput) - Method in interface net.nutch.io.Writable
Reads the fields of this object from in.
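The Writable pattern pairs a write(DataOutput) method with readFields(DataInput), which reads the fields back in the same order they were written. The sketch below uses a hypothetical SimpleWritable interface with that shape rather than net.nutch.io.Writable itself, and ScoredUrl is a hypothetical record type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical interface mirroring the write/readFields shape of a Writable. */
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

class ScoredUrl implements SimpleWritable {
    String url;
    float score;

    ScoredUrl() {}  // empty instance, ready to be filled by readFields
    ScoredUrl(String url, float score) { this.url = url; this.score = score; }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);
        out.writeFloat(score);
    }

    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();      // same order as write()
        score = in.readFloat();
    }

    /** Serializes an instance to bytes and reads it back into a fresh one. */
    static ScoredUrl roundTrip(ScoredUrl original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            original.write(out);
            out.flush();
            ScoredUrl copy = new ScoredUrl();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The no-argument constructor is what makes this pattern work: a reader creates an empty instance, then lets readFields populate it.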
readFields(DataInput) - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
readFields(DataInput) - Method in class net.nutch.pagedb.FetchListEntry
 
readFields(DataInput) - Method in class net.nutch.parse.Outlink
 
readFields(DataInput) - Method in class net.nutch.parse.ParseData
 
readFields(DataInput) - Method in class net.nutch.parse.ParseText
 
readFields(DataInput) - Method in class net.nutch.protocol.Content
 
readFields(DataInput) - Method in class net.nutch.searcher.DistributedSearch.Param
 
readFields(DataInput) - Method in class net.nutch.searcher.DistributedSearch.Result
 
readFields(DataInput) - Method in class net.nutch.searcher.Hit
 
readFields(DataInput) - Method in class net.nutch.searcher.HitDetails
 
readFields(DataInput) - Method in class net.nutch.searcher.Hits
 
readFields(DataInput) - Method in class net.nutch.searcher.Query
 
readFields(DataInput) - Method in class net.nutch.tools.FetchListTool.SortableScore
 
readFloat(byte[], int) - Static method in class net.nutch.io.WritableComparator
Parse a float from a byte array.
readInt(byte[], int) - Static method in class net.nutch.io.WritableComparator
Parse an integer from a byte array.
readLong(byte[], int) - Static method in class net.nutch.io.WritableComparator
Parse a long from a byte array.
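Parsing numbers directly from a byte array lets a comparator order serialized keys without deserializing them. The sketch below assumes the big-endian layout produced by java.io.DataOutput (writeInt, writeLong, writeFloat). ByteArrayParsing is a hypothetical name, not the WritableComparator implementation.

```java
class ByteArrayParsing {
    /** Big-endian int at bytes[start..start+3], the layout DataOutput.writeInt produces. */
    static int readInt(byte[] bytes, int start) {
        return ((bytes[start] & 0xff) << 24)
             | ((bytes[start + 1] & 0xff) << 16)
             | ((bytes[start + 2] & 0xff) << 8)
             |  (bytes[start + 3] & 0xff);
    }

    /** Big-endian long: high int, then low int masked so its sign bit is not extended. */
    static long readLong(byte[] bytes, int start) {
        return ((long) readInt(bytes, start) << 32)
             | (readInt(bytes, start + 4) & 0xFFFFFFFFL);
    }

    /** IEEE 754 float stored as its int bit pattern, as DataOutput.writeFloat does. */
    static float readFloat(byte[] bytes, int start) {
        return Float.intBitsToFloat(readInt(bytes, start));
    }
}
```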
readString(DataInput) - Static method in class net.nutch.io.UTF8
Read a UTF-8 encoded string.
readString(DataInput) - Static method in class net.nutch.io.WritableUtils
 
readStringArray(DataInput) - Static method in class net.nutch.io.WritableUtils
 
readUnsignedShort(byte[], int) - Static method in class net.nutch.io.WritableComparator
Parse an unsigned short from a byte array.
recentlyInvalidBlocks(UTF8) - Method in class net.nutch.fs.FSNamesystem
Returns a list of Blocks that should be invalidated at the given node.
recursiveCopy(File, File) - Static method in class net.nutch.util.FileUtil
Copy a file and/or directory and all its contents (whether data or other files/dirs)
release(NutchFile) - Method in interface net.nutch.util.NutchFileSystem
Release the lock.
release(NutchFile) - Method in class net.nutch.util.NutchGenericFileSystem
Release the lock for the given NutchFile
release(String, String, String) - Method in class net.nutch.util.NutchGenericFileSystem
 
release(String, String, String) - Method in class net.nutch.util.NutchNFSFileSystem
Release the lock for the given NutchFile
release(String, String, String) - Method in class net.nutch.util.NutchRemoteFileSystem
 
remove(Object) - Method in class net.nutch.util.SoftHashMap
 
rename(UTF8, UTF8) - Method in class net.nutch.fs.NDFSClient
Make a direct connection to namenode and manipulate structures there.
rename(String, String) - Method in class net.nutch.fs.TestClient
Rename an NDFS file
rename(String, String) - Static method in class net.nutch.io.MapFile
Renames an existing map directory.
renameFile(File, String, String, String, boolean) - Method in class net.nutch.util.NutchGenericFileSystem
 
renameFile(File, String, String, String, boolean) - Method in class net.nutch.util.NutchNFSFileSystem
Rename the existing file or dir to a new location
renameFile(File, String, String, String, boolean) - Method in class net.nutch.util.NutchRemoteFileSystem
 
renameTo(UTF8, UTF8) - Method in class net.nutch.fs.FSDirectory
Change the filename
renameTo(UTF8, UTF8) - Method in class net.nutch.fs.FSNamesystem
Change the indicated filename.
renameTo(NutchFile, NutchFile) - Method in interface net.nutch.util.NutchFileSystem
Rename the given NutchFile to something new.
renameTo(NutchFile, NutchFile) - Method in class net.nutch.util.NutchGenericFileSystem
Rename the given NutchFile.
reset(byte[], int) - Method in class net.nutch.io.DataInputBuffer
Resets the data that the buffer reads.
reset(byte[], int, int) - Method in class net.nutch.io.DataInputBuffer
Resets the data that the buffer reads.
reset() - Method in class net.nutch.io.DataOutputBuffer
Resets the buffer to empty.
reset() - Method in class net.nutch.io.MapFile.Reader
Re-positions the reader before its first key.
reset() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets noIndex, noFollow and noCache to false.
resolveEncodingAlias(String) - Static method in class net.nutch.util.StringUtil
 
retrieveFile(String, OutputStream, int) - Method in class net.nutch.protocol.ftp.Client
 
retrieveList(String, List, int, FTPFileEntryParser) - Method in class net.nutch.protocol.ftp.Client
 
rightPad(String, int) - Static method in class net.nutch.util.StringUtil
Returns a copy of s padded with trailing spaces so that its length is length.
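A minimal sketch of right-padding. Padding is a hypothetical class, and returning strings that are already at least `length` characters long unchanged is an assumption of this sketch, not documented StringUtil behavior.

```java
class Padding {
    /** Pads s with trailing spaces up to length; longer strings are returned unchanged. */
    static String rightPad(String s, int length) {
        StringBuilder sb = new StringBuilder(s);
        while (sb.length() < length) {
            sb.append(' ');
        }
        return sb.toString();
    }
}
```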
root - Variable in class net.nutch.util.TrieStringMatcher
 
run() - Method in class net.nutch.fetcher.Fetcher
Runs the fetcher.
run() - Method in class net.nutch.tools.SegmentMergeTool
 

S

SIGRAM - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
SLASH - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
SUCCESS - Static variable in class net.nutch.fetcher.FetcherOutput
 
SYSTEM_STARTUP_PERIOD - Static variable in interface net.nutch.fs.FSConstants
 
ScoreStats - class net.nutch.util.ScoreStats.
When we generate a fetchlist, we need to choose a "cutoff" score, such that any scores above that cutoff will be included in the fetchlist.
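The cutoff described above can be stated exactly: to select the top N pages, take the N-th highest score. The sketch below computes it by sorting every score in memory, which illustrates the definition but may be impractical for a large web database; ScoreStats may work differently. CutoffScore is a hypothetical name, and the sketch keeps pages whose score is at least the cutoff, so ties can admit more than N pages.

```java
import java.util.Arrays;

class CutoffScore {
    /**
     * Returns the score of the topN-th best page, so that keeping pages with
     * score >= cutoff selects at least topN pages (more if scores tie).
     */
    static float cutoff(float[] scores, int topN) {
        if (topN <= 0) return Float.POSITIVE_INFINITY;              // select nothing
        if (topN >= scores.length) return Float.NEGATIVE_INFINITY;  // select everything
        float[] sorted = scores.clone();
        Arrays.sort(sorted);                                        // ascending order
        return sorted[sorted.length - topN];
    }
}
```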
ScoreStats() - Constructor for class net.nutch.util.ScoreStats
 
Searcher - interface net.nutch.searcher.Searcher.
Service that searches.
SegmentMergeTool - class net.nutch.tools.SegmentMergeTool.
This class cleans up accumulated segment data, and merges them into a single segment, with no duplicates in it.
SegmentMergeTool(String, String, String, boolean, boolean, boolean, boolean) - Constructor for class net.nutch.tools.SegmentMergeTool
 
SequenceFile - class net.nutch.io.SequenceFile.
Support for flat files of binary key/value pairs.
SequenceFile.Reader - class net.nutch.io.SequenceFile.Reader.
Reads key/value pairs from a sequence-format file.
SequenceFile.Reader(String) - Constructor for class net.nutch.io.SequenceFile.Reader
Open the named file.
SequenceFile.Sorter - class net.nutch.io.SequenceFile.Sorter.
Sorts key/value pairs in a sequence-format file.
SequenceFile.Sorter(Class, Class) - Constructor for class net.nutch.io.SequenceFile.Sorter
Sort and merge files containing the named classes.
SequenceFile.Sorter(WritableComparator, Class) - Constructor for class net.nutch.io.SequenceFile.Sorter
Sort and merge using an arbitrary WritableComparator.
SequenceFile.Writer - class net.nutch.io.SequenceFile.Writer.
Write key/value pairs to a sequence-format file.
SequenceFile.Writer(String, Class, Class) - Constructor for class net.nutch.io.SequenceFile.Writer
Create the named file.
Server - class net.nutch.ipc.Server.
An abstract IPC service.
Server(int, Class, int) - Constructor for class net.nutch.ipc.Server
Constructs a server listening on the named port.
SetFile - class net.nutch.io.SetFile.
A file-based set of keys.
SetFile() - Constructor for class net.nutch.io.SetFile
 
SetFile.Reader - class net.nutch.io.SetFile.Reader.
Provide access to an existing set file.
SetFile.Reader(String) - Constructor for class net.nutch.io.SetFile.Reader
Construct a set reader for the named set.
SetFile.Reader(String, WritableComparator) - Constructor for class net.nutch.io.SetFile.Reader
Construct a set reader for the named set using the named comparator.
SetFile.Writer - class net.nutch.io.SetFile.Writer.
Write a new set file.
SetFile.Writer(String, Class) - Constructor for class net.nutch.io.SetFile.Writer
Create the named set for keys of the named class.
SetFile.Writer(String, WritableComparator) - Constructor for class net.nutch.io.SetFile.Writer
Create the named set using the named key comparator.
ShareGroup - class net.nutch.util.ShareGroup.
A ShareGroup combines the name of a group with where the Nutch filesystem can find members of that group.
ShareGroup(String, String) - Constructor for class net.nutch.util.ShareGroup
Make a named ShareGroup, to be found at the given location.
ShareGroup(String) - Constructor for class net.nutch.util.ShareGroup
Create a ShareGroup as above, but assume the location description can be found via NutchConf.
ShareSet - class net.nutch.util.ShareSet.
A ShareSet is a library of ShareGroup objects.
ShareSet(File, Vector) - Constructor for class net.nutch.util.ShareSet
Build a ShareSet out of a Vector of ShareGroup objects.
ShareSet(File) - Constructor for class net.nutch.util.ShareSet
Default constructor.
SimpleCharStream - class net.nutch.quality.dynamic.SimpleCharStream.
An implementation of interface CharStream, where the stream is assumed to contain only ASCII characters (without unicode processing).
SimpleCharStream(Reader, int, int, int) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(Reader, int, int) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(Reader) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(InputStream, int, int, int) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(InputStream, int, int) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SimpleCharStream(InputStream) - Constructor for class net.nutch.quality.dynamic.SimpleCharStream
 
SoftHashMap - class net.nutch.util.SoftHashMap.
A Map which uses SoftReferences to keep track of values.
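A minimal sketch of a map whose values are held through java.lang.ref.SoftReference, so the garbage collector may reclaim them under memory pressure. A ReferenceQueue reports cleared references, and their entries are purged on the next access. SoftValueMap is a hypothetical name; the real SoftHashMap implements Map and adds finalization listeners, which this sketch omits.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

class SoftValueMap<K, V> {
    // A soft reference that remembers its key, so a cleared entry can be removed.
    private static final class SoftValue<K, V> extends SoftReference<V> {
        final K key;
        SoftValue(K key, V value, ReferenceQueue<? super V> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, SoftValue<K, V>> map = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    /** Removes entries whose values the garbage collector has already cleared. */
    private void purgeClearedEntries() {
        Reference<? extends V> ref;
        while ((ref = queue.poll()) != null) {
            SoftValue<?, ?> cleared = (SoftValue<?, ?>) ref;
            // Only remove if the map still holds this exact reference (not a newer put).
            if (map.get(cleared.key) == cleared) map.remove(cleared.key);
        }
    }

    V put(K key, V value) {
        purgeClearedEntries();
        SoftValue<K, V> old = map.put(key, new SoftValue<>(key, value, queue));
        return old == null ? null : old.get();
    }

    /** Returns the value, or null if it is absent or has already been reclaimed. */
    V get(K key) {
        purgeClearedEntries();
        SoftValue<K, V> sv = map.get(key);
        return sv == null ? null : sv.get();
    }

    V remove(K key) {
        purgeClearedEntries();
        SoftValue<K, V> sv = map.remove(key);
        return sv == null ? null : sv.get();
    }
}
```

Callers must treat a null from get() as a cache miss and recompute the value, because a soft value can disappear at any time.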
SoftHashMap() - Constructor for class net.nutch.util.SoftHashMap
 
SoftHashMap.FinalizationListener - interface net.nutch.util.SoftHashMap.FinalizationListener.
An interface for Objects which accept notification when another Object is finalized.
SoftHashMap.FinalizationNotifier - interface net.nutch.util.SoftHashMap.FinalizationNotifier.
An interface for Objects which can notify an object when they are finalized.
StringUtil - class net.nutch.util.StringUtil.
A collection of String processing utility methods.
StringUtil() - Constructor for class net.nutch.util.StringUtil
 
SuffixStringMatcher - class net.nutch.util.SuffixStringMatcher.
A class for efficiently matching Strings against a set of suffixes.
SuffixStringMatcher(String[]) - Constructor for class net.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied array.
SuffixStringMatcher(Collection) - Constructor for class net.nutch.util.SuffixStringMatcher
Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied Collection.
Summarizer - class net.nutch.searcher.Summarizer.
Implements hit summarization.
Summarizer() - Constructor for class net.nutch.searcher.Summarizer
 
Summary - class net.nutch.searcher.Summary.
A document summary dynamically generated to match a query.
Summary() - Constructor for class net.nutch.searcher.Summary
Constructs an empty Summary.
Summary.Ellipsis - class net.nutch.searcher.Summary.Ellipsis.
An ellipsis fragment within a summary.
Summary.Ellipsis() - Constructor for class net.nutch.searcher.Summary.Ellipsis
Constructs an ellipsis fragment.
Summary.Fragment - class net.nutch.searcher.Summary.Fragment.
A fragment of text within a summary.
Summary.Fragment(String) - Constructor for class net.nutch.searcher.Summary.Fragment
Constructs a fragment for the given text.
Summary.Highlight - class net.nutch.searcher.Summary.Highlight.
A highlighted fragment of text within a summary.
Summary.Highlight(String) - Constructor for class net.nutch.searcher.Summary.Highlight
Constructs a highlighted fragment for the given text.
SwitchTo(int) - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
SwitchTo(int) - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
save(OutputStream) - Method in class net.nutch.analysis.lang.NGramProfile
Writes the n-gram profile to the given OutputStream.
scoreDump() - Method in class net.nutch.tools.WebDBAdminTool
Emits each page's score and link data.
search(Query, int) - Method in class net.nutch.searcher.DistributedSearch.Client
 
search(Query, int) - Method in class net.nutch.searcher.IndexSearcher
 
search(Query, int) - Method in class net.nutch.searcher.NutchBean
 
search(Query, int, int) - Method in class net.nutch.searcher.NutchBean
Search for pages matching a query, eliminating excessive hits from sites.
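The site-limiting behavior of search(Query, int, int) can be illustrated with a small sketch. It assumes the third argument is a maximum number of hits per site and that the input is already sorted by descending score. `SiteLimiter`, `SiteHit`, and `limitPerSite` are hypothetical names, not the NutchBean implementation; the flag only mirrors the idea behind Hit.setMoreFromSiteExcluded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of per-site hit limiting; not the Nutch implementation.
class SiteLimiter {
    static final class SiteHit {
        final String site;
        final float score;
        boolean moreFromSiteExcluded; // set when lower-scoring hits from this site were dropped

        SiteHit(String site, float score) {
            this.site = site;
            this.score = score;
        }
    }

    // Keeps at most maxHitsPerSite hits per site, preserving the ranked order.
    // Assumption: a non-positive limit means "no limit".
    static List<SiteHit> limitPerSite(List<SiteHit> ranked, int maxHitsPerSite) {
        if (maxHitsPerSite <= 0) {
            return new ArrayList<>(ranked);
        }
        Map<String, Integer> keptPerSite = new HashMap<>();
        Set<String> truncatedSites = new HashSet<>();
        List<SiteHit> result = new ArrayList<>();
        for (SiteHit hit : ranked) {
            int kept = keptPerSite.getOrDefault(hit.site, 0);
            if (kept < maxHitsPerSite) {
                keptPerSite.put(hit.site, kept + 1);
                result.add(hit);
            } else {
                truncatedSites.add(hit.site); // a lower-scoring hit from this site is excluded
            }
        }
        // Flag every kept hit whose site lost at least one lower-scoring hit.
        for (SiteHit hit : result) {
            hit.moreFromSiteExcluded = truncatedSites.contains(hit.site);
        }
        return result;
    }
}
```

Flagging every kept hit from a truncated site follows the "true iff" wording of setMoreFromSiteExcluded; flagging only the last kept hit would be an equally plausible design.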
search(Query, int) - Method in interface net.nutch.searcher.Searcher
Return the top-scoring hits for a query.
second - Variable in class net.nutch.fs.FSParam
 
second - Variable in class net.nutch.fs.FSResults
 
seek(long) - Method in class net.nutch.io.ArrayFile.Reader
Positions the reader before its nth value.
seek(WritableComparable) - Method in class net.nutch.io.MapFile.Reader
Positions the reader at the named key, or if none such exists, at the first entry after the named key.
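The positioning rule for MapFile.Reader.seek (the exact key, else the first entry after it) matches a ceiling lookup in a sorted map. A minimal in-memory analogy: `SortedSeek` is hypothetical, `TreeMap.ceilingEntry` stands in for MapFile's on-disk index, and the sketch does not model what the real method returns.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory analogue of seek-by-key on a sorted file; not MapFile itself.
class SortedSeek {
    private final TreeMap<String, String> entries = new TreeMap<>();

    void put(String key, String value) {
        entries.put(key, value);
    }

    // Returns the entry for `key` if present, otherwise the first entry with a larger key,
    // or null when `key` sorts after every stored key.
    Map.Entry<String, String> seek(String key) {
        return entries.ceilingEntry(key);
    }
}
```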
seek(long) - Method in class net.nutch.io.SequenceFile.Reader
Set the current byte position in the input file.
seek(WritableComparable) - Method in class net.nutch.io.SetFile.Reader
 
sendNoOp() - Method in class net.nutch.protocol.ftp.Client
Sends a NOOP command to the FTP server.
set(DistributedWebDBWriter.LinkInstruction) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
Re-initializes from another LinkInstruction's data.
set(Link, int) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
Re-initializes with a Link and an instruction.
set(DistributedWebDBWriter.PageInstruction) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
Initializes from another PageInstruction object.
set(Page, int) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
Initializes the PageInstruction with no Link.
set(Page, Link, int) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
Initializes the PageInstruction with a Link.
set(Link) - Method in class net.nutch.db.Link
 
set(Page) - Method in class net.nutch.db.Page
Copy the contents of another instance into this instance.
set(WebDBWriter.LinkInstruction) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
Re-initializes from another LinkInstruction's data.
set(Link, int) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
Re-initializes with a Link and an instruction.
set(WebDBWriter.PageInstruction) - Method in class net.nutch.db.WebDBWriter.PageInstruction
Initializes from another PageInstruction object.
set(Page, int) - Method in class net.nutch.db.WebDBWriter.PageInstruction
Initializes the PageInstruction with no Link.
set(Page, Link, int) - Method in class net.nutch.db.WebDBWriter.PageInstruction
Initializes the PageInstruction with a Link.
set(Writable[]) - Method in class net.nutch.io.ArrayWritable
 
set(boolean) - Method in class net.nutch.io.BooleanWritable
Set the value of the BooleanWritable
set(int) - Method in class net.nutch.io.IntWritable
Set the value of this IntWritable.
set(long) - Method in class net.nutch.io.LongWritable
Set the value of this LongWritable.
set(MD5Hash) - Method in class net.nutch.io.MD5Hash
Copy the contents of another instance into this instance.
set(Writable[][]) - Method in class net.nutch.io.TwoDArrayWritable
 
set(String) - Method in class net.nutch.io.UTF8
Set to contain the contents of a string.
set(UTF8) - Method in class net.nutch.io.UTF8
Set to contain the contents of another UTF8.
set(float) - Method in class net.nutch.tools.FetchListTool.SortableScore
 
setBaseHref(URL) - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets the baseHref.
setClazz(String) - Method in class net.nutch.plugin.Extension
Sets the class that implements the concrete extension; only used until model creation at system startup.
setCommand(String) - Method in class net.nutch.util.CommandRunner
 
setContent(byte[]) - Method in class net.nutch.protocol.Content
 
setContentType(String) - Method in class net.nutch.protocol.Content
 
setDataTimeout(int) - Method in class net.nutch.protocol.ftp.Client
Sets the timeout in milliseconds to use for the data connection.
setDebugStream(PrintStream) - Method in class net.nutch.analysis.NutchAnalysisTokenManager
 
setDebugStream(PrintStream) - Method in class net.nutch.quality.dynamic.PageDescriptionTokenManager
 
setDestroyOnTimeout(boolean) - Method in class net.nutch.util.CommandRunner
 
setDigest(String) - Method in class net.nutch.io.MD5Hash
Sets the digest value from a hex string.
setDiscriptor(PluginDescriptor) - Method in class net.nutch.plugin.Extension
Sets the plugin descriptor; only used until model creation at system startup.
setExpireTime(long) - Method in class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet
Change when the ruleset goes stale.
setFactor(int) - Method in class net.nutch.io.SequenceFile.Sorter
Set the number of streams to merge at once.
setFetchDate(long) - Method in class net.nutch.fetcher.FetcherOutput
 
setFetchInterval(byte) - Method in class net.nutch.db.Page
 
setFileType(int) - Method in class net.nutch.protocol.ftp.Client
Sets the file type to be transferred.
setFollowTalk(boolean) - Method in class net.nutch.protocol.ftp.Ftp
Set followTalk
setId(String) - Method in class net.nutch.plugin.Extension
Sets the unique extension ID; only used until model creation at system startup.
setIndexInterval(int) - Method in class net.nutch.io.MapFile.Writer
Sets the index interval.
setIndexNo(int) - Method in class net.nutch.searcher.Hit
 
setInputStream(InputStream) - Method in class net.nutch.util.CommandRunner
 
setKeepConnection(boolean) - Method in class net.nutch.protocol.ftp.Ftp
Set keepConnection
setLogLevel(Level) - Static method in class net.nutch.fetcher.Fetcher
Set the logging level.
setMD5(MD5Hash) - Method in class net.nutch.db.Page
 
setMaxContentLength(int) - Method in class net.nutch.protocol.file.File
Set the point at which content is truncated.
setMaxContentLength(int) - Method in class net.nutch.protocol.ftp.Ftp
Set the point at which content is truncated.
setMemory(int) - Method in class net.nutch.io.SequenceFile.Sorter
Set the total amount of buffer memory, in bytes.
setMoreFromSiteExcluded(boolean) - Method in class net.nutch.searcher.Hit
Sets whether other, lower-scoring hits from the same site have been excluded from the list that contains this hit.
setName(String) - Method in class net.nutch.analysis.lang.NGramProfile
 
setName(Class, String) - Static method in class net.nutch.io.WritableName
Set the name under which a class should be known, when it should differ from the class name.
setNextFetchTime(long) - Method in class net.nutch.db.Page
 
setNoCache() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets noCache to true.
setNoFollow() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets noFollow to true.
setNoIndex() - Method in class net.nutch.parse.html.RobotsMetaProcessor.RobotsMetaIndicator
Sets noIndex to true.
setNumOutlinks(int) - Method in class net.nutch.db.Page
 
setRemoteVerificationEnabled(boolean) - Method in class net.nutch.protocol.ftp.Client
Enable or disable verification that the remote host taking part in a data connection is the same as the host to which the control connection is attached.
setRetriesSinceFetch(int) - Method in class net.nutch.db.Page
 
setScore(float, float) - Method in class net.nutch.db.Page
 
setScore(float) - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
setScorePower(float) - Method in class net.nutch.indexer.IndexSegment
Determines the power of link analysis scores.
setShowThreadIDs(boolean) - Static method in class net.nutch.util.LogFormatter
When set to true, thread IDs are logged.
setStdErrorStream(OutputStream) - Method in class net.nutch.util.CommandRunner
 
setStdOutputStream(OutputStream) - Method in class net.nutch.util.CommandRunner
 
setTargetHasOutlink(boolean) - Method in class net.nutch.db.Link
 
setThreadCount(int) - Method in class net.nutch.fetcher.Fetcher
Set thread count
setTimeout(int) - Method in class net.nutch.ipc.Client
Sets the timeout used for network i/o.
setTimeout(int) - Method in class net.nutch.ipc.Server
Sets the timeout used for network i/o.
setTimeout(int) - Method in class net.nutch.protocol.ftp.Ftp
Set the timeout.
setTimeout(int) - Method in class net.nutch.util.CommandRunner
 
setTotalIsExact(boolean) - Method in class net.nutch.searcher.Hits
Set Hits.totalIsExact().
setURL(String) - Method in class net.nutch.db.Page
 
setWaitForExit(boolean) - Method in class net.nutch.util.CommandRunner
 
setWeight(float) - Method in class net.nutch.searcher.Query.Clause
 
shortestMatch(String) - Method in class net.nutch.util.PrefixStringMatcher
Returns the shortest prefix of input that is matched, or null if no match exists.
shortestMatch(String) - Method in class net.nutch.util.SuffixStringMatcher
Returns the shortest suffix of input that is matched, or null if no match exists.
shortestMatch(String) - Method in class net.nutch.util.TrieStringMatcher
Returns the shortest substring of input that is matched by a pattern in the trie, or null if no match exists.
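The shortestMatch entries, together with the TrieStringMatcher description, suggest a character-tree approach. A minimal sketch for the suffix case, assuming suffixes are stored reversed in a trie whose nodes carry a terminal flag (echoing TrieNode.terminal). `SuffixTrie` is hypothetical and not the Nutch implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical suffix matcher built on a character trie; not the Nutch classes.
class SuffixTrie {
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal; // true when some suffix ends at this node
    }

    private final Node root = new Node();

    // Each suffix is inserted reversed, so matching can start at the end of the input.
    SuffixTrie(String... suffixes) {
        for (String suffix : suffixes) {
            Node node = root;
            for (int i = suffix.length() - 1; i >= 0; i--) {
                node = node.children.computeIfAbsent(suffix.charAt(i), c -> new Node());
            }
            node.terminal = true;
        }
    }

    // Walks the input backwards; the first terminal node reached ends the shortest match.
    String shortestMatch(String input) {
        if (root.terminal) {
            return ""; // the empty suffix matches everything
        }
        Node node = root;
        for (int i = input.length() - 1; i >= 0; i--) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                return null;
            }
            if (node.terminal) {
                return input.substring(i);
            }
        }
        return null;
    }
}
```

A prefix matcher is the same structure walked forwards from the first character; a longest match keeps walking and remembers the last terminal node seen.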
shutDown() - Method in class net.nutch.plugin.Plugin
Shut down the plugin.
shutdown() - Method in class net.nutch.util.ThreadPool
Turn off the pool.
size() - Method in class net.nutch.util.FibonacciHeap
Returns the number of objects in the heap.
size() - Method in class net.nutch.util.SoftHashMap
 
skip(DataInput) - Static method in class net.nutch.io.UTF8
Skips over one UTF8 in the input.
skip(DataInput) - Static method in class net.nutch.parse.Outlink
Skips over one Outlink in the input.
sort(String, String) - Method in class net.nutch.io.SequenceFile.Sorter
Perform a file sort.
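SequenceFile.Sorter exposes setFactor (the number of streams merged at once) alongside sort. A minimal in-memory sketch of a multi-pass k-way merge bounded by such a factor, assuming integer keys and already-sorted input runs. `RunMerger` is hypothetical and reproduces neither the Sorter's on-disk behavior nor its memory accounting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical multi-pass merge of sorted runs; not SequenceFile.Sorter.
class RunMerger {
    // Merges sorted runs, at most `factor` at a time, until one run remains.
    static List<Integer> mergeAll(List<List<Integer>> runs, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be at least 2");
        }
        List<List<Integer>> current = new ArrayList<>(runs);
        if (current.isEmpty()) {
            return new ArrayList<>();
        }
        while (current.size() > 1) {
            List<List<Integer>> next = new ArrayList<>();
            for (int i = 0; i < current.size(); i += factor) {
                next.add(mergeOnce(current.subList(i, Math.min(i + factor, current.size()))));
            }
            current = next;
        }
        return current.get(0);
    }

    // One k-way merge using a heap of cursors: {value, runIndex, positionInRun}.
    private static List<Integer> mergeOnce(List<List<Integer>> group) {
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < group.size(); r++) {
            if (!group.get(r).isEmpty()) {
                heap.add(new int[] {group.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            int run = top[1];
            int next = top[2] + 1;
            if (next < group.get(run).size()) {
                heap.add(new int[] {group.get(run).get(next), run, next});
            }
        }
        return out;
    }
}
```

A larger factor means fewer passes over the data but more streams open at once, which is the trade-off the factor controls.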
sort() - Method in class net.nutch.tools.DumpSegment
 
specialConstructor - Variable in class net.nutch.quality.dynamic.ParseException
This variable determines which constructor was used to create this object and thereby affects the semantics of the "getMessage" method.
specialToken - Variable in class net.nutch.quality.dynamic.Token
This field is used to access special tokens that occur prior to this token, but after the immediately preceding regular (non-special) token.
start() - Method in class net.nutch.ipc.Server
Starts the service.
startBlock(Block) - Method in class net.nutch.fs.FSDataset
Signals that the given Block will arrive soon.
startFile(UTF8) - Method in class net.nutch.fs.FSNamesystem
The client would like to create a new block for the indicated filename.
startUp() - Method in class net.nutch.plugin.Plugin
Invoked at plugin startup.
staticFlag - Static variable in class net.nutch.quality.dynamic.SimpleCharStream
 
status() - Method in class net.nutch.fetcher.Fetcher
Display the status of the fetcher run.
stop() - Method in class net.nutch.ipc.Client
Stop all threads related to this client.
stop() - Method in class net.nutch.ipc.Server
Stops the service.
success() - Method in class net.nutch.fs.FSResults
Whether the call worked.

T

TestClient - class net.nutch.fs.TestClient.
This class tests the NutchFS system.
TestClient(InetSocketAddress) - Constructor for class net.nutch.fs.TestClient
 
TextParser - class net.nutch.parse.text.TextParser.
 
TextParser() - Constructor for class net.nutch.parse.text.TextParser
 
ThreadPool - class net.nutch.util.ThreadPool.
ThreadPool maintains a large set of threads, which can be dedicated to a certain task and then recycled.
ThreadPool(int) - Constructor for class net.nutch.util.ThreadPool
Creates a pool of numThreads size.
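ThreadPool is described as keeping a set of threads that are recycled across tasks. A minimal sketch of that pattern, assuming a shared FIFO queue and a "poison pill" shutdown marker. `SimpleThreadPool`, `submit`, and `workLoop` are hypothetical names, not the Nutch API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical fixed-size pool; not the Nutch ThreadPool class.
class SimpleThreadPool {
    // Shutdown marker: a worker that takes it exits its loop.
    private static final Runnable POISON = () -> { };

    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Thread[] workers;

    SimpleThreadPool(int numThreads) {
        workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(this::workLoop, "pool-worker-" + i);
            workers[i].start();
        }
    }

    // Queues a task; an idle worker picks it up.
    void submit(Runnable task) {
        queue.add(task);
    }

    // Each worker is reused for task after task until it takes the marker.
    private void workLoop() {
        try {
            while (true) {
                Runnable task = queue.take();
                if (task == POISON) {
                    return;
                }
                task.run();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Lets already-queued tasks finish, then stops every worker and waits for it.
    void shutdown() {
        for (int i = 0; i < workers.length; i++) {
            queue.add(POISON);
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because the queue is FIFO, every task submitted before shutdown() runs before any worker sees a marker.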
Token - class net.nutch.quality.dynamic.Token.
Describes the input token stream.
Token() - Constructor for class net.nutch.quality.dynamic.Token
 
TokenMgrError - error net.nutch.quality.dynamic.TokenMgrError.
 
TokenMgrError() - Constructor for class net.nutch.quality.dynamic.TokenMgrError
 
TokenMgrError(String, int) - Constructor for class net.nutch.quality.dynamic.TokenMgrError
 
TokenMgrError(boolean, int, int, int, String, char, int) - Constructor for class net.nutch.quality.dynamic.TokenMgrError
 
TrieStringMatcher - class net.nutch.util.TrieStringMatcher.
TrieStringMatcher is a base class for simple tree-based string matching.
TrieStringMatcher() - Constructor for class net.nutch.util.TrieStringMatcher
 
TrieStringMatcher.TrieNode - class net.nutch.util.TrieStringMatcher.TrieNode.
Node class for the character tree.
TwoDArrayWritable - class net.nutch.io.TwoDArrayWritable.
A Writable for 2D arrays containing a matrix of instances of a class.
TwoDArrayWritable(Class) - Constructor for class net.nutch.io.TwoDArrayWritable
 
TwoDArrayWritable(Class, Writable[][]) - Constructor for class net.nutch.io.TwoDArrayWritable
 
targetHasOutlink() - Method in class net.nutch.db.Link
 
term() - Method in class net.nutch.analysis.NutchAnalysis
Parse a single term.
terminal - Variable in class net.nutch.util.TrieStringMatcher.TrieNode
 
textDump(String) - Method in class net.nutch.tools.WebDBAdminTool
Emits the webdb to two text files.
toArray() - Method in class net.nutch.io.ArrayWritable
 
toArray() - Method in class net.nutch.io.TwoDArrayWritable
 
toContent() - Method in class net.nutch.protocol.file.FileResponse
 
toContent() - Method in class net.nutch.protocol.ftp.FtpResponse
 
toContent() - Method in class net.nutch.protocol.http.HttpResponse
 
toDate(String) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
toHtml() - Method in class net.nutch.searcher.HitDetails
Display as HTML.
toLong(String) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
toString() - Method in class net.nutch.analysis.lang.NGramProfile
Returns a textual representation of this n-gram profile.
toString() - Method in class net.nutch.db.Link
Print out the record
toString() - Method in class net.nutch.db.Page
Print out the Page
toString() - Method in class net.nutch.fetcher.FetcherOutput
 
toString() - Method in class net.nutch.fs.Block
 
toString() - Method in class net.nutch.fs.DatanodeInfo
 
toString() - Method in class net.nutch.io.IntWritable
 
toString() - Method in class net.nutch.io.LongWritable
 
toString() - Method in class net.nutch.io.MD5Hash
Returns a string representation of this object.
toString() - Method in class net.nutch.io.SequenceFile.Reader
Returns the name of the file.
toString() - Method in class net.nutch.io.UTF8
Convert to a String.
toString() - Method in class net.nutch.io.VersionMismatchException
Returns a string representation of this object.
toString(Date) - Static method in class net.nutch.net.protocols.HttpDateFormat
Get the HTTP format of the specified date.
toString(Calendar) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
toString(long) - Static method in class net.nutch.net.protocols.HttpDateFormat
 
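HttpDateFormat converts between dates and the HTTP date format. A minimal sketch with java.text.SimpleDateFormat, assuming the RFC 1123 form (for example `Sun, 06 Nov 1994 08:49:37 GMT`). `HttpDates` is hypothetical and does not handle the obsolete RFC 850 or asctime forms that HTTP parsers may also accept.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper for RFC 1123 dates in HTTP headers; not the Nutch HttpDateFormat class.
class HttpDates {
    // SimpleDateFormat is not thread-safe, so each call builds its own instance.
    private static SimpleDateFormat newFormat() {
        SimpleDateFormat f = new SimpleDateFormat("EEE, dd MMM yyyy HH:mm:ss zzz", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        return f;
    }

    static String toHttpString(long millis) {
        return newFormat().format(new Date(millis));
    }

    static long toMillis(String httpDate) {
        try {
            return newFormat().parse(httpDate).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not an RFC 1123 date: " + httpDate, e);
        }
    }
}
```

Locale.US keeps day and month names in English regardless of the JVM's default locale, which HTTP requires.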
toString() - Method in class net.nutch.pagedb.FetchListEntry
 
toString() - Method in class net.nutch.parse.Outlink
 
toString() - Method in class net.nutch.parse.ParseData
 
toString() - Method in class net.nutch.parse.ParseText
 
toString() - Method in class net.nutch.parse.html.DOMContentUtils.LinkParams
 
toString() - Method in class net.nutch.parse.msword.WordTextBuffer
 
toString() - Method in class net.nutch.protocol.Content
 
toString() - Method in class net.nutch.protocol.http.RobotRulesParser.RobotRuleSet
 
toString() - Method in class net.nutch.quality.dynamic.Token
Returns the image.
toString() - Method in class net.nutch.searcher.Hit
Display as a string.
toString() - Method in class net.nutch.searcher.HitDetails
Display as a string.
toString() - Method in class net.nutch.searcher.Query.Clause
 
toString() - Method in class net.nutch.searcher.Query.Phrase
 
toString() - Method in class net.nutch.searcher.Query.Term
 
toString() - Method in class net.nutch.searcher.Query
 
toString() - Method in class net.nutch.searcher.Summary.Ellipsis
Returns an HTML representation of this fragment.
toString() - Method in class net.nutch.searcher.Summary.Fragment
Returns an HTML representation of this fragment.
toString() - Method in class net.nutch.searcher.Summary.Highlight
Returns an HTML representation of this fragment.
toString() - Method in class net.nutch.searcher.Summary
Returns an HTML representation of this summary.
toString() - Method in class net.nutch.util.NutchFile
 
toStrings() - Method in class net.nutch.io.ArrayWritable
 
toTabbedString() - Method in class net.nutch.db.Link
Get a tab-delimited version of the text data.
toTabbedString() - Method in class net.nutch.db.Page
A tab-delimited text version of the Page's data.
token - Variable in class net.nutch.analysis.NutchAnalysis
 
token - Variable in class net.nutch.quality.dynamic.PageDescription
 
tokenImage - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
tokenImage - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
tokenImage - Variable in class net.nutch.quality.dynamic.ParseException
This is a reference to the "tokenImage" array of the generated parser within which the parse error occurred.
tokenStream(String, Reader) - Method in class net.nutch.analysis.NutchDocumentAnalyzer
Returns a new token stream for text from the named field.
token_source - Variable in class net.nutch.analysis.NutchAnalysis
 
token_source - Variable in class net.nutch.quality.dynamic.PageDescription
 
totalIsExact() - Method in class net.nutch.searcher.Hits
True if Hits.getTotal() gives the exact number of hits, or false if it is only an estimate of the total number of hits.
tryagain() - Method in class net.nutch.fs.FSResults
Whether the client should retry the call.

U

UNQUOTED_VALUE - Static variable in interface net.nutch.quality.dynamic.PageDescriptionConstants
 
URLFilter - interface net.nutch.net.URLFilter.
Interface used to limit which URLs enter Nutch.
URLFilterFactory - class net.nutch.net.URLFilterFactory.
Factory to create a URLFilter from "urlfilter.class" config property.
URL_KEYSPACE - Static variable in class net.nutch.db.EditSectionGroupWriter
 
URL_KEYSPACE_DIVIDERS - Static variable in class net.nutch.db.DBKeyDivision
 
UTF8 - class net.nutch.io.UTF8.
A WritableComparable for strings that uses the UTF8 encoding.
UTF8() - Constructor for class net.nutch.io.UTF8
 
UTF8(String) - Constructor for class net.nutch.io.UTF8
Construct from a given string.
UTF8(UTF8) - Constructor for class net.nutch.io.UTF8
Construct from another UTF8.
UTF8.Comparator - class net.nutch.io.UTF8.Comparator.
A WritableComparator optimized for UTF8 keys.
UTF8.Comparator() - Constructor for class net.nutch.io.UTF8.Comparator
 
UpdateDatabaseTool - class net.nutch.tools.UpdateDatabaseTool.
This class takes the output of the fetcher and updates the page and link DBs accordingly.
UpdateDatabaseTool(IWebDBWriter, boolean, int) - Constructor for class net.nutch.tools.UpdateDatabaseTool
Take in the WebDBWriter, instantiated elsewhere.
UpdateLineColumn(char) - Method in class net.nutch.quality.dynamic.SimpleCharStream
 
UrlNormalizer - class net.nutch.net.UrlNormalizer.
Converts URLs to a normal form.
UrlNormalizer() - Constructor for class net.nutch.net.UrlNormalizer
 
unzip(byte[]) - Static method in class net.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[]) - Static method in class net.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array.
unzipBestEffort(byte[], int) - Static method in class net.nutch.util.GZIPUtils
Returns a gunzipped copy of the input array, truncated to sizeLimit bytes, if necessary.
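The GZIPUtils entries describe compressing and decompressing whole byte arrays, with a "best effort" variant that can cap the output size. A minimal sketch using java.util.zip, assuming "best effort" means returning whatever was decoded before an error instead of throwing. `GzipSketch` is hypothetical and not the Nutch implementation.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical gzip helpers; not net.nutch.util.GZIPUtils.
class GzipSketch {
    // Compresses the whole array; closing the stream writes the gzip trailer.
    static byte[] zip(byte[] in) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Decompresses at most sizeLimit bytes. On a corrupt or truncated stream,
    // returns the bytes recovered so far instead of failing.
    static byte[] unzipBestEffort(byte[] in, int sizeLimit) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(in))) {
            int n;
            while (out.size() < sizeLimit
                    && (n = gz.read(buf, 0, Math.min(buf.length, sizeLimit - out.size()))) != -1) {
                out.write(buf, 0, n);
            }
        } catch (IOException e) {
            // Best effort: keep whatever was decoded before the error.
        }
        return out.toByteArray();
    }
}
```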
updateBlocks(Block[]) - Method in class net.nutch.fs.DatanodeInfo
 
updateForSegment(String) - Method in class net.nutch.tools.UpdateDatabaseTool
Iterate through items in the FetcherOutput.
updateHeartbeat(long, long) - Method in class net.nutch.fs.DatanodeInfo
 
urlCompare(Object) - Method in class net.nutch.db.Link
Compare URLs, then compare MD5s.

V

VersionMismatchException - exception net.nutch.io.VersionMismatchException.
Thrown by VersionedWritable.readFields(DataInput) when the version of an object being read does not match the current implementation version as returned by VersionedWritable.getVersion().
VersionMismatchException(byte, byte) - Constructor for class net.nutch.io.VersionMismatchException
 
VersionedWritable - class net.nutch.io.VersionedWritable.
A base class for Writables that provides version checking.
VersionedWritable() - Constructor for class net.nutch.io.VersionedWritable
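VersionedWritable and VersionMismatchException together describe writing a version marker and rejecting data whose version differs from the current implementation. A minimal sketch, assuming the version is a single leading byte. `VersionedRecord`, `VersionedPoint`, and `VersionMismatch` are hypothetical stand-ins for the Nutch classes.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch of version-checked serialization; not the Nutch classes.
class VersionMismatch extends IOException {
    VersionMismatch(byte expected, byte found) {
        super("expected version " + expected + " but read " + found);
    }
}

abstract class VersionedRecord {
    // The version of the layout this class currently writes.
    abstract byte getVersion();

    // The version byte goes first so a reader can reject stale data early.
    void write(DataOutput out) throws IOException {
        out.writeByte(getVersion());
    }

    // Checks the version byte before subclasses read their own fields.
    void readFields(DataInput in) throws IOException {
        byte found = in.readByte();
        if (found != getVersion()) {
            throw new VersionMismatch(getVersion(), found);
        }
    }
}

// Example subclass: version 2 of a record holding two ints.
class VersionedPoint extends VersionedRecord {
    int x;
    int y;

    @Override
    byte getVersion() {
        return (byte) 2;
    }

    @Override
    void write(DataOutput out) throws IOException {
        super.write(out);
        out.writeInt(x);
        out.writeInt(y);
    }

    @Override
    void readFields(DataInput in) throws IOException {
        super.readFields(in);
        x = in.readInt();
        y = in.readInt();
    }
}
```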
 
value() - Method in class net.nutch.quality.dynamic.PageDescription
 
values() - Method in class net.nutch.util.SoftHashMap
Not Implemented

W

WHITE - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
WORD - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
WORD_PUNCT - Static variable in interface net.nutch.analysis.NutchAnalysisConstants
 
WRITE_METAINFO_PREFIX - Static variable in class net.nutch.db.EditSectionWriter
 
WebDBAdminTool - class net.nutch.tools.WebDBAdminTool.
The WebDBAdminTool is for Nutch administrators who need special access to the webdb.
WebDBAdminTool(IWebDBReader) - Constructor for class net.nutch.tools.WebDBAdminTool
 
WebDBInjector - class net.nutch.db.WebDBInjector.
This class takes a flat file of URLs and adds them as entries into a pagedb.
WebDBInjector(IWebDBWriter) - Constructor for class net.nutch.db.WebDBInjector
WebDBInjector takes a reference to a WebDBWriter that it should add to.
WebDBReader - class net.nutch.db.WebDBReader.
The WebDBReader implements all the read-only parts of accessing our web database.
WebDBReader(File) - Constructor for class net.nutch.db.WebDBReader
Open a web db reader for the named directory.
WebDBWriter - class net.nutch.db.WebDBWriter.
This is a wrapper class that allows us to reorder write operations to the linkdb and pagedb.
WebDBWriter(File) - Constructor for class net.nutch.db.WebDBWriter
Create a WebDBWriter.
WebDBWriter.LinkInstruction - class net.nutch.db.WebDBWriter.LinkInstruction.
Holds an instruction over a Link.
WebDBWriter.LinkInstruction() - Constructor for class net.nutch.db.WebDBWriter.LinkInstruction
 
WebDBWriter.LinkInstruction(Link, int) - Constructor for class net.nutch.db.WebDBWriter.LinkInstruction
 
WebDBWriter.LinkInstruction.MD5Comparator - class net.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator.
Sorts instructions first by MD5, then by opcode.
WebDBWriter.LinkInstruction.MD5Comparator() - Constructor for class net.nutch.db.WebDBWriter.LinkInstruction.MD5Comparator
 
WebDBWriter.LinkInstruction.UrlComparator - class net.nutch.db.WebDBWriter.LinkInstruction.UrlComparator.
Sorts instructions first by URL, then by opcode.
WebDBWriter.LinkInstruction.UrlComparator() - Constructor for class net.nutch.db.WebDBWriter.LinkInstruction.UrlComparator
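The comparators above order instructions by a primary key and break ties on the opcode. A minimal sketch using java.util.Comparator chaining, where `UrlOpInstruction` and its fields are hypothetical; it compares in-memory objects only and does not model how the Nutch comparators operate on stored records.

```java
import java.util.Comparator;

// Hypothetical instruction record: a target URL and an integer opcode.
final class UrlOpInstruction {
    final String url;
    final int opcode;

    UrlOpInstruction(String url, int opcode) {
        this.url = url;
        this.opcode = opcode;
    }

    // Orders by URL first; instructions for the same URL are ordered by opcode.
    static final Comparator<UrlOpInstruction> BY_URL_THEN_OPCODE =
            Comparator.comparing((UrlOpInstruction i) -> i.url)
                      .thenComparingInt(i -> i.opcode);
}
```

Grouping all instructions for one key together, in opcode order, lets a single sequential pass apply them in a predictable order.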
 
WebDBWriter.LinkInstructionWriter - class net.nutch.db.WebDBWriter.LinkInstructionWriter.
LinkInstructionWriter very efficiently writes a LinkInstruction to a SequenceFile.Writer.
WebDBWriter.LinkInstructionWriter() - Constructor for class net.nutch.db.WebDBWriter.LinkInstructionWriter
 
WebDBWriter.PageInstruction - class net.nutch.db.WebDBWriter.PageInstruction.
PageInstruction holds an operation over a Page.
WebDBWriter.PageInstruction() - Constructor for class net.nutch.db.WebDBWriter.PageInstruction
 
WebDBWriter.PageInstruction(Page, int) - Constructor for class net.nutch.db.WebDBWriter.PageInstruction
 
WebDBWriter.PageInstruction(Page, Link, int) - Constructor for class net.nutch.db.WebDBWriter.PageInstruction
 
WebDBWriter.PageInstruction.PageComparator - class net.nutch.db.WebDBWriter.PageInstruction.PageComparator.
Sorts instructions first by Page, then by opcode.
WebDBWriter.PageInstruction.PageComparator() - Constructor for class net.nutch.db.WebDBWriter.PageInstruction.PageComparator
 
WebDBWriter.PageInstruction.UrlComparator - class net.nutch.db.WebDBWriter.PageInstruction.UrlComparator.
Sorts instructions first by URL, then by opcode.
WebDBWriter.PageInstruction.UrlComparator() - Constructor for class net.nutch.db.WebDBWriter.PageInstruction.UrlComparator
 
WebDBWriter.PageInstructionWriter - class net.nutch.db.WebDBWriter.PageInstructionWriter.
PageInstructionWriter very efficiently writes a PageInstruction to a SequenceFile.Writer.
WebDBWriter.PageInstructionWriter() - Constructor for class net.nutch.db.WebDBWriter.PageInstructionWriter
 
Word6CHPBinTable - class net.nutch.parse.msword.chp.Word6CHPBinTable.
This class holds all of the character formatting properties from a Word 6.0/95 document.
Word6CHPBinTable(byte[], int, int, int) - Constructor for class net.nutch.parse.msword.chp.Word6CHPBinTable
Constructor used to read a binTable in from a Word document.
WordExtractor - class net.nutch.parse.msword.WordExtractor.
This class extracts the text from a Word 6.0/95/97/2000/XP document.
WordExtractor() - Constructor for class net.nutch.parse.msword.WordExtractor
Constructor
WordTextBuffer - class net.nutch.parse.msword.WordTextBuffer.
This class acts as a StringBuffer for text from a word document.
WordTextBuffer() - Constructor for class net.nutch.parse.msword.WordTextBuffer
 
Writable - interface net.nutch.io.Writable.
A simple, efficient serialization protocol, based on DataInput and DataOutput.
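The Writable contract is that an object writes its fields to a DataOutput and reads them back, in the same order, from a DataInput. A minimal sketch: `SimpleWritable`, `PageRecord`, and `WritableRoundTrip` are hypothetical names, and `DataOutput.writeUTF` (length-prefixed modified UTF-8) stands in for Nutch's own UTF8 string encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the Writable contract.
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Example record: fields are read back in exactly the order they were written.
class PageRecord implements SimpleWritable {
    String url = "";
    long fetchTime;

    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);        // 2-byte length prefix, then the encoded characters
        out.writeLong(fetchTime); // 8 bytes, big-endian
    }

    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        fetchTime = in.readLong();
    }
}

class WritableRoundTrip {
    // Serializes `src`, deserializes the bytes into `dst`, and returns the byte count.
    static int copy(SimpleWritable src, SimpleWritable dst) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream data = new DataOutputStream(bytes);
            src.write(data);
            data.flush();
            dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

No field names or type tags are written, which keeps the encoding compact but means reader and writer must agree on the field order.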
WritableComparable - interface net.nutch.io.WritableComparable.
An interface which extends both Writable and Comparable.
WritableComparator - class net.nutch.io.WritableComparator.
A Comparator for WritableComparables.
WritableComparator(Class) - Constructor for class net.nutch.io.WritableComparator
Construct for a WritableComparable implementation.
WritableName - class net.nutch.io.WritableName.
Utility to permit renaming of Writable implementation classes without invalidating files that contain their class name.
WritableUtils - class net.nutch.io.WritableUtils.
 
WritableUtils() - Constructor for class net.nutch.io.WritableUtils
 
walk(Node, URL, Properties) - Static method in class org.creativecommons.nutch.CCParseFilter.Walker
Scan the document adding attributes to metadata.
write(DataOutput) - Method in class net.nutch.db.DistributedWebDBWriter.LinkInstruction
 
write(DataOutput) - Method in class net.nutch.db.DistributedWebDBWriter.PageInstruction
 
write(DataOutput) - Method in class net.nutch.db.Link
Write bytes out to stream
write(DataOutput) - Method in class net.nutch.db.Page
Write the bytes out to the bytestream
write(DataOutput) - Method in class net.nutch.db.WebDBWriter.LinkInstruction
 
write(DataOutput) - Method in class net.nutch.db.WebDBWriter.PageInstruction
 
write(DataOutput) - Method in class net.nutch.fetcher.FetcherOutput
 
write(DataOutput) - Method in class net.nutch.fs.Block
 
write(DataOutput) - Method in class net.nutch.fs.DatanodeInfo
 
write(DataOutput) - Method in class net.nutch.fs.FSParam
 
write(DataOutput) - Method in class net.nutch.fs.FSResults
 
write(DataOutput) - Method in class net.nutch.fs.HeartbeatData
 
write(DataOutput) - Method in class net.nutch.indexer.DeleteDuplicates.IndexedDoc
 
write(DataOutput) - Method in class net.nutch.io.ArrayWritable
 
write(DataOutput) - Method in class net.nutch.io.BooleanWritable
 
write(DataOutput) - Method in class net.nutch.io.BytesWritable
 
write(DataInput, int) - Method in class net.nutch.io.DataOutputBuffer
Writes bytes from a DataInput directly into the buffer.
write(DataOutput) - Method in class net.nutch.io.IntWritable
 
write(DataOutput) - Method in class net.nutch.io.LongWritable
 
write(DataOutput) - Method in class net.nutch.io.MD5Hash
 
write(DataOutput) - Method in class net.nutch.io.NullWritable
 
write(DataOutput) - Method in class net.nutch.io.TwoDArrayWritable
 
write(DataOutput) - Method in class net.nutch.io.UTF8
 
write(DataOutput) - Method in class net.nutch.io.VersionedWritable
 
write(DataOutput) - Method in interface net.nutch.io.Writable
Writes the fields of this object to out.
write(DataOutput) - Method in class net.nutch.linkdb.LinkAnalysisEntry
 
write(DataOutput) - Method in class net.nutch.pagedb.FetchListEntry
 
write(DataOutput) - Method in class net.nutch.parse.Outlink
 
write(DataOutput) - Method in class net.nutch.parse.ParseData
 
write(DataOutput) - Method in class net.nutch.parse.ParseText
 
write(DataOutput) - Method in class net.nutch.protocol.Content
 
write(DataOutput) - Method in class net.nutch.searcher.DistributedSearch.Param
 
write(DataOutput) - Method in class net.nutch.searcher.DistributedSearch.Result
 
write(DataOutput) - Method in class net.nutch.searcher.Hit
 
write(DataOutput) - Method in class net.nutch.searcher.HitDetails
 
write(DataOutput) - Method in class net.nutch.searcher.Hits
 
write(DataOutput) - Method in class net.nutch.searcher.Query.Clause
 
write(DataOutput) - Method in class net.nutch.searcher.Query.Phrase
 
write(DataOutput) - Method in class net.nutch.searcher.Query.Term
 
write(DataOutput) - Method in class net.nutch.searcher.Query
 
write(DataOutput) - Method in class net.nutch.tools.FetchListTool.SortableScore
 
writeCompressedByteArray(DataOutput, byte[]) - Static method in class net.nutch.io.WritableUtils
 
writeCompressedString(DataOutput, String) - Static method in class net.nutch.io.WritableUtils
 
writeString(DataOutput, String) - Static method in class net.nutch.io.UTF8
Write a UTF-8 encoded string.
writeString(DataOutput, String) - Static method in class net.nutch.io.WritableUtils
 
writeStringArray(DataOutput, String[]) - Static method in class net.nutch.io.WritableUtils
 
writeToBlock(Block) - Method in class net.nutch.fs.FSDataset
Start writing to a block file

X

X_POINT_ID - Static variable in interface net.nutch.indexer.IndexingFilter
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.parse.HtmlParseFilter
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.parse.Parser
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.protocol.Protocol
The name of the extension point.
X_POINT_ID - Static variable in interface net.nutch.searcher.QueryFilter
The name of the extension point.

Z

zip(byte[]) - Static method in class net.nutch.util.GZIPUtils
Returns a gzipped copy of the input array.

_

__openPassiveDataConnection(int, String) - Method in class net.nutch.protocol.ftp.Client
 


Copyright © 2004 The Nutch Organization.