net.nutch.db
Class WebDBInjector

java.lang.Object
  extended by net.nutch.db.WebDBInjector

public class WebDBInjector
extends Object

This class takes a flat file of URLs (or a structured DMOZ file) and adds each URL as an entry in a pagedb. It is useful for bootstrapping the system.

Author:
Mike Cafarella, Doug Cutting

Field Summary
static Logger LOG
           
 
Constructor Summary
WebDBInjector(IWebDBWriter dbWriter)
          Creates an injector that adds entries through the given IWebDBWriter.
 
Method Summary
 void close()
          Close dbWriter and save changes.
 void injectDmozFile(File dmozFile, int subsetDenom, boolean includeAdult, boolean includeDmozDesc, int skew, Pattern topicPattern)
          Iterate through all the items in this structured DMOZ file.
 void injectURLFile(File urlList)
          Iterate through all the items in this flat text file and add them to the db.
static void main(String[] argv)
          Command-line access.
 void printStatus()
          Utility to present performance stats.
 void printStatusBar(int small, int big)
          Utility to present a small status bar.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final Logger LOG
Constructor Detail

WebDBInjector

public WebDBInjector(IWebDBWriter dbWriter)
Creates an injector that adds entries through the given IWebDBWriter.

Method Detail

close

public void close()
           throws IOException
Close dbWriter and save changes.

Throws:
IOException
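Because close() is the step that saves changes, a caller will usually want it to run even when injection fails part-way. The self-contained sketch below shows that try/finally pattern. UrlWriter is a hypothetical stand-in for IWebDBWriter (whose real methods are not listed on this page), and MemoryWriter and injectAll are likewise illustrative names, not Nutch API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class InjectorLifecycle {
    // Adds every URL, then closes the writer even if an add fails part-way,
    // so the writer still gets a chance to save what was added before the failure.
    static void injectAll(UrlWriter writer, List<String> urls) throws IOException {
        try {
            for (String url : urls) {
                writer.addUrl(url);
            }
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        MemoryWriter writer = new MemoryWriter();
        injectAll(writer, List.of("http://example.com/", "http://example.org/"));
        System.out.println(writer.committed + " closed=" + writer.closed);
    }
}

// Hypothetical stand-in for IWebDBWriter; the real interface is not shown on this page.
interface UrlWriter {
    void addUrl(String url) throws IOException;
    void close() throws IOException;
}

// Hypothetical in-memory writer: entries only count as "saved" once close() runs.
class MemoryWriter implements UrlWriter {
    final List<String> pending = new ArrayList<>();
    final List<String> committed = new ArrayList<>();
    boolean closed = false;

    @Override
    public void addUrl(String url) {
        pending.add(url);
    }

    @Override
    public void close() {
        committed.addAll(pending);
        pending.clear();
        closed = true;
    }
}
```

The same shape applies to WebDBInjector itself, which does not implement Closeable and so cannot rely on try-with-resources.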

printStatusBar

public void printStatusBar(int small,
                           int big)
Utility to present a small status bar.
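This page does not say what small and big mean. A minimal sketch, assuming small is the amount of work done and big is the total, could render a fixed-width text bar as below; StatusBarSketch, render and its width parameter are hypothetical.

```java
class StatusBarSketch {
    // Assumes 'small' counts work done and 'big' the total; both meanings are guesses.
    static String render(int small, int big, int width) {
        int total = Math.max(big, 0);
        int done = Math.max(0, Math.min(small, total));   // clamp into [0, total]
        int filled = total == 0 ? 0 : (int) ((long) done * width / total);
        return "[" + "#".repeat(filled) + " ".repeat(width - filled) + "] " + done + "/" + total;
    }

    public static void main(String[] args) {
        for (int done = 0; done <= 100; done += 25) {
            System.out.println(render(done, 100, 20));
        }
    }
}
```

With these assumptions, render(25, 100, 20) fills 5 of the 20 cells, since 25 × 20 / 100 = 5.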


printStatus

public void printStatus()
Utility to present performance stats.
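printStatus is described only as presenting performance stats. One plausible shape, assuming the stats are a page count and elapsed time, is a throughput summary. InjectStats and its methods are hypothetical, and they take explicit timestamps so the arithmetic is easy to check.

```java
import java.util.Locale;

class InjectStats {
    private final long startMillis;
    private long pages;

    InjectStats(long startMillis) {
        this.startMillis = startMillis;
    }

    // Call once per page entry added to the db.
    void recordPage() {
        pages++;
    }

    // Average throughput since start; 0 when no time has elapsed yet.
    double pagesPerSecond(long nowMillis) {
        long elapsed = nowMillis - startMillis;
        return elapsed <= 0 ? 0.0 : pages * 1000.0 / elapsed;
    }

    // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM default.
    String summary(long nowMillis) {
        return String.format(Locale.ROOT, "%d pages in %d ms (%.1f pages/s)",
                pages, nowMillis - startMillis, pagesPerSecond(nowMillis));
    }

    public static void main(String[] args) {
        InjectStats stats = new InjectStats(System.currentTimeMillis());
        for (int i = 0; i < 1000; i++) {
            stats.recordPage();
        }
        System.out.println(stats.summary(System.currentTimeMillis()));
    }
}
```

For example, 50 pages recorded over 2000 ms gives 50 × 1000 / 2000 = 25.0 pages/s.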


injectURLFile

public void injectURLFile(File urlList)
                   throws IOException
Iterate through all the items in this flat text file and add them to the db.

Throws:
IOException
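A flat URL file can be processed line by line. The sketch below assumes one URL per line, trims surrounding whitespace, and skips blank lines (that handling, and the UTF-8 encoding, are assumptions rather than documented behavior). A java.util.function.Consumer stands in for the db writer, and UrlFileInjector is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class UrlFileInjector {
    // Reads one URL per line and hands each non-blank, trimmed line to the sink.
    // The caller owns 'in' and is responsible for closing it.
    static int inject(Reader in, Consumer<String> sink) throws IOException {
        int added = 0;
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            String url = line.trim();
            if (url.isEmpty()) {
                continue;
            }
            sink.accept(url);   // stands in for adding a page entry via the db writer
            added++;
        }
        return added;
    }

    // File-based entry point shaped like injectURLFile(File); UTF-8 is an assumed encoding.
    static int inject(File urlList, Consumer<String> sink) throws IOException {
        try (Reader in = Files.newBufferedReader(urlList.toPath(), StandardCharsets.UTF_8)) {
            return inject(in, sink);
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> urls = new ArrayList<>();
        int n = inject(new StringReader("http://a.example/\n\n  http://b.example/  \n"), urls::add);
        System.out.println(n + " " + urls);
    }
}
```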

injectDmozFile

public void injectDmozFile(File dmozFile,
                           int subsetDenom,
                           boolean includeAdult,
                           boolean includeDmozDesc,
                           int skew,
                           Pattern topicPattern)
                    throws IOException,
                           SAXException,
                           ParserConfigurationException
Iterate through all the items in this structured DMOZ file. Add each URL to the web db.

Throws:
IOException
SAXException
ParserConfigurationException
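The declared SAXException and ParserConfigurationException suggest that the DMOZ file is read with a SAX parser. The sketch below parses ExternalPage elements (URL in the about attribute, category in a topic child) and applies filters named after this method's parameters. The filter rules are assumptions, not documented behavior: adult pages are taken to be those under Top/Adult, topicPattern must match the whole topic, and subsetDenom with skew keeps roughly one URL in subsetDenom using a hash offset by skew. includeDmozDesc is not modeled. DmozSketch, select and accept are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DmozSketch {
    // Collects URLs from <ExternalPage about="..."> elements that pass accept().
    static List<String> select(String rdf, int subsetDenom, boolean includeAdult,
                               int skew, Pattern topicPattern) throws Exception {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String url;                          // current page's URL, or null
            private String topic = "";
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals("ExternalPage")) {
                    url = atts.getValue("about");
                    topic = "";
                }
                text.setLength(0);                       // collect text for this element only
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (qName.equals("topic")) {
                    topic = text.toString().trim();
                } else if (qName.equals("ExternalPage") && url != null) {
                    if (accept(url, topic, subsetDenom, includeAdult, skew, topicPattern)) {
                        out.add(url);
                    }
                    url = null;
                }
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(rdf)), handler);
        return out;
    }

    // Assumed filter rules; see the lead-in above.
    static boolean accept(String url, String topic, int subsetDenom, boolean includeAdult,
                          int skew, Pattern topicPattern) {
        if (!includeAdult && topic.startsWith("Top/Adult")) return false;
        if (topicPattern != null && !topicPattern.matcher(topic).matches()) return false;
        return subsetDenom <= 1 || Math.floorMod(url.hashCode() + skew, subsetDenom) == 0;
    }

    public static void main(String[] args) throws Exception {
        String rdf = "<RDF>"
            + "<ExternalPage about=\"http://arts.example/\"><topic>Top/Arts</topic></ExternalPage>"
            + "<ExternalPage about=\"http://adult.example/\"><topic>Top/Adult/Misc</topic></ExternalPage>"
            + "</RDF>";
        System.out.println(select(rdf, 1, false, 0, null));   // adult page filtered out
    }
}
```

Hashing the URL rather than counting positions keeps the chosen subset stable if the dump is reordered; changing skew selects a different, disjoint slice when subsetDenom is fixed.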

main

public static void main(String[] argv)
                 throws Exception
Command-line access. The user may add URLs from either a flat text file or a structured DMOZ file. By default, adult material (as categorized by DMOZ) is ignored.

Throws:
Exception
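main chooses between the flat-file and DMOZ inputs and leaves adult material out unless asked otherwise. A sketch of that argument handling follows. The option names (--urls, --dmoz, --include-adult, --subset) are hypothetical and are not the real tool's flags; only the "exactly one input" choice and the adult-material default come from the description above.

```java
class InjectorArgsSketch {
    // Hypothetical option holder; field names are illustrative.
    static final class Options {
        String urlFile;
        String dmozFile;
        boolean includeAdult = false;   // documented default: adult material is ignored
        int subsetDenom = 1;            // 1 = keep every URL (assumed meaning)
    }

    // Returns the value following option i, failing clearly if it is missing.
    private static String valueAfter(String[] argv, int i) {
        if (i + 1 >= argv.length) {
            throw new IllegalArgumentException("missing value for " + argv[i]);
        }
        return argv[i + 1];
    }

    static Options parse(String[] argv) {
        Options o = new Options();
        for (int i = 0; i < argv.length; i++) {
            switch (argv[i]) {
                case "--urls":          o.urlFile = valueAfter(argv, i); i++; break;
                case "--dmoz":          o.dmozFile = valueAfter(argv, i); i++; break;
                case "--include-adult": o.includeAdult = true; break;
                case "--subset":        o.subsetDenom = Integer.parseInt(valueAfter(argv, i)); i++; break;
                default: throw new IllegalArgumentException("unknown option: " + argv[i]);
            }
        }
        // Exactly one input source: a flat URL file or a DMOZ file.
        if ((o.urlFile == null) == (o.dmozFile == null)) {
            throw new IllegalArgumentException("give exactly one of --urls or --dmoz");
        }
        return o;
    }

    public static void main(String[] args) {
        Options o = parse(args.length > 0 ? args : new String[] {"--dmoz", "content.rdf.u8"});
        System.out.println("dmoz=" + o.dmozFile + " urls=" + o.urlFile + " includeAdult=" + o.includeAdult);
    }
}
```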


Copyright © 2004 The Nutch Organization.