net.nutch.tools
Class SegmentMergeTool

java.lang.Object
  extended by net.nutch.tools.SegmentMergeTool

public class SegmentMergeTool
extends Object

This class cleans up accumulated segment data and merges it into a single segment that contains no duplicates. It relies on a "master" unique index of all documents. This index must either already exist (created by running IndexSegment for each segment, then DeleteDuplicates, and finally IndexMerger), or the tool can create it just before merging, including any per-segment sub-indices that are needed.
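The idea of merging through a unique index can be illustrated with a small, self-contained sketch. It is not the Nutch implementation: the names DedupMergeSketch, Doc, Pointer, buildMasterIndex and merge are hypothetical, segments are modeled as in-memory lists, and the rule that the latest occurrence of a URL wins is an assumption made only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Conceptual sketch only; none of these names come from Nutch. */
final class DedupMergeSketch {

    /** A fetched document, reduced to the fields this illustration needs. */
    static final class Doc {
        final String url;
        final String content;

        Doc(String url, String content) {
            this.url = url;
            this.content = content;
        }
    }

    /** Points at one document inside one segment. */
    static final class Pointer {
        final int segment;
        final int docNo;

        Pointer(int segment, int docNo) {
            this.segment = segment;
            this.docNo = docNo;
        }
    }

    /**
     * Builds a "master" index holding one entry per unique URL.
     * Assumption: when a URL occurs more than once, the latest occurrence wins.
     */
    static Map<String, Pointer> buildMasterIndex(List<List<Doc>> segments) {
        Map<String, Pointer> master = new LinkedHashMap<>();
        for (int s = 0; s < segments.size(); s++) {
            List<Doc> segment = segments.get(s);
            for (int d = 0; d < segment.size(); d++) {
                // put() overwrites the pointer for a URL that was seen before
                master.put(segment.get(d).url, new Pointer(s, d));
            }
        }
        return master;
    }

    /** Copies only the documents referenced by the master index into one new segment. */
    static List<Doc> merge(List<List<Doc>> segments, Map<String, Pointer> master) {
        List<Doc> merged = new ArrayList<>();
        for (Pointer p : master.values()) {
            merged.add(segments.get(p.segment).get(p.docNo));
        }
        return merged;
    }
}
```

Because the merged segment is built only from entries in the index, the old segments and the old index are no longer needed once the copy has finished.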

The newly created segment is then optionally indexed, so that it can be either merged with more new segments, or used for searching as it is.

The original "master" index can optionally be deleted: because it still points to the old segments, the new index should be used instead. The old segments can optionally be removed as well, because all needed data has already been copied to the new merged segment.

If you use all of the provided functionality, you can save some manual steps in Nutch operational procedures. After you've run a couple of cycles of fetchlist generation, fetching, DB updating and analyzing, you end up with several segments, possibly containing duplicates. You may then run SegmentMergeTool directly with all options turned on. The tool will first create the master unique index, then merge the segments into the output segment, index it, and finally delete the original segment data and the master index.
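The order of operations described above can be sketched as a function of the four boolean options. The flag names match the constructor parameters listed on this page; the class MergePlanSketch, the method plan, and the mapping of each flag to a step are assumptions inferred from the parameter names, not the tool's actual code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that lists the steps implied by each option, in order. */
final class MergePlanSketch {

    static List<String> plan(boolean createMaster, boolean runIndexer,
                             boolean delSegs, boolean delMaster) {
        List<String> steps = new ArrayList<>();
        if (createMaster) {
            steps.add("create master unique index");
        }
        // Merging is the core step, so it is always part of the plan.
        steps.add("merge segments into output segment");
        if (runIndexer) {
            steps.add("index output segment");
        }
        if (delSegs) {
            steps.add("delete original segments");
        }
        if (delMaster) {
            steps.add("delete master index");
        }
        return steps;
    }

    public static void main(String[] args) {
        // All options on: the fully automated procedure described in the text.
        for (String step : plan(true, true, true, true)) {
            System.out.println(step);
        }
    }
}
```

With every option turned off, the plan contains only the merge step, which corresponds to the case where the master index already exists and nothing is cleaned up afterwards.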

Author:
Andrzej Bialecki

Field Summary
static Logger LOG
           
 
Constructor Summary
SegmentMergeTool(String segments, String output, String master, boolean createMaster, boolean runIndexer, boolean delSegs, boolean delMaster)
           
 
Method Summary
static void main(String[] args)
           
 void run()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

LOG

public static final Logger LOG

Constructor Detail

SegmentMergeTool

public SegmentMergeTool(String segments,
                        String output,
                        String master,
                        boolean createMaster,
                        boolean runIndexer,
                        boolean delSegs,
                        boolean delMaster)
                 throws Exception

Method Detail

run

public void run()

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception


Copyright © 2004 The Nutch Organization.