
bin/nutch merge

The merge command merges only the indexes; the rest of the data in your segments directory is still needed. So if, for example, your segments are in a directory called segments, they have all been indexed, and you have run duplicate detection, then you are ready to merge.
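Before merging, it can help to confirm that every segment really has been indexed. The sketch below is only an illustration: the function name unindexed_segments is hypothetical, and it assumes each indexed segment contains a subdirectory named index (the segments/*/index layout). It cannot tell whether duplicate detection has been run.

```shell
# Hypothetical helper, not part of Nutch: print every segment directory
# under the given root (default: segments) that has no "index" subdirectory.
unindexed_segments() {
    seg_root="${1:-segments}"
    for seg in "$seg_root"/*/; do
        # Skip the unexpanded pattern when the root holds no directories.
        [ -d "$seg" ] || continue
        if [ ! -d "${seg}index" ]; then
            echo "${seg%/}"
        fi
    done
}
```

If the function prints nothing, every segment under the root has an index subdirectory.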

called Java class

net.nutch.indexer.IndexMerger

command line options

bin/nutch merge <indexDirectory> <segment_dirs>

bugs and solutions

You merge with something like:

bin/nutch merge . segments/*

This creates a merged index, containing the contents of every segments/*/index, in a new directory named after those segments; in this example, 20030422113844-0_20030423144418-2.

Here's the bug: NutchBean looks for a merged index in a directory named index. So, to make things work, you currently have to manually rename the merged index directory to just index:

mv 20030422113844-0_20030423144418-2 index
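If you do this rename often, a small guard against clobbering an existing index directory may be worthwhile. This is a minimal sketch, not part of Nutch: the function name rename_merged_index is made up for illustration, and you pass it whatever directory name the merge actually produced.

```shell
# Hypothetical helper, not part of Nutch: rename a merged index directory
# to "index" in the current directory, refusing to overwrite an existing one.
rename_merged_index() {
    merged_dir="$1"
    if [ ! -d "$merged_dir" ]; then
        echo "no such directory: $merged_dir" >&2
        return 1
    fi
    if [ -e index ]; then
        echo "index already exists; move it aside first" >&2
        return 1
    fi
    mv "$merged_dir" index
}

# Usage, with the directory name from the example above:
# rename_merged_index 20030422113844-0_20030423144418-2
```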

If you start Tomcat from a directory that has subdirectories named index and segments, it will use the merged index data in index and get the rest of the segment data from the segments directory. Searches are much faster with a merged index.
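A quick check of that layout before starting Tomcat could look like the sketch below. The function name check_search_layout is hypothetical; it only tests that the two subdirectories exist and says nothing about whether their contents are valid.

```shell
# Hypothetical pre-flight check, not part of Nutch: verify that the given
# directory (default: current directory) has "index" and "segments" in it.
check_search_layout() {
    dir="${1:-.}"
    status=0
    for sub in index segments; do
        if [ ! -d "$dir/$sub" ]; then
            echo "missing directory: $dir/$sub" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage: check_search_layout && echo "layout looks right"
```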


Maybe I'm missing the point, but I have the following directory structure:

data
data/db
data/segments
with the associated index and fetch data under these.

If I run the command

bin/nutch merge . segments/*

in the data directory, it tries to delete the contents of the data directory.

However, I have found that if I run

bin/nutch merge index segments/*

it creates a merged index. The created index directory should then be stored in the data directory:

data/
data/db
data/index
data/segments
and everything works as expected (N.B. you must keep the segments directory).
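Given the report above that merging into . tried to delete the contents of the data directory, a cautious wrapper might refuse risky output directories before calling merge at all. This is a sketch under assumptions: safe_merge is a hypothetical name, the NUTCH variable is an invented override for the launcher path (defaulting to bin/nutch), and refusing an output directory that already exists is a conservative choice made here, not documented Nutch behaviour.

```shell
# Hypothetical wrapper, not part of Nutch: run the merge only when the
# output directory is a new, named directory (never "." or an existing path).
safe_merge() {
    out_dir="$1"
    seg_root="$2"
    case "$out_dir" in
        ""|.|./)
            echo "refusing to merge into '$out_dir'" >&2
            return 1
            ;;
    esac
    if [ -e "$out_dir" ]; then
        echo "$out_dir already exists; move it aside first" >&2
        return 1
    fi
    if [ ! -d "$seg_root" ]; then
        echo "no such segments directory: $seg_root" >&2
        return 1
    fi
    # NUTCH is an assumed override; by default this runs bin/nutch.
    "${NUTCH:-bin/nutch}" merge "$out_dir" "$seg_root"/*
}

# Usage, from the data directory:
# safe_merge index segments
```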

Revision r1.1 - 09 Dec 2004 - 11:18 GMT - AlonsoAndres
Copyright © 1999-2003 by the contributing authors.