java.lang.Object
  net.nutch.db.DistributedWebDBReader
This reader implements the read-only operations for accessing the web database. The write operations are in WebDBWriter.
| Constructor Summary |
| DistributedWebDBReader(NutchFileSystem nutchfs, String dbName) | Open a web db reader for the named directory. |
| Method Summary |
| void | close() | Shutdown |
| Link[] | getLinks(MD5Hash md5) | Grab all the links from the given MD5 hash. |
| Link[] | getLinks(UTF8 url) | Get all the hyperlinks that link TO the indicated URL. |
| Page | getPage(String url) | Get Page from the pagedb with the given URL. |
| Page[] | getPages(MD5Hash md5) | Get all the Pages according to their content hash. |
| Enumeration | links() | Return all the links, by target URL |
| static void | main(String[] argv) | The DistributedWebDBReader.main() provides some handy utility methods for looking through the contents of the webdb. |
| long | numLinks() | Return the number of links in our db. |
| int | numMachines() | How many sections (machines) there are in this distributed db. |
| long | numPages() | Return the number of pages we're dealing with. |
| boolean | pageExists(MD5Hash md5) | Test whether a certain piece of content is in the database, but don't bother returning the Page(s) itself. |
| Enumeration | pages() | Iterate through all the Pages, sorted by URL. |
| Enumeration | pagesByMD5() | Iterate through all the Pages, sorted by MD5. |
| Methods inherited from class java.lang.Object |
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
public DistributedWebDBReader(NutchFileSystem nutchfs,
String dbName)
throws IOException,
FileNotFoundException
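Because the constructor opens the database and close() can itself throw IOException, a caller will usually want a try/finally around any work done with the reader. The following is a minimal sketch of that pattern, not the library's own code: `WebDBReaderLike` and `FakeReader` are hypothetical stand-ins that mirror only the `numPages()` and `close()` signatures listed on this page, so the example runs without Nutch on the classpath.

```java
import java.io.IOException;

// Hypothetical stand-in: mirrors only the two signatures used below.
interface WebDBReaderLike {
    long numPages();
    void close() throws IOException;
}

// Hypothetical in-memory reader so the sketch is runnable on its own.
final class FakeReader implements WebDBReaderLike {
    boolean closed = false;
    private final long pages;

    FakeReader(long pages) {
        this.pages = pages;
    }

    @Override
    public long numPages() {
        return pages;
    }

    @Override
    public void close() {
        closed = true;
    }
}

final class ReaderLifecycle {
    // Reads the page count and always closes the reader, even if the read fails.
    static long countPagesAndClose(WebDBReaderLike reader) throws IOException {
        try {
            return reader.numPages();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // With the real class, the reader would presumably come from
        // new DistributedWebDBReader(nutchfs, dbName) instead.
        FakeReader reader = new FakeReader(42);
        long n = countPagesAndClose(reader);
        System.out.println(n + " pages, closed=" + reader.closed);
    }
}
```

The page does not say whether the class implements `java.io.Closeable`, so the sketch uses an explicit `finally` block rather than try-with-resources.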
| Method Detail |
public void close()
throws IOException
Specified by: close in interface IWebDBReader
Throws: IOException
public int numMachines()
public long numPages()
Specified by: numPages in interface IWebDBReader
public long numLinks()
Specified by: numLinks in interface IWebDBReader
public Page getPage(String url)
throws IOException
Specified by: getPage in interface IWebDBReader
Throws: IOException
public Page[] getPages(MD5Hash md5)
throws IOException
Specified by: getPages in interface IWebDBReader
Throws: IOException
public boolean pageExists(MD5Hash md5)
throws IOException
Specified by: pageExists in interface IWebDBReader
Throws: IOException
public Enumeration pages()
throws IOException
Specified by: pages in interface IWebDBReader
Throws: IOException
public Enumeration pagesByMD5()
throws IOException
Specified by: pagesByMD5 in interface IWebDBReader
Throws: IOException
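pages() and pagesByMD5() return a raw `java.util.Enumeration`, so callers have to walk it with `hasMoreElements()`/`nextElement()` and cast each element themselves. A small sketch of a typed helper for that loop follows; `EnumerationDrain` is a hypothetical name, and plain strings stand in for `Page` objects so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

final class EnumerationDrain {
    // Copies every element of an Enumeration into a typed list.
    // Class.cast fails fast with ClassCastException on an unexpected element type.
    static <T> List<T> drain(Enumeration<?> e, Class<T> type) {
        List<T> out = new ArrayList<>();
        while (e.hasMoreElements()) {
            out.add(type.cast(e.nextElement()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for reader.pages(): strings play the role of Page objects.
        Enumeration<String> fake =
                Collections.enumeration(Arrays.asList("http://a/", "http://b/"));
        List<String> urls = drain(fake, String.class);
        System.out.println(urls.size());
    }
}
```

Collecting everything into a list is only reasonable for small databases or tests; for a full web db it is probably better to process each element inside the loop and discard it.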
public Link[] getLinks(UTF8 url)
throws IOException
Specified by: getLinks in interface IWebDBReader
Throws: IOException
public Link[] getLinks(MD5Hash md5)
throws IOException
Specified by: getLinks in interface IWebDBReader
Throws: IOException
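The two getLinks overloads look up links in opposite directions, as the method summaries read: the `UTF8 url` form returns links pointing TO a URL, while the `MD5Hash md5` form returns links coming FROM the content with that hash. The toy in-memory table below illustrates the distinction only. `LinkLike` and `LinkTable` are hypothetical, strings replace `MD5Hash` and `UTF8`, and nothing here reflects how the web db actually stores links.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical link record: hash of the source content and the target URL.
final class LinkLike {
    final String fromMd5;
    final String toUrl;

    LinkLike(String fromMd5, String toUrl) {
        this.fromMd5 = fromMd5;
        this.toUrl = toUrl;
    }
}

final class LinkTable {
    private final List<LinkLike> links = new ArrayList<>();

    void add(String fromMd5, String toUrl) {
        links.add(new LinkLike(fromMd5, toUrl));
    }

    // Analogue of getLinks(UTF8 url): every link whose target is the URL.
    List<LinkLike> linksTo(String url) {
        List<LinkLike> out = new ArrayList<>();
        for (LinkLike l : links) {
            if (l.toUrl.equals(url)) {
                out.add(l);
            }
        }
        return out;
    }

    // Analogue of getLinks(MD5Hash md5): every link whose source has the hash.
    List<LinkLike> linksFrom(String md5) {
        List<LinkLike> out = new ArrayList<>();
        for (LinkLike l : links) {
            if (l.fromMd5.equals(md5)) {
                out.add(l);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        LinkTable t = new LinkTable();
        t.add("hashA", "http://example.com/b");
        t.add("hashA", "http://example.com/c");
        t.add("hashB", "http://example.com/c");
        System.out.println(t.linksFrom("hashA").size());               // 2
        System.out.println(t.linksTo("http://example.com/c").size()); // 2
    }
}
```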
public Enumeration links()
throws IOException
Specified by: links in interface IWebDBReader
Throws: IOException
public static void main(String[] argv)
throws FileNotFoundException,
IOException
Throws: FileNotFoundException, IOException