net.nutch.analysis.lang
Class LanguageIdentifier
java.lang.Object
net.nutch.analysis.lang.LanguageIdentifier
- All Implemented Interfaces:
- IndexingFilter
- public class LanguageIdentifier
- extends Object
- implements IndexingFilter
- Author:
- Sami Siren
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LOG
public static final Logger LOG
LanguageIdentifier
public LanguageIdentifier()
getInstance
public static LanguageIdentifier getInstance()
- Returns a handle to the singleton instance.
main
public static void main(String[] args)
- Main method, used for testing.
- Parameters:
args
- command-line arguments
identify
public String identify(String text)
- Identifies the language of the submitted text.
- Parameters:
text
- text of the document
- Returns:
- ISO code of the language (en, fi, sv, ...), or null if unknown
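Because identify(String) may return null, callers need a fallback for text whose language is unknown. The following is a minimal sketch of that handling, not Nutch code: TextLanguageIdentifier is a hypothetical stand-in interface so the example compiles without Nutch on the classpath, and the toy identifier in main only exercises both branches. LanguageLabels and codeOrFallback are likewise illustrative names.

```java
// Hypothetical stand-in for net.nutch.analysis.lang.LanguageIdentifier.
// Only the identify(String) signature and its "ISO code or null" contract
// are taken from the documentation above.
interface TextLanguageIdentifier {
    String identify(String text);
}

final class LanguageLabels {
    private LanguageLabels() {}

    /** Returns the identified ISO code, or the fallback when identify() returns null. */
    static String codeOrFallback(TextLanguageIdentifier identifier, String text, String fallback) {
        String code = identifier.identify(text);
        return code != null ? code : fallback;
    }
}

class IdentifyDemo {
    public static void main(String[] args) {
        // Toy identifier for illustration only; this is NOT Nutch's algorithm.
        TextLanguageIdentifier toy = text -> text.contains(" the ") ? "en" : null;

        System.out.println(LanguageLabels.codeOrFallback(toy, "where is the station", "unknown"));
        System.out.println(LanguageLabels.codeOrFallback(toy, "12345", "unknown"));
    }
}
```

With the real class, the same null check would presumably apply to the value returned by LanguageIdentifier.getInstance().identify(text).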
identify
public String identify(StringBuffer text)
identify
public String identify(InputStream is)
throws IOException
- Identifies the language from an input stream.
- Parameters:
is
- input stream to read the text from
- Returns:
-
- Throws:
IOException
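The documentation does not say how identify(InputStream) consumes the stream. As a rough sketch of one plausible approach, the helper below reads the whole stream into a String that could then be passed to a string-based identifier. StreamText and readAll are hypothetical names, and decoding as UTF-8 is an assumption; the character encoding Nutch expects is not stated here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class StreamText {
    private StreamText() {}

    /** Reads the stream to its end and decodes the bytes as UTF-8 (assumed charset). */
    static String readAll(InputStream is) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = is.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}

class StreamDemo {
    public static void main(String[] args) throws IOException {
        byte[] bytes = "hei maailma".getBytes(StandardCharsets.UTF_8);
        // try-with-resources closes the stream even if reading fails
        try (InputStream is = new ByteArrayInputStream(bytes)) {
            System.out.println(StreamText.readAll(is));
        }
    }
}
```

Any IOException raised while reading propagates to the caller, which matches the throws clause declared on identify(InputStream).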
filter
public Document filter(Document doc,
Parse parse,
FetcherOutput fo)
throws IndexingException
- Description copied from interface:
IndexingFilter
- Adds fields or otherwise modifies the document that will be indexed for a
parse.
- Specified by:
filter
in interface IndexingFilter
- Throws:
IndexingException
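To illustrate what "adds fields" could mean for a language identifier, here is a simplified sketch that models the document as a plain Map rather than the real Document, Parse, and FetcherOutput types. LangFieldFilter is a hypothetical class, and the field name "lang" is an assumption; this page does not state which field the filter writes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LangFieldFilter {
    // Assumed field name; not confirmed by the documentation above.
    static final String LANG_FIELD = "lang";

    private LangFieldFilter() {}

    /** Adds the language code when one was identified; otherwise leaves the document unchanged. */
    static Map<String, String> filter(Map<String, String> doc, String languageCode) {
        if (languageCode != null) {
            doc.put(LANG_FIELD, languageCode);
        }
        return doc;
    }
}

class FilterDemo {
    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Esimerkki");

        // "fi" stands in for the value returned by an identify(...) call
        System.out.println(LangFieldFilter.filter(doc, "fi"));
    }
}
```

Skipping the field when the code is null keeps documents of unknown language out of language-restricted queries instead of indexing a placeholder value.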
Copyright © 2004 The Nutch Organization.