net.nutch.analysis.lang
Class NGramProfile

java.lang.Object
  extended by net.nutch.analysis.lang.NGramProfile

public class NGramProfile
extends Object

This class runs an ngram analysis over submitted text; the results can be used for automatic language identification. The similarity calculation is experimental. You have been warned. Methods are provided to build new language profiles.

Author:
Sami Siren

Constructor Summary
NGramProfile(String name)
          Construct a new ngram profile
 
Method Summary
 void addNGrams(StringBuffer word)
          Add ngrams to table from a single word
 void addNGrams(StringBuffer word, int n)
          Add ngrams of the submitted length n from a single word
 void addToken(Token t)
          Add a token to this profile
 void analyze(StringBuffer text)
          Analyze a piece of text
static NGramProfile createNgramProfile(String name, InputStream is)
          Creates a new Language profile from (preferably quite large) text file
 String getName()
          Returns the name of this profile
 float getSimilarity(NGramProfile another)
          Calculates a score for how well two models compare. This is an experimental implementation; feel free to enhance it.
 Vector getSorted()
          Return a sorted vector of ngrams (sorted by count)
 void load(InputStream is)
          Loads an ngram profile from an InputStream
static void main(String[] args)
          main method used for testing only
 void save(OutputStream os)
          Writes ngram profile into OutputStream
 void setName(String name)
           
 String toString()
          Textual representation of this ngram profile
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

NGramProfile

public NGramProfile(String name)
Construct a new ngram profile

Parameters:
name - Name of profile
Method Detail

addToken

public void addToken(Token t)
Add a token to this profile

Parameters:
t - Token to be added

analyze

public void analyze(StringBuffer text)
Analyze a piece of text

Parameters:
text - the text to be analyzed

addNGrams

public void addNGrams(StringBuffer word)
Add ngrams to table from a single word

Parameters:
word - the word to extract ngrams from

addNGrams

public void addNGrams(StringBuffer word,
                      int n)
Add ngrams of the submitted length n from a single word

Parameters:
word - the word to extract ngrams from
n - the ngram length
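
The following is a minimal sketch of what adding ngrams of a given length from one word can look like. It is not the Nutch implementation: the class name NGramCountSketch, the use of a Map as the count table, and the choice to ignore words shorter than n are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; NGramCountSketch is a hypothetical name, not part of Nutch.
class NGramCountSketch {

    // Counts every contiguous substring of length n found in word.
    // Words shorter than n contribute nothing (an assumption of this sketch).
    static void addNGrams(Map<String, Integer> counts, CharSequence word, int n) {
        for (int i = 0; i + n <= word.length(); i++) {
            String gram = word.subSequence(i, i + n).toString();
            counts.merge(gram, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        addNGrams(counts, new StringBuffer("banana"), 2);
        System.out.println(counts); // prints {ba=1, an=2, na=2}
    }
}
```

A LinkedHashMap is used here only so that the printed order matches insertion order; any Map would serve as the count table.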

getSorted

public Vector getSorted()
Return a sorted vector of ngrams (sorted by count)

Returns:
vector of ngrams sorted by count
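
Sorting ngrams by count can be sketched as follows. The page does not state the sort direction or how ties are handled, so the descending order and the alphabetical tie-break below are assumptions, as is the class name SortedNGramSketch. The sketch returns a List rather than a Vector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only; SortedNGramSketch is a hypothetical name, not part of Nutch.
class SortedNGramSketch {

    // Returns the ngrams ordered by descending count.
    // Ties are broken alphabetically so the result is deterministic (an assumption).
    static List<String> sortedByCount(Map<String, Integer> counts) {
        List<String> grams = new ArrayList<>(counts.keySet());
        grams.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return grams;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("ba", 1);
        counts.put("an", 2);
        counts.put("na", 2);
        System.out.println(sortedByCount(counts)); // prints [an, na, ba]
    }
}
```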

toString

public String toString()
Textual representation of this ngram profile


getSimilarity

public float getSimilarity(NGramProfile another)
Calculates a score for how well two models compare. This is an experimental implementation; feel free to enhance it.

Parameters:
another - the profile to compare against
Returns:
the similarity score
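
This page does not describe how the score is computed, so the sketch below swaps in a well-known measure, cosine similarity over raw ngram counts, purely to show what comparing two count tables can look like. It should not be read as the algorithm behind getSimilarity; the class name SimilaritySketch and the score range (0.0 for no shared ngrams, 1.0 for proportional counts) belong to this sketch only.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; cosine similarity is an assumed stand-in,
// not the measure used by NGramProfile.getSimilarity.
class SimilaritySketch {

    // Euclidean length of the count vector.
    static double norm(Map<String, Integer> counts) {
        double sum = 0.0;
        for (int c : counts.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between two count vectors; 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> first = new HashMap<>();
        first.put("th", 3);
        first.put("he", 2);
        Map<String, Integer> second = new HashMap<>();
        second.put("th", 1);
        second.put("en", 4);
        System.out.println(cosine(first, second));
    }
}
```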

load

public void load(InputStream is)
          throws IOException
Loads an ngram profile from an InputStream

Throws:
IOException

createNgramProfile

public static NGramProfile createNgramProfile(String name,
                                              InputStream is)
Creates a new Language profile from (preferably quite large) text file

Parameters:
name - name of profile
is - stream to read the text file from
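
As a rough illustration of building ngram counts from a text stream, the sketch below reads an InputStream line by line, splits each line into words, and counts ngrams of one fixed length. Everything specific here is an assumption rather than documented behavior of createNgramProfile: the class name ProfileFromStreamSketch, UTF-8 decoding, lowercasing, splitting on non-letter characters, and the single ngram length.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; ProfileFromStreamSketch is a hypothetical name, not part of Nutch.
class ProfileFromStreamSketch {

    // Reads text from the stream and counts ngrams of length n in each word.
    // Assumes UTF-8 input and treats any run of non-letters as a word separator.
    static Map<String, Integer> buildCounts(InputStream is, int n) {
        Map<String, Integer> counts = new HashMap<>();
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                for (String word : line.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
                    for (int i = 0; i + n <= word.length(); i++) {
                        counts.merge(word.substring(i, i + n), 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            // Rethrown unchecked so callers of this sketch need no throws clause.
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] text = "the then\nThe".getBytes(StandardCharsets.UTF_8);
        Map<String, Integer> counts = buildCounts(new ByteArrayInputStream(text), 3);
        System.out.println(counts.get("the")); // prints 3
        System.out.println(counts.get("hen")); // prints 1
    }
}
```

The larger the input text, the more stable the resulting counts tend to be, which is consistent with the advice above to use a fairly large text file.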

save

public void save(OutputStream os)
          throws IOException
Writes ngram profile into OutputStream

Throws:
IOException

main

public static void main(String[] args)
main method used for testing only

Parameters:
args - command-line arguments

getName

public String getName()
Returns:
Returns the name.

setName

public void setName(String name)
Parameters:
name - The name to set.


Copyright © 2004 The Nutch Organization.