Getting Nutch Running On Debian
This page started out as the story of KragenSitaker getting Nutch 0.5 working on Debian Sarge (i.e. the Debian release known as "sarge"). I don't know much about Java or JSP or Servlets or Tomcat or Nutch, so I probably did a bunch of dumb stuff, but I figure that if I document the dumb stuff I did, the problems I had, and how I solved them, it might be helpful to whoever does the same dumb stuff next week. If I'm lucky, maybe someone who knows what they're doing can edit this page to explain better ways of solving these problems.

First, I had to get Java and Tomcat running.

I'm using Tomcat 4.1.30-6 (Debian's sixth package revision of 4.1.30).

I installed the Sun JDK

I installed JDK 1.4.2_05-b04 from Sun's site. Normally installing proprietary software is not my idea of fun, but it seemed like I was probably going to run into enough trouble getting stuff working as it was, without fighting with minority platforms. (I'd already given up on Yellow Dog Linux on a G5 Cube for just this reason.)

I told Debian I'd installed Java

Once I'd run the self-extracting shell script, I used the Debian "equivs" package, with the spec files in the Debian java-common package, to tell Debian I had installed Java, as per the instructions in the Debian Java FAQ, http://www.debian.org/doc/manuals/debian-java-faq/ch11.html.

I removed Kaffe

I'd previously done apt-get install tomcat4, but at this point I removed kaffe, with apt-get remove kaffe, and also kaffe-common, kaffe-pthreads, and libffi2, all of which had gotten installed with Tomcat/Catalina in order to support Kaffe. Since I'd told Debian that I'd installed Sun's Java with the equivs stuff, this didn't result in uninstalling tomcat4.

I got Tomcat running

At this point, for some reason, it wanted to use Kaffe to run tomcat, even though Kaffe was no longer installed. I'd installed the JDK in /usr/local/lib/j2sdk1.4.2_05 and symlinked that to /usr/local/lib/jdk, so I added the line JAVA_HOME=/usr/local/lib/jdk to /etc/default/tomcat4, and then Tomcat was able to run again with /etc/init.d/tomcat4 start. As per Debian defaults, it was running its HTTP server on port 8180.

I circumvented the firewall

The machine was behind a firewall that didn't let through port 8180 by default, so I used ssh -L 8180:themachine:8180 themachine to allow me to point my web browser at http://localhost:8180/.

Next, I had to install Nutch on Tomcat.

My first attempt was to do this through Tomcat's web user interface. Bad idea.

I added a manager user.

I enabled Tomcat's web administration interface by editing /var/lib/tomcat4/conf/tomcat-users.xml to say <user username="tomcat" password="censored" roles="tomcat,manager"/> instead of <user username="tomcat" password="censored" roles="tomcat"/>, and added a <role rolename="manager"/>. Then, after a Tomcat restart, I could go to http://localhost:8180/manager/html to see the list of running servlets.

I uploaded the Nutch war file.

Then I told Tomcat to install file:///home/kragen/pkgs/nutch-0.5/nutch-0.5.war in its cute little software installation form. This deployed Nutch at http://localhost:8180/nutch-0.5, which didn't work so well, because Nutch's various index.html pages link to "/search.jsp" (relative to the server root), not "search.jsp" (relative to the page), so the search form pointed outside the webapp. But the HTML and images were OK.
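To make the link problem concrete, here is a minimal sketch using the standard java.net.URI class to resolve both kinds of link against the page's address. The class and method names (LinkResolutionDemo, resolve) are made up for illustration, and the base URL assumes the webapp is served with a trailing slash.

```java
import java.net.URI;

// Hypothetical demo class; not part of Nutch or Tomcat.
class LinkResolutionDemo {
    /** Resolves a link the way a browser would, relative to the page it appears on. */
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String page = "http://localhost:8180/nutch-0.5/";
        // A leading slash means "relative to the server root", so the webapp prefix is lost.
        System.out.println(resolve(page, "/search.jsp"));
        // Without the slash the link stays inside the webapp's own path.
        System.out.println(resolve(page, "search.jsp"));
    }
}
```

The first line printed is http://localhost:8180/search.jsp and the second is http://localhost:8180/nutch-0.5/search.jsp, which is why deploying the war as the ROOT webapp sidesteps the problem.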

I installed the Nutch war file by hand.

Then I did it by hand: I renamed /var/lib/tomcat4/webapps/ROOT to /var/lib/tomcat4/webapps/originally-ROOT, and restarted Tomcat again. This made http://localhost:8180/ no longer do anything useful, although things like http://localhost:8180/manager/html still worked fine.

I copied nutch-0.5.war into place.

Then I copied nutch-0.5.war to /var/lib/tomcat4/webapps/ROOT.war, and restarted Tomcat again. Now the Nutch HTML page loaded at http://localhost:8180/ but the search still didn't work, because of some random exception.

Then, I had to diagnose the problem with Nutch.

At first, Tomcat was producing a message with almost no useful information in it:

    HTTP Status 500 -
    type Exception report
    message
    description The server encountered an internal error () that prevented 
      it from fulfilling this request.

    exception
    org.apache.jasper.JasperException
       at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:254)
       at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:295)
    ...
       at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:584)
       at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
       at java.lang.Thread.run(Thread.java:534)

    root cause

    javax.servlet.ServletException
       at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:536)
       at org.apache.jsp.search_jsp._jspService(search_jsp.java:409)
       at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
    ...
       at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:584)
       at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
       at java.lang.Thread.run(Thread.java:534)

At first I thought there was no Nutch code in this pair of stack traces anywhere, making it very puzzling that Nutch's search.jsp was failing, while the example JSPs shipped with Tomcat (including the /manager/html one) and even Nutch's index.jsp all worked fine. But it turns out that search_jsp.java above, with its _jspService entry point, is generated from Nutch's search.jsp file. Unfortunately it isn't specifying what the error was, just that there was a ServletException in handlePageException.

Now, on further thought, it seems a little odd that handlePageException should be generating an exception itself. I searched for this problem on the Web and found a post by Doug Cutting at http://www.mail-archive.com/nutch-general@lists.sourceforge.net/msg00251.html, responding to someone else who was having the same problem. He said:

    What version of Nutch are you running? Have you modified
    search.jsp? In my search_jsp.java (found under Tomcat's 'work'
    directory) the call to handlePageException is at line 488, not
    460, so it looks like your search.jsp is different than mine.

    Tomcat is not providing a good stack trace here. You might look in
    tomcat's logs to see if there's anything more informative
    there. Alternately, insert something like the following at the top
    of search.jsp:

    <% try { %>

    then put something like this at the bottom:

    <% } catch (Throwable t) {
    t.printStackTrace(new PrintWriter(out));
       }
    %>

    Then run 'ant war' to rebuild the war file, move it to the webapps
    directory, removing the old version, restart tomcat and retry.

    This should give you more information about what's going on.

I made this modification and rebuilt; in my case, search.jsp was in pkgs/nutch-0.5/src/web/jsp/search.jsp, and I ran ant war in pkgs/nutch-0.5. This generated pkgs/nutch-0.5/build/nutch-0.6-dev.war, which I copied over /var/lib/tomcat4/webapps/ROOT.war. It turned out that I had to also rm -rf /var/lib/tomcat4/webapps/ROOT (which was generated automatically by Tomcat) before restarting Tomcat in order to make this take effect.
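The try/catch trick from that post can be tried outside a JSP. The following is a minimal sketch in plain Java, assuming a Java 8 or later JDK (not the 1.4.2 used here); StackTraceCapture and runAndCapture are hypothetical names, and a StringWriter stands in for the JSP's implicit "out" writer.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical stand-in for wrapping the body of search.jsp; not Nutch code.
class StackTraceCapture {
    /** Runs the body; returns the full stack trace text if it throws, or "" if it succeeds. */
    static String runAndCapture(Runnable body) {
        StringWriter out = new StringWriter(); // plays the role of the JSP's "out"
        try {
            body.run();
        } catch (Throwable t) {
            // Same idea as the snippet in the post: print the trace into the page output.
            PrintWriter writer = new PrintWriter(out);
            t.printStackTrace(writer);
            writer.flush();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Simulated failure with a nested cause, similar in shape to the traces on this page.
        String trace = runAndCapture(() -> {
            throw new IllegalStateException("outer", new SecurityException("access denied"));
        });
        System.out.print(trace);
    }
}
```

Catching Throwable rather than Exception matters here, because errors such as ExceptionInInitializerError are not subclasses of Exception and would otherwise slip past the handler.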

I granted Nutch logging permission.

Once I did this, it turned out that the immediate problem was that Nutch didn't have permission to log a message telling me what the real problem was, so when handlePageException tried to log that message, it would generate another exception. The traceback I was now generating looked like this:

    java.lang.ExceptionInInitializerError
       at net.nutch.searcher.NutchBean.<clinit>(NutchBean.java:28)
       at org.apache.jsp.search_jsp._jspService(search_jsp.java:66)
       at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
    ...
       at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
       at java.lang.Thread.run(Thread.java:534)
    Caused by: java.security.AccessControlException: access denied (java.util.logging.LoggingPermission control)
       at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
       at java.security.AccessController.checkPermission(AccessController.java:401)
       at java.lang.SecurityManager.checkPermission(SecurityManager.java:524)
    ...

I hacked around this temporarily by editing /etc/tomcat4/policy.d/04webapps.policy and inserting this line inside the "grant" block:

  permission java.util.logging.LoggingPermission "control", "";

(Thanks to "dcostakos" in http://forum.java.sun.com/thread.jsp?forum=31&thread=320254&message=1293401 for that detail.)

I think this line should probably go somewhere else where it's specific to Nutch, rather than to all web apps, but I haven't done that yet.
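The shape of that traceback (an ExceptionInInitializerError at a <clinit> frame, with the security exception as its cause) is what Java produces when a class's static initializer throws. Here is a minimal sketch that reproduces the shape without a security manager, assuming a simulated SecurityException; Fragile and InitializerFailureDemo are hypothetical names and are not related to NutchBean's actual code.

```java
// Hypothetical demo; Fragile stands in for a class whose static setup is denied.
class InitializerFailureDemo {
    static class Fragile {
        // Static field initializers run inside the class's static initializer (<clinit>).
        static final String LOGGER = createLogger();

        private static String createLogger() {
            throw new SecurityException("access denied (simulated)");
        }

        static void touch() { }
    }

    /** Uses Fragile once and returns whatever was thrown, or null on success. */
    static Throwable use() {
        try {
            Fragile.touch();
            return null;
        } catch (Throwable t) {
            return t;
        }
    }

    // Captured in order: the first use triggers initialization, the second finds it already failed.
    static final Throwable FIRST = use();
    static final Throwable SECOND = use();

    public static void main(String[] args) {
        System.out.println("first use:  " + FIRST.getClass().getName() + ", cause: " + FIRST.getCause());
        System.out.println("second use: " + SECOND.getClass().getName());
    }
}
```

The first use reports an ExceptionInInitializerError whose cause is the original exception; later uses of the same class report a NoClassDefFoundError instead, so the informative trace is generally the first one after a restart.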

Then I granted Nutch permission to access files.

Once I did this, I saw the real problem, the one that was causing handlePageException to get invoked in the first place:

    java.security.AccessControlException: access denied (java.io.FilePermission ./search-servers.txt read)
       at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
       at java.security.AccessController.checkPermission(AccessController.java:401)
       at java.lang.SecurityManager.checkPermission(SecurityManager.java:524)
       at java.lang.SecurityManager.checkRead(SecurityManager.java:863)
       at java.io.File.exists(File.java:678)
       at net.nutch.searcher.NutchBean.<init>(NutchBean.java:64)
       at net.nutch.searcher.NutchBean.<init>(NutchBean.java:58)
       at net.nutch.searcher.NutchBean.get(NutchBean.java:50)
       at org.apache.jsp.search_jsp._jspService(search_jsp.java:66)

That's very similar to the problem I solved in an earlier step by granting Nutch permission to log messages; I solved it by adding this line next to that one, to grant Nutch permission to access files:

  permission java.io.FilePermission "./*", "read,write,execute,delete";

Note that the file path in question is relative to the current working directory, which means that Nutch's behavior depends in part upon what directory you're in when you restart Tomcat. This is documented but it could be confusing.
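A minimal sketch of why the working directory matters: java.io.File resolves relative paths against the user.dir system property, which is set from the directory the JVM was started in. RelativePathDemo and resolve are hypothetical names used only for this illustration.

```java
import java.io.File;

// Hypothetical demo class; shows how a relative path is anchored to the startup directory.
class RelativePathDemo {
    /** Returns the absolute form of a path, which for relative paths is based on user.dir. */
    static String resolve(String path) {
        return new File(path).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("user.dir = " + System.getProperty("user.dir"));
        // The same relative name lands in a different place depending on where the JVM started.
        System.out.println(resolve("./search-servers.txt"));
    }
}
```

Running this from two different directories prints two different absolute paths for the same relative name, which is the behavior the note above describes for Tomcat restarts.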

Then I gave Nutch permission to read property user.dir.

Once I gave Nutch permission to access files, I ran into the next problem:

    java.security.AccessControlException: access denied (java.util.PropertyPermission user.dir read)
       at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
    ...
       at java.io.File.getCanonicalPath(File.java:513)
       at net.nutch.searcher.NutchBean.init(NutchBean.java:78)

So I added this line next to the other permission lines:

  permission java.util.PropertyPermission "user.dir", "read";

Then I had to give Nutch some index segments to chew on.

Nutch died with a NullPointerException because it had no index segments.

Nutch first reported that there weren't any index segments to read in the current directory with the following exception:

    java.lang.NullPointerException
       at net.nutch.searcher.NutchBean.init(NutchBean.java:82)
       at net.nutch.searcher.NutchBean.<init>(NutchBean.java:68)
       at net.nutch.searcher.NutchBean.<init>(NutchBean.java:58)
       at net.nutch.searcher.NutchBean.get(NutchBean.java:50)
       at org.apache.jsp.search_jsp._jspService(search_jsp.java:66)
       at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)

I restarted Tomcat in a different directory and granted Nutch a bunch more permissions.

So I cd'ed into a directory where I'd previously done a Nutch crawl and restarted Tomcat, and got a different traceback:


    java.lang.ExceptionInInitializerError
       at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
       at net.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:56)
       at net.nutch.searcher.NutchBean.init(NutchBean.java:76)
    ...
       at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
       at java.lang.Thread.run(Thread.java:534)
    Caused by: java.security.AccessControlException: access denied (java.util.PropertyPermission disableLuceneLocks read)
       at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
       at java.security.AccessController.checkPermission(AccessController.java:401)
       at java.lang.SecurityManager.checkPermission(SecurityManager.java:524)
       at java.lang.SecurityManager.checkPropertyAccess(SecurityManager.java:1276)
       at java.lang.System.getProperty(System.java:573)
       at java.lang.Boolean.getBoolean(Boolean.java:205)
       at org.apache.lucene.store.FSDirectory.<clinit>(FSDirectory.java:48)

So I added this line near the other permission lines:

  permission java.util.PropertyPermission "disableLuceneLocks", "read"; 

And then got this error:

    java.lang.ExceptionInInitializerError
       at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
       at net.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:56)
    ...
    Caused by: java.security.AccessControlException: access denied (java.util.PropertyPermission java.io.tmpdir read)
    ...
       at org.apache.lucene.store.FSDirectory.<clinit>(FSDirectory.java:55)

And added this line:

  permission java.util.PropertyPermission "java.io.tmpdir", "read";

And then got this error:

    java.lang.ExceptionInInitializerError
       at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
    ...
    Caused by: java.security.AccessControlException: access denied (java.util.PropertyPermission org.apache.lucene.lockdir read)
    ...
       at org.apache.lucene.store.FSDirectory.<clinit>(FSDirectory.java:55)

And added this line:

  permission java.util.PropertyPermission "org.apache.*", "read";
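The trailing ".*" in that grant is a wildcard. As a minimal sketch, the standard java.util.PropertyPermission class can be asked directly which property names a grant covers, with no security manager involved; PropertyWildcardDemo and covers are hypothetical names for this illustration.

```java
import java.util.PropertyPermission;

// Hypothetical demo class; checks which property reads a policy grant would cover.
class PropertyWildcardDemo {
    /** True if a "read" grant on the first name also permits reading the second. */
    static boolean covers(String grantedName, String requestedName) {
        PropertyPermission granted = new PropertyPermission(grantedName, "read");
        return granted.implies(new PropertyPermission(requestedName, "read"));
    }

    public static void main(String[] args) {
        System.out.println(covers("org.apache.*", "org.apache.lucene.lockdir"));
        System.out.println(covers("org.apache.*", "java.io.tmpdir"));
        System.out.println(covers("org.apache.*", "disableLuceneLocks"));
    }
}
```

So the wildcard line covers org.apache.lucene.lockdir and any other property under org.apache, but the java.io.tmpdir and disableLuceneLocks lines are still needed separately.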

And then got this error:

    java.security.AccessControlException: access denied (java.io.FilePermission /var/lib/tomcat4/temp read)
    ...
       at java.io.File.exists(File.java:678)
       at org.apache.lucene.store.FSDirectory$1.obtain(FSDirectory.java:301)

And then added this line, which I'm pretty sure should have some variable (perhaps ${catalina.home}?) in it somewhere:

  permission java.io.FilePermission "/var/lib/tomcat4/temp", "read,write,execute,delete";

And then got this error:

    java.security.AccessControlException: access denied (java.io.FilePermission /var/lib/tomcat4/temp/lucene-f537d0632d86524af6b916bc13536be9-commit.lock write)

And then added this line:

  permission java.io.FilePermission "/var/lib/tomcat4/temp/*", "read,write,execute,delete";

And then got this error:

    java.security.AccessControlException: access denied (java.io.FilePermission /home/kragen/nutch/crawl.test/index/segments read)

So at this point I lost patience and replaced all the FilePermission lines (except the ./* one) with this one:

    permission java.io.FilePermission "/-", "read,write,execute,delete";
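The step-by-step failures above follow from how FilePermission path patterns work: a plain directory path covers only the directory itself, "dir/*" covers the files directly inside it, and "dir/-" covers everything beneath it recursively. Here is a minimal sketch that checks this with java.io.FilePermission, assuming a Unix-style filesystem; FilePermissionDemo and covers are hypothetical names, and the lock file name is shortened for illustration.

```java
import java.io.FilePermission;

// Hypothetical demo class; checks which file accesses a policy grant would cover.
class FilePermissionDemo {
    /** True if a full grant on grantedPath permits the given action on requestedPath. */
    static boolean covers(String grantedPath, String requestedPath, String action) {
        FilePermission granted = new FilePermission(grantedPath, "read,write,execute,delete");
        return granted.implies(new FilePermission(requestedPath, action));
    }

    public static void main(String[] args) {
        String temp = "/var/lib/tomcat4/temp";
        String lock = temp + "/lucene-example-commit.lock"; // illustrative name only
        String segments = "/home/kragen/nutch/crawl.test/index/segments";

        System.out.println(covers(temp, lock, "write"));         // directory grant alone
        System.out.println(covers(temp + "/*", lock, "write"));  // files directly inside
        System.out.println(covers(temp + "/*", segments, "read"));
        System.out.println(covers("/-", segments, "read"));      // everything, recursively
    }
}
```

A narrower alternative to "/-" would presumably be a recursive grant on just the crawl directory and the temp directory, at the cost of having to update the policy whenever the index moves.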

And then got this error:

    java.lang.ExceptionInInitializerError
       at net.nutch.analysis.NutchAnalysis.compound(NutchAnalysis.java:250)
       at net.nutch.analysis.NutchAnalysis.parse(NutchAnalysis.java:115)
       at net.nutch.analysis.NutchAnalysis.parseQuery(NutchAnalysis.java:39)
       at net.nutch.searcher.Query.parse(Query.java:395)
       at org.apache.jsp.search_jsp._jspService(search_jsp.java:93)
    ...
    Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission createClassLoader)
       at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
       at java.security.AccessController.checkPermission(AccessController.java:401)
       at java.lang.SecurityManager.checkPermission(SecurityManager.java:524)
       at java.lang.SecurityManager.checkCreateClassLoader(SecurityManager.java:586)
       at java.lang.ClassLoader.<init>(ClassLoader.java:186)
       at java.security.SecureClassLoader.<init>(SecureClassLoader.java:53)
       at java.net.URLClassLoader.<init>(URLClassLoader.java:81)
       at net.nutch.plugin.PluginClassLoader.<init>(PluginClassLoader.java:28)
       at net.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:256)
       at net.nutch.plugin.Extension.getExtensionInstance(Extension.java:127)
       at net.nutch.searcher.QueryFilters.<clinit>(QueryFilters.java:45)

So I added this line:

  permission java.lang.RuntimePermission "createClassLoader", "";

And then got this error:

    java.lang.NoClassDefFoundError: org/apache/coyote/http11/Http11Processor$1
       at org.apache.coyote.http11.Http11Processor.prepareResponse(Http11Processor.java:1513)
       at org.apache.coyote.http11.Http11Processor.action(Http11Processor.java:921)
       at org.apache.coyote.Response.action(Response.java:224)
       at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:605)
       at org.apache.coyote.Response.doWrite(Response.java:586)
       at org.apache.coyote.tomcat4.OutputBuffer.realWriteBytes(OutputBuffer.java:405)
       at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:436)
       at org.apache.coyote.tomcat4.OutputBuffer.doFlush(OutputBuffer.java:354)
       at org.apache.coyote.tomcat4.OutputBuffer.flush(OutputBuffer.java:336)
       at org.apache.coyote.tomcat4.CoyoteWriter.flush(CoyoteWriter.java:117)
       at org.apache.jasper.runtime.JspWriterImpl.flush(JspWriterImpl.java:209)
       at org.apache.jsp.search_jsp._jspService(search_jsp.java:108)

So I restarted Tomcat again, but it didn't help. /usr/share/tomcat4/server/lib/tomcat-http11.jar contains both org/apache/coyote/http11/Http11Processor.class (which is presumably successfully loaded, since it's in the traceback) and org/apache/coyote/http11/Http11Processor$1.class (which it's complaining it can't find.)

But it turned out that if I hit the same URL again, it works fine. So far I'm just living with the first search after each Tomcat restart returning a NoClassDefFoundError, but I hope I can fix this soon.
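One detail that may help whoever chases this: a "$1" suffix is the name the Java compiler typically gives to the first anonymous class declared inside another class, so Http11Processor$1 is presumably such a class. That is an inference from the naming convention only, not something confirmed against Tomcat's source. A minimal sketch of the convention, with the hypothetical class AnonymousNameDemo:

```java
// Hypothetical demo class; shows the compiler-generated name of an anonymous class.
class AnonymousNameDemo {
    /** Returns the runtime class name of an anonymous Runnable declared in this class. */
    static String anonymousClassName() {
        Runnable task = new Runnable() {
            public void run() { }
        };
        return task.getClass().getName();
    }

    public static void main(String[] args) {
        // The anonymous class is compiled to its own file, AnonymousNameDemo$1.class.
        System.out.println(anonymousClassName());
    }
}
```

Because such a class lives in a separate .class file, it is loaded separately from its enclosing class, which is consistent with Http11Processor itself loading while Http11Processor$1 fails.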

Ultimately, I ended up with a few packages installed and a few modifications to stock files.

Here are the versions of the relevant Debian packages:

ii  java-common                        0.22                               Base of all Java packages
ii  java-compiler-dummy                1.0                                Dummy package providing java-compiler
ii  java-virtual-machine-dummy         1.0                                Dummy package providing java-virtual-machine
ii  java1-runtime-dummy                1.0                                Dummy package providing java1-runtime
ii  java2-compiler-dummy               1.0                                Dummy package providing java2-compiler
ii  java2-runtime-dummy                1.0                                Dummy package providing java2-runtime
ii  jikes                              1.21.1-2                           Fast Java compiler adhering to language and VM specifications
ii  junit                              3.8.1.1-2                          Automated testing framework for Java
ii  jython                             2.1.0-18                           Python seamlessly integrated with Java
ii  libant1.6-java                     1.6.2-1                            Java based build tool like make -- library
ii  libapache-mod-jk                   1.2.5-2                            Apache 1.3 connector for the Tomcat Java servlet engine
ii  libbcel-java                       5.1-1                              Analyze, create, and manipulate (binary) Java class files
ii  libcommons-beanutils-java          1.6.1-4                            utility for manipulating JavaBeans
ii  libcommons-collections-java        2.1.1-3                            A set of abstract data type interfaces and implementations
ii  libcommons-dbcp-java               1.2.1-1                            Database Connection Pooling Services
ii  libcommons-digester-java           1.5.0.1-3                          Rule based XML Java object mapping tool
ii  libcommons-fileupload-java         1.0-8                              File upload capability to your servlets and web applications
ii  libcommons-lang-java               2.0-6                              Extension of the java.lang package
ii  libcommons-logging-java            1.0.4-2                            The commmon wrapper interface for several logging API
ii  libcommons-modeler-java            1.1-1                              A convenience library to use Java Management Extensions (JMX)
ii  libcommons-pool-java               1.2-2                              A set of objects that implement the pooling pattern for java objects
ii  libcommons-validator-java          1.0.2-7                            ease and speed development and maintenance of validation rules
ii  libgnujaxp-java                    0.0.cvs20040416-6                  free implementation of jaxp api
ii  libjaxp1.2-java                    1.2.01-1                           Java XML parser and transformer APIs (DOM, SAX, JAXP, TrAX)
ii  liblog4j1.2-java                   1.2.8-7                            Logging library for java
ii  libmx4j-java                       2.0.1-2                            An open source implementation of the JMX(TM) technology
ii  liboro-java                        2.0.8-1.1                          Regular expression library for Java
ii  libreadline-java                   0.8.0.1-2                          GNU readline and BSD editline wrappers for Java
ii  libregexp-java                     1.3-1                              regular expression library for Java
ii  libservlet2.3-java                 4.0-5                              Servlet 2.3 and JSP 1.2 Java classes and documentation
ii  libstruts1.1-java                  1.1-2                              Java Framework for MVC web applications
ii  libtomcat4-java                    4.1.30-6                           Java Servlet engine -- core libraries
ii  libxerces2-java                    2.6.2-1                            Validating XML parser for Java with DOM level 3 support
ii  tomcat4                            4.1.30-6                           Java Servlet 2.3 engine with JSP 1.2 support
ii  tomcat4-admin                      4.1.30-6                           Java Servlet engine -- admin web interfaces
ii  tomcat4-webapps                    4.1.30-6                           Java Servlet engine -- documentation and example web applications

Here's the chunk I ended up adding to the top of /etc/tomcat4/policy.d/04webapps.policy:

  // XXX hack for nutch
  permission java.util.logging.LoggingPermission "control", "";
  permission java.io.FilePermission "./*", "read,write,execute,delete"; 
  permission java.util.PropertyPermission "user.dir", "read";
  permission java.util.PropertyPermission "disableLuceneLocks", "read";
  permission java.util.PropertyPermission "java.io.tmpdir", "read";
  permission java.util.PropertyPermission "org.apache.*", "read";
  permission java.io.FilePermission "/-", "read,write,execute,delete"; 
  permission java.lang.RuntimePermission "createClassLoader", "";

That's just inside the grant {...} block.

Revision r1.1 - 23 Nov 2004 - 01:38 GMT - TomBloomfield Copyright © 1999-2003 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.