This page started out as the story of KragenSitaker getting Nutch 0.5 working on Debian Sarge (i.e. the Debian release known as "sarge"). I don't know much about Java or JSP or Servlets
or Tomcat or Nutch, so I probably did a bunch of dumb stuff, but I figure that if I document
the dumb stuff I did, the problems I had, and how I solved them, it might be helpful to
whoever does the same dumb stuff next week. If I'm lucky, maybe someone who knows what they're
doing can edit this page to explain better ways of solving these problems.
I'm using Tomcat 4.1.30-6 (Debian's sixth revision of its 4.1.30 package).
I installed JDK 1.4.2_05-b04 from Sun's site; normally
installing proprietary software is not my idea of fun but it seemed
like I was probably going to run into enough trouble getting stuff
working as it was, without fighting with minority platforms. (I'd
already given up on Yellow Dog Linux on a G5 Cube for just this
reason.)
Once I'd run the self-extracting shell script, I used the Debian
"equivs" package, with the spec files in the Debian java-common
package, to tell Debian I had installed Java, as per the instructions
in the Debian Java FAQ,
http://www.debian.org/doc/manuals/debian-java-faq/ch11.html.
I'd previously done apt-get install tomcat4, but at this point
I removed kaffe, with apt-get remove kaffe, and also kaffe-common,
kaffe-pthreads, and libffi2, all of which had gotten installed with
Tomcat/Catalina in order to support Kaffe. Since I'd told Debian that
I'd installed Sun's Java with the equivs stuff, this didn't result in
uninstalling tomcat4.
At this point, for some reason, Tomcat's startup still tried to use Kaffe
to run Tomcat, even though Kaffe was no longer installed. I'd installed the
JDK in /usr/local/lib/j2sdk1.4.2_05 and symlinked that to
/usr/local/lib/jdk, so I added the line JAVA_HOME=/usr/local/lib/jdk
to /etc/default/tomcat4, and then Tomcat was able to run again with
/etc/init.d/tomcat4 start. As per Debian defaults, it was running
its HTTP server on port 8180.
The machine was behind a firewall that didn't let through port
8180 by default, so I used ssh -L 8180:themachine:8180 themachine to allow me to
point my web browser at http://localhost:8180/.
I enabled Tomcat's web administration interface by editing /var/lib/tomcat4/conf/tomcat-users.xml to
say
<user username="tomcat" password="censored" roles="tomcat,manager"/>
instead of
<user username="tomcat" password="censored" roles="tomcat"/>, and
added a <role rolename="manager"/> element. Then, after a Tomcat restart, I
could go to http://localhost:8180/manager/html to see the list of
deployed web applications.
Then I told Tomcat to install
file:///home/kragen/pkgs/nutch-0.5/nutch-0.5.war in its cute little
software installation form. This made Nutch available at
http://localhost:8180/nutch-0.5, which didn't work so well, because
Nutch's various index.html pages link to "/search.jsp", not
"search.jsp", so searches went to the server root instead of to the
Nutch web app. But the HTML and images were OK.
Then I did it by hand: I renamed /var/lib/tomcat4/webapps/ROOT to
/var/lib/tomcat4/webapps/originally-ROOT, and restarted Tomcat again.
This made http://localhost:8180/ no longer do anything useful,
although things like http://localhost:8180/manager/html still worked
fine.
Then I copied nutch-0.5.war to /var/lib/tomcat4/webapps/ROOT.war,
and restarted Tomcat again. Now the Nutch HTML page loaded at
http://localhost:8180/ but the search still didn't work, because of
some random exception:
HTTP Status 500 -
type Exception report
message
description The server encountered an internal error () that prevented
it from fulfilling this request.
exception
org.apache.jasper.JasperException
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:254)
at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:295)
...
at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:584)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
at java.lang.Thread.run(Thread.java:534)
root cause
javax.servlet.ServletException
at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:536)
at org.apache.jsp.search_jsp._jspService(search_jsp.java:409)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
...
at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:584)
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
at java.lang.Thread.run(Thread.java:534)
At first I thought there was no Nutch code in this pair of stack
traces anywhere, making it very puzzling that Nutch's search.jsp was
failing, while the example JSPs shipped with Tomcat (including the
/manager/html one) and even Nutch's index.jsp all worked fine. But it
turns out that search_jsp.java above, with its _jspService entry
point, is generated from Nutch's search.jsp file. Unfortunately it
isn't specifying what the error was, just that there was a
ServletException in handlePageException.
Now, on further thought, it seems a little odd that
handlePageException should be generating an exception itself. I
searched for this problem on the Web, and found a post by Doug Cutting
at
http://www.mail-archive.com/nutch-general@lists.sourceforge.net/msg00251.html,
where he responds to someone else who's having the same problem, and
he said:
What version of Nutch are you running? Have you modified
search.jsp? In my search_jsp.java (found under Tomcat's 'work'
directory) the call to handlePageException is at line 488, not
460, so it looks like your search.jsp is different than mine.
Tomcat is not providing a good stack trace here. You might look in
tomcat's logs to see if there's anything more informative
there. Alternately, insert something like the following at the top
of search.jsp:
<% try { %>
then put something like this at the bottom:
<% } catch (Throwable t) {
t.printStackTrace(new PrintWriter(out));
}
%>
Then run 'ant war' to rebuild the war file, move it to the webapps
directory, removing the old version, restart tomcat and retry.
This should give you more information about what's going on.
I made this modification and rebuilt; in my case, search.jsp was in pkgs/nutch-0.5/src/web/jsp/search.jsp,
and I ran ant war in pkgs/nutch-0.5. This generated
pkgs/nutch-0.5/build/nutch-0.6-dev.war, which I copied over
/var/lib/tomcat4/webapps/ROOT.war. It turned out that I had to also
rm -rf /var/lib/tomcat4/webapps/ROOT (which was generated
automatically by Tomcat) before restarting Tomcat in order to make
this take effect.
Once I did this, it turned out that the immediate problem was that
Nutch didn't have permission to log a message telling me what the real
problem was, so when handlePageException tried to log that message, it
would generate another exception. The traceback I was now generating
looked like this:
java.lang.ExceptionInInitializerError
at net.nutch.searcher.NutchBean.<clinit>(NutchBean.java:28)
at org.apache.jsp.search_jsp._jspService(search_jsp.java:66)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
...
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
at java.lang.Thread.run(Thread.java:534)
Caused by: java.security.AccessControlException: access denied (java.util.logging.LoggingPermission control)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
at java.security.AccessController.checkPermission(AccessController.java:401)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:524)
...
I hacked around this temporarily by editing
/etc/tomcat4/policy.d/04webapps.policy to insert this line inside the
"grant" block:
permission java.util.logging.LoggingPermission "control", "";
(Thanks to "dcostakos" in
http://forum.java.sun.com/thread.jsp?forum=31&thread=320254&message=1293401
for that detail.)
I think this line should probably go somewhere else where it's
specific to Nutch, rather than to all web apps, but I haven't done
that yet.
Once I did this, I saw the real problem, the one that was causing
handlePageException to get invoked in the first place:
java.security.AccessControlException: access denied (java.io.FilePermission ./search-servers.txt read)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
at java.security.AccessController.checkPermission(AccessController.java:401)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:524)
at java.lang.SecurityManager.checkRead(SecurityManager.java:863)
at java.io.File.exists(File.java:678)
at net.nutch.searcher.NutchBean.<init>(NutchBean.java:64)
at net.nutch.searcher.NutchBean.<init>(NutchBean.java:58)
at net.nutch.searcher.NutchBean.get(NutchBean.java:50)
at org.apache.jsp.search_jsp._jspService(search_jsp.java:66)
That's very similar to the problem I solved in an earlier step by granting
Nutch permission to log messages; I solved it by adding this line next
to that one, to grant Nutch permission to access files:
permission java.io.FilePermission "./*", "read,write,execute,delete";
Note that the file path in question is relative to the current working
directory, which means that Nutch's behavior depends in part upon what
directory you're in when you restart Tomcat. This is documented but
it could be confusing.
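To make the working-directory dependence concrete, here is a minimal Java sketch. It is not Nutch code: the class name WorkingDirDemo and the method resolve are made up for illustration. It relies only on the standard behavior of java.io.File, which resolves a relative path against the user.dir system property, that is, the directory the JVM was started from.

```java
import java.io.File;

public class WorkingDirDemo {
    // Hypothetical helper: shows where a relative path really points.
    // java.io.File resolves relative paths against the "user.dir" system
    // property, which is the directory the JVM was started from.
    static String resolve(String relativePath) {
        return new File(relativePath).getAbsolutePath();
    }

    public static void main(String[] args) {
        System.out.println("user.dir = " + System.getProperty("user.dir"));
        // The same relative name that appears in the stack trace above.
        System.out.println("resolved = " + resolve("./search-servers.txt"));
    }
}
```

Running it from two different directories prints two different resolved paths, which is why starting Tomcat from a different directory changes which files Nutch ends up looking for.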
Once I gave Nutch permission to access files, I ran into the next
problem:
java.security.AccessControlException: access denied (java.util.PropertyPermission user.dir read)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
...
at java.io.File.getCanonicalPath(File.java:513)
at net.nutch.searcher.NutchBean.init(NutchBean.java:78)
So I added this line next to the other permission lines:
permission java.util.PropertyPermission "user.dir", "read";
Next, Nutch reported that there weren't any index segments to read
in the current directory, in the form of the following exception:
java.lang.NullPointerException
at net.nutch.searcher.NutchBean.init(NutchBean.java:82)
at net.nutch.searcher.NutchBean.<init>(NutchBean.java:68)
at net.nutch.searcher.NutchBean.<init>(NutchBean.java:58)
at net.nutch.searcher.NutchBean.get(NutchBean.java:50)
at org.apache.jsp.search_jsp._jspService(search_jsp.java:66)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
So I cd'ed into a directory where I'd previously done a Nutch
crawl and restarted Tomcat, and got a different traceback:
java.lang.ExceptionInInitializerError
at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
at net.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:56)
at net.nutch.searcher.NutchBean.init(NutchBean.java:76)
...
at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:683)
at java.lang.Thread.run(Thread.java:534)
Caused by: java.security.AccessControlException: access denied (java.util.PropertyPermission disableLuceneLocks read)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
at java.security.AccessController.checkPermission(AccessController.java:401)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:524)
at java.lang.SecurityManager.checkPropertyAccess(SecurityManager.java:1276)
at java.lang.System.getProperty(System.java:573)
at java.lang.Boolean.getBoolean(Boolean.java:205)
at org.apache.lucene.store.FSDirectory.<clinit>(FSDirectory.java:48)
So I added this line near the other permission lines:
permission java.util.PropertyPermission "disableLuceneLocks", "read";
And then got this error:
java.lang.ExceptionInInitializerError
at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
at net.nutch.searcher.IndexSearcher.<init>(IndexSearcher.java:56)
...
Caused by: java.security.AccessControlException: access denied (java.util.PropertyPermission java.io.tmpdir read)
...
at org.apache.lucene.store.FSDirectory.<clinit>(FSDirectory.java:55)
And added this line:
permission java.util.PropertyPermission "java.io.tmpdir", "read";
And then got this error:
java.lang.ExceptionInInitializerError
at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
...
Caused by: java.security.AccessControlException: access denied (java.util.PropertyPermission org.apache.lucene.lockdir read)
...
at org.apache.lucene.store.FSDirectory.<clinit>(FSDirectory.java:55)
And added this line:
permission java.util.PropertyPermission "org.apache.*", "read";
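The wildcard in that last line matters: a PropertyPermission name may end in ".*", which covers every property under that prefix. The small sketch below uses only java.util.PropertyPermission to check which property reads a given grant would cover, without needing a security manager installed. The class PropertyGrantDemo and its covers method are hypothetical names chosen for illustration.

```java
import java.util.PropertyPermission;

public class PropertyGrantDemo {
    // Hypothetical helper: would a "read" grant on grantedName cover
    // reading the system property requestedName?
    static boolean covers(String grantedName, String requestedName) {
        PropertyPermission granted = new PropertyPermission(grantedName, "read");
        PropertyPermission requested = new PropertyPermission(requestedName, "read");
        return granted.implies(requested);
    }

    public static void main(String[] args) {
        // The wildcard grant covers the Lucene lock directory property...
        System.out.println(covers("org.apache.*", "org.apache.lucene.lockdir")); // true
        // ...but not properties outside the org.apache prefix.
        System.out.println(covers("org.apache.*", "java.io.tmpdir"));            // false
        // An exact name covers only itself.
        System.out.println(covers("user.dir", "user.dir"));                      // true
    }
}
```

So the single "org.apache.*" grant covers org.apache.lucene.lockdir along with any other org.apache property, which is broader than strictly needed; naming the property exactly would be the tighter choice.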
And then got this error:
java.security.AccessControlException: access denied (java.io.FilePermission /var/lib/tomcat4/temp read)
...
at java.io.File.exists(File.java:678)
at org.apache.lucene.store.FSDirectory$1.obtain(FSDirectory.java:301)
And then added this line, which I'm pretty sure should have some
variable (perhaps ${catalina.home}?) in it somewhere:
permission java.io.FilePermission "/var/lib/tomcat4/temp", "read,write,execute,delete";
And then got this error:
java.security.AccessControlException: access denied (java.io.FilePermission /var/lib/tomcat4/temp/lucene-f537d0632d86524af6b916bc13536be9-commit.lock write)
And then added this line:
permission java.io.FilePermission "/var/lib/tomcat4/temp/*", "read,write,execute,delete";
And then got this error:
java.security.AccessControlException: access denied (java.io.FilePermission /home/kragen/nutch/crawl.test/index/segments read)
So at this point I lost patience and replaced all the FilePermission
lines (except the ./* one) with this one:
permission java.io.FilePermission "/-", "read,write,execute,delete";
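The run of FilePermission errors above follows from how path patterns are matched: a bare directory path covers only the directory itself, a trailing "/*" covers the files directly inside it, and a trailing "/-" covers everything beneath it recursively. The sketch below checks this with java.io.FilePermission.implies. The names FilePermissionDemo and covers, and the shortened lock-file name, are illustrative only and do not come from Nutch or Tomcat.

```java
import java.io.FilePermission;

public class FilePermissionDemo {
    // Hypothetical helper: would granting all four actions on grantedPath
    // cover the given action on requestedPath?
    static boolean covers(String grantedPath, String requestedPath, String action) {
        FilePermission granted =
            new FilePermission(grantedPath, "read,write,execute,delete");
        FilePermission requested = new FilePermission(requestedPath, action);
        return granted.implies(requested);
    }

    public static void main(String[] args) {
        // Made-up lock file name standing in for the real hashed one.
        String lock = "/var/lib/tomcat4/temp/lucene-example-commit.lock";
        String segments = "/home/kragen/nutch/crawl.test/index/segments";

        // A bare directory grant does not cover files inside it.
        System.out.println(covers("/var/lib/tomcat4/temp", lock, "write"));      // false
        // "/*" covers files directly inside the directory.
        System.out.println(covers("/var/lib/tomcat4/temp/*", lock, "write"));    // true
        // But it says nothing about files elsewhere.
        System.out.println(covers("/var/lib/tomcat4/temp/*", segments, "read")); // false
        // "/-" covers every file on the filesystem, recursively.
        System.out.println(covers("/-", segments, "read"));                      // true
    }
}
```

Granting "/-" therefore lets the affected code touch any file the Tomcat user can; a tighter policy would probably name the crawl directory with a "/-" suffix instead.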
And then got this error:
java.lang.ExceptionInInitializerError
at net.nutch.analysis.NutchAnalysis.compound(NutchAnalysis.java:250)
at net.nutch.analysis.NutchAnalysis.parse(NutchAnalysis.java:115)
at net.nutch.analysis.NutchAnalysis.parseQuery(NutchAnalysis.java:39)
at net.nutch.searcher.Query.parse(Query.java:395)
at org.apache.jsp.search_jsp._jspService(search_jsp.java:93)
...
Caused by: java.security.AccessControlException: access denied (java.lang.RuntimePermission createClassLoader)
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:269)
at java.security.AccessController.checkPermission(AccessController.java:401)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:524)
at java.lang.SecurityManager.checkCreateClassLoader(SecurityManager.java:586)
at java.lang.ClassLoader.<init>(ClassLoader.java:186)
at java.security.SecureClassLoader.<init>(SecureClassLoader.java:53)
at java.net.URLClassLoader.<init>(URLClassLoader.java:81)
at net.nutch.plugin.PluginClassLoader.<init>(PluginClassLoader.java:28)
at net.nutch.plugin.PluginDescriptor.getClassLoader(PluginDescriptor.java:256)
at net.nutch.plugin.Extension.getExtensionInstance(Extension.java:127)
at net.nutch.searcher.QueryFilters.<clinit>(QueryFilters.java:45)
So I added this line:
permission java.lang.RuntimePermission "createClassLoader", "";
And then got this error:
java.lang.NoClassDefFoundError: org/apache/coyote/http11/Http11Processor$1
at org.apache.coyote.http11.Http11Processor.prepareResponse(Http11Processor.java:1513)
at org.apache.coyote.http11.Http11Processor.action(Http11Processor.java:921)
at org.apache.coyote.Response.action(Response.java:224)
at org.apache.coyote.http11.InternalOutputBuffer.doWrite(InternalOutputBuffer.java:605)
at org.apache.coyote.Response.doWrite(Response.java:586)
at org.apache.coyote.tomcat4.OutputBuffer.realWriteBytes(OutputBuffer.java:405)
at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:436)
at org.apache.coyote.tomcat4.OutputBuffer.doFlush(OutputBuffer.java:354)
at org.apache.coyote.tomcat4.OutputBuffer.flush(OutputBuffer.java:336)
at org.apache.coyote.tomcat4.CoyoteWriter.flush(CoyoteWriter.java:117)
at org.apache.jasper.runtime.JspWriterImpl.flush(JspWriterImpl.java:209)
at org.apache.jsp.search_jsp._jspService(search_jsp.java:108)
So I restarted Tomcat again, but it didn't help.
/usr/share/tomcat4/server/lib/tomcat-http11.jar contains both
org/apache/coyote/http11/Http11Processor.class (which is presumably
successfully loaded, since it's in the traceback) and
org/apache/coyote/http11/Http11Processor$1.class (which it's
complaining it can't find).
But it turned out that if I hit the same URL again, it worked fine. So far
I'm just living with the first search after each Tomcat restart returning a
NoClassDefFoundError, but I hope I can fix this soon.
Here are the versions of the relevant Debian packages:
ii java-common 0.22 Base of all Java packages
ii java-compiler-dummy 1.0 Dummy package providing java-compiler
ii java-virtual-machine-dummy 1.0 Dummy package providing java-virtual-machine
ii java1-runtime-dummy 1.0 Dummy package providing java1-runtime
ii java2-compiler-dummy 1.0 Dummy package providing java2-compiler
ii java2-runtime-dummy 1.0 Dummy package providing java2-runtime
ii jikes 1.21.1-2 Fast Java compiler adhering to language and VM specifications
ii junit 3.8.1.1-2 Automated testing framework for Java
ii jython 2.1.0-18 Python seamlessly integrated with Java
ii libreadline-java 0.8.0.1-2 GNU readline and BSD editline wrappers for Java
ii libservlet2.3-java 4.0-5 Servlet 2.3 and JSP 1.2 Java classes and documentation
ii libant1.6-java 1.6.2-1 Java based build tool like make -- library
ii libapache-mod-jk 1.2.5-2 Apache 1.3 connector for the Tomcat Java servlet engine
ii libbcel-java 5.1-1 Analyze, create, and manipulate (binary) Java class files
ii libcommons-beanutils-java 1.6.1-4 utility for manipulating JavaBeans
ii libcommons-collections-java 2.1.1-3 A set of abstract data type interfaces and implementations
ii libcommons-dbcp-java 1.2.1-1 Database Connection Pooling Services
ii libcommons-digester-java 1.5.0.1-3 Rule based XML Java object mapping tool
ii libcommons-fileupload-java 1.0-8 File upload capability to your servlets and web applications
ii libcommons-lang-java 2.0-6 Extension of the java.lang package
ii libcommons-logging-java 1.0.4-2 The commmon wrapper interface for several logging API
ii libcommons-modeler-java 1.1-1 A convenience library to use Java Management Extensions (JMX)
ii libcommons-pool-java 1.2-2 A set of objects that implement the pooling pattern for java objects
ii libcommons-validator-java 1.0.2-7 ease and speed development and maintenance of validation rules
ii libgnujaxp-java 0.0.cvs20040416-6 free implementation of jaxp api
ii libjaxp1.2-java 1.2.01-1 Java XML parser and transformer APIs (DOM, SAX, JAXP, TrAX)
ii liblog4j1.2-java 1.2.8-7 Logging library for java
ii libmx4j-java 2.0.1-2 An open source implementation of the JMX(TM) technology
ii liboro-java 2.0.8-1.1 Regular expression library for Java
ii libregexp-java 1.3-1 regular expression library for Java
ii libstruts1.1-java 1.1-2 Java Framework for MVC web applications
ii libtomcat4-java 4.1.30-6 Java Servlet engine -- core libraries
ii libxerces2-java 2.6.2-1 Validating XML parser for Java with DOM level 3 support
ii tomcat4 4.1.30-6 Java Servlet 2.3 engine with JSP 1.2 support
ii tomcat4-admin 4.1.30-6 Java Servlet engine -- admin web interfaces
ii tomcat4-webapps 4.1.30-6 Java Servlet engine -- documentation and example web applications
Here's the chunk I ended up adding to the top of /etc/tomcat4/policy.d/04webapps.policy:
// XXX hack for nutch
permission java.util.logging.LoggingPermission "control", "";
permission java.io.FilePermission "./*", "read,write,execute,delete";
permission java.util.PropertyPermission "user.dir", "read";
permission java.util.PropertyPermission "disableLuceneLocks", "read";
permission java.util.PropertyPermission "java.io.tmpdir", "read";
permission java.util.PropertyPermission "org.apache.*", "read";
permission java.io.FilePermission "/-", "read,write,execute,delete";
permission java.lang.RuntimePermission "createClassLoader", "";
That's just inside the grant {...} block.
Revision r1.1 - 23 Nov 2004 - 01:38 GMT - TomBloomfield
Copyright © 1999-2003 by the contributing authors.
All material on this collaboration platform is the property of the contributing authors.