Developer Information

How to Contribute

Contributions are merit-based. Other developers must see contributions in order to evaluate them, suggest improvements, and integrate them into the source base.

Contributors should follow these steps:

  1. Check the Nutch developers mailing list to see whether anyone is already working on what interests you. If so, you may want to contact that person to find out how far the work has progressed.
  2. If it looks like you are not duplicating effort, send a short message to the list saying that you are about to start the work. Future contributors will then see your message when they perform step 1.
  3. Once you have made some progress, submit your changes as diffs to the Nutch developers mailing list, or attach them to a bug report. We can all then examine the work for quality, relevance, and so on. Details such as formatting, documentation, and coding conventions are important.
  4. We hope everyone will try to provide good feedback on your work, but everyone's time is very limited. Make it easy for people to examine your work by making it:
    • high-quality;
    • easy to read;
    • easy to integrate; and
    • relevant to Nutch's stated goals.
  5. If everything seems right, we'll accept it into the source base and it will become part of Nutch.
  6. Collect glory and good karma. Go to step 1.

Please also read the developer policies page.

Needed contributions

Nutch needs contributions in the following areas (among others). If you think you can help with these, or with something else, please send a message to dev@nutch.org.

Internationalization

Nutch intends to be international. At present, we believe that our indexing works well for Western languages, but we need:

  • Translations of the basic Nutch pages (at least search.xml, help.xml and about.xml) into other languages.
  • Testing and development work on improved Asian-language support.

More information about internationalizing Nutch can be found on the i18n page.

Search Parameter Tuning

Nutch has not yet been tuned for search quality. The ranking formula has ten or twenty parameters that can be adjusted. We have started developing software to tune them automatically, but the current code contains only guessed values. With a little tuning, we should be able to get results that are competitive with those of major search engines.

Alternate Content Types

Nutch currently supports only HTML content retrieved over HTTP. It would be great to add support for other content types, such as PDF files and images (for image search).

Except where otherwise noted,
this site is licensed under a Creative Commons License.