Annotated List of Hadoop Tutorials

Official Tutorials

The official Hadoop tutorial by Apache. It uses Hadoop classes with Java mappers and reducers to calculate word counts from several example books. The tutorial is thorough and informative, and it introduces all the components and ideas, which makes it a good starting point for first-time Hadoop users.

The official Hadoop tutorial by Yahoo. It is essentially the same tutorial that Apache provides, but with significantly more explanation and visual diagrams. It uses Hadoop classes with Java mappers and reducers to calculate word counts from several example books. This is a good document to read through after the Apache tutorial. I think it is the most thorough and informative of the official tutorials.

Cloudera’s training videos for using Hadoop. The series contains many videos with invaluable content, but it requires registration and the use of Cloudera’s training virtual machine, which runs on VMware. It is good for learning the concepts but is not the best do-it-yourself tutorial.

Good References

This supplementary tutorial details getting Hadoop up and running on an Ubuntu Linux system. It is thorough and informative, and it is useful for Linux users in general. The example is essentially the same as the one in the Apache tutorial: it uses Hadoop classes with Java mappers and reducers to calculate word counts from several example books.

This tutorial is a step-by-step instruction manual for converting the Apache tutorial into a Python application using Hadoop streaming. It uses Hadoop streaming with Python mappers and reducers to calculate word counts from several example books.
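To give a rough idea of what a Python word count for Hadoop streaming looks like, here is a minimal sketch. It is not taken from the tutorial itself: the function names (map_words, reduce_counts, run_local, format_line, parse_line) are made up for this illustration, and it assumes the streaming default of one tab-separated key and value per line.

```python
from itertools import groupby


def map_words(lines):
    """Mapper step: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reduce_counts(pairs):
    """Reducer step: sum the counts for each word.

    The pairs must already be sorted by word, which is what Hadoop's
    shuffle/sort phase guarantees for the reducer's input.
    """
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def format_line(word, count):
    """Serialize a pair as a tab-separated line (assumed streaming default)."""
    return f"{word}\t{count}"


def parse_line(line):
    """Parse a tab-separated line back into a (word, count) pair."""
    word, count = line.rstrip("\n").split("\t", 1)
    return word, int(count)


def run_local(lines):
    """Imitate map -> sort -> reduce on one machine, without a cluster."""
    return dict(reduce_counts(sorted(map_words(lines))))
```

In an actual streaming job the mapper and the reducer would be two small scripts that read sys.stdin and print lines in this tab-separated form. The run_local helper only imitates the sort that Hadoop performs between the two phases, which is handy for checking the logic before submitting a job.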

This tutorial is a very good resource for administering a Hadoop cluster, including basic administration commands and explanations. It provides instructions for checking the status and health of a Hadoop cluster, which I haven’t seen covered in many other places. Its examples are essentially the same as those provided by Apache.

Unique Examples

This tutorial uses Hadoop streaming with Python mappers and reducers to pull the title tags from an arbitrarily long, newline-delimited list of URLs. It is a good tutorial with a unique program that steps away from the traditional word-counting examples. Great for developers interested in creating spiders or web programs using Hadoop and Python.
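For a sense of what such a mapper might involve, here is a minimal sketch using only the Python standard library. It is an assumption-laden illustration rather than the tutorial's own code: TitleParser, extract_title, fetch_html, and map_urls are hypothetical names, and the 10-second timeout and UTF-8 decoding are arbitrary choices.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element of a page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; ignore any title after the first.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html):
    """Return the page title with whitespace collapsed, or '' if none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


def fetch_html(url, timeout=10):
    """Download a page as text (hypothetical helper; timeout is arbitrary)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def map_urls(lines, fetch=fetch_html):
    """Mapper step: emit a (url, title) pair for each non-empty input line."""
    for line in lines:
        url = line.strip()
        if not url:
            continue
        try:
            title = extract_title(fetch(url))
        except Exception:
            # One unreachable URL should not fail the whole map task.
            title = ""
        yield url, title
```

Passing the fetch function in as a parameter keeps the parsing logic testable without network access. In a streaming job, the mapper script would iterate over sys.stdin and print each pair as a tab-separated line.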

This tutorial uses Hadoop streaming with Java mappers and reducers to pull the title tags from an arbitrarily long, newline-delimited list of URLs. It is a good tutorial with a unique program that steps away from the traditional word-counting examples. Great for developers interested in creating spiders or web programs using Hadoop and Java.

Natural Language Processing

This tutorial is Nitin Madnani and Jimmy Lin’s presentation from PyCon 2010 detailing the use of Hadoop and Python for large-scale natural language processing. It is a great resource for getting started with Hadoop for NLP and includes a further reading section. If you find this resource helpful, it would behoove you to visit Jimmy Lin’s home page.
