
Getting Started

Once you have downloaded the MapReduce MPI (MR-MPI) library, you should have the tarball mapreduce.tar.gz on your machine. Unpack it with the following commands:

gunzip mapreduce.tar.gz
tar xvf mapreduce.tar 

This should create a mapreduce directory with the following subdirectories.

The doc directory contains this documentation. The oink and oinkdoc directories contain the OINK scripting interface to the MR-MPI library and its separate documentation. The examples directory contains a few simple MapReduce programs which call the MR-MPI library. These are documented by a README file in that directory and are discussed below. The mpistubs directory contains a dummy MPI library which can be used to build a MapReduce program on a serial machine. The python directory contains the Python wrapper files needed to call the MR-MPI library from Python. The src directory contains the files that comprise the MR-MPI library. The user directory contains user-contributed MapReduce programs. See the README in that directory for further details.

To build the library for use by a C++ or C program, go to the src directory and type

make 

You will see a list of machine names, each of which has its own Makefile.machine file in the src/MAKE directory. You can choose one of these and attempt to build the MR-MPI library by typing

make machine 

If you are successful, this will produce the file "libmrmpi_machine.a" which can be linked by other programs. If not, you will need to create a src/MAKE/Makefile.machine file compatible with your platform, using one of the existing files as a template.
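As a quick way to see which machine names are available and which library file each would produce, the following is a minimal sketch. The function name list_machines is hypothetical, not part of MR-MPI; it only assumes the Makefile.machine and libmrmpi_machine.a naming described above.

```shell
# list_machines DIR: for each Makefile.* in DIR (default: src/MAKE), print the
# machine name and the library file that "make <machine>" is expected to produce.
# Hypothetical helper, based only on the naming convention described above.
list_machines() {
    make_dir="${1:-src/MAKE}"
    for f in "$make_dir"/Makefile.*; do
        [ -e "$f" ] || continue          # the glob matched nothing
        machine="${f##*/Makefile.}"      # strip the path and the "Makefile." prefix
        echo "$machine libmrmpi_$machine.a"
    done
}
```

Run from the top-level mapreduce directory as "list_machines src/MAKE". If you add a new src/MAKE/Makefile.machine file for your platform, it should show up in this listing.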

The only settings in a Makefile.machine file that need to be specified are those for the compiler and the MPI library on your machine. If MPI is not already installed, you can install one of several free versions that work on essentially all platforms. MPICH and Open MPI are the most common.

Within Makefile.machine you can either specify via -I and -L switches where the MPI include and library files are found, or you can use a compiler wrapper provided with MPI, like mpiCC or mpic++, which will know where those files are.
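If you are unsure which -I and -L switches your MPI needs, one option is to ask the compiler wrapper what it would run. The sketch below assumes a wrapper named mpic++, mpicxx, or mpiCC is on your PATH, and relies on the --showme option of Open MPI wrappers and the -show option of MPICH wrappers. The function name show_mpi_flags is hypothetical, and other MPI installations may use different wrapper names or options.

```shell
# show_mpi_flags: print the underlying compile/link command of the first MPI
# C++ compiler wrapper found on PATH, so its -I and -L switches can be copied
# into a Makefile.machine file. Hypothetical helper, not part of MR-MPI.
show_mpi_flags() {
    for wrapper in mpic++ mpicxx mpiCC; do
        command -v "$wrapper" >/dev/null 2>&1 || continue
        echo "using wrapper: $wrapper"
        # Open MPI wrappers accept --showme; MPICH wrappers accept -show.
        "$wrapper" --showme 2>/dev/null || "$wrapper" -show
        return $?
    done
    echo "no MPI compiler wrapper found on PATH" >&2
    return 1
}
```

If a wrapper is found, it is usually simpler to name that wrapper as the compiler in Makefile.machine and leave the -I and -L settings empty.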

You can also build the MR-MPI library without MPI, using the dummy MPI library provided in the mpistubs directory. In this case you can only run the library on a single processor. To do this, first build the dummy MPI library, by typing "make" from within the mpistubs directory. Again, you may need to edit mpistubs/Makefile for your machine. Then from the src directory, type "make serial" which uses the src/MAKE/Makefile.serial file.
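The two steps of the serial build can be collected into one small function, sketched below. The name build_serial is hypothetical. It assumes you pass the top-level mapreduce directory (or run it from there) and that mpistubs/Makefile already works for your machine.

```shell
# build_serial TOP: build the dummy MPI library, then the serial MR-MPI library.
# TOP is the top-level mapreduce directory (default: current directory).
# Hypothetical helper that only wraps the two "make" steps described above.
build_serial() {
    top="${1:-.}"
    if [ ! -d "$top/mpistubs" ] || [ ! -d "$top/src" ]; then
        echo "not an MR-MPI tree: $top" >&2
        return 1
    fi
    (cd "$top/mpistubs" && make) || return 1   # dummy MPI library
    (cd "$top/src" && make serial)             # uses src/MAKE/Makefile.serial
}
```

Each make runs in a subshell, so your working directory is unchanged afterwards, and a failure in the mpistubs build stops the function before the src build is attempted.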

Both C++ and C interfaces are part of the MR-MPI library, so it should be usable from any high-level language. To use the library from Python, you don't need to build a *.a file from the src directory. Instead, you build it as a dynamic library from the python directory. Instructions are given in the Python interface section.


The MapReduce programs in the examples directory can be built by typing

make -f Makefile.machine 

from within the examples directory, where Makefile.machine is one of the Makefiles in the examples directory. Again, you may need to modify one of the existing ones to create a new one for your machine. Some of the example programs are provided in more than one form: as a C++ program, a C program, a Python script, or an OINK input script. Once you have built OINK, the latter can be run as, for example,

oink_linux < in.rmat 

When you run one of the example MapReduce programs or your own, if you get an immediate error about the MRMPI_BIGINT data type, you will need to edit the file src/mrtype.h and re-compile the library. The mrtype.h file and the error check ensure that your MPI will perform operations on 8-byte unsigned integers as required by the MR-MPI library. For the MPI on most machines, this is satisfied by the MPI data type MPI_UNSIGNED_LONG_LONG. But some machines do not support the "long long" data type, and you may need a different setting for your machine and installed MPI, such as MPI_UNSIGNED_LONG.
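Before switching to MPI_UNSIGNED_LONG, it can help to confirm that a C "long" is actually 8 bytes on your machine. The sketch below uses "getconf LONG_BIT", which is available on most Linux and macOS systems but is an assumption here. The function name long_is_8_bytes is hypothetical. It reports only the width of the C type on the system, and does not inspect your MPI installation or mrtype.h.

```shell
# long_is_8_bytes: succeed if the platform reports a 64-bit C "long".
# Hypothetical helper; fails if getconf is missing or LONG_BIT is not 64.
long_is_8_bytes() {
    bits="$(getconf LONG_BIT 2>/dev/null)" || return 1
    [ "$bits" = "64" ]
}

if long_is_8_bytes; then
    echo "long is 8 bytes: MPI_UNSIGNED_LONG is a candidate setting"
else
    echo "long is not known to be 8 bytes: MPI_UNSIGNED_LONG is unlikely to work"
fi
```

On systems where this check fails and "long long" is also unsupported, you would need to find another 8-byte unsigned type that your MPI supports.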