INSTALLING MPI ON A LINUX CLUSTER:
A simple LAM user's manual
by D. Salvador & S. Le Saint
Two implementations of MPI are available for Linux: LAM and MPICH.
Both can be downloaded from the Internet. MPICH
is available at http://www-unix.mcs.anl.gov/mpi/mpich/.
MPI for Linux runs on PC clusters. The cluster topology
is defined in a lamhosts file, which lists the computers
that make up the nodes of the cluster.
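As an illustration, a lamhosts file for a four-node cluster could be created as sketched below. The host names node1 to node4 are placeholders, not names from this manual. The sketch assumes the simplest layout, with one host name per line and lines starting with # treated as comments.

```shell
# Write a sample boot schema to ./lamhosts.
# node1..node4 are hypothetical host names; use your own machines.
cat > lamhosts <<'EOF'
# one cluster node per line
node1
node2
node3
node4
EOF

# Print the number of nodes listed, ignoring comment and blank lines.
grep -c -v -e '^#' -e '^[[:space:]]*$' lamhosts
```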
The next step is to build the virtual machine. This is done
with the recon and lamboot commands.
In order for LAM to be started on a remote
Linux machine, several requirements have to be fulfilled:
1) The machine must be reachable via the network.
2) The user must be able to execute commands remotely on the machine with the default remote shell program that was chosen when LAM was configured. This is usually rsh(1), but any remote shell program is acceptable (such as ssh(1), etc.). Note that remote host permissions must be configured so that the remote shell program does not ask for a password when a command is invoked on the remote host.
3) The remote user's shell must have a search path that will locate LAM executables.
4) The remote shell's startup file must not print anything to standard error when invoked non-interactively.
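The requirements above can also be checked by hand before turning to the LAM tools. The following is only a rough sketch and not how recon itself works. The names RSH, check_node and check_all are invented for this example. It assumes that RSH names a single program (rsh, ssh, ...), that the which command is available on the remote machine, and that finding hboot in the remote search path is a fair test for requirement 3.

```shell
# RSH is a variable chosen for this sketch: the remote shell program
# that LAM was configured with (a single program name, no options).
RSH="${RSH:-rsh}"

# check_node HOST: run a trivial command on HOST and inspect the result.
check_node() {
    node="$1"
    # Requirements 1 and 2: the command must run on the remote machine.
    # A password prompt at this point means requirement 2 is not met.
    # Requirement 3: the remote search path must locate hboot.
    # Only standard error is captured; standard output is discarded.
    errors=$("$RSH" "$node" 'which hboot >/dev/null' 2>&1 >/dev/null </dev/null)
    status=$?
    if [ "$status" -ne 0 ]; then
        echo "$node: FAILED (exit status $status)"
        return 1
    fi
    # Requirement 4: nothing may be printed to standard error.
    if [ -n "$errors" ]; then
        echo "$node: FAILED (output on standard error)"
        return 1
    fi
    echo "$node: ok"
}

# check_all FILE: check every host named in a boot schema file.
check_all() {
    hostfile="${1:-lamhosts}"
    failed=0
    while read -r node rest; do
        # Skip blank lines and comment lines.
        case "$node" in ''|'#'*) continue ;; esac
        check_node "$node" || failed=1
    done < "$hostfile"
    return "$failed"
}
```

After loading these functions into the shell, check_all lamhosts prints one line per node and returns a non-zero status if any node fails.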
If any of these requirements is not met for any machine
declared in lamhosts, LAM will not be able to start. By
running recon first, the user will be able to quickly
identify and correct problems in the setup that would
inhibit LAM from starting.
The lamboot tool starts the LAM software on each of the
machines specified in the boot schema, lamhosts. The user
may wish to first run the recon(1) tool to verify that LAM
can be started.
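The two commands can be chained so that LAM is booted only when the check passes. This is a minimal sketch: boot_lam is a name made up for this example, and it assumes that recon exits with a non-zero status when it finds a problem. The -v option asks both tools for verbose output.

```shell
# boot_lam [SCHEMA]: verify the cluster with recon, then boot LAM on it.
# SCHEMA defaults to ./lamhosts.
boot_lam() {
    schema="${1:-lamhosts}"
    # Assumption: recon returns non-zero if any machine fails the check.
    if ! recon -v "$schema"; then
        echo "recon reported a problem with $schema; LAM not started" >&2
        return 1
    fi
    lamboot -v "$schema"
}
```

With the function loaded, boot_lam lamhosts either starts LAM or stops after the recon report, so that the setup can be corrected first.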
Starting LAM is a three-step procedure.
In the first step, hboot(1) is invoked
on each of the specified machines. In the second step, each
machine allocates a dynamic port and communicates it
back to lamboot, which collects them. In the third step,
lamboot gives each machine the list of machines/ports in
order to form a fully connected topology.
To close the virtual machine session, we use the wipe
command. The wipe tool removes all traces of the LAM
session on the network.
To run an MPI program under MPICH or LAM, type
mpirun -np n followed by the name of the program, where
n is the number of processes to start (usually the number
of nodes of the cluster).
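As a sketch of how the value of n can be taken from the boot schema instead of being typed by hand, the function below counts the hosts in the file and passes the count to mpirun. The name run_on_all_nodes and the program name ./hello are hypothetical, and the sketch assumes one host per line with # comment lines.

```shell
# run_on_all_nodes SCHEMA PROGRAM [ARGS...]
# Start one process per node listed in SCHEMA.
run_on_all_nodes() {
    schema="$1"
    shift
    # Count the lines that are neither comments nor blank.
    n=$(grep -c -v -e '^#' -e '^[[:space:]]*$' "$schema")
    mpirun -np "$n" "$@"
}
```

For example, run_on_all_nodes lamhosts ./hello would run mpirun -np 4 ./hello on a four-node schema. Under LAM, the session can then be closed with wipe -v lamhosts.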