INSTALLING MPI ON A LINUX CLUSTER:
A simple LAM user's manual

by D. Salvador & S. Le Saint







Two MPI implementations are available for Linux: LAM and MPICH.

Both can be downloaded from the Internet. MPICH is available at http://www-unix.mcs.anl.gov/mpi/mpich/.
MPI for Linux runs on PC clusters. Under LAM, the cluster topology is defined in a lamhosts file that lists the computers serving as cluster nodes.
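As an illustration of what a lamhosts file looks like, the sketch below parses a small example. It assumes the usual boot schema layout: one host per line, an optional cpu=N field, and comments starting with #. The host names and the parse_lamhosts helper are hypothetical and are not part of LAM.

```python
def parse_lamhosts(text):
    """Return a list of (hostname, cpu_count) pairs from boot schema text."""
    nodes = []
    for raw_line in text.splitlines():
        # Drop comments and surrounding whitespace; skip empty lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        cpus = 1  # assumed default when no cpu=N field is given
        for field in fields[1:]:
            key, _, value = field.partition("=")
            if key == "cpu" and value.isdigit():
                cpus = int(value)
        nodes.append((host, cpus))
    return nodes


# Hypothetical three-node cluster; replace with real host names.
EXAMPLE_LAMHOSTS = """\
# lamhosts: one cluster node per line
node1.example.org
node2.example.org cpu=2
node3.example.org
"""

for host, cpus in parse_lamhosts(EXAMPLE_LAMHOSTS):
    print(host, cpus)
```

The same text, saved as lamhosts, is what the recon and lamboot commands take as their argument.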
The virtual machine then has to be built. This is done with the recon and lamboot commands: recon checks that LAM can be started on every node, and lamboot actually starts it.
For LAM to be started on a remote Linux machine, several requirements have to be fulfilled:

       1)     The machine must be reachable via the network.

       2)     The user must be able to execute commands remotely on the machine with the default remote shell program that was chosen when LAM was configured. This is usually rsh(1), but any remote shell program, such as ssh(1), is acceptable. Note that remote host permissions must be configured so that the remote shell program does not ask for a password when a command is invoked on the remote host.

       3)     The remote user's shell must have a search path that will locate the LAM executables.

       4)     The remote shell's startup file must not print anything to standard error when invoked non-interactively.

       If any of these requirements is not met for any machine declared in lamhosts, LAM will not be able to start. By running recon first, the user can quickly identify and correct setup problems that would prevent LAM from starting.
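Requirements 2 and 4 can be checked by hand for a single host before running recon. The sketch below is only a rough approximation of such a check, not what recon itself does: it runs the harmless command "true" through the remote shell and reports a problem if the call fails or if anything appears on standard error. The function name check_remote_shell and its parameters are hypothetical; the run parameter exists only so that the check can be exercised without a real cluster.

```python
import subprocess


def check_remote_shell(host, remote_shell="rsh", run=subprocess.run):
    """Return a list of problems found when running 'true' on host."""
    try:
        result = run(
            [remote_shell, host, "true"],
            stdin=subprocess.DEVNULL,  # never wait for a typed password
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return [f"{host}: could not run {remote_shell}: {exc}"]

    problems = []
    if result.returncode != 0:
        # Unreachable host, refused login, or a password was required.
        problems.append(f"{host}: {remote_shell} exited with {result.returncode}")
    if result.stderr.strip():
        # Requirement 4: nothing may be printed to standard error.
        problems.append(f"{host}: output on standard error: {result.stderr.strip()}")
    return problems
```

Calling check_remote_shell("node1.example.org", remote_shell="ssh") on a correctly configured node should return an empty list; the host name here is a placeholder.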
 

       The lamboot tool starts the LAM software on each of the machines specified in the boot schema, lamhosts. The user may wish to run the recon(1) tool first to verify that LAM can be started.
       Starting LAM is a three-step procedure. In the first step, hboot(1) is invoked on each of the specified machines. In the second step, each machine allocates a dynamic port and communicates it back to lamboot, which collects them. In the third step, lamboot gives each machine the list of machines/ports in order to form a fully connected topology.
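The three-step procedure can be pictured with a small single-process simulation. This is only a sketch of the idea, not LAM's actual code: every "machine" is a local socket, the dynamic port is obtained by binding to port 0 so that the operating system picks a free one, and the function simulate_boot and the host names are hypothetical.

```python
import socket


def simulate_boot(hosts):
    """Simulate the boot steps locally; return each host's view of the cluster."""
    # Step 1: start a stand-in daemon on every host (here, just a socket).
    daemons = {}
    for host in hosts:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Step 2a: binding to port 0 lets the OS allocate a dynamic port.
        sock.bind(("", 0))
        daemons[host] = sock

    # Step 2b: every daemon reports its port; the booter collects them.
    port_table = {host: sock.getsockname()[1] for host, sock in daemons.items()}

    # Step 3: the booter hands the complete host/port list to every daemon,
    # so each one knows how to reach all the others (fully connected).
    views = {host: dict(port_table) for host in hosts}

    for sock in daemons.values():
        sock.close()
    return views


views = simulate_boot(["node1", "node2", "node3"])
for host, table in views.items():
    print(host, "knows", sorted(table))
```

After the last step every node holds the same table, which is what allows any node to contact any other directly.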

       To close the virtual machine session, use the wipe command. The wipe tool removes all traces of the LAM session on the network.

       To run an MPI program under MPICH or LAM, type: mpirun -np n program, where n is the number of processes to start (typically one per cluster node) and program is the MPI executable.
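Put together, a complete LAM session uses the commands in the following order. The sketch below only builds and prints the command lines; it does not execute anything. The helper lam_session_commands, the program name ./hello and the use of the -v (verbose) option are illustrative choices, not requirements.

```python
import shlex


def lam_session_commands(boot_schema, program, nprocs):
    """Return the command lines of a typical LAM session, in order."""
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    return [
        ["recon", "-v", boot_schema],             # verify that LAM can start
        ["lamboot", "-v", boot_schema],           # build the virtual machine
        ["mpirun", "-np", str(nprocs), program],  # run the MPI program
        ["wipe", "-v", boot_schema],              # close the session
    ]


for command in lam_session_commands("lamhosts", "./hello", 4):
    print(shlex.join(command))
```

Each printed line can be typed at the shell prompt on the machine from which the cluster is controlled.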
 
 
 
 
