EXPERIMENTAL STUDY

The Cluster

The cluster is made of twenty PCs running a Linux environment.
Each PC has 256 MB of memory, so the shared virtual memory of the cluster amounts to about 5 GB (20 × 256 MB).

The MPI Library

The first step was to install a version of MPI compatible with the Linux operating system. One implementation, called MPICH, can be downloaded by following the instructions given in a manual.
In fact, another implementation, called LAM, was already installed on the environment used.

Make the cluster operational

The different steps for creating the virtual machine are described in the same manual.

The parallel program

The program used is very simple: it computes a 1D wave. The spatial discretization consists of N points, and these points are distributed among the different nodes of the cluster.
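The report does not give the source of the program, but the distribution of the N discretization points over the nodes can be sketched as follows. This is a minimal illustration, not the actual code: the function name and the even-split policy (earlier ranks absorb the remainder) are assumptions.

```python
def partition(n_points, n_procs):
    """Distribute n_points grid points as evenly as possible over n_procs ranks.

    Returns a list of (start, count) pairs, one per rank; the first
    n_points % n_procs ranks receive one extra point.
    """
    base, extra = divmod(n_points, n_procs)
    layout = []
    start = 0
    for rank in range(n_procs):
        count = base + (1 if rank < extra else 0)
        layout.append((start, count))
        start += count
    return layout

print(partition(10, 4))  # → [(0, 3), (3, 3), (6, 2), (8, 2)]
```

With such a layout, each node advances the wave only on its own slice of the domain, exchanging boundary values with its neighbours at each time step.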

The tests

Two different kinds of test have been performed, using the parallel code:

• Increasing the number of nodes for a problem of constant size (the total number of points in the domain is constant).
• Increasing the number of nodes with a constant load per CPU (N_points / N_procs = 500 000).

Results of test #1: increasing the number of nodes for a problem of constant size

We observe, as expected, a reduction of the computing time when CPUs are added.
The limit of this gain is reached at about 18-20 processors. This limit can be explained by the fact that a part of the program still has to be executed on a single CPU; consequently, the corresponding time cannot be reduced.
Of course, the number of processors needed to obtain the best speedup depends on the number of points of the problem.

It is now possible to compute, thanks to Amdahl's law, the fraction p of the program that can be run in parallel:

speedup(1) = 1
speedup(2) = T(1) / T(2) = 1.979

Amdahl's law gives speedup(n) = 1 / ((1 - p) + p/n); solving for p with n = 2:

==>  p = (2 / 1) × (speedup(2) - 1) / speedup(2) = 0.989

This value is close to 100%, which is due to the simplicity of the program.
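The inversion of Amdahl's law used above can be written out as a small sketch (the function name is an assumption; the formula is the general n-processor case, applied here with n = 2):

```python
def parallel_fraction(speedup, n):
    """Invert Amdahl's law  speedup = 1 / ((1 - p) + p / n)  to recover p,
    the fraction of the program that runs in parallel."""
    return (n / (n - 1)) * (speedup - 1) / speedup

# Measured two-processor speedup from the experiment:
p = parallel_fraction(1.979, 2)
print(round(p, 3))  # → 0.989
```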

The next graph shows the speedup obtained for different numbers of nodes. The dotted line represents the speedup predicted theoretically. The experimental results fit it well, which can be explained by the small amount of time the program spends communicating between nodes.
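The dotted theoretical curve can be reproduced by plugging the fraction p = 0.989 back into Amdahl's law. A minimal sketch (the function name is an assumption):

```python
def amdahl_speedup(p, n):
    """Predicted speedup on n processors for a parallel fraction p (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Predicted speedup for the fraction p = 0.989 measured on this cluster.
p = 0.989
for n in (1, 2, 4, 8, 16, 20):
    print(n, round(amdahl_speedup(p, n), 2))
```

The curve flattens as n grows, which is consistent with the 18-20 processor limit observed in the first test.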

This gain of time is effective only if the added processors are free from other work that could compete for the CPU of the added machine. We can observe that adding a heavily loaded CPU can even slow down the whole computation. The second experiment underlines this phenomenon: as an example, the sixth node appears to be particularly loaded, and the list of processes running at the same time on this CPU is revealing.

Results of test #2

In this experiment, the number of points that each processor has to treat is constant.
In theory, for a perfectly parallel program, the computing time should be the same for all topologies.
As in the previous test, the measurements were disturbed by uncontrolled work from other users.
A statistical approach nevertheless shows a slight increase of the computing time when CPUs are added. This tendency is caused by the increasing time spent in communications.
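The weak-scaling setup of this second test, and the slight time increase it exhibits, can be sketched with a toy model. The load per CPU is the constant 500 000 points from the test description; the timing coefficients below are invented purely for illustration, not fitted to the measurements.

```python
POINTS_PER_PROC = 500_000

def total_points(n_procs):
    """Total problem size in the weak-scaling test: the load per CPU is constant."""
    return POINTS_PER_PROC * n_procs

def model_time(n_procs, t_comp=1.0, t_comm=0.01):
    """Toy timing model (coefficients are illustrative): computation time is
    constant per CPU, while communication overhead grows slowly with the
    number of processors."""
    return t_comp + t_comm * (n_procs - 1)

for n in (1, 5, 10, 20):
    print(n, total_points(n), round(model_time(n), 2))
```

Even this crude model reproduces the observed tendency: the total time is nearly flat but drifts upward as processors, and therefore communications, are added.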