Two test cases were used for this work: the Bump case and the Pilot 2D case. We also tried to run a much larger case (a 3D combustion chamber, Pilot 3D), without success.

The Bump case is a two-dimensional bump that develops a shock wave on its extrados. The mesh is made of 10000 cells and the number of time iterations is set to 10000. The best speed-up is obtained with 2 processors. With more than 2 processors the CPU time continues to drop, but the time spent in communication between processors increases, so the overall user time rises.
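For reference, speed-up and parallel efficiency are usually defined as below. The source does not state its exact definitions, so the symbols are assumptions: $T(p)$ is the elapsed time on $p$ processors, split here into a computation part $T_{\mathrm{cpu}}(p)$ and a communication part $T_{\mathrm{comm}}(p)$.

```latex
\begin{align*}
S(p) &= \frac{T(1)}{T(p)}, &
E(p) &= \frac{S(p)}{p}, \\
T(p) &\approx T_{\mathrm{cpu}}(p) + T_{\mathrm{comm}}(p).
\end{align*}
```

Under these definitions the speed-up stops improving once the growth of $T_{\mathrm{comm}}$ outweighs the drop in $T_{\mathrm{cpu}}$, which matches the behaviour reported beyond 2 processors.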
The memory used per process decreases as the number of processors increases. The master process requires more memory than the slaves. The overall memory needed for the computation increases with the number of processes, because the border nodes of each process are duplicated to allow the subdomains to be reconnected over the whole domain.
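The effect of duplicated border nodes can be illustrated with a minimal sketch. It assumes a structured 100 x 100 grid cut into vertical strips, with one duplicated column per neighbouring process; neither the grid layout nor the partitioning is given in the source, and the function name is hypothetical. The extra memory of the master process is not modelled.

```python
def strip_decomposition_memory(nx, ny, nprocs, halo=1):
    """Node counts for an nx-by-ny grid cut into vertical strips.

    Assumption (not from the source): each process stores its own columns
    plus `halo` duplicated columns for every neighbouring process.
    Returns (largest node count held by one process, total over all processes).
    """
    base, extra = divmod(nx, nprocs)
    per_process = []
    for rank in range(nprocs):
        own_cols = base + (1 if rank < extra else 0)
        # End strips have one neighbour, interior strips have two.
        neighbours = (rank > 0) + (rank < nprocs - 1)
        per_process.append((own_cols + halo * neighbours) * ny)
    return max(per_process), sum(per_process)


for p in (1, 2, 4, 8):
    largest, total = strip_decomposition_memory(100, 100, p)
    print(p, largest, total)
```

In this sketch the largest per-process count falls as processes are added, while the total grows by the duplicated columns, which is the trend described above.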

This test case does not seem demanding enough to really characterize the performance of the cluster.


The Pilot 2D case is a two-dimensional combustion chamber. The mesh is made of 20000 cells and the number of iterations is set to 100. The best speed-up is obtained with 3 processors, but the gain between 2 and 3 processors is not significant, so the optimum computation would be performed with 2 processors. Compared with the results obtained for the bump, the effect of the larger number of nodes is apparent: the advantage of multiple processors is much sharper in this case.
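The distinction between the best speed-up and the optimum configuration can be made explicit with a small sketch. The selection rule (stop when one more processor saves less than a given fraction of the elapsed time), the 10% threshold, the function names, and the timings are all assumptions for illustration; they are not the criterion or the measurements of this study.

```python
def speedups(times):
    """Map processor count -> speed-up T(1)/T(p) from elapsed times."""
    t1 = times[1]
    return {p: t1 / t for p, t in sorted(times.items())}


def practical_optimum(times, min_gain=0.10):
    """Smallest processor count after which the next larger count
    reduces the elapsed time by less than `min_gain` (a fraction)."""
    counts = sorted(times)
    for p, q in zip(counts, counts[1:]):
        gain = (times[p] - times[q]) / times[p]
        if gain < min_gain:
            return p
    return counts[-1]


# Hypothetical elapsed times in seconds, NOT measurements from this study.
example = {1: 100.0, 2: 60.0, 3: 58.0, 4: 61.0}
s = speedups(example)
best = max(s, key=s.get)
print(best, practical_optimum(example))
```

With these made-up timings the highest speed-up is at 3 processors, but the rule selects 2 because the third processor brings only a marginal gain.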

To increase the size of the calculation, the same case is run with a 4th-order scheme, which multiplies the computation time by 2. We could expect the optimum to be reached with 4 processors, but the 4th-order scheme also affects the communication: two neighbouring processes have to exchange twice as many nodes. That is why the optimum configuration is still 2 processors.
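The argument can be checked on a toy cost model. The sketch assumes that the elapsed time is a computation term divided by the number of processors plus a communication term growing linearly with it; this model and its coefficients are hypothetical and are not fitted to the measurements.

```python
def elapsed(p, compute, comm_per_link):
    """Toy model (assumption): perfectly divided computation plus
    communication that grows linearly with the processor count."""
    return compute / p + comm_per_link * (p - 1)


def best_count(compute, comm_per_link, max_procs=8):
    """Processor count in 1..max_procs that minimises the modelled time."""
    return min(range(1, max_procs + 1),
               key=lambda p: elapsed(p, compute, comm_per_link))


baseline = best_count(100.0, 25.0)      # hypothetical 2nd-order run
compute_only = best_count(200.0, 25.0)  # computation doubled, same traffic
both = best_count(200.0, 50.0)          # computation and traffic doubled
print(baseline, compute_only, both)
```

Doubling only the computation moves the modelled optimum to more processors, whereas doubling both terms scales every elapsed time by the same factor and leaves the optimum where it was.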

To assess the performance of the cluster, we compared it with the results from an Origin2000 (an 8-processor machine) owned by Cerfacs. This parallel machine spends very little time in communication thanks to its special architecture. The best speed-up is obtained with 7 processors, and the optimum configuration is 3 processors. For the same test case, the computation times are equivalent to those of the cluster.