Two test cases were used for this work: the Bump case and
the Pilot 2D case. We also tried to run a much larger case (a 3D combustion
chamber, Pilot 3D), without success.
The Bump case is a two-dimensional bump that develops a shock wave on its upper surface (extrados). The mesh is made of 10000 cells and the number of time iterations is set to 10000. The best speed-up is obtained with 2 processors. With more than 2 processors the CPU time continues to drop, but the time spent in communication between processors increases. As a result, the overall user time goes up.
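The trade-off described above can be sketched with a toy timing model. This is only an illustration under stated assumptions: the CPU work is assumed to divide perfectly among the processors, the communication cost is assumed to grow linearly with the number of inter-process links, and the two parameter values are arbitrary placeholders, not timings measured on the cluster.

```python
def elapsed_time(p, t_cpu_serial, t_comm_per_link):
    """Toy model (assumption): CPU work divides perfectly among p
    processors, communication grows with the p - 1 inter-process links."""
    return t_cpu_serial / p + t_comm_per_link * (p - 1)


def best_processor_count(max_p, t_cpu_serial, t_comm_per_link):
    """Return the processor count with the smallest modelled elapsed time,
    together with the full table of modelled times."""
    times = {
        p: elapsed_time(p, t_cpu_serial, t_comm_per_link)
        for p in range(1, max_p + 1)
    }
    return min(times, key=times.get), times


# Placeholder parameters in arbitrary time units (NOT measured values).
best, times = best_processor_count(8, t_cpu_serial=100.0, t_comm_per_link=30.0)
print(best)  # -> 2: beyond 2 processors the communication term dominates
```

With these placeholder values the CPU share keeps shrinking as processors are added, yet the modelled elapsed time rises again after 2 processors, which is the qualitative behaviour reported for the Bump case.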
The memory used per process decreases with the number of processors. The master process requires more memory than the slaves. The overall memory needed for the computation increases with the number of processes, because the border nodes of each process are duplicated to allow reconnection over the whole domain.
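The effect of duplicated border nodes can be illustrated with a small counting sketch. It assumes a hypothetical structured grid of 100 x 100 nodes cut into vertical strips, with one duplicated column of nodes per neighbour. The real mesh and its partitioning may differ, and the extra memory of the master process is not modelled.

```python
def strip_partition_nodes(nx, ny, p, halo=1):
    """Node count stored by each of p strip-shaped subdomains.

    Assumption: an nx-by-ny structured grid is cut along x, and each
    subdomain stores `halo` extra columns per neighbouring subdomain.
    """
    counts = []
    for rank in range(p):
        # Distribute the nx columns as evenly as possible.
        own_cols = nx // p + (1 if rank < nx % p else 0)
        # End strips have one neighbour, interior strips have two.
        neighbours = (rank > 0) + (rank < p - 1)
        counts.append((own_cols + halo * neighbours) * ny)
    return counts


for p in (1, 2, 4):
    counts = strip_partition_nodes(100, 100, p)
    print(p, max(counts), sum(counts))
# 1 10000 10000
# 2 5100 10200
# 4 2700 10600
```

In this sketch the largest per-process node count drops as processes are added, while the total number of stored nodes grows by the duplicated border columns.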
This test case does not seem demanding enough to really characterize the performance of the cluster.
The Pilot 2D case is a two-dimensional combustion chamber. The
mesh is made of 20000 cells. The number of iterations is set to 100. The
best speed-up is obtained with 3 processors, but the gain between 2 and
3 processors is not significant. The optimum computation would therefore be
performed with 2 processors. Compared with the results obtained for the
Bump case, the effect of the larger number of nodes is visible: the advantage
of multiple processors is much sharper in this case.
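The distinction between the best speed-up and the optimum configuration can be made explicit with a short sketch. Speed-up is taken as T(1)/T(p) and efficiency as speed-up divided by p. The 10% threshold used to decide that a gain is "not significant" is an assumption made for illustration, and the timings are placeholders, not the measurements of this study.

```python
def speedup_table(times):
    """Map each processor count p to (speed-up, efficiency).

    `times` is a dict {p: elapsed time} and must contain the entry p = 1.
    """
    t1 = times[1]
    return {p: (t1 / t, (t1 / t) / p) for p, t in sorted(times.items())}


def optimum_count(times, min_gain=0.10):
    """Smallest p for which the next processor count shortens the run
    by less than `min_gain` (relative). The threshold is an assumption."""
    counts = sorted(times)
    for p, q in zip(counts, counts[1:]):
        if (times[p] - times[q]) / times[p] < min_gain:
            return p
    return counts[-1]


# Placeholder timings in arbitrary units (NOT measured values).
times = {1: 100.0, 2: 60.0, 3: 57.0, 4: 62.0}
table = speedup_table(times)
best_speedup = max(table, key=lambda p: table[p][0])
print(best_speedup, optimum_count(times))  # -> 3 2
```

Here the best speed-up is reached with 3 processors, but going from 2 to 3 shortens the run by only 5%, so the sketch reports 2 as the optimum.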
To increase the size of the calculation, the same case is run with a 4th-order scheme, which multiplies the computation time by 2. We could expect the optimum to be reached with 4 processors, but the 4th-order scheme also affects the communication: two neighbouring processes have to exchange twice as many nodes. That is why the optimum configuration is still 2 processors.
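A short worked argument shows why the optimum does not move. It is a sketch under two assumptions: the elapsed time on p processors splits into a compute part A/p and a communication part C(p), and the communication time is proportional to the number of nodes exchanged (latency is ignored). The symbols A, C(p), T_base and T_4th are introduced here for illustration only.

```latex
\begin{align*}
T_{\mathrm{base}}(p) &= \frac{A}{p} + C(p), \\
T_{\mathrm{4th}}(p)  &= \frac{2A}{p} + 2\,C(p) = 2\,T_{\mathrm{base}}(p), \\
\arg\min_{p} T_{\mathrm{4th}}(p) &= \arg\min_{p} T_{\mathrm{base}}(p).
\end{align*}
```

Under these assumptions, doubling both the computation and the communication only rescales the whole timing curve, so the processor count that minimizes it is unchanged. Had the computation alone doubled, the minimum would have shifted towards more processors.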
To assess the performance of the cluster, we
compared it with the results from an Origin2000 (an 8-processor machine)
owned by Cerfacs. This parallel machine spends very little time
in communication thanks to its special architecture. The best speed-up
is obtained with 7 processors, while the optimum configuration is 3 processors.
The computation times are equivalent to those of the cluster for the same test case.