Opened 10 years ago

Last modified 5 years ago

#383 new defect

Accelerate multiprofile processing on HPC

Reported by: Ian Culverwell
Owned by: idculv, cburrows
Priority: normal
Milestone: Whenever
Component: ROPP (all)
Version: 7.1
Keywords: HPC
Cc:

Description

Some of the multifile tests in the Test Folder, such as MT-IO-03, IT-FM-07 and IT-PP-01, take ages to run on the HPC. Why? It shouldn't take hours to process ~500 profiles on a supercomputer when the same job takes minutes on a Linux box. I/O?

Look into it. Possible solutions: compiler options, netCDF 'chunking' options?

Change history (2)

comment:1 by cburrows, 8 years ago

Milestone: 9.0 → 10.0

Although this was originally a problem on the Met Office IBM supercomputer, it is still an issue on its replacement, a Cray.

One consideration was that the jobs were not being submitted via the 'PBS' submission system but were being run via ssh, which may not be the most efficient route. A test showed the same behaviour whichever way the job was submitted. Furthermore (for the example of IT-1DVAR-OP), the first ~60 profiles were processed very quickly, but subsequent profiles were suddenly processed extremely slowly. The file being appended to is very small (~1 MB), so it is unlikely that chunking would help.

Reviewing the Test Folder timings for the various integration tests at version 9.0, the HPC appears to be approximately as fast as Linux when the input multifiles contain fewer than 50-100 profiles, but much slower for files with more profiles. Perhaps the Lustre file system is penalising multiple file open/close commands in quick succession? The Met Office HPC optimisation team thinks not:

"Not penalising per se (it doesn't deliberately throttle), but the metadata load from seeks will be increasing and this will harm performance. Lustre isn't unique in this regard - we had applications with exactly this sort of behaviour and subsequent performance impact on the NEC a number of years ago. Open the file once, keep it open until the end; that should help."

This would require some restructuring of the ROPP tools to modify the calls to the low-level dependency (netCDF) routines. Since ROPP is unlikely to be used on a supercomputer to process large multifiles in this way (users are more likely to embed the subroutines in larger software packages, thereby avoiding the I/O issue), this ticket is being deferred to ROPP10.
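The restructuring being proposed can be sketched in miniature. The following is a minimal Python illustration (plain file I/O, not the actual ROPP or netCDF calls; function and file names are hypothetical) contrasting the current per-profile open/append/close pattern with the "open the file once, keep it open until the end" pattern recommended by the optimisation team. Both produce identical output; only the number of open/close operations, and hence the metadata load on a file system like Lustre, differs.

```python
import os
import tempfile

def append_per_profile(path, profiles):
    """Current pattern: one open/append/close per profile.

    On a metadata-heavy file system such as Lustre, every open incurs a
    metadata operation, so the cost grows with the number of profiles.
    """
    for p in profiles:
        with open(path, "a") as f:   # reopen for every profile
            f.write(p + "\n")

def append_open_once(path, profiles):
    """Recommended pattern: open once, write all profiles, close at the end."""
    with open(path, "a") as f:
        for p in profiles:
            f.write(p + "\n")

if __name__ == "__main__":
    profiles = [f"profile_{i:04d}" for i in range(500)]
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")
        b = os.path.join(d, "b.txt")
        append_per_profile(a, profiles)   # 500 open/close cycles
        append_open_once(b, profiles)     # 1 open/close cycle
        # The written data is identical either way.
        assert open(a).read() == open(b).read()
```

In ROPP terms, the equivalent change would be to hold the netCDF file handle open across all profiles in the multifile rather than reopening it for each append, which is what makes the restructuring of the tool-level I/O calls necessary.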

comment:2 by Ian Culverwell, 5 years ago

Milestone: 10.0 → Whenever