Opened 6 years ago
Closed 3 years ago
#554 closed task (wontfix)
Removed repeated synchronization when writing files [1 d]
Reported by: | Ian Culverwell | Owned by: | Ian Culverwell |
---|---|---|---|
Priority: | normal | Milestone: | ROPP9.1 carry over |
Component: | ropp_io | Version: | 11.0 |
Keywords: | Cc: | Stig Syndergaard |
Description
https://trac.romsaf.org/ropp/changeset/5427/ropp_src/branches/dev/Share/dmi_trunk_9.0/ropp_io/ncdf/ncdf_putgetvar_acode.m4
https://trac.romsaf.org/ropp/changeset/5427/ropp_src/branches/dev/Share/dmi_trunk_9.0/ropp_io/ncdf/ncdf_putgetvar_schar.m4
https://trac.romsaf.org/ropp/changeset/5427/ropp_src/branches/dev/Share/dmi_trunk_9.0/ropp_io/ncdf/ncdf_putgetvar_scode.m4
https://trac.romsaf.org/ropp/changeset/5427/ropp_src/branches/dev/Share/dmi_trunk_9.0/ropp_io/ncdf/ncdf_putvar.f90
Original changeset:
Removed repeated synchronization when writing files: https://trac.romsaf.org/ropp/changeset/5281
Attachments (6)
Change history (20)
comment:1 by , 5 years ago
comment:2 by , 5 years ago
I don't see the 60-fold speed-up that Stig reports. When I run ropp_fm_bg2ro_1d through the 500 profiles of the ROPP test folder file IT-FM-07.nc, five times, I find:
ROPP9.1:
IT-FM-07_cntl1.out:Wed Oct 2 08:16:37 BST 2019
IT-FM-07_cntl1.out:Wed Oct 2 08:18:03 BST 2019
= 86 secs
ROPP10.0:
IT-FM-07_test1.out:Wed Oct 2 08:18:03 BST 2019
IT-FM-07_test1.out:Wed Oct 2 08:19:28 BST 2019
= 85 secs
No significant difference.
When I do the same on the 1000 profiles of the ROPP test folder file IT-FM-04.nc, five times, I find:
ROPP9.1:
Wed Oct 2 08:02:33 BST 2019
Wed Oct 2 08:06:11 BST 2019
= 371 - 153 = 218 secs
ROPP10.0:
Wed Oct 2 08:06:11 BST 2019
Wed Oct 2 08:09:51 BST 2019
= 591 - 371 = 220 secs
Again, no difference. See the attached test1.sh for details.
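The elapsed times quoted in these comments are differences between pairs of `date` stamps. A minimal Python sketch of that arithmetic follows; the function names are hypothetical (they are not part of test1.sh), and the sketch assumes both stamps fall on the same day and in the same time zone, as they do in the runs above.

```python
import re

# Matches the HH:MM:SS field of a `date` line such as
# "IT-FM-07_cntl1.out:Wed Oct 2 08:16:37 BST 2019".
_CLOCK = re.compile(r"(\d{1,2}):(\d{2}):(\d{2})")


def clock_seconds(stamp):
    """Return seconds since midnight for the HH:MM:SS found in a stamp."""
    match = _CLOCK.search(stamp)
    if match is None:
        raise ValueError(f"no HH:MM:SS found in {stamp!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def elapsed_seconds(start_stamp, end_stamp):
    """Elapsed seconds between two stamps (assumed: same day, same zone)."""
    return clock_seconds(end_stamp) - clock_seconds(start_stamp)


if __name__ == "__main__":
    start = "IT-FM-07_cntl1.out:Wed Oct 2 08:16:37 BST 2019"
    end = "IT-FM-07_cntl1.out:Wed Oct 2 08:18:03 BST 2019"
    print(elapsed_seconds(start, end), "secs")
```

The grep prefix (`IT-FM-07_cntl1.out:`) can stay attached to the stamp, because only the clock field is parsed.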
comment:3 by , 5 years ago
When I run ropp_pp_occ_tool through the 50 profiles in the ROPP test folder file IT-PP-02.nc, 5 times, I find
IT-PP-02_cntl1.out:Wed Oct 2 08:57:35 BST 2019
IT-PP-02_cntl1.out:Wed Oct 2 09:06:51 BST 2019
= 556 secs
IT-PP-02_test1.out:Wed Oct 2 09:06:51 BST 2019
IT-PP-02_test1.out:Wed Oct 2 09:13:43 BST 2019
= 412 secs
The removal of the file synchronisation makes it run about 25% faster.
comment:5 by , 5 years ago
These results are not exactly overwhelming evidence for the change. Ask Stig for his timings. Perhaps it's an Ubuntu thing? Hold off implementing in ROPP10 until we have agreement to proceed from the ROPP GG - there may be drawbacks to removing file synchronisation.
comment:6 by , 5 years ago
I'm not sure I had recompiled the PP and FM routines. When I do so:
unix:> grep Oct IT-FM-07_cntl1.out IT-FM-07_test1.out
IT-FM-07_cntl1.out:Wed Oct 2 11:00:33 BST 2019
IT-FM-07_cntl1.out:Wed Oct 2 11:02:02 BST 2019
= 89 secs
IT-FM-07_test1.out:Wed Oct 2 11:19:05 BST 2019
IT-FM-07_test1.out:Wed Oct 2 11:20:11 BST 2019
= 66 secs
25% faster.
unix:> grep Oct IT-FM-04_????1.out
IT-FM-04_cntl1.out:Wed Oct 2 08:02:33 BST 2019
IT-FM-04_cntl1.out:Wed Oct 2 08:06:11 BST 2019
= 218 secs
IT-FM-04_test1.out:Wed Oct 2 11:22:53 BST 2019
IT-FM-04_test1.out:Wed Oct 2 11:25:13 BST 2019
= 140 secs
36% faster.
unix:> grep Oct IT-PP-02_????1.out
IT-PP-02_cntl1.out:Wed Oct 2 08:57:35 BST 2019
IT-PP-02_cntl1.out:Wed Oct 2 09:06:51 BST 2019
= 556 secs
IT-PP-02_test1.out:Wed Oct 2 11:31:30 BST 2019
IT-PP-02_test1.out:Wed Oct 2 11:38:20 BST 2019
= 410 secs
26% faster.
It's still not great.
comment:7 by , 5 years ago
(Actual figures, after doing
aclocal -I m4 --force
automake -a -c
autoconf
as required because the macros changed:
unix:> grep Oct IT-FM-07_cntl1.out IT-FM-07_test1.out
IT-FM-07_cntl1.out:Wed Oct 2 11:00:33 BST 2019
IT-FM-07_cntl1.out:Wed Oct 2 11:02:02 BST 2019
= 89 secs
IT-FM-07_test1.out:Wed Oct 2 12:07:26 BST 2019
IT-FM-07_test1.out:Wed Oct 2 12:08:31 BST 2019
= 65 secs
27% faster.
unix:> grep Oct IT-FM-04_????1.out
IT-FM-04_cntl1.out:Wed Oct 2 08:02:33 BST 2019
IT-FM-04_cntl1.out:Wed Oct 2 08:06:11 BST 2019
= 218 secs
IT-FM-04_test1.out:Wed Oct 2 12:11:07 BST 2019
IT-FM-04_test1.out:Wed Oct 2 12:13:26 BST 2019
= 139 secs
36% faster.
unix:> grep Oct IT-PP-02_????1.out
IT-PP-02_cntl1.out:Wed Oct 2 08:57:35 BST 2019
IT-PP-02_cntl1.out:Wed Oct 2 09:06:51 BST 2019
= 556 secs
IT-PP-02_test1.out:Wed Oct 2 12:17:41 BST 2019
IT-PP-02_test1.out:Wed Oct 2 12:24:30 BST 2019
= 409 secs
26% faster.
No difference.)
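The "% faster" figures in these comments are consistent with a fractional reduction in wall-clock time relative to the control run. The sketch below assumes that definition (a throughput ratio would give different numbers); `percent_faster` is a hypothetical helper name, and the inputs are the control and test timings reported after regenerating the build files.

```python
def percent_faster(control_secs, test_secs):
    """Reduction in wall-clock time, as a percentage of the control run."""
    if control_secs <= 0:
        raise ValueError("control time must be positive")
    return 100.0 * (control_secs - test_secs) / control_secs


# (control, test) timings in seconds, taken from the figures above.
runs = {
    "IT-FM-07": (89, 65),
    "IT-FM-04": (218, 139),
    "IT-PP-02": (556, 409),
}

for name, (control, test) in runs.items():
    print(f"{name}: {percent_faster(control, test):.0f}% faster")
```

Under that definition the three cases come out at 27%, 36% and 26%, matching the quoted values.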
comment:8 by , 5 years ago
Having rebuilt the libs with these, leave them be for the moment. Subsequent testing of other code changes will then highlight if they cause problems. But take them out of the ROPP100_prototype code for now.
comment:9 by , 5 years ago
I could not reproduce the results with ROPP 9. But I was able to reproduce the slowing down with ROPP 8.1, which is where I found the problem originally. To run a test similar to the one above using ropp_fm_bg2ro_1d, I generated a multifile with 1210 profiles and modified test1.sh to run with those. Running the loop (only twice instead of 5 times) first using dmi_trunk_9.0 based on ROPP 9.0, and then the old dmi_trunk_8.1 based on ROPP 8.1, with the repeated synchronization reintroduced as a test, I got:
IT-FM-DMI_cntl1.out:Mon Feb 3 11:25:28 CET 2020
IT-FM-DMI_cntl1.out:Mon Feb 3 11:26:51 CET 2020
= 83 sec
IT-FM-DMI_test1.out:Mon Feb 3 11:26:51 CET 2020
IT-FM-DMI_test1.out:Mon Feb 3 11:55:46 CET 2020
= 1735 sec
A factor of 21.
The factor of 60 was originally observed using the ropp_pp_occ_tool; I did not try that here. Note that ROPP 8.1 used the netcdf-4.1.3 library, whereas ROPP 9.0 uses netcdf-fortran-4.4.3 and netcdf-c-4.4.0. Most likely the problem is related to the netcdf-4.1.3 library, and thus not an issue in more recent versions of ROPP.
comment:10 by , 5 years ago
Thank you for doing the experiment. I agree it seems likely that the problem arises from use of the old version of the netCDF library.
I am instinctively reluctant to change code without good reason (because you never know how many users might be affected by a change), so I would prefer to leave this change out of the prototype code for now. To be discussed, so I'll leave the ticket open. Meanwhile, I will upload time-stamped versions of the modified files to this ticket, in case we want to resurrect the change.
by , 5 years ago
Attachment: | ncdf_putgetvar_scode.m4_01102019 added |
---|
ncdf_putgetvar_scode.m4_01102019
by , 5 years ago
Attachment: | ncdf_putgetvar_schar.m4_01102019 added |
---|
ncdf_putgetvar_schar.m4_01102019
by , 5 years ago
Attachment: | ncdf_putgetvar_acode.m4_01102019 added |
---|
ncdf_putgetvar_acode.m4_01102019
comment:11 by , 5 years ago
Cc: | added |
---|
comment:12 by , 5 years ago
Resolution: | → fixed |
---|---|
Status: | new → closed |
Stig writes:
I would be okay with leaving it as is now that it is no longer a problem in newer versions of ROPP. -Stig
Closing the ticket.
comment:13 by , 3 years ago
Resolution: | fixed |
---|---|
Status: | closed → reopened |
Version: | 9.0 → 11.0 |
comment:14 by , 3 years ago
Resolution: | → wontfix |
---|---|
Status: | reopened → closed |
Interesting. (Up-to-date reference: https://www.unidata.ucar.edu/software/netcdf/fortran/docs/f90_datasets.html#f90-nf90_sync). A factor of 60 speed-up is well worth having. I suggest we implement this (the removal of the call to nf90_sync), but do not introduce the NF90_SHARE flag in the read/writes. This will need careful documentation in the User Guide and Change Log.
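As a rough illustration of the trade-off discussed in this ticket, the sketch below contrasts forcing a synchronisation after every write with letting the data be flushed once when the file is closed. It is only an analogy: it uses Python's standard `os.fsync` on a plain file, not the netCDF Fortran `nf90_sync` call or any ROPP routine, and `write_records` and `outputs_match` are hypothetical names. The resulting files are identical either way; only the number of synchronisation calls differs, which is where any timing difference would come from.

```python
import os
import tempfile


def write_records(path, records, sync_every_write):
    """Write records to path, optionally forcing a sync after each one."""
    with open(path, "wb") as handle:
        for record in records:
            handle.write(record)
            if sync_every_write:
                handle.flush()             # push Python's buffer to the OS
                os.fsync(handle.fileno())  # ask the OS to commit it to disk
        # Without per-write syncs, the buffer is flushed once on close.


def outputs_match(n_records=200):
    """Write the same records both ways and compare the resulting files."""
    records = [f"profile {i}\n".encode() for i in range(n_records)]
    with tempfile.TemporaryDirectory() as tmp:
        synced = os.path.join(tmp, "synced.dat")
        unsynced = os.path.join(tmp, "unsynced.dat")
        write_records(synced, records, sync_every_write=True)
        write_records(unsynced, records, sync_every_write=False)
        with open(synced, "rb") as a, open(unsynced, "rb") as b:
            return a.read() == b.read()


if __name__ == "__main__":
    print("identical output:", outputs_match())
```

What per-write synchronisation buys is that the on-disk copy is current if the process is interrupted, or if another process reads the file while it is still being written. That is the kind of drawback to weigh before removing it.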