Opened 6 years ago

Closed 3 years ago

#554 closed task (wontfix)

Removed repeated synchronization when writing files [1 d]

Reported by: Ian Culverwell
Owned by: Ian Culverwell
Priority: normal
Milestone: ROPP9.1 carry over
Component: ropp_io
Version: 11.0
Keywords:
Cc: Stig Syndergaard

Description

Attachments (6)

test1.sh (1.6 KB) - added by Ian Culverwell 5 years ago.
test2.sh (1.7 KB) - added by Ian Culverwell 5 years ago.
ncdf_putvar.f90_01102019 (322.1 KB) - added by Ian Culverwell 5 years ago.
ncdf_putgetvar_scode.m4_01102019 (8.7 KB) - added by Ian Culverwell 5 years ago.
ncdf_putgetvar_schar.m4_01102019 (6.1 KB) - added by Ian Culverwell 5 years ago.
ncdf_putgetvar_acode.m4_01102019 (10.0 KB) - added by Ian Culverwell 5 years ago.


Change history (20)

comment:1 by Ian Culverwell, 5 years ago

Interesting. (Up-to-date reference: https://www.unidata.ucar.edu/software/netcdf/fortran/docs/f90_datasets.html#f90-nf90_sync). A factor-of-60 speed-up is well worth having. I suggest we implement this (the removal of the call to nf90_sync), but do not introduce the NF90_SHARE flag in the reads/writes. This will need careful documentation in the User Guide and Change Log.

comment:2 by Ian Culverwell, 5 years ago

I don't see the 60-fold speed-up that Stig reports. When I run ropp_fm_bg2ro_1d through the 500 profiles of the ROPP test folder file IT-FM-07.nc, five times, I find:

ROPP9.1: 
IT-FM-07_cntl1.out:Wed Oct  2 08:16:37 BST 2019
IT-FM-07_cntl1.out:Wed Oct  2 08:18:03 BST 2019
 = 86 secs

ROPP10.0:
IT-FM-07_test1.out:Wed Oct  2 08:18:03 BST 2019
IT-FM-07_test1.out:Wed Oct  2 08:19:28 BST 2019
 = 85 secs

No significant difference.

When I do the same on the 1000 profiles of the ROPP test folder file IT-FM-04.nc, five times, I find:

ROPP9.1: 
Wed Oct  2 08:02:33 BST 2019
Wed Oct  2 08:06:11 BST 2019
= 371 - 153 = 218 secs

ROPP10.0:
Wed Oct  2 08:06:11 BST 2019
Wed Oct  2 08:09:51 BST 2019
 = 591 - 371 = 220 secs

Again, no difference. See the attached test1.sh for details.
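For reference, the timing method used here (a date stamp before and after a fixed number of repeated runs) can be sketched roughly as below. This is not the attached test1.sh: the function name `time_runs` and the example command line are hypothetical, and `date +%s` is assumed to be available (it is in GNU and BSD `date`).

```shell
#!/bin/sh
# Hypothetical sketch of the timing method; not the attached test1.sh.
# Usage: time_runs N command [args...]
# Runs the command N times and prints the total wall-clock time in whole seconds.
time_runs() {
  n=$1
  shift
  start=$(date +%s)              # seconds since the epoch (GNU/BSD date)
  i=0
  while [ "$i" -lt "$n" ]; do
    "$@" >/dev/null || return 1  # stop early if any run fails
    i=$((i + 1))
  done
  end=$(date +%s)
  echo $((end - start))
}

# Example (hypothetical arguments): five passes over one test file.
# time_runs 5 ropp_fm_bg2ro_1d IT-FM-07.nc
```

Running the control and test builds back to back in this way gives whole-second totals comparable to the differences quoted above.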

by Ian Culverwell, 5 years ago

Attachment: test1.sh added

by Ian Culverwell, 5 years ago

Attachment: test2.sh added

comment:3 by Ian Culverwell, 5 years ago

When I run ropp_pp_occ_tool through the 50 profiles in the ROPP test folder file IT-PP-02.nc, five times, I find:

IT-PP-02_cntl1.out:Wed Oct  2 08:57:35 BST 2019
IT-PP-02_cntl1.out:Wed Oct  2 09:06:51 BST 2019
 = 556 secs

IT-PP-02_test1.out:Wed Oct  2 09:06:51 BST 2019
IT-PP-02_test1.out:Wed Oct  2 09:13:43 BST 2019
 = 412 secs

The removal of the file synchronisation makes it run about 25% faster.

comment:4 by Ian Culverwell, 5 years ago

The latter test was made with the attached test script test2.sh.

comment:5 by Ian Culverwell, 5 years ago

These results are not exactly overwhelming evidence for the change. Ask Stig for his timings. Perhaps it's an Ubuntu thing? Hold off implementing in ROPP10 until we have agreement to proceed from the ROPP GG - there may be drawbacks to removing file synchronisation.

comment:6 by Ian Culverwell, 5 years ago

I'm not sure I had recompiled the PP and FM routines. When I do so:

unix:> grep Oct IT-FM-07_cntl1.out IT-FM-07_test1.out
IT-FM-07_cntl1.out:Wed Oct  2 11:00:33 BST 2019
IT-FM-07_cntl1.out:Wed Oct  2 11:02:02 BST 2019 = 89 secs
IT-FM-07_test1.out:Wed Oct  2 11:19:05 BST 2019
IT-FM-07_test1.out:Wed Oct  2 11:20:11 BST 2019 = 66 secs
25% faster.

unix:> grep Oct IT-FM-04_????1.out
IT-FM-04_cntl1.out:Wed Oct  2 08:02:33 BST 2019
IT-FM-04_cntl1.out:Wed Oct  2 08:06:11 BST 2019 = 218 secs
IT-FM-04_test1.out:Wed Oct  2 11:22:53 BST 2019
IT-FM-04_test1.out:Wed Oct  2 11:25:13 BST 2019 = 140 secs
36% faster.

unix:> grep Oct IT-PP-02_????1.out
IT-PP-02_cntl1.out:Wed Oct  2 08:57:35 BST 2019
IT-PP-02_cntl1.out:Wed Oct  2 09:06:51 BST 2019 = 556 secs
IT-PP-02_test1.out:Wed Oct  2 11:31:30 BST 2019
IT-PP-02_test1.out:Wed Oct  2 11:38:20 BST 2019 = 410 secs
26% faster.

It's still not great.
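The elapsed times and percentages quoted above can be checked with a few lines of shell. This is only an illustrative sketch: the helper names `to_secs`, `elapsed` and `pct_faster` are made up for this purpose, and both time stamps are assumed to fall on the same day.

```shell
#!/bin/sh
# Seconds since midnight for an HH:MM:SS time stamp.
to_secs() {
  h=${1%%:*}; rest=${1#*:}
  m=${rest%%:*}; s=${rest#*:}
  # Strip one leading zero so that 08 and 09 are not read as invalid octal.
  echo $(( ${h#0} * 3600 + ${m#0} * 60 + ${s#0} ))
}

# Elapsed seconds between two time stamps on the same day.
elapsed() {
  echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}

# Percentage reduction in run time (control vs. test), to one decimal place.
pct_faster() {
  awk -v c="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (c - t) * 100 / c }'
}

elapsed 11:00:33 11:02:02   # IT-FM-07 control run: prints 89
elapsed 11:19:05 11:20:11   # IT-FM-07 test run:    prints 66
pct_faster 89 66            # prints 25.8
```

Note that the percentage here is the reduction in wall-clock time relative to the control run, which is how the figures in this comment are quoted.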

comment:7 by Ian Culverwell, 5 years ago

(Actual figures, after doing

aclocal -I m4 --force
automake -a -c
autoconf

as required because the macros changed:

unix:> grep Oct IT-FM-07_cntl1.out IT-FM-07_test1.out
IT-FM-07_cntl1.out:Wed Oct  2 11:00:33 BST 2019
IT-FM-07_cntl1.out:Wed Oct  2 11:02:02 BST 2019 = 89 secs
IT-FM-07_test1.out:Wed Oct  2 12:07:26 BST 2019
IT-FM-07_test1.out:Wed Oct  2 12:08:31 BST 2019 = 65 secs
27% faster.

unix:> grep Oct IT-FM-04_????1.out
IT-FM-04_cntl1.out:Wed Oct  2 08:02:33 BST 2019
IT-FM-04_cntl1.out:Wed Oct  2 08:06:11 BST 2019 = 218 secs
IT-FM-04_test1.out:Wed Oct  2 12:11:07 BST 2019
IT-FM-04_test1.out:Wed Oct  2 12:13:26 BST 2019 = 139 secs
36% faster.

unix:> grep Oct IT-PP-02_????1.out
IT-PP-02_cntl1.out:Wed Oct  2 08:57:35 BST 2019
IT-PP-02_cntl1.out:Wed Oct  2 09:06:51 BST 2019 = 556 secs
IT-PP-02_test1.out:Wed Oct  2 12:17:41 BST 2019
IT-PP-02_test1.out:Wed Oct  2 12:24:30 BST 2019 = 409 secs
26% faster.

No significant difference from the figures in comment:6.)

comment:8 by Ian Culverwell, 5 years ago

Having rebuilt the libraries with these changes, leave them in place for the moment. Subsequent testing of other code changes will then highlight whether they cause problems. But take them out of the ROPP100_prototype code for now.

comment:9 by Stig Syndergaard, 5 years ago

I could not reproduce the results with ROPP 9. But I was able to reproduce the slowing down with ROPP 8.1, which is where I found the problem originally. To run a test similar to the one above using ropp_fm_bg2ro_1d, I generated a multifile with 1210 profiles and modified test1.sh to run with those. Running the loop (only twice instead of 5 times) first using dmi_trunk_9.0 based on ROPP 9.0, and then the old dmi_trunk_8.1 based on ROPP 8.1, with the repeated synchronization reintroduced as a test, I got:

IT-FM-DMI_cntl1.out:Mon Feb  3 11:25:28 CET 2020
IT-FM-DMI_cntl1.out:Mon Feb  3 11:26:51 CET 2020 = 83 sec
IT-FM-DMI_test1.out:Mon Feb  3 11:26:51 CET 2020
IT-FM-DMI_test1.out:Mon Feb  3 11:55:46 CET 2020 = 1735 sec
A factor of 21.

The factor of 60 was originally observed using the ropp_pp_occ_tool; I did not try that here. Note that ROPP 8.1 used the netcdf-4.1.3 library, whereas ROPP 9.0 uses netcdf-fortran-4.4.3 and netcdf-c-4.4.0. Most likely the problem is related to the netcdf-4.1.3 library, and thus not an issue in more recent versions of ROPP.
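Since the slowdown appears to depend on the netCDF version, it may help to record which versions are installed when comparing timings. A minimal sketch, assuming the `nc-config` and `nf-config` helper tools shipped with netCDF-C and netCDF-Fortran are on the PATH; the function name `report_version` is illustrative, and the library that a given ROPP build actually links against depends on how it was configured, so this is only a first check.

```shell
#!/bin/sh
# Print the version reported by a netCDF config tool, or a note if it is absent.
report_version() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%s: %s\n' "$1" "$("$1" --version)"
  else
    printf '%s: not found\n' "$1"
  fi
}

report_version nc-config   # netCDF C library
report_version nf-config   # netCDF Fortran library
```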

comment:10 by Ian Culverwell, 5 years ago

Thank you for doing the experiment. I agree it seems likely that the problem arises from use of the old version of the netCDF library.

I am instinctively reluctant to change code without good reason (because you never know how many users might be affected by a change), so I would prefer to leave this change out of the prototype code for now. To be discussed, so I'll leave the ticket open. Meanwhile, I will upload time-stamped versions of the modified files to this ticket, in case we want to resurrect the change.

by Ian Culverwell, 5 years ago

Attachment: ncdf_putvar.f90_01102019 added

by Ian Culverwell, 5 years ago

Attachment: ncdf_putgetvar_scode.m4_01102019 added

by Ian Culverwell, 5 years ago

Attachment: ncdf_putgetvar_schar.m4_01102019 added

by Ian Culverwell, 5 years ago

Attachment: ncdf_putgetvar_acode.m4_01102019 added

comment:11 by Ian Culverwell, 5 years ago

Cc: Stig Syndergaard added

comment:12 by Ian Culverwell, 5 years ago

Resolution: fixed
Status: new → closed

Stig writes:

I would be okay with leaving it as is now that it is no longer a problem in newer versions of ROPP.
-Stig

Closing the ticket.

comment:13 by Ian Culverwell, 3 years ago

Resolution: fixed
Status: closed → reopened
Version: 9.0 → 11.0

comment:14 by Ian Culverwell, 3 years ago

Resolution: wontfix
Status: reopened → closed