Opened 8 years ago
Last modified 8 years ago
#480 new enhancement
Consider converting undulation files egm96.dat and corrcoef.dat to netCDF format
Reported by: | Ian Culverwell | Owned by: | Ian Culverwell |
---|---|---|---|
Priority: | normal | Milestone: | Whenever |
Component: | ROPP (all) | Version: | 8.0 |
Keywords: | undulation | Cc: |
Description
The zipping and unzipping of ropp_pp/data/egm96.dat and ropp_pp/data/corrcoef.dat has caused some unnecessary trouble during the testing of ROPP9.0. Do they really have to be text files?
Unzipped, they are quite big files:
idculv@eld037:> ls -ltr ropp_pp/data/*.dat
-rw-r--r-- 1 idculv satsense 5292621 Jul 26 16:08 corrcoef.dat
-rw-r--r-- 1 idculv satsense 5292378 Jul 26 16:08 egm96.dat
Zipped up, their sizes are
idculv@eld037:> ls -ltr ropp_pp/data/*.dat.gz
-rw-r--r-- 1 idculv satsense 1656792 Jul 26 16:08 corrcoef.dat.gz
-rw-r--r-- 1 idculv satsense 1802837 Jul 26 16:08 egm96.dat.gz
So zipping these text files is generally a good idea.
But corrcoef.dat contains 65341 lines like
idculv@eld037:> more ropp_pp/data/corrcoef.dat
0 0 -0.502745269977262810D+01 0.000000000000000000D+00
1 0 0.362815722250429060D+00 0.000000000000000000D+00
1 1 -0.104348574331318722D+01 -0.204625738821275838D+01
2 0 -0.318495437163588324D+01 0.000000000000000000D+00
2 1 0.196089895083435536D+00 -0.207358978500757085D+01
(These are just the coefficients in a spherical harmonic expansion.) (Note the huge amount of blank space stored at the end of each line!)
Storing these numbers at double precision in a netCDF file would need ~ 65341 * 2 * 8 = 1045456 bytes, which is about 37% smaller than corrcoef.dat.gz.
Similarly, egm96.dat has 65338 lines like
idculv@eld037:> more ropp_pp/data/egm96.dat
2 0 -0.484165371736E-03 0.000000000000E+00 0.35610635E-10 0.00000000E+00
2 1 -0.186987635955E-09 0.119528012031E-08 0.10000000E-29 0.10000000E-29
2 2 0.243914352398E-05 -0.140016683654E-05 0.53739154E-10 0.54353269E-10
3 0 0.957254173792E-06 0.000000000000E+00 0.18094237E-10 0.00000000E+00
The netCDF equivalent would occupy ~ 65338 * 4 * 8 = 2090816 bytes, which is only about 16% bigger than egm96.dat.gz.
Overall, the unzipped netCDF files would occupy about 10% less space than the zipped text files. More to the point, we wouldn't need to keep zipping and unzipping them. In addition, the files would be in a standard format, which ROPP users could take and use for their own ends.
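The size arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the byte counts are copied from the ls listings, netCDF header overhead is ignored, and all variable names are illustrative. It also checks that 65341 is the number of (degree, order) pairs in a triangular expansion up to degree 360, which is assumed here to be the truncation of both files.

```python
# Byte counts copied from the ls listings in the description.
corrcoef_gz = 1656792
egm96_gz = 1802837

# Line counts quoted in the description.
corrcoef_lines = 65341
egm96_lines = 65338

# Number of (n, m) pairs with 0 <= m <= n <= nmax (nmax = 360 is an assumption).
nmax = 360
n_pairs = (nmax + 1) * (nmax + 2) // 2

# Raw payload as 8-byte doubles, ignoring any netCDF header.
corrcoef_nc = corrcoef_lines * 2 * 8   # two data columns
egm96_nc = egm96_lines * 4 * 8         # four data columns

# Rounded percentage differences relative to the gzipped text files.
corrcoef_saving = round(100 * (1 - corrcoef_nc / corrcoef_gz))
egm96_growth = round(100 * (egm96_nc / egm96_gz - 1))
total_saving = round(100 * (1 - (corrcoef_nc + egm96_nc) / (corrcoef_gz + egm96_gz)))

print("pairs up to degree 360:", n_pairs)
print("corrcoef netCDF bytes:", corrcoef_nc, "saving %:", corrcoef_saving)
print("egm96 netCDF bytes:", egm96_nc, "growth %:", egm96_growth)
print("overall saving %:", total_saving)
```

Note that egm96.dat has three fewer lines than corrcoef.dat because its listing starts at degree 2.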
Obviously we would need to make sure that the proposed data formatting change wouldn't affect the results. In particular, attention should be paid to the data in corrcoef.dat. 18 significant digits are claimed for the two fields. This is (just) more than can be held in a netCDF double precision variable or a Fortran REAL(KIND=KIND(1.D0)) variable. So we would need to check that the apparent degradation of accuracy in the netCDF files did not in fact have any impact in the Fortran code.
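The precision point can be illustrated outside Fortran. The sketch below, in Python with only the standard library, parses the first coefficient quoted from corrcoef.dat and shows two things: the stored double is not exactly the 18-digit decimal, and 17 significant digits are enough to recover the stored double. It says nothing about whether the extra digits matter to Datum_HMSL; that still needs the check described above.

```python
from decimal import Decimal

# First coefficient quoted from corrcoef.dat in the description.
text = "-0.502745269977262810D+01"

# Python does not understand Fortran 'D' exponents, so swap in 'E'.
as_e = text.replace("D", "E")
value = float(as_e)

# Exact decimal written in the file versus the exact value of the double.
written = Decimal(as_e)
stored = Decimal(value)
print("rounding error on input:", stored - written)

# Print the double with 17 significant digits and read it back.
seventeen = f"{value:.16e}"
print(seventeen, float(seventeen) == value)
```

Any text reader that produces a double, whether Fortran or Python, performs this rounding once at read time, so the 18th digit is already lost before any netCDF file is involved.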
Attachments (2)
Change history (7)
comment:1 by , 8 years ago
comment:2 by , 8 years ago
In fact, it turns out that we don't even read the last two columns of egm96.dat, so they can be removed from the netCDF versions. Further, we can put all the information in one netCDF file, say und_coeffs.nc (attached, as is its Fortran generation program und_dat2nc.f90). This is about 2.1 MB in size:
-rw-r--r-- 1 idculv satsense 2091788 Jan 3 11:26 und_coeffs.nc
This is 60% of the combined size of the gzipped ascii files egm96.dat.gz and corrcoef.dat.gz.
(Zipping up und_coeffs.nc makes practically no difference to its size. In any case, there are already files in the ROPP distribution (eg ropp_1dvar/data/IT-1DVAR-01_b.nc at 2.7 MB) that are bigger than this. So we could live with an unzipped und_coeffs.nc.)
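As a rough cross-check of the figures in this comment, the Python sketch below compares the size of und_coeffs.nc with the gzipped text files and with a guessed payload. The guess, four double precision arrays of 65341 values each (two retained columns from each .dat file), is an assumption and has not been checked against und_dat2nc.f90.

```python
# Sizes copied from the listings in the ticket.
und_coeffs_nc = 2091788
corrcoef_gz = 1656792
egm96_gz = 1802837

# Assumed layout: 4 arrays x 65341 values x 8 bytes (not confirmed by the ticket).
assumed_payload = 4 * 65341 * 8

# Fraction of the combined gzipped size, as a rounded percentage.
percent_of_gz = round(100 * und_coeffs_nc / (corrcoef_gz + egm96_gz))

# Whatever is left over would be header and metadata under the assumed layout.
assumed_overhead = und_coeffs_nc - assumed_payload

print("percent of gzipped total:", percent_of_gz)
print("assumed payload bytes:", assumed_payload)
print("implied header bytes:", assumed_overhead)
```

Under that assumption almost the whole file is raw coefficient data, which would fit with gzip making practically no difference to its size.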
comment:3 by , 8 years ago
(Why are the last hc and cc so much smaller than the others?
idculv@eld037:> tail -5 egm96.dat
360 356 0.437069296408E-10 -0.104331448796E-09 0.50033977E-10 0.50033977E-10
360 357 -0.628042366728E-11 0.106635915741E-09 0.50033977E-10 0.50033977E-10
360 358 0.709604781531E-10 0.691761006753E-10 0.50033977E-10 0.50033977E-10
360 359 0.183971631467E-10 -0.310123632209E-10 0.50033977E-10 0.50033977E-10
360 360 -0.447516389678E-24 -0.830224945525E-10 0.50033977E-10 0.50033977E-10
idculv@eld037:> tail -5 corrcoef.dat
360 356 -0.554638277560201478D-02 -0.138393269720064329D-02
360 357 -0.119289584478972659D-02 -0.178910411538898653D-02
360 358 -0.444423721432539963D-02 -0.621882922713206093D-02
360 359 -0.738649105727355159D-02 0.547503707694727275D-03
360 360 0.205445744870012291D-17 0.725246966198756643D-02
)
comment:4 by , 8 years ago
How are we to read the data in the proposed netCDF file? The netCDF reading routines are in ropp_io, and the undulation calculating routine, SUBROUTINE Datum_HMSL, is in ropp_utils/coordinates/earth.f90. As a matter of design principle, ropp_utils does not depend on ropp_io. I suppose we would just have to use the 'native' netCDF library, by means of
USE netcdf
...
CALL check(nf90_open(ofile, nf90_nowrite, ncid))
CALL check(nf90_inq_varid(ncid, 'hc', hc_varid))
CALL check(nf90_get_var(ncid, hc_varid, hc))
CALL check(nf90_close(ncid))
(as for the attached und_dat2nc.f90), rather than the ROPP netCDF routines
USE ncdf
...
CALL ncdf_open(file)
CALL ncdf_getvar('hc', hc)
CALL ncdf_close()
In the first case we would have to augment the link statement with -lnetcdff -lnetcdf. I think this is probably OK - potential ROPP users would struggle to do anything without any netCDF routines. If they really want to use Datum_HMSL without netCDF, we can point them to the earlier code and datasets.
Actually doing the software engineering of this might require thinking caps to be put on. For starters, I think we'd need to include something like
CM_CHECK_MODULE(netcdf)
AM_CONDITIONAL(HAVE_NETCDF, test x$HAVE_MODULE_netcdf = xyes)
if test x$HAVE_MODULE_netcdf = xno ; then
  AC_MSG_WARN([])
  AC_MSG_WARN([PACKAGE NETCDF NOT FOUND])
  AC_MSG_WARN([THIS PACKAGE REQUIRES NETCDF TO BE INSTALLED FIRST.])
  AC_MSG_WARN([*** NOTE: ***])
  AC_MSG_WARN([*** Users wishing to install ROPP_IO must first have ***])
  AC_MSG_WARN([*** the NETCDF package installed before building ***])
  AC_MSG_WARN([*** this package. See ROPP Release Notes or ROPP ***])
  AC_MSG_WARN([*** User Guide for further details. ***])
  AC_MSG_WARN([])
  AC_MSG_ERROR([Module NETCDF not found])
fi
and
CM_CHECK_LIB(netcdf, nf_open, -lnetcdff -lnetcdf)
AM_CONDITIONAL(HAVE_NETCDF, test x$HAVE_LIBRARY_netcdf = xyes)
if test x$HAVE_LIBRARY_netcdf = xno ; then
  AC_MSG_WARN([])
  AC_MSG_WARN([LIBRARY NETCDF (Fortran) NOT FOUND])
  AC_MSG_WARN([THIS PACKAGE REQUIRES NETCDF TO BE INSTALLED FIRST.])
  AC_MSG_WARN([*** NOTE: ***])
  AC_MSG_WARN([*** Users wishing to install ROPP_IO must first have ***])
  AC_MSG_WARN([*** the NETCDF package installed before building ***])
  AC_MSG_WARN([*** this package. See ROPP Release Notes or ROPP ***])
  AC_MSG_WARN([*** User Guide for further details. ***])
  AC_MSG_WARN([])
  AC_MSG_ERROR([Library NETCDF not found])
fi
in ropp_utils/configure.ac, as is done in ropp_io/configure.ac.
comment:5 by , 8 years ago
(Note, finally for now, that the output of und_dat2nc.f90, namely
From egm96.dat: hc(1:10) =
0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 -0.48416537173599999769E-03 -0.18698763595500000226E-09
0.24391435239799998455E-05 0.95725417379199996818E-06 0.20299888218400000784E-05 0.90462776860499997739E-06 0.72107265705700001092E-06
From corrcoef.dat: cc(1:10) =
-0.50274526997726285416E+01 0.36281572225042907354E+00 -0.10434857433131872195E+01 -0.31849543716358832413E+01 0.19608989508343552255E+00
0.25007644753435060991E+01 0.45974797592494542897E+01 -0.11930287232961291066E+00 0.28147693676854443900E+01 0.43486814175551880002E+00
From und_coeffs.nc: hc(1:10) =
0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 -0.48416537173599999769E-03 -0.18698763595500000226E-09
0.24391435239799998455E-05 0.95725417379199996818E-06 0.20299888218400000784E-05 0.90462776860499997739E-06 0.72107265705700001092E-06
egm96.dat - und_coeffs.nc: hc(1:10) =
0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00
0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00
From und_coeffs.nc: cc(1:10) =
-0.50274526997726285416E+01 0.36281572225042907354E+00 -0.10434857433131872195E+01 -0.31849543716358832413E+01 0.19608989508343552255E+00
0.25007644753435060991E+01 0.45974797592494542897E+01 -0.11930287232961291066E+00 0.28147693676854443900E+01 0.43486814175551880002E+00
egm96.dat - und_coeffs.nc: cc(1:10) =
0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00
0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00 0.00000000000000000000E+00
suggests that in fact the numerical precision of the .dat files is exactly preserved in the .nc file, at least as far as a Fortran program that reads the data at double precision is concerned. This would appear to answer the final concern of the original posting.)
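Zero differences are what one would expect: once a text field has been parsed into a double, storing it as an 8-byte IEEE value and reading it back is lossless. A minimal Python sketch of that round trip follows. It uses struct as a stand-in for the binary file, an assumption made to keep the example to the standard library; it does not call netCDF, and parse_fortran_real is a hypothetical helper.

```python
import struct

# A few fields copied from the egm96.dat and corrcoef.dat excerpts in the description.
dat_fields = [
    "-0.484165371736E-03",
    "-0.186987635955E-09",
    "0.243914352398E-05",
    "-0.502745269977262810D+01",
    "0.362815722250429060D+00",
]


def parse_fortran_real(field: str) -> float:
    """Parse a Fortran real literal, accepting D as well as E exponents."""
    return float(field.upper().replace("D", "E"))


# Values as a double precision text reader would see them.
from_dat = [parse_fortran_real(f) for f in dat_fields]

# Stand-in for binary storage: 8-byte IEEE doubles (byte order chosen arbitrarily).
fmt = f">{len(from_dat)}d"
blob = struct.pack(fmt, *from_dat)
from_blob = list(struct.unpack(fmt, blob))

# Same comparison as in the program output: text-derived minus binary-derived.
differences = [a - b for a, b in zip(from_dat, from_blob)]
print(differences)
```

The only rounding happens when the text is first read, and that step is common to both the old and the proposed data paths.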
For the record, some dummy files of the right dimensions have sizes: