﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc
480	Consider converting undulation files egm96.dat and corrcoef.dat to netCDF format	Ian Culverwell	Ian Culverwell	"The zipping and unzipping of '''ropp_pp/data/egm96.dat''' and '''ropp_pp/data/corrcoef.dat''' have caused some unnecessary trouble during the testing of ROPP9.0.  Do they really have to be text files?

Unzipped, they are quite big files:
{{{
idculv@eld037:> ls -ltr ropp_pp/data/*.dat 
-rw-r--r-- 1 idculv satsense 5292621 Jul 26 16:08 corrcoef.dat
-rw-r--r-- 1 idculv satsense 5292378 Jul 26 16:08 egm96.dat
}}}

Zipped up, their sizes are
{{{
idculv@eld037:> ls -ltr ropp_pp/data/*.dat.gz           
-rw-r--r-- 1 idculv satsense 1656792 Jul 26 16:08 corrcoef.dat.gz
-rw-r--r-- 1 idculv satsense 1802837 Jul 26 16:08 egm96.dat.gz
}}}

So zipping these text files is generally a good idea.

But '''corrcoef.dat''' contains 65341 lines like
{{{
idculv@eld037:> more ropp_pp/data/corrcoef.dat 
   0   0  -0.502745269977262810D+01   0.000000000000000000D+00                  
   1   0   0.362815722250429060D+00   0.000000000000000000D+00                  
   1   1  -0.104348574331318722D+01  -0.204625738821275838D+01                  
   2   0  -0.318495437163588324D+01   0.000000000000000000D+00                  
   2   1   0.196089895083435536D+00  -0.207358978500757085D+01                  
}}}

(These are just the coefficients in a spherical harmonic expansion.  Note the large amount of blank space stored at the end of each line!)
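
As a rough illustration of what a converter would have to read, here is a minimal Python sketch that parses one line of this layout.  It assumes whitespace-separated fields (degree, order, then two coefficients) with Fortran-style D exponents, as in the excerpt above.  The name parse_coeff_line is hypothetical and is not part of ROPP.

```python
def parse_coeff_line(line):
    # Assumed layout: n, m, then two coefficients, separated by blanks.
    fields = line.split()  # split() also discards the trailing blanks
    n, m = int(fields[0]), int(fields[1])
    # Python's float() does not accept D exponents, so map them to E first.
    c, s = (float(f.replace('D', 'E')) for f in fields[2:4])
    return n, m, c, s

# Line copied from the corrcoef.dat excerpt, trailing blanks included.
sample = '   1   1  -0.104348574331318722D+01  -0.204625738821275838D+01                  '
print(parse_coeff_line(sample))
```

Only the first four fields are used, so the same sketch would also read the first two coefficient columns of a six-column line, but it would silently drop the rest.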

Storing these numbers at double precision in a netCDF file would need ~ 65341 * 2 * 8 = 1045456 bytes, which is about 37% smaller than '''corrcoef.dat.gz'''.

Similarly, '''egm96.dat''' has 65338 lines like
{{{
idculv@eld037:> more ropp_pp/data/egm96.dat         
   2   0 -0.484165371736E-03  0.000000000000E+00  0.35610635E-10  0.00000000E+00
   2   1 -0.186987635955E-09  0.119528012031E-08  0.10000000E-29  0.10000000E-29
   2   2  0.243914352398E-05 -0.140016683654E-05  0.53739154E-10  0.54353269E-10
   3   0  0.957254173792E-06  0.000000000000E+00  0.18094237E-10  0.00000000E+00
}}}

The netCDF equivalent would occupy ~ 65338 * 4 * 8 = 2090816 bytes, which is only about 16% bigger than '''egm96.dat.gz'''.

Overall, the unzipped netCDF files would occupy about 10% less space than the zipped text files.  More to the point, we wouldn't need to keep zipping and unzipping them.  In addition, the files would be in a standard format, which ROPP users could take and use for their own ends.
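
The size estimates can be rechecked with a few lines of Python.  This is only a sketch of the arithmetic: the byte counts and line counts are copied from the listings above, and it assumes that only the real-valued columns are stored as 8-byte doubles, with the (n, m) indices implied by the ordering and the netCDF header ignored.

```python
# Sizes in bytes of the gzipped text files, from the ls listing above.
gz_sizes = {'corrcoef.dat': 1656792, 'egm96.dat': 1802837}
# (number of lines, number of real-valued columns), also from above.
layout = {'corrcoef.dat': (65341, 2), 'egm96.dat': (65338, 4)}

BYTES_PER_DOUBLE = 8

def raw_size(name):
    # Payload only: indices and netCDF header are not counted (assumption).
    lines, columns = layout[name]
    return lines * columns * BYTES_PER_DOUBLE

def percent_change(new, old):
    # Positive means bigger than the gzipped file, negative means smaller.
    return 100.0 * (new - old) / old

for name in layout:
    size = raw_size(name)
    print(name, size, round(percent_change(size, gz_sizes[name]), 1))

total_raw = sum(raw_size(name) for name in layout)
total_gz = sum(gz_sizes.values())
print('total', total_raw, total_gz, round(percent_change(total_raw, total_gz), 1))
```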

Obviously we would need to make sure that the proposed change of data format wouldn't affect the results.  In particular, attention should be paid to the data in '''corrcoef.dat''', where 18 significant digits are claimed for the two fields.  This is (just) more than can be represented in a netCDF double precision variable ''and'' in a Fortran REAL(KIND=KIND(1.D0)) variable, each of which typically holds about 16 significant decimal digits.  So we would need to check that the apparent loss of accuracy in the netCDF files has no actual impact on the Fortran code.
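
One way to gauge the size of the effect is sketched below in Python.  It compares the exact decimal value printed in the text file with the nearest double, using a few values copied from the excerpt above.  It assumes that both the netCDF double and the Fortran variable are IEEE 754 binary64, which is an assumption rather than something checked against ROPP, and the helper names are hypothetical.

```python
from decimal import Decimal

def to_decimal(text):
    # Exact decimal value of a Fortran-style literal with a D exponent.
    return Decimal(text.replace('D', 'E'))

def relative_rounding_error(text):
    # Relative difference between the printed value and the nearest double.
    exact = to_decimal(text)
    as_double = Decimal(float(text.replace('D', 'E')))  # exact value of the double
    if exact == 0:
        return Decimal(0)
    return abs((as_double - exact) / exact)

# Values copied from the corrcoef.dat excerpt above.
samples = ['-0.502745269977262810D+01', '0.362815722250429060D+00',
           '-0.104348574331318722D+01', '-0.204625738821275838D+01']

# Unit roundoff of binary64, about 1.1e-16.
UNIT_ROUNDOFF = Decimal(2) ** -53

for text in samples:
    err = relative_rounding_error(text)
    print(text, float(err), err <= UNIT_ROUNDOFF)
```

If the existing Fortran read of the text file is also correctly rounded to double precision, the values it holds in memory would already be these same doubles, so the conversion would lose nothing further.  That still needs confirming against the actual ROPP code and compilers.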

"	enhancement	new	normal	Whenever	ROPP (all)	8.0		undulation	
