Opened 17 years ago

Closed 15 years ago

#122 closed enhancement (fixed)

Rationalise missing data values

Reported by: Dave Offiler
Owned by: Dave Offiler
Priority: normal
Milestone: 3.0
Component: ropp_io
Version: 1.0
Keywords: Missing data value, range check
Cc:

Description

Currently, ropp_io_init() initialises the ROPP structures with hard-coded missing data values (MDV). These values vary with the parameter, and were designed to fit the field widths of the ROPP text-formatted files whilst being clearly 'special', out-of-range values. As such, the MDV can be -9.0, -99.0, -999.0 etc. Testing for MDV is currently a hit-and-miss affair, both because of this parameter dependency and because files not created with ROPP may have completely different MDVs. (This is the case with GFZ ROPP-format text files today: for instance, the Temp_sigma MDV in ROPP is -999.0, but GFZ sets missing values to -9.0.)

Definition and use of MDVs needs to be rationalised by implementing the following requirements:

1) The only place where MDVs are actually defined is in the hard-coded ropp_io_init() routine. MDVs should be more openly defined, for instance as new structures complementing the valid range structures. In principle, the MDV default values so defined could be over-ridden by the user.

2) The default MDVs should be documented, e.g. in the ROPP Interface File Format document parameter tables.

3) Use of MDVs for initialisation and testing should then use the structure values and not assumed hard-coded values.

4) Create a user-callable routine specifically to check for valid parameter values (using the range structures) and to substitute the parameter-dependent MDV (from the above MDV structures) for out-of-range parameter values. NB some parameters come in sets and may need to be checked in combination; it may not be appropriate to check them individually - e.g. POD (X,Y,Z) is all zero for POD MDV, but zero is valid for any single POD component.

5) Call this check routine after reading or before writing a file, or before thinning a profile. It should be possible for a user to optionally suppress this default behaviour.

6) MDVs should be protected from units conversions.

7) All parameter attributes should be consistently saved to the netCDF file and read back.

The above should be implemented in roughly that order of priority. Some (e.g. 1, 2, 3) should be targeted for the v1.1 release; the others could be implemented in later v1.x releases, depending on the complexity of the implementation, resources and pressure to release v1.1, e.g. for critical bug fixes elsewhere.

For later releases, consideration should be given to the possibility of defining a single common (parameter-independent) MDV for internal and netCDF file use, reserving parameter-dependent MDVs only for text-formatted I/O. The check routine could have an option flag in the argument list to switch between the two MDV sets.
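As an illustration only, the range check with a switchable MDV set could be sketched as below. This is Python rather than ROPP Fortran, and it is not taken from the ROPP code: the names `VALID_RANGE`, `TEXT_MDV`, `COMMON_MDV` and `range_check` are hypothetical, the range limits are invented for the example, and the common MDV is a placeholder value. Only the Temp_sigma text MDV of -999.0 comes from the description above.

```python
# Hypothetical tables; the range limits are invented for this example.
VALID_RANGE = {"temp_sigma": (0.0, 50.0)}
TEXT_MDV = {"temp_sigma": -999.0}  # parameter-dependent MDV for text I/O
COMMON_MDV = -9999.9               # placeholder parameter-independent MDV


def range_check(name, values, use_common_mdv=False):
    """Return a copy of values with out-of-range entries replaced by an MDV.

    The use_common_mdv flag switches between the parameter-dependent
    (text-format) MDV and the single common MDV.
    """
    lo, hi = VALID_RANGE[name]
    mdv = COMMON_MDV if use_common_mdv else TEXT_MDV[name]
    return [v if lo <= v <= hi else mdv for v in values]
```

With these assumed tables, a GFZ-style -9.0 in Temp_sigma falls outside the valid range and is replaced by whichever MDV the flag selects, so downstream code only ever has to test for one value per parameter.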

Indeed, we could (and eventually should) go further and remove direct support for text files from the generic ROPP read/write routines, and perhaps provide only a simple, stand-alone text/netCDF converter. At some point, the text file should be deprecated and then become unsupported entirely in favour of the netCDF format. Text in CDL for manual inspection can be provided by ncdump (and ncgen can re-import CDL).

Change history (7)

comment:1 by Dave Offiler, 16 years ago

Issue of MDVs has been rationalised thus (see numbering above) for release v1.1:

1) MDVs are defined in ropp_io_types.f90. Essentially, there are two: ropp_io_mdfv=-9999.9 and ropp_io_zero=0.0. Values are initialised in ropp_io_init() to whichever of these two MDVs is appropriate to the data type - e.g. POD triplets are all set to zero. The exception is date/time values, which are initialised to values that can still be printed (or internally converted to strings) without generating '*'. All numeric variables have valid ranges, which are included in the main ROPP structure.

2) The next issue of the ROPP Interface File document will show the MDV for every parameter.

3) Parameter ranges in the main ROPP structure are used for range checking.

4) Added new routine ropp_io_rangecheck(). All numeric values are range-checked using the ROPP structure ranges, and the MDV is substituted consistently with initialisation. In addition, all string variables are checked for valid character set(s), and all profiles are checked for valid basic coordinate (time/height) values, with any invalid levels removed.

5) Called (a) by ropp_io_thin() before actual thinning; (b) by ropp2ropp after processing and (c) by ropp_io_write() prior to writing. The latter has an optional flag in the argument list to suppress range checking. This is used by the (new) test2ropp tool to allow deliberately invalid data to be output for testing other tools' use of ropp_io_rangecheck().

6) Units conversion is applied only to the parameter value and its associated range values.

7) Range and units for all numeric variables saved to netCDF file.

All suggested improvements have been implemented.
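The behaviour described in items 1 and 4 can be sketched roughly as follows. This is an illustrative Python sketch, not the Fortran ropp_io_rangecheck() itself: the function names and the ranges passed in are hypothetical, and only the two constants mirror the values quoted for ropp_io_mdfv and ropp_io_zero.

```python
ROPP_IO_MDFV = -9999.9  # common missing data flag value quoted above
ROPP_IO_ZERO = 0.0      # value used to initialise vectors such as POD triplets


def check_profile(heights, temps, height_range, temp_range):
    """Drop levels with an invalid height, then flag out-of-range temperatures."""
    h_lo, h_hi = height_range
    t_lo, t_hi = temp_range
    kept_h, kept_t = [], []
    for h, t in zip(heights, temps):
        if not (h_lo <= h <= h_hi):
            continue  # invalid coordinate: remove the whole level
        kept_h.append(h)
        kept_t.append(t if t_lo <= t <= t_hi else ROPP_IO_MDFV)
    return kept_h, kept_t


def triplet_is_missing(vec):
    """A vector is treated as missing only if every component is ROPP_IO_ZERO."""
    return all(c == ROPP_IO_ZERO for c in vec)
```

Checking the triplet as a whole reflects the point made in the description: zero is a valid value for any single POD component, so a per-component test would wrongly flag good data.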

A common MDV (-9999.9) has been introduced for almost all numeric variables. The exceptions are vectors (e.g. POD and Georef triplets) and values commonly converted internally to strings (notably date/time). This value required minor adjustments to a few field widths in the ROPP text-based file formatting; in all but one case these were automatically back-compatible, and the remaining one has been fixed to be so. When the text file is finally dead (see below), this MDV can be redefined as an even more negative number (e.g. -9999999.9, like the BUFR interface), which would be invalid even for individual POD components.

For release v1.1, text files will be declared as deprecated; all tools except ropp2ropp no longer support the -t switch to write text files. ropp2ropp will follow in a future release (v1.2?). Tools can still read text files, though this is to be removed in a future release (v2.0?). A new gfz2ropp tool has been developed to assist GFZ to move away from ROPP text files as the interface to ropp2bufr.

Leaving this ticket open until: a) a standalone text2ropp tool is in place; b) writing of text files is no longer supported; c) the MDV defined value is finalised.

comment:2 by Huw Lewis, 16 years ago

Added check in ncdf_getvar and ncdf_putvar routines to perform unit conversion on valid data only. All data is in fact converted, but those data points read in as missing (MDV) are reset to ropp_io_mdfv value after conversion. See [1569].
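A minimal sketch of this convert-then-reset approach is shown below. It is Python for illustration only, not the code in ncdf_getvar or ncdf_putvar; `convert_units`, `scale` and `offset` are hypothetical names, and a simple linear conversion is assumed.

```python
ROPP_IO_MDFV = -9999.9  # common missing data flag value


def convert_units(values, scale, offset=0.0, mdv=ROPP_IO_MDFV):
    """Apply a linear unit conversion, leaving missing points at the MDV.

    Every point is converted first; points that were missing on input
    are then reset to the MDV, so the flag value is never rescaled.
    """
    converted = [v * scale + offset for v in values]
    # Exact comparison is used for simplicity (an assumption of this sketch).
    return [mdv if v == mdv else c for v, c in zip(values, converted)]
```

For example, converting `[1000.0, -9999.9]` from hPa to Pa with `scale=100.0` leaves the second element at -9999.9 rather than turning it into -999990.0, which a later test for the MDV would no longer recognise as missing.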

Ticket left open until: a) a standalone text2ropp tool is in place; b) writing of text files is no longer supported; c) the MDV defined value is finalised.

comment:3 by (none), 16 years ago

Milestone: 2.0

Milestone 2.0 deleted

comment:4 by Huw Lewis, 16 years ago

Milestone: 2.0

comment:5 by Huw Lewis, 16 years ago

Milestone: 2.0 → 3.0

Standalone text2ropp tools planned for ROPP-3, as part of rationalisation of all read/write code after formal removal of text support for both reading and writing.

Ticket remains open. Moving milestone back to v3.0.

comment:6 by Huw Lewis, 16 years ago

Type: defect → enhancement

comment:7 by Huw Lewis, 15 years ago

Resolution: fixed
Status: new → closed

text2ropp tool now available as part of ROPP-3 distribution [2105], and all main read/write tools only work with flavours of netCDF format files.

All MDVs are defined in ropp_io_types (allowing flexibility for users to adapt these as required).

Ticket closed as fixed.
