Opened 15 years ago

Closed 12 years ago

#170 closed enhancement (fixed)

Range check missing data treatment

Reported by: Huw Lewis Owned by: Dave Offiler
Priority: normal Milestone: 6.1
Component: ropp_io Version: 3.0
Keywords: range check Cc: dave.offiler@…

Description

Input from Axel:

And, back at the old thinner discussion, even if ropp is not able to process one occultation in a file, it should still output it, even if all are set to missing. Thus one always has the same number of occultations going in and going out. For me that is a major simplification.

HL replied: "This should now be the case for the ROPP thinner. Are you now saying this in the context of the missing profiles from your FM test?"

From Axel:

no, I am not noticing anything wrong with the thinner yet, but we had this discussion already with the thinner, where the old ROPP just ignored profiles when there was nothing to thin, while I said that in these cases I want an occultation with all missing data. So nothing to worry about in the thinner. But if the forward propagator is doing the same thing, then I'd like to have that changed.

ropp_io_rangecheck currently 'zaps' an output Level if all its variables are missing. This can lead to problems, particularly if, for example, there is missing data in the first profile of a multifile. It may also confuse users when the output structure is not consistent with the input. I believe these checks in ropp_io_rangecheck were implemented for efficiency, but we do not save much space by not writing missing data to file.

We should consider removing this 'zapping' approach to missing data, or having two levels of range checking: one where users can choose to keep only valid data in the output file, and one where users can choose to output all data (missing and valid), guaranteeing the same number of profiles, and the same number of data points, in the input and output files of the ropp_io, ropp_fm and ropp_1dvar applications.

Attachments (2)

FirstOccNoData.nc (2.1 MB) - added by Ian Culverwell 13 years ago.
Ranchk.doc (97.0 KB) - added by Ian Culverwell 13 years ago.

Change history (20)

comment:1 by Huw Lewis, 15 years ago

Milestone: 4.0 → 4.1

No update to report at ROPP-4 (though see also #183 and #174). Ticket moved to milestone 4.1.

comment:2 by Huw Lewis, 14 years ago

Milestone: 4.1 → 5.0
Owner: changed from Huw Lewis to Dave Offiler
Status: new → assigned

Need to consider further ahead of ROPP-5.

comment:3 by Dave Offiler, 14 years ago

We probably need a sub-flag which allows range checking (and setting out-of-range values to 'missing') but then skips setting Npoints to zero if all values for the critical parameters are missing. This could be triggered automatically on detecting that the output is a multifile and this is its first profile. Other cases are non-critical.

comment:4 by Dave Offiler, 13 years ago

Milestone: 5.0 → 5.1

comment:5 by Ian Culverwell, 13 years ago

Implemented Axel's -no_ranchk and -impactalt options to ropp_io.f90. These are invoked by calling ropp2ropp, ucar2ropp and gfz2ropp with the appropriate options. (-impactalt is absent from gfz2ropp, which does no thinning.) Man pages, subroutines and the IO UG have been updated.

With -impactalt, thinning is performed on (IP - ROC - undulation), rather than on (IP - ROC), which remains the default.
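As a rough illustration of the two thinning coordinates, the Python sketch below computes both from an impact parameter (IP), a radius of curvature (ROC) and a geoid undulation, all in metres. The function names are hypothetical, the input values are made up, and this is not ROPP code; it only shows that -impactalt shifts the thinning coordinate by the undulation.

```python
def impact_height(impact_param_m, roc_m):
    """Default thinning coordinate: IP - ROC (metres). Hypothetical helper."""
    return impact_param_m - roc_m


def impact_altitude(impact_param_m, roc_m, undulation_m):
    """Coordinate used with -impactalt: IP - ROC - undulation (metres). Hypothetical helper."""
    return impact_param_m - roc_m - undulation_m


# Illustrative values only, not taken from a real occultation.
ip, roc, und = 6_390_000.0, 6_370_000.0, 45.0
print(impact_height(ip, roc))         # 20000.0
print(impact_altitude(ip, roc, und))  # 19955.0
```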

-no_ranchk simply disables all range checking, so no profiles are discarded, which ensures the same number of profiles in as out. It also means that any old rubbish is passed through. A better solution is Dave's sub-keyword. I think this can be postponed to ROPP6.0, unless inspiration strikes sooner.

comment:6 by Ian Culverwell, 13 years ago

Should have added: Axel generated a 3-profile multifile, with missing data in the 1st and 3rd profiles. When it is passed through the original ropp2ropp we get:

ropp2ropp -m FirstOccNoData.nc -o FirstOccNoData_out.nc

--------------------------------------------------------
ROPP-to-ROPP Tool
13:11UT 15-Aug-2011
--------------------------------------------------------
INFO (from ropp2ropp): Reading /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc
INFO (from ropp2ropp): Profile 1 : OC_20110524050753_META_G025_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_out.nc
INFO (from ropp2ropp): Profile 2 : OC_20110524050819_META_G026_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_out.nc
ERROR: Variable not found: lat_tp
FATAL ERROR (from ropp_io_write_ncdf_put): NetCDF: Variable not found

But with the -no_ranchk option:

ropp2ropp -m -no_ranchk FirstOccNoData.nc -o FirstOccNoData_out.nc

--------------------------------------------------------
ROPP-to-ROPP Tool
13:11UT 15-Aug-2011
--------------------------------------------------------
INFO (from ropp2ropp): Reading /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc
INFO (from ropp2ropp): Profile 1 : OC_20110524050753_META_G025_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_out.nc
INFO (from ropp2ropp): Profile 2 : OC_20110524050819_META_G026_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_out.nc
INFO (from ropp2ropp): Profile 3 : OC_20110524050914_META_G015_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_out.nc
INFO (from ropp2ropp): 3 profiles processed

The output file looks as expected: missing data in the 1st and 3rd profiles, and sensible data (same as input) in the 2nd.

by Ian Culverwell, 13 years ago

Attachment: FirstOccNoData.nc added

comment:7 by Dave Offiler, 13 years ago

Axel's solution of simply skipping the range check altogether is heavy-handed, and could in fact cause problems for the thinner, which assumes the check has been done. Unless you know what you're doing and accept the consequences, range checking should be a strong default (if not mandatory).

What Axel requires is a guaranteed profile size; profiles should still be range checked. It's the current practice of zeroing the profile Npoints that needs changing.

Instead, I've devised a more flexible solution:

1) In each Level structure, add a new LOGICAL :: Missing flag. This flag shall indicate whether that profile part contains no valid data (True) or has at least some valid data (False), whether Npoints is zero or not. If Npoints=0, Missing shall be set T.

2) When an RO structure (or any individual Level) is initialized via ropp_io_init(), set Npoints=0 (as now) and Missing=T

3) When an RO structure is read in via ropp_io_read(), Npoints is set to the profile length (as now) and Missing=F. If a profile is created other than via the ROPP netCDF reader interface, the user must reset this flag to F if Npoints > 0 (this needs to be documented in the User Guide).

4) On running ropp_io_rangecheck() [in normal usage, non-optional]

  • after range checking any Level, any invalid/missing coordinate (time, impact parameter, altitude, etc) is removed from the profile. In the extreme case, this may lead to no remaining data and Npoints=0. This behaviour is unchanged in the new scheme, except that Lev2c is now treated consistently (ie the surface geopotential is considered the vertical coordinate, so will be removed if invalid/missing).
  • after range checking, if all observed (ie non-coordinate) data in the (remaining) profile is invalid/missing, instead of setting Npoints=0 as now, set Missing=T

5) It is now up to the calling application to decide whether to write out a full profile or not. For instance:

  • the default behaviour of ropp_io_write() suppresses output of a level profile if Npoints=0. This is unchanged. In the new scheme, range checking missing data no longer zeros Npoints, so a full profile (of missing data) will now be output to the netCDF file. This will occur for all the ROPP tools which write to netCDF via this routine.
  • the BUFR encoder, on the other hand, should not encode profiles that are all missing; it previously relied on Npoints for this, and now needs to inspect Missing instead.

6) The ropp_io_write() routine retains the existing ranchk flag to disable range checking, since this is required by test2ropp so that deliberately invalid data can pass unscathed into the netCDF file for testing of other applications. However, I've commented in the headers that this flag is for testing, and should not normally be used (or if it is, it should be set T).

7) With Ian's agreement, I've changed the command line option to thin on impact altitudes from -impactalt to -i for consistency with other flags and general Unix style (long options should have two dashes). The internal operation of this option is unchanged.

8) The -no_ranchk command line option has been removed from all tools except ropp2ropp, and there it is renamed to --no-ranchk (again, for consistency with Unix style). This option shall not be documented as a user option (ie it shall not appear in the tool's help output, man page or User Guide). It is intended only for developer testing, or for cognoscenti with a genuine reason to bypass range checking.
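The following Python sketch is a toy model of steps 2 to 5 of the scheme. The Level class, field names and helper functions are hypothetical stand-ins for the Fortran derived types and routines, and missing values are represented by None for simplicity. It shows the key change: range checking still removes points with missing coordinates, but a profile whose observations are all missing keeps its Npoints and is flagged Missing instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Level:
    """Toy stand-in for one profile Level (e.g. Lev1b); not the ROPP derived type."""
    coord: List[Optional[float]] = field(default_factory=list)  # e.g. impact parameter
    obs: List[Optional[float]] = field(default_factory=list)    # e.g. bending angle
    npoints: int = 0
    missing: bool = True  # step 2: an initialised Level has Npoints=0, Missing=T


def read_level(coord, obs):
    """Step 3: a profile read from file has Npoints = length and Missing=F."""
    return Level(list(coord), list(obs), len(coord), False)


def rangecheck(lev, obs_min, obs_max):
    """Step 4: remove points whose coordinate is missing, set out-of-range
    observations to missing, and set Missing=T (instead of Npoints=0) if no
    valid observation remains."""
    kept = [(c, o if o is not None and obs_min <= o <= obs_max else None)
            for c, o in zip(lev.coord, lev.obs) if c is not None]
    lev.coord = [c for c, _ in kept]
    lev.obs = [o for _, o in kept]
    lev.npoints = len(kept)
    if lev.npoints == 0 or all(o is None for o in lev.obs):
        lev.missing = True
    return lev


def should_write_netcdf(lev):
    """Step 5: the netCDF writer still only skips Levels with Npoints=0."""
    return lev.npoints > 0


def should_encode_bufr(lev):
    """Step 5: the BUFR encoder additionally skips Levels flagged as Missing."""
    return lev.npoints > 0 and not lev.missing


lev = rangecheck(read_level([1.0, 2.0, None], [None, 99.0, 0.5]), 0.0, 1.0)
print(lev.npoints, lev.missing)                           # 2 True
print(should_write_netcdf(lev), should_encode_bufr(lev))  # True False
```

The design point is that the decision to drop an all-missing profile moves out of the range check and into each writer, so netCDF output keeps a full-length profile of missing data while BUFR output can still omit it.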

ToDo

  • update ropp2bufr to use Missing instead of Npoints
  • consider whether the new behaviour (keep missing profiles intact) should be non-optional, optional but default or optional but non-default. Add command line options accordingly
  • test all tools with appropriate data
  • check in-line RoboDoc comments are consistent; update man pages; document Missing flag in I/O UG

comment:8 by Dave Offiler, 13 years ago

Update (to ROPP_IO):

  • ropp2bufr now uses the new Missing flag to set Npoints=0 before encoding that sub-profile, so the old behaviour is effectively unchanged (viz. profiles with no valid observations are not encoded)
  • In-line help text, Robodoc headers and man pages updated (references to the [hidden] range checking option removed)
  • test2ropp upgraded to support 2 additional modes to test the new flagging scheme:
    • BADPROF: generate valid coordinates, but invalid observations
    • MISPROF: generate valid coordinates, but missing observations

The intent of the new test2ropp modes was that a user could generate a dummy profile with MISPROF, with at least the largest number of samples in their collection. This dummy file could then be input to any ROPP tool as the first file, thus setting up the output netCDF dimensions and ensuring that all subsequent 'real' profiles are no larger. Even if the first user profile happened to be smaller than usual (or even all missing), the dummy profile would protect the dimension. Pre-running the dummy profile through ropp2ropp with the same thinner that is being used for the real profiles would ensure the same number of output samples, with the same height/altitude values.

However, this trick doesn't work with Axel's example netCDF file:

  • this file has all parameters missing (Axel's email mentions cases only where BA is missing) so all samples are removed, and Npoints=0 anyway.
  • this file contains non-core parameters which test2ropp knows nothing about. Because the dummy profile does not define them, outputting later occultations that contain such arbitrary new parameters causes ropp_io_write() to bail out.

In this case, using the -no-ranchk flag seems the only option. But this also exposes another basic netCDF limitation: new parameters can't be introduced after the first profile in a multifile sequence.
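A toy Python model of this limitation may help explain the "Variable not found: lat_tp" failure in comment:6. It assumes, as the error message suggests, that lat_tp was never defined because the first profile was written without valid data. MultifileWriter is a hypothetical class, not the netCDF API or ROPP's writer: the set of variables is fixed by the first profile, and a later profile carrying an undefined variable is rejected.

```python
class MultifileWriter:
    """Toy model of a multifile writer (not the netCDF API): the set of
    variables is fixed by the first profile written, and later profiles may
    only fill variables that already exist."""

    def __init__(self):
        self.defined = None  # variable names, fixed on the first write
        self.records = []

    def write(self, profile):
        """profile: dict of variable name -> value; a zapped Level contributes no entries."""
        if self.defined is None:
            self.defined = set(profile)
        unknown = sorted(set(profile) - self.defined)
        if unknown:
            raise KeyError(f"Variable not found: {unknown[0]}")
        self.records.append(profile)


w = MultifileWriter()
w.write({"occ_id": "OCC1"})  # first profile zapped: header information only
try:
    w.write({"occ_id": "OCC2", "lat_tp": 45.0})  # second profile has real data
except KeyError as err:
    print("write failed:", err)
```

Under this model, writing a full (all-missing) first profile rather than an empty one would define every variable up front, which is the behaviour Axel asked for.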

While testing, we also came across another issue related to range checking being performed only after unit conversion at a lower level on read. A separate ticket is to be raised.

Changes to ROPP_IO checked in as [2998].

ToDo: check through the tools in the other ROPP modules.

comment:9 by Dave Offiler, 13 years ago

All tools in all modules checked to:

  • change the command-line flag from -no_ranchk to -no-ranchk, for compatibility and a better fit with Unix conventions (though ideally it should be --no-ranchk)
  • remove any mention of this option from the help output (-h)
  • remove any mention of this option from the RoboDoc headers

These changes will be checked in as part of a general code tidy across all modules.

In addition to -no-ranchk, ropp2ropp also has a -no-zapem (disable 'Zero if All Profile Elements Missing') flag. The default action is that if the profile Missing flag is set (having performed a range check), set Npoints=0 (i.e. zapem); the new flag disables this behaviour. Obviously if range check is disabled, this flag is redundant.
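The zapem rule itself is a one-line decision, sketched below in Python. LevelState and apply_zapem are hypothetical names, and the sketch deliberately does not model how zapem interacts with --no-ranchk; it only shows the default (zero Npoints when Missing is set) and the effect of --no-zapem.

```python
from dataclasses import dataclass


@dataclass
class LevelState:
    """Hypothetical summary of one Level after range checking."""
    npoints: int
    missing: bool


def apply_zapem(lev, zapem=True):
    """'Zero if All Profile Elements Missing': if the Missing flag is set,
    zero Npoints so the Level is not written. --no-zapem maps to zapem=False."""
    if zapem and lev.missing:
        lev.npoints = 0
    return lev


print(apply_zapem(LevelState(247, True)).npoints)               # 0
print(apply_zapem(LevelState(247, True), zapem=False).npoints)  # 247
print(apply_zapem(LevelState(247, False)).npoints)              # 247
```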

Leaving this ticket open until Axel confirms that the new ranchk/zapem behaviour meets his requirements (or not).

comment:10 by Ian Culverwell, 13 years ago

I've retested Axel's example file with the latest version of ropp2ropp. We get:

No options
============================================================
Executing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/branches/dev/Share/ROPP5.1_prototype/ropp_io/tools/ropp2ropp -m /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc -o /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_none_out.nc
============================================================

ROPP-to-ROPP generic netCDF tool

INFO (from ropp2ropp): Reading /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc
INFO (from ropp2ropp): Profile 1 : OC_20110524050753_META_G025_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_none_out.nc
INFO (from ropp2ropp): Profile 2 : OC_20110524050819_META_G026_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_none_out.nc
ERROR: Variable not found: lat_tp
FATAL ERROR (from ropp_io_write_ncdf_put): NetCDF: Variable not found

-no-ranchk
============================================================
Executing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/branches/dev/Share/ROPP5.1_prototype/ropp_io/tools/ropp2ropp -m --no-ranchk /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc -o /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_--no-ranchk_out.nc
============================================================

ROPP-to-ROPP generic netCDF tool

WARNING (from ropp2ropp): Range checking is disabled
INFO (from ropp2ropp): Reading /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc
INFO (from ropp2ropp): Profile 1 : OC_20110524050753_META_G025_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_--no-ranchk_out.nc
INFO (from ropp2ropp): Profile 2 : OC_20110524050819_META_G026_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_--no-ranchk_out.nc
INFO (from ropp2ropp): Profile 3 : OC_20110524050914_META_G015_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_--no-ranchk_out.nc
INFO (from ropp2ropp): 3 profiles processed

-no-zapem
============================================================
Executing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/branches/dev/Share/ROPP5.1_prototype/ropp_io/tools/ropp2ropp -m --no-zapem /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc -o /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_--no-zapem_out.nc
============================================================

ROPP-to-ROPP generic netCDF tool

INFO (from ropp2ropp): Reading /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc
INFO (from ropp2ropp): Profile 1 : OC_20110524050753_META_G025_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_--no-zapem_out.nc
INFO (from ropp2ropp): Profile 2 : OC_20110524050819_META_G026_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_--no-zapem_out.nc
ERROR: Variable not found: lat_tp
FATAL ERROR (from ropp_io_write_ncdf_put): NetCDF: Variable not found

-no-ranchk -no-zapem
============================================================
Executing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/branches/dev/Share/ROPP5.1_prototype/ropp_io/tools/ropp2ropp -m --no-ranchk --no-zapem /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc -o /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_both_out.nc
============================================================

ROPP-to-ROPP generic netCDF tool

WARNING (from ropp2ropp): Range checking is disabled
INFO (from ropp2ropp): Reading /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData.nc
INFO (from ropp2ropp): Profile 1 : OC_20110524050753_META_G025_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_both_out.nc
INFO (from ropp2ropp): Profile 2 : OC_20110524050819_META_G026_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_both_out.nc
INFO (from ropp2ropp): Profile 3 : OC_20110524050914_META_G015_EUME
INFO (from ropp2ropp): Writing /data/nwp1/idculv/ROPP/ropp-5.1/ropp_src/Tests/Ranchk/FirstOccNoData_both_out.nc
INFO (from ropp2ropp): 3 profiles processed

The 1st and 3rd output files are identical (2 profiles, corresponding to the 1st and 2nd in the input file, but with no meteorological data, only occ_ids etc).

The 2nd and 4th output files are identical (3 profiles, the 1st and last missing, as in the input file, and the middle one untouched).

I've asked Axel to (re-)confirm that he's happy with this. In particular, is he happy that the -no-ranchk option has been removed from

comment:11 by Ian Culverwell, 13 years ago

... ucar2ropp and gfz2ropp.

comment:12 by Ian Culverwell, 13 years ago

Axel said to go with what we've got at 5.1. He'll then test this, and if it's not good enough, we'll do something else for ROPP6.0.

comment:13 by Ian Culverwell, 13 years ago

Resolution: fixed
Status: assigned → closed

Axel found a couple of profiles where it failed: one because of a mistake in ropp_io_thin; one because of a mistake in the coding of the --no-ranchk option. Both now corrected.

No further problems, so close ticket (for now).

comment:14 by Ian Culverwell, 13 years ago

Milestone: 5.1 → 6.1
Resolution: fixed
Status: closed → reopened

Further discussion with Axel reveals that he needs both --no-ranchk and --no-zapem. (See attached Ranchk.doc for a discussion of the interaction between them.) Ideally, he'd like --no-zapem to imply --no-ranchk. We'd like to keep the two options separate in general, as we feel they do differ in effect:

  • --no-ranchk disables a point-by-point operation: range checking filters out dependent variables whose corresponding independent variables (coordinates) are missing/invalid;
  • --no-zapem disables a profile-by-profile operation: zapping removes whole levels of data (eg Lev2a, Lev1b etc) from a profile if they contain no valid data (dependent variables).

Solution: a third option, which implies --no-ranchk and --no-zapem.

Too late for ROPP6.0 now, so (with EUM's permission) reopen ticket and assign this bit of work to ROPP6.1.

by Ian Culverwell, 13 years ago

Attachment: Ranchk.doc added


comment:15 by kmk, 13 years ago

We have been repeatedly banging our heads against a related issue with the forward propagator tool. ropp_io_rangecheck tests for the existence of the Level 2b block by looking for a valid vector of geopotential heights (%geop). This means that if there are no geopotential heights in the 2b block, the forward propagator concludes that there is no 2b block. This is not so good, since you can have a perfectly good Level 2b block consisting of pressures, temperatures and humidities. This is how our .bgr files are currently constructed, and in fact ropp_fm_bg2ro_1d will process them correctly if run with the -no_ranchk flag, and will fill in the %geop variable. So geop is an unfortunate choice of variable with which to test for the existence of the 2b block.

comment:16 by kmk, 13 years ago

After investigating this further I have to walk back some of what I've said. I think ROPP is probably entirely reasonable on this point, and the problem is that our offline bgr files are not well-formed. Let me explain:

All ROPP uses geop for is to find the value of lev2b%npoints. This happens in ropp_io_read_ncdf_get. If the geop variable is present but holds missing values (-9999000), this works perfectly well; I tried it. However, in the way we have constructed the offline bgr files, the geop values are unset (netCDF fill values). ROPP apparently does not recognise this, and the count of entries in geop comes out as lev2b%npoints = 0. Range checking then interprets this to mean that the 2b block is not present (because ropp_io_rangecheck checks the value of lev2b%npoints).

So I think the error is at our end. We will look into it a little more.
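A simplified Python model of the distinction described above follows. It does not reproduce the logic of ropp_io_read_ncdf_get. It only assumes that entries equal to the netCDF default double fill value are treated as absent, whereas ROPP missing values still count as (missing) points. count_points and has_lev2b are hypothetical names.

```python
NC_FILL_DOUBLE = 9.9692099683868690e36  # netCDF default fill value for doubles
ROPP_MISSING = -9999000.0               # the missing value quoted above


def count_points(geop):
    """Hypothetical model: entries equal to the netCDF fill value count as
    'not present'; ROPP missing values still count as (missing) points."""
    return sum(1 for v in geop if v != NC_FILL_DOUBLE)


def has_lev2b(geop):
    """Range checking treats Lev2b as absent when Npoints is zero."""
    return count_points(geop) > 0


print(has_lev2b([ROPP_MISSING] * 3))    # True: Npoints = 3, all values missing
print(has_lev2b([NC_FILL_DOUBLE] * 3))  # False: Npoints = 0, block ignored
```

Under this model, writing geop explicitly as missing values, rather than leaving it unset, is enough to keep the Level 2b block.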

comment:17 by Ian Culverwell, 12 years ago

Incorporated a largely[*] undocumented --both option in ropp2ropp. This implies --no-ranchk and --no-zapem. Mainly for Axel's benefit. In ROPP6.1.

[*] Actually, I amended the usage subroutine so that ropp2ropp -h says

...
   -v version information
   --no-ranchk disable range-checking (not recommended)
   --no-zapem  disable zeroing of empty profiles (not recommended)
   --both: --no-ranchk and --no-zapem (not recommended)
...

Thus the syntax is visible without examining the code.
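ropp2ropp itself is Fortran. The Python argparse sketch below only illustrates the intended precedence of the three flags: --both switches off range checking and zapping together, while each individual flag switches off only its own step. parse_options is a hypothetical name, and only a few of the real tool's options are modelled.

```python
import argparse


def parse_options(argv):
    """Return (ranchk, zapem) as booleans; a sketch, not ropp2ropp's own parser."""
    p = argparse.ArgumentParser(prog="ropp2ropp")
    p.add_argument("-m", action="store_true", help="multifile output")
    p.add_argument("-o", dest="outfile", help="output file")
    p.add_argument("--no-ranchk", action="store_true", help="disable range-checking")
    p.add_argument("--no-zapem", action="store_true", help="disable zeroing of empty profiles")
    p.add_argument("--both", action="store_true", help="--no-ranchk and --no-zapem")
    p.add_argument("infiles", nargs="*")
    args = p.parse_args(argv)
    ranchk = not (args.no_ranchk or args.both)
    zapem = not (args.no_zapem or args.both)
    return ranchk, zapem


print(parse_options(["-m", "--both", "in.nc"]))  # (False, False)
```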

comment:18 by Ian Culverwell, 12 years ago

Resolution: fixed
Status: reopened → closed

No problems in test folder, so closing ticket.
