Opened 10 years ago

Closed 8 years ago

#392 closed defect (fixed)

ifort14 anomalies

Reported by: Ian Culverwell
Owned by: Ian Culverwell
Priority: normal
Milestone: 9.0
Component: ropp_1dvar
Version: 7.1
Keywords: ifort14
Cc:

Description

ROPP8.0 beta reviewer Dave Offiler (UKMO) says

- ROPP_1DVAR: test t_1dvar_GRAS_05_bangle fails (ifort):

    Testing 1D-Var using GRAS bending angles with colocated ECMWF backgrounds
 
    Running t_1dvar_GRAS_05_bangle (1DVAR GRAS; default) ...
 
    ERROR (from ropp_io_fields_compare):  prof1%Lev1b%bangle_sigma differs from 
    prof2%Lev1b%bangle_sigma (max|diff| =  0.12532E-05 >  0.10000E-05)
 
    ERROR (from ropp_io_fields_compare):  prof1%Lev1b%bangle differs from 
    prof2%Lev1b%bangle (max|diff| =  0.75710E-05 >  0.10000E-05)
    INFO (from ropp_1dvar_compare):     2 elements of IT-1DVAR-05_bangle.1.nc 
    and ../data/IT-1DVAR-05_bangle.1_reference.nc differ significantly
    ****************************
    ********** *FAIL* **********
    ****************************
    ... examine t_1dvar_GRAS_05_bangle.log for details
       
  The first case looks a marginal excess difference, but the second is more 
  significant at 7x the threshold. The log shows only INFO messages; it would 
  be helpful if the profile number having the max. diff were output. 
  [severity: medium/high?]

  
- ROPP_1DVAR: test t_1dvar_GRAS_05newop_refrac fails (ifort):

    Running t_1dvar_GRAS_05newop_refrac (1DVAR GRAS; new interp) ...
 
    ERROR (from ropp_io_fields_compare):  prof1%Lev2a%refrac_sigma differs 
    from prof2%Lev2a%refrac_sigma (max|diff| =  0.12619E-01 >  0.10000E-02)
 
    ERROR (from ropp_io_fields_compare):  prof1%Lev2b%press differs from 
    prof2%Lev2b%press (max|diff| =  0.12473E+00 >  0.10000E+00)
 
    ERROR (from ropp_io_fields_compare):  prof1%Lev2b%geop differs from 
    prof2%Lev2b%geop (max|diff| =  0.10156E+01 >  0.10000E+01)
    INFO (from ropp_1dvar_compare):     3 elements of IT-1DVAR-05newop_refrac.1.nc and 
    ../data/IT-1DVAR-05newop_refrac.1_reference.nc differ significantly
    ****************************
    ********** *FAIL* **********
    ****************************
    ... examine t_1dvar_GRAS_05newop_refrac.log for details

   All cases (and especially the last) appear to be only marginally exceeding 
   the threshold. The log shows only INFO messages; it would be helpful if the
   profile number having the max. diff were output.
  [severity: medium/high?]
  
  
- ROPP_1DVAR: gfortran, g95 & sunf95 tests *all* PASS, so there is some
  compiler dependency here. Diff'ing the gfortran, g95 & sunf95 logs shows them
  to be identical (apart from just one value, 0.0001 different). The ifort log
  has many differences, but all are rather small and almost all are in the last
  1-2 decimal places. There are no significant deviations such as additional
  iterations, except that one profile in the second case has one more iteration
  with ifort. Still, ifort is the odd compiler out of the four tested, and this
  needs further investigation to tell whether the differences are significant
  in practice for the 1D-Var output or not.
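To put the reported excesses on a common scale, the short sketch below divides each max|diff| by its tolerance. The numbers are copied from the two logs quoted above; the function and variable names are illustrative only and are not part of ROPP.

```python
# (field, max|diff|, tolerance), copied from the two failing test logs above.
reported = [
    ("Lev1b%bangle_sigma", 0.12532e-05, 0.10000e-05),
    ("Lev1b%bangle",       0.75710e-05, 0.10000e-05),
    ("Lev2a%refrac_sigma", 0.12619e-01, 0.10000e-02),
    ("Lev2b%press",        0.12473e+00, 0.10000e+00),
    ("Lev2b%geop",         0.10156e+01, 0.10000e+01),
]


def exceedance(max_diff, tolerance):
    """Return how many times larger than the tolerance the difference is."""
    return max_diff / tolerance


ratios = {name: exceedance(diff, tol) for name, diff, tol in reported}

for name, ratio in ratios.items():
    print(f"{name:20s} {ratio:6.2f} x tolerance")
```

On these figures bangle exceeds its tolerance by a factor of about 7.6 and refrac_sigma by about 12.6, while bangle_sigma and press are roughly 25% over and geop under 2% over.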

This is either an ifort14 bug or a deep-rooted numerical ill-conditioning (like #301 and #349) that has only come to light with this compiler, which would not be a ten-minute job to fix.

Fixing the log file to show which profile was causing the trouble would be relatively easy, however.
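As a rough illustration of the kind of per-profile reporting being asked for, the sketch below finds which profile holds the largest absolute difference for one field. It is a minimal Python sketch, not the ROPP Fortran comparison code: the function names, the message format and the sample numbers are all made up for illustration.

```python
def max_abs_diff_by_profile(profiles_a, profiles_b):
    """Largest absolute element-wise difference for each pair of profiles."""
    return [
        max(abs(a - b) for a, b in zip(prof_a, prof_b))
        for prof_a, prof_b in zip(profiles_a, profiles_b)
    ]


def report_worst_profile(field, profiles_a, profiles_b, tolerance):
    """Return (1-based profile number, max|diff|, log message) for one field."""
    diffs = max_abs_diff_by_profile(profiles_a, profiles_b)
    worst = max(range(len(diffs)), key=diffs.__getitem__)
    status = "differs" if diffs[worst] > tolerance else "agrees"
    message = (
        f"{field} {status}: max|diff| = {diffs[worst]:.5E} "
        f"in profile {worst + 1} (tolerance {tolerance:.5E})"
    )
    return worst + 1, diffs[worst], message


# Made-up values: three profiles of two levels each.
reference = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
test_run = [[1.0, 2.0], [3.0, 4.5], [5.0, 6.25]]

profile_number, worst_diff, line = report_worst_profile(
    "bangle", test_run, reference, tolerance=0.1
)
print(line)
```

Reporting the profile number alongside max|diff| means a failing field can be traced to a single profile without re-running the comparison.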

Attachments (4)

392_bangle_diff.png (20.0 KB) - added by Ian Culverwell 8 years ago.
392_spec_hum_diff.png (21.2 KB) - added by Ian Culverwell 8 years ago.
392_bangle_diff2.png (21.2 KB) - added by Ian Culverwell 8 years ago.
392_spec_hum_diff2.png (21.4 KB) - added by Ian Culverwell 8 years ago.

Change history (9)

comment:1 by Ian Culverwell, 8 years ago

The fix to the 'compare' scripts to get the log file to show individual profile comparisons was made at r4642.

The code was in fact already in place, but the msg_diag output was being suppressed. msg_mode_read is initialised as .FALSE., which means msg_mode is initialised to Normal (==> don't print msg_diag). Only AFTER this is msg_mode_read = .TRUE., and only then can msg_mode be set to what you want for calls to message(msg_info/msg_diag/msg_error/etc). I was setting it BEFORE.

The solution is not to CALL message_get_routine(routine) at the top of the routine, but to CALL message(msg_noin, '') at least once, which sets msg_mode_read = .TRUE., as required, before setting msg_mode = VerboseMode.
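The ordering problem described above can be mimicked with a toy model. This is a Python sketch under assumptions, not the ROPP messaging code: the class, its attributes and the level names are hypothetical, and the assumption that the first call to message resets the mode to Normal is a reading of this comment that has not been checked against the Fortran source.

```python
NORMAL, VERBOSE = "Normal", "Verbose"


class ToyMessenger:
    """Toy model of lazily initialised message state (hypothetical)."""

    def __init__(self):
        self.mode = NORMAL
        self.mode_read = False  # stands in for msg_mode_read
        self.printed = []

    def message(self, level, text):
        if not self.mode_read:
            # Assumed behaviour: the first call initialises the mode,
            # discarding anything that was set beforehand.
            self.mode = NORMAL
            self.mode_read = True
        if level == "diag" and self.mode != VERBOSE:
            return  # diagnostics are suppressed unless verbose
        if text:
            self.printed.append(text)


# Mode set BEFORE the first message call: the setting is lost.
early = ToyMessenger()
early.mode = VERBOSE
early.message("diag", "per-profile comparison")

# Empty message sent first, THEN the mode is set: diagnostics appear.
late = ToyMessenger()
late.message("noin", "")
late.mode = VERBOSE
late.message("diag", "per-profile comparison")

print(early.printed, late.printed)
```

In this model the only difference between the two cases is whether one message call happens before the mode is set, which is the same ordering change as the fix.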

comment:2 by Ian Culverwell, 8 years ago

I get the same results with ifort14 at ROPP9.0, but the differences aren't signalled by ropp_io/ropp/ropp_io/compare_fields.f90 because the tolerances on just these fields were relaxed at r4995. (These changes were in fact inherited from r4938 and r4943 of https://trac.romsaf.org/ropp/browser/ropp_src/branches/dev/Share/cb_wopt_test/ropp_io/ropp/ropp_io_fields_compare.f90.)

These relaxed tolerances are only applied to ropp_1dvar tests, which might be expected to suffer more variation than (eg) ropp_fm, where stricter tolerances apply. And even then the differences are not so catastrophic. The max difference in bangle is still only ~ 1e-5 rad, and appears to be associated with a spike in q of ~ 0.004 g/kg.

bangle 392_bangle_diff.png

spec hum 392_spec_hum_diff.png

I think we can live with this.

by Ian Culverwell, 8 years ago

Attachment: 392_bangle_diff.png added

by Ian Culverwell, 8 years ago

Attachment: 392_spec_hum_diff.png added

comment:3 by Ian Culverwell, 8 years ago

For ifort12, however, IT-1DVAR-05_bangle.1.nc = ../data/IT-1DVAR-05_bangle.1_reference.nc exactly. So why is there a difference between ifort12 and ifort14? Optimisation! When I run ifort14 without optimisation (-O2 --> -O0 in mini-config scripts), the differences wrt ifort12 are much reduced:

bangle 392_bangle_diff2.png

spec hum 392_spec_hum_diff2.png

And when I run ifort12 without optimisation, the differences between the two completely disappear.

Sadly, the recommended/supported compiler ifort16 mirrors the behaviour of ifort14 and ifort15 in this respect: compiling with optimisation changes the results in a nearly significant way.

There's not really much we can do about this, except to log it in the Release Notes, which has been done at r5135. At ROPP10.0, we may decide to use ifort16(-O2) to generate the reference dataset. So defer the ticket rather than close it.
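As general background on how an optimisation level can shift results in the last digits: floating-point addition is not associative, so any reordering of a sum can change the rounded answer. The ticket does not establish which transformation ifort14 -O2 actually applies, so the Python sketch below is only a generic illustration of the effect, not a diagnosis.

```python
a, b, c = 0.1, 0.2, 0.3

# The same three numbers, added in two different orders.
left_to_right = (a + b) + c
reassociated = a + (b + c)

print(repr(left_to_right), repr(reassociated))
print("identical:", left_to_right == reassociated)
print("difference:", abs(left_to_right - reassociated))
```

The two sums differ only in the last representable digit, which is the scale of most of the ifort log differences. Whether such differences stay small depends on how well conditioned the downstream calculation is.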

by Ian Culverwell, 8 years ago

Attachment: 392_bangle_diff2.png added

by Ian Culverwell, 8 years ago

Attachment: 392_spec_hum_diff2.png added

comment:4 by Ian Culverwell, 8 years ago

Milestone: 9.0 → 10.0

Deferring ticket to ROPP10.0 for reasons stated above.

comment:5 by Ian Culverwell, 8 years ago

Milestone: 10.0 → 9.0
Resolution: fixed
Status: new → closed

Actually, gfortran(O2) and nagfor(O2) are very close to ifort12(O2), so scrub that last idea. It's ifort14+(O2) that's anomalous, so we should keep the reference file (=ifort12(O2)'s) as it is. That being the case, we should revert the ticket to ROPP9.0, and then close it (as 'fixed', since we understand what's going on).
