Opened 5 years ago
Last modified 3 years ago
#661 assigned defect
Old bug in ropp_fm_state2state_ecmwf_ad.f90
Reported by: | Stig Syndergaard | Owned by: | Ian Culverwell |
---|---|---|---|
Priority: | normal | Milestone: | ROPP possible |
Component: | ropp_fm | Version: | 9.0 |
Keywords: | Cc: | ssy@… |
Description
The adjoint (AD) of the interpolation coefficients in ropp_fm_state2state_ecmwf_ad.f90 includes the following line:
p_hlv_ad(1:n_hlv-1) = p_hlv_ad(1:n_hlv-1) - alpha_ad/del_p*ln_prflv
The tangent linear (TL) of the interpolation coefficients in ropp_fm_state2state_ecmwf_tl.f90 includes this line:
alpha_tl = del_p_tl*p_hlv(2:n_hlv)/(del_p*del_p)*ln_prflv & - ln_prflv_tl*(p_hlv(2:n_hlv)/del_p) & - p_hlv_tl(2:n_hlv)/del_p*ln_prflv
These two are inconsistent. The tangent linear is correct according to the expression in eq. (3.9) in Simmons and Burridge (1981): https://journals.ametsoc.org/doi/pdf/10.1175/1520-0493%281981%29109%3C0758%3AAEAAMC%3E2.0.CO%3B2
The correct line in ropp_fm_state2state_ecmwf_ad.f90 should therefore be:
p_hlv_ad(2:n_hlv) = p_hlv_ad(2:n_hlv) - alpha_ad/del_p*ln_prflv
which can be verified by writing matrices on a piece of paper or simply following the rules of how to write AD code based on TL code.
A fix has been implemented in the dmi_trunk_9.0 at r6115, although we do not at DMI use the AD code in the 1Dvar (since we are using the LEVMARQ method and the AD code is only used with the MINROPP method). It is unclear what other users could be affected. The bug has been in ROPP from day 1 as far as I can see (from looking at the svn history). If the code originally comes from ECMWF, it would probably be a good idea for some to check if such code there has the same issue.
Before the fix (actually unfixing it), the following results were obtained using the test code made by Joe here:
AD: x%non_ideal, x%new_ref_op, x%new_bangle_op = F F F Checking REFRACTIVITY AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 364528929.066 364528929.045 1.0000000 0.57491E-10 Checking BENDING ANGLE AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = -9.025 -9.025 1.0000000 -0.12613E-09 Checking REFRACTIVITY AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 325588949.035 325588949.019 1.0000000 0.49311E-10 Checking BENDING ANGLE AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 75.279 75.279 1.0000000 0.16198E-09 Checking REFRACTIVITY AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 431837535.061 431837535.253 1.0000000 -0.44447E-09 Checking BENDING ANGLE AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = -79.704 -79.704 1.0000000 0.59795E-11 Checking REFRACTIVITY AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 372208695.410 372208695.450 1.0000000 -0.10889E-09 Checking BENDING ANGLE AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 25.611 25.611 1.0000000 0.57009E-10 Checking REFRACTIVITY AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 409038225.382 409038225.393 1.0000000 -0.26912E-10 Checking BENDING ANGLE AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 8.828 8.828 1.0000000 0.17541E-09 **************************** ********** PASS ********** ****************************
With the fix implemented, the test results are:
AD: x%non_ideal, x%new_ref_op, x%new_bangle_op = F F F Checking REFRACTIVITY AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 364528929.066 364528929.066 1.0000000 0.16351E-15 Checking BENDING ANGLE AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = -9.025 -9.025 1.0000000 -0.84639E-14 Checking REFRACTIVITY AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 325588949.035 325588949.035 1.0000000 -0.36613E-15 Checking BENDING ANGLE AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 75.279 75.279 1.0000000 0.15102E-14 Checking REFRACTIVITY AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 431837535.061 431837535.061 1.0000000 0.00000E+00 Checking BENDING ANGLE AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = -79.704 -79.704 1.0000000 -0.30310E-14 Checking REFRACTIVITY AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 372208695.410 372208695.410 1.0000000 0.16014E-15 Checking BENDING ANGLE AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 25.611 25.611 1.0000000 -0.69360E-15 Checking REFRACTIVITY AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 409038225.382 409038225.382 1.0000000 0.00000E+00 Checking BENDING ANGLE AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 8.828 8.828 1.0000000 0.54330E-14 **************************** ********** PASS ********** ****************************
Thus an improvement from about 10-9 to 10-14. The change does not seem to break anything: doing 'make test' in dmi_trunk_9.0 gives the same results before and after the fix:
************************** SUMMARY OF ROPP_FM TEST RESULTS *************************** -------------------------------------------------------------------------------------- | Test name | Description | Run? | PASS? | -------------------------------------------------------------------------------------- | t_fascod_1 | FM FASCOD 1D | Run | *FAIL* | | t_fascod_tl_1 | FM_TL FASCOD 1D | Run | PASS | | t_fascod_ad_1 | FM_AD FASCOD 1D | Run | PASS | | t_fascod_2 | FM FASCOD 1D -comp | Run | *FAIL* | | t_fascod_tl_2 | FM_TL FASCOD 1D -comp | Run | PASS | | t_fascod_ad_2 | FM_AD FASCOD 1D -comp | Run | PASS | | t_fascod_3 | FM FASCOD 1D -new_op | Run | *FAIL* | | t_fascod_tl_3 | FM_TL FASCOD 1D -new_op | Run | PASS | | t_fascod_ad_3 | FM_AD FASCOD 1D -new_op | Run | PASS | | t_fascod_4 | FM FASCOD 1D -comp -new_op | Run | *FAIL* | | t_fascod_tl_4 | FM_TL FASCOD 1D -comp -new_op | Run | PASS | | t_fascod_ad_4 | FM_AD FASCOD 1D -comp -new_op | Run | PASS | | t_twodop_1 | FM TWOD | Run | PASS | | t_twodtl_1 | FM_TL TWOD | Run | PASS | | t_twodad_1 | FM_AD TWOD | Run | PASS | | t_twodop_2 | FM TWOD -comp | Run | PASS | | t_twodtl_2 | FM_TL TWOD -comp | Run | PASS | | t_twodad_2 | FM_AD TWOD -comp | Run | PASS | | t_iono_1 | FM L1 and L2 | Run | *FAIL* | | t_iono_tl_1 | FM_TL L1 and L2 | Run | PASS | | t_iono_ad_1 | FM_AD L1 and L2 | Run | PASS | | t_fm_1D_1 | FM 1D; default options | Run | *FAIL* | | t_fm_1D_2 | FM 1D; compress factors | Run | *FAIL* | | t_fm_2D_1 | FM 2D; default options | Run | PASS | | t_fm_2D_2 | FM 2D; compress factors | Run | PASS | | t_fm_iono_1 | FM iono; L_neutral | Run | *FAIL* | | t_fm_iono_2 | FM iono; L1 and L2 | Run | *FAIL* | --------------------------------------------------------------------------------------
The failures here might be because we in the dmi_trunk generally have different codes than in the official ROPP. This will often be the case, and 'make test' in the dmi_trunk cannot generally be expected to pass all checks when comparisons are made to test files from the official ROPP. The failures in the summary above have therefore not been investigated.
Change history (5)
comment:1 by , 5 years ago
Cc: | added |
---|---|
Owner: | changed from | to
Status: | new → assigned |
comment:2 by , 4 years ago
comment:3 by , 4 years ago
Stig points out that the basic ROPP FM code calculates the virtual temperature as
Tvflv = (1.0_wp + 0.61_wp * x%shum) * x%temp
whereas the TL and AD codes use the more accurate and precise expression
Tvflv = (1.0_wp + ((R_vap/R_dry)-1.d0) * x%shum) * x%temp
(Note that (R_vap/R_dry)-1.d0 = 0.60776661.)
When I change the ROPP-10 code to use 0.61_wp consistently, t_fascod_ad returns
AD: x%non_ideal, x%new_ref_op, x%new_bangle_op = F F F Checking REFRACTIVITY AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 273939521.134 273939520.843 1.0000000 0.10638E-08 Checking BENDING ANGLE AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 48.870 48.870 1.0000000 0.61733E-09 Checking REFRACTIVITY AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 338089562.019 338089560.020 1.0000000 0.59126E-08 Checking BENDING ANGLE AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 35.912 35.912 1.0000000 -0.58612E-11 Checking REFRACTIVITY AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 398849571.217 398849569.310 1.0000000 0.47816E-08 Checking BENDING ANGLE AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = -30.640 -30.640 1.0000000 0.70200E-10 Checking REFRACTIVITY AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 370274972.979 370274969.405 1.0000000 0.96533E-08 Checking BENDING ANGLE AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 8.061 8.061 1.0000000 0.22001E-08 Checking REFRACTIVITY AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 465697377.480 465697374.694 1.0000000 0.59820E-08 Checking BENDING ANGLE AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = -31.519 -31.519 1.0000000 -0.10083E-08 **************************** ********** PASS ********** ****************************
So |norm1 - norm2| is much smaller, but probably still not close enough for an adjoint test (despite formally passing the tests, which have an over-generous tolerance.) And the norms themselves are still very different from DMI's. Stig points out that there are a number of other differences in the DMI ropp_fm_state2state_ecmwf.f90, ropp_fm_state2state_ecmwf_ad.f90, and ropp_fm_state2state_ecmwf_tl.f90 codes, which might explain both aspects.
comment:4 by , 3 years ago
I have returned to this in connection with the carried over ROPP9.1 ticket #552.
Firstly, I agree with Stig's bug-fix. The TL code says
alpha_tlk = ... - (p_tlk+1/2/delpk) lnprk
so the AD should say
p_adk+1/2 = p_adk+1/2 - (alpha_adk/delpk) lnprk
(i.e. using the same partial derivative d alphak / d pk+1/2).
Thus, alpha_ad1 (i.e. alpha_ad(1)
) must affect p_ad3/2, (i.e. p_hlv_ad(2)
) not p_ad1/2 (i.e. p_hlv_ad(1)
). This is what Stig's bug-fix does.
Second, I have merged Joe's improved testing (see above) into t_fascod_tl.f90
and t_fascod_ad.f90
so that we can test the TL and AD parts of the state2state code.
Results
Before applying the fix, but after applying the 0.61 --> R_vap/R_dry - 1
change in r6857 (see #552), I get
Checking REFRACTIVITY AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.1637604822E+08 3.1637604728E+08 1.0000000030E+00 2.9695360567E-09 Checking BENDING ANGLE AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 8.0574685688E+00 8.0574685261E+00 1.0000000053E+00 5.2955585688E-09 Checking REFRACTIVITY AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.3404940907E+08 3.3404940644E+08 1.0000000079E+00 7.8674657683E-09 Checking BENDING ANGLE AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.2199743088E+00 3.2199742756E+00 1.0000000103E+00 1.0305770588E-08 Checking REFRACTIVITY AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 4.8208942921E+08 4.8208942467E+08 1.0000000094E+00 9.4211442201E-09 Checking BENDING ANGLE AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 5.7456951622E+00 5.7456951340E+00 1.0000000049E+00 4.9117830834E-09 Checking REFRACTIVITY AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.9902983097E+08 3.9902982824E+08 1.0000000068E+00 6.8478866746E-09 Checking BENDING ANGLE AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 6.6242770785E+00 6.6242770526E+00 1.0000000039E+00 3.9098533034E-09 Checking REFRACTIVITY AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 4.6235178545E+08 4.6235178218E+08 1.0000000071E+00 7.0806510165E-09 Checking BENDING ANGLE AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 9.7714764435E+00 9.7714764697E+00 9.9999999732E-01 -2.6796780574E-09 **************************** ********** *FAIL* ********** ****************************
The allowed tolerance on |(norm1-norm2)/norm2| is 1.0e-9, so it fails the test.
When I implement Stig's fix, I get
Checking REFRACTIVITY AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.1637604822E+08 3.1637604822E+08 1.0000000000E+00 1.3187866647E-15 Checking BENDING ANGLE AD FASCOD_TRO norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 8.0574685688E+00 8.0574685688E+00 1.0000000000E+00 -9.0388972410E-15 Checking REFRACTIVITY AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.3404940907E+08 3.3404940907E+08 1.0000000000E+00 1.7843062480E-16 Checking BENDING ANGLE AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.2199743088E+00 3.2199743088E+00 1.0000000000E+00 -1.3791700407E-16 Checking REFRACTIVITY AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 4.8208942921E+08 4.8208942921E+08 1.0000000000E+00 -1.8545722380E-15 Checking BENDING ANGLE AD FASCOD_MLW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 5.7456951622E+00 5.7456951622E+00 1.0000000000E+00 -9.1203109951E-15 Checking REFRACTIVITY AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.9902983097E+08 3.9902983097E+08 1.0000000000E+00 4.4812172035E-16 Checking BENDING ANGLE AD FASCOD_SAS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 6.6242770785E+00 6.6242770785E+00 1.0000000000E+00 -1.4078326278E-14 Checking REFRACTIVITY AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 4.6235178545E+08 4.6235178545E+08 1.0000000000E+00 9.0241354431E-16 Checking BENDING ANGLE AD FASCOD_SAW norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 9.7714764435E+00 9.7714764435E+00 1.0000000000E+00 1.7270051329E-14 **************************** ********** PASS ********** ****************************
This is obviously a lot better, although some of the fractional bending angle differences are ~ ten times bigger than we saw before, which were typically around 10-15. In fact the largest frac error we now see is ~10-13:
AD: x%non_ideal, x%new_ref_op, x%new_bangle_op = T T T ... Checking REFRACTIVITY AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.3429767302E+08 3.3429767302E+08 1.0000000000E+00 -3.5659622897E-16 Checking BENDING ANGLE AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.3204994934E+00 3.3204994934E+00 1.0000000000E+00 1.0552219245E-13
To some extent these errors depend on the size and nature of the perturbations dx1 and dx2, which they shouldn't. But that's not enough to explain it. It turns out that the key thing is that the new tests correctly compare <HT(Hdx2) | dx1> with <Hdx2 | Hdx1> (i.e. different dxs), whereas the old tests make a more limited comparison of <HT(Hdx) | dx> with <Hdx | Hdx> (i.e. the same dx). If I set dx2=dx1 in the new test, the above fractional errors are much reduced:
Checking REFRACTIVITY AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 4.4222453757E+08 4.4222453757E+08 1.0000000000E+00 -6.7391833460E-16 Checking BENDING ANGLE AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 1.3248373286E+02 1.3248373286E+02 1.0000000000E+00 -8.5811922165E-16
Just to 'complete the square', if I use different dxs in the old AD test (which bypassed the state2state code), I get
Checking REFRACTIVITY AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.7118604835E+08 3.7118604835E+08 1.0000000000E+00 -1.9269467171E-15 Checking BENDING ANGLE AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 9.0808972758E-01 9.0808972758E-01 1.0000000000E+00 -1.1394555268E-13
cf
Checking REFRACTIVITY AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 3.1626024824E+08 3.1626024824E+08 1.0000000000E+00 1.8846707769E-16 Checking BENDING ANGLE AD FASCOD_MLS norm1, norm2, norm1/norm2, (norm1-norm2)/norm2 = 2.3714973614E+02 2.3714973614E+02 1.0000000000E+00 8.3892973802E-16
originally.
So I think we have to accept, for now, larger fractional differences if we use different dxs in the test. I can't think of a good reason why this should be so, given that the AD test should really be exact up to rounding error. I suggest that it points to a small infelicity somewhere in the bangle_1d_ad code (since the refracs seem OK). That's another story. But since the old and new test codes display the same behaviour when they both use two dxs (and when they use just one) it's clear that the state2state bugfix is not the cause.
In summary, I'm happy with this change, which will be implemented in ROPP-11. For the record, the core tests are OK:
************************** SUMMARY OF ROPP_FM TEST RESULTS *************************** -------------------------------------------------------------------------------------- | Test name | Description | Run? | PASS? | -------------------------------------------------------------------------------------- | t_fascod_1 | FM FASCOD 1D | Run | PASS | | t_fascod_tl_1 | FM_TL FASCOD 1D | Run | PASS | | t_fascod_ad_1 | FM_AD FASCOD 1D | Run | PASS | | t_fascod_2 | FM FASCOD 1D -comp | Run | PASS | | t_fascod_tl_2 | FM_TL FASCOD 1D -comp | Run | PASS | | t_fascod_ad_2 | FM_AD FASCOD 1D -comp | Run | PASS | | t_fascod_3 | FM FASCOD 1D -new_op | Run | PASS | | t_fascod_tl_3 | FM_TL FASCOD 1D -new_op | Run | PASS | | t_fascod_ad_3 | FM_AD FASCOD 1D -new_op | Run | PASS | | t_fascod_4 | FM FASCOD 1D -comp -new_op | Run | PASS | | t_fascod_tl_4 | FM_TL FASCOD 1D -comp -new_op | Run | PASS | | t_fascod_ad_4 | FM_AD FASCOD 1D -comp -new_op | Run | PASS | | t_twodop_1 | FM TWOD | Run | PASS | | t_twodtl_1 | FM_TL TWOD | Run | PASS | | t_twodad_1 | FM_AD TWOD | Run | PASS | | t_twodop_2 | FM TWOD -comp | Run | PASS | | t_twodtl_2 | FM_TL TWOD -comp | Run | PASS | | t_twodad_2 | FM_AD TWOD -comp | Run | PASS | | t_iono_1 | FM L1 and L2 | Run | PASS | | t_iono_tl_1 | FM_TL L1 and L2 | Run | PASS | | t_iono_ad_1 | FM_AD L1 and L2 | Run | PASS | | t_fm_1D_1 | FM 1D; default options | Run | PASS | | t_fm_1D_2 | FM 1D; compress factors | Run | PASS | | t_fm_2D_1 | FM 2D; default options | Run | PASS | | t_fm_2D_2 | FM 2D; compress factors | Run | PASS | | t_fm_iono_1 | FM iono; L_neutral | Run | PASS | | t_fm_iono_2 | FM iono; L1 and L2 | Run | PASS | --------------------------------------------------------------------------------------
Revised testing scripts committed at r6869. Changes to ropp_fm_state2state_ecmwf*.f90 will be committed as part of the work to improve the ECMWF pressure level calculations - see #552. I don't plan to add any more to this ticket.
This is weird.
ropp_fm_state2state_ecmwf_ad
is not touched in the usual testing script, but when I use Joe's testing script above, which does call it, I findBEFORE FIX:
AFTER FIX:
Huge fractional differences in the norms, which are not reduced by the fix. Moreover, the individual norms are significantly (~50%) different from Stig and Joe's above. Strange. (All 4 fascod tests fail; all the others pass.)
(I think it's a good idea, however, to test <HTdy|dx> = <dy|Hdx>, where dy is independent of dx, as Joes's script does, rather than <Hdx|Hdx>=<dx|HTHdx>, as the existing test script does. However, the two norms should agree to within a few bits, as they just depend on (AB)C=A(BC) etc, so I think the threshold for comparison should be tighter than 1e-9, not looser.)