From: Stig Syndergaard [ssy@dmi.dk]
Sent: 23 October 2013 13:20
To: Culverwell, Ian
Subject: Re: ropp_pp_preprocess_grasrs madness

Follow Up Flag: Follow up
Flag Status: Red

Attachments: IT-PP-05_prof3_cntl1.out; IT-PP-05_prof3_cntl.out;
IT-PP-05_prof3_nopurple_cntl1.out; IT-PP-05_prof3_nopurple_cntl.out;
IT-PP-05_prof3_cntl2.out; IT-PP-05_prof3_nopurple_cntl2.out;
IT-PP-05_prof3_cntl3.out; IT-PP-05_prof3_nopurple_cntl3.out;
IT-PP-05_prof3_cntl4.out; IT-PP-05_prof3_nopurple_cntl4.out;
ropp_pp_cutoff_amplitude.f90; ropp_pp_preprocess_grasrs.f90;
ropp_pp_utils.f90; residual_xleo.png; residual_yleo.png;
residual_zleo.png; residual_xgns.png; residual_ygns.png;
residual_zgns.png

Hi Ian,

Sorry it took me so long to get back to you on this. I may have found out what the trouble is, also in the cases where you run different compilers, but it's a bit of a long story. Take your time chewing your way through this email - I know it's a long one.

I did the following:

1) I used dmi_trunk_6.1 for my tests (but I don't think that matters)

2) I used the ifort compiler (this probably explains differences between yours and my results, but in essence it's not so important which compiler I used).

3) IT-PP-05_prof3_cntl1.out:
First I ran ropp_pp_occ_tool on IT-PP-05_prof3.nc using the config file gras_pp_dmi.cf that you provided (and with -d). Resulting IT-PP-05_prof3_cntl1.out file is attached. Apart from the extra print statements it is similar to your IT-PP-05_prof3_cntl.out (also attached for reference). There are some differences at the cm level, which I consider a result of compiler/platform differences, but they are small in this case.

4) IT-PP-05_prof3_nopurple_cntl1.out:
I then did the same with IT-PP-05_prof3_nopurple.nc as input.
IT-PP-05_prof3_nopurple_cntl1.out file is attached (Your IT-PP-05_prof3_nopurple_cntl.out attached for reference). Much bigger differences starting with DPO etc. I suppose these must also be due to different compiler/platforms. I cannot explain why our different compilers/platforms choose to make a much bigger difference on the decimated (nopurple) case. But I accept that as a fact (and a possible reason follows below). Comparing my IT-PP-05_prof3_cntl1.out and IT-PP-05_prof3_nopurple_cntl1.out files, differences are only at the cm level, but as you said, it is not clear why there should be any difference at all. So I decided to track down the reason.

5) ropp_pp_cutoff_amplitude.f90:
First I found what I think is a small bug in ropp_pp_cutoff_amplitude, In the nopurple_cntl run there are (correctly) 154144 samples after the 
CL+RS merge, but 154145 (one too many) in the cntl run. The reason is in
line 116 and 134 in the attached modified ropp_pp_cutoff_amplitude.f90 file (the out-commented lines in section 3.1), where I believe it should be imax = MIN(i-1, imax) (for setting) and imin = MAX(i+1, imin) (for rising). Otherwise the first bad sample with lcf = 45 and phase_L1 = ropp_MDFV gets included.

6) IT-PP-05_prof3_cntl2.out and IT-PP-05_prof3_nopurple_cntl2.out:
The output after modifying ropp_pp_cutoff_amplitude is attached as IT-PP-05_prof3_cntl2.out and IT-PP-05_prof3_nopurple_cntl2.out. Still I get the same differences in Pmax etc. further downstream, so I don't think the bug in ropp_pp_cutoff_amplitude is the culprit, just a minor thing. It did make some minor differences to the *.nc output (not attached), which can be understood from the next paragraph.

7) IT-PP-05_prof3_cntl3.out and IT-PP-05_prof3_nopurple_cntl3.out: 
ropp_pp_cutoff_amplitude is called twice, first time from ropp_pp_preprocess with all lcf = 0, so that data are not cut because of the lcf flag first time around. This means that the number of samples going into ropp_pp_preprocess_GRASRS is larger in the cntl run than in the nopurple_cntl run (and in the cntl run it is one sample larger with the bug in ropp_pp_cutoff_amplitude than without the bug). Data are processed in ropp_pp_preprocess_GRASRS according the lcf, so one should think that it didn't matter how many samples were in ro_data. But the orbits are fitted in ropp_pp_preprocess_GRASRS using the whole sample size, regardless of the lcf, and that's where the differences come from. 
ropp_pp_cutoff_amplitude is called a second time after the call to ropp_pp_preprocess_GRASRS. After that the sample sizes are the same in the cntl and nopurple_cntl runs. But interpolated orbits (and from thereon merged and interpolated phases, amplitudes, roc, etc) are slightly different in the cntl and nopurple_cntl runs. If I fix the number of merged samples to be only those with good lcf (see commented lines in section 4.2 and 6.1 of the attached ropp_pp_preprocess_grasrs.f90), then I get IT-PP-05_prof3_cntl3.out and IT-PP-05_prof3_nopurple_cntl3.out. They are identical. But this is not the solution to the problem; fixing the sample size is only possible here because I know that the samples that have been left out in the nopurple run are at the end of the profile. Anyways it would not solve the issue with getting different results with different compilers.

I almost stopped there and was gonna write you a (shorter) email with the above findings, when it occurred to me that the compiler issues might be due to uncertainty in solving for the inverse of a 5x5 matrix when fitting the orbit positions in ropp_pp_preprocess_GRASRS (by the call to ropp_pp_regression).

8) png files:
The matrix inversion in ropp_pp_regression has some uncertainty to it. 
The uncertainty is relatively large when fitting orbit positions (here with a 5 degree polynomial), because position samples are large and almost equal, and therefore the solution regression coefficients (coeff_vleo and coeff_vgns) are a bit uncertain. When used in the following interpolation, they result in a small orbit bias. To illustrate, I used ropp_pp_polynomial to reconstruct the positions on the original grid from coeff_vleo and coeff_vgns (intermediate code mods not attached). The png plots show the residuals when I subtract the polynomial result from the original positions for all three coordinates for both leo and gns, This is what I call '1st residual' in the plots (red for the cntl run and blue for the nopurple_cntl run; I'll get back the '2nd' residual' below). Biases are a few to several mm. This shouldn't matter so much (and it doesn't in my runs), but as you see, the bias is generally larger for the nopurple run. I think it is a bit coincidental which one is larger (cntl or nopurple_cntl), but it shows that there can be unforeseen differences for different runs, and my thought is that maybe there can be larger differences with different compilers.

9) ropp_pp_preprocess_grasrs.f90 and ropp_pp_utils.f90:
To fix this possible instability, I added a new subroutine to the attached ropp_pp_utils.f90, which I named ropp_pp_residual_regression. 
In the attached ropp_pp_preprocess_grasrs.f90 it is called in section
6.1 right after the call to ropp_pp_regression. It repeats the regression on the residual positions. In ropp_pp_residual_regression the residual coefficients are then added to the already found coeff_vleo and coeff_vgns, making them more accurate. A subsequent call to ropp_pp_polynomial (intermediate code mods not attached) using these updated coefficients gives an unbiased fit to the original positions. 
The resulting residual is shown in the png plots as the '2nd residual'. 
Still the leo positions are not fitted precisely with a 5 degree polynomial, but there is no overall bias, and the '2nd residual' is almost the same for the cntl and nopurple_cntl runs (the small difference is alright since it is a different number of samples that are fitted in the two cases). The gns positions are fitted perfectly with a
5 degree polynomial (the '2nd residual' for both cntl and nopurple_cntl is less than 1e-4 mm noise when I zoom in; not shown).

10) IT-PP-05_prof3_cntl4.out and IT-PP-05_prof3_nopurple_cntl4.out:
Running with these modifications (using the attached ropp_pp_preprocess_grasrs.f90 and ropp_pp_utils.f90) I get IT-PP-05_prof3_cntl4.out and IT-PP-05_prof3_nopurple_cntl4.out. Still some differences, a bit smaller, but not a lot. I think these remaining differences can be explained by the differences in the '2nd residuals' 
of the leo. My hope is that if you implement the suggested changes in your codes (as a try) you won't see much difference, no matter which compiler you use. But you will probably also have to call ropp_pp_residual_regression in other places in ropp_pp when orbit positions are being fitted, since uncertainty in the matrix inversion could be/probably will be in those places as well (I haven't done that, but that's why I made it as a subroutine that can be plugged in other places as well). Such calls should be placed right after calls to ropp_pp_regression.

Well, thanks for reading to the end of this mega mail. Hope all this helps. Let me know what you find if you try this out.

See you soon at the team meeting.
-Stig


On 2013-09-30 10:29, Culverwell, Ian wrote:
> Thanks, Stig.
>
> Ian.
>
>> -----Original Message-----
>> From: Stig Syndergaard [mailto:ssy@dmi.dk]
>> Sent: 30 September 2013 09:26
>> To: Culverwell, Ian
>> Subject: Re: ropp_pp_preprocess_grasrs madness
>>
>> Hi Ian,
>>
>> Thanks for all this. Yes, I noticed I got a 'beta-tester'
>> email. For a moment I was a little perplexed, but I guessed you would 
>> follow up with another email, and you did :-) I'll have a look over 
>> the next days. I'll let you know what I find.
>>
>> Best,
>> -Stig
>>
>> On 2013-09-30 10:12, Culverwell, Ian wrote:
>>> Hi Stig,
>>>
>>> Rather than clog up your mailbox, I've put the following
>> files under
>>> the MISCELLANEOUS/ROPP_PP_OCC_TOOL folder of
>>>
>>> http://www.romsaf.org/ropp_beta/files.php,
>>>
>>> which you should be able to see now that I've made you a
>> beta tester
>>> ;-)
>>>
>>>
>>> 1) IT-PP-05_prof3.nc = input to ropp_pp_occ_tool
>>>
>>> 2) IT-PP-05_prof3_cntl.nc = output, running ifort12 on a linux box.
>>>
>>> 3) IT-PP-05_prof3_cntl.out = diagnostic output of the
>> above.  (You'll
>>> see I've added some print statements to the code.)
>>>
>>> There are analagous files called
>> "IT-PP-05_prof3_nopurple.nc" etc, which are the same except that the 
>> missing data portions (the purple and orange regions of 
>> https://trac.romsaf.org/ropp/attachment/ticket/301/phase_L1.gi
>> f) have been removed from the input file.
>>>
>>> I've also included the script I use to run it, test4.sh,
>> and the standard MSIS_coeff.nc and gras_pp.cf files.
>>>
>>> Removing the missing data points doesn't have a very big
>> effect on the results, but I don't see why there should be any.  As I 
>> said before, the first big difference is in the calculation of DP0 in 
>> ropp_pp_DCT, although we do end up with slightly different numbers of 
>> points (154144 cf 154145) on exit from ropp_pp_cutoff_amplitude.  
>> However, I tried artificially reducing the larger one to 154144, and 
>> it didn't stop the DP0s from diverging later.
>>>
>>> This problem isn't urgent but it is indicative of some
>> weirdness in ropp_pp_occ_tool which it would be nice to understand 
>> (and eliminate). I'll keep working on it in the background.
>>>
>>> Let's not worry about the xlf95-ifort differences for now!
>>>
>>> Ian.
>>>
>>>
>>>> -----Original Message-----
>>>> From: Culverwell, Ian
>>>> Sent: 27 September 2013 15:27
>>>> To: 'Stig Syndergaard'
>>>> Cc: Culverwell, Ian
>>>> Subject: RE: ropp_pp_preprocess_grasrs madness
>>>>
>>>> Hi Stig,
>>>>
>>>> Thanks - yes, please don't let it spoil your weekend - I certainly 
>>>> won't!
>>>>
>>>> I'll send you some more stuff next week.
>>>>
>>>> Have a good weekend,
>>>> Ian.
>>>>
>>>>> -----Original Message-----
>>>>> From: Stig Syndergaard [mailto:ssy@dmi.dk]
>>>>> Sent: 27 September 2013 14:55
>>>>> To: Culverwell, Ian
>>>>> Subject: Re: ropp_pp_preprocess_grasrs madness
>>>>>
>>>>> Hi Ian,
>>>>>
>>>>> Thanks for this. I'm laughing loud... no not really, but
>>>> smiling over
>>>>> your way of putting it ;-)
>>>>>
>>>>> I can't say that I've seen something like this, I haven't really 
>>>>> looked at the raw sampling processing, but I think I've seen much 
>>>>> crazy behavior if missing data enters the processing.
>> That was just
>>>>> with the CL data. I don't know why the phases are set to -1 in 
>>>>> ropp_pp_preprocess_grasrs as you show, that looks odd.
>>>>>
>>>>> If you send me the input data, I can take a closer look
>> at the code
>>>>> next week to see if I can understand what's going on (but
>> only with
>>>>> ifort on linux). Seems like you are close to finding the
>> problem by
>>>>> tracking why DPO gets different values.
>>>>>
>>>>> I don't have any immediate thoughts, but will let you
>> know if that
>>>>> should happen.... not over the weekend though, I hope to
>> laugh over
>>>>> less brain-wringing things :-)
>>>>>
>>>>> Have a nice weekend!
>>>>> -Stig
>>>>>
>>>>> On 2013-09-27 14:49, Culverwell, Ian wrote:
>>>>>> Hi Stig,
>>>>>>
>>>>>> In case you felt like a laugh on a Friday afternoon, I
>>>>> thought I'd send you the attached, which shows the effect
>>>> of removing
>>>>> the _missing phase_L1 data_ (both ropp_MDFV and netCDF
>> missing data
>>>>> indicator) from the problematic 3rd profile in the GRAS 2007 raw 
>>>>> sampling dataset that we were looking at earlier.
>>>>>>
>>>>>> One might have hoped that removing missing data would have
>>>>> no impact on processed bangle/refrac.  One would apparently
>>>> be wrong.
>>>>> This is just madness!  Even worse, the bottom panel shows that 
>>>>> removing this data has the opposite effect on a linux box
>>>> with ifort
>>>>> than on a HPC with xlf95.
>>>>>>
>>>>>> It's possible I've cocked this up, but I've checked and
>>>>> checked and I don't think so.
>>>>>>
>>>>>> The big differences start appearinf in ropp_pp_DCT: the
>>>>> file with missing phase_L1 data has DP0 =   2067.095; the one
>>>>> without has DP0 =   2225.461.  After this, everything is
>> different.
>>>>>>
>>>>>> Are you surprised by this?  Have you seen anything like it
>>>>> before?  There's a ROPP ticket on the need for better
>> treatment of
>>>>> missing data in ropp_pp, and, if confirmed, this result
>>>> lends weight
>>>>> to it.
>>>>>>
>>>>>> For the record, ropp_pp_preprocess_grasrs handles
>> missing data by:
>>>>>>
>>>>>>      WHERE(ro_data%Lev1a%phase_L1(:) == ropp_MDFV)
>>>>>>        ro_data%Lev1a%phase_L1(:) = -1.0_wp
>>>>>>      ENDWHERE
>>>>>>
>>>>>>      WHERE(ro_data%Lev1a%phase_L2(:) == ropp_MDFV)
>>>>>>        ro_data%Lev1a%phase_L2(:) = -1.0_wp
>>>>>>      ENDWHERE
>>>>>>
>>>>>> I don't know why.
>>>>>>
>>>>>> Any thoughts?
>>>>>>
>>>>>> Thanks,
>>>>>> Ian.
>>>>>>
>>>>>
>>>>> --
>>>>> __________________________________________________________________
>>>>> Stig Syndergaard, PhD
>>>>> Danish Meteorological Institute              Phone: +45 39 157 408
>>>>> Lyngbyvej 100                                  Fax: +45 39 157 460
>>>>> DK-2100 Copenhagen                              E-mail: ssy@dmi.dk
>>>>> __________________________________________________________________
>>>>>
>>
>> --
>> __________________________________________________________________
>> Stig Syndergaard, PhD
>> Danish Meteorological Institute              Phone: +45 39 157 408
>> Lyngbyvej 100                                  Fax: +45 39 157 460
>> DK-2100 Copenhagen                              E-mail: ssy@dmi.dk
>> __________________________________________________________________
>>

--
__________________________________________________________________
Stig Syndergaard, PhD
Danish Meteorological Institute              Phone: +45 39 157 408
Lyngbyvej 100                                  Fax: +45 39 157 460
DK-2100 Copenhagen                              E-mail: ssy@dmi.dk
__________________________________________________________________