Opened 11 years ago
Closed 5 years ago
#396 closed defect (fixed)
make test bombing out
| Reported by: | Ian Culverwell | Owned by: | Ian Culverwell | 
|---|---|---|---|
| Priority: | normal | Milestone: | 10.0 | 
| Component: | ROPP (all) | Version: | 7.1 | 
| Keywords: | | Cc: | |
Description
ROPP8.0 beta reviewer Dave Offiler (UKMO) reports:
- ROPP_1DVAR: t_1dvar_iono_bangle seg-faults with ifort15, aborting the tests so that the overall PASS/FAIL table is not generated. Ignoring the cause of this failure for the moment, the 'make test' needs to include the option to continue processing in case of any one test failure.
Agreed.
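A minimal sketch of the requested behaviour, assuming the tests can be driven from a bash wrapper. Each test runs in its own child shell, so a crash in one is recorded as a FAIL and the remaining tests and the summary still run. The function name run_all_tests is hypothetical and is not part of the ROPP build system.

```shell
#!/bin/bash
# Hypothetical helper (not part of ROPP): run every test command given as an
# argument, print one PASS/FAIL line per test, and never stop at a failure.
run_all_tests() {
  local failures=0 t status
  for t in "$@"; do
    # Run in a child shell so that a crash in the test cannot end this loop.
    bash -c "$t" >/dev/null 2>&1
    status=$?
    if [[ $status -eq 0 ]]; then
      printf '%-40s PASS\n' "$t"
    else
      printf '%-40s FAIL (exit status %d)\n' "$t" "$status"
      failures=$((failures + 1))
    fi
  done
  # Return 1 if any test failed, 0 otherwise.
  return $(( failures > 0 ))
}

# Example: run_all_tests ./test_1dvar_iono.sh ./test_1dvar_COSMIC_bangle.sh
```

If the tests are instead launched directly from make, GNU make's -k (--keep-going) option gives broadly similar behaviour.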
Change history (5)
comment:1 by , 9 years ago
comment:2 by , 9 years ago
I suspect it is a memory issue.  Retrievals using L1 and L2 need an obs vector that is twice as long as usual.  Axel had the same problems, which were fixed by setting ulimit -S -s unlimited.  A note to this effect was added to the ROPP8.0 Release Notes.
My default stack size limit (ulimit -s) is 10240.  If I reduce this to 1024, I get a seg fault with test_1dvar_iono.sh:
idculv@eld037:> test_1dvar_iono.sh
 
Running t_1dvar_iono_bangle (1DVAR L1 and L2) ...
test_1dvar_iono.sh: line 54: 17521 Segmentation fault      (core dumped) ./$EXEC ${EXTRA_CMD} -d -y $OBFILE1 --obs-corr $OBCOV -b $BGFILE1 --bg-corr $BGCOV -c $CONFIG -o $OFILE >> $LOGFILE
but the other 1dvar tests still run OK, eg
idculv@eld037:> test_1dvar_COSMIC_bangle.sh
Running t_1dvar_COSMIC_04_bangle (1DVAR COSMIC; default) ...
****************************
********** PASS **********
****************************
... examine t_1dvar_COSMIC_04_bangle.log for details
Running t_1dvar_COSMIC_04comp_bangle (1DVAR COSMIC; comp factors) ...
****************************
********** PASS **********
****************************
... examine t_1dvar_COSMIC_04comp_bangle.log for details
Running t_1dvar_COSMIC_04newop_bangle (1DVAR COSMIC; new interp) ...
****************************
********** PASS **********
****************************
... examine t_1dvar_COSMIC_04newop_bangle.log for details
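To reproduce this without changing the limit of the login shell, the soft stack limit can be lowered inside a subshell only. A minimal sketch; with_stack_limit is a hypothetical helper name, and the limit is given in the units bash uses for ulimit -s (1024-byte blocks).

```shell
#!/bin/bash
# Hypothetical helper: run a command under a given soft stack limit (KiB)
# without altering the limit of the calling shell.
with_stack_limit() {
  local limit_kib=$1
  shift
  (
    # The limit is changed inside this subshell only.
    ulimit -S -s "$limit_kib" || exit 2
    "$@"
  )
}

# Example: with_stack_limit 1024 ./test_1dvar_iono.sh
```

Because the change is confined to the subshell, the other tests can be run afterwards in the same session under the usual limit.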
comment:3 by , 9 years ago
| Milestone: | 9.0 → 10.0 | 
|---|---|
Some experimentation on my Linux desktop suggests we need a stack size of at least 9300 kilobytes. We could therefore put a check on ulimit -s in the script, and bail out with a 'not run' result if the stack size is likely to be too small. Defer until ROPP10.0.
comment:4 by , 9 years ago
Something like
if [[ $(ulimit -s) != "unlimited" ]] ; then
  if [[ $(ulimit -s) -lt 10000 ]] ; then
    echo "Stack size $(ulimit -s) insufficient to run test ... increase to at least 10000 KiB and rerun"
    exit
  fi
fi
would probably do it.
(While we're there, fix the typo in the robodoc-compliant header of test_1dvar_iono.sh:
#****s* fm/test_1dvar_iono.sh
should read
#****s* 1dvar/test_1dvar_iono.sh
)
comment:5 by , 5 years ago
| Resolution: | → fixed | 
|---|---|
| Status: | new → closed | 
We ran into the same problem with the new laptops, which have a default stack size limit of 8192 kB. The problem has been fixed by adding the line
ulimit -S -s unlimited # This test needs a stack size over 8192 Kibytes.
to the top of ropp_1dvar/tests/test_1dvar_iono.sh.
Closing ticket.
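One caveat: ulimit -S -s unlimited can only succeed if the hard limit is itself unlimited. A minimal sketch of a more defensive variant, which raises the soft limit to whatever the hard limit allows and then checks the result. Both function names are hypothetical, and the 9300 KiB threshold in the usage line is the estimate from comment:3, not a verified requirement.

```shell
#!/bin/bash
# Hypothetical helper: raise the soft stack limit as far as the hard limit allows.
raise_stack_limit() {
  local hard
  hard=$(ulimit -H -s)   # "unlimited" or a number of KiB
  # Setting the soft limit equal to the hard limit is always permitted.
  ulimit -S -s "$hard" 2>/dev/null
}

# Hypothetical helper: succeed if the soft stack limit is at least $1 KiB.
stack_at_least() {
  local soft
  soft=$(ulimit -S -s)
  [[ $soft == unlimited || $soft -ge $1 ]]
}

# Example:
#   raise_stack_limit
#   stack_at_least 9300 || echo "stack limit too small, test not run"
```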


This test runs fine with ifort15 at ROPP9.0:
idculv@eld037:> test_1dvar_iono.sh
Running t_1dvar_iono_bangle (1DVAR L1 and L2) ...
----------------------------------------------------------------------
ROPP 1DVAR File Comparison Tool
----------------------------------------------------------------------
... (from ropp_1dvar_compare): Comparing anl20090401_000329_M02_2030337800_N0007_YYYY_iono.nc and ../data/anl20090401_000329_M02_2030337800_N0007_YYYY_iono_reference.nc: the results of running test t_1dvar_iono_bangle (1DVAR L1 and L2)
... (from ropp_1dvar_compare): Both files contain 1 profiles
... (from ropp_1dvar_compare): No significant differences between anl20090401_000329_M02_2030337800_N0007_YYYY_iono.nc and ../data/anl20090401_000329_M02_2030337800_N0007_YYYY_iono_reference.nc
****************************
********** PASS **********
****************************
... examine t_1dvar_iono_bangle.log for details
There are tiny (<~1e-12) differences between the result and the reference (ifort12)?
(This is with a slightly more recent version of ifort15 (v15.0.2 20150121) than Dave's (v15.0.1 20141023 ).)
I rebuilt ROPP80_prototype with ifort15 (note that there were no ifort15 mini config scripts at ROPP8.0, so this was officially an unsupported compiler at that release, so I don't know why Dave was testing it), and again it was OK (apart from the same optimisation issue dealt with in #392).