Opened 10 years ago

Closed 4 years ago

#396 closed defect (fixed)

make test bombing out

Reported by: Ian Culverwell
Owned by: Ian Culverwell
Priority: normal
Milestone: 10.0
Component: ROPP (all)
Version: 7.1
Keywords:
Cc:

Description

ROPP8.0 beta reviewer Dave Offiler (UKMO) reports:

- ROPP_1DVAR: t_1dvar_iono_bangle seg-faults with ifort15, aborting the tests so
  that the overall PASS/FAIL table is not generated. Ignoring the cause of this
  failure for the moment, the 'make test' needs to include the option to 
  continue processing in case of any one test failure.

Agreed.
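
A minimal sketch of the "continue after a failure" behaviour requested above, assuming each test is driven by its own shell script. The helper name `run_all` and the summary layout are hypothetical and not part of the ROPP build system. If each test were its own make target, GNU make's `-k` (`--keep-going`) option might be another route.

```shell
#!/usr/bin/env bash
# Sketch: run each test command in turn and keep going after a failure.
# run_all is a hypothetical helper, not part of the ROPP build system.
run_all() (
  failures=0
  for cmd in "$@"; do
    # A crashing test (e.g. a seg fault) only yields a non-zero status here;
    # it does not stop the loop.
    if $cmd >/dev/null 2>&1; then
      result=PASS
    else
      result=FAIL
      failures=$((failures + 1))
    fi
    printf '%-40s %s\n' "$cmd" "$result"
  done
  # Report overall failure only after every test has run.
  [ "$failures" -eq 0 ]
)
```

It could be called as `run_all ./test_1dvar_iono.sh ./test_1dvar_COSMIC_bangle.sh`, so that a seg fault in the first script still leaves a PASS/FAIL line for the second.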

Change history (5)

comment:1 by Ian Culverwell, 8 years ago

This test runs fine with ifort15 at ROPP9.0:

idculv@eld037:> test_1dvar_iono.sh                                                                                                                 
 
Running t_1dvar_iono_bangle (1DVAR L1 and L2) ...

----------------------------------------------------------------------
                    ROPP 1DVAR File Comparison Tool
----------------------------------------------------------------------

... (from ropp_1dvar_compare):  Comparing anl20090401_000329_M02_2030337800_N0007_YYYY_iono.nc and ../data/anl20090401_000329_M02_2030337800_N0007_YYYY_iono_reference.nc: the results of running test t_1dvar_iono_bangle (1DVAR L1 and L2)
... (from ropp_1dvar_compare):  Both files contain    1 profiles
... (from ropp_1dvar_compare):  No significant differences between anl20090401_000329_M02_2030337800_N0007_YYYY_iono.nc and ../data/anl20090401_000329_M02_2030337800_N0007_YYYY_iono_reference.nc
****************************
**********  PASS  **********
****************************
... examine t_1dvar_iono_bangle.log for details

There are tiny (<~1e-12) differences between the result and the reference (which was presumably generated with ifort12).

(This is with a slightly more recent version of ifort15 (v15.0.2 20150121) than Dave's (v15.0.1 20141023).)

I rebuilt ROPP80_prototype with ifort15, and again it was OK (apart from the same optimisation issue dealt with in #392). Note that there were no ifort15 mini config scripts at ROPP8.0, so this was officially an unsupported compiler at that release, and I don't know why Dave was testing it.

comment:2 by Ian Culverwell, 8 years ago

I suspect it is a memory issue. Retrievals using L1 and L2 need an obs vector that is twice as long as usual. Axel had the same problems, which were fixed by setting ulimit -S -s unlimited. A note to this effect was added to the ROPP8.0 Release Notes.

My default stack size limit (ulimit -s) is 10240. If I reduce this to 1024, I get a seg fault with test_1dvar_iono.sh:

idculv@eld037:> test_1dvar_iono.sh
 
Running t_1dvar_iono_bangle (1DVAR L1 and L2) ...
test_1dvar_iono.sh: line 54: 17521 Segmentation fault      (core dumped) ./$EXEC ${EXTRA_CMD} -d -y $OBFILE1 --obs-corr $OBCOV -b $BGFILE1 --bg-corr $BGCOV -c $CONFIG -o $OFILE >> $LOGFILE

but the other 1dvar tests still run OK, eg

idculv@eld037:> test_1dvar_COSMIC_bangle.sh
 
Running t_1dvar_COSMIC_04_bangle (1DVAR COSMIC; default) ...
****************************
**********  PASS  **********
****************************
... examine t_1dvar_COSMIC_04_bangle.log for details
 
 
Running t_1dvar_COSMIC_04comp_bangle (1DVAR COSMIC; comp factors) ...
****************************
**********  PASS  **********
****************************
... examine t_1dvar_COSMIC_04comp_bangle.log for details
 
 
Running t_1dvar_COSMIC_04newop_bangle (1DVAR COSMIC; new interp) ...
****************************
**********  PASS  **********
****************************
... examine t_1dvar_COSMIC_04newop_bangle.log for details
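
The reduced-stack experiment in this comment can be repeated without changing the limit of the login shell. The sketch below lowers the soft limit inside a subshell only; `with_stack_limit` is a hypothetical helper name, not something that exists in the ROPP test scripts.

```shell
# Sketch: run a command under a temporary soft stack limit (in KiB).
# The function body is a subshell, so the caller's own limit is untouched.
with_stack_limit() (
  limit_kb=$1
  shift
  # Give up with status 2 if the limit cannot be set.
  ulimit -S -s "$limit_kb" || exit 2
  "$@"
)
```

For example, `with_stack_limit 1024 ./test_1dvar_iono.sh` would run the iono test with the 1024 KiB limit used above, while later commands in the same shell keep the original limit.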

comment:3 by Ian Culverwell, 8 years ago

Milestone: 9.0 → 10.0

Some experimentation on my Linux desktop suggests we need a stack size of at least 9300 kilobytes. We could therefore put a check on ulimit -s in the script, and bail out with a 'not run' result if the stack size is likely to be too small. Defer until ROPP10.0.

comment:4 by Ian Culverwell, 8 years ago

Something like

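stack_kb=$(ulimit -s)   # soft stack limit in KiB, or "unlimited"
if [[ $stack_kb != "unlimited" && $stack_kb -lt 10000 ]] ; then
  echo "Stack size $stack_kb KiB insufficient to run test ... increase to at least 10000 KiB and rerun"
  exit 1   # non-zero status, so the caller can tell the test was not run
fi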

would probably do it.

(While we're there, fix the typo in the robodoc-compliant header of test_1dvar_iono.sh:

#****s* fm/test_1dvar_iono.sh

should read

#****s* 1dvar/test_1dvar_iono.sh

)

comment:5 by Ian Culverwell, 4 years ago

Resolution: fixed
Status: new → closed

We ran into the same problem on the new laptops, which have a stack size limit of 8192 kB. The problem has been fixed by adding the line

ulimit -S -s unlimited  # This test needs a stack size over 8192 Kibytes.

to the top of ropp_1dvar/tests/test_1dvar_iono.sh.
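
The line above only works if the hard limit permits an unlimited stack, because a soft limit can never exceed the hard limit. As a hedged sketch of a more defensive variant (an illustration, not what was committed), the hypothetical helper `raise_stack_limit` falls back to the hard limit when "unlimited" is refused:

```shell
# Sketch: raise the soft stack limit as far as the hard limit allows.
# raise_stack_limit is a hypothetical helper name.
raise_stack_limit() {
  # The soft limit can never exceed the hard limit, so "unlimited" may be refused.
  if ! ulimit -S -s unlimited 2>/dev/null; then
    ulimit -S -s "$(ulimit -H -s)"
  fi
}
```

After calling `raise_stack_limit`, `ulimit -s` shows what was actually obtained, which could then be compared with the roughly 9300 kB needed by this test.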

Closing ticket.
