Opened 12 years ago
Closed 11 years ago
#297 closed enhancement (fixed)
Tools should return a status code to the calling shell
Reported by: | Dave Offiler | Owned by: | Dave Offiler |
---|---|---|---|
Priority: | normal | Milestone: | 8.0 |
Component: | ROPP (all) | Version: | 5.0 |
Keywords: | exit, status code | Cc: |
Description
DMI request that ROPP return a consistent status code from both routines and shell-callable tools. The most obvious candidate for the former is to return a status code from the ropp_io_rangecheck() subroutine.
Details TBD, but as a minimum a non-zero value should indicate that some parameter was flagged as out-of-range and set to 'missing'. Additional info on the level(s) affected (BA, Ref...) would be useful.
Requirements for what info should be returned need to be established with DMI & EUM (Axel).
Change history (4)
comment:1 by , 11 years ago
Milestone: | 7.0 → 8.0 |
---|
comment:2 by , 11 years ago
Component: | ropp_io → ROPP (all) |
---|---|
Keywords: | exit added; range check removed |
Status: | new → accepted |
Summary: | RangeCheck should return a status code → Tools should return a status code to the calling shell |
Discussions with Stig clarified that DMI require a consistent shell-level exit code. Hence it is not necessary to provide subroutine-level exit codes, as the original title of this ticket ("RangeCheck should return a status code") would suggest.
Since all non-trivial status conditions are (or should be) passed to the message module, the simplest way to meet the requirement would be to save the highest status code passed to ropp_message() as a global variable in that module, whether or not the message itself is enabled for output, and to call the EXIT() routine with that variable at the end of the main program. Where EXIT() is called earlier (e.g. for an unrecoverable condition such as a missing input file), a set of fixed parameters could be defined in the message module. Standard conditions are suggested: 0=OK, 1=WARNING, 2=ERROR, 3=FATAL ERROR.
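The "keep the highest status seen" logic can be sketched in shell for illustration. This is not the ROPP implementation (which is Fortran, in the message module); the function and variable names here are hypothetical, and only the 0 to 3 numbering comes from the suggestion above.

```shell
#!/bin/sh
# Illustrative sketch only: the proposed logic expressed in shell.
# All names are hypothetical; only the 0..3 values follow the ticket.
STATUS_OK=0
STATUS_WARNING=1
STATUS_ERROR=2
STATUS_FATAL=3

saved_status=$STATUS_OK

# note_status LEVEL: remember LEVEL if it is the highest seen so far.
# The level is recorded even if the message text itself is never printed.
note_status() {
    if [ "$1" -gt "$saved_status" ]; then
        saved_status=$1
    fi
}

note_status "$STATUS_WARNING"
note_status "$STATUS_ERROR"
note_status "$STATUS_WARNING"   # a lower level does not reduce the saved status

echo "status to hand to the shell: $saved_status"
```

A real tool would finish by passing the saved value to its exit call; the sketch only prints it so that it can be sourced or pasted into an interactive shell without ending the session.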
This concept has been implemented in the ground-based GWV code, and found to work very well. However, the ROPP code is significantly more complex and there are many more tools involved, so even this relatively simple approach will not be a trivial amount of work.
Noting that the IBM AIX xlf95 compiler has a bug which prevents EXIT() from returning the expected value, a wrapper interface similar to the one needed for NAG compilers (nag_interfaces.f90) will be needed on this platform to retain full portability for this method of passing an exit code. (This bug has been reported to IBM, but as EXIT() is a POSIX extension, not an ISO-standard Fortran intrinsic, their response was a "won't fix". Their own version, exit_(), does work, and a wrapper xlf_interfaces.f90 has been successfully tested with the GWV code on the HPC.)
As an alternative to CALL EXIT(), the ISO-standard STOP n, where 'n' is a non-negative integer value, or STOP message, where message is a fixed character literal, could be used. However, it appears that while this syntax is standard, there is no actual requirement for the compiler implementation to do anything with it, and some OS/compiler combinations do not pass the value of n to the shell (or its equivalent). Also, the text output by the STOP command is compiler-dependent, so it would not be feasible to parse the output for a particular message text.
A new branch do_exitcodes has been created to deal with this ticket.
comment:3 by , 11 years ago
Stig confirms (email 27 Mar):
Sorry that it took us so long to get back to you on this. I talked to Hallgeir about it today, and we agree that what you suggest is exactly what we need. It is not so important to us whether we have the 0,1,2,3 as you suggest, or something slightly different, but your suggestion is fine with us. The important thing is that it will allow us to control easily what we want our processing to do on such exits (bailing out or setting a flag, or some such thing).
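As a sketch of what such control could look like in a calling script, assuming the 0/1/2/3 scheme suggested in comment:2: the function name handle_status and the action chosen for each code are hypothetical, and the subshell simply stands in for running a ROPP tool.

```shell
#!/bin/sh
# handle_status CODE: report what a processing chain might do for a given
# tool exit code. The actions are illustrative assumptions, not DMI policy.
handle_status() {
    case "$1" in
        0) echo "continue" ;;
        1) echo "continue, flag product" ;;
        2) echo "skip this file" ;;
        3) echo "bail out" ;;
        *) echo "unknown status $1" ;;
    esac
}

# Stand-in for a tool that ends with status 2 (ERROR); a real script would
# run the tool itself here and pick up its status in the same way.
status=0
( exit 2 ) || status=$?
handle_status "$status"
```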
While waiting for Stig's reply, I have anyway implemented the proposed changes to the ROPP_UTILS module in my dedicated branch, including the xlf_interfaces.f90 wrapper. The t_version.f90 test program has been modified to finish with a CALL EXIT(MSG_EXIT_STATUS) and tested with all the defined message types with ifort12 on Linux and xlf95 on AIX; the exit codes at shell level (variable $?) in all cases were as expected.
The test program is now left at the default info message type which will result in a zero exit code assuming the test completes normally.
With Stig's OK above, the exit code scheme should now be applied consistently across all the ROPP modules' tools. This will involve adding a similar exit call at the normal end point of the program (if there isn't one already) with the saved exit status code as its argument, and identifying & modifying appropriately any other instances where an exit call might be made. Ideally, forced exits should only be made in the top-level tool code, but subroutines will be scanned too.
Devising a test procedure which will force a non-info test message and hence a non-zero exit code will be a further step. The ROPP GG might advise on whether such a fail-test should be part of the usual Test Folder system.
comment:4 by , 11 years ago
Resolution: | → fixed |
---|---|
Status: | accepted → closed |
Created new script t_toolexit.sh (at ropp_src root level) to automate simple tests of tools' exit code values.
All tools in all modules have been modified to use the proposed exit value method as implemented in ropp_utils noted above. After editing, recompiling & manually test-running each tool in turn, I auto-tested by re-building with the top-level build_ropp ifort ropp_<mod> (configure/make/make install/make test sequence) and then running t_toolexit.sh in each directory where executables are built. All tests PASS.
NB: Only tested with ifort (v12 on RHEL6 & v14 on OpenSUSE 13.1). Not anticipating issues with other compilers when merged with the v8.0 prototype branch for more comprehensive portability testing, so closing this ticket as fixed.
Need more thought on what the return code should be: simple 0 (pass) & 1 (fail) or more codes indicating type of failure? Not for V7.0.
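For readers without access to the branch, the kind of check involved can be sketched as follows. This is not the content of t_toolexit.sh; the helper name check_exit is hypothetical, and the standard commands true, false and sh -c stand in for ROPP tools.

```shell
#!/bin/sh
# check_exit EXPECTED COMMAND [ARGS...]
# Run COMMAND, discard its output, and compare its exit code with EXPECTED.
check_exit() {
    expected=$1
    shift
    actual=0
    "$@" >/dev/null 2>&1 || actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "PASS"
    else
        echo "FAIL (expected $expected, got $actual)"
        return 1
    fi
}

# Stand-in commands with known exit codes.
check_exit 0 true
check_exit 1 false
check_exit 3 sh -c 'exit 3'
```

Each call prints PASS when the observed code matches; a mismatch prints both values and returns non-zero, so a wrapper script can count failures.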