Opened 16 years ago

Closed 16 years ago

#152 closed task (fixed)

Building on 64-bit platforms

Reported by: Dave Offiler Owned by: Dave Offiler
Priority: normal Milestone: 2.0
Component: ROPP (all) Version: 2.0beta
Keywords: 64-bit, IA64, AMD64 Cc:

Description

This ticket is a follow-up (to Ticket #133) on the generic issue of building ROPP and its dependencies consistently on 64-bit architectures. Ticket #133 solved the specific NEC problem (ucar2ropp - and anything else using udunits - crashes with a segmentation fault), but this may or may not be applicable to other systems and/or compilers.

Josep Aparicio's ROPP v2.0beta testing flagged up a segmentation fault when running ucar2ropp compiled with GFortran (unspecified architecture, but assumed to be 64-bit). I can reproduce this on my personal AMD64-based Linux system with ifort ('bus error' in this case) and GFortran, which confirms that the issue is not limited to the NEC and so needs a general solution.

Need to:

(a) investigate solutions for these systems and

(b) extend to generic 64-bit machines by auto-detecting 64-bit and setting appropriate compiler flags where possible (ideally within the configure macros, else provide special versions of the configure mini-scripts?)

(c) BUFR will need a straightforward custom job (or replace with ECMWF if this supports 64-bit already?)

Change history (3)

comment:1 by Dave Offiler, 16 years ago

Making all F90 compile with 64-bit default INTEGERs (and setting -DL64 for the BUFR C code) is probably not a generic solution; while gcc generates 64-bit pointers, ints are still 32-bit (and longs are 64-bit). Hence if F90 INTEGERs are defaulted to 64-bit, F90/C interfaces using standard integer types (INTEGER/int) will be mismatched. Trying this on the AMD64 with GFortran allowed ucar2ropp to work OK, but BUFR now fails :-(.

There does not seem to be any way to make C ints 64-bit: one assumes that on these 64-bit platforms int = 4 bytes and long = 8 bytes, always (the C standard itself does not fix these sizes). [NB this is true on the NEC too, using the simple program:

#include <stdio.h>
int main(void)
{
  int i;
  long l;
  /* sizeof yields a size_t, so print with %zu rather than %ld */
  printf("Pointer: %zu  Int: %zu  Long: %zu\n",
          sizeof(char*), sizeof(i), sizeof(l));
  return 0;
}

so the solution in Ticket 133 shouldn't have worked...]

Trying a new tack: just make the interfacing integers in udunits.f90 explicitly 8-byte and leave the F90 compiler default (and BUFR C) as 32-bit. To Be Reported...

comment:2 by Dave Offiler, 16 years ago

A quick'n'dirty test by editing udunits.f90 and changing the line defining the integer variables holding the C pointers from the lower-level udunits library:

  • change integer to integer*4 and re-test on 32-bit desktop (OK)
  • change integer to integer*8 and test on home AMD64 platform (GFortran) - all ropp_io tests ran OK. (NB: using the same configure mini-scripts as normal)

Manually editing this file according to the platform isn't of course viable for users of the ROPP package, so this needs to be automated. Created an SVN branch DO_64bit to play with this issue.

The udunits package contains an m4 macro which in effect runs a simpler version of the C code (above) to print '4' or '8' to stdout, being the result of the sizeof(char*) call, and stores this value, prefixed with integer*, in the variable UD_POINTER. This macro has been extracted into a stand-alone file ropp_io/m4/ac_fortran_ptr.m4 and the prefix made uppercase. The macro is built into the aclocal.m4 file by (re-)running the aclocal -I m4 --force command, and thence into configure by autoconf.

ropp_io/udunits/udunits.m4 then edited to INTEGER*4 as an explicit default 32-bit size, and make run to regenerate udunits.f90.

Finally, ropp_io/udunits/Makefile.am was modified to add a new target edit_pointer_size which forces, via a pair of sed commands, any INTEGER*4 or INTEGER*8 (one will always be redundant) to the translation of UD_POINTER. This ensures that (in normal circumstances) the provided *4 will be changed to *8 when ROPP_IO is built on a 64-bit machine, but will revert to *4 on a 32-bit machine should the udunits.f90 file have been created from udunits.m4 on a 64-bit machine.

Running automake to re-generate udunits/Makefile.in and then configure in the usual way will create a new udunits/Makefile which, when applied directly or via the master Makefile will generate the correct Fortran integer size for the current machine. This has been tested on 32-bit Desktop and home AMD64 and works as expected.

[At the time of writing, it has not been tested on the NEC TX, but is expected to do the on-the-fly edit just the same; in this case the 'build everything in 64-bit' approach tested earlier should still work because the default integer was 64-bit anyway, and is now just made explicit. What does need to be tested is if this new approach works on the NEC without the 'all-64' set up.]

This new system has been tested on home AMD64 by building all dependencies (netCDF, udunits, BUFR) from scratch, then ropp_utils and ropp_io using ifort (v10.1), g95 and gfortran (all 64-bit versions; g95 with 32-bit default ints) and then running the ropp_io make test suite.

In all cases, all tests completed with PASS (except the text-->netCDF which always shows a nominal FAIL due to precision limits in the text format). Previously, ucar2ropp was crashing (original reason for this Ticket) or solutions which cured this caused BUFR conversion problems.

So problem sorted? No - still need to test for no impact on PP, FM & 1DVAR.

PP & FM test OK with all 3 compilers on AMD64, as does 1DVAR with gfortran (so we can now support Josep's scenario). However, with both ifort and g95 the 1dvar standalone tool totally freezes the machine (g95 on starting, ifort after the 2nd profile), requiring a total power-off and re-boot. This is not supposed to occur with Linux! So more investigation is required; this is probably unrelated to the original problem of this Ticket, but is still a general 64-bit issue.

comment:3 by Dave Offiler, 16 years ago

Milestone: 3.0 → 2.0
Resolution: fixed
Status: new → closed

Tried again, adding some print statements to ropp_1dvar_refrac.f90 to see which was the problem section - ran perfectly! Removed prints & still OK. (While I was editing this file, I took the opportunity to tidy it up somewhat e.g. coding trivial sub-routines in-line and removing the call to messages).

Wiped local build (compiler-specific) directories and rebuilt all dependency packages (with build_deps) & ropp modules (with build_ropp) using ifort, g95 and gfc. Checked resulting logs, and all builds & tests show OK.

Finally, tried this solution on NEC TX without forcing all 64-bit (i.e. reverting solution as per #133); this again gives SEGFAULT when running the ucar2ropp test, so reverting to #133 solution for efc configures, which works again (with the INTEGER*8 auto-edit in place).

During testing, it seems that NEC Linux doesn't support 'sed -i', so a work-around with a temporary file has been included in udunits/Makefile.am. This is now included in the DO_64bit branch.

This ticket now closed. Branch merged with trunk as [1982]
