Reading and writing Fortran direct-access unformatted files with different compilers

deepak · Sep 21, 2015

I have a section in a program that writes a direct-access binary file as follows:

open (53, file=filename, form='unformatted', status='unknown',
& access='direct',action='write',recl=320*385*8)
write (53,rec=1) ulat
write (53,rec=2) ulng
close(53)

This program is compiled with ifort. However, I cannot reconstruct the data correctly when I read the file from a different program compiled with gfortran. If the reading program is also compiled with ifort, the data is reconstructed correctly. Here is the code that reads the data file:

OPEN(53, FILE=fname, form="unformatted", status="unknown", access="direct", action="read", recl=320*385*8)
READ(53,REC=2) DAT

I do not understand why this is happening. I can read the first record correctly with both compilers; it is only the second record that I cannot reconstruct properly when I mix the compilers.

Answer

casey · Sep 21, 2015

ifort and gfortran do not use the same unit for record length by default. In ifort, the value of recl in your open statement is counted in 4-byte blocks, so your record length isn't 985,600 bytes; it is 3,942,400 bytes. That means the records are written at intervals of about 3.9 million bytes.

gfortran uses a recl unit of 1 byte, so there your record length is 985,600 bytes. Reading the first record works because both compilers place it at the start of the file. When you read the second record, gfortran looks 985,600 bytes into the file, but ifort wrote the data 3,942,400 bytes into the file. This also means you are wasting a lot of space: only 1/4 of the file holds actual data.
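The offset arithmetic above can be checked with a short program. This is only a sketch of the calculation: the unit sizes (4 bytes for ifort's default, 1 byte for gfortran) are hardcoded from the description above rather than queried from the compiler, and the program and variable names are made up for illustration.

```fortran
program record_offsets
  use iso_fortran_env, only: int64
  implicit none

  ! recl value from the question's open statement: 320 x 385 8-byte values
  integer(int64), parameter :: recl_value = 320_int64 * 385_int64 * 8_int64
  ! Bytes per recl unit, hardcoded as an assumption:
  ! 4 for ifort's default, 1 for gfortran
  integer(int64), parameter :: unit_ifort = 4, unit_gfortran = 1
  integer(int64) :: rec

  ! Record n starts at byte (n-1) * recl * (bytes per unit), counting from 0
  do rec = 1, 2
    print '(a,i0,a,i0,a,i0)', 'record ', rec, &
      ': ifort writes at byte ', (rec - 1) * recl_value * unit_ifort, &
      ', gfortran reads at byte ', (rec - 1) * recl_value * unit_gfortran
  end do
end program record_offsets
```

Record 1 starts at byte 0 in both cases, which is why only the second record is misread.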

There are a few ways to fix this:

  • In ifort specify recl in 4-byte blocks, e.g. 320*385*2 instead of *8
  • In ifort, use the compile flag -assume byterecl to have recl values interpreted as bytes.
  • In gfortran compensate for the size and use recl=320*385*32 so that your reads are correctly positioned.

A better way, however, is to make the code agnostic to the recl unit size. You can use inquire with iolength to ask the compiler for the recl needed to hold an array. For example:

real(kind=wp), allocatable, dimension(:,:) :: recltest
integer :: reclen
allocate(recltest(320,385))
inquire(iolength=reclen) recltest
deallocate(recltest)
...
open (53, file=filename, form='unformatted', status='unknown',
& access='direct',action='write',recl=reclen)
...
OPEN(53, FILE=fname, form="unformatted", status="unknown", &
access="direct", action="read", recl=reclen)

This will set reclen to the value needed to store a 320x385 array, expressed in that compiler's base unit for record length. If you use this for both writing and reading, your code will work with both compilers without having to use compile-time flags in ifort or compensate with hardcoded recl differences between compilers.


An illustrative example

Testcase 1

program test
  use iso_fortran_env
  implicit none

  integer(kind=int64), dimension(5) :: array
  integer :: io_output, reclen, i
  reclen = 5*8 ! 5 elements of 8 byte integers.

  open(newunit=io_output, file='output', form='unformatted', status='new', &
       access='direct', action='write', recl=reclen)

  array = [(i,i=1,5)]  
  write (io_output, rec=1) array
  array = [(i,i=101,105)]
  write (io_output, rec=2) array
  array = [(i,i=1001,1005)]
  write (io_output, rec=3) array

  close(io_output)
end program test

This program writes an array of 5 8-byte integers 3 times to the file in records 1,2 and 3. The array is 5*8 bytes and I have hardcoded that number as the recl value.

Testcase 1 with gfortran 5.2

I compiled this testcase with the command line:

gfortran -o write-gfortran write.f90

This produces the output file (interpreted with od -A d -t d8):

0000000                    1                    2
0000016                    3                    4
0000032                    5                  101
0000048                  102                  103
0000064                  104                  105
0000080                 1001                 1002
0000096                 1003                 1004
0000112                 1005
0000120

The arrays of 5 8-byte elements are packed contiguously into the file, and record number 2 (101 ... 105) starts where we would expect it to, at offset 40, which is the recl value we specified (5*8).

Testcase 1 with ifort 16

This is compiled similarly:

ifort -o write-ifort write.f90

And this, for the exact same code, produces the output file (interpreted with od -A d -t d8):

0000000                    1                    2
0000016                    3                    4
0000032                    5                    0
0000048                    0                    0
*
0000160                  101                  102
0000176                  103                  104
0000192                  105                    0
0000208                    0                    0
*
0000320                 1001                 1002
0000336                 1003                 1004
0000352                 1005                    0
0000368                    0                    0
*
0000480

The data is all there, but the file is full of 0-valued elements. A line containing only * is od's shorthand for repeated lines identical to the one before it, so everything between the surrounding offsets is 0. Record number 2 starts at offset 160 instead of 40. Notice that 160 is 40*4, where 40 is our specified recl of 5*8. By default ifort uses 4-byte blocks, so a recl of 40 means a physical record size of 160 bytes.

If code compiled with gfortran were to read this file with recl=40, records 2, 3 and 4 would contain all 0 elements, and a read of record 5 would return the array written as record 2 by ifort. An alternative that lets gfortran read record 2 where it lies in the file is to use recl=160 (5*8*4), so that the physical record size matches what was written by ifort.
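As a rough check of the recl=160 workaround, the sketch below first emulates ifort's default layout with stream access (40 data bytes followed by 120 zero bytes per record, as seen in the od output above) and then reads record 2 back with direct access. It assumes a compiler that counts recl in bytes (gfortran, or ifort with -assume byterecl); the file name padded.bin and the program name are made up for illustration.

```fortran
program read_padded
  use iso_fortran_env, only: int8, int64
  implicit none

  integer(int64) :: array(5)
  integer(int64), parameter :: base(3) = [0_int64, 100_int64, 1000_int64]
  integer(int8)  :: padding(120)   ! 160-byte slot minus 40 data bytes
  integer :: u, i, rec

  ! Emulate ifort's default layout with stream access:
  ! each record is 40 data bytes followed by 120 zero bytes.
  padding = 0_int8
  open(newunit=u, file='padded.bin', form='unformatted', access='stream', &
       status='replace', action='write')
  do rec = 1, 3
    array = base(rec) + [(i, i=1,5)]
    write(u) array, padding
  end do
  close(u)

  ! Read it back with direct access. recl=160 is in BYTES here, which
  ! assumes gfortran (or ifort with -assume byterecl).
  open(newunit=u, file='padded.bin', form='unformatted', access='direct', &
       status='old', action='read', recl=160)
  read(u, rec=2) array
  close(u)

  ! Record 2 should hold 101 ... 105
  if (any(array /= 100 + [(i, i=1,5)])) error stop 'record 2 mismatch'
  print '(5(i0,1x))', array
end program read_padded
```

Reading only 40 bytes out of each 160-byte record is fine for direct access; the trailing padding is simply skipped.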

Another consequence of this is wasted space. Over-specifying the recl means you are using 4 times the necessary disk space to store your records.

Testcase 1 with ifort 16 and -assume byterecl

This was compiled as:

ifort -assume byterecl -o write-ifort write.f90

And produces the output file:

0000000                    1                    2
0000016                    3                    4
0000032                    5                  101
0000048                  102                  103
0000064                  104                  105
0000080                 1001                 1002
0000096                 1003                 1004
0000112                 1005
0000120

This produces the file as expected. The command line argument -assume byterecl tells ifort to interpret any recl values as bytes rather than double words (4-byte blocks). This will produce writes and reads that match code compiled with gfortran.

Testcase 2

program test
  use iso_fortran_env
  implicit none

  integer(kind=int64), dimension(5) :: array
  integer :: io_output, reclen, i
  inquire(iolength=reclen) array
  print *,'Using recl=',reclen

  open(newunit=io_output, file='output', form='unformatted', status='new', &
       access='direct', action='write', recl=reclen)

  array = [(i,i=1,5)]  
  write (io_output, rec=1) array
  array = [(i,i=101,105)]
  write (io_output, rec=2) array
  array = [(i,i=1001,1005)]
  write (io_output, rec=3) array

  close(io_output)
end program test

The only difference in this testcase is that I am inquiring the proper recl to represent my 40-byte array (5 8-byte integers).

The output

gfortran 5.2:

 Using recl=          40

ifort 16, no options:

 Using recl=          10

ifort 16, -assume byterecl:

 Using recl=          40

With the 1-byte units used by gfortran, and by ifort with the byterecl assumption, recl is 40, which equals the size of our 40-byte array. By default, ifort reports a recl of 10, meaning 10 4-byte blocks (10 double words), which is also 40 bytes. All three of these builds produce identical file output, and reads and writes from either compiler will work properly.
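The same inquiry can be turned around to report how many bytes one recl unit represents for the current compiler and flags. This is a small sketch, not something from the original testcases: it divides the array's size in bytes (computed with the standard storage_size intrinsic) by the recl that inquire returns. The program and variable names are illustrative.

```fortran
program recl_unit
  use iso_fortran_env, only: int64
  implicit none

  integer(int64) :: array(5)
  integer :: reclen, nbytes

  ! recl needed for the array, in the compiler's own record-length unit
  inquire(iolength=reclen) array
  ! Size of the array in bytes: element count times bits per element / 8
  nbytes = size(array) * storage_size(array) / 8

  print '(a,i0)', 'recl for the array:  ', reclen
  print '(a,i0)', 'array size in bytes: ', nbytes
  print '(a,i0)', 'bytes per recl unit: ', nbytes / reclen
end program recl_unit
```

Going by the recl values reported above, the last line should show 1 for gfortran and for ifort with -assume byterecl, and 4 for ifort's default.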

Summary

To make record-based, unformatted, direct-access data portable between ifort and gfortran, the easiest option is to add -assume byterecl to the flags used by ifort. You really should have been doing this already, since you are specifying record lengths in bytes, so it is a straightforward change that probably has no other consequences for you.

The alternative is to not worry about the option and use the inquire statement with iolength= to query the record length for your array.
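Putting the second option together, here is a minimal round-trip sketch: it writes two records using the inquired record length, reopens the file for reading with the same length, and checks that record 2 comes back intact. Since reclen comes from inquire on both sides, the same source should work with either compiler, with or without -assume byterecl. The file name roundtrip.bin and the program name are made up for illustration, and exchanging files between machines additionally assumes matching endianness and data kinds.

```fortran
program roundtrip
  use iso_fortran_env, only: int64
  implicit none

  integer(int64) :: array(5), readback(5)
  integer :: u, reclen, i

  ! Ask the compiler for the record length in its own unit
  inquire(iolength=reclen) array

  ! Write two records
  open(newunit=u, file='roundtrip.bin', form='unformatted', status='replace', &
       access='direct', action='write', recl=reclen)
  array = [(i, i=1,5)]
  write(u, rec=1) array
  array = [(i, i=101,105)]
  write(u, rec=2) array
  close(u)

  ! Read record 2 back using the same inquired record length
  open(newunit=u, file='roundtrip.bin', form='unformatted', status='old', &
       access='direct', action='read', recl=reclen)
  read(u, rec=2) readback
  close(u)

  ! array still holds 101 ... 105 from the last write
  if (any(readback /= array)) error stop 'record 2 mismatch'
  print '(5(i0,1x))', readback
end program roundtrip
```

In a real pair of programs, the writer and the reader would each run the same inquire on an array of the same type and shape.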