Compare two files and write it to "match" and "nomatch" files

deepu picture deepu · Apr 27, 2009 · Viewed 129k times · Source

I have two input files, each with length of 5200 bytes. A seven byte key is used to compare both files, if there is a match then it needs to be written to "match" file but while writing to match file I need a few fields from infile1 and all other fields from infile2.

If there is no match then write to no match file.

Is it possible to do it in sort? I know it can be easily done using COBOL program but just want to know in SORT/ICETOOL/Easytrieve Plus (EZTPA00).

Answer

Bill Woodger picture Bill Woodger · Oct 11, 2013

Since 12,200 people have looked at this question and not got an answer:

DFSORT and SyncSort are the predominant Mainframe sorting products. Their control cards have many similarities, and some differences.

JOINKEYS FILE=F1,FIELDS=(key1startpos,7,A)              
JOINKEYS FILE=F2,FIELDS=(key2startpos,7,A)              
JOIN UNPAIRED,F1,F2
REFORMAT FIELDS=(F1:1,5200,F2:1,5200)                         
SORT FIELDS=COPY    

A "JOINKEYS" is made of three Tasks. Sub-Task 1 is the first JOINKEYS. Sub-Task 2 is the second JOINKEYS. The Main Task follows and is where the joined data is processed. In the example above it is a simple COPY operation. The joined data will simply be written to SORTOUT.

The JOIN statement defines that as well as matched records, UNPAIRED F1 and F2 records are to be presented to the Main Task.

The REFORMAT statement defines the record which will be presented to the Main Task. A more efficient example, imagining that three fields are required from F2, is:

 REFORMAT FIELDS=(F1:1,5200,F2:1,10,30,1,5100,100)

Each of the fields on F2 is defined with a start position and a length.

The record which is then processed by the Main task is 5311 bytes long, and the fields from F2 can be referenced by 5201,10,5211,1,5212,100 with the F1 record being 1,5200.

A better way achieve the same thing is to reduce the size of F2 with JNF2CNTL.

//JNF2CNTL DD *
  INREC BUILD=(207,1,10,30,1,5100,100)

Some installations of SyncSort do not support JNF2CNTL, and even where supported (from Syncsort MFX for z/OS release 1.4.1.0 onwards), it is not documented by SyncSort. For users of 1.3.2 or 1.4.0 an update is available from SyncSort to provide JNFnCNTL support.

It should be noted that JOINKEYS by default SORTs the data, with option EQUALS. If the data for a JOINKEYS file is already in sequence, SORTED should be specified. For DFSORT NOSEQCHK can also be specified if sequence-checking is not required.

 JOINKEYS FILE=F1,FIELDS=(key1startpos,7,A),SORTED,NOSEQCHK

Although the request is strange, as the source file won't be able to be determined, all unmatched records are to go to a separate output file.

With DFSORT, there is a matching-marker, specified with ? in the REFORMAT:

 REFORMAT FIELDS=(F1:1,5200,F2:1,10,30,1,5100,100,?)

This increases the length of the REFORMAT record by one byte. The ? can be specified anywhere on the REFORMAT record, and need not be specified. The ? is resolved by DFSORT to: B, data sourced from Both files; 1, unmatched record from F1; 2, unmatched record from F2.

SyncSort does not have the match marker. The absence or presence of data on the REFORMAT record has to be determined by values. Pick a byte on both input records which cannot contain a particular value (for instance, within a number, decide on a non-numeric value). Then specify that value as the FILL character on the REFORMAT.

 REFORMAT FIELDS=(F1:1,5200,F2:1,10,30,1,5100,100),FILL=C'$'

If position 1 on F1 cannot naturally have "$" and position 20 on F2 cannot either, then those two positions can be used to establish the result of the match. The entire record can be tested if necessary, but sucks up more CPU time.

The apparent requirement is for all unmatched records, from either F1 or F2, to be written to one file. This will require a REFORMAT statement which includes both records in their entirety:

DFSORT, output unmatched records:

  REFORMAT FIELDS=(F1:1,5200,F2:1,5200,?)

  OUTFIL FNAMES=NOMATCH,INCLUDE=(10401,1,SS,EQ,C'1,2'),
        IFTHEN=(WHEN=(10401,1,CH,EQ,C'1'),
                    BUILD=(1,5200)),
        IFTHEN=(WHEN=NONE,
                    BUILD=(5201,5200))

SyncSort, output unmatched records:

  REFORMAT FIELDS=(F1:1,5200,F2:1,5200),FILL=C'$'

  OUTFIL FNAMES=NOMATCH,INCLUDE=(1,1,CH,EQ,C'$',
                          OR,5220,1,CH,EQ,C'$'),
        IFTHEN=(WHEN=(1,1,CH,EQ,C'$'),
                    BUILD=(1,5200)),
        IFTHEN=(WHEN=NONE,
                    BUILD=(5201,5200))

The coding for SyncSort will also work with DFSORT.

To get the matched records written is easy.

  OUTFIL FNAMES=MATCH,SAVE

SAVE ensures that all records not written by another OUTFIL will be written here.

There is some reformatting required, to mainly output data from F1, but to select some fields from F2. This will work for either DFSORT or SyncSort:

  OUTFIL FNAMES=MATCH,SAVE,
     BUILD=(1,50,10300,100,51,212,5201,10,263,8,5230,1,271,4929)

The whole thing, with arbitrary starts and lengths is:

DFSORT

  JOINKEYS FILE=F1,FIELDS=(1,7,A)              
  JOINKEYS FILE=F2,FIELDS=(20,7,A)    

  JOIN UNPAIRED,F1,F2

  REFORMAT FIELDS=(F1:1,5200,F2:1,5200,?)                         

  SORT FIELDS=COPY    

  OUTFIL FNAMES=NOMATCH,INCLUDE=(10401,1,SS,EQ,C'1,2'),
        IFTHEN=(WHEN=(10401,1,CH,EQ,C'1'),
                    BUILD=(1,5200)),
        IFTHEN=(WHEN=NONE,
                    BUILD=(5201,5200))

  OUTFIL FNAMES=MATCH,SAVE,
     BUILD=(1,50,10300,100,51,212,5201,10,263,8,5230,1,271,4929)

SyncSort

  JOINKEYS FILE=F1,FIELDS=(1,7,A)              
  JOINKEYS FILE=F2,FIELDS=(20,7,A)              

  JOIN UNPAIRED,F1,F2

  REFORMAT FIELDS=(F1:1,5200,F2:1,5200),FILL=C'$'                         

  SORT FIELDS=COPY    

  OUTFIL FNAMES=NOMATCH,INCLUDE=(1,1,CH,EQ,C'$',
                          OR,5220,1,CH,EQ,C'$'),
        IFTHEN=(WHEN=(1,1,CH,EQ,C'$'),
                    BUILD=(1,5200)),
        IFTHEN=(WHEN=NONE,
                    BUILD=(5201,5200))

  OUTFIL FNAMES=MATCH,SAVE,
     BUILD=(1,50,10300,100,51,212,5201,10,263,8,5230,1,271,4929)