Parsing pipe delimited input in awk

scorpdaddy · Aug 2, 2011

Have seen many posts asking a similar question, but can't get it working.

Input looks like:

<field one with spaces>|<field two with spaces>

Trying to parse with awk.

Have tried many variants from excellent posts:

FS = "^[\x00- ]*|[\x00- ]*[|][\x00- ]*|[\x00- ]*$";
FS = "^[\x00- ]*|[\x00- ]*\|[\x00- ]*|[\x00- ]*$";
FS = "^[\x00- ]*|[\x00- ]*\\|[\x00- ]*|[\x00- ]*$";

Still can't get the pipe delimiter to work.

Using CentOS.

Any help?

Answer

shellter · Aug 2, 2011
 echo "field one has spaces | field two has spaces" \
 | awk '
   BEGIN {
     FS = "|"
   }
   {
     print $2
     print $1
     # or whatever you want
   }'

 #output

  field two has spaces
  field one has spaces
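The output above hides one detail: with a plain "|" separator, the spaces next to the pipe stay inside the fields. A quick check that wraps each field in brackets (an illustration, not part of the original answer) makes the field boundaries visible:

```shell
# Print each field between brackets so leading/trailing spaces show up.
echo "field one has spaces | field two has spaces" \
  | awk 'BEGIN { FS = "|" } { printf "[%s]\n[%s]\n", $1, $2 }'
# [field one has spaces ]
# [ field two has spaces]
```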

You can also reduce this to

awk -F'|' '{
    print $2
    print $1
}'
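If you also want to get rid of the spaces around the pipe while keeping a single-character FS, one possible approach is to split on "|" and then trim every field with gsub. This is a sketch that assumes an awk with gsub (POSIX awk, gawk, mawk and nawk have it) and that only spaces and tabs need trimming:

```shell
# Split on a literal "|", then strip leading/trailing blanks from every field.
echo "  field one has spaces | field two has spaces  " \
  | awk -F'|' '{
      for (i = 1; i <= NF; i++) {
        gsub(/^[ \t]+|[ \t]+$/, "", $i)   # trim both ends of field i
      }
      print $2
      print $1
    }'
# field two has spaces
# field one has spaces
```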

Edit: Also, not all awks accept a multi-character regex as the FS value.
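Where your awk does accept a regex as FS (POSIX awk, gawk and mawk do), the spaces can be absorbed into the separator itself. Putting the pipe inside a bracket expression, [|], stops it from being read as regex alternation, the same trick as the [|] in the first FS from the question. This sketch only handles spaces around the delimiter, not tabs or blanks at the ends of the line:

```shell
# FS is a regex: any spaces, a literal "|" (via the bracket expression),
# then any spaces. It always needs a "|", so it never matches empty.
echo "field one has spaces | field two has spaces" \
  | awk -F' *[|] *' '{
      print $2
      print $1
    }'
# field two has spaces
# field one has spaces
```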

Edit2: Somehow I missed this originally, but I see you are trying to include \x00 in the character classes before and after the | char. I assume you mean \x00 to be the null char? I don't think you're going to be able to have awk parse a file with embedded null chars. You could pre-process your input like

 tr '\000' ' ' < file.txt > spacesForNulls.txt

OR delete them altogether with

tr -d '\000' < file.txt > deletedNulls.txt

and eliminate that part of your regex. But as noted above, some awks don't support a regex for the FS value. Also, I don't use the tr trick very much, so you may find that your version of tr requires a slightly different notation for the null char.
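The clean-up step can also be piped straight into awk instead of going through an intermediate file. A minimal sketch, assuming a POSIX tr (where \000 is the octal escape for the null character); printf stands in for your real input here, so for actual data replace it with < file.txt:

```shell
# Turn each NUL byte into a space (\000 is tr's octal escape for the null
# character), then split the cleaned text on a literal "|".
printf 'field\000one|field\000two\n' \
  | tr '\000' ' ' \
  | awk -F'|' '{ print $2; print $1 }'
# field two
# field one
```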

I hope this helps.