How to import a CSV file with delimiter as ";" and decimal separator as "," into SAS?

Joz picture Joz · Feb 5, 2015 · Viewed 21.7k times · Source

I`ve got (and will receive in the future) many CSV files that use the semicolon as delimiter and the comma as decimal separator. So far I could not find out how to import these files into SAS using proc import -- or in any other automated fashion without the need for messing around with the variable names manually.

Create some sample data:

%let filename = %sysfunc(pathname(work))\sap.csv;

data _null_;
  file "&filename";
  put 'a;b';
  put '12345,11;67890,66';
run;

The import code:

proc import out = sap01 
datafile= "&filename"
dbms = dlm; 
delimiter = ";";
GETNAMES = YES; 
run;

After the import a value for the variable "AMOUNT" such as 350,58 (which corresponds to 350.58 in the US format) would look like 35,058 (meaning thirtyfivethousand...) in SAS (and after re-export to the German EXCEL it would look like 35.058,00). A simple but dirty workaround would be the following:

data sap02; set sap01;
AMOUNT = AMOUNT/100;
format AMOUNT best15.2;
run;

I wonder if there is a simple way to define the decimal separator for the CVS-import (similar to the specification of the delimiter). ..or any other "cleaner" solution compared to my workaround. Many thanks in advance!

Answer

Joe picture Joe · Feb 5, 2015

You technically should use dbms=dlm not dbms=csv, though it does figure things out. CSV means "Comma separated values", while DLM means "delimited", which is correct here.

I don't think there's a direct way to make SAS read in with the comma via PROC IMPORT. You need to tell SAS to use the NUMXw.d informat when reading in the data, and I don't see a way to force that setting in SAS. (There's an option for output with a comma, NLDECSEPARATOR, but I don't think that works here.)

Your best bet is either to write data step code yourself, or to run the PROC IMPORT, go to the log, and copy/paste the read in code into your program; then for each of the read-in records add :NUMX10. or whatever the appropriate maximum width of the field is. It will end up looking something like this:

data want;
  infile "whatever.txt" dlm=';' lrecl=32767 missover;
  input
    firstnumvar :NUMX10.
    secondnumvar :NUMX10.
    thirdnumvar :NUMX10.
    fourthnumvar :NUMX10.
    charvar :$15.
    charvar2 :$15.
  ;
run;

It will also generate lots of informat and format code; you can alternately convert the informats to NUMX10. instead of BEST. instead of adding the informat to the read-in. You can also just remove the informats, unless you have date fields.

data want;
  infile "whatever.txt" dlm=';' lrecl=32767 missover;
  informat firstnumvar secondnumvar thirdnumvar fourthnumvar NUMX10.;
  informat charvar $15.;
  format  firstnumvar secondnumvar thirdnumvar fourthnumvar BEST12.;
  format charvar $15.;
  input
    firstnumvar
    secondnumvar
    thirdnumvar
    fourthnumvar
    charvar $
  ;
run;