I realize that reading a .csv file can remove leading zeros, but for some of my files the leading zeros are kept without my having to explicitly set colClasses in read.csv. What confuses me is that in other cases it DOES remove the leading zeros. So my question is: in which cases does read.csv remove the leading zeros?
The read.csv, read.table, and related functions read everything in as character strings. Then, depending on the arguments to the function (specifically colClasses, but also others) and on options, the function tries to "simplify" the columns. If enough of the column looks numeric and you have not told the function otherwise, it converts the column to numeric, which drops any leading 0's (and trailing 0's after the decimal). If something in the column does not look like a number, the column is not converted to numeric; it is either kept as character or converted to a factor, and either way the leading 0's are kept. The function does not always look at the entire column to make the decision, so what may be obvious to you as not being numeric may still be converted.
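A minimal sketch of that behaviour, using a small inline CSV instead of a file. The data and the column names id and code are made up for illustration:

```r
# Hypothetical data: 'id' contains only digits, 'code' has one non-numeric entry.
csv_text <- "id,code
007,001
042,0A2
100,003"

# No colClasses given, so R guesses the type of each column.
df <- read.csv(text = csv_text)

str(df)
# 'id' looks numeric throughout, so it is converted and 007 becomes 7.
# 'code' contains "0A2", so it stays character (or becomes a factor in
# R versions before 4.0.0) and its leading zeros survive.
```

This is probably why some of your files keep their zeros: a single non-numeric value in a column is enough to stop the conversion.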
The safest approach (and quickest) is to specify colClasses so that R does not need to guess (and you do not need to guess what R is going to guess).
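For example, something along these lines should work. Again the data and the column names id and amount are invented for the sketch:

```r
# Hypothetical data with an ID column that must keep its leading zeros.
csv_text <- "id,amount
007,1.50
042,2.00"

# Read every column as character: nothing is converted, so nothing is lost.
all_chr <- read.csv(text = csv_text, colClasses = "character")

# Or set the class per column by name; columns you leave out are still guessed.
mixed <- read.csv(text = csv_text,
                  colClasses = c(id = "character", amount = "numeric"))

all_chr$id    # leading zeros kept
mixed$amount  # parsed as numeric, so the trailing zeros are dropped
```

Reading everything as character and converting the columns you need afterwards is the most defensive option; naming the classes per column saves that extra step when you already know the layout of the file.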