R - read.csv - skip rows with a different number of columns

datavoredan · Apr 9, 2014

The first 5 rows of my csv file contain information about the file, which I do not need.

These information rows have only 2 columns, while the header and the data rows (from row 6 onwards) have 8. This appears to be the cause of the issue.

I have tried using the skip argument of read.csv to skip these lines, and the same with read.table:

df = read.csv("myfile.csv", skip=5)
df = read.table("myfile.csv", skip=5)

but this still gives me the same error message, which is:

Error in read.table("myfile.csv",  : empty beginning of file

In addition: Warning messages:

1: In readLines(file, skip) : line 1 appears to contain an embedded nul
2: In readLines(file, skip) : line 2 appears to contain an embedded nul
...
5: In readLines(file, skip) : line 5 appears to contain an embedded nul

How can I read this .csv into R without the embedded nuls in the first 5 rows causing this issue?

Answer

jbaums · Apr 9, 2014

You could try:

read.csv(text=readLines('myfile.csv')[-(1:5)])

readLines stores each line of the file in its own element of a character vector; indexing with [-(1:5)] drops the first five elements, and read.csv then parses the remaining lines as a csv through its text argument.
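To see the approach end to end, here is a minimal sketch that builds a small stand-in file and reads it back. The temporary file, the `info`/`value` strings and the `col1` to `col8` header names are made up for illustration; only the 5 two-column lines followed by 8-column data mirror the question.

```r
# Hypothetical stand-in for the file in the question:
# 5 two-column information lines, then an 8-column header and two data rows.
path <- tempfile(fileext = ".csv")
info <- paste0("info", 1:5, ",value", 1:5)
header <- paste(paste0("col", 1:8), collapse = ",")
rows <- c(paste(1:8, collapse = ","), paste(9:16, collapse = ","))
writeLines(c(info, header, rows), path)

# Read every line into a character vector, one element per line.
lines <- readLines(path)

# Drop the first five elements and parse what is left as CSV.
df <- read.csv(text = lines[-(1:5)])

dim(df)    # number of rows and columns
names(df)  # column names taken from line 6 of the file
```

If the embedded nul warnings persist on a real file, readLines also accepts skipNul = TRUE, which may be worth trying. Nuls on every line can also be a sign that the file is not in the encoding R expects (UTF-16, for example), in which case the fileEncoding argument of read.csv might be the more direct fix; that is a guess, since the question does not say how the file was produced.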