Dealing with readLines() function in R

user3521631 · Apr 11, 2014

I've been having a very hard time with R lately.

I'm not an expert user, but I'm trying to use R to read a plain text (.txt) file and capture each line of it. After that, I want to work with those lines, splitting them up and making changes to the text.

Here is the code I'm using:

fileName <- "C:/MyFolder/TEXT_TO_BE_PROCESSED.txt"
con <- file(fileName,open="r")
line <- readLines(con)
close(con)

It reads the text and the line breaks perfectly. But I don't understand how the created object line works.

The object line created with this code has class character and length 57. If I type line[1], it shows exactly the text of the first line. But if I type

length(line[1])

it returns 1.

I would like to know how I can transform this string of length == 1, which in fact contains 518 characters, into a string of length == 518.

Does anyone know what I'm doing wrong?

I don't necessarily need to use the readLines() function. I did some research and also found the function scan(), but I ended up in the same situation: an immutable string of 518 characters with length == 1.

I hope I've been clear enough about my question. Sorry for the bad English.

Answer

JeremyS · Apr 11, 2014

First, you can condense that code into a single line; the other 3 lines just create objects that you don't need.

line <- readLines("C:/MyFolder/TEXT_TO_BE_PROCESSED.txt")
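As a rough illustration of what readLines() hands back, the sketch below writes two made-up lines to a temporary file (a stand-in for the real .txt file, which isn't available here) and reads them again. It suggests why length(line[1]) is 1: length() counts elements of the character vector, and each line is one element, while nchar() counts the characters inside an element.

```r
# Hypothetical sample text in a temporary file, standing in for the real .txt file
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line of text", "second line"), tmp)

# Given a path, readLines() opens and closes the connection itself
line <- readLines(tmp)

n_lines <- length(line)      # number of elements, i.e. lines read
n_first <- length(line[1])   # a single string is still one element
n_chars <- nchar(line[1])    # number of characters in the first line

unlink(tmp)                  # remove the temporary file
```

Here n_lines is 2, n_first is 1, and n_chars is 18.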

Then, if you want to know how many space-separated words there are per line:

words <- sapply(line,function(x) length(unlist(strsplit(x,split=" "))))

If you leave out the length() call in the above, you get a list of character vectors of the words from each line.
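A minimal sketch of both kinds of splitting, using two made-up lines in place of the real file contents: strsplit() is vectorised over its input, so it can take the whole vector at once, and splitting on the empty string "" is one way to turn a single string into a vector with one element per character, which is roughly what the question asks for.

```r
# Hypothetical sample lines; in practice this would be the result of readLines()
line <- c("the quick brown fox", "jumps over")

# One character vector of words per line, returned as a list
words_per_line <- strsplit(line, split = " ")

# Number of words on each line
word_counts <- lengths(words_per_line)

# Split the first line into single characters
chars <- strsplit(line[1], split = "")[[1]]
n_chars <- length(chars)   # same value as nchar(line[1])
```

With these sample lines, word_counts is 4 and 2, and n_chars is 19. Note that lengths() requires R 3.2.0 or later; on older versions, sapply(words_per_line, length) should give the same counts.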