How can I ignore the "end of line" or "new line" character when reading text files word by word?

radiomime · Feb 1, 2015

Objective:

I am reading a text file word by word, and am saving each word as an element in an array. I am then printing out this array, word by word. I know this could be done more efficiently, but this is for an assignment and I have to use an array.

I'm doing more with the array, such as counting repeated elements, removing certain elements, etc. I also have successfully converted the files to be entirely lowercase and without punctuation.

Current Situation:

I have a text file that looks like this:

beginning of file




more lines with some bizzare     spacing
some lines next to each other
while

others are farther apart
eof

Here is some of my code, with itemsInArray initialized to 0 and an array of words referred to as wordArray[ (appropriate length for my file) ]:


ifstream infile;
infile.open(fileExample);

while (!infile.eof()) {

    string temp;
    getline(infile, temp, ' ');  // Successfully reads words separated by a single space

    // Attempts to filter out the end-of-line character (none of these work)
    if ((temp != "") && (temp != " ") && (temp != "\n") && (temp != "\0")) {
        wordArray[itemsInArray] = temp;
        itemsInArray++;
    }
}

The Problem:

My code is saving the end-of-line character as an item in my array. In my if statement, I've listed all of the ways I have tried to exclude the end-of-line character, but I've had no luck.

How can I prevent the end of line character from saving as an item in my array?

I've tried a few other methods I found on similar threads, including something with a const char* that I couldn't make work, as well as iterating through and deleting the newline characters. I've been working on this for hours and have tried many methods, and I don't want to repost the same issue.

Answer

5gon12eder · Feb 1, 2015

The standard >> operator overloaded for std::string already treats any whitespace (spaces, tabs, and newlines) as a word boundary, so your program can be simplified a lot.

#include <iostream>
#include <string>
#include <vector>

int
main()
{
  std::vector<std::string> words {};
  {
    std::string tmp {};
    while (std::cin >> tmp)
      words.push_back(tmp);
  }
  for (const auto& word : words)
    std::cout << "'" << word << "'" << std::endl;
}

For the input you are showing, this will output:

'beginning'
'of'
'file'
'more'
'lines'
'with'
'some'
'bizzare'
'spacing'
'some'
'lines'
'next'
'to'
'each'
'other'
'while'
'others'
'are'
'farther'
'apart'
'eof'

Isn't this what you want?