Seekg(ios::beg) not returning to beginning of redirected input

user0123 · Jul 27, 2013

I am making a Huffman encoder, and to do so I need to read over the input (which will ALWAYS be a redirected file) to record the frequencies, then create the codebook, and then read over the input again so I can encode it.

My problem is that I am currently trying to work out how to read the file from cin twice.

I read online that cin.seekg(0), cin.seekg(ios::beg), or cin.seekg(0, ios::beg) should all work perfectly fine so long as the file is redirected and not piped. But when I do that, it does not seem to change the position of cin at all.

Here is the code that i am currently using:

#include<iostream>
#include"huffmanNode.h"

using namespace std;

int main(){

    //create array that stores each character and its frequency
    unsigned int frequencies[255];
    //initialize to zero
    for(int i=0; i<255; i++){
        frequencies[i] = 0;
    }

    //get input and increment the frequency of corresponding character
    char c;
    while(!cin.eof()){
        cin.get(c);
        frequencies[c]++;
    }

    //create initial leaf nodes for all characters that have appeared at least once
    for(int i=0; i<255; i++){

        if(frequencies[i] != 0){
            huffmanNode* tempNode = new huffmanNode(i, frequencies[i]);
        }
    }


    // test readout of the frequency list
    for(int i=0; i<255; i++){
        cout << "Character: " << (char)i << " Frequency: " << frequencies[i] << endl;
    }

    //go back to beginning of input
    cin.seekg(ios::beg);

    //read over input again, incrementing frequencies. Should result in double the amount of frequencies
    // THIS IS WHERE IT LOOPS FOREVER
    while(!cin.eof()){
        cin.get(c);
        frequencies[c]++;
    }

    //another test readout of the frequency list
    for(int i=0; i<255; i++){
        cout << "Character: " << (char)i << " Double Frequency: " << frequencies[i] << endl;
    }


    return 0;
}

Debugging shows that it gets stuck in the second while loop (line 40 in my file, marked above), and it seems to be constantly getting a newline character. Why would it not exit this loop? I assume that cin.seekg() is not actually resetting the input.

Answer

James Kanze · Jul 27, 2013

There are several problems with your code. The first is that you use the result of an input (cin.get( c )) without checking that the input has succeeded. This is always an error; in your case, it will probably only result in counting (and later outputting) the last character twice, but it can result in undefined behavior. You must check that the input stream is in a good state after each input, before using the value input. The usual way of doing this is:

while ( cin.get( c ) ) // ...

This puts the input directly in the loop condition.

The second is the statement:

cin.seekg( std::ios::beg );

I'm actually sort of surprised that this even compiled: there are two overloads of seekg:

std::istream::seekg( std::streampos );

and

std::istream::seekg( std::streamoff, std::ios_base::seekdir );

std::ios::beg has type std::ios_base::seekdir. It's possible for an implementation to define std::streampos and std::ios_base::seekdir in such a way that there is an implicit conversion from std::ios_base::seekdir to std::streampos, but in my opinion, it shouldn't, since the results will almost certainly not be what you want. To seek to the beginning of a file:

std::cin.seekg( 0, std::ios_base::beg );

A third problem: errors in the input stream are sticky. Once you've reached the end of file, that error will remain, and all other operations will be no-ops, until you have cleared the error with std::cin.clear().

One final comment: the fact that you are using std::cin worries me. It will probably work (although there is no guarantee that you can seek on std::cin, even if the input is redirected from a file), but do be aware that there is no way you can output the results of a Huffman encoding to std::cout. It will work under Unix, but probably nowhere else. Huffman encoding requires that the files be opened in binary mode, which is never the case for std::cin and std::cout.
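
One possible alternative, sketched under the assumption that the program may take file names instead of relying on redirection, is to open both files explicitly in binary mode. The helper name countThenCopy and its parameters are hypothetical, and the second pass simply copies bytes as a placeholder for the real encoding step.

```cpp
#include <array>
#include <fstream>
#include <string>

// Hypothetical helper: pass 1 counts byte frequencies, pass 2 re-reads the
// same file from the start. Returns false if a file cannot be opened, the
// rewind fails, or writing fails.
bool countThenCopy(const std::string& inPath,
                   const std::string& outPath,
                   std::array<unsigned long, 256>& counts)
{
    std::ifstream in(inPath, std::ios::binary);    // no newline translation
    std::ofstream out(outPath, std::ios::binary);
    if (!in || !out) {
        return false;
    }

    counts.fill(0);
    char c;
    while (in.get(c)) {                            // pass 1: frequencies
        ++counts[static_cast<unsigned char>(c)];
    }

    in.clear();                                    // clear the sticky EOF state
    in.seekg(0, std::ios_base::beg);               // back to the beginning
    if (!in) {
        return false;
    }

    while (in.get(c)) {                            // pass 2: placeholder for encoding
        out.put(c);
    }
    return out.good();
}
```

A std::ifstream opened on a regular file can normally be repositioned, and binary mode keeps bytes such as '\r' from being altered on platforms that translate line endings in text mode.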