DataInputStream vs InputStreamReader, trying to conceptually understand the two

ross studtman · Aug 26, 2014

As I tentatively understand it at the moment:

DataInputStream is an InputStream subclass, hence it reads and writes bytes. If you are reading bytes and you know they are all going to be ints or some other primitive data type, then you can read those bytes directly into the primitive using DataInputStream.

  • Question: Would you need to know the type (int, string, etc.) of the content being read before it is read? And would the whole file need to consist of that one primitive type?

The question I am having is: Why not use an InputStreamReader wrapped around the InputStream's byte data? With this approach you are still reading the bytes, then converting them to integers that represent characters. Which integers represent which characters depends on the character set specified, e.g., "UTF-8".

  • Question: In what case would an InputStreamReader fail to work where a DataInputStream would work?

My guess at an answer: if speed is really important, and you can do it, then converting the InputStream's byte data directly to the primitive via DataInputStream would be the way to go? This avoids the Reader having to "cast" the byte data to an int first, and it wouldn't rely on a character set to interpret which character the returned integer represents. I suppose this is what people mean when they say DataInputStream allows a machine-independent read of the underlying data.

  • Simplification: DataInputStream can convert bytes directly to primitives.
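To show what "convert bytes directly to primitives" amounts to, here is a minimal sketch (the class and method names are made up; a byte array stands in for a file). readInt() consumes four bytes and combines them most significant byte first, which is exactly what the hand-written shifts do. No character set is involved anywhere, and the fixed big-endian byte order defined by java.io.DataInput is essentially what the javadoc's "machine-independent way" refers to.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ReadIntDemo {

    // Lets DataInputStream turn the first four bytes into an int.
    static int viaReadInt(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Does the same by hand: four bytes, most significant first (big-endian),
    // which is the byte order the java.io.DataInput contract specifies.
    static int viaShifts(byte[] data) {
        return ((data[0] & 0xFF) << 24)
             | ((data[1] & 0xFF) << 16)
             | ((data[2] & 0xFF) << 8)
             |  (data[3] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] data = {0x00, 0x00, 0x01, 0x02};
        System.out.println(viaReadInt(data)); // 258
        System.out.println(viaShifts(data));  // 258
    }
}
```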

Question that spurred the whole thing: I was reading the following tutorial code:

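    // openFileInput() and findViewById() are Android Context/Activity methods.
    FileInputStream fis = openFileInput("myFileText");
    BufferedReader reader = new BufferedReader(new InputStreamReader(new DataInputStream(fis)));
    EditText editText = (EditText) findViewById(R.id.edit_text);

    String line;
    while ((line = reader.readLine()) != null) { // readLine() returns null at end of file
        editText.append(line);
        editText.append("\n");
    }
    reader.close(); // also closes the wrapped streams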

...I do not understand why the instructor chose to use new DataInputStream(fis), because none of its ability to convert bytes directly to primitives seems to be used.

  • Am I missing something?

Thanks for your insights.

Answer

J4v4 · Aug 26, 2014

InputStreamReader and DataInputStream are completely different.

"DataInputStream is an InputStream subclass, hence it reads and writes bytes."

This is incorrect: an InputStream only reads bytes, and DataInputStream extends it so that you can read Java primitives as well. Neither of them can write any data; writing is what OutputStream and its subclasses, such as DataOutputStream, are for.

"Question: Would you need to know the type (int, string, etc.) of the content being read before it is read? And would the whole file need to consist of that one primitive type?"

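A DataInputStream is meant to read data that was previously written by a DataOutputStream (or by anything else that produces the same big-endian binary format). If that's not the case, your DataInputStream will not "understand" the data: it reinterprets whatever bytes come next as the type you ask for and returns meaningless values, or throws an EOFException if the stream runs out. Therefore you need to know exactly which types the corresponding DataOutputStream wrote, and in which order. The stream can mix types freely; what matters is the order.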
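To make this concrete, here is a minimal sketch (hypothetical class and method names; a byte array stands in for a file) that writes one int and then reads the same four bytes back with readFloat(). Nothing in the stream records that an int was written, so the bits are simply reinterpreted: the int 1 comes back as the float 1.4E-45.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WrongTypeDemo {

    // Writes a single int with DataOutputStream and returns the raw bytes.
    static byte[] writeOneInt(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the same four bytes back, but as a float instead of an int.
    // The stream carries no type information, so readFloat() just
    // reinterprets the int's bit pattern.
    static float readAsFloat(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readFloat();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = writeOneInt(1);
        System.out.println(data.length);       // 4
        System.out.println(readAsFloat(data)); // 1.4E-45, not 1.0
    }
}
```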

For example, if you want to save your application's state (let's say it consists of a few numbers):

public void exit() throws IOException {
    //...
    try (DataOutputStream dout = new DataOutputStream(new FileOutputStream(STATE_FILE))) {
        dout.writeFloat(someFloat);
        dout.writeInt(someInt);
        dout.writeDouble(someDouble);
    }
}

public void startup() throws IOException {
    try (DataInputStream din = new DataInputStream(new FileInputStream(STATE_FILE))) {
        // exactly the same order, otherwise it's going to return weird data
        someFloat = din.readFloat();
        someInt = din.readInt();
        someDouble = din.readDouble();
    }
}

That's basically the whole story of DataInputStream and DataOutputStream: write your primitive variables to a stream and read them back in the same order.
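A self-contained variant of the state example, as a sketch: byte arrays stand in for STATE_FILE so it runs anywhere, and the three field values are invented. It also covers the second half of the original question: a single stream can hold a float, an int and a double back to back, as long as they are read in the order they were written (4 + 4 + 8 = 16 bytes here).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class StateRoundTrip {

    // Hypothetical application state: three numbers of different types.
    static float someFloat = 1.5f;
    static int someInt = 42;
    static double someDouble = 2.25;

    // Same idea as exit(): write each field with the matching writeXxx method.
    static byte[] save() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeFloat(someFloat);   // 4 bytes
            out.writeInt(someInt);       // 4 bytes
            out.writeDouble(someDouble); // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Same idea as startup(): read the fields back in exactly the same order.
    static void restore(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            someFloat = in.readFloat();
            someInt = in.readInt();
            someDouble = in.readDouble();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = save();
        System.out.println(data.length); // 16
        someFloat = 0f; someInt = 0; someDouble = 0.0; // forget the state
        restore(data);
        System.out.println(someFloat + " " + someInt + " " + someDouble); // 1.5 42 2.25
    }
}
```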

Now, the InputStreamReader is something entirely different. An InputStreamReader decodes the bytes of encoded text into Java chars. Given any byte stream that contains text, and knowing its encoding, you can read Java characters from it with an InputStreamReader.
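A minimal sketch of that decoding step, assuming the bytes are UTF-8 (a byte array stands in for a real text source; the class and method names are made up). It also touches on the integers mentioned in the question: Reader.read() returns each decoded char as an int, or -1 at the end of the stream. Because 'é' takes two bytes in UTF-8, six bytes decode to five chars.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {

    // Reads every char from the bytes, decoding them with the given charset.
    static String decode(byte[] data, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int c;
            // read() returns one decoded char widened to an int, or -1 at the end
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "héllo" in UTF-8: 'é' (U+00E9) is the two bytes 0xC3 0xA9, every other letter one byte.
        byte[] utf8 = {0x68, (byte) 0xC3, (byte) 0xA9, 0x6C, 0x6C, 0x6F};
        String text = decode(utf8, StandardCharsets.UTF_8);
        System.out.println(utf8.length + " bytes -> " + text.length() + " chars"); // 6 bytes -> 5 chars
        System.out.println(text.equals("h\u00E9llo"));                            // true
    }
}
```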

"With this approach you are still reading the bytes, then converting them to integers that represent characters. Which integers represent which characters depends on the character set specified, e.g., 'UTF-8'."

A character encoding is more than a mapping between characters and numbers (code points): it also specifies how each code point is represented as a sequence of bytes. For example, UTF-8 and UTF-16 encode the same Unicode characters, but an InputStreamReader would fail dramatically if you tried to read a UTF-8 stream as UTF-16. The string aabb, which is represented by four bytes in UTF-8 (0x61, 0x61, 0x62, 0x62), would be decoded into only two characters, because UTF-16 treats each pair of bytes as one unit. Decoded as big-endian UTF-16 (what Java assumes when there is no byte-order mark), the two a's become U+6161 and the two b's become U+6262: two CJK ideographs that have nothing to do with the original text.
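The "aabb" case is easy to reproduce. This sketch (hypothetical class and method names) encodes the string as UTF-8 and then feeds those bytes to an InputStreamReader configured for StandardCharsets.UTF_16; with no byte-order mark, Java's UTF-16 decoder assumes big-endian order, so each byte pair becomes one char.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class WrongEncodingDemo {

    // Decodes the bytes as UTF-16 through an InputStreamReader.
    static String readAsUtf16(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_16)) {
            char[] buf = new char[16];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8 = "aabb".getBytes(StandardCharsets.UTF_8); // 0x61 0x61 0x62 0x62
        String wrong = readAsUtf16(utf8);
        System.out.println(wrong.length()); // 2
        for (char c : wrong.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+6161, then U+6262
        }
    }
}
```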

An InputStreamReader handles all of this and is therefore able to read text from any source (unlike DataInputStream), as long as you know the encoding.

"Question: In what case would an InputStreamReader fail to work where a DataInputStream would work?"

This should be quite clear by now. Since the two classes have completely different purposes, the question doesn't really apply. An InputStreamReader does not turn groups of bytes into int, float or double values the way a DataInputStream does, and it is not designed for that purpose; its read() returns an int only as a container for a single decoded char.
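To see the difference on identical input, here is a sketch (hypothetical class and method names) that writes the int 65 with DataOutputStream and then reads the same four bytes twice: once with DataInputStream, which recovers 65, and once with an InputStreamReader decoding UTF-8, which sees three NUL characters followed by 'A' (0x41).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SameBytesDemo {

    // writeInt(65) produces the four bytes 0x00 0x00 0x00 0x41.
    static byte[] intBytes(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // DataInputStream reads the four bytes back as a single int.
    static int readAsInt(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // InputStreamReader decodes the same bytes as UTF-8 text (one char per byte here).
    static String readAsText(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] data = intBytes(65);
        System.out.println(readAsInt(data)); // 65
        String text = readAsText(data);
        System.out.println(text.length());   // 4 (three U+0000 chars, then 'A')
        System.out.println(text.charAt(3));  // A
    }
}
```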

In the tutorial code, I am quite sure that you could omit the DataInputStream:

BufferedReader reader = new BufferedReader( new InputStreamReader(fis));

However, DataInputStream is itself an InputStream, and its plain read() methods simply pass through to the stream it wraps, which is why wrapping the FileInputStream inside it is not wrong (although it is unnecessary).
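A quick way to check this, as a sketch: the reading loop from the tutorial without the Android parts, run over the same bytes with and without the DataInputStream wrapper. A ByteArrayInputStream stands in for the FileInputStream returned by openFileInput, the class and method names are hypothetical, and passing UTF-8 explicitly is an assumption (the tutorial relies on the platform default charset).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class WrapperDemo {

    // Reads all lines from a stream the way the tutorial's loop does.
    static List<String> readLines(InputStream source) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(source, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] file = "first line\nsecond line\n".getBytes(StandardCharsets.UTF_8);
        List<String> plain = readLines(new ByteArrayInputStream(file));
        List<String> wrapped = readLines(new DataInputStream(new ByteArrayInputStream(file)));
        System.out.println(plain.equals(wrapped)); // true
        System.out.println(plain);                 // [first line, second line]
    }
}
```

Passing the charset explicitly is a design choice worth keeping in real code, since otherwise the result depends on the device's default charset.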