Modify existing contents of a file in C

zee · Feb 22, 2014 · Viewed 43.4k times
int main()
{
    FILE *ft;
    char ch;
    ft=fopen("abc.txt","r+");
    if(ft==NULL)
    {
        printf("can not open target file\n");
        exit(1);
    }
    while(1)
    {
        ch=fgetc(ft);
        if(ch==EOF)
        {
            printf("done");
            break;
        }
        if(ch=='i')
        {
            fputc('a',ft);
        }
    }
    fclose(ft);
    return 0;
}

As one can see, I want to edit abc.txt so that every i in it is replaced by a.
The program runs without errors, but when I open abc.txt externally, it appears to be unedited.
Any possible reason for that?

Why, in this case, is the character after i not replaced by a, as the answers suggest?

Answer

Jonathan Leffler · Feb 22, 2014

Analysis

There are multiple problems:

  1. fgetc() returns an int, not a char; it has to return every valid char value plus a separate value, EOF. As written, you can't reliably detect EOF. If char is an unsigned type, you'll never find EOF; if char is a signed type, you'll misidentify some valid character (often ÿ, y-umlaut, U+00FF, LATIN SMALL LETTER Y WITH DIAERESIS) as EOF.

  2. If you switch between input and output on a file opened for update mode, you must use a file positioning operation (fseek(), rewind(), nominally fsetpos()) between reading and writing; and you must use a positioning operation or fflush() between writing and reading.

  3. It is a good idea to close what you open (now fixed in the code).

  4. If your writes worked, you'd overwrite the character after the i with a.

Synthesis

These changes lead to:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    FILE *ft;
    char const *name = "abc.txt";
    int ch;
    ft = fopen(name, "r+");
    if (ft == NULL)
    {
        fprintf(stderr, "cannot open target file %s\n", name);
        exit(1);
    }
    while ((ch = fgetc(ft)) != EOF)
    {
        if (ch == 'i')
        {
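            fseek(ft, -1, SEEK_CUR);    /* step back over the 'i' just read */
            fputc('a',ft);              /* overwrite it with 'a' */
            fseek(ft, 0, SEEK_CUR);     /* reposition before reading again */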
        }
    }
    fclose(ft);
    return 0;
}

There is room for more error checking.
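As a rough illustration of what that extra checking might look like, here is a sketch that wraps the same loop in a function and tests every call that can fail. The name replace_in_file and its return convention are hypothetical, invented for this example. Like the code above, it assumes that seeking back one byte is acceptable on this stream, which holds on POSIX-style systems but is not strictly guaranteed for text streams by the standard.

```c
#include <stdio.h>

/* Hypothetical helper (not from the original answer): replace every byte
 * equal to 'from' with 'to' in the named file, in place.
 * Returns the number of replacements, or -1 on any error. */
long replace_in_file(const char *name, int from, int to)
{
    FILE *fp = fopen(name, "r+");
    long count = 0;
    int ch;

    if (fp == NULL)
        return -1;

    while ((ch = fgetc(fp)) != EOF)
    {
        if (ch == from)
        {
            /* Step back over the character just read, overwrite it, then
             * reposition so that the next fgetc() may follow the write. */
            if (fseek(fp, -1L, SEEK_CUR) != 0 ||
                fputc(to, fp) == EOF ||
                fseek(fp, 0L, SEEK_CUR) != 0)
            {
                fclose(fp);
                return -1;
            }
            count++;
        }
    }

    /* fgetc() returns EOF for both end-of-file and read errors. */
    if (ferror(fp))
    {
        fclose(fp);
        return -1;
    }

    /* fclose() flushes buffered writes, so its result matters too. */
    if (fclose(fp) != 0)
        return -1;

    return count;
}
```

A caller could then write replace_in_file("abc.txt", 'i', 'a') and treat a negative result as failure.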

Exegesis

Input followed by output requires seeks

The fseek(ft, 0, SEEK_CUR); statement is required by the C standard.

ISO/IEC 9899:2011 §7.21.5.3 The fopen function

¶7 When a file is opened with update mode ('+' as the second or third character in the above list of mode argument values), both input and output may be performed on the associated stream. However, output shall not be directly followed by input without an intervening call to the fflush function or to a file positioning function (fseek, fsetpos, or rewind), and input shall not be directly followed by output without an intervening call to a file positioning function, unless the input operation encounters end-of-file. Opening (or creating) a text file with update mode may instead open (or create) a binary stream in some implementations.

(Emphasis added.)
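The quoted rule also covers the opposite direction, output followed by input. A minimal sketch of that case, using a temporary file so that nothing on disk is assumed: the helper name write_then_read_first is hypothetical, and rewind() is just one of the permitted intervening calls (fflush() or fseek() would serve as well).

```c
#include <stdio.h>

/* Hypothetical helper: write 'text' to a fresh temporary stream, then read
 * the first character back. Returns that character, or EOF on failure or
 * when 'text' is empty. */
int write_then_read_first(const char *text)
{
    FILE *fp = tmpfile();   /* tmpfile() opens the stream in "wb+" mode */
    int ch;

    if (fp == NULL)
        return EOF;

    if (fputs(text, fp) == EOF)
    {
        fclose(fp);
        return EOF;
    }

    /* Output followed by input: a positioning call (or fflush) must
     * come between the two. */
    rewind(fp);

    ch = fgetc(fp);
    fclose(fp);
    return ch;
}
```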

fgetc() returns an int

Quotes from ISO/IEC 9899:2011, the current C standard.

§7.21 Input/output <stdio.h>

§7.21.1 Introduction

EOF which expands to an integer constant expression, with type int and a negative value, that is returned by several functions to indicate end-of-file, that is, no more input from a stream;

§7.21.7.1 The fgetc function

int fgetc(FILE *stream);

¶2 If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

Returns

¶3 If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.289)

289) An end-of-file and a read error can be distinguished by use of the feof and ferror functions.

So, EOF is a negative integer (conventionally it is -1, but the standard does not require that). The fgetc() function either returns EOF or the value of the character as an unsigned char (in the range 0..UCHAR_MAX, usually 0..255).
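To make footnote 289 concrete, here is a small sketch of a read loop that keeps the result in an int and asks the stream why EOF was returned. The function name count_remaining is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: count the bytes remaining on 'fp'.
 * Returns the count, or -1 if reading stopped because of an error. */
long count_remaining(FILE *fp)
{
    long n = 0;
    int ch;                 /* int, so EOF stays distinct from every byte */

    while ((ch = fgetc(fp)) != EOF)
        n++;

    /* EOF was returned: distinguish a read error from end-of-file. */
    if (ferror(fp))
        return -1;

    return n;               /* here the end-of-file indicator is set */
}
```

Because ch is an int, a byte with value 0xFF in the data is counted like any other byte rather than ending the loop.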

§6.2.5 Types

¶3 An object declared as type char is large enough to store any member of the basic execution character set. If a member of the basic execution character set is stored in a char object, its value is guaranteed to be nonnegative. If any other character is stored in a char object, the resulting value is implementation-defined but shall be within the range of values that can be represented in that type.

¶5 An object declared as type signed char occupies the same amount of storage as a "plain" char object.

¶6 For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.

¶15 The three types char, signed char, and unsigned char are collectively called the character types. The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.45)

45) CHAR_MIN, defined in <limits.h>, will have one of the values 0 or SCHAR_MIN, and this can be used to distinguish the two options. Irrespective of the choice made, char is a separate type from the other two and is not compatible with either.

This justifies my assertion that plain char can be a signed or an unsigned type.
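Footnote 45 suggests a simple way to find out which choice a given implementation made. A minimal sketch, with a hypothetical function name:

```c
#include <limits.h>

/* Returns 1 if plain char is a signed type on this implementation, else 0.
 * CHAR_MIN is 0 when char is unsigned and SCHAR_MIN (negative) otherwise. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```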

Now consider:

char c = fgetc(fp);
if (c == EOF)
   …

Suppose fgetc() returns EOF, and plain char is an unsigned (8-bit) type, and EOF is -1. The assignment puts the value 0xFF into c, which is a positive integer. When the comparison is made, c is promoted to an int (and hence to the value 255), and 255 is not negative, so the comparison fails.

Conversely, suppose that plain char is a signed (8-bit) type and the character set is ISO 8859-15. If fgetc() returns ÿ, the value assigned will be the bit pattern 0b11111111, which is the same as -1, so in the comparison, c will be converted to -1 and the comparison c == EOF will return true even though a valid character was read.
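Both failure modes can be reproduced without a file by forcing the value through signed char and unsigned char explicitly. This is only a sketch: the function names are hypothetical, and the behaviour described afterwards assumes 8-bit characters, EOF equal to -1, and the usual wrap-around when 255 is converted to signed char (that conversion is implementation-defined).

```c
#include <stdio.h>

/* What 'c == EOF' yields when the fgetc() result was stored in an
 * unsigned char first. */
int eof_test_unsigned(int fgetc_result)
{
    unsigned char c = (unsigned char)fgetc_result;
    return c == EOF;
}

/* The same test when the result was stored in a signed char first.
 * Converting 255 to signed char is implementation-defined. */
int eof_test_signed(int fgetc_result)
{
    signed char c = (signed char)fgetc_result;
    return c == EOF;
}

/* The correct test: keep the full int returned by fgetc(). */
int eof_test_int(int fgetc_result)
{
    return fgetc_result == EOF;
}
```

Under those assumptions, eof_test_unsigned(EOF) is 0 (end-of-file is never seen), eof_test_signed(0xFF) is 1 (ÿ is mistaken for end-of-file), and eof_test_int() gives the right answer for both inputs.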

You can tweak the details, but the basic argument remains valid while sizeof(char) < sizeof(int). There are DSP chips where that doesn't apply; you have to rethink the rules. Even so, the basic point remains; fgetc() returns an int, not a char.

If your data is truly ASCII (7-bit data), then all characters are in the range 0..127 and you won't run into the misinterpretation of ÿ problem. However, if your char type is unsigned, you still have the 'cannot detect EOF' problem, so your program will run for a long time. If you need to consider portability, you will take this into account. These are the professional grade issues that you need to handle as a C programmer. You can kludge your way to programs that work on your system for your data relatively easily and without taking all these nuances into account. But your program won't work on other people's systems.