Remove part of string in file using batch

user3552829 · Feb 14, 2017 · Viewed 11.6k times

I have some data in a text file (test.txt), reading:

wantedunwanteddata

I want to remove the "unwanted" part of that string and write the rest (i.e. "wanteddata") to another file (test2.txt). I'm using:

findstr /v "unwanted" test.txt>test2.txt

However, that returns an empty file.

Answer

J.Baoby · Feb 14, 2017

findstr /v "unwanted" test.txt>test2.txt doesn't work because findstr matches whole lines, not substrings: it prints every line that satisfies the search conditions, never just a part of one. With /v, the command asks for all lines in test.txt that do not contain "unwanted". The only line in the file does contain it, so nothing is printed and test2.txt ends up empty.
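As a small illustration of that line-based behaviour, the sketch below writes the one-line sample to a scratch file and runs findstr with and without /v. The file name findstr_demo.txt in %TEMP% is an assumption made for this demo, not something from the question.

```batch
@echo off
rem Hypothetical scratch file holding the single line from the question.
set "demo=%TEMP%\findstr_demo.txt"
> "%demo%" echo wantedunwanteddata

echo Lines containing "unwanted":
findstr "unwanted" "%demo%"

echo Lines NOT containing "unwanted":
findstr /v "unwanted" "%demo%"

rem Clean up the scratch file.
del "%demo%"
```

The first call prints the whole line, while the second prints nothing, which is why redirecting it produces an empty file.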

In batch, you can replace all occurrences of a substring in the value of a variable with the syntax %var:substr=repl%. This replaces every occurrence of substr with repl in the string that %var% contains. Removing a substring is the same as replacing it with an empty string (at least in this context), so %var:substr=% removes all occurrences of substr.
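A minimal sketch of that syntax on the string from the question; the variable name var is only illustrative.

```batch
@echo off
set "var=wantedunwanteddata"

rem Replace every occurrence of "unwanted" with "-".
echo %var:unwanted=-%

rem An empty replacement removes the substring.
echo %var:unwanted=%
```

This should print wanted-data followed by wanteddata. Note that the search part of this substitution is matched case-insensitively.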

To remove all occurrences of a substring from a file, read each line of the file into a variable with for /f and print that variable after removing the substring from it. Because the variable is both set and used inside the same for /f block, delayed expansion is needed (this answer explains why).
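The need for delayed expansion can be seen in a short sketch like the one below, where the loop values one and two are placeholders. Percent expansion happens once, when the whole parenthesised block is parsed, whereas the exclamation-mark form is expanded each time the command runs.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Value the variable holds before the block is parsed.
set "line=before"

for %%G in (one two) do (
    set "line=%%G"
    echo percent: %line%
    echo delayed: !line!
)

endlocal
```

Both "percent:" lines show before, while the "delayed:" lines show one and then two.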

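@echo off
SetLocal EnableDelayedExpansion

rem Input and output file paths; the values themselves carry no quotes.
set "input=text1.txt"
set "output=text2.txt"
rem Substring to remove from every line.
set "substr=unwanted"

rem The outer parentheses let a single redirect capture all output of the loop.
(
    FOR /F "usebackq delims=" %%G IN ("%input%") DO (
        set "line=%%G"
        rem Print the line with the substring removed and no leading space.
        echo(!line:%substr%=!
    )
) > "%output%"

EndLocal
exit /b 0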

I've put the paths to the input file (text1.txt) and the output file (text2.txt) in the variables input and output; for the files in the question, set them to test.txt and test2.txt. The values contain no surrounding quotes because the quotes are added where the variables are used, which makes the paths easier to change.
The extra ( ... ) around the for /f only serves to redirect all of its output to the output file at once.
If you don't want to use delayed expansion, omit SetLocal EnableDelayedExpansion and EndLocal, and replace the echo line inside the for /f with call echo %%line:%substr%=%%.
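Put together, the variant without delayed expansion might look like the sketch below. It reuses the file names from the script above, and it assumes the input contains no special characters, since those are interpreted by cmd when call re-parses the line.

```batch
@echo off
set "input=text1.txt"
set "output=text2.txt"
set "substr=unwanted"

(
    for /f "usebackq delims=" %%G in ("%input%") do (
        set "line=%%G"
        rem call forces a second round of percent expansion at run time.
        call echo %%line:%substr%=%%
    )
) > "%output%"

exit /b 0
```

One side effect of this form: if a line is empty after the removal, echo prints its on/off status instead of a blank line.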

EDIT: If your input file contains special characters like <>()|&%, you must use delayed expansion. With the normal variable expansion used in call echo %%line:%substr%=%% those special characters will be processed with their special meanings by the cmd-interpreter (< and > for input or output redirection for example) and generate unexpected results.
I've also wrapped the assignment of the substr variable in quotes, but if the substring you want to remove contains special characters like <>()|&%, each of them must still be escaped for %substr% to work as expected. Escape a special character with a caret (^), except for %, which must be doubled (%% instead of %).
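As a hedged sketch of the caret escaping described above, the following removes an ampersand from a sample string. The sample text is made up for illustration, and the behaviour is worth verifying on your own system before relying on it.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample text containing an ampersand.
set "line=salt&pepper"

rem Inside the quotes the caret is stored literally as part of the value.
rem It escapes the ampersand once the value is pasted into the echo line.
set "substr=^&"

echo(!line:%substr%=!

endlocal
```

Without the caret in substr, the ampersand would be read as a command separator when the echo line is parsed.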

EDIT2: for /f skips blank lines, so keeping those blank lines in the output file requires a workaround. A common hack in pure batch is to use findstr /n or find /n to prefix each line (including the empty ones) with its line number while feeding the input file to the for /f. This requires some extra processing inside the for /f block to strip the line numbers from the output again, but it is possible. This answer to a similar question provides an excellent explanation of those workarounds and their drawbacks.
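One possible shape of the findstr /n workaround is sketched below; it is an illustration rather than the only way to do it. It reuses the file names from the script above and switches delayed expansion on per line, a choice made here so that exclamation marks in the input survive the assignment.

```batch
@echo off
setlocal DisableDelayedExpansion

set "input=text1.txt"
set "output=text2.txt"
set "substr=unwanted"

(
    for /f "delims=" %%G in ('findstr /n "^" "%input%"') do (
        set "line=%%G"
        rem Delayed expansion is switched on only after the line is stored.
        setlocal EnableDelayedExpansion
        rem Strip everything up to the first colon, which is the line number.
        set "line=!line:*:=!"
        if defined line set "line=!line:%substr%=!"
        echo(!line!
        endlocal
    )
) > "%output%"

endlocal
exit /b 0
```

Because every line arrives as number, colon, text, none of them is empty when for /f sees it, so blank lines are no longer skipped and come out as empty lines in the output.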