I need to delete characters from position X to position Y on each line of a text file

Abhishek.Sunshine · Aug 13, 2014

I have a huge flat file: 100K records, each spanning 3000 columns. I need to remove a segment of the data from position 300 to position 500 before archiving. This is a sensitive part of the data that needs to be wiped before I can archive. I am looking for an awk, sed, or similar command that can do the trick.

Sample file

003133780 MORNING GLORY DR                                        SOUTHAMPTON         PA18966780 MORNING GLORY DR    
0054381303 MADISON ST                                             RADFORD             VA241411303 MADISON ST         
00586728 CONESTOGA COURT                                          CHADDS FORD         PA1931728 CONESTOGA COURT      
1852921800 SAMER RD                                               MILAN               MI481601800 SAMER RD           
192717175 EVERGREEN CIRCLE                                        HENDERSONVILLE      TN37075175 EVERGREEN CIRCLE    
213673217 EAST BRANCH                                             LONGVIEW            TX75604217 EAST BRANCH         
2490423205 NOTTAGE LANE                                           FALLS CHURCH        VA220423205 NOTTAGE LANE       
249357344 BALOGH PLACE                                            LONGWOOD            FL32750344 BALOGH PLACE        
2502811224 WILFORD HOLLOW ROAD                                    VINTON              VA241791224 WILFORD HOLLOW ROAD
277634210 AMANDA CT                                               WHITEHOUSE          TX7579119726 COPPER OAKS DRIVE 
282482507 B ST.                                                   CHESAPEAKE          VA23324507 B ST.               

Expected output

003133780 MORNING GLORY DR                                        SOUTHAMPTON         PA780 MORNING GLORY DR    
0054381303 MADISON ST                                             RADFORD             VA1303 MADISON ST         
00586728 CONESTOGA COURT                                          CHADDS FORD         PA28 CONESTOGA COURT      
1852921800 SAMER RD                                               MILAN               MI1800 SAMER RD           
192717175 EVERGREEN CIRCLE                                        HENDERSONVILLE      TN175 EVERGREEN CIRCLE    
213673217 EAST BRANCH                                             LONGVIEW            TX217 EAST BRANCH         
2490423205 NOTTAGE LANE                                           FALLS CHURCH        VA3205 NOTTAGE LANE       
249357344 BALOGH PLACE                                            LONGWOOD            FL344 BALOGH PLACE        
2502811224 WILFORD HOLLOW ROAD                                    VINTON              VA1224 WILFORD HOLLOW ROAD
277634210 AMANDA CT                                               WHITEHOUSE          TX19726 COPPER OAKS DRIVE 
282482507 B ST.                                                   CHESAPEAKE          VA507 B ST.               

Here I removed the characters between positions 89 and 95. One more requirement: I also need to write the changed content back to the same file.
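Writing the result back to the same file needs one extra step: with a redirection such as cut ... file > file, the shell truncates the file before cut gets to read it. A minimal sketch, assuming a hypothetical helper named strip_chars_in_place, 1-based inclusive positions, and a starting position of at least 2, writes to a temporary file next to the original and then moves it into place:

```shell
#!/bin/sh
# Hypothetical helper: remove characters FROM..TO (1-based, inclusive)
# from every line of FILE and write the result back to FILE.
# Assumes FROM >= 2 so that the first cut range (1..FROM-1) is valid.
strip_chars_in_place() {
    file=$1 from=$2 to=$3
    # "cut ... "$file" > "$file"" would truncate FILE before cut reads it,
    # so write to a temporary file in the same directory first.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    # Keep characters 1..(FROM-1) and (TO+1)..end of line.
    if cut -c "1-$((from - 1)),$((to + 1))-" "$file" > "$tmp"; then
        # Replaces FILE; note the new file has mktemp's restrictive 0600 mode.
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For the range in the question this would be called as strip_chars_in_place data.txt 300 500, where data.txt is a placeholder name.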

Below is the script I have so far. It loops through all the files, splits each one into files of at most 20,000 rows, and then removes the characters between positions X and Y before archiving.

for currentfilename in `ls -1 *.[tT][xX][tT]`
do
    echo $currentfilename
    tempfilename=${currentfilename%%.*}
    awk -v A="$tempfilename" '{filename = A "Part" int((NR-1)/20000) ".txt"; print >> filename}' $currentfilename
    awk '{print substr($0,1,522) substr($0,953) >> filename}' $currentfilename
    mv $currentfilename $APP_ROOT/Archive
done
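In the loop above, the second awk runs as a separate process, so its filename variable is never set; the redirection target is an empty string and the command fails instead of writing the trimmed lines anywhere. The split parts written by the first awk also still contain the sensitive characters. A corrected sketch, which is an illustration rather than the asker's own code, does the split and the removal in a single awk pass and wraps the loop in a hypothetical function, process_txt_files:

```shell
#!/bin/bash
# Sketch only: the split and the character removal happen in ONE awk pass.
# Characters 523-952 are removed because the original script keeps
# substr($0,1,522) and substr($0,953); change these offsets as needed.
# APP_ROOT is assumed to be set by the caller, as in the original script.
process_txt_files() {
    local currentfilename tempfilename
    shopt -s nullglob   # if nothing matches, run zero iterations
    # Glob directly instead of parsing ls -1 output (copes with spaces).
    for currentfilename in *.[tT][xX][tT]; do
        echo "$currentfilename"
        tempfilename=${currentfilename%%.*}
        # Lines 1-20000 go to <name>Part0.txt, 20001-40000 to <name>Part1.txt, ...
        # In awk, ">" truncates a file the first time it is opened in this
        # run; later prints to the same file append.
        awk -v A="$tempfilename" '{
            filename = A "Part" int((NR - 1) / 20000) ".txt"
            print substr($0, 1, 522) substr($0, 953) > filename
        }' "$currentfilename"
        mv "$currentfilename" "$APP_ROOT/Archive"
    done
    shopt -u nullglob
}
```

The generated Part files also end in .txt, so a later run of the loop in the same directory would pick them up and strip them a second time; writing them to a separate directory avoids that.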

Answer

merlin2011 · Aug 13, 2014

Assuming that "position" means a tab-delimited column (field), you can use cut to select the fields you want to keep.

cut -f 1-299,501-3000 CutMe.txt
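A scaled-down illustration of how the field list works: cut prints only the fields named after -f, and a range with no upper bound, such as 5-, runs to the end of the line, so the total number of fields need not be known in advance.

```shell
#!/bin/sh
# Five tab-separated fields; keep fields 1-2 and 5 onward, dropping 3-4.
# Same idea as "cut -f 1-299,501-3000", on a much smaller scale.
printf 'f1\tf2\tf3\tf4\tf5\n' | cut -f 1-2,5-
# prints: f1<TAB>f2<TAB>f5
```

Lines that contain no tab at all are printed unchanged unless -s is given, so -f has no effect on fixed-width data such as the sample in the question.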

If your data is delimited by commas rather than tabs, use -d to set the delimiter.

cut -d, -f 1-299,501-3000 CutMe.txt
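The same scaled-down example with a comma as the delimiter:

```shell
#!/bin/sh
# Comma-separated input: -d, sets the delimiter, -f lists the fields to keep.
printf 'a,b,c,d,e\n' | cut -d, -f 1-2,5-
# prints: a,b,e
```

cut splits on every comma, so quoted CSV fields that themselves contain commas are not handled correctly.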

If "position" means character position, you can do the same with cut -c.

cut -c 1-299,501-3000 CutMe.txt
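Since the question also mentions awk, the same character-range deletion can be written with awk's substr function. A minimal sketch, using a hypothetical function name strip_chars_awk and 1-based inclusive positions:

```shell
#!/bin/sh
# Delete characters FROM..TO (1-based, inclusive) from every line with awk.
# Roughly equivalent to: cut -c "1-$((FROM-1)),$((TO+1))-"
# substr(s, start) with no length runs to the end of the line.
strip_chars_awk() {
    awk -v from="$1" -v to="$2" '{ print substr($0, 1, from - 1) substr($0, to + 1) }'
}

# Scaled-down check: drop characters 3-5 from "abcdefghij".
printf 'abcdefghij\n' | strip_chars_awk 3 5   # prints: abfghij
printf 'abcdefghij\n' | cut -c 1-2,6-         # prints: abfghij
```

For the range in the question this would be strip_chars_awk 300 500 < input.txt > output.txt (placeholder file names). For plain ASCII data such as the sample, the awk and cut versions agree; with multibyte characters, implementations may count characters and bytes differently.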