I am new to Go and I am trying to write a simple script that reads a file line by line. I also want to save the progress (i.e. the last line number that was read) on the filesystem somewhere so that if the same file was given as the input to the script again, it starts reading the file from the line where it left off. Following is what I have started off with.
package main
// Package Imports
import (
"bufio"
"flag"
"fmt"
"log"
"os"
)
// Variable Declaration
var (
ConfigFile = flag.String("configfile", "../config.json", "Path to json configuration file.")
)
// The main function that reads the file and parses the log entries
func main() {
flag.Parse()
settings := NewConfig(*ConfigFile)
inputFile, err := os.Open(settings.Source)
if err != nil {
log.Fatal(err)
}
defer inputFile.Close()
scanner := bufio.NewScanner(inputFile)
for scanner.Scan() {
fmt.Println(scanner.Text())
}
if err := scanner.Err(); err != nil {
log.Fatal(err)
}
}
// Saves the current progress
func SaveProgress() {
}
// Get the line count from the progress to make sure
func GetCounter() {
}
I could not find any methods that deals with line numbers in the scanner package. I know I can declare an integer say counter := 0
and increment it each time a line is read like counter++
. But the next time how do I tell the scanner to start from a specific line? So for example if I read till line 30
the next time I run the script with the same input file, how can I make scanner to start reading from line 31
?
One solution I can think of here is to use the counter as I stated above and use an if condition like the following.
scanner := bufio.NewScanner(inputFile)
for scanner.Scan() {
if counter > progress {
fmt.Println(scanner.Text())
}
}
I am pretty sure something like this would work, but it is still going to loop over the lines that we have already read. Please suggest a better way.
If you don't want to read but just skip the lines you read previously, you need to acquire the position where you left off.
The different solutions are presented in a form of a function which takes the input to read from and the start position (byte position) to start reading lines from, e.g.:
func solution(input io.ReadSeeker, start int64) error
A special io.Reader
input is used which also implements io.Seeker
, the common interface which allows skipping data without having to read them. *os.File
implements this, so you are allowed to pass a *File
to these functions. Good. The "merged" interface of both io.Reader
and io.Seeker
is io.ReadSeeker
.
If you want a clean start (to start reading from the beginning of the file), simply pass start = 0
. If you want to resume a previous processing, pass the byte position where the last processing was stopped/aborted. This position is the value of the pos
local variable in the functions (solutions) below.
All the examples below with their testing code can be found on the Go Playground.
bufio.Scanner
bufio.Scanner
does not maintain the position, but we can very easily extend it to maintain the position (the read bytes), so when we want to restart next, we can seek to this position.
In order to do this with minimal effort, we can use a new split function which splits the input into tokens (lines). We can use Scanner.Split()
to set the splitter function (the logic to decide where are the boundaries of tokens/lines). The default split function is bufio.ScanLines()
.
Let's take a look at the split function declaration: bufio.SplitFunc
type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
It returns the number of bytes to advance: advance
. Exactly what we need to maintain the file position. So we can create a new split function using the builtin bufio.ScanLines()
, so we don't even have to implement its logic, just use the advance
return value to maintain position:
func withScanner(input io.ReadSeeker, start int64) error {
fmt.Println("--SCANNER, start:", start)
if _, err := input.Seek(start, 0); err != nil {
return err
}
scanner := bufio.NewScanner(input)
pos := start
scanLines := func(data []byte, atEOF bool) (advance int, token []byte, err error) {
advance, token, err = bufio.ScanLines(data, atEOF)
pos += int64(advance)
return
}
scanner.Split(scanLines)
for scanner.Scan() {
fmt.Printf("Pos: %d, Scanned: %s\n", pos, scanner.Text())
}
return scanner.Err()
}
bufio.Reader
In this solution we use the bufio.Reader
type instead of the Scanner
. bufio.Reader
already has a ReadBytes()
method which is very similar to the "read a line" functionality if we pass the '\n'
byte as the delimeter.
This solution is similar to JimB's, with the addition of handling all valid line terminator sequences and also stripping them off from the read line (it is very rare they are needed); in regular expression notation, it is \r?\n
.
func withReader(input io.ReadSeeker, start int64) error {
fmt.Println("--READER, start:", start)
if _, err := input.Seek(start, 0); err != nil {
return err
}
r := bufio.NewReader(input)
pos := start
for {
data, err := r.ReadBytes('\n')
pos += int64(len(data))
if err == nil || err == io.EOF {
if len(data) > 0 && data[len(data)-1] == '\n' {
data = data[:len(data)-1]
}
if len(data) > 0 && data[len(data)-1] == '\r' {
data = data[:len(data)-1]
}
fmt.Printf("Pos: %d, Read: %s\n", pos, data)
}
if err != nil {
if err != io.EOF {
return err
}
break
}
}
return nil
}
Note: If the content ends with an empty line (line terminator), this solution will process an empty line. If you don't want this, you can simply check it like this:
if len(data) != 0 {
fmt.Printf("Pos: %d, Read: %s\n", pos, data)
} else {
// Last line is empty, omit it
}
Testing code will simply use the content "first\r\nsecond\nthird\nfourth"
which contains multiple lines with varying line terminating. We will use strings.NewReader()
to obtain an io.ReadSeeker
whose source is a string
.
Test code first calls withScanner()
and withReader()
passing 0
start position: a clean start. In the next round we will pass a start position of start = 14
which is the position of the 3. line, so we won't see the first 2 lines processed (printed): resume simulation.
func main() {
const content = "first\r\nsecond\nthird\nfourth"
if err := withScanner(strings.NewReader(content), 0); err != nil {
fmt.Println("Scanner error:", err)
}
if err := withReader(strings.NewReader(content), 0); err != nil {
fmt.Println("Reader error:", err)
}
if err := withScanner(strings.NewReader(content), 14); err != nil {
fmt.Println("Scanner error:", err)
}
if err := withReader(strings.NewReader(content), 14); err != nil {
fmt.Println("Reader error:", err)
}
}
Output:
--SCANNER, start: 0
Pos: 7, Scanned: first
Pos: 14, Scanned: second
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 0
Pos: 7, Read: first
Pos: 14, Read: second
Pos: 20, Read: third
Pos: 26, Read: fourth
--SCANNER, start: 14
Pos: 20, Scanned: third
Pos: 26, Scanned: fourth
--READER, start: 14
Pos: 20, Read: third
Pos: 26, Read: fourth
Try the solutions and testing code on the Go Playground.