golang - bufio read multiline until (CRLF) \r\n delimiter

Gravy · May 30, 2016 · Viewed 9.7k times

I am trying to implement my own beanstalkd client as a way of learning Go. The protocol is described at https://github.com/kr/beanstalkd/blob/master/doc/protocol.txt

At the moment, I am using bufio to read in a line of data delimited by \n.

res, err := this.reader.ReadString('\n')

This is fine when I send a single command and read a single-line response like INSERTED %d\r\n, but I run into difficulties when I try to reserve a job: the job body can span multiple lines, so I cannot use the \n delimiter.

Is there a way to read into the buffer until CRLF?

e.g. when I send the reserve command. My expected response is as follows:

RESERVED <id> <bytes>\r\n
<data>\r\n

But data could contain \n, so I need to read until the \r\n.

Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in example response above?

At the moment, I have (err handling removed):

func (this *Bean) receiveLine() (string, error) {
    res, err := this.reader.ReadString('\n')
    return res, err
}

func (this *Bean) receiveBody(numBytesToRead int) ([]byte, error) {
    res, err := this.reader.ReadString('\r\n') // What to do here to read to CRLF / up to number of expected bytes?

    return res, err
}

func (this *Bean) Reserve() (*Job, error) {

    this.send("reserve\r\n")
    res, err := this.receiveLine()

    var jobId uint64
    var bodylen int
    _, err = fmt.Sscanf(res, "RESERVED %d %d\r\n", &jobId, &bodylen)

    body, err := this.receiveBody(bodylen)

    job := new(Job)
    job.Id = jobId
    job.Body = body

    return job, nil
}

Answer

Darigaaz · May 30, 2016

res, err := this.reader.Read('\n')

Does not make any sense to me. Did you mean ReadBytes/ReadSlice/ReadString?

You need bufio.Scanner.

Define your own bufio.SplitFunc (the example is a copy of bufio.ScanLines, modified to look for '\r\n'). Adapt it to match your case.

// dropCR drops a terminal \r from the data.
func dropCR(data []byte) []byte {
    if len(data) > 0 && data[len(data)-1] == '\r' {
        return data[0 : len(data)-1]
    }
    return data
}


func ScanCRLF(data []byte, atEOF bool) (advance int, token []byte, err error) {
    if atEOF && len(data) == 0 {
        return 0, nil, nil
    }
    if i := bytes.Index(data, []byte{'\r', '\n'}); i >= 0 {
        // We have a full CRLF-terminated line. data[0:i] already excludes
        // the "\r\n", so return it as-is; calling dropCR here would eat a
        // '\r' that belongs to the payload.
        return i + 2, data[0:i], nil
    }
    // If we're at EOF, we have a final, non-terminated line. Return it.
    if atEOF {
        return len(data), dropCR(data), nil
    }
    // Request more data.
    return 0, nil, nil
}

Now, wrap your io.Reader with your custom scanner.

scanner := bufio.NewScanner(this.reader)
// Set the split function for the scanning operation.
scanner.Split(ScanCRLF)
// Print every CRLF-delimited token.
for scanner.Scan() {
    fmt.Printf("%s\n", scanner.Text())
}

if err := scanner.Err(); err != nil {
    fmt.Printf("Invalid input: %s", err)
}
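To see the split function at work without a live beanstalkd connection, here is a minimal, self-contained sketch that feeds a canned response through the scanner. The sample input and the helper name splitCRLF are made up for illustration, and scanCRLF is a trimmed-down variant of the function above.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanCRLF is a bufio.SplitFunc that yields tokens terminated by "\r\n".
func scanCRLF(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if atEOF && len(data) == 0 {
		return 0, nil, nil
	}
	if i := bytes.Index(data, []byte("\r\n")); i >= 0 {
		return i + 2, data[:i], nil
	}
	if atEOF {
		// Final, unterminated token.
		return len(data), data, nil
	}
	// Request more data.
	return 0, nil, nil
}

// splitCRLF (hypothetical helper) returns every CRLF-delimited token in input.
func splitCRLF(input string) ([]string, error) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	scanner.Split(scanCRLF)
	var tokens []string
	for scanner.Scan() {
		tokens = append(tokens, scanner.Text())
	}
	return tokens, scanner.Err()
}

func main() {
	// Made-up response: the 5-byte body contains a bare "\n".
	tokens, err := splitCRLF("RESERVED 1 5\r\nhe\nlo\r\n")
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	for _, t := range tokens {
		fmt.Printf("%q\n", t)
	}
}
```

The bare "\n" inside the body stays part of the second token, because only the two-byte sequence "\r\n" ends a token.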

Read the Scanner part of the bufio package's source code.

Alternatively - is there a way of reading a specific number of bytes as specified in <bytes> in example response above?

First you need to read the "RESERVED <id> <bytes>\r\n" line somehow.

And then you can use

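nr_of_bytes := read_number_of_bytes_somehow(this.reader)
buf := make([]byte, nr_of_bytes)
// io.ReadFull keeps reading until buf is full; a bare Read may return fewer bytes.
_, err := io.ReadFull(this.reader, buf)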

or io.LimitedReader.
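For reference, the byte-count route could look roughly like the sketch below. It assumes the reader is a *bufio.Reader and that the header has the RESERVED <id> <bytes> shape from the question; readReserved and parseReserved are hypothetical names, and the sketch does not check that the two bytes after the body really are "\r\n".

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readReserved reads the "RESERVED <id> <bytes>" line, then exactly <bytes>
// bytes of body plus the two trailing bytes that should be "\r\n".
func readReserved(r *bufio.Reader) (uint64, []byte, error) {
	header, err := r.ReadString('\n')
	if err != nil {
		return 0, nil, err
	}
	var id uint64
	var n int
	line := strings.TrimRight(header, "\r\n")
	if _, err := fmt.Sscanf(line, "RESERVED %d %d", &id, &n); err != nil {
		return 0, nil, err
	}
	if n < 0 {
		return 0, nil, fmt.Errorf("negative body length %d", n)
	}
	// Body plus trailing CRLF; ReadFull fails if the stream ends early.
	buf := make([]byte, n+2)
	if _, err := io.ReadFull(r, buf); err != nil {
		return 0, nil, err
	}
	return id, buf[:n], nil
}

// parseReserved (demo helper) runs readReserved over an in-memory string.
func parseReserved(input string) (uint64, []byte, error) {
	return readReserved(bufio.NewReader(strings.NewReader(input)))
}

func main() {
	id, body, err := parseReserved("RESERVED 42 5\r\nhe\nlo\r\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("id=%d body=%q\n", id, body)
}
```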

But I don't like this approach.

Thanks for this - reader.Read('\n') was a typo, and I have corrected the question. I have also attached example code showing how far I have got. As you can see, I can get the expected number of bytes in the body. Could you elaborate on why you don't like the idea of reading a specific number of bytes? It seems the most logical approach.

I'd like to see Bean's definition, especially the reader part. Imagine this counter is wrong somehow.

  1. It's too small: do you need to find the following "\r\n" and discard everything up to that point, or not? And why do you need the counter in the first place then?

  2. It's bigger than it should be (or even worse, it's huge!).

    2.1 There is no next message in the reader: fine, the read is shorter than expected, but that's OK.

    2.2 There is a next message waiting: bah, you read part of it and there is no easy way to recover.

    2.3 It's huge: you can't allocate the memory, even if the message is only 1 byte.

Byte counters like this are generally designed to verify the message, and it looks like that is the case with the beanstalkd protocol.

Use Scanner, parse the message, check its length against the expected number ... profit
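The "scan, parse, compare" idea could be sketched as follows. This is only an illustration under one big assumption: the body itself never contains "\r\n". If it does, the scanned token comes out shorter than <bytes> and the check reports a mismatch instead of reassembling the body. The names reserve, nextToken and reserveFromString are made up.

```go
package main

import (
	"bufio"
	"bytes"
	"errors"
	"fmt"
	"strings"
)

// scanCRLF is a bufio.SplitFunc that yields tokens terminated by "\r\n".
func scanCRLF(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.Index(data, []byte("\r\n")); i >= 0 {
		return i + 2, data[:i], nil
	}
	if atEOF && len(data) > 0 {
		return len(data), data, nil
	}
	return 0, nil, nil
}

// nextToken advances the scanner and turns "no more tokens" into an error.
func nextToken(sc *bufio.Scanner) error {
	if sc.Scan() {
		return nil
	}
	if err := sc.Err(); err != nil {
		return err
	}
	return errors.New("unexpected end of input")
}

// reserve parses one RESERVED response and verifies the announced length.
func reserve(sc *bufio.Scanner) (uint64, []byte, error) {
	if err := nextToken(sc); err != nil {
		return 0, nil, err
	}
	var id uint64
	var want int
	if _, err := fmt.Sscanf(sc.Text(), "RESERVED %d %d", &id, &want); err != nil {
		return 0, nil, err
	}
	if err := nextToken(sc); err != nil {
		return 0, nil, err
	}
	// Copy the token: the slice from Bytes may be overwritten by a later Scan.
	body := append([]byte(nil), sc.Bytes()...)
	if len(body) != want {
		return 0, nil, fmt.Errorf("body is %d bytes, header announced %d", len(body), want)
	}
	return id, body, nil
}

// reserveFromString (demo helper) runs reserve over an in-memory string.
func reserveFromString(input string) (uint64, []byte, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	sc.Split(scanCRLF)
	return reserve(sc)
}

func main() {
	id, body, err := reserveFromString("RESERVED 7 5\r\nhe\nlo\r\n")
	fmt.Println(id, string(body), err)
}
```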

UPD

Be warned: by default bufio.Scanner can't read a token longer than 64k; set the maximum length with scanner.Buffer first. And that's bad, because you can't change this option on the fly, and some data may already have been "pre"-read by the scanner.
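A small sketch of that limit, assuming a CRLF split function like the one above. The helpers firstTokenLen and payload and the 100000-byte size are made up for the demonstration; the first call is expected to fail with the scanner's "token too long" error, the second to succeed.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"strings"
)

// scanCRLF is a bufio.SplitFunc that yields tokens terminated by "\r\n".
func scanCRLF(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if i := bytes.Index(data, []byte("\r\n")); i >= 0 {
		return i + 2, data[:i], nil
	}
	if atEOF && len(data) > 0 {
		return len(data), data, nil
	}
	return 0, nil, nil
}

// firstTokenLen scans one token and returns its length. If maxToken > 0,
// the scanner's token limit is raised before the first call to Scan.
func firstTokenLen(input string, maxToken int) (int, error) {
	sc := bufio.NewScanner(strings.NewReader(input))
	if maxToken > 0 {
		// Buffer must be called before scanning starts.
		sc.Buffer(make([]byte, 0, bufio.MaxScanTokenSize), maxToken)
	}
	sc.Split(scanCRLF)
	if !sc.Scan() {
		return 0, sc.Err()
	}
	return len(sc.Bytes()), nil
}

// payload builds an n-byte body followed by CRLF.
func payload(n int) string {
	return strings.Repeat("x", n) + "\r\n"
}

func main() {
	big := payload(100000)

	_, err := firstTokenLen(big, 0)
	fmt.Println("default limit:", err)

	n, err := firstTokenLen(big, 1<<20)
	fmt.Println("raised limit:", n, err)
}
```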

UPD2

Thinking about my last update: take a look at how net/textproto implements dotReader as a simple state machine. You could do something similar: read the command first, then do the "expected bytes" check on the payload.
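One way that two-step flow might look, as a rough sketch rather than a finished client: net/textproto's Reader handles the command line, and a size-checked io.ReadFull on the same underlying bufio.Reader handles the payload. The maxBody cap is an arbitrary value picked for this example, and readJob and readJobFromString are hypothetical names.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"net/textproto"
	"strings"
)

// maxBody is an arbitrary cap chosen for this sketch.
const maxBody = 1 << 20

// readJob reads the command line first, then the announced payload.
func readJob(tp *textproto.Reader) (uint64, []byte, error) {
	// State 1: the command line. ReadLine strips the line ending.
	line, err := tp.ReadLine()
	if err != nil {
		return 0, nil, err
	}
	var id uint64
	var n int
	if _, err := fmt.Sscanf(line, "RESERVED %d %d", &id, &n); err != nil {
		return 0, nil, err
	}
	// Refuse absurd counters before allocating anything.
	if n < 0 || n > maxBody {
		return 0, nil, fmt.Errorf("refusing body of %d bytes", n)
	}
	// State 2: exactly n payload bytes followed by CRLF.
	buf := make([]byte, n+2)
	if _, err := io.ReadFull(tp.R, buf); err != nil {
		return 0, nil, err
	}
	if buf[n] != '\r' || buf[n+1] != '\n' {
		return 0, nil, errors.New("payload not terminated by CRLF")
	}
	return id, buf[:n], nil
}

// readJobFromString (demo helper) runs readJob over an in-memory string.
func readJobFromString(input string) (uint64, []byte, error) {
	tp := textproto.NewReader(bufio.NewReader(strings.NewReader(input)))
	return readJob(tp)
}

func main() {
	// The body here contains "\r\n" itself, which the byte count copes with.
	id, body, err := readJobFromString("RESERVED 3 6\r\nab\r\ncd\r\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("id=%d body=%q\n", id, body)
}
```

The terminator check is what catches a wrong counter: if <bytes> is too small or too large, the two bytes after the payload are unlikely to be "\r\n", and the read fails instead of silently desynchronising.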