Split string by length in Golang

Fernando Parra picture Fernando Parra · Sep 5, 2014 · Viewed 20.7k times · Source

Does anyone know how to split a string in Golang by length?

For example to split "helloworld" after every 3 characters, so it should ideally return an array of "hel" "low" "orl" "d"?

Alternatively a possible solution would be to also append a newline after every 3 characters..

All ideas are greatly appreciated!

Answer

VonC picture VonC · Sep 5, 2014

Make sure to convert your string into a slice of rune: see "Slice string into letters".

for automatically converts string to rune so there is no additional code needed in this case to convert the string to rune first.

for i, r := range s {
    fmt.Printf("i%d r %c\n", i, r)
    // every 3 i, do something
}

r[n:n+3] will work best with a being a slice of rune.

The index will increase by one every rune, while it might increase by more than one for every byte in a slice of string: "世界": i would be 0 and 3: a character (rune) can be formed of multiple bytes.


For instance, consider s := "世a界世bcd界efg世": 12 runes. (see play.golang.org)

If you try to parse it byte by byte, you will miss (in a naive split every 3 chars implementation) some of the "index modulo 3" (equals to 2, 5, 8 and 11), because the index will increase past those values:

for i, r := range s {
    res = res + string(r)
    fmt.Printf("i %d r %c\n", i, r)
    if i > 0 && (i+1)%3 == 0 {
        fmt.Printf("=>(%d) '%v'\n", i, res)
        res = ""
    }
}

The output:

i  0 r 世
i  3 r a   <== miss i==2
i  4 r 界
i  7 r 世  <== miss i==5
i 10 r b  <== miss i==8
i 11 r c  ===============> would print '世a界世bc', not exactly '3 chars'!
i 12 r d
i 13 r 界
i 16 r e  <== miss i==14
i 17 r f  ===============> would print 'd界ef'
i 18 r g
i 19 r 世 <== miss the rest of the string

But if you were to iterate on runes (a := []rune(s)), you would get what you expect, as the index would increase one rune at a time, making it easy to aggregate exactly 3 characters:

for i, r := range a {
    res = res + string(r)
    fmt.Printf("i%d r %c\n", i, r)
    if i > 0 && (i+1)%3 == 0 {
        fmt.Printf("=>(%d) '%v'\n", i, res)
        res = ""
    }
}

Output:

i 0 r 世
i 1 r a
i 2 r 界 ===============> would print '世a界'
i 3 r 世
i 4 r b
i 5 r c ===============> would print '世bc'
i 6 r d
i 7 r 界
i 8 r e ===============> would print 'd界e'
i 9 r f
i10 r g
i11 r 世 ===============> would print 'fg世'