redigo: getting dial tcp: connect: cannot assign requested address

orcaman · Jun 15, 2016 · Viewed 10.8k times

I have an application that makes about 400 reads per second and 100 writes per second to Redis (hosted on redislabs). The application uses the github.com/garyburd/redigo package as its Redis client.

Only two functions read from and write to Redis:

func getCachedVPAIDConfig(key string) chan *cachedVPAIDConfig {
    c := make(chan *cachedVPAIDConfig)
    go func() {
        p := pool.Get()
        defer p.Close()

        switch p.Err() {
        case nil:
            item, err := redis.Bytes(p.Do("GET", key))
            if err != nil {
                c <- &cachedVPAIDConfig{nil, err}
                return
            }

            c <- &cachedVPAIDConfig{item, nil}
        default:
            c <- &cachedVPAIDConfig{nil, p.Err()}
            return
        }
    }()
    return c
}



func setCachedVPAIDConfig(key string, j []byte) chan error {
    c := make(chan error)
    go func() {
        p := pool.Get()
        defer p.Close()

        switch p.Err() {
        case nil:
            _, err := p.Do("SET", key, j)

            if err != nil {
                c <- err
                return
            }

            c <- nil
        default:
            c <- p.Err()
            return
        }
    }()
    return c
}

As you can see, I'm using the recommended connection pooling mechanism (http://godoc.org/github.com/garyburd/redigo/redis#Pool).

I call these functions on every HTTP request that one endpoint of the application receives. The problem is that as soon as the application starts receiving requests, it begins throwing this error:

dial tcp 54.160.xxx.xx:yyyy: connect: cannot assign requested address

(54.160.xxx.xx:yyyy is the redis host)

I see on redis that there are only about 600 connections when this starts to happen, which doesn't sound like a lot.

I tried playing with the MaxActive setting of the pool, setting it anywhere between 1000 and 50K, but the result is the same.

Any ideas?

EDIT

Here's my pool initialization code (doing this in func init):

pool = redis.Pool{
    MaxActive:   1000, // note: I tried changing this to 50K, result the same
    Dial: func() (redis.Conn, error) {
        c, err := redis.Dial("tcp", redisHost)
        if err != nil {
            return nil, err
        }
        if _, err := c.Do("AUTH", redisPassword); err != nil {
            c.Close()
            return nil, err
        }
        return c, err
    },
}
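Note that this configuration leaves MaxIdle at its zero value. The sketch below is a toy model, not redigo's implementation: it assumes a pool that keeps at most MaxIdle returned connections and closes the rest, and the names toyPool, get, put and dialsFor are made up for illustration. Under that assumption a pool with MaxIdle of 0 dials a fresh connection for every request, which is one plausible way to run out of local ports.

```go
package main

import "fmt"

// toyPool is a hypothetical model, not redigo's code: it keeps at most
// maxIdle returned connections and closes any beyond that.
type toyPool struct {
	maxIdle int
	idle    int // connections currently parked in the pool
	dials   int // how many new connections have been opened
}

// get reuses an idle connection when one exists, otherwise it dials.
func (p *toyPool) get() {
	if p.idle > 0 {
		p.idle--
		return
	}
	p.dials++
}

// put parks the connection if there is room; otherwise it is closed.
func (p *toyPool) put() {
	if p.idle < p.maxIdle {
		p.idle++
	}
}

// dialsFor runs n sequential get/put cycles and returns the dial count.
func dialsFor(maxIdle, n int) int {
	p := &toyPool{maxIdle: maxIdle}
	for i := 0; i < n; i++ {
		p.get()
		p.put()
	}
	return p.dials
}

func main() {
	fmt.Println(dialsFor(0, 500))   // prints 500: every request dials
	fmt.Println(dialsFor(500, 500)) // prints 1: the connection is reused
}
```

The model is sequential and ignores concurrency, so it only illustrates the trend: the fewer idle connections the pool may keep, the more often it has to dial.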

Edit 2: Issue solved by applying the suggestions in the answer below!

New code for pool init:

pool = redis.Pool{
    MaxActive:   500,
    MaxIdle:     500,
    IdleTimeout: 5 * time.Second,
    Dial: func() (redis.Conn, error) {
        c, err := redis.DialTimeout("tcp", redisHost, 100*time.Millisecond, 100*time.Millisecond, 100*time.Millisecond)
        if err != nil {
            return nil, err
        }
        if _, err := c.Do("AUTH", redisPassword); err != nil {
            c.Close()
            return nil, err
        }
        return c, err
    },
}

With this new init, redigo handles the get and set timeouts internally, so getCachedVPAIDConfig and setCachedVPAIDConfig no longer need to return a channel. This is how they look now:

func setCachedVPAIDConfig(key string, j []byte) error {
    p := pool.Get()
    switch p.Err() {
    case nil:
        _, err := p.Do("SET", key, j)
        p.Close()
        return err
    default:
        p.Close()
        return p.Err()
    }
}

func getCachedVPAIDConfig(key string) ([]byte, error) {
    p := pool.Get()
    switch p.Err() {
    case nil:
        item, err := redis.Bytes(p.Do("GET", key))
        p.Close()
        return item, err
    default:
        p.Close()
        return nil, p.Err()
    }
}
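For comparison, here is a minimal sketch of how the earlier channel-returning versions could hold on to connections. It uses only the standard library: fakeConn, get, fetch and leaked are hypothetical stand-ins, not redigo APIs, and it assumes the caller sometimes stops waiting on the channel (for example after its own timeout). With an unbuffered channel the goroutine blocks on the send, so the deferred Close never runs; a channel with capacity 1 lets the goroutine finish.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// fakeConn is a hypothetical stand-in for a pooled connection; it only
// tracks how many connections are currently checked out.
type fakeConn struct{ open *int64 }

// Close marks the connection as returned.
func (c fakeConn) Close() { atomic.AddInt64(c.open, -1) }

// get mimics pool.Get by counting one more checked-out connection.
func get(open *int64) fakeConn {
	atomic.AddInt64(open, 1)
	return fakeConn{open}
}

// fetch mirrors the old channel-returning shape: the deferred Close runs
// only after the send on c has completed.
func fetch(open *int64, buffered bool) chan string {
	size := 0
	if buffered {
		size = 1
	}
	c := make(chan string, size)
	go func() {
		p := get(open)
		defer p.Close()
		c <- "value" // blocks forever if c is unbuffered and nobody receives
	}()
	return c
}

// leaked issues n requests whose results are never received, then reports
// how many connections are still checked out.
func leaked(n int, buffered bool) int64 {
	var open int64
	for i := 0; i < n; i++ {
		_ = fetch(&open, buffered) // the caller gave up and never receives
	}
	time.Sleep(200 * time.Millisecond) // let goroutines finish if they can
	return atomic.LoadInt64(&open)
}

func main() {
	fmt.Println("unbuffered:", leaked(50, false))
	fmt.Println("buffered:  ", leaked(50, true))
}
```

In this sketch the unbuffered run should report every connection as still checked out, while the buffered run should report none. The synchronous versions above avoid the problem altogether because Close is called before the function returns.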

Answer

Not_a_Golfer · Jun 15, 2016
  1. You close the connection only after sending on the channel, because the deferred Close runs when the goroutine returns. If the send blocks because nobody is receiving, the connection is never closed, which would result in the error you're seeing. So don't rely on defer alone; close the connection explicitly before sending.

  2. I don't think it's the cause, but it's a good idea regardless: set a timeout on your connections with DialTimeout.

  3. Make sure you have a proper TestOnBorrow function to get rid of dead connections, especially if you use timeouts. I usually send a PING if the connection has been idle for more than 3 seconds (the function receives the time the connection was returned to the pool as a parameter, so the idle time is easy to compute).

  4. Try setting MaxIdle to a larger number as well. I remember having pooling problems that were resolved by increasing that parameter.