"Name or service not known (SocketError)" error when running in many threads

Tagussan · Mar 26, 2013

I have written a program that parses a text file and downloads data in parallel. When the download method runs in 9 or fewer threads, the program works without errors. But when it runs in 10 or more threads, the program throws a "`initialize': getaddrinfo: Name or service not known (SocketError)" error. I tried several ways of running the downloads in parallel, but the same problem occurs. When the "Name or service not known" error happens, I took the URL that had been passed to the 'open' method (open-uri), put it into a browser, and confirmed that the URL is valid and returns the correct data. Here's part of the code.

jobs = []
aps = []
# ....
# jobs are pushed into jobs[]
# ....
max_thread = 15
loop do
  ary_threads = []
  max_thread.times do |i|
    break if jobs.size == 0
    job = jobs.pop
    ary_threads << Thread.start {
      begin
        request(job[0], job[1]).each do |ap| # open(url) is called inside the "request" method
          aps.push(ap)
        end
      end
    }
  end
  ary_threads.each { |th| th.join }
  break if jobs.size == 0
end

The error is:

/usr/lib/ruby/1.9.1/net/http.rb:762:in `initialize': getaddrinfo: Name or service not known (SocketError)
from /usr/lib/ruby/1.9.1/net/http.rb:762:in `open'
from /usr/lib/ruby/1.9.1/net/http.rb:762:in `block in connect'
from /usr/lib/ruby/1.9.1/timeout.rb:54:in `timeout'
from /usr/lib/ruby/1.9.1/timeout.rb:99:in `timeout'
from /usr/lib/ruby/1.9.1/net/http.rb:762:in `connect'
from /usr/lib/ruby/1.9.1/net/http.rb:755:in `do_start'
from /usr/lib/ruby/1.9.1/net/http.rb:744:in `start'
from /usr/lib/ruby/1.9.1/open-uri.rb:306:in `open_http'
from /usr/lib/ruby/1.9.1/open-uri.rb:775:in `buffer_open'
from /usr/lib/ruby/1.9.1/open-uri.rb:203:in `block in open_loop'
from /usr/lib/ruby/1.9.1/open-uri.rb:201:in `catch'
from /usr/lib/ruby/1.9.1/open-uri.rb:201:in `open_loop'
from /usr/lib/ruby/1.9.1/open-uri.rb:146:in `open_uri'
from /var/lib/gems/1.9.1/gems/open-uri-cached-0.0.5/lib/open-uri/cached.rb:10:in `open_uri'
from /usr/lib/ruby/1.9.1/open-uri.rb:677:in `open'
from /usr/lib/ruby/1.9.1/open-uri.rb:33:in `open'
from Test1.rb:42:in `request'
from Test1.rb:77:in `block (3 levels) in <main>'

Why does this happen? Has anyone encountered a similar problem? Please help!
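Since the question reports that 9 or fewer threads work and 10 or more fail, one hedged mitigation (not a diagnosis of the root cause) is to cap concurrency with a fixed pool of worker threads that pull jobs from a thread-safe `Queue`, instead of starting batches of 15 threads. In this sketch `fake_request` is a hypothetical stand-in for the question's `request` method so it runs without network access, results are passed back through a second `Queue` so no explicit lock is needed, and the pool size of 9 simply mirrors the question's observation.

```ruby
require "thread"  # Queue (needed on Ruby 1.9; harmless on newer versions)

# Hypothetical stand-in for the question's request(a, b), which calls open(url)
# and returns an array of results; here it just echoes its arguments.
def fake_request(a, b)
  [[a, b]]
end

# Process all jobs with a fixed pool of pool_size worker threads.
# Workers pull jobs from a shared Queue until it is empty, so at most
# pool_size requests are in flight at any moment.
def run_pool(jobs, pool_size)
  todo = Queue.new
  jobs.each { |job| todo << job }
  done = Queue.new  # thread-safe results channel

  workers = Array.new(pool_size) do
    Thread.new do
      loop do
        begin
          job = todo.pop(true)  # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break                 # no jobs left for this worker
        end
        fake_request(job[0], job[1]).each { |ap| done << ap }
      end
    end
  end
  workers.each(&:join)

  aps = []
  aps << done.pop until done.empty?  # drain results once every worker has finished
  aps
end

jobs = (1..40).map { |i| ["host#{i}", i] }
aps = run_pool(jobs, 9)
puts aps.size
```

Unlike the batch-and-join loop, a slow URL here only occupies one worker instead of holding up the whole batch of 15.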

3 hours after posting the question, I found a temporary workaround. If I wrap the 'open' call in the 'request' method in a 'begin ~ rescue ~ retry ~ end' block, the error does not happen when 'open' is called the second time. Here's the code.

begin
    response = open(url)
rescue Exception
    puts url
    puts "retrying"
    retry
end

After the exception is caught and the URL and "retrying" are printed once, they are never printed again (the retry succeeds) and the program works correctly :) But I still can't find what causes this problem.
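A somewhat safer variant of the retry workaround above is sketched here, assuming the goal is only to ride out the intermittent DNS failure. It rescues `SocketError` instead of `Exception` (rescuing `Exception` also catches `Interrupt`, so Ctrl-C during `open` would just be retried) and gives up after a fixed number of attempts. The helper name `with_dns_retry` and its default values are hypothetical, and it does not explain the root cause.

```ruby
require "socket"  # defines SocketError

# Hypothetical helper: call the block, retrying only on SocketError (the
# getaddrinfo failure), at most max_attempts times in total, with a short
# sleep between attempts.
def with_dns_retry(max_attempts = 3, delay = 0.5)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue SocketError => e
    raise if attempts >= max_attempts  # give up: re-raise the last SocketError
    warn "retrying after #{e.message} (attempt #{attempts})"
    sleep delay
    retry
  end
end

# Inside the question's request method this would wrap the open-uri call:
#   response = with_dns_retry { open(url) }
```

Bounding the attempts means a URL that genuinely cannot be resolved eventually raises instead of looping forever, and unrelated errors propagate immediately.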

Answer

Naveen Agarwal · Jul 26, 2013

I think it might be caused by a race condition between threads. Try doing the operations atomically by putting them under a mutex lock.

    @mutex = Mutex.new

    @mutex.synchronize do
      ...

      ary_threads << Thread.start {
        begin
          request(job[0], job[1]).each do |ap| # open(url) is called inside the "request" method
            aps.push(ap)
          end
        end
      }

      ...
    end
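In the snippet above, the code inside the `Thread.start` block does not run under the lock: the mutex is held by the creating thread, so it serializes only thread creation, not the requests or the `aps.push` calls. If the race being guarded against is on the shared `aps` array, a minimal sketch is to take the lock inside each thread, around the append only, so the downloads themselves still run in parallel. `fake_request` and `aps_lock` are hypothetical names, and this sketch does not show that the lock changes the getaddrinfo behaviour.

```ruby
# Hypothetical stand-in for the question's request(a, b), which calls open(url).
def fake_request(a, b)
  sleep 0.01  # simulate network latency so the threads actually overlap
  [[a, b]]
end

max_thread = 15
jobs = (1..30).map { |i| ["host#{i}", i] }
aps = []
aps_lock = Mutex.new  # guards only the shared aps array

# Same batching as the question: start up to max_thread threads, then join them.
jobs.each_slice(max_thread) do |batch|
  threads = batch.map do |job|
    Thread.new do
      results = fake_request(job[0], job[1])        # runs concurrently, outside the lock
      aps_lock.synchronize { aps.concat(results) }  # shared append happens atomically
    end
  end
  threads.each(&:join)
end

puts aps.size
```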