Why do we need fibers

fl00r picture fl00r · Jan 29, 2012 · Viewed 19.7k times · Source

For Fibers we have got classic example: generating of Fibonacci numbers

fib = Fiber.new do  
  x, y = 0, 1 
  loop do  
    Fiber.yield y 
    x,y = y,x+y 
  end 
end

Why do we need Fibers here? I can rewrite this with just the same Proc (closure, actually)

def clsr
  x, y = 0, 1
  Proc.new do
    x, y = y, x + y
    x
  end
end

So

10.times { puts fib.resume }

and

prc = clsr 
10.times { puts prc.call }

will return just the same result.

So what are the advantages of fibers. What kind of stuff I can write with Fibers I can't do with lambdas and other cool Ruby features?

Answer

Alex D picture Alex D · Feb 8, 2012

Fibers are something you will probably never use directly in application-level code. They are a flow-control primitive which you can use to build other abstractions, which you then use in higher-level code.

Probably the #1 use of fibers in Ruby is to implement Enumerators, which are a core Ruby class in Ruby 1.9. These are incredibly useful.

In Ruby 1.9, if you call almost any iterator method on the core classes, without passing a block, it will return an Enumerator.

irb(main):001:0> [1,2,3].reverse_each
=> #<Enumerator: [1, 2, 3]:reverse_each>
irb(main):002:0> "abc".chars
=> #<Enumerator: "abc":chars>
irb(main):003:0> 1.upto(10)
=> #<Enumerator: 1:upto(10)>

These Enumerators are Enumerable objects, and their each methods yield the elements which would have been yielded by the original iterator method, had it been called with a block. In the example I just gave, the Enumerator returned by reverse_each has a each method which yields 3,2,1. The Enumerator returned by chars yields "c","b","a" (and so on). BUT, unlike the original iterator method, the Enumerator can also return the elements one by one if you call next on it repeatedly:

irb(main):001:0> e = "abc".chars
=> #<Enumerator: "abc":chars>
irb(main):002:0> e.next
=> "a"
irb(main):003:0> e.next
=> "b"
irb(main):004:0> e.next
=> "c"

You may have heard of "internal iterators" and "external iterators" (a good description of both is given in the "Gang of Four" Design Patterns book). The above example shows that Enumerators can be used to turn an internal iterator into an external one.

This is one way to make your own enumerators:

class SomeClass
  def an_iterator
    # note the 'return enum_for...' pattern; it's very useful
    # enum_for is an Object method
    # so even for iterators which don't return an Enumerator when called
    #   with no block, you can easily get one by calling 'enum_for'
    return enum_for(:an_iterator) if not block_given?
    yield 1
    yield 2
    yield 3
  end
end

Let's try it:

e = SomeClass.new.an_iterator
e.next  # => 1
e.next  # => 2
e.next  # => 3

Wait a minute... does anything seem strange there? You wrote the yield statements in an_iterator as straight-line code, but the Enumerator can run them one at a time. In between calls to next, the execution of an_iterator is "frozen". Each time you call next, it continues running down to the following yield statement, and then "freezes" again.

Can you guess how this is implemented? The Enumerator wraps the call to an_iterator in a fiber, and passes a block which suspends the fiber. So every time an_iterator yields to the block, the fiber which it is running on is suspended, and execution continues on the main thread. Next time you call next, it passes control to the fiber, the block returns, and an_iterator continues where it left off.

It would be instructive to think of what would be required to do this without fibers. EVERY class which wanted to provide both internal and external iterators would have to contain explicit code to keep track of state between calls to next. Each call to next would have to check that state, and update it before returning a value. With fibers, we can automatically convert any internal iterator to an external one.

This doesn't have to do with fibers persay, but let me mention one more thing you can do with Enumerators: they allow you to apply higher-order Enumerable methods to other iterators other than each. Think about it: normally all the Enumerable methods, including map, select, include?, inject, and so on, all work on the elements yielded by each. But what if an object has other iterators other than each?

irb(main):001:0> "Hello".chars.select { |c| c =~ /[A-Z]/ }
=> ["H"]
irb(main):002:0> "Hello".bytes.sort
=> [72, 101, 108, 108, 111]

Calling the iterator with no block returns an Enumerator, and then you can call other Enumerable methods on that.

Getting back to fibers, have you used the take method from Enumerable?

class InfiniteSeries
  include Enumerable
  def each
    i = 0
    loop { yield(i += 1) }
  end
end

If anything calls that each method, it looks like it should never return, right? Check this out:

InfiniteSeries.new.take(10) # => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

I don't know if this uses fibers under the hood, but it could. Fibers can be used to implement infinite lists and lazy evaluation of a series. For an example of some lazy methods defined with Enumerators, I have defined some here: https://github.com/alexdowad/showcase/blob/master/ruby-core/collections.rb

You can also build a general-purpose coroutine facility using fibers. I've never used coroutines in any of my programs yet, but it's a good concept to know.

I hope this gives you some idea of the possibilities. As I said at the beginning, fibers are a low-level flow-control primitive. They make it possible to maintain multiple control-flow "positions" within your program (like different "bookmarks" in the pages of a book) and switch between them as desired. Since arbitrary code can run in a fiber, you can call into 3rd-party code on a fiber, and then "freeze" it and continue doing something else when it calls back into code you control.

Imagine something like this: you are writing a server program which will service many clients. A complete interaction with a client involves going through a series of steps, but each connection is transient, and you have to remember state for each client between connections. (Sound like web programming?)

Rather than explicitly storing that state, and checking it each time a client connects (to see what the next "step" they have to do is), you could maintain a fiber for each client. After identifying the client, you would retrieve their fiber and re-start it. Then at the end of each connection, you would suspend the fiber and store it again. This way, you could write straight-line code to implement all the logic for a complete interaction, including all the steps (just as you naturally would if your program was made to run locally).

I'm sure there's many reasons why such a thing may not be practical (at least for now), but again I'm just trying to show you some of the possibilities. Who knows; once you get the concept, you may come up with some totally new application which no-one else has thought of yet!