Wildcard string matching in Ruby

sa125 picture sa125 · Jun 23, 2011 · Viewed 15.3k times · Source

I'd like to write a utility function/module that'll provide simple wildcard/glob matching to strings. The reason I'm not using regular expressions is that the user will be the one who'll end up providing the patterns to match using some sort of configuration file. I could not find any such gem that's stable - tried joker but it had problems setting up.

The functionality I'm looking for is simple. For example, given the following patterns, here are the matches:

pattern | test-string         | match
========|=====================|====================
*hn     | john, johnny, hanna | true , false, false     # wildcard  , similar to /hn$/i
*hn*    | john, johnny, hanna | true , true , false     # like /hn/i
hn      | john, johnny, hanna | false, false, false     # /^hn$/i
*h*n*   | john, johnny, hanna | true , true , true
etc...

I'd like this to be as efficient as possible. I thought about creating regexes from the pattern strings, but that seemed rather inefficient to do at runtime. Any suggestions on this implementation? thanks.

EDIT: I'm using ruby 1.8.7

Answer

Joshua Cheek picture Joshua Cheek · Jun 23, 2011

I don't see why you think it would be inefficient. Predictions about these sorts of things are notoriously unreliable, you should decide that it is too slow before you go bending over backwards to find a faster way. And then you should profile it to make sure that this is where the problem lies (btw there is an average of 3-4x speed boost from switching to 1.9)

Anyway, it should be pretty easy to do this, something like:

class Globber 
  def self.parse_to_regex(str)
    escaped = Regexp.escape(str).gsub('\*','.*?')
    Regexp.new "^#{escaped}$", Regexp::IGNORECASE
  end

  def initialize(str)
    @regex = self.class.parse_to_regex str
  end

  def =~(str)
    !!(str =~ @regex)
  end
end


glob_strs = {
  '*hn'    => [['john', true, ], ['johnny', false,], ['hanna', false]],
  '*hn*'   => [['john', true, ], ['johnny', true, ], ['hanna', false]],
  'hn'     => [['john', false,], ['johnny', false,], ['hanna', false]],
  '*h*n*'  => [['john', true, ], ['johnny', true, ], ['hanna', true ]],
}

puts glob_strs.all? { |to_glob, examples|
  examples.all? do |to_match, expectation|
    result = Globber.new(to_glob) =~ to_match
    result == expectation
  end
}
# >> true