Should I use Drools in this situation?

Tong Wang · Feb 16, 2010

I'll use a university's library system to explain my use case. Students register in the library system and provide their profile: gender, age, department, previously completed courses, currently registered courses, books already borrowed, etc. Each book in the library system defines some borrowing rules based on the student's profile. For example, a textbook for a computer algorithms course can only be borrowed by students currently registered for that course; another textbook may only be borrowed by students in the math department; there could also be rules such as "a student can borrow at most two computer networking books". Because of these borrowing rules, when a student searches or browses the library system, he will see only the books he is allowed to borrow. So the requirement really comes down to efficiently generating the list of books that a student is eligible to borrow.

Here is how I envision the design using Drools. Each book will have a rule whose LHS is a few field constraints on the student profile; the RHS of the book rule simply adds the book id to a global result list. All the book rules are loaded into a RuleBase. When a student searches or browses the library system, a stateless session is created from the RuleBase and the student's profile is asserted as a fact. Every book that the student can borrow then fires its book rule, and the global result list ends up holding the complete list of books the student can borrow.

A few assumptions: the library will handle millions of books; I don't expect the book rules to be complicated, with on average no more than 3 simple field constraints per rule; and the system needs to handle around 100K students, so the load is fairly heavy. My questions are: how much memory will Drools take if loaded with a million book rules? How fast will it be for all those million rules to fire? If Drools is the right fit, I'd like to hear some best practices for designing such a system from experienced users. Thanks.

Answer

Michael Deardeuff · Feb 17, 2010

First, don't make a rule for every book. Make rules for the restrictions: far fewer restrictions are defined than books. This will have a huge impact on running time and memory usage.
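The idea of modelling restrictions rather than books can be illustrated without Drools at all. Below is a minimal plain-Java sketch (not the Drools API), assuming hypothetical `Student` and `Book` types where each book merely carries the data its restriction needs: a required course and/or a required department. Two generic restrictions then cover every book, so adding a million books adds data, not rules.

```java
import java.util.List;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical domain types for illustration only; they are not Drools classes.
public class RestrictionSketch {
    record Student(String department, Set<String> enrolledCourses) {}

    // requiredCourse / requiredDepartment are null when the book has no such restriction.
    record Book(String id, String requiredCourse, String requiredDepartment) {}

    // A small, fixed set of restriction kinds, parameterized by data stored on each book,
    // instead of one hand-written rule per book.
    static final List<BiPredicate<Student, Book>> RESTRICTIONS = List.of(
        (s, b) -> b.requiredCourse() == null || s.enrolledCourses().contains(b.requiredCourse()),
        (s, b) -> b.requiredDepartment() == null || b.requiredDepartment().equals(s.department())
    );

    // A book may be borrowed only if every restriction is satisfied.
    static boolean mayBorrow(Student s, Book b) {
        return RESTRICTIONS.stream().allMatch(r -> r.test(s, b));
    }

    public static void main(String[] args) {
        Student alice = new Student("math", Set.of("CS101"));
        Book algorithms = new Book("b1", "CS201", null); // needs enrolment in CS201
        Book calculus = new Book("b2", null, "math");    // math department only
        System.out.println(mayBorrow(alice, algorithms)); // false
        System.out.println(mayBorrow(alice, calculus));   // true
    }
}
```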

Running a ton of books through the rule engine is going to be expensive, especially since you won't show all the results to the user, only 10-50 per page. One idea that comes to mind is to use the rule engine to build a set of query criteria. (I wouldn't actually do this; see below.)

Here's what I have in mind:

rule "Only two books for networking"
when
  // Two distinct networking books are already checked out
  Student($checkedOutBooks : checkedOutBooks)
  Book(subjects contains "networking", $book1 : id) from $checkedOutBooks
  Book(subjects contains "networking", id != $book1) from $checkedOutBooks
then
  criteria.add("subject is not 'networking'", PRIORITY.LOW);
end

rule "Books allowed for course"
when
  $course : Course($textbooks : textbooks)
  Student(enrolledCourses contains $course)
  Book($book : id) from $textbooks
then
  // criteria is a global that collects query clauses
  criteria.add("book_id = " + $book, PRIORITY.HIGH);
end
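The rules above assume a `criteria` global with an `add(String, PRIORITY)` method, whose type the answer doesn't show. Here is one possible shape for it, purely hypothetical (the `Criteria` class and `clausesByPriority` method are illustrative names): it records each clause with its priority and hands them back high-priority first, leaving it to the caller to turn them into a real query. In the DRL file it would be declared with a `global` statement.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical type for the `criteria` global used by the rules; not part of Drools.
public class Criteria {
    public enum PRIORITY { LOW, HIGH }

    private record Entry(String clause, PRIORITY priority) {}

    private final List<Entry> entries = new ArrayList<>();

    // Called from a rule's RHS, e.g. criteria.add("book_id = 42", PRIORITY.HIGH);
    public void add(String clause, PRIORITY priority) {
        entries.add(new Entry(clause, priority));
    }

    // HIGH clauses first; the sort is stable, so insertion order is kept within a priority.
    public List<String> clausesByPriority() {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(Comparator.comparing(Entry::priority).reversed());
        List<String> out = new ArrayList<>();
        for (Entry e : sorted) {
            out.add(e.clause());
        }
        return out;
    }

    public static void main(String[] args) {
        Criteria criteria = new Criteria();
        criteria.add("subject is not 'networking'", PRIORITY.LOW);
        criteria.add("book_id = 42", PRIORITY.HIGH);
        System.out.println(criteria.clausesByPriority()); // [book_id = 42, subject is not 'networking']
    }
}
```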

But I wouldn't actually do that!

Here is how I would change the problem instead. Not showing the books to the user is a poor experience: a user may want to peruse the books to decide which ones to get next time. Show the books, but disallow checkout of restricted books. That way, you only run 1-50 books through the rules at a time per user, which will be pretty zippy. The above rules would become:

global java.util.Map disallowedForCheckout;

rule "Allowed for course"
   activation-group "Only one rule is fired"
   salience 10000
when
  // This book is about to be displayed on the page, hence inserted into working memory
  $book : Book()
  $course : Course(textbooks contains $book)
  Student(enrolledCourses contains $course)
then
  // Do nothing: the book stays allowed
end

rule "Only two books for networking"
   activation-group "Only one rule is fired"
   salience 100
when
  Student($checkedOutBooks : checkedOutBooks)
  Book(subjects contains "networking", $book1 : id) from $checkedOutBooks
  Book(subjects contains "networking", id != $book1) from $checkedOutBooks

  // This book is about to be displayed on the page, hence inserted into working memory.
  $book : Book(subjects contains "networking")
then
  disallowedForCheckout.put($book, "Cannot have more than two networking books");
end

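Here I use activation-group to make sure only one rule fires, and salience to make sure the rules fire in the order I want. Note that an activation-group is not scoped to a single fact: the first activation in the group to fire cancels every other activation in that group. To get one decision per displayed book, evaluate each book in its own session (sessions are cheap, as noted below).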
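To make the salience and activation-group behaviour concrete, here is a plain-Java simulation of the second rule set; it does not use the Drools API. It assumes hypothetical `Student` and `Book` types where a book lists the courses it is a textbook for, and it evaluates each displayed book separately: among the rules whose conditions match that book, only the highest-salience one takes effect, and a refusal is recorded in a `disallowedForCheckout` map.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Plain-Java simulation (hypothetical types, not the Drools API) of the page-level rules.
public class CheckoutRules {
    record Student(Set<String> enrolledCourses, List<Book> checkedOut) {}

    record Book(String id, Set<String> subjects, Set<String> textbookFor) {}

    // refusal == null means the rule allows the book.
    record Rule(String name, int salience, BiPredicate<Student, Book> when, String refusal) {}

    static final List<Rule> RULES = List.of(
        new Rule("Allowed for course", 10000,
            (s, b) -> b.textbookFor().stream().anyMatch(s.enrolledCourses()::contains),
            null),
        new Rule("Only two books for networking", 100,
            (s, b) -> b.subjects().contains("networking")
                && s.checkedOut().stream().filter(c -> c.subjects().contains("networking")).count() >= 2,
            "Cannot have more than two networking books")
    );

    // Only the books on the current page are evaluated, one at a time.
    static Map<Book, String> disallowedForCheckout(Student s, List<Book> page) {
        Map<Book, String> disallowed = new LinkedHashMap<>();
        for (Book b : page) {
            RULES.stream()
                .filter(r -> r.when().test(s, b))                // rules whose conditions match
                .max(Comparator.comparingInt(Rule::salience))   // only the highest salience fires
                .filter(r -> r.refusal() != null)
                .ifPresent(r -> disallowed.put(b, r.refusal()));
        }
        return disallowed;
    }

    public static void main(String[] args) {
        Book net1 = new Book("n1", Set.of("networking"), Set.of());
        Book net2 = new Book("n2", Set.of("networking"), Set.of());
        Book net3 = new Book("n3", Set.of("networking"), Set.of());
        Book netText = new Book("n4", Set.of("networking"), Set.of("CS340"));
        Student bob = new Student(Set.of("CS340"), List.of(net1, net2));
        disallowedForCheckout(bob, List.of(net3, netText))
            .forEach((book, reason) -> System.out.println(book.id() + ": " + reason));
        // n3: Cannot have more than two networking books
    }
}
```

The course textbook `n4` stays available even though Bob already has two networking books, because "Allowed for course" has the higher salience and wins for that book.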

Finally, keep the rules cached. Drools allows, and recommends, loading the rules only once into a knowledge base and then creating sessions from it. Knowledge bases are expensive; sessions are cheap.
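The build-once, reuse-many pattern can be sketched in plain Java. The classes below are hypothetical stand-ins, not Drools types: `KnowledgeBaseStandIn` plays the role of the expensive compiled rule base (a `KnowledgeBase` in Drools 5) and `SessionStandIn` the cheap per-request session. The initialization-on-demand holder idiom guarantees the base is built once, lazily and thread-safely, no matter how many sessions are requested.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins illustrating "build the knowledge base once, create sessions per request".
public class RuleBaseCache {
    // Counts how many times the expensive object is built.
    static final AtomicInteger BUILDS = new AtomicInteger();

    static final class KnowledgeBaseStandIn {
        KnowledgeBaseStandIn() {
            BUILDS.incrementAndGet(); // imagine parsing and compiling the DRL here
        }

        SessionStandIn newSession() {
            return new SessionStandIn(this);
        }
    }

    static final class SessionStandIn {
        final KnowledgeBaseStandIn base;

        SessionStandIn(KnowledgeBaseStandIn base) {
            this.base = base;
        }
    }

    // The JVM initializes Holder (and so builds INSTANCE) exactly once, on first use.
    private static final class Holder {
        static final KnowledgeBaseStandIn INSTANCE = new KnowledgeBaseStandIn();
    }

    // Called once per search/browse request.
    static SessionStandIn sessionForRequest() {
        return Holder.INSTANCE.newSession();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            sessionForRequest();
        }
        System.out.println(BUILDS.get()); // 1
    }
}
```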