Java: what's the best data structure to search objects by keywords?

oat · Jun 25, 2014 · Viewed 8.3k times

Suppose I have a "journal article" class with fields such as year, author(s), title, journal name, keyword(s), etc.

Fields such as authors and keywords might be declared as String[] authors and String[] keywords.

What's the best data structure for searching a collection of "journal article" objects by one or several keywords, by one or several author names, or by part of the title?

Thanks!
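As a baseline before adding any index, here is a minimal sketch of the class described above (using the String[] fields mentioned in the question) plus a linear scan that answers all three kinds of query. The names JournalArticle, LinearSearch, byAnyKeyword, byAuthor and byTitleFragment are hypothetical, and "several keywords" is interpreted here as "any of them". Every query checks every article, so each one costs O(n) in the number of articles; a map from keyword to articles trades memory for faster exact-match keyword lookups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical model of the "journal article" class described in the question.
class JournalArticle {
    final int year;
    final String title;
    final String journalName;
    final String[] authors;
    final String[] keywords;

    JournalArticle(int year, String title, String journalName, String[] authors, String[] keywords) {
        this.year = year;
        this.title = title;
        this.journalName = journalName;
        this.authors = authors;
        this.keywords = keywords;
    }
}

// Hypothetical helper: answers each query by checking every article (O(n) per query).
class LinearSearch {
    // Articles that have at least one of the given keywords (exact, case-sensitive match).
    static List<JournalArticle> byAnyKeyword(List<JournalArticle> articles, String... wanted) {
        List<String> wantedList = Arrays.asList(wanted);
        List<JournalArticle> result = new ArrayList<>();
        for (JournalArticle a : articles) {
            for (String k : a.keywords) {
                if (wantedList.contains(k)) {
                    result.add(a);
                    break; // add each article at most once
                }
            }
        }
        return result;
    }

    // Articles that list the given author (exact match).
    static List<JournalArticle> byAuthor(List<JournalArticle> articles, String author) {
        List<JournalArticle> result = new ArrayList<>();
        for (JournalArticle a : articles) {
            if (Arrays.asList(a.authors).contains(author)) {
                result.add(a);
            }
        }
        return result;
    }

    // Articles whose title contains the given fragment, ignoring case.
    static List<JournalArticle> byTitleFragment(List<JournalArticle> articles, String fragment) {
        String needle = fragment.toLowerCase(Locale.ROOT);
        List<JournalArticle> result = new ArrayList<>();
        for (JournalArticle a : articles) {
            if (a.title.toLowerCase(Locale.ROOT).contains(needle)) {
                result.add(a);
            }
        }
        return result;
    }
}
```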

==========================================================================
With everyone's help, I wrote the following test code in the Processing environment. Any advice is greatly appreciated! Thanks!

ArrayList<Paper> papers = new ArrayList<Paper>();

HashMap<String, ArrayList<Paper>> hm = new HashMap<String, ArrayList<Paper>>();

void setup(){
  Paper paperA = new Paper();
  paperA.title = "paperA";
  paperA.keywords.append("cat");
  paperA.keywords.append("dog");
  paperA.keywords.append("egg");
  //println(paperA.keywords);
  papers.add(paperA);

  Paper paperC = new Paper();
  paperC.title = "paperC";
  paperC.keywords.append("egg");
  paperC.keywords.append("cat");
  //println(paperC.keywords);
  papers.add(paperC);

  Paper paperB = new Paper();
  paperB.title = "paperB";
  paperB.keywords.append("dog");
  paperB.keywords.append("egg");
  //println(paperB.keywords); 
  papers.add(paperB);

  for (Paper p : papers) {
    // get a list of keywords for the current paper
    StringList keywords = p.keywords;

    // go through each keyword of the current paper
    for (int i=0; i<keywords.size(); i++) {
      String keyword = keywords.get(i);

      // get the paper list already stored under this keyword (null if there is none yet);
      // named "list" so it does not shadow the global "papers" list
      ArrayList<Paper> list = hm.get(keyword);
      if (list == null) {
        // first paper with this keyword: create a new list and put it in the hashmap
        list = new ArrayList<Paper>();
        hm.put(keyword, list);
      }
      // the hashmap holds a reference to the list, so no second put() is needed
      list.add(p);
    }
  }

  // hm.get() returns null for an unknown keyword, so check before iterating
  ArrayList<Paper> paperList = hm.get("egg");
  if (paperList != null) {
    for (Paper p : paperList) {
      println(p.title);
    }
  }
}

void draw(){}

class Paper 
{
  //===== variables =====
  int ID;
  int year;
  String title;
  StringList authors  = new StringList();
  StringList keywords = new StringList();
  String DOI;
  String typeOfRef;
  String nameOfSource;
  String abs; // abstract


  //===== constructor =====

  //===== update =====

  //===== display =====
}

Answer

TonyGW · Jun 25, 2014

Use a HashMap<String, JournalArticle> data structure.

For example:

Map<String, JournalArticle> journals = new HashMap<String, JournalArticle>();
journals.put("keyword1", testJA);

if (journals.containsKey("keyword1"))
{
    return journals.get("keyword1");
}

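You can use your keywords as the String keys of this map. However, this only supports exact-match search: the term you search for must be identical to a keyword stored as a key in the HashMap. Also note that put() replaces any value already stored under a key, so if several articles share a keyword, the map's values should be lists of articles (as with the HashMap<String, ArrayList<Paper>> in the question's test code).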
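A minimal sketch of such a keyword index, assuming several articles can share a keyword so each key maps to a list of articles. KeywordIndex, add, find and findAll are hypothetical names, and the type parameter T stands in for JournalArticle. Lookups stay exact-match; findAll interprets "several keywords" as AND and intersects the per-keyword lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical inverted index: each keyword maps to every item tagged with it.
// T stands in for the article type (e.g. JournalArticle); lookups are exact-match.
class KeywordIndex<T> {
    private final Map<String, List<T>> index = new HashMap<>();

    // Register an item under each of its keywords.
    void add(T item, String... keywords) {
        for (String keyword : keywords) {
            // create the list the first time a keyword is seen, then append the item
            index.computeIfAbsent(keyword, k -> new ArrayList<>()).add(item);
        }
    }

    // Items tagged with exactly this keyword, in insertion order (empty if none).
    List<T> find(String keyword) {
        List<T> items = index.get(keyword);
        return items == null ? Collections.<T>emptyList() : Collections.unmodifiableList(items);
    }

    // Items tagged with ALL of the given keywords: intersect the per-keyword lists.
    Set<T> findAll(String... keywords) {
        Set<T> result = null;
        for (String keyword : keywords) {
            List<T> matches = find(keyword);
            if (result == null) {
                result = new LinkedHashSet<>(matches); // keeps the first list's order
            } else {
                result.retainAll(new HashSet<>(matches)); // HashSet makes contains() O(1)
            }
        }
        return result == null ? Collections.<T>emptySet() : result;
    }
}
```

With the three papers from the question's test code added as strings, find("egg") returns paperA, paperC and paperB in that order, and findAll("cat", "egg") returns paperA and paperC. Each lookup is a single hash access, so the cost of a query depends on the size of the matching lists rather than on the total number of articles.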

If you need a "LIKE"-style (partial-match) search, I suggest storing your objects in a database that supports SQL LIKE queries.

Edit: on second thought, I think you can do a kind of "LIKE" query (similar to the LIKE clause in SQL), but it will not be very efficient, because every query iterates through all the keys in the HashMap. If you know regular expressions, you can do all kinds of queries by modifying the following example code (e.g. key.matches(pattern)):

    List<JournalArticle> results = new ArrayList<JournalArticle>(); // must not be null, or add() throws a NullPointerException

    for (String key : journals.keySet())
    {
        if (key.contains("keyword"))  /* "keyword" only has to be part of the key stored in the HashMap; it no longer has to be an exact match */
            results.add(journals.get(key));
    }

    return results;
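A self-contained sketch of that scan with the substring test swapped for a regular expression, as suggested above. It assumes a keyword-to-list map (Map<String, List<T>>) rather than one article per key; PatternKeySearch and search are hypothetical names. The pattern is compiled once per query, every key is still visited (O(number of keys)), and results go into a LinkedHashSet so an article that matches several keys is returned only once.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: "LIKE"-style search by scanning every key of a keyword map.
class PatternKeySearch {
    // Returns every item stored under a key that fully matches the regex, without duplicates.
    static <T> Set<T> search(Map<String, List<T>> index, String regex) {
        Pattern pattern = Pattern.compile(regex); // compile once, not once per key
        Set<T> results = new LinkedHashSet<>();
        for (Map.Entry<String, List<T>> entry : index.entrySet()) {
            // matches() requires the WHOLE key to match, like String.matches()
            if (pattern.matcher(entry.getKey()).matches()) {
                results.addAll(entry.getValue());
            }
        }
        return results;
    }
}
```

Roughly, SQL's LIKE 'cat%' corresponds to the regex "cat.*" here. If the search text comes from a user, Pattern.quote(text) + ".*" treats it literally instead of as regex syntax. Because HashMap does not guarantee key order, the order of the returned articles is unspecified.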