I am working with a TreeMap of Strings, TreeMap<String, String>, and using it to implement a dictionary of words.
I then have a collection of files, and would like to create a representation of each file in the vector space (space of words) defined by the dictionary.
Each file should have a vector representing it with the following properties:
So my idea is to use a Vector<Boolean> to implement these vectors. (This way of representing documents in a collection is called the Boolean Model: http://www.site.uottawa.ca/~diana/csi4107/L3.pdf)
The problem I am facing in the procedure to create this vector is that I need a way to find the position of a word in the dictionary, something like this:
String key;
int i = get_position_of_key_in_Treemap(key); // purely invented method
1) Is there any method like this that I can use on a TreeMap? If not, could you provide some code to help me implement it myself?
2) Is there an iterator on TreeMap (it is alphabetically ordered on keys) from which I can get the position?
3) Should I use another class to implement the dictionary (if you think that I can't do what I need with a TreeMap)? If yes, which one?
Thanks in advance.
ADDED PART:
The solution proposed by dasblinkenlight looks fine, but it has a complexity problem: copying the keys into an array is linear in the size of the dictionary, and doing that for each file is not acceptable.
Any other ideas for my questions?
Once you have constructed your tree map, copy its sorted keys into an array, and use Arrays.binarySearch to look up the index in O(log N) time. If you need the value, do a lookup on the original map too.
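As a rough illustration of the lookup step, here is a minimal sketch. It assumes the array is already sorted in the natural String order (the same order a TreeMap without a custom comparator uses). The class name BinarySearchDemo and the helper positionOf are made up for this example. Arrays.binarySearch returns a negative number when the key is absent, so the helper maps that case to -1.

```java
import java.util.Arrays;

class BinarySearchDemo {
    // Hypothetical helper: index of word in sortedKeys, or -1 if it is absent.
    // sortedKeys must be sorted in natural String order.
    static int positionOf(String[] sortedKeys, String word) {
        int idx = Arrays.binarySearch(sortedKeys, word);
        // A negative result encodes -(insertionPoint) - 1, meaning "not found".
        return idx >= 0 ? idx : -1;
    }

    public static void main(String[] args) {
        String[] keys = {"apple", "banana", "cherry"};
        System.out.println(positionOf(keys, "banana"));    // prints 1
        System.out.println(positionOf(keys, "blueberry")); // prints -1
    }
}
```

If the map was built with a custom Comparator, the same comparator would have to be passed to the Arrays.binarySearch overload that accepts one, otherwise the search result is undefined.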
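Edit: this is how you copy the keys into an array:
// Do this once, after the map is fully built, and reuse the array for every lookup.
String[] mapKeys = new String[treeMap.size()];
int pos = 0;
// keySet() iterates in ascending key order, so mapKeys ends up sorted.
for (String key : treeMap.keySet()) {
    mapKeys[pos++] = key;
}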
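To connect this to the Boolean-model vectors from the question, here is a sketch of how the copied array might be reused across files. The keys are copied once (linear in the dictionary size), and each word of each file then costs one O(log N) binary search. The names BooleanModel, sortedKeysOf and toVector are hypothetical, and the sketch uses a plain boolean[] rather than the Vector<Boolean> mentioned in the question. It also assumes the TreeMap uses natural String ordering.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

class BooleanModel {
    // Copy the dictionary keys once; keySet() of a TreeMap is already sorted.
    static String[] sortedKeysOf(TreeMap<String, String> dictionary) {
        return dictionary.keySet().toArray(new String[0]);
    }

    // Build one vector per file: vector[i] is true when the word at
    // dictionary position i occurs in the file. Unknown words are ignored.
    static boolean[] toVector(String[] sortedKeys, Collection<String> fileWords) {
        boolean[] vector = new boolean[sortedKeys.length];
        for (String word : fileWords) {
            int idx = Arrays.binarySearch(sortedKeys, word);
            if (idx >= 0) {
                vector[idx] = true;
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        TreeMap<String, String> dictionary = new TreeMap<>();
        dictionary.put("cherry", "a fruit");
        dictionary.put("apple", "a fruit");
        dictionary.put("banana", "a fruit");

        String[] keys = sortedKeysOf(dictionary); // done once for all files
        boolean[] v = toVector(keys, List.of("cherry", "apple", "zebra"));
        System.out.println(Arrays.toString(v)); // prints [true, false, true]
    }
}
```

The array is only valid as long as the dictionary does not change; if words are added later, it would need to be rebuilt.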