I have implemented a method that simply loops over a set of CSV files containing data on a number of different modules, and adds each 'moduleName' to a HashSet (code shown below).
I used a HashSet because it guarantees no duplicates are inserted, whereas an ArrayList would have to call contains() and iterate through the list to check whether the element is already present (a sketch of that alternative appears after the code below).
I believe the HashSet performs better than an ArrayList here. Am I correct in stating that?
Also, can somebody explain to me:
What is the complexity in big-O notation?
HashSet<String> modulesUploaded = new HashSet<String>();

for (File f : marksheetFiles) {
    try {
        csvFileReader = new CSVFileReader(f);
        csvReader = csvFileReader.readFile();
        csvReader.readHeaders();

        while (csvReader.readRecord()) {
            String moduleName = csvReader.get("Module");

            // HashSet.add() silently ignores duplicates, so no explicit
            // contains() check is needed here.
            if (!moduleName.isEmpty()) {
                modulesUploaded.add(moduleName);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Close the reader even if reading threw, and guard against a failed open.
        if (csvReader != null) {
            csvReader.close();
        }
    }
}

return modulesUploaded;
}
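For comparison, the ArrayList-based approach I described would look roughly like the sketch below. This is only illustrative: the method name collectWithArrayList and the allModuleNames parameter are placeholders for the values read from the CSV files, not part of my actual code.

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Sketch only: the same duplicate filtering done with an ArrayList instead of a HashSet.
// Each contains() call scans the list, so collecting n names costs O(n^2) overall.
static List<String> collectWithArrayList(Collection<String> allModuleNames) {
    List<String> modulesUploaded = new ArrayList<String>();
    for (String moduleName : allModuleNames) {
        if (!moduleName.isEmpty() && !modulesUploaded.contains(moduleName)) {
            modulesUploaded.add(moduleName);
        }
    }
    return modulesUploaded;
}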
My experiment shows that a HashSet is faster than an ArrayList starting at collections of 3 elements (inclusive). The complete results table:
| Speed-up (HashSet vs. ArrayList) | Collection size |
| --- | --- |
| 2x | 3 elements |
| 3x | 10 elements |
| 6x | 50 elements |
| 12x | 200 elements |
| 532x | 10,000 elements |

The jump from 12x at 200 elements to 532x at 10,000 elements (roughly the same ratio as the growth in collection size, 532/12 vs. 10,000/200) shows the linear lookup growth of the ArrayList.
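A naive timing loop along the lines below can reproduce this kind of measurement. The class name, collection size, and use of System.nanoTime() are my own choices here, not the exact harness behind the table above; for reliable numbers, warm the JVM up first or use a proper benchmarking tool such as JMH.

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LookupBenchmark {
    public static void main(String[] args) {
        int size = 10000;                       // collection size under test
        List<String> list = new ArrayList<String>();
        Set<String> set = new HashSet<String>();
        for (int i = 0; i < size; i++) {
            list.add("module" + i);
            set.add("module" + i);
        }

        // contains() on the ArrayList is O(n); on the HashSet it is O(1) on average.
        int hits = 0;
        long start = System.nanoTime();
        for (int i = 0; i < size; i++) {
            if (list.contains("module" + i)) hits++;
        }
        long listNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < size; i++) {
            if (set.contains("module" + i)) hits++;
        }
        long setNanos = System.nanoTime() - start;

        System.out.println("ArrayList: " + listNanos + " ns, HashSet: "
                + setNanos + " ns (hits=" + hits + ")");
    }
}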