I am trying to search a large text file (400MB) for a particular String using the following:
File file = new File("fileName.txt");
try {
    int count = 0;
    Scanner scanner = new Scanner(file);
    while (scanner.hasNextLine()) {
        if (scanner.nextLine().contains("particularString")) {
            count++;
            System.out.println("Number of instances of String: " + count);
        }
    }
} catch (FileNotFoundException e) {
    System.out.println(e);
}
This works fine for small files; however, for this particular file and other large ones it takes far too long (over 10 minutes).
What would be the quickest, most efficient way of doing this?
I have now changed to the following, and it completes within seconds:
try {
    int count = 0;
    FileReader fileIn = new FileReader(file);
    BufferedReader reader = new BufferedReader(fileIn);
    String line;
    while ((line = reader.readLine()) != null) {
        if (line.contains("particularString")) {
            count++;
            System.out.println("Number of instances of String " + count);
        }
    }
} catch (IOException e) {
    System.out.println(e);
}
First, figure out how long it takes you to actually read the entire file's contents versus how long it takes to scan them for your pattern.
If your results are dominated by the read time (and assuming you read the file properly, with channels or at the very least buffered readers), there's not much you can do.
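One way to get those two numbers is to time a pass that only reads lines against a pass that reads and scans them. A rough sketch, with illustrative method names (timings in milliseconds; a warm OS file cache will skew repeated runs):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadVsScan {
    // Time a pass that only reads lines and discards them.
    static long timeReadOnlyMillis(String fileName) throws IOException {
        long start = System.nanoTime();
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            while (reader.readLine() != null) { /* read and discard */ }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Time a pass that reads lines and scans each one for the pattern.
    static long timeReadAndScanMillis(String fileName, String needle) throws IOException {
        long start = System.nanoTime();
        int count = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) count++;
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

If the two timings are close, reading dominates; the gap between them is roughly the scanning cost.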
If it's the scanning time that dominates, you could read all the lines and then ship small batches of them into a work queue, where multiple threads pick up line batches and search them.
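That producer/consumer split can be sketched as follows, assuming the pattern search really is the bottleneck; the batch size and the thread-pool sizing are arbitrary choices, and the names are illustrative:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ParallelSearch {
    private static final int BATCH_SIZE = 1000; // arbitrary batch size

    static long countParallel(String fileName, String needle) throws Exception {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        AtomicLong count = new AtomicLong();
        List<Future<?>> jobs = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
            List<String> batch = new ArrayList<>(BATCH_SIZE);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == BATCH_SIZE) {
                    // Hand the full batch to a worker and start a fresh one,
                    // so searching overlaps with the ongoing disk read.
                    List<String> job = batch;
                    batch = new ArrayList<>(BATCH_SIZE);
                    jobs.add(pool.submit(() -> count.addAndGet(
                            job.stream().filter(l -> l.contains(needle)).count())));
                }
            }
            final List<String> lastJob = batch; // last, possibly partial batch
            jobs.add(pool.submit(() -> count.addAndGet(
                    lastJob.stream().filter(l -> l.contains(needle)).count())));
        }
        for (Future<?> f : jobs) f.get(); // wait for all search jobs to finish
        pool.shutdown();
        return count.get();
    }
}
```

Submitting each batch as soon as it is read is what lets the search run concurrently with the disk I/O, rather than after it.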
Ballpark figures: given those two estimates (read time and scan time), I think a proper implementation can easily land you a run time on the order of 10 seconds (if you start kicking off search jobs as you read line batches), and it will be largely dominated by your disk read time.