I need to be able to recognise date strings. It doesn't matter if I cannot distinguish between month and day (e.g. 12/12/10); I just need to classify the string as a date, rather than convert it to a Date object. So this is really a classification problem rather than a parsing one.
I will have pieces of text such as:
"bla bla bla bla 12 Jan 09 bla bla bla 01/04/10 bla bla bla"
and I need to be able to recognise the start and end boundaries of each date string within it.
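For the boundary part of the question, a regular expression already gives start and end offsets through `Matcher.start()` and `Matcher.end()`. Below is a minimal sketch, assuming only two format families: numeric day/month/year with `/`, `.` or `-` separators, and day + English month abbreviation + year. The class name `DateSpanFinder` and the pattern itself are hypothetical and would need extending for real data; this is the "one rule per format" approach, not a trained model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DateSpanFinder {
    // Hypothetical pattern covering only two families of formats:
    //   numeric dates such as 01/04/10 or 12-12-2010, and
    //   day + English month abbreviation + year, such as 12 Jan 09.
    private static final Pattern DATE = Pattern.compile(
            "\\b(?:\\d{1,2}[/.-]\\d{1,2}[/.-](?:\\d{4}|\\d{2})"
            + "|\\d{1,2} (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) (?:\\d{4}|\\d{2}))\\b");

    /** Returns {start, end} pairs (end exclusive) for every candidate date in the text. */
    public static List<int[]> findDates(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            spans.add(new int[] {m.start(), m.end()});
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "bla bla bla bla 12 Jan 09 bla bla bla 01/04/10 bla bla bla";
        for (int[] s : findDates(text)) {
            // prints "16-25: 12 Jan 09" then "38-46: 01/04/10"
            System.out.println(s[0] + "-" + s[1] + ": " + text.substring(s[0], s[1]));
        }
    }
}
```

Every new format needs another alternative in the pattern, which is exactly the maintenance burden the question hopes to avoid, but it does make the boundaries trivial to obtain.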
I was wondering if anyone knows of any Java libraries that can do this. My google-fu hasn't turned up anything so far.
UPDATE: I need to recognise the widest possible set of ways of representing dates. The naive solution would be to write an if statement for every conceivable format, but a pattern-recognition approach with a trained model is ideally what I'm after.
You can loop over all the date formats Java provides for its available locales:
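import java.text.DateFormat;
import java.text.ParsePosition;
import java.util.Locale;

public class DateCheck {
    static boolean isDate(String dateString) {
        for (Locale locale : DateFormat.getAvailableLocales()) {
            for (int style = DateFormat.FULL; style <= DateFormat.SHORT; style++) {
                DateFormat df = DateFormat.getDateInstance(style, locale);
                df.setLenient(false); // reject impossible dates such as 32/13/10
                ParsePosition pos = new ParsePosition(0);
                // unlike parse(String), this overload returns null instead of throwing,
                // and pos shows whether the whole string was consumed
                if (df.parse(dateString, pos) != null && pos.getIndex() == dateString.length()) {
                    return true; // or return the parsed Date object instead
                }
            }
        }
        return false; // unparseable in every locale and style
    }
}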
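This, however, won't account for custom date formats that none of the built-in locale styles cover, and it tests a whole string rather than finding dates inside longer text.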
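Custom formats can be covered by adding your own `SimpleDateFormat` patterns, and the same `parse(String, ParsePosition)` overload can locate dates inside running text: start a parse at each token, and the index the `ParsePosition` ends on is the end boundary. A minimal sketch, assuming English month names; the class `CustomDateScanner` and the `PATTERNS` list are hypothetical and meant to be adapted to your data:

```java
import java.text.DateFormat;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CustomDateScanner {
    // Hypothetical extra patterns; extend with whatever formats your text uses.
    private static final String[] PATTERNS = {
        "dd MMM yyyy", "dd MMM yy", "dd/MM/yyyy", "dd/MM/yy", "yyyy-MM-dd"
    };

    private static List<DateFormat> buildFormats() {
        List<DateFormat> formats = new ArrayList<>();
        for (String pattern : PATTERNS) {
            DateFormat df = new SimpleDateFormat(pattern, Locale.ENGLISH);
            df.setLenient(false); // reject impossible values such as 32/13/10
            formats.add(df);
        }
        // Formats from DateFormat.getDateInstance(style, locale) could be added here as well.
        return formats;
    }

    private static boolean isWordChar(char c) {
        return Character.isLetterOrDigit(c);
    }

    /** Returns {start, end} pairs (end exclusive) for each date found in the text. */
    public static List<int[]> findDates(String text) {
        List<DateFormat> formats = buildFormats();
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int bestEnd = -1;
            // Only attempt a parse where a token starts.
            boolean tokenStart = isWordChar(text.charAt(i))
                    && (i == 0 || !isWordChar(text.charAt(i - 1)));
            if (tokenStart) {
                for (DateFormat df : formats) {
                    ParsePosition pos = new ParsePosition(i);
                    // Returns null on failure; on success pos holds the index just past the date.
                    if (df.parse(text, pos) != null) {
                        int end = pos.getIndex();
                        boolean endsToken = end == text.length() || !isWordChar(text.charAt(end));
                        if (endsToken && end > bestEnd) {
                            bestEnd = end; // keep the longest match among all formats
                        }
                    }
                }
            }
            if (bestEnd > i) {
                spans.add(new int[] {i, bestEnd});
                i = bestEnd; // resume after the date so its digits are not matched again
            } else {
                i++;
            }
        }
        return spans;
    }
}
```

Every format is tried at every token, so the cost grows with the number of patterns; adding all the built-in locale formats (hundreds of them) works but is noticeably slower on long texts.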