Recognise an arbitrary date string

Joel · Oct 3, 2010 · Viewed 11.4k times

I need to be able to recognise date strings. It doesn't matter if I cannot distinguish between month and day (e.g. 12/12/10); I just need to classify the string as being a date, rather than convert it to a Date object. So this is really a classification problem rather than a parsing problem.

I will have pieces of text such as:

"bla bla bla bla 12 Jan 09 bla bla bla 01/04/10 bla bla bla"

and I need to be able to recognise the start and end boundary for each date string within.

I was wondering if anyone knew of any Java libraries that can do this. My Google-fu hasn't come up with anything so far.

UPDATE: I need to be able to recognise the widest possible set of ways of representing dates. Of course, the naive solution might be to write an if statement for every conceivable format, but a pattern recognition approach, with a trained model, is ideally what I'm after.

Answer

Bozho · Oct 19, 2010

You can loop over all the built-in date formats in Java, for every available locale:

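import java.text.DateFormat;
import java.text.ParseException;
import java.util.Locale;

static boolean isDate(String dateString) {
    for (Locale locale : DateFormat.getAvailableLocales()) {
        // The style constants run from FULL (0) to SHORT (3)
        for (int style = DateFormat.FULL; style <= DateFormat.SHORT; style++) {
            DateFormat df = DateFormat.getDateInstance(style, locale);
            try {
                df.parse(dateString);
                return true; // or return the parsed Date object instead
            } catch (ParseException ex) {
                // unparseable with this format, try the next one
            }
        }
    }
    return false; // no built-in format matched
}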

This, however, won't account for any custom date formats.
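To cover custom formats and also get the start and end boundaries the question asks for, one option is to try a list of explicit patterns at each candidate position using SimpleDateFormat.parse(String, ParsePosition), which reports where parsing stopped. The following is only a rough sketch under some assumptions: the class name DateSpanFinder, the method findDates and the three entries in PATTERNS are made up for illustration, and month names are assumed to be English. It is still a pattern list, not the trained model mentioned in the update.

```java
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class DateSpanFinder {
    // Hypothetical pattern list for illustration; extend it with the formats you expect.
    private static final String[] PATTERNS = { "dd MMM yy", "dd/MM/yy", "yyyy-MM-dd" };

    /** Returns {start, end} offsets (end exclusive) of each date-like span in text. */
    static List<int[]> findDates(String text) {
        List<SimpleDateFormat> formats = new ArrayList<>();
        for (String pattern : PATTERNS) {
            SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ENGLISH);
            format.setLenient(false); // reject impossible dates such as 31/02/10
            formats.add(format);
        }

        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = -1;
            // All patterns above start with a number, so only try to parse
            // at a digit that begins a new token.
            boolean tokenStart = i == 0 || !Character.isLetterOrDigit(text.charAt(i - 1));
            if (tokenStart && Character.isDigit(text.charAt(i))) {
                for (SimpleDateFormat format : formats) {
                    ParsePosition pos = new ParsePosition(i);
                    if (format.parse(text, pos) != null) {
                        // Keep the longest match among the patterns
                        end = Math.max(end, pos.getIndex());
                    }
                }
            }
            if (end > i) {
                spans.add(new int[] { i, end });
                i = end; // continue scanning after the matched date
            } else {
                i++;
            }
        }
        return spans;
    }
}
```

For the sample text in the question, calling text.substring(span[0], span[1]) on each returned span should give back "12 Jan 09" and "01/04/10". The digit check is only a shortcut that fits these particular patterns; formats that begin with a month or weekday name would need a different way of picking candidate positions.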