Regular expression to extract SQL query

Velth picture Velth · May 21, 2013 · Viewed 16.1k times · Source

Is there a regex which extracts SQL queries from a string? I'm NOT interested to validate any SQL syntax, rather and only extracting a selection of SQL commands. This to parse a given SQL file/string in a flexible manner.

Given is the following SQL file/string example:

SELECT
    *
FROM
    test_table
WHERE
    test_row = 'Testing ; semicolon';

SELECT * FROM another_test_table;

INSERT INTO 
    table_name 
VALUES 
    (value1,'value which contains semicolon ;;;;',value3,...);

Some pseudocode example would be: ^(UPDATE|SELECT|INSERT INTO)(.*)(;)$. In the future i'm looking to extend this with all (possible) commands.

  • Look for a starting match with either: (UPDATE|SELECT|INSERT|INTO)
  • Zero or more any character (including whitespaces and newlines)
  • Stop at ;, which delimits the SQL query.

Whenever this would be possible via a regex the following java code is able to extract all SQL commands:

final String regex = "LOOKING_FOR_THIS_ONE";
final Pattern p = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = p.matcher(content);

while (matcher.find()) {
  // matcher.group() now contains the full SQL command
}

Thanks in advance!

Answer

melwil picture melwil · May 21, 2013

I'll start off by saying that this is not a good way of doing it, and strongly urge you to find another method of doing it, preferrably tagging it properly where the statements are made, so you don't end up in this situation.

That being said, SQL requires it to start with one of the following; DELETE, SELECT, WITH, UPDATE or INSERT INTO. It also requires that the input ends with ;.

We can use this to grab all sequences matching SQL with the following:

final String regex = "^(INSERT INTO|UPDATE|SELECT|WITH|DELETE)(?:[^;']|(?:'[^']+'))+;\\s*$";
final Pattern p = Pattern.compile(regex, Pattern.MULTILINE | Pattern.DOTALL);

Group 1 now holds the operating word, in case you wish to filter valid SQL on UPDATE or SELECT.

See the regex in action, as well as a cave-at here:

https://regex101.com/r/dt9XTK/2