SQL Read Where IN (Long List from .TXT file)

user206168 · Sep 27, 2017 · Viewed 9.8k times

I have a long list of 5000+ IDs (numbers).

ID
4
5
6
9
10
14
62
63
655
656
657
658
659
661
662

Is there a way to read the IDs from the txt file instead of typing all 5000 into the query?

Example:

SELECT count(*) from table where ID in (file1.txt)

Answer

zedfoxus · Sep 28, 2017

You have a few options, one of which I recommend.

Option 1

Create a table in your database like so:

create table ID_Comparer (
    ID int primary key
);

With a programming language of your choice, empty the table and then load into it the 5000+ IDs that you want to query.

Then, write one of these queries to extract the data you want:

select *
from main_table m
where exists (
    select 1 from ID_Comparer where ID = m.ID
)

or

select *
from main_table m
inner join ID_Comparer c on m.ID = c.ID

Since ID_Comparer's ID is a primary key and (assuming that) main_table's ID is indexed/keyed, matching should be relatively fast.
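As a rough sketch of Option 1, the load-and-query step could look like this in Python. It uses the standard-library sqlite3 module only so that the example is self-contained; your database driver, connection setup, and placeholder style will differ. The function names read_ids and count_matches are made up for illustration, and the file layout (one ID per line, optional "ID" header) is assumed from the question.

```python
import sqlite3
import tempfile
from pathlib import Path


def read_ids(path):
    """Return the integer IDs in a text file, one per line.

    Non-numeric lines (such as an 'ID' header or blank lines) are skipped.
    """
    ids = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.isdecimal():
            ids.append(int(line))
    return ids


def count_matches(conn, ids):
    """Empty ID_Comparer, reload it with ids, and count matching main_table rows."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("delete from ID_Comparer")
        conn.executemany(
            "insert into ID_Comparer (ID) values (?)",
            [(i,) for i in set(ids)],  # set() avoids primary-key clashes on repeats
        )
    row = conn.execute(
        "select count(*) from main_table m "
        "inner join ID_Comparer c on m.ID = c.ID"
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    # Hypothetical demo data: an in-memory database stands in for the real one.
    conn = sqlite3.connect(":memory:")
    conn.execute("create table main_table (ID int primary key)")
    conn.execute("create table ID_Comparer (ID int primary key)")
    conn.executemany("insert into main_table values (?)", [(i,) for i in range(1, 11)])
    with tempfile.TemporaryDirectory() as folder:
        path = Path(folder) / "file1.txt"
        path.write_text("ID\n4\n5\n6\n9\n10\n14\n")
        print(count_matches(conn, read_ids(path)))
```

Many databases also ship a bulk loader for text files, which may be faster than row inserts for large lists; check what your engine offers.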

Option 1 modified

This option is just like the one above but helps with concurrency. If application 1 wants to compare 2000 IDs while application 2 wants to compare 5000 IDs against your main table at the same time, neither should delete the other's data from the comparer table. So, change the table a bit.

create table ID_Comparer (
    ID int,
    token char(32), -- index this
    entered date default current_date(), -- use the syntax of your DB
    primary key (ID, token) -- lets two processes load the same ID under different tokens
);

Then, use your favorite programming language to create a GUID. Load all the IDs, each paired with that same GUID, into the table like so:

1, 7089e5eced2f408eac8b390d2e891df5
2, 7089e5eced2f408eac8b390d2e891df5
...

Another process doing the same thing will load its own IDs with its own GUID:

2412, 96d9d6aa6b8d49ada44af5a99e6edf56
9434, 96d9d6aa6b8d49ada44af5a99e6edf56
...

Now, your select:

select *
from main_table m
where exists (
    select 1 from ID_Comparer where ID = m.ID and token = '<your guid>'
)

OR

select *
from main_table m
inner join ID_Comparer c on m.ID = c.ID and token = '<your guid>'

After you receive your data, be sure to run delete from ID_Comparer where token = '<your guid>' as cleanup.

You could create a nightly task to remove all data that's more than 2 days old or some such for additional housekeeping.

Since ID_Comparer and (assuming that) main_table's ID are indexed/keyed, matching should be relatively fast even with the GUID as an additional keyed lookup.
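The token workflow above (generate a GUID, load, select, clean up) might be sketched as follows. This is only an illustration using Python's standard sqlite3 and uuid modules; query_with_token is a hypothetical helper name, and it assumes ID_Comparer accepts the same ID under different tokens (for example, a key on ID plus token rather than on ID alone).

```python
import sqlite3
import uuid


def query_with_token(conn, ids):
    """Load ids under a fresh token, fetch matching rows, then delete the token's rows."""
    token = uuid.uuid4().hex  # 32 hex characters, so it fits char(32)
    try:
        with conn:  # commit the load as one transaction
            conn.executemany(
                "insert into ID_Comparer (ID, token) values (?, ?)",
                [(i, token) for i in set(ids)],
            )
        rows = conn.execute(
            "select m.* from main_table m "
            "where exists ("
            "    select 1 from ID_Comparer c "
            "    where c.ID = m.ID and c.token = ?"
            ")",
            (token,),
        ).fetchall()
    finally:
        # Cleanup runs even if the select fails; other tokens are left alone.
        with conn:
            conn.execute("delete from ID_Comparer where token = ?", (token,))
    return rows
```

Passing the token as a bound parameter rather than pasting it into the SQL string keeps the query text identical across runs. The "?" placeholder is sqlite3's style; other drivers use different markers.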

Option 2

Instead of creating a table, you could create a large SQL query like so:

select * from main_table where id = <first id>
union select * from main_table where id = <second id>
union select * from main_table where id = <third id>
...

OR

select * from main_table where id IN (<first 5 ids>)
union select * from main_table where id IN (<next 5 ids>)
union select * from main_table where id IN (<next 5 ids>)
...

If the performance is acceptable and if creating a new table like in option 1 doesn't feel right to you, you could try one of these methods.

Assuming main_table's ID is indexed/keyed, matching individual IDs might be faster than matching against one long comma-separated list. That's speculation; you'll have to look at the query plan and run it against a test case.
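Typing such a query by hand defeats the purpose, so it would normally be generated. Here is one possible sketch in Python that builds the chunked IN version with bound parameters. The helper names chunked and build_union_query are invented for this example, and the "?" placeholder is sqlite3's style; adapt it to your driver.

```python
import sqlite3


def chunked(items, size):
    """Yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_union_query(ids, chunk_size=5):
    """Return (sql, params): one IN-list select per chunk, joined with union."""
    parts, params = [], []
    for chunk in chunked(list(ids), chunk_size):
        placeholders = ", ".join("?" for _ in chunk)
        parts.append(f"select * from main_table where id in ({placeholders})")
        params.extend(chunk)
    return "\nunion ".join(parts), params
```

Usage would be along the lines of sql, params = build_union_query(ids) followed by conn.execute(sql, params). An empty ID list yields an empty string, so guard against that before executing. Database engines may cap the number of bound parameters or unioned selects in one statement, so the chunk size may need tuning for 5000+ IDs.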

Which option to choose?

Testing these options should be fast. I'd recommend trying all these options with your database engine and the size of your table and see which one suits your use-case the most.
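For a quick comparison, something like the following could time each candidate query and print its plan. It is a sketch against Python's sqlite3 module; time_query and show_plan are hypothetical helper names, and "explain query plan" is SQLite's syntax (other engines use their own EXPLAIN variants).

```python
import sqlite3
import time


def time_query(conn, sql, params=()):
    """Run the query once and return (row_count, elapsed_seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return len(rows), time.perf_counter() - start


def show_plan(conn, sql, params=()):
    """Return the plan descriptions the engine reports for the query.

    Assumes SQLite, where the last column of each plan row is the detail text.
    """
    cursor = conn.execute("explain query plan " + sql, params)
    return [row[-1] for row in cursor]
```

A single timing is noisy, so repeat each query a few times against realistic table sizes before drawing conclusions.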