Databases versus plain text

database complexity-theory

dsimcha · Feb 5, 2009 · Viewed 8.3k times · Source

When dealing with small projects, what do you feel is the break even point for storing data in simple text files, hash tables, etc., versus using a real database? For small projects with simple data management requirements, a real database is unnecessary complexity and violates YAGNI. However, at some point the complexity of a database is obviously worth it. What are some signs that your problem is too complex for simple ad-hoc techniques and needs a real database?

Note: To people used to enterprise environments, this will probably sound like a weird question. However, my problem domain is bioinformatics. Most of my programming is prototypes, not production code. I'm primarily a domain expert and secondarily a programmer. Most of my code is algorithm-centric, not data management-centric. The purpose of this question is largely for me to figure out how much work I might save in the long run if I learn to use proper databases in my code instead of the more ad-hoc techniques I typically use.

Answer

1) Concurrency. Do you have multiple people accessing the same dataset? Then it's going to get pretty involved to broker all of the different readers and writers in a scalable fashion if you roll your own system.

2) Formatting and relationships: Is your data something that doesn't fit neatly into a table structure? Long nucleotide sequences and stuff like that? That's not really conveniently tabular data.

Another example: Nobody would consider implementing software like Photoshop to store PSDs in a relational format, because the data structures don't really lend themselves to that type of storage or query pattern.

3) ACID (sort of a corollary to #1): If Atomicity, Consistency, Integrity, and Durability are not challenges with a flat file, then go with a flat file.

Databases versus plain text

Answer

Related questions