We're working on an application that will serve thousands of users daily (90% of them will be active during working hours, using the system constantly throughout their workday). The main purpose of the system is to query multiple databases and combine the information into a single response to the user. Depending on user input, our query load could be around 500 queries per second for a system with 1,000 users; 80% of those queries are reads.
Now, I did some profiling with the SQL Server Profiler tool, and I'm seeing an average of ~300 logical reads per read query (I haven't bothered with the write queries yet). That works out to roughly 150k logical reads per second for 1k users. The full production system is expected to have ~10k users.
How do I estimate the actual read load this will put on the storage for those databases? I'm pretty sure the physical reads will amount to far less than that, but how do I estimate how much less? I can't do a trial run in the production environment because it doesn't exist yet, and I need to tell the hardware guys how many IOPS we're going to need for the system so they know what to buy.
I tried the HP sizing tool suggested in the previous answers, but it only suggests HP products, without actual performance estimates. Any insight is appreciated.
EDIT: The main read-only dataset (where most of the queries will go) is a few gigabytes on disk (on the order of 4 GB). This will probably significantly affect the logical-to-physical read ratio. Any insight into how to estimate that ratio?
Disk I/O demand varies tremendously based on many factors, including:

- how much of the frequently accessed data fits in SQL Server's buffer cache (RAM)
- whether the access pattern is mostly random or mostly sequential
- the mix of query types and the ratio of reads to writes
For those reasons, the best way to estimate production disk load is usually by building a small prototype and benchmarking it. Use a copy of production data if you can; otherwise, use a data generation tool to build a similarly sized DB.
With the sample data in place, build a simple benchmark app that produces a mix of the types of queries you're expecting. If the test box doesn't match production, scale the memory available to SQL Server so the RAM-to-data ratio is roughly comparable.
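As a starting point, here's a minimal T-SQL sketch of such a load loop. The table dbo.Orders and both queries are hypothetical placeholders; substitute a representative mix from your own workload, and run several copies from separate connections to simulate concurrency.

    -- Minimal load-generation sketch: runs a random ~80/20 mix of two
    -- placeholder read queries for 10,000 iterations.
    SET NOCOUNT ON;

    DECLARE @i int = 0,
            @rows int;

    WHILE @i < 10000
    BEGIN
        IF RAND() < 0.8
            -- "typical" read query #1: an aggregate over recent rows
            SELECT @rows = COUNT(*)
            FROM dbo.Orders
            WHERE OrderDate >= DATEADD(DAY, -7, GETDATE());
        ELSE
            -- "typical" read query #2: a lookup on a random key
            SELECT @rows = COUNT(*)
            FROM dbo.Orders
            WHERE OrderID = CAST(RAND() * 1000000 AS int);

        SET @i = @i + 1;
    END;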
Measure the results with Windows performance counters. The most useful counters are under the PhysicalDisk object: Avg. Disk sec/Transfer, Disk Transfers/sec, Avg. Disk Queue Length, and so on.
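As a complement to the PerfMon counters, SQL Server's sys.dm_io_virtual_file_stats DMV reports cumulative physical I/O per database file since the instance started. Snapshot it before and after a benchmark run and diff the numbers to get reads per second and read latency for just your workload; a sketch:

    -- Cumulative per-file physical I/O since SQL Server startup.
    SELECT DB_NAME(vfs.database_id)  AS database_name,
           mf.physical_name,
           vfs.num_of_reads,         -- physical read requests issued
           vfs.num_of_bytes_read,
           vfs.io_stall_read_ms      -- total time spent waiting on reads
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY vfs.num_of_reads DESC;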
You can then apply some heuristics (also known as "experience") to those results and extrapolate them to a first-cut estimate for production I/O requirements.
If you absolutely can't build a prototype, then it's possible to make some educated guesses based on initial measurements, but it still takes work. For starters, turn on statistics:
SET STATISTICS IO ON
Before you run a test query, clear SQL Server's buffer cache:
CHECKPOINT             -- flush dirty pages to disk so all buffers become clean
DBCC DROPCLEANBUFFERS  -- then drop the clean buffers from the buffer pool
Then run your query and look at the physical reads and read-ahead reads in the STATISTICS IO output to see the physical disk I/O demand. Repeat with a realistic mix of queries, without clearing the buffer cache first, to get an idea of how much caching will help.
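Putting the pieces together (the query is a hypothetical placeholder and the output numbers are purely illustrative):

    SET STATISTICS IO ON;

    CHECKPOINT;
    DBCC DROPCLEANBUFFERS;

    -- placeholder; substitute one of your real read queries
    SELECT COUNT(*)
    FROM dbo.Orders
    WHERE OrderDate >= DATEADD(DAY, -7, GETDATE());

    -- The Messages tab then shows something like:
    --   Table 'Orders'. Scan count 1, logical reads 310, physical reads 3, read-ahead reads 295, ...
    -- On a cold cache, physical reads + read-ahead reads will be close to the logical reads;
    -- rerun without clearing the cache and see how much of that the buffer pool absorbs.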
Having said that, I would recommend against using IOPS alone as a target. I realize that SAN vendors and IT managers seem to love IOPS, but they are a very misleading measure of disk subsystem performance. As an example, there can be a 40:1 difference in deliverable IOPS when you switch from sequential I/O to random.