First up, I don't know much about USB, so apologies in advance if my question is wrong.
In USB 2.0 the polling interval was 0.125ms, so the best possible latency for the host to read some data from the device was 0.125ms. I'm hoping for reduced latency in USB 3.0 devices, but I'm finding it hard to learn what the minimum latency is. The USB 3.0 spec says, "USB 2.0 style polling has been replaced with asynchronous notifications", which implies the 0.125ms polling interval may no longer be a limit.
I found some benchmarks for USB 3.0 SSDs that suggest data can be read from the device in slightly less than 0.125ms, and that includes all time spent in the host OS and the device's flash controller.
http://www.guru3d.com/articles_pages/ocz_enyo_usb_3_portable_ssd_review,8.html
Can someone tell me what the lowest possible latency is? A theoretical answer is fine. An answer including the practical limits of the various versions of Linux and Windows USB stacks would be awesome.
To head off the "tell me what you're trying to achieve" question, I'm creating a debug interface for the ASICs my company designs, i.e. a PC connects to one of our ASICs via a debug dongle. One possible use case is to implement conditional breakpoints when the ASIC hardware only implements simple breakpoints. To do so, I need to determine when a simple breakpoint has been hit, evaluate the condition and, if it is false, set the processor running again. The simple breakpoint may be hit millions of times before the condition becomes true, so the per-hit round-trip latency dominates the total time. We might implement the debug dongle on an FPGA or an off-the-shelf USB 3.0 enabled micro-controller.
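To make the use case concrete, here is a minimal sketch of the conditional-breakpoint loop described above. Everything in it is hypothetical: `FakeTarget`, `run_until_breakpoint` and `read_register` are stand-ins I made up for illustration, not a real dongle API, and the "register" is just a counter. The point is only that every simple-breakpoint hit costs at least one host-to-device round trip (two in this sketch), so the link latency is multiplied by the number of hits.

```python
from dataclasses import dataclass


@dataclass
class FakeTarget:
    """Hypothetical stand-in for the ASIC behind the debug dongle."""
    counter: int = 0
    round_trips: int = 0

    def run_until_breakpoint(self) -> None:
        # One round trip: resume the processor, then wait for the halt report.
        self.round_trips += 1
        self.counter += 1  # pretend the breakpointed code bumps a register

    def read_register(self) -> int:
        # A second round trip to fetch the state the condition depends on.
        self.round_trips += 1
        return self.counter


def run_to_conditional_breakpoint(target, condition, max_hits=10_000_000):
    """Resume the target until `condition` holds at a simple-breakpoint hit."""
    for hits in range(1, max_hits + 1):
        target.run_until_breakpoint()
        if condition(target.read_register()):
            return hits  # condition true: leave the processor halted
    raise RuntimeError("condition never became true")


target = FakeTarget()
hits = run_to_conditional_breakpoint(target, lambda value: value == 1000)
print(hits, target.round_trips)
```

In a real dongle the halt report and the register read could probably be combined into one transfer, which would halve the number of round trips per hit.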
Answering my own question...
I've come to realise that this question kind of misses the point of USB 3.0. Unlike 2.0, it is not a shared-bus system. Instead it uses a point-to-point link between the host and each device (I'm oversimplifying but the gist is true). With USB 2.0, the 125 us polling interval was critical to how the bus was time-division multiplexed between devices. However, because 3.0 uses point-to-point links, there is no multiplexing to be done and thus the polling interval no longer exists. As a result, the latency on packet delivery is much lower than with USB 2.0.
In my experiments with a Cypress FX-3 devkit, I have found that it is easy enough to get an average round-trip latency of 30 us from a Windows application to the device and back. I suspect that the vast majority of that time is spent in various OS delays, e.g. the user-space to kernel-space mode switch and the DPC latency within the driver.
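For anyone wanting to repeat the measurement, a rough sketch of a timing harness follows. It is not the code I ran against the devkit: `fake_round_trip` is a placeholder that does nothing, and you would replace it with a real write-then-read against your device using whatever driver or library you have. `total_seconds` is a small helper added to show what the per-hit latency means for the breakpoint use case in the question.

```python
import statistics
import time


def measure_round_trips(round_trip, samples=1000):
    """Return the latency of each `round_trip()` call in microseconds."""
    latencies_us = []
    for _ in range(samples):
        start = time.perf_counter_ns()
        round_trip()
        latencies_us.append((time.perf_counter_ns() - start) / 1000)
    return latencies_us


def fake_round_trip():
    # Placeholder (assumption): swap in a real bulk OUT write followed by
    # a bulk IN read against the device under test.
    pass


def total_seconds(hits, latency_us):
    """Time spent on round trips alone for `hits` breakpoint hits."""
    return hits * latency_us / 1_000_000


latencies = measure_round_trips(fake_round_trip)
print(f"mean {statistics.mean(latencies):.2f} us, max {max(latencies):.2f} us")

# One million breakpoint hits, one round trip each:
print(total_seconds(1_000_000, 30))   # 30.0 s at the 30 us measured here
print(total_seconds(1_000_000, 125))  # 125.0 s at one USB 2.0 polling interval
```

Looking at the maximum as well as the mean is worthwhile, since OS scheduling can produce occasional outliers far above the average.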