What is the definition of realtime, near realtime and batch? Give examples of each.

web-services real-time etl batch-processing

Albert T. Wong · Mar 11, 2011 · Viewed 16.3k times

I'm trying to get a good definition of realtime, near realtime and batch. I am not talking about sync and async; to me, those are a different dimension. Here is what I'm thinking:

Realtime is sync web services or async web services.
Near realtime could be JMS or messaging systems or most event driven systems.
Batch to me is more of a timed system that does its processing when it wakes up.

Give examples of each and feel free to fix my assumptions.

Answer

https://stackoverflow.com/tags/real-time/info

Real-Time

Real-time means that the time of an activity's completion is part of its functional correctness. Compare this with an ordinary function. The correctness of the sqrt() function, for example, is something like:

The sqrt() function is implemented correctly if, for all x >=0, sqrt(x) = y implies y^2 == x.

In this setting, the time it takes to execute the sqrt() procedure is not part of its functional correctness. A faster algorithm may be better in some qualitative sense, but no more or less correct.

Suppose we have a mythical function called sqrtrt(), a real-time version of square root. Imagine, for instance, we need to compute the square root of velocity in order to properly execute the next brake application in an anti-lock braking system. In this setting, we might say instead:

The sqrtrt() function is implemented correctly if

for all x >=0, sqrtrt(x) = y implies y^2 == x and

sqrtrt() returns a result in <= 275 microseconds.

In this case, the time constraint is not merely a performance parameter. If sqrtrt() fails to complete in 275 microseconds, you may be late applying the brakes, triggering either a skid or reduced braking efficiency, possibly resulting in an accident. The time constraint is part of the functional correctness of the routine. Lift this up a few layers, and you get the definition of a real-time system: one that is (at least partially) composed of activities that have timeliness as part of their functional correctness conditions.
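To make the distinction concrete, here is a minimal Python sketch that writes both correctness conditions as predicates. The names value_ok, sqrt_correct, sqrtrt_correct and timed_sqrt are hypothetical, and the floating-point tolerance is an assumption; the 275 microsecond deadline comes from the example above. Python on a general-purpose OS cannot guarantee a deadline like this, so the sketch only illustrates that elapsed time shows up in the real-time correctness check and not in the ordinary one.

```python
import math
import time

DEADLINE_US = 275  # deadline taken from the sqrtrt() example


def value_ok(x, y):
    """Value condition: y^2 == x, up to an assumed floating-point tolerance."""
    return math.isclose(y * y, x, rel_tol=1e-9, abs_tol=1e-12)


def sqrt_correct(x, y, elapsed_us):
    """Ordinary sqrt(): elapsed time plays no role in correctness."""
    return value_ok(x, y)


def sqrtrt_correct(x, y, elapsed_us, deadline_us=DEADLINE_US):
    """Real-time sqrtrt(): a right answer that arrives late is still wrong."""
    return value_ok(x, y) and elapsed_us <= deadline_us


def timed_sqrt(x):
    """Run math.sqrt and return (result, elapsed time in microseconds)."""
    start = time.perf_counter_ns()
    y = math.sqrt(x)
    elapsed_us = (time.perf_counter_ns() - start) / 1_000
    return y, elapsed_us


if __name__ == "__main__":
    y, elapsed_us = timed_sqrt(16.0)
    print("sqrt correct:  ", sqrt_correct(16.0, y, elapsed_us))
    print("sqrtrt correct:", sqrtrt_correct(16.0, y, elapsed_us))
```

The same result can pass sqrt_correct and fail sqrtrt_correct; the only difference is whether the measured time is part of the predicate.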

Near Real-Time

A near real-time system is one in which activities' completion times, responsiveness, or perceived latency, when measured against wall clock time, are important aspects of system quality. The canonical example of this is a stock ticker system -- you want to get quotes reasonably quickly after the price changes. For most of us non-high-speed-traders, what this means is that the perceived delay between data being available and our seeing it is negligible.

The difference between "real-time" and "near real-time" is a difference in both precision and magnitude. Real-time systems have time constraints that range from microseconds to hours, but those time constraints tend to be fairly precise. Near real-time usually implies a narrower range of magnitudes -- within human perception tolerances -- but its time constraints typically aren't articulated precisely.

I would claim that near-real-time systems could be called real-time systems, but that their time constraints are merely probabilistic:

The stock price will be displayed to the user within 500ms of its change at the exchange, with probability p > 0.75.
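A probabilistic constraint like this one can be checked against measured latencies. The sketch below is one way to do that, assuming you have a list of observed display latencies in milliseconds; the function names and the sample numbers are made up for illustration, and the observed fraction is only an estimate of the underlying probability.

```python
def fraction_within(latencies_ms, bound_ms):
    """Fraction of observed latencies that are at or under the bound."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    on_time = sum(1 for latency in latencies_ms if latency <= bound_ms)
    return on_time / len(latencies_ms)


def meets_soft_deadline(latencies_ms, bound_ms=500, p=0.75):
    """True if the observed on-time fraction is strictly greater than p."""
    return fraction_within(latencies_ms, bound_ms) > p


if __name__ == "__main__":
    # Hypothetical samples: four of five quotes arrive within 500 ms.
    samples = [100, 200, 300, 400, 900]
    print(fraction_within(samples, 500))
    print(meets_soft_deadline(samples))
```

Unlike the hard deadline on sqrtrt(), a single late sample does not make the system incorrect here; only the overall on-time fraction matters.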

Batch

Batch operations are those which are perceived to be large blocks of computing tasks with only macroscopic, human- or process-induced deadlines. The specific context of computation is typically not important, and a batch computation is usually a self-contained computational task. Real-time and near-real-time tasks are often strongly coupled to the physical world, and their time constraints emerge from demands from physical/real-world interactions. Batch operations, by contrast, could be computed at any time and at any place; their outputs are solely defined by the inputs provided when the batch is defined.
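As an illustration of that last point, a batch job can be sketched as a pure function of its input records. The (account, amount) record layout and the name run_batch are assumptions invented for this example.

```python
def run_batch(records):
    """Sum amounts per account; the output depends only on the input records."""
    totals = {}
    for account, amount in records:
        totals[account] = totals.get(account, 0) + amount
    return totals


if __name__ == "__main__":
    # Hypothetical input, fixed at the moment the batch is defined.
    records = [("alice", 10), ("bob", 5), ("alice", -3)]
    print(run_batch(records))
```

Running this tonight or next week, on any machine, gives the same totals for the same records; nothing in the computation refers to the clock or to where it runs.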

Original Post

I would say that real-time means that the time (rather than merely the correct output) to complete an operation is part of its correctness.

Near real-time is weasel words for wanting the same thing as real-time but not wanting to go to the discipline/effort/cost to guarantee it.

Batch is "near real-time" where you are even more tolerant of long response times.

Often these terms are used (badly, IMHO) to distinguish among human perceptions of latency/performance. People think real-time is real-fast, e.g., milliseconds or something. Near real-time is often seconds or milliseconds. Batch is a latency of seconds, minutes, hours, or even days. But I think those aren't particularly useful distinctions. If you care about timeliness, there are disciplines to help you get that.