I'm wondering if there is a standard or "normal" means of interpreting time interval data end points with respect to inclusiveness/exclusiveness of the value defining the end point. Note however that I am asking what the standard (or most common) convention is (if there is one), not for a dissertation on your personal preference. If you really want to provide a dissertation, please attach it to a reference to someone's published standard or a standard text on the matter. Open standards (that I don't have to pay to read) are greatly preferred unless they are fundamentally flawed :).
Of course there are 4 possibilities for a time interval from A to B:
- [A, B]
- [A, B)
- (A, B]
- (A, B)
Each of these has different characteristics (as I see it; feel free to point out more):
The [A, B] convention would have the seemingly inconvenient property that B is contained within the interval [A, B] and also [B, C]. This is particularly inconvenient if B is meant to represent the midnight boundary and you are trying to determine which day it falls on, for example. Also, this means the duration of the interval is slightly irritating to calculate, since [A, B] where A = B should have a length of 1, and therefore the duration of [A, B] is (B - A) + 1.
Similarly, the (A, B) convention would have the difficulty that B falls within neither (A, B) nor (B, C). Continuing the analogy with day boundaries, midnight would be part of neither day. This is also logically inconvenient because (A, B) where A = B is a nonsense interval with duration less than zero, but reversing A and B does not make it a valid interval.
So I think I want either [A, B), or (A, B] and I can't figure out how to decide between them.
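The midnight argument above can be checked mechanically. This is only a small sketch: the helper names `contains` and `intervals_containing_b` are made up for illustration, and A, B and C are assumed to be three consecutive midnights.

```python
from datetime import datetime, timedelta


def contains(start, end, t, closed_start, closed_end):
    """Return True if t lies in the interval under the given end point rules."""
    after_start = t >= start if closed_start else t > start
    before_end = t <= end if closed_end else t < end
    return after_start and before_end


# Three consecutive midnights: A, B and C (dates chosen arbitrarily).
a = datetime(2024, 1, 1)
b = a + timedelta(days=1)
c = b + timedelta(days=1)


def intervals_containing_b(closed_start, closed_end):
    """Count how many of the two adjacent intervals A..B and B..C contain B."""
    return sum(
        contains(start, end, b, closed_start, closed_end)
        for start, end in [(a, b), (b, c)]
    )


counts = {
    "[A, B]": intervals_containing_b(True, True),    # B belongs to both days
    "(A, B)": intervals_containing_b(False, False),  # B belongs to neither day
    "[A, B)": intervals_containing_b(True, False),   # B belongs to exactly one day
    "(A, B]": intervals_containing_b(False, True),   # B belongs to exactly one day
}
print(counts)
```

Only the two half-open conventions assign the boundary instant to exactly one interval; they differ in whether midnight counts as the start of the new day or the end of the old one.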
So if someone has a link to a standards document, a reference to a standard text, or similar that clarifies the convention, that would be great. Alternatively, if you can link a variety of standards documents and/or references that more or less completely fail to agree, then I can just pick one that seems to have sufficient authority to CMA and be done with it :).
Finally, I will be working in Java, so I am particularly susceptible to answers that work well in Java.
In the general case, [A, B) (inclusive start, exclusive end) has a lot going for it, and I don't see any reason why the same wouldn't be true for time intervals.
Dijkstra wrote a nice article about it, "Why numbering should start at zero", which, despite the name, deals mostly with exactly this.
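Short summary of the advantages:
- end - start equals the number of items in the list
- the upper bound of one interval is the lower bound of the next
- an interval starting at 0 needs no negative lower bound, which matters for unsigned numbers [1]
Personally, I find the second point extremely useful for lots of problems; consider a pretty standard recursive function (in pseudo-Python):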
    def foo(start, end):
        if end - start <= 1:
            # base case: zero or one item left in [start, end)
            return
        middle = start + (end - start) // 2
        foo(start, middle)  # left half:  [start, middle)
        foo(middle, end)    # right half: [middle, end)
Writing the same with an inclusive upper bound invites off-by-one errors, because every split then needs a +1 or -1 adjustment.
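To make the comparison concrete, here is a sketch of the same divide-and-conquer pattern written both ways. The function names `total_half_open` and `total_closed` are invented for this illustration; both simply sum a slice of a list by recursive halving.

```python
def total_half_open(values, start, end):
    """Sum values[start:end] by recursive halving over [start, end)."""
    if end - start == 0:
        return 0
    if end - start == 1:
        return values[start]
    middle = start + (end - start) // 2
    # The two halves share `middle` as a bound: no adjustment needed.
    return total_half_open(values, start, middle) + total_half_open(values, middle, end)


def total_closed(values, first, last):
    """Same computation over the closed interval [first, last]."""
    if last < first:
        return 0
    if first == last:
        return values[first]
    middle = first + (last - first) // 2
    # The right half must start at middle + 1, or `middle` is counted twice.
    return total_closed(values, first, middle) + total_closed(values, middle + 1, last)


values = [3, 1, 4, 1, 5, 9, 2, 6]
print(total_half_open(values, 0, len(values)))   # bounds are 0 and len(values)
print(total_closed(values, 0, len(values) - 1))  # the caller has to subtract 1
```

Both versions are correct, but the closed-interval one carries an adjustment in the recursion and another at the call site, and the empty range has to be written as last < first.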
[1] That's the advantage compared to (A, B]: an interval starting from 0 is MUCH more common than an interval ending in MAX_VAL. Note that this also relates to one additional problem: using two inclusive bounds means we can denote a sequence whose length cannot be expressed in a value of the same size (for example, [0, MAX_VAL] contains MAX_VAL + 1 items).
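The size problem in the footnote can be illustrated with a simulated fixed-width type. Python integers do not overflow, so this sketch assumes a hypothetical 8-bit unsigned value and emulates wraparound with a mask; `BITS` and `wrap` are names invented for the example.

```python
BITS = 8                    # assumed width of the hypothetical unsigned type
MAX_VAL = (1 << BITS) - 1   # largest representable value: 255


def wrap(n):
    """Emulate unsigned wraparound for a BITS-wide integer."""
    return n & MAX_VAL


# Closed interval [0, MAX_VAL]: the true length is MAX_VAL + 1, which does not fit.
closed_length = wrap(MAX_VAL - 0 + 1)

# Half-open interval [0, MAX_VAL): the length is end - start and always fits.
half_open_length = wrap(MAX_VAL - 0)

# (A, B] covering item 0 would need A = -1, which wraps to MAX_VAL.
lower_bound_for_zero = wrap(-1)

print(closed_length, half_open_length, lower_bound_for_zero)
```

The same reasoning applies to any fixed-size integer type: with two inclusive bounds, the full range has a length that the type itself cannot hold.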