I don't do much coding outside of Matlab, but I have a need to export my Matlab code to another language, most likely C. My Matlab code includes a histogram function, histc(), that places my input data (which is double-precision, not integer) into a specified array of bins, to form a histogram.
I'm sure I can piece together a couple nested loops to generate a histogram function, but I need this function to be fast and memory-light, as it will be accessed repeatedly and often.
To avoid re-inventing the wheel, anyone know if C language has any existing histogram function(s) available for use, or whether people needing such a thing generally create it themselves?
Anyone know an efficient algorithm for creating a histogram? Pseudo-code is fine.
Thanks in advance.
The "ideal" histogram algorithm will depend upon the range you expect to capture. Generally any histogram algorithm will look like this:
const int NSAMPLES = whatever;
double samples[NSAMPLES] = { 1.0, 3.93, 1e30, ... }; // your data set
const int NBUCKETS = 10; // or whatever
int counts[NBUCKETS] = { 0 };
for (int i = 0; i != NSAMPLES; ++i) {
counts[TRANSFER(samples[i])]++;
}
where TRANSFER()
is some function that maps your inputs to a bin (0th or Nth bin mapping to "out of range" of applicable).
The exact implementation of TRANSFER()
depends a lot on the expected distribution of your sample and where you are interested in detail. Some common approaches I have seen:
If you don't know the distribution up-front, then you really can't have an efficient mechanism to bin them effectively: you'll either have to guess (biased or uninformative results) or store everything and sort it at the end, binning into equal-sized buckets (poor performance).