Normalizing histogram bins in gnuplot

shivknight · Apr 26, 2011

I'm trying to plot a histogram whose bins are normalized by the number of elements in the bin.

I'm using the following

binwidth=5
bin(x,width)=width*floor(x/width) + width/2.0
plot 'file' using (bin($2, binwidth)):($4) smooth freq with boxes

to get a basic histogram, but I want the value of each bin to be divided by the size of the bin. How can I go about this in gnuplot, or using external tools if necessary?

Answer

Nick · Dec 21, 2011

Since gnuplot 4.4, an expression can contain assignments and a comma-separated sequence of sub-expressions, so a user-defined function can update a variable as a side effect and then return a value (see gnuplot tricks). This means that you can count the number of points, n, inside the gnuplot script itself instead of having to know it in advance. The following code expects a file, "out.dat", containing one column: a list of n samples from a normal distribution:

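binwidth = 0.1
set boxwidth binwidth
sum = 0   # running count of data points

# s(x) increments sum as a side effect and always returns 0
s(x)          = ((sum=sum+1), 0)
# centre of the bin that contains x
bin(x, width) = width*floor(x/width) + width/2.0

# first pass: count the points (the plotted zeros are irrelevant)
plot "out.dat" u ($1):(s($1))
# second pass: each point adds 1/(binwidth*sum) to its bin
plot "out.dat" u (bin($1, binwidth)):(1.0/(binwidth*sum)) smooth freq w boxes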

The first plot statement reads through the datafile and increments sum once for each point, plotting a zero.

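The second plot statement uses the final value of sum to normalise the histogram: each point contributes 1/(binwidth*sum) to its bin, and smooth freq adds these contributions up, so a bin's height is its count divided by (binwidth*sum). The box areas then total 1, which makes the result an estimate of the probability density.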
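If an external tool is acceptable, the same normalisation can be cross-checked outside gnuplot. The following is only a minimal Python sketch using the standard library. The names bin_center and density_histogram are made up for this illustration, and the random samples stand in for the contents of "out.dat".

```python
import math
from collections import defaultdict


def bin_center(x, width):
    # Same mapping as the gnuplot bin(): left edge of the bin plus half a width.
    return width * math.floor(x / width) + width / 2.0


def density_histogram(samples, width):
    """Return {bin centre: height}, scaled so the total box area is 1."""
    n = len(samples)
    heights = defaultdict(float)
    for x in samples:
        # Each sample adds 1/(width*n), like the y expression of the second plot.
        heights[bin_center(x, width)] += 1.0 / (width * n)
    return dict(sorted(heights.items()))


if __name__ == "__main__":
    import random

    random.seed(0)
    # Stand-in data: 10000 draws from a standard normal distribution.
    samples = [random.gauss(0.0, 1.0) for _ in range(10000)]
    width = 0.1
    hist = density_histogram(samples, width)
    area = sum(h * width for h in hist.values())
    print(f"{len(hist)} bins, total area = {area:.6f}")
```

Writing the (centre, height) pairs to a two-column file would let gnuplot draw them directly with boxes, without the counting pass.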