July 3, 2012

Automatically binned histograms in Gnuplot

The gnuplot script below plots histograms of two data series side by side. It bins the data automatically and adjusts the xrange and bin width so that the data is split into roughly 20 bins at most.

Requirements: working installations of gnuplot and perl.

set datafile separator "," #my data is in comma-separated files
set style fill solid 1.00 border lt -1
set key inside right top vertical Right noreverse noenhanced autotitles nobox
set title "Var:4 Dataset:datasetname"
max=`perl -e '$max=-1e38; while (<>) {@t=split(","); $max=$t[3] if $t[3]>$max}; print $max' < series1.csv` #gets the maximum value of column 4 in series1.csv (perl arrays start at 0, so column 4 is $t[3]); series2.csv is not scanned
if (max<20) bw=1; else bw = max/20.0 #if max value is small, the bin width is 1, otherwise it's 5% of max (20.0 avoids integer division when max is an integer)
set xrange [-bw:*] #start the xrange one bin width below zero: -1 if max is small, otherwise -max/20
bin(x,width)=width*floor(x/width)
set boxwidth bw*0.4
set yrange [0:*]
plot 'series1.csv' using (bin($4,bw)-bw*0.2):(1.0) t "0" smooth freq with boxes,'series2.csv' using (bin($4,bw)+bw*0.2):(1.0) t "1" smooth freq with boxes #+-bw*0.2 shifts the two series sideways so their boxes end up next to each other; 0.2 is half of 0.4, the relative box width set above.
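
To see how the bin width and the bin() function behave, here is a minimal sketch that can be run on its own. It makes a few assumptions that are not part of the script above: it needs gnuplot 5 or later for the inline datablock, the $sample rows are made-up stand-ins for series1.csv, and it uses gnuplot's stats command (instead of the perl one-liner) to find the maximum of column 4.

```gnuplot
# Hypothetical sample rows standing in for series1.csv (column 4 holds the values)
$sample << EOD
1,a,b,3.0
2,a,b,47.5
3,a,b,12.0
4,a,b,80.0
EOD

set datafile separator ","

# Assumption: stats replaces the perl step here; it fills STATS_max for column 4
stats $sample using 4 nooutput
max = STATS_max

# Same rule as in the script: width 1 for small ranges, otherwise 5% of max
bw = (max < 20) ? 1 : max/20.0

# Left edge of the bin that x falls into
bin(x,width) = width*floor(x/width)

print "max = ", max, ", bin width = ", bw
print "47.5 is counted in the bin starting at ", bin(47.5, bw)
print "80.0 is counted in the bin starting at ", bin(80.0, bw)
```

With this sample the maximum is 80, so the bin width works out to 4, and 47.5 lands in the bin that starts at 44. The maximum itself starts a bin of its own at 80, which is why the plot can end up with one more bin than the nominal 20.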

Example output: