July 3, 2012

Automatically binned histograms in Gnuplot

Here is a gnuplot script that plots a histogram for two series of data. It bins the data automatically and adjusts the xrange and the bin size to the data (about 20 bins at most).

Requirements: working installations of gnuplot and Perl.

set datafile separator "," # my data is in comma-separated files
set style fill solid 1.00 border lt -1
set key inside right top vertical Right noreverse noenhanced autotitles nobox
set title "Var:4 Dataset:datasetname"
# bin() rounds x down to the nearest multiple of width. The listing uses bin() without defining it;
# this is the usual definition and is assumed here.
bin(x,width) = width*floor(x/width)
max=`perl -e '$max=-1e38; while (<>) {@t=split(","); $max=$t[3] if $t[3]>$max}; print $max' < series1.csv` # gets the maximum value of column 4 of series1.csv (Perl counts columns from 0, so column 4 is $t[3])
if (max<20) bw=1; else bw = max/20.0 # if the max value is small the bin width is 1, otherwise it is 5% of max (20.0 avoids integer division)
set xrange [-bw:*] # start the x axis one bin width below zero: at -1 if max is small, otherwise at -max/20
set boxwidth bw*0.4
set yrange [0:*]
# The -bw*0.2 and +bw*0.2 offsets shift the two series so their boxes end up next to each other; 0.2 is half of 0.4, the box width factor set above.
plot 'series1.csv' using (bin($4,bw)-bw*0.2):(1.0) t "0" smooth freq with boxes, \
     'series2.csv' using (bin($4,bw)+bw*0.2):(1.0) t "1" smooth freq with boxes
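
The maximum and the bin-width rule used in the script can also be checked outside gnuplot. The following is only a sketch in POSIX shell with awk, not part of the original script: the function names col4_max and bin_width are made up for illustration, and it assumes the file has no header row and that column 4 is numeric.

```shell
#!/bin/sh
# Hypothetical helper: print the maximum of column 4 of a comma-separated file.
col4_max() {
    # "+ 0" forces a numeric comparison; the first row initialises max.
    awk -F, 'NR == 1 || $4 + 0 > max { max = $4 + 0 } END { print max }' "$1"
}

# Hypothetical helper: print the bin width for a given maximum, following the
# rule in the gnuplot script: 1 if max < 20, otherwise max/20.
bin_width() {
    awk -v max="$1" 'BEGIN { if (max < 20) print 1; else print max / 20 }'
}
```

For example, `bin_width "$(col4_max series1.csv)"` prints the bin width that the rule gives for series1.csv. awk always divides in floating point, so a maximum of 50 gives 2.5.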

Example output:


  1. Have you ever tried using Flot?



  2. No, I'll have to check that out some time. This time I wanted to script something fast to generate hundreds of histograms.

  3. Is it possible to pass a variable to the Perl one-liner instead of the hard-coded "series1.csv"? I'm altering the script so that it takes a filename as an argument, but I'm stuck on the Perl part.
    I call the script with:
    gnuplot -e "filename='export.csv'" histo-script.plt
    And then use the 'filename' variable in various places e.g. chart title.

    1. I guess in principle it might work, but I don't know how. I would probably just write a wrapper script that uses sed to replace the filename.
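
The sed wrapper suggested in the reply could look roughly like the sketch below. It assumes a copy of the gnuplot script in which the data file name has been replaced by the marker @FILENAME@. Both the marker and the function name render_script are hypothetical, chosen only for this example.

```shell
#!/bin/sh
# render_script TEMPLATE CSVFILE
# Print TEMPLATE to stdout with every occurrence of the hypothetical
# marker @FILENAME@ replaced by CSVFILE.
render_script() {
    # Escape the characters that are special in a sed replacement:
    # backslash, ampersand, and the | used as delimiter below.
    escaped=$(printf '%s' "$2" | sed 's/[\\&|]/\\&/g')
    sed "s|@FILENAME@|$escaped|g" "$1"
}
```

With gnuplot installed and a template saved as, say, histo-template.plt, the result can be piped straight into gnuplot: `render_script histo-template.plt export.csv | gnuplot`. Because the substitution happens before gnuplot sees the script, it also reaches the file name inside the Perl one-liner.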