Thursday, 4 November 2010

gnuplot, cygwin and gnuplot custom points

Few graphical packages, free or commercial, let you use custom, user-defined points. Gnuplot does: it can use images as points. This post is based on a gnuplot demo.

There are many ways to create custom shapes or points in gnuplot. One of them is to supply gnuplot with a special PostScript file. Another brilliant example is described here. However, I have not seen any example that uses images, for example .png files, as points. That is the trick I would like to share.

All the files used in this example can be found in my Skydrive folder. Today we play with the birthrate data available from the WolframAlpha engine. I would like to plot the expected population of Germany, France, Italy and Spain from 2008 to 2010, and I would like to use a different image as the point marker for each country. Here is the image I would like to make in gnuplot:



How do we do that? First, we need images; I use icons from Mark James. Of course, we also need gnuplot. Linux users can get a recent version from their distribution's repository or compile it themselves, and I assume the same is true for Mac fans. Windows users can obtain a specially compiled version from the gnuplot site or use the one provided in cygwin. I prefer the latter because it gives me bash command completion, which definitely makes my life easier. Some slightly outdated cygwin gnuplot tips can be found here. If you need advice, please do not hesitate to ask. For the complete set, you can install a Linux virtual machine or the well-known andLinux distribution. It is your call.

Once we have everything, and before we begin, here is how I use gnuplot. Simple data visualizations are done on the command line like this:
plot [] [] "birthrates.txt" u 1:2 title "Germany" w p pt 6 ps 3
This command plots XY data, where X is the year and Y is the population in that year, using circles with point size 3. For quick and dirty data visualization it works. More commands and their descriptions can be found in the gnuplot manual. This manual corresponds to gnuplot version 4.4; if you use an older version of gnuplot, you might find differences in syntax and features.
For publication purposes, I highly recommend writing a small script/macro file with the right settings for fonts, scales, etc. This way you are always prepared and do not have to retype the gnuplot commands every time. It makes life easier and learning gnuplot faster. For instance, suppose you made a beautiful plot and want to save it as a picture. Doing it once is simple:
gnuplot> set term png size 640,480 enhanced font "Tahoma,10"
gnuplot> set output "my_beautiful_figure.png"
gnuplot> replot
gnuplot> set term x11; # or wxt if you have it
gnuplot> replot
I can do that once or twice, but if I am going to do it regularly, I create a special file containing the same commands, without the 'gnuplot>' prompt of course. Any time I need a picture, I load this macro file ('gnu2png') with the command:
gnuplot> load "gnu2png"
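As a minimal sketch, the 'gnu2png' macro file could be created from the shell like this. The file contents mirror the interactive session above. Note that gnuplot's size option takes the width and height separated by a comma. The bare `set output` line is my addition (an assumption about good practice, not part of the original session); it closes the PNG file before switching back to the screen terminal.

```shell
# Write the export commands into a macro file named gnu2png.
# The quoted 'EOF' stops the shell from expanding anything inside the heredoc.
cat > gnu2png <<'EOF'
set term png size 640,480 enhanced font "Tahoma,10"
set output "my_beautiful_figure.png"
replot
set output
set term x11
replot
EOF
```

If you use the wxt terminal instead of x11, change the fifth line of the file accordingly.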
That is all for the introduction to gnuplot. Let us finally do the real stuff. :)

The file 'birthrates.txt' contains the following data:
#        germany          france              italy                  spain
2008   82300000.00    63900000.00    59600000.00      44500000.00
2009   82221650.40    64243143.00    59900384.00      44939660.00
2010   82143375.39    64588128.68    60202281.94      45383663.84
And here comes the first part of my gnuplot script. The full version can be found here.

1:#general stuff
2:unset key
3:set title "Expected population for selected countries\nData taken from www.wolframalpha.com" font "Tahoma,8"
4:
5:set xtics font "Tahoma,8"
6:set ytics font "Tahoma,8"
7:unset ylabel
8:set xlabel "Year" font "Tahoma,9"

We do the following. Line 2: removing the plot's legend. Line 3: setting the title of the plot. Lines 5 and 6: setting the fonts for xtics and ytics. Line 7: removing the y-axis label. Line 8: setting the x-axis label and its font.

10:#picture/screen size
11:picx=400; picy=400;
12:
13:#point size - depends on image size
14:iw=16;ih=16; #real image size
15:ixscale=1.3; iyscale=1.3; #scale
16:iw=iw*ixscale;ih=ih*iyscale;
17:
18:#x and y minimum and maximum values
19:xi=2007.; xa=2021.;
20:yi=4.0e7; ya=9.0E7;
21:
22:#set terminal
23:set term x11 size picx,picy
24:
25:#ImageCalculated[Height,Width]:
26:#not precise, but a good first approximation
27:icw=abs(xa-xi)/picx; ich=abs(ya-yi)/picy;
28:
29:#helpful styles to avoid too many words
30:set macros
31:filestyle="binary filetype=png"
32:filestyle1="with rgbimage notitle"

Here is the necessary tricky part. Line 11: setting the terminal (picture) size; yes, this is the size we will use for the final image. Line 14: setting the real size of the .png images we will use for the points. We assume all point images have the same initial dimensions (16x16 px). Line 15: setting the adjustable scale; these parameters control the x and y scaling of the points (roughly, the point size). Line 16: recalculating the image width and height with that scale. Lines 19 and 20: setting the limits of our figure; [xy]i are the minimum and [xy]a the maximum values. Line 23: setting the display terminal; it is important to see what we get. Line 27: computing the conversion factors between pixels and plot coordinates, icw for width and ich for height, so that each is the extent of one pixel in plot units. As the comment on line 26 says, this is only a first approximation. Lines 30, 31 and 32: enabling macros and defining two helper strings; for more information, please read the manual.
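To make the conversion concrete, here is a small sketch that repeats the arithmetic of script lines 11 to 27 with awk, using the same values. It also computes where the image for Germany in 2008 would be placed. The variable conv is only a name I picked for this illustration.

```shell
# Recompute the pixel-to-plot conversion factors outside gnuplot.
conv=$(awk 'BEGIN {
    picx = 400; picy = 400          # terminal size in pixels
    iw = 16 * 1.3; ih = 16 * 1.3    # scaled icon size in pixels
    xi = 2007; xa = 2021            # x range of the plot
    yi = 4.0e7; ya = 9.0e7          # y range of the plot
    icw = (xa - xi) / picx          # x units per pixel
    ich = (ya - yi) / picy          # y units per pixel
    printf "icw=%.3f ich=%.0f\n", icw, ich
    # image position for Germany in 2008, shifted by half the icon size
    printf "origin=(%.3f,%.0f)\n", 2008 - icw * iw / 2, 82300000 - ich * ih / 2
}')
printf '%s\n' "$conv"
```

This prints icw=0.035 ich=125000 and origin=(2007.636,81000000): one pixel spans 0.035 years horizontally and 125000 people vertically, so a 20.8 px icon is shifted by 0.364 years and 1.3 million people to center it on the data point.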

At this point the preparations are over, and we need to plot the data. The bad news: we need to plot point by point. The following command plots just one point of the graph:
plot [xi:xa] [yi:ya]  "de.png" \
@filestyle \
origin=(2008-icw*iw/2,82300000-ich*ih/2) \
dx=icw*ixscale dy=ich*iyscale \
@filestyle1
We plot a figure with the x axis running from 'xi' to 'xa' and the y axis from 'yi' to 'ya'. We use the 'de.png' icon for one point. The 'origin=' keyword sets the position of the image; we shift it by half the image size so that the center of 'de.png' lands on the value we want to plot. The 'dx=' and 'dy=' keywords set the size of the point. Remember that we need to convert pixels to plot coordinates? This is where we do it. '@filestyle' and '@filestyle1' are the macro substitutions we defined above. And once again: we have to provide these lines for every single point on our graph.

The good news: we can use bash (native or cygwin's) or even Excel/OpenOffice Calc to quickly convert the data to the format we need. Believe me, it is fast with a spreadsheet, and some of us use them anyway. I will not describe that route here because the task is simple. The hint is: for each point we need to construct a line containing its x and y coordinates, which are the only things that change in the gnuplot description of a single point. I will provide the solution for Linux/cygwin users. Behold, this command outputs the lines needed to plot the Spain data (5th column):
cat birthrates.txt | grep -v "#" | awk '{print " \"es.png\" @filestyle origin=("$1"-icw*iw/2,"$5"-ich*ih/2) dx=icw*ixscale dy=ich*iyscale @filestyle1, \\"}'
Let us take a closer look. The first two commands, cat birthrates.txt | grep -v "#", output the data file 'birthrates.txt' line by line, excluding the lines that contain the '#' symbol. Each line is piped to the awk command, which picks the first ($1) and the fifth ($5) columns and builds the line we need for gnuplot. Although this solution looks ugly, it is fast and efficient.
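The same idea can be extended to all four countries at once. The sketch below is only one possible way to do it: gen_points is a hypothetical helper name, and the icon names fr.png and it.png are assumptions (only de.png and es.png appear in this post). It writes a complete plot command, with the clauses joined by commas and line continuations, so there is no trailing comma to remove by hand.

```shell
# Hypothetical helper: print a full gnuplot plot command with one image
# clause per data row and country column of the file given as $1.
# Assumed icon names, in column order: de.png fr.png it.png es.png
gen_points() {
    awk '
        BEGIN { split("de fr it es", icon, " "); printf "plot [xi:xa] [yi:ya]" }
        /^#/ || NF < 5 { next }    # skip the header comment and short lines
        {
            for (c = 2; c <= 5; c++) {
                # sep is empty before the first clause and "," afterwards
                printf "%s \\\n  \"%s.png\" @filestyle origin=(%s-icw*iw/2,%s-ich*ih/2) dx=icw*ixscale dy=ich*iyscale @filestyle1", sep, icon[c - 1], $1, $c
                sep = ","
            }
        }
        END { printf "\n" }
    ' "$1"
}
```

A possible use is gen_points birthrates.txt > points.gp, followed by load "points.gp" in gnuplot after the settings part of the script has been run, so that the variables and macros are already defined.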

That is all for today, folks. Just in case: the data, scripts and images can be downloaded from Skydrive.