I have a data file that looks like this:
"curve 0"
0 0.7800
10 0.333
12 0.5136
24 0.2096
26 -0.066
40 -0.674
42 -1.123
"curve 1"
0 0.876
2 0.73
4 0.693
6 0.672
10 0.70
12 0.88
16 0.95
148 -0.75
"curve 2"
8 2.2305
10 2.144
12 2.13
76 1.26
78 0.39
98 -0.97
I would like to plot each block of data independently of the others using gnuplot. Here's the code I'm using for this purpose:
plot 'file' i 0 u 1:2 w lines title columnheader(1),\
'file' i 1 u 1:2 w lines title columnheader(1),\
'file' i 2 u 1:2 w lines title columnheader(1),\
'file' i 3 u 1:2 w lines title columnheader(1)
It works fine.
Now, I would like to determine in each data block the point (x,y) that has the maximum y-value, and plot it with a marker which has the same color as the curve corresponding to this data block. I tried to use
max_y = GPVAL_DATA_Y_MAX
replot 'file' u ($2 == max_y ? $2 : 1/0):1
after the previous code, but it seems that this finds the maximum over the whole second column including all blocks.
The second thing I would like to do is: for each data block, plot the first point of that block with a marker that has the same color as the curve but a different shape from the marker used for the maximum.
Are these two tasks possible with gnuplot and with the way I'm plotting the curves (columnheader)?
This can be done. It will use the stats command extensively, and a temporary file. In gnuplot 5, the temporary file can be created in memory using a named data block (see help datablocks).
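As a rough illustration of that in-memory alternative, assuming gnuplot 5 or later, print output can be redirected to a named data block instead of a file. The block name $Extremes is made up for this sketch, and the two points are copied by hand from the sample data.

```gnuplot
# Sketch only (gnuplot 5+): $Extremes is a hypothetical datablock name.
set print $Extremes          # send print output to an in-memory datablock
print "0.0 0.78"             # maximum of "curve 0" in the sample data
print "16.0 0.95"            # maximum of "curve 1" in the sample data
set print                    # restore print to its default destination
plot $Extremes u 1:2 w points pt 7 not
```

Nothing is written to disk this way, so there is no file to clean up afterwards.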
Additionally, as your plot command is largely repetitive, you can use the plot for syntax
plot for[in=0:2] 'file' i in u 1:2 w lines t columnheader(1)
which will repeat the plot command using the values 0 through 2 for the variable in (your provided command uses four data blocks, but your provided data file only has 3).
The following script will accomplish what you want:
stats 'file' u 1:2 nooutput
blocks = STATS_blocks
set print 'tempfile'
first_y = ""
first_x = ""
do for[i=0:blocks-1] {
stats 'file' index i u (first_x=($0==1)?sprintf("%s %f",first_x,$1):first_x,first_y=($0==1)?sprintf("%s %f",first_y,$2):first_y,$1):2 nooutput
print sprintf("%f %f",STATS_pos_max_y,STATS_max_y)
}
print ""
print ""
do for[i=1:blocks] {
print sprintf("%s %s",word(first_x,i),word(first_y,i))
}
set print
plot for[i=0:blocks-1] 'file' i i u 1:2 w lines title columnheader(1),\
for[i=0:1] 'tempfile' i i u 1:2:($0+1) w points pt (i==0?7:9) lc variable not
This produces (with your provided datafile)
For curves 0 and 2, the first point and the maximum point are the same, so one symbol hides the other.
Replotting this, but altering the specification to move the first point markers up by 0.1, we can see that they show up where they should.
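The altered command is not shown here. One possible way to write it, assuming the variable blocks and the file tempfile created by the script above still exist, is to add the offset only when plotting index 1 (the first points):

```gnuplot
# Sketch: same plot as above, but the first-point markers (index 1 of
# 'tempfile') are shifted up by 0.1 so they are not hidden by the maxima.
plot for[i=0:blocks-1] 'file' i i u 1:2 w lines title columnheader(1),\
     for[i=0:1] 'tempfile' i i u 1:($2+(i==1?0.1:0)):($0+1) w points pt (i==0?7:9) lc variable not
```

The shift is only a visual check; the markers no longer sit on the true first points, so it should be removed for the final plot.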
This section is going to be long, but I will break down the code and explain it in detail, as close to line by line as possible, because there are a few subtle things in here.
The first two lines
stats 'file' u 1:2 nooutput
blocks = STATS_blocks
run the stats command over the file. Because of the named column headers, the stats command will fail if we don't specify a using spec, so we give it the u 1:2 spec. The nooutput option tells the stats command to capture the results without printing them. Here we only care about getting the number of blocks. We store this in the variable blocks, because later stats commands will overwrite STATS_blocks. We could have given a named prefix, but that would have saved all variables and there is no reason for that. Instead of these two commands, in the case of exactly 3 blocks, we could have just substituted the value 3 for all occurrences of blocks below, but this way the number of blocks is not hard-coded.
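For comparison, here is a sketch of the named-prefix variant mentioned above. The prefix F is an arbitrary choice for illustration; with it, the results land in variables beginning with F_ instead of STATS_, so later stats calls without a name do not overwrite them.

```gnuplot
# Sketch: keep the whole-file statistics under the (hypothetical) prefix F.
stats 'file' u 1:2 name "F" nooutput
blocks = F_blocks            # number of data blocks found in 'file'
print blocks
```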
Next, we use set print 'tempfile' to redirect print commands to a temporary file. We will build up a new datafile that contains the maximum points and the first points.
The next section of code
first_y = ""
first_x = ""
do for[i=0:blocks-1] {
stats 'file' index i u (first_x=($0==1)?sprintf("%s %f",first_x,$1):first_x,first_y=($0==1)?sprintf("%s %f",first_y,$2):first_y,$1):2 nooutput
print sprintf("%f %f",STATS_pos_max_y,STATS_max_y)
}
is the most difficult part and where most of the magic happens. We are going to create our temporary file with two datablocks. The first holds the maximum points and the second holds the first points. We will collect the first points in memory and add them after we have written that first datablock. Their x coordinates and y coordinates will each be stored in a space-separated string variable.
We iterate over all the data blocks and compute a stats command for it. The expression
(first_x=($0==1)?sprintf("%s %f",first_x,$1):first_x,first_y=($0==1)?sprintf("%s %f",first_y,$2):first_y,$1)
reassigns the two string variables for each point read in. To do this, it first checks if the point is the first one in series (the value of $0 will be 1 since the 0 value corresponds to the header line). If it is, it rebuilds the string variable by adding the value of the first column to it (and similarly for the y coordinates). Otherwise, it just reassigns the same thing to the variable. Finally, it returns the value in the first column. When expressions are put in parentheses and comma separated like this, each expression is evaluated in turn, and the final value is returned.
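The serial evaluation used here can be tried on its own outside a using spec. This is a minimal sketch with made-up variable names a and b:

```gnuplot
# Sketch of serial evaluation: expressions inside (x, y) run left to right,
# and the value of the last one becomes the value of the whole group.
a = 0
b = (a = a + 1, a * 10)      # first update a, then return a*10
print a
print b
```

In the stats command the same pattern lets the assignments to first_x and first_y happen as a side effect while the group still evaluates to $1.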
Thus the stats command behaves as if it were
stats 'file' index i u 1:2 nooutput
but this little trick allows us to read the first line values and store them when they come in. Finally the point with the maximum y value is printed out. This will go into the temporary file.
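To inspect what a single pass of the loop produces, the stats call can be run by hand for one block. This sketch uses index 1 ("curve 1" in the sample data), for which the temporary file shown further down lists the maximum at x = 16, y = 0.95:

```gnuplot
# Sketch: statistics for one data block only.
stats 'file' index 1 u 1:2 nooutput
# STATS_max_y is the largest y value, STATS_pos_max_y the x value where it occurs.
print STATS_pos_max_y, STATS_max_y
```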
Now we need to add the first points to the temporary file as a new datablock. So first we print two blank lines and then we again iterate over the number of blocks running
print sprintf("%s %s",word(first_x,i),word(first_y,i))
for each block (where i is the number of the block). The word function treats a string variable as a space separated list of words and pulls off the requested word. At this point our string variables look like
0.000000 0.000000 8.000000 # first_x
0.780000 0.876000 2.230500 # first_y
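To see how word picks these strings apart, here is a small standalone sketch using a hand-written copy of first_x for the sample data:

```gnuplot
# Sketch: word() and words() on a space-separated string.
first_x = " 0.000000 0.000000 8.000000"
print words(first_x)         # number of space-separated words in the string
print word(first_x, 3)       # third word: the first x value of "curve 2"
```

Because word counts from 1, the second loop in the script runs from 1 to blocks rather than from 0 to blocks-1.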
Finally, we issue set print, which restores the print command to print to the console. We have now built a temporary file which looks like
0.000000 0.780000
16.000000 0.950000
8.000000 2.230500
0.000000 0.780000
0.000000 0.876000
8.000000 2.230500
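where the first datablock (the first three lines) holds the points with the maximum y-value and the second datablock (the last three lines, separated from the first by the two blank lines we printed) holds the first points.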
Finally, we plot with
plot for[i=0:blocks-1] 'file' i i u 1:2 w lines title columnheader(1),\
for[i=0:1] 'tempfile' i i u 1:2:($0+1) w points pt (i==0?7:9) lc variable not
The first part of this is identical to before, just with the blocks variable used instead of hard-coding the number of blocks.
Next we plot the temporary file twice with index 0 and index 1. The line color is variable based on the line number (0 through 2 in this case). We add one to force the normally 0 based line number to be 1 through 3. This will correspond with the datablocks from before. We plot with points and select the point type based on the datablock we are plotting. It is either a filled circle (for the maximums) or filled triangle (for the first points) in this case.