How to interpolate points between two irregular sets of data?

nucleon picture nucleon · Dec 8, 2013 · Viewed 8.1k times · Source

I'm sorry for the somewhat confusing title, but I wasn't sure how to sum this up any clearer.

I have two sets of X,Y data, each set corresponding to a general overall value. They are fairly densely sampled from the raw data. What I'm looking for is a way to find an interpolated X for any given Y for a value in between the sets I already have.

The graph makes this more clear:

A graph of points

In this case, the red line is from a set corresponding to 100, the yellow line is from a set corresponding to 50.

I want to be able to say, assuming these sets correspond to a gradient of values (even though they are clearly made up of discrete X,Y measurements), how do I find, say, where the X would be if the Y was 500 for a set that corresponded to a value of 75?

In the example here I would expect my desired point to be somewhere around here:

A graph of points with an interpolated point

I do not need this function to be overly fancy — it can be simple linear interpolation of data points. I'm just having trouble thinking it through.

Note that neither the Xs nor the Ys of the two sets overlap perfectly. However it is rather trivial to say, "where are the nearest X points these sets share," or "where are the nearest Y points these sets share."

I have used simple interpolation between known values (e.g. find the X for corresponding Ys for set "50" and "100", then average those to get "75") and I end up with something like that looks like this:

Not very good interpolation

So clearly I am doing something wrong here. Obviously in this case X is (correctly) returning as 0 for all of those cases where the Y is higher than the maximum Y of the "lowest" set. Things start out great but somewhere around when one starts to approach the maximum Y for the lowest set it starts going haywire.

It's easy to see why mine is going wrong. Here's another way to look at the problem:

Illustration

In the "correct" version, X ought to be about 250. Instead, what I'm doing is essentially averaging 400 and 0 so X is 200. How do I solve for X in such a situation? I was thinking that bilinear interpolation might hold the answer but nothing I've been able to find on that has made it clear how I'd go about this sort of thing, because they all seem to be structured for somewhat different problems.

Thank you for your help. Note that while I have obviously graphed the above data in R to make it easy to see what I'm talking about, the final work for this is in Javascript and PHP. I'm not looking for something heavy duty; simple is better.

Answer

nucleon picture nucleon · Dec 10, 2013

Good lord, I finally figured it out. Here's the end result:

The final product

Beautiful! But what a lot of work it was.

My code is too cobbled and too specific to my project to be of much use to anyone else. But here's the underlying logic.

You have to have two sets of data to interpolate from. I am calling these the "outer" curve and the "inner" curve. The "outer" curve is assumed to completely encompass, and not intersect with, the "inner" curve. The curves are really just sets of X,Y data, and correspond to a set of values defined as Z. In the example used here, the "outer" curve corresponds to Z = 50 and the "inner" curve corresponds to Z = 100.

The goal, just to reiterate, is to find X for any given Y where Z is some number in between our known points of data.

  1. Start by figuring out the percentage between the two curve sets that the unknown Z represents. So if Z=75 in our example then that works out to be 0.5. If Z = 60 that would be 0.2. If Z = 90 then that would be 0.8. Call this proportion P.

  2. Select the data point on the "outer" curve where Y = your desired Y. Imagine a line segment between that point and 0,0. Define that as AB.

  3. We want to find where AB intersects with the "inner" curve. To do this, we iterate through each point on the inner curve. Define the line segment between the chosen point and the point+1 as CD. Check if AB and CD intersect. If not, continue iterating until they do.

  4. When we find an AB-CD intersection, we now look at the line created by the intersection and our original point on the "outer" curve from step 2. This line segment, then, is a line between the inner and outer curve where the slope of the line, were it to be continued "down" the chart, would intersect with 0,0. Define this new line segment as EF.

  5. Find the position at P percent (from step 1) of the length of EF. Check the Y value. Is it our desired Y value? If it is (unlikely), return the X of that point. If not, see if Y is less than the goal Y. If it is, store the position of that point in a variable, which I'll dub lowY. Then go back to step 2 again for the next point on the outer curve. If it is greater than the goal Y, see if lowY has a value in it. If it does, interpolate between the two values and return the interpolated X. (We have "boxed in" our desired coordinate, in other words.)

The above procedure works pretty well. It fails in the case of Y=0 but it is easy to do that one since you can just do interpolation on those two specific points. In places where the number of sample is much less, it produces kind of jaggy results, but I guess that's to be expected (these are Z = 5000,6000,7000,8000,9000,10000, where only 5000 and 10000 are known points and they have only 20 datapoints each — the rest are interpolated):

Jaggy results

I am under no pretensions that this is an optimized solution, but solving for gobs of points is practically instantaneous on my computer so I assume it is not too taxing for a modern machine, at least with the number of total points I have (30-50 per curve).

Thanks for everyone's help; it helped a lot to talk this through a bit and realize that what I was really going for here was not any simple linear interpolation but a kind of "radial" interpolation along the curve.