I have a set of data points and am curious if the data represents a linear function or a logarithmic function.
The data set is two-dimensional.
Let's say an ideal set of data points followed the function f(x) = x. If I plotted the data points, I would be able to tell it is linear.
Similarly if the data points followed the function f(x) = log(x), I would be able to visually tell it is logarithmic.
On the other hand, having a program determine whether a set of data is linear or logarithmic is nontrivial. How would I approach this?
One option would be to run a linear regression on the data set to get a best-fit line. If the data is linear, you'll get a very good fit and the mean squared error will be low. If the data is logarithmic, the straight line will be systematically off and the error will be noticeably higher.
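To make that concrete, here is a minimal plain-Java sketch; the class and method names are my own invention, not from any library. It fits y = a*x + b by ordinary least squares and reports the mean squared error of the fit:

    // Fits y = slope * x + intercept by ordinary least squares.
    public class LinearFit {
        public final double slope, intercept;

        public LinearFit(double[] x, double[] y) {
            int n = x.length;
            double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;
            for (int i = 0; i < n; i++) {
                sumX += x[i];
                sumY += y[i];
                sumXY += x[i] * y[i];
                sumXX += x[i] * x[i];
            }
            slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
            intercept = (sumY - slope * sumX) / n;
        }

        // Mean squared error of the fitted line against the data.
        public double meanSquaredError(double[] x, double[] y) {
            double sse = 0;
            for (int i = 0; i < x.length; i++) {
                double residual = y[i] - (slope * x[i] + intercept);
                sse += residual * residual;
            }
            return sse / x.length;
        }
    }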
Alternatively, you could transform the data set by converting each point (x0, x1, ..., xn, y) to (x0, x1, ..., xn, e^y). If the data was linear, it will now be exponential; if it was logarithmic, it will now be linear (since e^(log x) = x). Running a linear regression and computing the mean squared error on the transformed points will then give a low error for the logarithmic data and a staggeringly huge error for the linear data, since the exponential function blows up extremely quickly.
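A rough sketch of that comparison, reusing the LinearFit class above. The threshold logic is a crude heuristic I'm assuming for illustration (the two error values live on different scales), but as noted, the exponential blow-up usually makes the gap enormous:

    // Exponentiate y, refit, and see which version looks more linear.
    static boolean looksLogarithmic(double[] x, double[] y) {
        double[] expY = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            expY[i] = Math.exp(y[i]); // e^y turns logarithmic data into linear data
        }
        double rawError = new LinearFit(x, y).meanSquaredError(x, y);
        double expError = new LinearFit(x, expY).meanSquaredError(x, expY);
        // Crude comparison: the transformed fit only wins for logarithmic data
        return expError < rawError;
    }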
To actually implement the regression, one option is ordinary least squares. This has the added benefit of giving you a correlation coefficient in addition to the model, which you could also use to distinguish between the two data sets.
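One standard choice for that coefficient is Pearson's r, which can be computed directly from running sums; this is the textbook formula, not tied to any particular library:

    // Pearson correlation coefficient between x and y.
    // |r| near 1 means the relationship is close to linear.
    static double pearsonR(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
            sumYY += y[i] * y[i];
        }
        return (n * sumXY - sumX * sumY)
                / Math.sqrt((n * sumXX - sumX * sumX) * (n * sumYY - sumY * sumY));
    }

Computed on both the raw and the e^y-transformed points, whichever version yields an |r| closer to 1 tells you which model fits better.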
Because you've asked for how to do this in Java, a quick Google search turned up this Java code to do a linear regression. However, you might have a better fit in a language like Matlab, which is specifically designed for these sorts of numerical computations. For example, in Matlab you can solve the least-squares problem in one line with the backslash operator (note: backslash, not forward slash; inputs should be a matrix with your x-values in one column and ones in another so the fit includes an intercept):

    linearFunction = inputs \ outputs
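If you'd rather keep everything in Java, Apache Commons Math ships a SimpleRegression class that packages the fit, the R^2, and the mean squared error together. A minimal sketch, assuming commons-math3 is on your classpath and using made-up example data:

    import org.apache.commons.math3.stat.regression.SimpleRegression;

    public class RegressionExample {
        public static void main(String[] args) {
            SimpleRegression regression = new SimpleRegression();
            // Illustrative points roughly following f(x) = x
            double[][] points = { {1, 1.1}, {2, 1.9}, {3, 3.2}, {4, 3.9} };
            for (double[] p : points) {
                regression.addData(p[0], p[1]);
            }
            System.out.println("slope     = " + regression.getSlope());
            System.out.println("intercept = " + regression.getIntercept());
            System.out.println("R^2       = " + regression.getRSquare());
            System.out.println("MSE       = " + regression.getMeanSquareError());
        }
    }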
Hope this helps!