Clusterpolate¶
Inter- and Extrapolation for Scattered Data
Traditional approaches for inter- and extrapolation of scattered data work on a filled rectangular area surrounding the data points or in their filled convex hull. However, scattered data often consists of different clusters of irregular shapes and usually contains areas where there simply is no data. Forcing such data into a traditional inter- or extrapolation scheme often does not lead to the desired results.
Heatmaps, on the other hand, deal well with scattered data but often do not provide real interpolation: Instead they usually use raw sums of kernel functions which overestimate the target value in densely populated areas.
Clusterpolation is a hybrid inter- and extrapolation scheme to fix this. It uses kernel functions for a weighted inter- and extrapolation of local values, as well as for a density estimation of the data. The latter is used to assign a membership degree to clusterpolated points: Points with a low membership degree lie in an area where there’s just not enough data.
Installation¶
The clusterpolate
package is available from PyPI and can be installed via
pip:
pip install clusterpolate
Quickstart¶
Use image()
to generate images using clusterpolated data:
import numpy as np
from matplotlib.cm import summer
import matplotlib.pyplot as plt
from clusterpolate import image
# Generate some data
n = 500
angles = np.random.normal(0, 0.75, n) - 0.2 * np.pi
radii = np.random.normal(1, 0.05, n)
points = np.vstack((radii * np.sin(angles),
radii * np.cos(angles))).T
values = np.sin(angles) + np.random.normal(0, 0.5, n)
size = (500, 500)
area = ((-1.5, 1.5), (1.5, -1.5))
# Plot raw data
plt.scatter(points[:, 0], points[:, 1], c=values, cmap='summer')
plt.axis('equal')
plt.axis([-1.5, 1.5, -1.5, 1.5])
plt.gca().set_aspect('equal', adjustable='box')
plt.axis('off')
plt.show()
# Generate clusterpolated image
img = image(points, values, size, area, radius=0.2,
colormap=summer)[3]
img.save('clusterpolated.png')
Raw data:

Clusterpolated data:

Note how the values are cleanly interpolated even within dense regions and how extrapolation only occurs close to existing data points.
Of course you can also use clusterpolation on your data without generating any images: simply use clusterpolate()
.
API Reference¶
-
clusterpolate.
bounding_box
(points)[source]¶ Compute a point cloud’s bounding box.
points
is a list or array of 2D points.The return value is a 2x2 tuple containing the upper left and the lower right bounding box corners.
-
clusterpolate.
bump
(r)[source]¶ Factory for bump kernel functions.
r
is the radius of the bump function.The returned bump function assumes that all values in the input vector are non-negative.
-
clusterpolate.
clusterpolate
(points, values, targets, radius=1, kernel_factory=<function bump>, neighbors=None, num_jobs=None)[source]¶ Clusterpolate data.
points
(array-like) are the data points andvalues
(array-like) are the associated values.targets
(array-like) are the points at which the data should be clusterpolated.radius
(float) is the radius of each data point’s kernel.kernel_factory
is a function that takes a radius and returns a corresponding kernel function. The kernel function must accept an array of distances (>= 0) and return the corresponding kernel values. The kernel function must be normalized (a distance of 0 must yield a value of 1) and it should be zero for distances greater thanradius
.Neighbor lookup is done using an instance of
sklearn.neighbors.NearestNeighbors
, constructed with the default options. You can pass an instance that is configured to suit your data via theneighbors
parameter.By default, computations are parallelized according to the number of available CPUs. Set
num_jobs
to a specific number to use more or fewer parallel processes.Returns two arrays. The first contains the predicted value for the corresponding target point, and the second contains the target point’s degree of membership (a float between 0 and 1).
-
clusterpolate.
image
(points, values, size, area=None, normalize=True, colormap=None, **kwargs)[source]¶ Create an image for clusterpolated data.
points
andvalues
is the input data, seeclusterpolate()
.size
is a 2-tuple containing the image dimensions.area
is an optional 2-tuple of 2-tuples, specifying the top-left and bottom-right corner of the sampling area. If it is not given then the points’ bounding box is used.If
normalize
is true then the clusterpolated values are normalized to the range[0, 1]
. If you set this toFalse
you should ensure that input values are already in that range.colormap
is an optional callback that can be used to color the clusterpolated values. It should accept values in a 2D array and return the corresponding colors in an array of the same shape but with an extra dimension containing the RGB components (between 0 and 1). The colormaps frommatplotlib.cm
are a good choice. If no colormap is given then a grayscale image is generated.Any additional keyword-argument is passed on to
clusterpolate()
.This function returns 4 values: The first 3 are arrays containing the pixel coordinates, the clusterpolated values, and the membership degrees. The last one is the generated image as an instance of
PIL.Image.Image
. Note that the predictions are returned unnormalized.
Development¶
The code for this package can be found on GitHub. It is available under the MIT license.
History¶
0.2.0: |
|
---|---|
0.1.0: |
|