As a technologist at HumanGeo, you're often asked to perform some kind of analysis on geospatial data, and quickly! We frequently work on short turnaround times for our customers so anything that gives us a boost is welcome, which is probably why so many of us love Python. As evidenced by the volume of scientific talks at PyCon 2014, we can also lean on the great work of the scientific community. Python lets us go from zero to answer within hours or days, not weeks.
I recently had to do some science on the way we can observe clusters of points on the map - to show how regions of social significance emerge. Luckily I was able to lean heavily on Shapely which is a fantastic Python library for performing geometric operations on points, shapes, lines, etc. As an aside, if you are doing any sort of geospatial work with Python, you'll want to
pip install shapely. Once we found a cluster of points which we believed were identifying a unique region, we needed to draw a boundary around the region so that it could be more easily digested by a geospatial analyst. Boundaries are just polygons that enclose something, so I'll walk through some of your options and attempt to provide complete code examples.
The first step towards geospatial analysis in Python is loading your data. In the example below, I have a shapefile containing a number of points which I generated manually with QGIS. I'll use the fiona library to read the file in, and then create point objects with shapely.
The points list can now be manipulated with Shapely. First, let's plot the points to see what we have.
We can now interrogate the collection. Many shapely operations result in a different kind of geometry than the one you're currently working with. Since our geometry is a collection of points, I can instantiate a MultiPoint, and then ask that MultiPoint for its envelope, which is a Polygon. Easily done like so:
We should take a look at that envelope. matplotlib can help us out, but polygons aren't functions, so we need to use PolygonPatch.
So without a whole lot of code, we were able to get the envelope of the points, which is the smallest rectangle that contains all of the points. In the real world, boundaries are rarely so uniform and straight, so we were naturally led to experiment with the convex hull of the points. Convex hulls are polygons drawn around points too - as if you took a pencil and connected the dots on the outer-most points. Shapely has convex hull as a built in function so let's try that out on our points.
A tighter boundary, but it ignores those places in the "H" where the points dip inward. For many applications, this is probably good enough but we wanted to explore one more option which is known as a concave hull or alpha shape. At this point we've left the built-in functions of Shapely and we'll have to write some more code. Thankfully, smart people like Sean Gillies, the author of Shapely and fiona, have done the heavy lifting. His post on the fading shape of alpha gave me a great place to start. I had to fill in some gaps that Sean left so I'll recreate the entire working function here.
That's a mouthful, but the gist is that we are going to compute Delaunay triangles which establish a connection between each point and nearby points and then we remove some of the triangles that are too far from their neighbors. This removal part is key. By identifying candidates for removal we are saying that these points are too far from their connected points so don't use that connection as part of the boundary. The result looks like this.
Better, but not great.
It turns out that the alpha value and the scale of the points matters a lot when it comes to how well the Delaunay triangulation method will work. You can usually play with the alpha value to find a suitable response, but unless you can scale up your points it might not help. For the sake of a good example, I'll do both: scale up the "H" and try some different alpha values.
To get more points, I opened up QGIS, drew an "H" like polygon, used the tool to generate regular points, and then spatially joined them to remove any points outside the "H". My new dataset looks like this:
When we try the alpha shape transformation on these points we get a much more satisfying boundary. We can try a few permutations to find the best alpha value for these points with the following code. I combined each plot into an animated gif below.
So in this case, alpha of about 0.4 looks pretty good. We can use shapely's buffer operation to clean up that polygon a bit and smooth out any of the jagged edges.
And there you have it. Hopefully this has been a useful tour through some of your geometric boundary options in python. I recommend exploring the Shapely manual to find out about all of the other easy geometric operations you have at your fingertips. Also, if you dig Python and playing with maps, we want to hear from you.