Loading Spatial Data into Azure SQL Part 2
I left off my last post uploading the polygons to Azure SQL, which took just a few hours. My next step was to be able to retrieve the LGA (LGA name) from the database, given a set of coordinates. This is a fairly easy task with the cool spatial datatypes and functions available in SQL (well, fairly easy).
The main SQL query I initially used was the following:
-- declare a geometry datatype variable
declare @p geometry;
-- set the variable to a point (in this example the centre of Blacktown)
set @p = geometry::STPointFromText('POINT(150.908493 -33.769464)', 4283);
-- set the top 1 record where the record's geom data is valid, and the distance between it and the point @p, order it by the distance
select TOP(1) * FROM db.lga WHERE geom.STIsValid()=1 AND geom.STDistance(@p) IS NOT NULL
ORDER BY geom.STDistance(@p)
This query is rather inefficient probably, and theoretically an intersection algorithm would prove faster (for a human anyway), so I've recently used this one (it's pretty self explanatory):
declare @p geometry;
set @p = geometry::STPointFromText('POINT(150.9625421 -33.7293265)', 4283);
select * FROM dbo.lga WHERE geom.STIsValid()=1 AND geom.STIntersects(@p)=1
Both provide an average read time of 5 seconds per query, and work incredibly accurately (to my astonishment!). 5 seconds is both amazing for the human brain considering it's going through (at worst) 500 distance calculations, and also pretty shit for responsiveness - we want to use this query to eventually populate a mobile application with data and no user wants to wait 5 seconds for a best case bandwidth transaction.
I've tried creating some indexes on the spatial data to speed this up, but to no avail so far. It would require wiping my database and rewriting it or taking it offline to make changes - you need to use a clustered index on the spatial data and furthermore, Azure SQL requires and provisions for exactly one clustered index at all times. This clustered index has already been set and according to various forums I've been trawling, is impossible to hot swap, which is irritating. Azure SQL's docs have been pretty shit at that.
It's been interesting briefly playing around with some of this coordinate based stuff, and I'd like to get into it more now that this uni project is over and I have time to experiment and mess things up. There are obviously more optimisations that can be implemented, especially at the application level.