How to use Geospatial data analytics in Exasol

12 Nov 2019 | Share

Geospatial data

How to use Geospatial data analytics in Exasol

The chances are your business is demanding, or in need of, richer insights from its data. Advanced analytics holds the key to these insights, which is why we ran a technical deep dive session at Tableau Conference 2019 to show you the ropes.

We decided to use Geospatial data as our entry point, and since many of us spend so much time in taxis, the analysis centred around Uber and taxi pickups in NYC. This blog will give you a quick breakdown of how to get started on this yourself and run geospatial data analytics in Exasol.

What is geospatial data?

Geospatial data at a high-level is information that identifies geographic features, locations and boundaries on Earth. You can store and analyze geographical information by using geospatial data in Exasol. So, where do you start?

How to get set up in Exasol

Points, LineStrings and areas such as Polygons are defined through coordinates and saved as GEOMETERY objects.

These GEOMETRY columns can have a spatial reference system which are universal means for defining real world locations. For example, the spherical coordinate system which is the standard the US department of defense uses for geospatial information, or the Mercator coordinate system which is what Google maps uses. The diagram below shows that we have numerous objects that are under the GEOMETRY tree, ranging from points, linestrings to polygons and more.

In Exasol we ship the database with native support for these geometry objects and a wide array of functions that allow us to execute operations and calculations on these geospatial objects. A few of these functions are highlighted in the table below and you can find more detailed information on our documentation page.

 

General Functions
ST_CONTAINS(g,g) Defines whether the first object fully contains the second one
ST_DISTANCE(g,g) Minimal distance between two geospatial objects
ST_INTERSECTS(g,g) Defines whether an intersection of two geospatial objects exists
ST_OVERLAPS(g,g) Defines whether two geospatial objects overlap.

 

To put this to practical use, we can look into how we use Geospatial functions in Exasol to discover more about Uber and Taxi rides. We have selected a couple of main data sets to do this. One for Uber pickups in NYC and one for taxi pickups there.

We have geospatial objects that define Burroughs, neighborhoods and streets in the NY area and we can use geospatial functions to analyze and visualize this data. So, let’s go through the process.

Step 1

Store geo-data in GEOMETRY type. This can be: POINT, LINESTRING, LINEARRING, POLYGON, MULTIPOLYGON, GEOMETRYCOLLECTION, MULTILINESTRING, MULTIPOINT.

Step 2

Use established spatial reference systems (SRID) to transform geometries.

Spatial Reference Systems are defined in EXA_SPATIAL_REF_SYS and consist of a SRID as well as projection parameters used by the proj4 library to transform between systems.

GPS coordinates are e.g. stored in WGS84 coordinate format which is defined by SRID 4326. To use coordinates on maps or to calculate distance in meters used SRID 3857 for (Google) Mercator projection.

To transform between systems use ST_TRANSFORM.

Step 3

Use well-known geospatial functions to retrieve, manipulate, measure, test, compare geo-spatial objects.

  • For a detailed description of spatial functions see Section 2.4.2. in the user manual
  • Retrieve:
    • ST_X, ST_Y, ST_ENDPOINT, ST_POINTN, ST_CENTROID,…
  • Manipulate:
    • ST_UNION, ST_TRANSFORM, ST_SETSRID,…
  • Measure:
    • ST_AREA, ST_LENGTH, ST_NUMPOINTS, …
  • Test:
    • ST_ISRING, ST_ISSIMPLE, ST_ISEMPTY,…
  • Compare:
    • ST_DISTANCE, ST_CROSSES, ST_TOUCHES, ST_DISJOINT,…

Step 4

Create Geo-Indices for fast computation: R-tree with version 6.1 for the following functions: ST_CONTAINS, ST_CROSSES, ST_EQUALS, ST_INTERSECTS, ST_OVERLAPS, ST_TOUCHES, ST_WITHIN.

R-trees store Minimal Bounding Rectangles (MBR) around geometries and build a hierarchy of those. Operations like those mentioned above can work on those generalizations and do easy pruning (best case right underneath the root node) to avoid costly geometry-to-geometry comparisons.

Well-known standards: naming, parameters, and functionality of geo-spatial features follows the specification of the Open Geospatial Consortium (OGC)  Simple Feature Access Part 2: SQL Option
http://www.opengeospatial.org/standards/sfs

Now you can see how quickly you can tap into new insights from spatial data with Exasol, you can try it yourself here. We will also be running a full webinar to take you through this demo in greater detail – if you want to find out more about advanced analytics and pose any questions to us, secure your place by registering here.

 

 

 

 

10 trends impacting data analytics

Now that we’re well and truly in the age of data, what’s coming next? 

FREE WHITE PAPER