Elastic Search Geo Point and Geo Shape Queries Explained

Eresh Gorantla
Published in Geek Culture
10 min read, Sep 23, 2021

This story focuses on geospatial data manipulation in Elasticsearch, similar to what the PostGIS extension for PostgreSQL, MongoDB, and Redis offer. In this article, we’ll look at Elasticsearch’s geo queries, how to set up mappings and indices, and some examples of how to query your data.

Geo Data In Elastic Search

Elasticsearch allows you to represent GeoData in two ways: geo_shape and geo_point.

Geo Point allows you to store data as latitude and longitude coordinate pairs. Use this field type when you want to filter data for distances between points, search within bounding boxes, or when using aggregations. There are a lot of features and options that you can specify which are beyond the scope of this article. We’ll cover a couple here, but you can view the options for Geo Bounding Box, Geo Distance, and Geo Aggregations in Elasticsearch’s documentation.

Use Geo-Shape when you have GeoData that represents a shape, or when you want to query points within a shape. geo_shape data is supplied in GeoJSON format (WKT is also accepted). In older versions, Elasticsearch converted each shape into strings representing long/lat coordinate pairs on a grid of geohash cells and indexed them as terms; since 7.0 the default is a BKD-tree-based index. Either way, Elasticsearch can determine the relationships between shapes, which can be queried using the intersects, disjoint, contains, or within spatial relation operators.

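Historically, geo-point and geo-shape could not be queried together. For example, if you wanted to get all the cities within a specified polygon, you could not use cities that were indexed with geo-point; they had to be written as a GeoJSON "type": "Point" and indexed as geo-shape. Recent Elasticsearch versions relax this: as noted in the Geo Shape Query section of this article, the geo_shape query also accepts geo_point fields.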

Geo Point Field Type

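Fields of type geo_point accept latitude-longitude pairs, which can be used to find points within a bounding box, within a certain distance of a central point, or within a polygon, and to aggregate or sort documents by distance.

We can store a Geo Point in five different ways, each shown after the mapping below.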

Mapping of Geo Point

PUT location_index
{
"mappings": {
"properties": {
"text" : {
"type" : "text"
},
"location": {
"type": "geo_point"
}
}
}
}

Geo Point as Object: An object with lat and lon properties.

PUT location_index/_doc/1
{
"text": "Geopoint as an object",
"location": {
"lat": 41.12,
"lon": -71.34
}

}

Geo Point as String: A plain string in the format lat,lon, with the two values separated by a comma.

PUT location_index/_doc/2
{
"text": "Geopoint as a string",
"location": "41.12,-71.34"
}

Geo Point as Geo Hash: A geohash string encodes lat and lon in a single value. Online converters can compute the geohash for a coordinate pair.

PUT location_index/_doc/3
{
"text": "Geopoint as a geohash",
"location": "drm3btev3e86"
}

Geo Point as an array: The coordinates can be represented as an array in [lon, lat] order (the reverse of the string format), with values as doubles.

PUT location_index/_doc/4
{
"text": "Geopoint as an array",
"location": [ -71.34, 41.12 ]
}

Geo Point as WKT Point: The coordinates can be represented as a Well-Known Text point, POINT (lon lat).

PUT location_index/_doc/5
{
"text": "Geopoint as a WKT POINT primitive",
"location" : "POINT (-71.34 41.12)"
}

Whichever format a Geo Point is saved in, we can query against it using any of the other formats. But be careful to use each format’s coordinate order correctly: the object and string formats are lat, lon, while the array and WKT formats are lon, lat. Swapping the lat and lon values can give unexpected results.
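
To make the coordinate-order differences concrete, here is a small Python sketch that normalises each of the five formats above to a (lat, lon) tuple. It is only an illustration of the conventions, not how Elasticsearch parses the field internally; the function names to_lat_lon and decode_geohash are made up for this example.

```python
import re

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet
_WKT_POINT = re.compile(
    r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE
)


def decode_geohash(geohash):
    """Return the (lat, lon) centre of a geohash cell."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    is_lon = True  # bits alternate between lon and lat, starting with lon
    for char in geohash.lower():
        value = _BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            if is_lon:
                mid = (lon_lo + lon_hi) / 2
                if bit:
                    lon_lo = mid
                else:
                    lon_hi = mid
            else:
                mid = (lat_lo + lat_hi) / 2
                if bit:
                    lat_lo = mid
                else:
                    lat_hi = mid
            is_lon = not is_lon
    return (lat_lo + lat_hi) / 2, (lon_lo + lon_hi) / 2


def to_lat_lon(value):
    """Normalise any of the five geo_point formats to a (lat, lon) tuple."""
    if isinstance(value, dict):            # {"lat": ..., "lon": ...}
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):   # [lon, lat]
        lon, lat = value
        return float(lat), float(lon)
    if isinstance(value, str):
        wkt = _WKT_POINT.match(value)
        if wkt:                            # "POINT (lon lat)"
            return float(wkt.group(2)), float(wkt.group(1))
        if "," in value:                   # "lat,lon"
            lat, lon = value.split(",")
            return float(lat), float(lon)
        return decode_geohash(value)       # anything else is treated as a geohash
    raise TypeError(f"unsupported geo_point value: {value!r}")


if __name__ == "__main__":
    for sample in ({"lat": 41.12, "lon": -71.34}, "41.12,-71.34",
                   "drm3btev3e86", [-71.34, 41.12], "POINT (-71.34 41.12)"):
        print(sample, "->", to_lat_lon(sample))
```

All five inputs describe the same place, so each call should land on (or, for the geohash, very near) the same latitude and longitude.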

Geo Shape Field Type

The geo_shape datatype facilitates the indexing of and searching with arbitrary geo shapes such as rectangles and polygons. It should be used when either the data being indexed or the queries being executed contain shapes other than just points.

You can query documents using this type using a geo_shape query.

Mapping File

PUT /example
{
"mappings": {
"properties": {
"location": {
"type": "geo_shape"
}
}
}
}

Geo Json Type POINT: A single geographic coordinate. Note: Elasticsearch uses WGS-84 coordinates only.

POST /example/_doc
{
"location" : {
"type" : "point",
"coordinates" : [-77.03653, 38.897676]
}
}

Geo Json Type LINESTRING: An arbitrary line given two or more points.

POST /example/_doc
{
"location" : {
"type" : "linestring",
"coordinates" : [[-77.03653, 38.897676], [-77.009051, 38.889939]]
}
}

Geo Json Type POLYGON: A closed polygon whose first and last point must match, thus requiring n + 1 vertices to create an n-sided polygon and a minimum of 4 vertices.

POST /example/_doc
{
"location" : {
"type" : "polygon",
"coordinates" : [
[ [-77.03653, 38.897676], [-77.03653, 37.897676], [-76.03653, 38.897676], [-77.03653, 38.997676], [-77.03653, 38.897676] ]
]
}
}
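
The ring rules above (closed ring, at least four positions) are easy to check before sending a document. The following Python sketch does that for a single GeoJSON ring; check_polygon_ring is a hypothetical helper for this article, and Elasticsearch performs its own, stricter validation when indexing.

```python
def check_polygon_ring(ring):
    """Return a list of problems found in a GeoJSON linear ring.

    ring is a list of [lon, lat] positions. An empty list means the
    basic rules described in the text are satisfied.
    """
    problems = []
    if len(ring) < 4:
        problems.append("a ring needs at least 4 positions")
    if ring and list(ring[0]) != list(ring[-1]):
        problems.append("first and last positions must match")
    for lon, lat in ring:
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            problems.append(f"position out of range: [{lon}, {lat}]")
    return problems


if __name__ == "__main__":
    ring = [[-77.03653, 38.897676], [-77.03653, 37.897676],
            [-76.03653, 38.897676], [-77.03653, 38.997676],
            [-77.03653, 38.897676]]
    print(check_polygon_ring(ring))       # the ring from the example above
    print(check_polygon_ring(ring[:-1]))  # same ring, left open
```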

Geo Json Type MULTIPOLYGON: An array of separate polygons.

POST /example/_doc
{
"location" : {
"type" : "multipolygon",
"coordinates" : [
[ [[-80.0, 40.0], [-79.0, 40.0], [-79.0, 39.0], [-80.0, 39.0], [-80.0, 40]] ],
[ [ [-78.0, 38.0], [-79.0, 38.0], [-79.0, 39.0], [-78.0, 39.0], [-78.0, 38.0] ],
[ [-78.2, 38.2], [-78.8, 38.2], [-78.8, 38.8], [-78.2, 38.8], [-78.2, 38.2] ] ]
]
}
}

Geo Json Type MULTIPOINT: An array of unconnected, but likely related points.

POST /example/_doc
{
"location" : {
"type" : "multipoint",
"coordinates" : [
[-78.0, 38.0], [-79.0, 38.0]
]
}
}

Geo Json Type MULTILINESTRING: An array of separate line strings.

POST /example/_doc
{
"location" : {
"type" : "multilinestring",
"coordinates" : [
[ [-77.03, 38.89], [-78.03, 38.89], [-78.03, 39.89], [-77.03, 39.89] ],
[ [-76.03, 36.89], [-77.03, 36.89], [-77.03, 37.89], [-76.03, 37.89] ],
[ [-76.23, 36.69], [-76.03, 36.89], [-76.23, 36.89], [-76.23, 36.09] ]
]
}
}

Geo Json Type GEOMETRYCOLLECTION: A GeoJSON shape similar to the multi* shapes, except that multiple types can coexist (e.g., a Point and a LineString).

POST /example/_doc
{
"location" : {
"type": "geometrycollection",
"geometries": [
{
"type": "point",
"coordinates" : [-77.03653, 38.897676]
},
{
"type": "linestring",
"coordinates" : [[-77.03653, 38.897676], [-77.009051, 38.889939]]
}
]
}
}

Geo Json Type BBOX (ENVELOPE in Elastic Search): A bounding rectangle, or envelope, defined by only its top-left and bottom-right points, each in [lon, lat] order.

POST /example/_doc
{
"location" : {
"type" : "envelope",
"coordinates" : [ [-77.03653, 38.897676], [-76.03653, 37.897676] ]
}
}

By using geo_point or geo_shape, Elasticsearch will automatically find the coordinates, validate them according to the needed format, and index them.

To Load Data on to Elastic Search

The data that we’ll be using for this walkthrough is taken from the Washington State Department of Transportation (WSDOT) GeoData Catalogue. Download the shapefiles for “City Points” and “WSDOT Regions 24k”. City Points will give us the cities in Washington, while WSDOT Regions will provide us with regions designated by WSDOT. You can view the data before downloading by clicking View next to their download link. I have converted the shapefiles to GeoJSON format.

I have created a Node.js application to create the indices and load the data. Please follow the steps in the README file from the GitHub link to load the data.

Geo POINT Queries

Elasticsearch uses the terms queries and filters. Querying relies on “scoring”, or if and how well a document matches the query. Filtering, on the other hand, is “non-scoring” and determines if the document matches a query. According to Elasticsearch, as of 2.x querying and filtering have become synonymous in that you can have queries that are both scoring and non-scoring. There are various performance benefits and drawbacks to using scoring or non-scoring queries, but the rule-of-thumb is to use scoring queries when a relevance score is important, and non-scoring queries for everything else.

Since we have some data in our indices, it’s time to start querying. We’ll look at some of the basic queries that we can use for geo_point and geo_shape.

Distance with Geo Point

To filter by the distance between two points, our data must be stored using the geo_point type. The documentation provides various data formats as examples. The geo_distance query matches geo_point and geo_shape values within a given distance of a geo point. The following query lists the locations that are within 10 miles of the given point.

GET geo_cities_point/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "10mi",
"location": [
-122.3375,
47.6112
]
}
}
}
}
}
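
To get a feel for what the 10-mile filter decides, here is a rough Python sketch of a great-circle (haversine) distance check. It assumes a spherical Earth with a mean radius of 3958.8 miles, so results can differ slightly from what Elasticsearch computes; the helper names and the two sample points are made up for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_distance(origin, point, max_miles):
    """origin and point are (lat, lon) tuples."""
    return haversine_miles(*origin, *point) <= max_miles


if __name__ == "__main__":
    origin = (47.6112, -122.3375)   # the point used in the query above
    nearby = (47.6101, -122.2015)   # hypothetical point a few miles east
    far = (47.2529, -122.4443)      # hypothetical point well to the south
    print(within_distance(origin, nearby, 10))
    print(within_distance(origin, far, 10))
```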

Geo distance range Query

The geo_distance_range query has been removed. Instead use the Geo Distance Query with pagination or the Geo Distance Aggregation depending on your needs.

Geo Distance Aggregation

A multi-bucket aggregation that works on geo_point fields and conceptually works very similarly to the range aggregation. The user can define a point of origin and a set of distance range buckets. The aggregation evaluates the distance of each document value from the origin point and determines the buckets it belongs to based on the ranges (a document belongs to a bucket if the distance between the document and the origin falls within the distance range of the bucket).

Sometimes we need to know how many coordinates fall within each distance range, and this aggregation reports exactly that. Here we count the coordinates that are up to 10 mi away, from 10 mi to 50 mi, from 50 mi to 100 mi, and 100 mi or more from the origin. The response contains the number of documents matching each range.

GET geo_cities_point/_search?size=0
{
"aggs" : {
"data_around_city" : {
"geo_distance" : {
"unit": "mi",
"field" : "location",
"origin" : "47.6112, -122.3375",
"ranges" : [
{ "to" : 10 },
{"from" : 10, "to" : 50},
{"from" : 50, "to" : 100},
{"from" : 100}
]

}
}
}
}

The response would be:

{
"aggregations" : {
"data_around_city" : {
"buckets" : [
{
"key" : "*-10.0",
"from" : 0.0,
"to" : 10.0,
"doc_count" : 12
},
{
"key" : "10.0-50.0",
"from" : 10.0,
"to" : 50.0,
"doc_count" : 77
},
{
"key" : "50.0-100.0",
"from" : 50.0,
"to" : 100.0,
"doc_count" : 57
},
{
"key" : "100.0-*",
"from" : 100.0,
"doc_count" : 135
}
]
}
}
}
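
The bucketing itself is simple to reproduce. The Python sketch below counts already-computed distances per range, assuming the same convention as the range aggregation (from is inclusive, to is exclusive). bucket_counts is a hypothetical helper and the sample distances are invented, so the counts will not match the response above.

```python
def bucket_counts(distances, ranges):
    """Count distances per range; 'from' is inclusive, 'to' is exclusive."""
    counts = []
    for rng in ranges:
        low = float(rng["from"]) if "from" in rng else None
        high = float(rng["to"]) if "to" in rng else None
        # Mirror the bucket keys seen in the response, e.g. "*-10.0"
        key = f"{'*' if low is None else low}-{'*' if high is None else high}"
        matched = sum(
            1 for d in distances
            if (low is None or d >= low) and (high is None or d < high)
        )
        counts.append((key, matched))
    return counts


if __name__ == "__main__":
    ranges = [{"to": 10}, {"from": 10, "to": 50},
              {"from": 50, "to": 100}, {"from": 100}]
    sample_distances_mi = [0, 5, 10, 49.9, 50, 100, 250]  # invented values
    for key, count in bucket_counts(sample_distances_mi, ranges):
        print(key, count)
```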

Geo Polygon with Geo Point

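A query that returns only the hits that fall within a polygon of points. Note that the geo_polygon query is deprecated as of Elasticsearch 7.12 in favor of the geo_shape query, which accepts polygons defined in GeoJSON or WKT.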

GET geo_cities_point/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_polygon": {
"location": {
"points": [
[
-122.35610961914062,
47.70514099299205
],
[
-122.48519897460936,
47.5626274374099
],
[
-122.28744506835938,
47.44852243794931
],
[
-122.15972900390624,
47.558920607496525
],
[
-122.2283935546875,
47.719001413201916
],
[
-122.35610961914062,
47.70514099299205
]
]
}
}
}
}
}
}
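
Conceptually, the polygon filter is a point-in-polygon test. The Python sketch below uses the classic ray-casting approach on plain lon/lat values, which is a planar approximation and ignores edge cases such as polygons crossing the antimeridian; it is not Elasticsearch's implementation. The function name and the two test points are chosen for illustration.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: ring is a list of [lon, lat] positions (closed or open)."""
    inside = False
    # Pair each vertex with the next one, wrapping around at the end
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > lat) != (y2 > lat):  # edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > lon:         # crossing lies to the east of the point
                inside = not inside
    return inside


# The polygon from the query above
QUERY_RING = [
    [-122.35610961914062, 47.70514099299205],
    [-122.48519897460936, 47.5626274374099],
    [-122.28744506835938, 47.44852243794931],
    [-122.15972900390624, 47.558920607496525],
    [-122.2283935546875, 47.719001413201916],
    [-122.35610961914062, 47.70514099299205],
]

if __name__ == "__main__":
    print(point_in_polygon(-122.3375, 47.6112, QUERY_RING))  # point near the centre
    print(point_in_polygon(-122.4443, 47.2529, QUERY_RING))  # point south of the polygon
```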

Geo Bounding Box with Geo Point

Matches geo_point and geo_shape values that intersect a bounding box. When geohashes are used to specify the edges of the bounding box, the geohashes are treated as rectangles. The bounding box is defined so that its top left corresponds to the top-left corner of the geohash specified in the top_left parameter and its bottom right corresponds to the bottom-right corner of the geohash specified in the bottom_right parameter.

Geopoints have limited precision and are always rounded down during index time. During query time, the upper boundaries of the bounding box are rounded down, while the lower boundaries are rounded up. As a result, points along the lower bounds (bottom and left edges of the bounding box) might not make it into the bounding box due to the rounding error. At the same time, points alongside the upper bounds (top and right edges) might be selected by the query even if they are located slightly outside the edge. The rounding error should be less than 4.20e-8 degrees on the latitude and less than 8.39e-8 degrees on the longitude, which translates to less than 1 cm of error even at the equator.

GET geo_cities_point/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 47.7328,
"lon": -122.448
},
"bottom_right": {
"lat": 47.468,
"lon": -122.0924
}
}
}
}
}
}
}
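
Ignoring the rounding behaviour described above, the bounding box test boils down to two range checks. Here is a minimal Python sketch; in_bounding_box is a hypothetical helper, and the antimeridian branch is an assumption about how a box whose left edge is east of its right edge should be read.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """top_left and bottom_right are {"lat": ..., "lon": ...} dicts."""
    top, left = top_left["lat"], top_left["lon"]
    bottom, right = bottom_right["lat"], bottom_right["lon"]
    if not (bottom <= lat <= top):
        return False
    if left <= right:
        return left <= lon <= right
    # left > right: treat the box as crossing the antimeridian
    return lon >= left or lon <= right


if __name__ == "__main__":
    top_left = {"lat": 47.7328, "lon": -122.448}       # from the query above
    bottom_right = {"lat": 47.468, "lon": -122.0924}
    print(in_bounding_box(47.6112, -122.3375, top_left, bottom_right))
    print(in_bounding_box(47.2529, -122.4443, top_left, bottom_right))
```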

Geo Shape Queries

All geo-shape queries require your data to be mapped using the geo_shape mapping. Using geo-shapes we can find documents that intersect with the query shape.

Geo Shape Query

Filters documents indexed using the geo_shape or geo_point type. Requires the geo_shape mapping or the geo_point mapping.

The geo_shape query uses the same grid square representation as the geo_shape mapping to find documents that have a shape that intersects with the query shape. For prefix-tree-based mappings, it also uses the same Prefix Tree configuration as defined for the field mapping. The query supports two ways of defining the query shape: either by providing a whole shape definition, or by referencing the name of a shape pre-indexed in another index. Both formats are shown below with examples.

Spatial Relations

The geo_shape strategy mapping parameter determines which spatial relation operators may be used at search time.

The following is a complete list of spatial relation operators available when searching a geo field:

  • INTERSECTS - (default) Return all documents whose geo_shape or geo_point field intersects the query geometry.
  • DISJOINT - Return all documents whose geo_shape or geo_point field has nothing in common with the query geometry.
  • WITHIN - Return all documents whose geo_shape or geo_point field is within the query geometry. Line geometries are not supported.
  • CONTAINS - Return all documents whose geo_shape or geo_point field contains the query geometry.

The following query uses the disjoint relation to return the cities that lie outside the given envelope:

GET geo_cities_shapes/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_shape": {
"location": {
"shape": {
"type": "envelope",
"coordinates": [ [ -122.35610961914062, 47.70514099299205 ], [ -122.2283935546875, 47.019001413201916 ] ]
},
"relation": "disjoint"
}
}
}
}
}
}
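
The four relations are easiest to see on simple rectangles. The Python sketch below evaluates them for axis-aligned boxes only, where a point is a box of zero width and height. It is a simplified illustration rather than Elasticsearch's geometry engine, and relation_holds is a made-up name.

```python
def relation_holds(doc_box, query_box, relation):
    """Boxes are (min_lon, min_lat, max_lon, max_lat) tuples."""
    d_min_lon, d_min_lat, d_max_lon, d_max_lat = doc_box
    q_min_lon, q_min_lat, q_max_lon, q_max_lat = query_box
    overlaps = (d_min_lon <= q_max_lon and q_min_lon <= d_max_lon
                and d_min_lat <= q_max_lat and q_min_lat <= d_max_lat)
    if relation == "intersects":
        return overlaps
    if relation == "disjoint":
        return not overlaps
    if relation == "within":      # the document lies inside the query shape
        return (q_min_lon <= d_min_lon and d_max_lon <= q_max_lon
                and q_min_lat <= d_min_lat and d_max_lat <= q_max_lat)
    if relation == "contains":    # the document encloses the query shape
        return relation_holds(query_box, doc_box, "within")
    raise ValueError(f"unknown relation: {relation}")


if __name__ == "__main__":
    # The envelope from the query above, rewritten as min/max values
    query = (-122.35610961914062, 47.019001413201916,
             -122.2283935546875, 47.70514099299205)
    inside_point = (-122.3375, 47.6112, -122.3375, 47.6112)
    outside_point = (-122.4443, 47.2529, -122.4443, 47.2529)
    for name in ("intersects", "disjoint", "within", "contains"):
        print(name, relation_holds(inside_point, query, name),
              relation_holds(outside_point, query, name))
```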

Pre-Indexed Shape

The query also supports using a shape that has already been indexed in another index. This is particularly useful when you have a pre-defined list of shapes and you want to reference the list using a logical name (for example New Zealand) rather than having to provide coordinates each time. In this situation, it is only necessary to provide:

  • id - The ID of the document containing the pre-indexed shape.
  • index - Name of the index where the pre-indexed shape is. Defaults to shapes.
  • path - The field, specified as a path, that contains the pre-indexed shape. Defaults to shape.
  • routing - The routing of the shape document, if required.

PUT shapes
{
"mappings": {
"properties": {
"location": {
"type": "geo_shape"
}
}
}
}

PUT shapes/_doc/test
{
"location": {
"type": "envelope",
"coordinates" : [[ -122.35610961914062, 47.70514099299205 ], [ -122.2283935546875, 47.019001413201916 ]]
}
}

GET geo_cities_shapes/_search
{
"query": {
"bool": {
"filter": {
"geo_shape": {
"location": {
"indexed_shape": {
"index": "shapes",
"id": "test",
"path": "location"
}
}
}
}
}
}
}
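
If you build these requests from application code, the two ways of supplying the query shape differ only in one key. The Python sketch below assembles the request body as a plain dictionary for either variant; geo_shape_filter is a hypothetical helper, and sending the body to Elasticsearch with a client of your choice is left out.

```python
import json


def geo_shape_filter(field, relation="intersects", shape=None, indexed_shape=None):
    """Build a geo_shape search body from an inline or a pre-indexed shape."""
    if (shape is None) == (indexed_shape is None):
        raise ValueError("provide exactly one of shape or indexed_shape")
    clause = {"relation": relation}
    if shape is not None:
        clause["shape"] = shape                   # whole shape definition
    else:
        clause["indexed_shape"] = indexed_shape   # reference to a stored shape
    return {"query": {"bool": {"filter": {"geo_shape": {field: clause}}}}}


if __name__ == "__main__":
    inline = geo_shape_filter(
        "location",
        relation="disjoint",
        shape={"type": "envelope",
               "coordinates": [[-122.35610961914062, 47.70514099299205],
                               [-122.2283935546875, 47.019001413201916]]},
    )
    stored = geo_shape_filter(
        "location",
        indexed_shape={"index": "shapes", "id": "test", "path": "location"},
    )
    print(json.dumps(inline, indent=2))
    print(json.dumps(stored, indent=2))
```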
