Elasticsearch Geo Point and Geo Shape Queries Explained
This story focuses on geospatial data handling in Elasticsearch, similar to what the PostGIS extension provides for PostgreSQL, or the geospatial features of MongoDB and Redis. In this article, we’ll look at Elasticsearch’s geo queries and how to set up mappings and indices, and we'll go through some examples of how to query your data.
Geo Data in Elasticsearch
Elasticsearch lets you represent geo data in two ways: geo_point and geo_shape.
Geo Point allows you to store data as latitude and longitude coordinate pairs. Use this field type when you want to filter data for distances between points, search within bounding boxes, or when using aggregations. There are a lot of features and options that you can specify which are beyond the scope of this article. We’ll cover a couple here, but you can view the options for Geo Bounding Box, Geo Distance, and Geo Aggregations in Elasticsearch’s documentation.
Use geo_shape when your geo data represents a shape, or when you want to query points within a shape. geo_shape data is supplied as GeoJSON (WKT is also accepted). With the legacy prefix-tree indexing strategy, each shape is decomposed into a grid of geohash (or quadtree) cells and indexed as terms, which makes it straightforward to determine the relationships between shapes. Newer versions index shapes in a BKD tree by default. Either way, these relationships can be queried with the intersects, disjoint, contains, and within spatial relation operators.
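Depending on your Elasticsearch version, geo_point and geo_shape fields cannot always be queried interchangeably. In older versions, for example, if you wanted to get all the cities within a specified polygon with a geo_shape query, you could not use cities indexed as geo_point. They had to be stored as GeoJSON "type": "Point" values and indexed as geo_shape. Recent versions also accept geo_point fields in the geo_shape query.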
Geo Point Field Type
Fields of type geo_point accept latitude-longitude pairs, which can be used:
- to find geo points within a bounding box, within a certain distance of a central point, or within a polygon or a geo_shape query;
- to aggregate documents geographically or by distance from a central point;
- to integrate distance into a document’s relevance score;
- to sort documents by distance.
A geo point can be supplied in five different formats. All of the examples below use the same mapping.
Mapping of Geo Point
PUT location_index
{
"mappings": {
"properties": {
"text" : {
"type" : "text"
},
"location": {
"type": "geo_point"
}
}
}
}
Geo Point as an object: an object with lat and lon properties.
PUT location_index/_doc/1
{
"text": "Geopoint as an object",
"location": {
"lat": 41.12,
"lon": -71.34
}
}
Geo Point as a string: a plain string in the format "lat,lon" (latitude first, separated by a comma).
PUT location_index/_doc/2
{
"text": "Geopoint as a string",
"location": "41.12,-71.34"
}
Geo Point as a geohash: the lat and lon pair encoded as a single base-32 geohash string. Online tools can convert coordinates into a geohash.
PUT location_index/_doc/3
{
"text": "Geopoint as a geohash",
"location": "drm3btev3e86"
}
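For readers curious about what a geohash string actually encodes, here is a minimal Python sketch of the standard geohash algorithm. It is my own illustration, not Elasticsearch code. It repeatedly halves the longitude and latitude ranges, alternating between them and starting with longitude, and packs every 5 resulting bits into one base-32 character. The function names geohash_encode and geohash_decode are hypothetical.

```python
from typing import Tuple

# Base-32 alphabet used by geohash (the letters a, i, l and o are omitted).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 12) -> str:
    """Encode a (lat, lon) pair as a geohash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate: longitude first, then latitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def geohash_decode(geohash: str) -> Tuple[float, float]:
    """Return the (lat, lon) centre of the cell described by `geohash`."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    use_lon = True
    for ch in geohash:
        value = _BASE32.index(ch)
        for shift in range(4, -1, -1):  # most significant bit first
            rng = lon_range if use_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if (value >> shift) & 1:
                rng[0] = mid
            else:
                rng[1] = mid
            use_lon = not use_lon
    return (lat_range[0] + lat_range[1]) / 2, (lon_range[0] + lon_range[1]) / 2


print(geohash_encode(41.12, -71.34, 5))  # drm3b
```

Each additional character shrinks the cell, so a 12-character hash pins a point down to well under a metre.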
Geo Point as an array: the coordinates as an array of doubles in the order [lon, lat] (longitude first, as in GeoJSON).
PUT location_index/_doc/4
{
"text": "Geopoint as an array",
"location": [ -71.34, 41.12 ]
}
Geo Point as a WKT point: the coordinates as a Well-Known Text primitive POINT(lon lat), again longitude first.
PUT location_index/_doc/5
{
"text": "Geopoint as a WKT POINT primitive",
"location" : "POINT (-71.34 41.12)"
}
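Whichever format a geo point is stored in, you can query it using any of the other formats. Be careful with the coordinate order, though: the object and string formats take lat, lon, while the array and WKT formats take lon, lat. Swapping latitude and longitude does not necessarily raise an error, but it gives unexpected results.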
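Because the formats disagree on coordinate order, a small normaliser helps when preparing data outside Elasticsearch. The Python sketch below uses a hypothetical helper, to_lat_lon, that covers the object, string, array and WKT formats (geohashes are not handled). It converts each format to a (lat, lon) tuple and applies a basic range check.

```python
import re
from typing import Tuple, Union

# "POINT (lon lat)" with optional spaces; WKT keywords are case-insensitive.
_WKT_POINT = re.compile(r"^\s*POINT\s*\(\s*([^\s()]+)\s+([^\s()]+)\s*\)\s*$", re.IGNORECASE)


def _checked(lat: float, lon: float) -> Tuple[float, float]:
    """Reject values outside the valid latitude/longitude ranges."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError(f"out of range: lat={lat}, lon={lon}")
    return lat, lon


def to_lat_lon(value: Union[dict, str, list]) -> Tuple[float, float]:
    """Normalise object, string, array and WKT geo_point values to (lat, lon).

    Geohash strings are deliberately not handled in this sketch.
    """
    if isinstance(value, dict):  # {"lat": ..., "lon": ...}
        return _checked(float(value["lat"]), float(value["lon"]))
    if isinstance(value, list):  # [lon, lat]: longitude first
        lon, lat = value
        return _checked(float(lat), float(lon))
    if isinstance(value, str):
        match = _WKT_POINT.match(value)
        if match:  # "POINT (lon lat)": longitude first
            lon, lat = match.groups()
            return _checked(float(lat), float(lon))
        if "," in value:  # "lat,lon": latitude first
            lat, lon = value.split(",")
            return _checked(float(lat), float(lon))
    raise ValueError(f"unsupported geo_point format: {value!r}")


print(to_lat_lon("POINT (-71.34 41.12)"))  # (41.12, -71.34)
```

The range check only catches some mistakes. For the sample point, swapping the values gives a latitude of -71.34, which is still valid, so the point would silently land somewhere else entirely.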
Geo Shape Field Type
The geo_shape data type supports indexing and searching arbitrary geo shapes such as rectangles and polygons. It should be used when either the data being indexed or the queries being executed contain shapes other than just points.
You can query documents of this type using a geo_shape query.
Mapping of Geo Shape
PUT /example
{
"mappings": {
"properties": {
"location": {
"type": "geo_shape"
}
}
}
}
GeoJSON type POINT: a single geographic coordinate. Note: Elasticsearch uses WGS-84 coordinates only.
POST /example/_doc
{
"location" : {
"type" : "point",
"coordinates" : [-77.03653, 38.897676]
}
}
GeoJSON type LINESTRING: an arbitrary line given two or more points.
POST /example/_doc
{
"location" : {
"type" : "linestring",
"coordinates" : [[-77.03653, 38.897676], [-77.009051, 38.889939]]
}
}
GeoJSON type POLYGON: a closed polygon whose first and last points must match, thus requiring n + 1 vertices to create an n-sided polygon, and a minimum of 4 vertices.
POST /example/_doc
{
"location" : {
"type" : "polygon",
"coordinates" : [
[ [-77.03653, 38.897676], [-77.03653, 37.897676], [-76.03653, 38.897676], [-77.03653, 38.997676], [-77.03653, 38.897676] ]
]
}
}
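The ring rules above are easy to check before indexing. This Python sketch uses a hypothetical validate_ring function that verifies a ring is closed and has at least four positions. It then returns the ring's signed area via the shoelace formula in plain lon/lat space, where a positive value means counter-clockwise, the orientation GeoJSON (RFC 7946) specifies for exterior rings. It does not detect self-intersections.

```python
from typing import List

Position = List[float]  # [lon, lat]


def validate_ring(ring: List[Position]) -> float:
    """Check basic GeoJSON linear-ring rules and return the ring's signed area.

    The area is computed with the shoelace formula in plain lon/lat space:
    a positive value means counter-clockwise order, a negative one clockwise.
    Self-intersections are not detected.
    """
    if len(ring) < 4:
        raise ValueError("a linear ring needs at least 4 positions")
    if ring[0] != ring[-1]:
        raise ValueError("the first and last positions must be identical")
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        twice_area += x1 * y2 - x2 * y1
    return twice_area / 2


# The outer ring from the POLYGON example above.
ring = [[-77.03653, 38.897676], [-77.03653, 37.897676], [-76.03653, 38.897676],
        [-77.03653, 38.997676], [-77.03653, 38.897676]]
print(validate_ring(ring) > 0)  # True: counter-clockwise
```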
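GeoJSON type MULTIPOLYGON: an array of separate polygons. Each polygon is an array of linear rings: the first ring is the outer boundary and any further rings are holes, so the second polygon below has a hole.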
POST /example/_doc
{
"location" : {
"type" : "multipolygon",
"coordinates" : [
[ [[-80.0, 40.0], [-79.0, 40.0], [-79.0, 39.0], [-80.0, 39.0], [-80.0, 40]] ],
[ [ [-78.0, 38.0], [-79.0, 38.0], [-79.0, 39.0], [-78.0, 39.0], [-78.0, 38.0] ],
[ [-78.2, 38.2], [-78.8, 38.2], [-78.8, 38.8], [-78.2, 38.8], [-78.2, 38.2] ] ]
]
}
}
GeoJSON type MULTIPOINT: an array of unconnected, but likely related, points.
POST /example/_doc
{
"location" : {
"type" : "multipoint",
"coordinates" : [
[-78.0, 38.0], [-79.0, 38.0]
]
}
}
GeoJSON type MULTILINESTRING: an array of separate line strings.
POST /example/_doc
{
"location" : {
"type" : "multilinestring",
"coordinates" : [
[ [-77.03, 38.89], [-78.03, 38.89], [-78.03, 39.89], [-78.03, 39.89] ],
[ [-76.03, 36.89], [-77.03, 36.89], [-77.03, 37.89], [-76.03, 37.89] ],
[ [-76.23, 36.69], [-76.03, 36.89], [-76.23, 36.89], [-76.23, 36.09] ]
]
}
}
GeoJSON type GEOMETRYCOLLECTION: a GeoJSON shape similar to the multi* shapes, except that multiple types can coexist (e.g., a Point and a LineString).
POST /example/_doc
{
"location" : {
"type": "geometrycollection",
"geometries": [
{
"type": "point",
"coordinates" : [-77.03653, 38.897676]
},
{
"type": "linestring",
"coordinates" : [[-77.03653, 38.897676], [-77.009051, 38.889939]]
}
]
}
}
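BBOX (ENVELOPE in Elasticsearch): a bounding rectangle, or envelope, specified by only its top-left and bottom-right points, in the order [[minLon, maxLat], [maxLon, minLat]]. Note that envelope is an Elasticsearch extension rather than a GeoJSON geometry type.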
POST /example/_doc
{
"location" : {
"type" : "envelope",
"coordinates" : [ [-77.03653, 38.897676], [-76.03653, 37.897676] ]
}
}
When a field is mapped as geo_point or geo_shape, Elasticsearch automatically parses the coordinates, validates them against the expected format, and indexes them.
Loading Data into Elasticsearch
The data that we’ll be using for this walkthrough is taken from the Washington State Department of Transportation (WSDOT) GeoData Catalogue. Download the shapefiles for “City Points” and “WSDOT Regions 24k”. City Points will give us the cities in Washington, while WSDOT Regions will provide us with regions designated by WSDOT. You can view the data before downloading by clicking View next to their download link. I have converted the shapefiles to GeoJSON format.
I have created a Node.js application that creates the indices and loads the data. Please follow the steps in the README at the GitHub link to load the data.
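The author's loader is a Node.js application. As a rough illustration of what such a loader has to produce, here is a Python sketch (not the author's code) that turns a GeoJSON FeatureCollection into the newline-delimited body expected by the _bulk API. It assumes the geometry goes into a field named location, as in the queries below, and that feature properties are copied as-is. features_to_bulk is a hypothetical name.

```python
import json


def features_to_bulk(feature_collection: dict, index: str, as_geo_point: bool) -> str:
    """Turn a GeoJSON FeatureCollection into a _bulk request body (NDJSON)."""
    lines = []
    for feature in feature_collection["features"]:
        geometry = feature["geometry"]
        if as_geo_point:
            if geometry["type"] != "Point":
                raise ValueError("a geo_point field can only hold points")
            location = geometry["coordinates"]  # stored in the [lon, lat] array format
        else:
            location = geometry  # geo_shape: pass the GeoJSON geometry through
        doc = dict(feature.get("properties") or {})
        doc["location"] = location
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))  # source line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


cities = {"type": "FeatureCollection", "features": [
    {"type": "Feature", "properties": {"name": "Seattle"},
     "geometry": {"type": "Point", "coordinates": [-122.3375, 47.6112]}}]}
print(features_to_bulk(cities, "geo_cities_point", as_geo_point=True))
```

The resulting string can be sent as the body of POST _bulk with the Content-Type application/x-ndjson. For the geo_point index only Point features are accepted, and their [lon, lat] coordinates are stored in the array format; for the geo_shape index the GeoJSON geometry is passed through unchanged.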
Geo Point Queries
Elasticsearch distinguishes between queries and filters. A query is scoring: it computes how well a document matches. A filter is non-scoring: it only determines whether a document matches. Since 2.x, queries and filters have been merged, so the same query can run in a scoring (query) context or a non-scoring (filter) context. Each has performance benefits and drawbacks, but the rule of thumb is to use scoring queries when the relevance score matters, and non-scoring filters for everything else. This is why the examples below wrap the geo queries in the filter clause of a bool query.
Now that our indices contain data, it’s time to start querying. We’ll look at some of the basic queries that we can use for geo_point and geo_shape.
Distance with Geo Point
Distance queries require the data to be stored as geo_point; the documentation lists the accepted input formats. The geo_distance query matches geo_point (and, in recent versions, geo_shape) values within a given distance of a point. The following query lists the locations within 10 miles of the point [-122.3375, 47.6112], given in [lon, lat] order.
GET geo_cities_point/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "10mi",
"location": [
-122.3375,
47.6112
]
}
}
}
}
}
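To make the 10-mile filter concrete, the Python sketch below computes great-circle distances with the haversine formula and keeps the points within a radius. It assumes a spherical Earth with a mean radius of about 6371.0088 km. Elasticsearch's own distance calculation and constants may differ slightly, so treat results very close to the boundary with care. haversine_mi and within_distance are hypothetical names.

```python
import math
from typing import List, Tuple

# Assumption: spherical Earth, mean radius 6371.0088 km, converted to miles.
EARTH_RADIUS_MI = 6371.0088 / 1.609344


def haversine_mi(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def within_distance(points: List[Tuple[float, float]],
                    center: Tuple[float, float],
                    max_mi: float) -> List[Tuple[float, float]]:
    """Keep the (lat, lon) points that lie at most `max_mi` miles from `center`."""
    lat0, lon0 = center
    return [p for p in points if haversine_mi(lat0, lon0, p[0], p[1]) <= max_mi]


# Same centre as the query above, written as (lat, lon).
print(within_distance([(47.70, -122.3375), (48.00, -122.3375)], (47.6112, -122.3375), 10))
# [(47.7, -122.3375)]
```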
Geo Distance Range Query
The geo_distance_range query has been removed. Instead, use the geo_distance query with pagination or the geo_distance aggregation, depending on your needs.
Geo Distance Aggregation
A multi-bucket aggregation that works on geo_point fields and conceptually works very similarly to the range aggregation. The user can define a point of origin and a set of distance range buckets. The aggregation evaluates the distance of each document value from the origin point and determines which buckets it belongs to based on the ranges (a document belongs to a bucket if its distance from the origin falls within the bucket's distance range).
Sometimes we need to know how many points fall within each distance range. The following aggregation counts the documents that are up to 10 mi, 10 to 50 mi, 50 to 100 mi, and more than 100 mi away from the origin, returning the number of matching documents per range.
GET geo_cities_point/_search?size=0
{
"aggs" : {
"data_around_city" : {
"geo_distance" : {
"unit": "mi",
"field" : "location",
"origin" : "47.6112, -122.3375",
"ranges" : [
{ "to" : 10 },
{"from" : 10, "to" : 50},
{"from" : 50, "to" : 100},
{"from" : 100}
]
}
}
}
}
The response (abbreviated) looks like this:
{
"aggregations" : {
"data_around_city" : {
"buckets" : [
{
"key" : "*-10.0",
"from" : 0.0,
"to" : 10.0,
"doc_count" : 12
},
{
"key" : "10.0-50.0",
"from" : 10.0,
"to" : 50.0,
"doc_count" : 77
},
{
"key" : "50.0-100.0",
"from" : 50.0,
"to" : 100.0,
"doc_count" : 57
},
{
"key" : "100.0-*",
"from" : 100.0,
"doc_count" : 135
}
]
}
}
}
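The bucketing rule of the range aggregation, where each range includes its from value and excludes its to value, can be mimicked in a few lines of Python. This hypothetical sketch takes precomputed distances in miles (for example, from a great-circle distance calculation) and produces buckets with keys in the same style as the response above.

```python
def bucket_key(r: dict) -> str:
    """Format a range as "from-to", using "*" for an open end (e.g. "*-10.0")."""
    lo = str(float(r["from"])) if "from" in r else "*"
    hi = str(float(r["to"])) if "to" in r else "*"
    return f"{lo}-{hi}"


def distance_buckets(distances, ranges):
    """Count distances per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for r in ranges:
        lo = r.get("from", float("-inf"))
        hi = r.get("to", float("inf"))
        count = sum(1 for d in distances if lo <= d < hi)
        buckets.append({"key": bucket_key(r), "doc_count": count})
    return buckets


# The same ranges as the aggregation request above.
RANGES = [{"to": 10}, {"from": 10, "to": 50}, {"from": 50, "to": 100}, {"from": 100}]
print(distance_buckets([5, 10, 49.9, 50, 150], RANGES))
```

Note how the distances 10 and 50 fall into the higher bucket, because a range excludes its to value.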
Geo Polygon with Geo Point
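The geo_polygon query returns only the hits that fall within a polygon of points (given here as [lon, lat] pairs). Note that geo_polygon was deprecated in Elasticsearch 7.12 in favor of the geo_shape query, which also works on geo_point fields.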
GET geo_cities_point/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_polygon": {
"location": {
"points": [
[
-122.35610961914062,
47.70514099299205
],
[
-122.48519897460936,
47.5626274374099
],
[
-122.28744506835938,
47.44852243794931
],
[
-122.15972900390624,
47.558920607496525
],
[
-122.2283935546875,
47.719001413201916
],
[
-122.35610961914062,
47.70514099299205
]
]
}
}
}
}
}
}
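Conceptually, the polygon filter is a point-in-polygon test. Here is a Python sketch using the classic ray-casting (even-odd) algorithm, applied to the polygon from the query above. It treats edges as straight lines in lon/lat space, which is a simplification on my part, and it is not meant to reproduce Elasticsearch's internal implementation. point_in_ring is a hypothetical name.

```python
from typing import List

# The polygon from the query above, as [lon, lat] pairs (first == last).
POLYGON = [
    [-122.35610961914062, 47.70514099299205],
    [-122.48519897460936, 47.5626274374099],
    [-122.28744506835938, 47.44852243794931],
    [-122.15972900390624, 47.558920607496525],
    [-122.2283935546875, 47.719001413201916],
    [-122.35610961914062, 47.70514099299205],
]


def point_in_ring(lon: float, lat: float, ring: List[List[float]]) -> bool:
    """Even-odd ray casting: cast a ray east from the point and count edge crossings."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # the edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # the crossing lies east of the point
                inside = not inside
    return inside


print(point_in_ring(-122.3375, 47.6112, POLYGON))  # True
```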
Geo Bounding Box with Geo Point
Matches geo_point and geo_shape values that intersect a bounding box. When geohashes are used to specify the edges of the bounding box, the geohashes are treated as rectangles: the top-left corner of the box is the top-left corner of the geohash given in the top_left parameter, and its bottom-right corner is the bottom-right corner of the geohash given in the bottom_right parameter.
Geo points have limited precision and are always rounded down at index time. At query time, the upper boundaries of the bounding box are rounded down, while the lower boundaries are rounded up. As a result, points along the lower bounds (bottom and left edges of the bounding box) might not make it into the bounding box due to the rounding error. At the same time, points along the upper bounds (top and right edges) might be selected by the query even if they are located slightly outside the edge. The rounding error is less than 4.20e-8 degrees in latitude and less than 8.39e-8 degrees in longitude, which translates to less than 1 cm even at the equator.
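The error bounds above line up with encoding each coordinate as a 32-bit integer: 180/2^32 ≈ 4.19e-8 degrees for latitude and 360/2^32 ≈ 8.38e-8 degrees for longitude. Assuming that encoding and the round-down behavior described above, the Python sketch below quantizes a coordinate. It also converts the worst-case longitude error into metres, using the approximation of about 111.32 km per degree at the equator.

```python
import math

# Assumed 32-bit encoding: one integer step covers this many degrees.
LAT_STEP = 180.0 / 2**32  # about 4.19e-8 degrees
LON_STEP = 360.0 / 2**32  # about 8.38e-8 degrees
METRES_PER_DEGREE_AT_EQUATOR = 111_320  # approximate


def quantize(value: float, step: float) -> float:
    """Round a coordinate down to a multiple of `step` (index-time rounding)."""
    return math.floor(value / step) * step


lat_error = 47.6112 - quantize(47.6112, LAT_STEP)
lon_error = -122.3375 - quantize(-122.3375, LON_STEP)
worst_case_metres = LON_STEP * METRES_PER_DEGREE_AT_EQUATOR
print(f"{worst_case_metres:.4f} m")  # prints 0.0093 m, just under 1 cm
```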
GET geo_cities_point/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_bounding_box": {
"location": {
"top_left": {
"lat": 47.7328,
"lon": -122.448
},
"bottom_right": {
"lat": 47.468,
"lon": -122.0924
}
}
}
}
}
}
}
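The bounding box filter itself boils down to two range checks. The Python sketch below (a hypothetical in_bounding_box that ignores the rounding effects described above) tests points against the box from the query. If the left edge lies east of the right edge, the sketch assumes the box wraps across the 180th meridian.

```python
def in_bounding_box(lat: float, lon: float, top_left: dict, bottom_right: dict) -> bool:
    """True if (lat, lon) lies inside the box given by its corner dicts."""
    if not bottom_right["lat"] <= lat <= top_left["lat"]:
        return False
    left, right = top_left["lon"], bottom_right["lon"]
    if left <= right:
        return left <= lon <= right
    # Left edge east of right edge: the box wraps across the 180th meridian.
    return lon >= left or lon <= right


# The corners from the query above.
TOP_LEFT = {"lat": 47.7328, "lon": -122.448}
BOTTOM_RIGHT = {"lat": 47.468, "lon": -122.0924}
print(in_bounding_box(47.6112, -122.3375, TOP_LEFT, BOTTOM_RIGHT))  # True
```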
Geo Shape Queries
The examples in this section query the geo_cities_shapes index, where the cities are indexed in a geo_shape field. With geo shapes, we can find documents whose shapes intersect with, are disjoint from, lie within, or contain the query shape.
Geo Shape Query
Filters documents indexed using the geo_shape or geo_point type. The field must be mapped as geo_shape or geo_point.
The geo_shape query uses the same representation as the geo_shape field mapping to find documents whose shape relates to the query shape. For fields that use the legacy prefix-tree strategy, it also uses the same prefix tree configuration as the mapping. The query supports two ways of defining the query shape: providing a whole shape definition, or referencing the name of a shape pre-indexed in another index. Both are shown below with examples.
Spatial Relations
For fields indexed with the legacy prefix-tree strategy, the strategy mapping parameter determines which spatial relation operators may be used at search time.
The following is a complete list of spatial relation operators available when searching a geo field:
- INTERSECTS (default): return all documents whose geo_shape or geo_point field intersects the query geometry.
- DISJOINT: return all documents whose geo_shape or geo_point field has nothing in common with the query geometry.
- WITHIN: return all documents whose geo_shape or geo_point field is within the query geometry. Line geometries are not supported.
- CONTAINS: return all documents whose geo_shape or geo_point field contains the query geometry.
The following query returns all cities whose shape is disjoint from (lies entirely outside) the given envelope:
GET geo_cities_shapes/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_shape": {
"location": {
"shape": {
"type": "envelope",
"coordinates": [ [ -122.35610961914062, 47.70514099299205 ], [ -122.2283935546875, 47.019001413201916 ] ]
},
"relation": "disjoint"
}
}
}
}
}
}
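To build intuition for the four relations, here is a Python sketch that evaluates them for axis-aligned rectangles, treating a city point as a zero-size rectangle. This is a deliberate simplification, since real shapes need full geometry. envelope_to_box and relations are hypothetical names, and the envelope is read as [[minLon, maxLat], [maxLon, minLat]], as in the query above.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def envelope_to_box(envelope: List[List[float]]) -> Box:
    """Convert an envelope [[min_lon, max_lat], [max_lon, min_lat]] to a Box."""
    (min_lon, max_lat), (max_lon, min_lat) = envelope
    return (min_lon, min_lat, max_lon, max_lat)


def relations(doc: Box, query: Box) -> Dict[str, bool]:
    """Evaluate the four spatial relations of a document box against a query box."""
    intersects = (doc[0] <= query[2] and query[0] <= doc[2]
                  and doc[1] <= query[3] and query[1] <= doc[3])
    within = (query[0] <= doc[0] and doc[2] <= query[2]
              and query[1] <= doc[1] and doc[3] <= query[3])
    contains = (doc[0] <= query[0] and query[2] <= doc[2]
                and doc[1] <= query[1] and query[3] <= doc[3])
    return {"intersects": intersects, "disjoint": not intersects,
            "within": within, "contains": contains}


query_box = envelope_to_box([[-122.35610961914062, 47.70514099299205],
                             [-122.2283935546875, 47.019001413201916]])
city = (-122.3375, 47.6112, -122.3375, 47.6112)  # a point as a zero-size box
print(relations(city, query_box))
# {'intersects': True, 'disjoint': False, 'within': True, 'contains': False}
```

A city at that position lies inside the envelope, so the disjoint query above would not return it.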
Pre-Indexed Shape
The query also supports using a shape that has already been indexed in another index. This is particularly useful when you have a predefined list of shapes and want to reference them by a logical name (for example, New Zealand) rather than providing the coordinates each time. In this situation, you only need to provide:
- id: the ID of the document that contains the pre-indexed shape.
- index: the name of the index where the pre-indexed shape is stored. Defaults to shapes.
- path: the field, given as a path, that contains the pre-indexed shape. Defaults to shape.
- routing: the routing of the shape document, if required.
PUT shapes
{
"mappings": {
"properties": {
"location": {
"type": "geo_shape"
}
}
}
}
PUT shapes/_doc/test
{
"location": {
"type": "envelope",
"coordinates" : [[ -122.35610961914062, 47.70514099299205 ], [ -122.2283935546875, 47.019001413201916 ]]
}
}
GET geo_cities_shapes/_search
{
"query": {
"bool": {
"filter": {
"geo_shape": {
"location": {
"indexed_shape": {
"index": "shapes",
"id": "test",
"path": "location"
}
}
}
}
}
}
}
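When building these requests from application code, the two ways of specifying the query shape map naturally onto one helper. The Python sketch below is hypothetical, since geo_shape_filter is not part of any client library. It returns a request body as a plain dict that could be serialized with json.dumps and sent to the _search endpoint.

```python
import json


def geo_shape_filter(field, *, shape=None, indexed_shape=None, relation="intersects"):
    """Build a bool/filter body for a geo_shape query.

    Pass either an inline `shape` (GeoJSON-style dict) or an `indexed_shape`
    reference ({"index": ..., "id": ..., "path": ...}), but not both.
    """
    if (shape is None) == (indexed_shape is None):
        raise ValueError("provide exactly one of shape or indexed_shape")
    clause = {"relation": relation}
    if shape is not None:
        clause["shape"] = shape
    else:
        clause["indexed_shape"] = indexed_shape
    return {"query": {"bool": {"filter": {"geo_shape": {field: clause}}}}}


# Equivalent to the pre-indexed shape query above.
body = geo_shape_filter("location",
                        indexed_shape={"index": "shapes", "id": "test", "path": "location"})
print(json.dumps(body))
```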