Designing Schema for Elasticsearch

Elasticsearch is a very popular search and analytics engine which helps you get up and running with search for your site or application in no time. This course covers how to improve search nuances by designing the right schema for your documents.
Course info
Rating
(23)
Level
Intermediate
Updated
Feb 23, 2018
Duration
2h 59m
Table of contents
Course Overview
Modeling Data in Elasticsearch
Managing Relational Content
Working with Geo-spatial Data
Designing for Scale
Description

You can get better search results than the basic out-of-the-box experience with Elasticsearch. In this course, Designing Schema for Elasticsearch, you will learn how to configure indexes to get more nuanced and meaningful search results. First, you will use dynamic and explicit mapping, which lets you specify the field types within your documents and in turn determines how those fields are indexed and searched. Next, you will learn how to map relationships and hierarchies from the traditional RDBMS world to the flat world of Elasticsearch. Finally, you will see Elasticsearch's special features for working with geospatial data such as GPS coordinates and time-based data such as log files, as well as aliasing indices to share them across multiple users for a better search experience. By the end of this course, you will have hands-on experience designing Elasticsearch indexes and mappings that work well with different kinds of data, such as hierarchical, geospatial, or time-based data.

About the author

A problem solver at heart, Janani has a Master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

Section Introduction Transcripts

Course Overview
Hi, my name is Janani Ravi, and welcome to this course on schema design in Elasticsearch. A little bit about myself. I have a Master's degree in Electrical Engineering from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold four patents for those underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content. Elasticsearch is a very popular search and analytics engine, which can help you get up and running with search for your website or application in no time at all. Elasticsearch seems schemaless and works well right out of the box. But as your search corpus grows, it's more important to get search that is meaningful, nuanced, and correct. That requires a little more work. This is exactly where this course comes in. This course covers topics such as dynamic and explicit mapping, which allows you to specify field types within your document, which in turn determines how your documents are indexed and searched. You'll also learn how you can map relationships and hierarchies from the traditional RDBMS world to the flat world of Elasticsearch. Application-side joins, nested objects, and parent-child relationships are some of the constructs that Elasticsearch offers to model relationships. Elasticsearch also has special features to work with geospatial data, such as GPS coordinates, and time-based data, such as log files. It's also possible to alias indices and share them across multiple users to get a better search experience. This course covers all this and more with a focus on real examples, queries, and setup.

Modeling Data in Elasticsearch
Hi, and welcome to this course on Designing Schema for Elasticsearch. Elasticsearch, as you might already know, is a search and analytics engine, which offers powerful search functionality for your website or your web app. It's very easy to get up and running with Elasticsearch. The default functionality that it provides is just great, but you might find when you become a power user that you need to model your data exactly right to get the most intuitive and logical search results. One of the reasons Elasticsearch is so popular is its simplicity and ease of use. You just have to add all the documents that you want indexed into Elasticsearch. It'll do the right thing and get search up and running for you in no time at all. But as your website or application grows and holds more complex forms of data, you might find that the search indices that you've set up don't work just as well right out of the box. Indexes have to be fine-tuned based on specific real-world use cases. This leads us to mappings. Mappings are data types associated with the fields within your document, which determine how they will be indexed and how they can be searched. Getting your mapping right is the difference between search that is intuitive and search that might seem, well, a little bizarre. In addition, you might have search documents that are in different languages or need to be parsed in a different way. Analyzers decide how search data will be parsed, and cleaned up, and used to build and populate indices.
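As a rough illustration of what an explicit mapping with a custom analyzer might look like, here is a minimal Python sketch that only builds the JSON body for a create-index request. The field names (title, tag, published, views) and the analyzer name blog_text are hypothetical, and the body uses the typeless mapping layout of Elasticsearch 7 and later; older releases such as those current when the course was recorded nest "properties" under a type name.

```python
import json


def build_index_body():
    """Return a create-index request body with an explicit mapping."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Hypothetical analyzer name. It splits text with the
                    # standard tokenizer, lowercases tokens, and drops
                    # stop words (the "stop" filter defaults to English).
                    "blog_text": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "stop"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                # Full-text field: analyzed, searched by relevance.
                "title": {"type": "text", "analyzer": "blog_text"},
                # Exact-value field: not analyzed, good for filters.
                "tag": {"type": "keyword"},
                "published": {"type": "date"},
                "views": {"type": "integer"},
            }
        },
    }


if __name__ == "__main__":
    # The printed JSON could be sent as the body of PUT /<index-name>.
    print(json.dumps(build_index_body(), indent=2))
```

The choice between "text" and "keyword" is the kind of mapping decision the transcript refers to: the same string is either tokenized for full-text search or kept whole for exact matching.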

Managing Relational Content
Hi, and welcome to this module on managing relational content in Elasticsearch. Now all the documents that we've indexed so far have been independent entities. They don't have interrelationships between them. Elasticsearch works very well in this use case because it treats documents like a flat structure of independent entities; however, in the real world different pieces of information might be connected together or associated together in some logical manner. If you want to query this information together, it should be stored together in Elasticsearch in denormalized form for your query to be performant. A common relationship that you might model in the real world is hierarchical information or parent-child relationships. Object hierarchy can be preserved using nested fields. If the parent-child hierarchy exists between entities which belong to the same index, they can be modeled using special parent-child constructs that Elasticsearch provides.
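The two constructs mentioned here can be sketched as request bodies built in Python. This is only an illustration under assumptions: the field names (comments, author, stars, relation) and the question/answer relation are hypothetical, the typeless mapping layout is that of Elasticsearch 7 and later, and the "join" field type shown is the parent-child construct of Elasticsearch 6 and later, which may differ from what the course itself demonstrates.

```python
import json


def nested_mapping():
    """Mapping that keeps each comment as its own hidden nested document."""
    return {
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "comments": {
                    "type": "nested",
                    "properties": {
                        "author": {"type": "keyword"},
                        "stars": {"type": "integer"},
                    },
                },
            }
        }
    }


def nested_query(author, min_stars):
    """Match posts where a single comment satisfies both conditions."""
    return {
        "query": {
            "nested": {
                "path": "comments",
                "query": {
                    "bool": {
                        "must": [
                            {"term": {"comments.author": author}},
                            {"range": {"comments.stars": {"gte": min_stars}}},
                        ]
                    }
                },
            }
        }
    }


def join_mapping():
    """Mapping with a join field relating parent and child documents
    that live in the same index (hypothetical question/answer relation)."""
    return {
        "mappings": {
            "properties": {
                "relation": {
                    "type": "join",
                    "relations": {"question": "answer"},
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(nested_query("alice", 4), indent=2))
```

Nested fields suit hierarchies that are always read and written with their parent document, while a join field lets children be indexed and updated separately.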

Working with Geo-spatial Data
Hi, and welcome to this new module where we'll see how we can use Elasticsearch to work with geospatial data. Elasticsearch has special constructs, that is, special data types, that allow you to work with geographic data. These special data types can be used to represent point locations, as well as shapes on the surface of the earth. Geo-points are used to map single locations. These use latitude-longitude pairs to plot these locations on a map. These point locations are basically coordinates such as your smartphone might send out while performing GPS tracking. There are a number of search queries and aggregations that you can perform that are specific to geographic data. For example, you can find all points that lie within a certain bounding box, say, if you want to find all restaurants within a certain region. You can also find all points within a certain distance of the origin, or all points within a polygon or any arbitrary shape. In addition to single-point locations on a map, you can also represent arbitrary shapes. You can query geo-shape data to determine whether the locations that you've set up overlap with a certain region.
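To make the bounding-box and distance searches concrete, here is a small Python sketch that builds a geo_point mapping and the two query bodies. The field name location and the sample coordinates are hypothetical, and the mapping again assumes the typeless layout of Elasticsearch 7 and later.

```python
import json


def geo_mapping():
    """Mapping with a geo_point field holding a latitude-longitude pair."""
    return {
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                "location": {"type": "geo_point"},
            }
        }
    }


def bounding_box_query(top, left, bottom, right):
    """Find documents whose location lies inside a rectangle."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_bounding_box": {
                        "location": {
                            "top_left": {"lat": top, "lon": left},
                            "bottom_right": {"lat": bottom, "lon": right},
                        }
                    }
                }
            }
        }
    }


def distance_query(lat, lon, distance):
    """Find documents within `distance` (for example "5km") of an origin."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        "location": {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical search: restaurants within 5 km of a point in Bangalore.
    print(json.dumps(distance_query(12.97, 77.59, "5km"), indent=2))
```

Both queries are placed in a filter clause because a location is either inside the region or not, so relevance scoring adds nothing.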

Designing for Scale
Hi, and welcome to this module where we'll structure Elasticsearch to work in specific real-world use cases. We'll design for scale. A common use case for searching and indexing of data is logs. Elasticsearch has special constructs to work with time-based data such as logs. Logs are different in that recent data is more relevant than older data, so typically all your searches will hit recent logs. Your older logs may be archived and may never be queried. An index in Elasticsearch is really useful only if you're storing a huge number of documents within it. If you have multiple small indices, you may not harness the full power of Elasticsearch. A way around this is to have multiple smaller indices share the same underlying larger index using shards. We'll see how to do that in this module. If you have multiple users of Elasticsearch where each user doesn't have very much data to be stored, you can have all of these users share the same underlying index, but have each user believe that they have their own dedicated index. This can be done using aliases. And finally, we'll see how we can auto-scale indices by adding replicas on the fly.
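The ideas in this module can be sketched as three small Python helpers that build index names and request bodies. Everything named here is an assumption for illustration: the logs-YYYY.MM.DD naming pattern, the shared index name, the user_id field, and the user_<id> alias format are hypothetical, and the course may set these up differently.

```python
import json
from datetime import date


def daily_index_name(prefix, day):
    """Name a time-based index so old days can be archived or dropped whole."""
    return f"{prefix}-{day:%Y.%m.%d}"


def user_alias_action(shared_index, user_id):
    """Body for POST /_aliases: a filtered alias that makes one shared
    index look like a dedicated index for a single user."""
    return {
        "actions": [
            {
                "add": {
                    "index": shared_index,
                    "alias": f"user_{user_id}",
                    # Only this user's documents are visible via the alias.
                    "filter": {"term": {"user_id": user_id}},
                    # Keep the user's documents together on one shard.
                    "routing": str(user_id),
                }
            }
        ]
    }


def replica_settings(count):
    """Body for PUT /<index>/_settings: replicas can be changed on a
    live index, unlike the number of primary shards."""
    return {"index": {"number_of_replicas": count}}


if __name__ == "__main__":
    print(daily_index_name("logs", date(2018, 2, 23)))
    print(json.dumps(user_alias_action("shared_users", 42), indent=2))
    print(json.dumps(replica_settings(2)))
```

With the alias in place, a search against user_42 behaves as if that user had a private index, while the cluster maintains only the single shared one.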