Searching and Analyzing Data with Elasticsearch: Getting Started

Elasticsearch is a popular enterprise search engine that lets you build powerful search capabilities. This course focuses on understanding search components and algorithms from first principles, and on applying them in practice using REST APIs.
Course info
Rating
(67)
Level
Intermediate
Updated
June 16, 2017
Duration
2h 46m
Description

Elasticsearch is one of the most popular open source technologies, which allows you to build and deploy efficient and robust search quickly. In this course, Searching and Analyzing Data with Elasticsearch: Getting Started, you'll be introduced to Elasticsearch by learning the basic building blocks of search algorithms, and how the basic data structure at the heart of every search engine works. First, you'll cover how to install and set up a single node server, index and update documents whose contents you want to search, perform a variety of search queries on these document contents, and run analysis to extract insights from your data. Next, you'll explore the TF/IDF algorithm for search ranking and relevance, and the important factors which determine how a document is scored for every search term. Finally, you'll learn how Elasticsearch handles a variety of searches, such as full-text queries, term queries, compound queries, and filters. You'll also run analytical queries on interesting data subsets specified by search terms. By the end of this course, you'll have the necessary knowledge to utilize Elasticsearch in practice.

About the author

A problem solver at heart, Janani has a master's degree from Stanford and worked for 7+ years at Google. She was one of the original engineers on Google Docs and holds 4 patents for its real-time collaborative editing framework.

Transcript

Hi, my name is Janani Ravi, and welcome to this course on searching and analyzing data using Elasticsearch. To introduce myself: I have a master's in EE from Stanford and have worked at companies such as Microsoft, Google, and Flipkart. At Google, I was one of the first engineers working on real-time collaborative editing in Google Docs, and I hold 4 patents for its underlying technologies. I currently work on my own startup, Loonycorn, a studio for high-quality video content.

Search is a core part of almost any product today. Elasticsearch is one of the most popular open source technologies, and it lets you build and deploy efficient, robust search quickly.

This course helps you understand the basic building blocks of search algorithms and focuses on the inverted index, the data structure at the heart of every search engine.
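To make the idea of an inverted index concrete, here is a minimal Python sketch. It is only an illustration of the data structure, not how Elasticsearch implements it internally; the function names and sample documents are made up for this example, and the "analysis" step is assumed to be nothing more than lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumed analysis step: lowercase, then split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


def search(index, word):
    """Look up a single word; unknown words match no documents."""
    return index.get(word.lower(), [])


# Hypothetical sample documents, keyed by document ID.
docs = {
    1: "quick brown fox",
    2: "quick search engine",
    3: "brown bear",
}
index = build_inverted_index(docs)
print(search(index, "quick"))  # [1, 2]
print(search(index, "brown"))  # [1, 3]
```

The point of the structure is that a lookup goes from word to documents directly, so a search never has to scan the text of every document.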

Learn Elasticsearch from first principles: install and set up a single-node search server, index and update the documents whose contents you want to search, perform a variety of search queries on these documents, and finally run analysis to extract insights from your data.

All of these are implemented using queries specified in JSON notation on Elasticsearch's REST API.
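As a rough sketch of what a JSON query sent to the REST API can look like, the Python snippet below builds a search request without sending it. The index name `courses`, the field `title`, and the helper `build_search_request` are assumptions made for illustration; `http://localhost:9200` assumes a single node running locally on Elasticsearch's default HTTP port.

```python
import json
import urllib.request


def build_search_request(base_url, index, query_body):
    """Build (but do not send) a POST request to the index's _search endpoint."""
    url = f"{base_url}/{index}/_search"
    data = json.dumps(query_body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical query: full-text match on an assumed "title" field.
body = {"query": {"match": {"title": "elasticsearch"}}}
request = build_search_request("http://localhost:9200", "courses", body)

print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# Sending the request requires a running server, for example:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the exact JSON that would go over the wire.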

Understand the theory behind search: the TF/IDF algorithm for search ranking and relevance, and the important factors that determine how a document is scored for each search term.
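The intuition behind TF/IDF can be sketched in a few lines of Python. This is a textbook variant (term frequency divided by document length, multiplied by the log of the inverse document frequency), chosen for clarity; it should not be read as the exact scoring formula Elasticsearch uses, and the sample documents are invented.

```python
import math


def tf_idf(term, doc_words, all_docs):
    """Score one term for one document using a simple textbook TF/IDF."""
    # Term frequency: how often the term appears, relative to document length.
    tf = doc_words.count(term) / len(doc_words)
    # Document frequency: how many documents contain the term at all.
    df = sum(1 for words in all_docs if term in words)
    if df == 0:
        return 0.0
    # Inverse document frequency: rarer terms get a higher weight.
    idf = math.log(len(all_docs) / df)
    return tf * idf


# Hypothetical, already-tokenized documents.
docs = [
    "elasticsearch is a search engine".split(),
    "a library is a quiet place".split(),
    "search search search".split(),
]
for words in docs:
    print(" ".join(words), "->", round(tf_idf("search", words, docs), 3))
```

In this sketch, a document that mentions the term more often relative to its length scores higher, and a term that appears in every document scores zero because it does nothing to distinguish one document from another.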

Learn how Elasticsearch handles a variety of searches, such as full-text queries, term queries, compound queries, and filters. Finally, we'll run analytical queries on interesting data subsets specified by search terms.
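The query types mentioned above might look roughly like the following request bodies, written here as Python dictionaries and printed as JSON. The field names (`description`, `level`, `duration_minutes`) and values are hypothetical; the bodies are a sketch of the general shape of the Query DSL, and details can vary between Elasticsearch versions.

```python
import json

# Full-text query: the search text is analyzed before matching.
full_text_query = {"query": {"match": {"description": "search engine"}}}

# Term query: matches the exact term as stored in the index.
term_query = {"query": {"term": {"level": "intermediate"}}}

# Compound query: a scored "must" clause combined with a "filter" clause,
# which narrows the results without contributing to the score.
compound_query = {
    "query": {
        "bool": {
            "must": [{"match": {"description": "search engine"}}],
            "filter": [{"range": {"duration_minutes": {"lte": 180}}}],
        }
    }
}

# Analytical query: aggregate over the subset selected by a search term.
# "size": 0 asks for the aggregation result only, with no matching documents.
analytics_query = {
    "size": 0,
    "query": {"match": {"description": "search"}},
    "aggs": {"average_duration": {"avg": {"field": "duration_minutes"}}},
}

queries = {
    "full_text": full_text_query,
    "term": term_query,
    "compound": compound_query,
    "analytics": analytics_query,
}
for name, body in queries.items():
    print(name, json.dumps(body))
```

Each body would be sent to an index's `_search` endpoint; the last one shows how a search term can define the data subset that an aggregation runs over.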

Experience the full power of Elasticsearch as a search and analytical engine.

Pig is an open source engine that is part of the Hadoop ecosystem of technologies. Pig is great at working with data that falls outside traditional data warehouses. It deals well with missing, incomplete, and inconsistent data that has no schema. Pig has its own language for expressing data manipulations, called Pig Latin.

This course starts from the very basics with an overview of Pig, then shows you how to install Pig and get started working with the Grunt shell. You'll see how to load data into relations and store transformed results to files using the load and store commands.

The main focus of the course is on how this data can be transformed to make it more useful for analysis. It'll cover the foreach-generate command along with evaluation and filter functions.

You'll also work on a real world dataset where you analyze accidents in NYC using collision data from the City of New York.

And finally, we'll cover advanced constructs such as the nested foreach, and get a brief glimpse into the world of MapReduce, the parallel programming paradigm.