Description
Course info
Level
Intermediate
Updated
Jun 2, 2020
Duration
1h 33m
Description

In this course, Creating Named Entity Recognition Systems with Python, you'll look at how data professionals and software developers use the Python language to build named entity recognition systems.

First, you'll explore the unique ability of such systems to perform information retrieval by identifying specific classes of entities in texts.

Next, you'll learn how to install the prerequisite tools and how to create, step by step, the specific components of performant Named Entity Recognition (NER) systems. Finally, you'll be able to create NER systems by leveraging the language’s powerful set of open-source NLP libraries. When you’re finished with this course, you’ll have the skills and knowledge needed to create named entity recognition systems with Python.

About the author

Andrei is a passionate data scientist. He started his career in tech in the automotive industry. After that, he pursued a PhD in computer science at Delft University of Technology, the Netherlands. Since graduating, he has worked with large datasets in domains ranging from scientific research to energy and utilities. He is currently a data science consultant working mainly with NLP tools. He enjoys being part of the analytics community and regularly attends conferences and meetups.

Section Introduction Transcripts

Course Overview
Hi, my name is Andrei Pruteanu, and welcome to this course on Creating Named Entity Recognition Systems with Python. I'll introduce myself. I have a PhD in computer science from Delft University of Technology, the Netherlands, and have worked for companies such as NXP Semiconductors and Digital Science. At Digital Science, I was responsible for back‑end processing of large volumes of text documents such as clinical trials and policy documents. I am currently a freelance data scientist covering areas such as NLP and time series processing. This course covers the creation of named entity recognition systems. We'll begin by understanding the most important component of such systems, the classification model, and the classification metrics used for evaluating its performance: precision, recall, and F1 score. We start with classic machine learning approaches for classification, namely linear regression, decision trees, naive Bayes, logistic regression, and support vector classifier, and use their implementations from the scikit‑learn Python library. We use the CRFsuite version of conditional random fields for creating more accurate entity detection models, starting with specific preprocessing that converts the raw dataset into a context‑aware data format. We use a technique called hyperparameter tuning to improve the model's performance even further by looking for close to optimal model parameters. We check what the CRF model has learned about entity classification using the ELI5 machine learning explainability library. Lastly, we compare the performance of CRF models against a custom non‑tuned entity recognition system trained with one of the most popular NLP libraries, spaCy. We show how the tuned CRF model compares against the non‑tuned spaCy version and use the library's visualization capabilities to observe its accuracy for a random text selection. Before beginning this course, I recommend being familiar with the basics of the Python language. The beginner's course in Python, available on Pluralsight, can quickly get you up to speed. I hope you'll join me to learn how to create named entity recognition systems with Python at Pluralsight.
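
To make the evaluation metrics mentioned in the overview concrete, here is a minimal sketch of how precision, recall, and F1 score could be computed for one entity label at the token level. It is not taken from the course materials: the function name `precision_recall_f1`, the tag names `PER`, `LOC`, and `O`, and the toy tag sequences are all assumptions made for illustration, and the sketch uses only the Python standard library rather than scikit‑learn.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    true_tags: Sequence[str], predicted_tags: Sequence[str], label: str
) -> Tuple[float, float, float]:
    """Token-level precision, recall, and F1 for a single entity label.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(true_tags) != len(predicted_tags):
        raise ValueError("tag sequences must have the same length")

    pairs = list(zip(true_tags, predicted_tags))
    # Tokens correctly predicted as `label`.
    true_pos = sum(1 for t, p in pairs if t == label and p == label)
    # Tokens predicted as `label` that actually carry another tag.
    false_pos = sum(1 for t, p in pairs if t != label and p == label)
    # Tokens that carry `label` but were predicted as something else.
    false_neg = sum(1 for t, p in pairs if t == label and p != label)

    predicted_total = true_pos + false_pos
    actual_total = true_pos + false_neg
    precision = true_pos / predicted_total if predicted_total else 0.0
    recall = true_pos / actual_total if actual_total else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data (assumed): one tag per token, "O" marks a non-entity token.
true_tags = ["PER", "O", "O", "LOC", "O", "PER"]
predicted_tags = ["PER", "O", "PER", "LOC", "O", "O"]

precision, recall, f1 = precision_recall_f1(true_tags, predicted_tags, "PER")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy example the `PER` label has one correct prediction, one spurious prediction, and one missed token, so all three metrics come out to 0.5. Entity-level evaluation, which scores whole multi-token spans rather than individual tokens, is stricter and would need a different counting scheme.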