Named entity recognition (NER) is a natural language processing task that identifies named entities in text, such as people, places, organizations, dates, or other categories. It can be used alone or alongside topic identification, and it adds semantic knowledge to the content, helping us understand the subject of a given text.
Named entity recognition is an important area in research and text mining. Some use cases are identifying places or people mentioned in a tweet, extracting key parts from customer feedback, and complementing or assisting sentiment analysis. In this guide, you will learn how to perform named entity recognition in Azure Machine Learning Studio.
You will use a dataset containing a single column with two texts. The first text is about the movie Avengers: Endgame and the second is about Pluralsight.
First text: Avengers: Endgame is a 2019 American superhero film based on the Marvel Comics superhero team the Avengers, produced by Marvel Studios and distributed by Walt Disney Studios Motion Pictures. The movie features an ensemble cast including Robert Downey Jr., Chris Evans, Mark Ruffalo, Chris Hemsworth, and others. (Source: Wikipedia).
Second text: Pluralsight, Inc. is an American publicly held online education company that offers a variety of video training courses for software developers, IT administrators, and creative professionals through its website. Founded in 2004 by Aaron Skonnard, Keith Brown, Fritz Onion, and Bill Williams, the company has its headquarters in Farmington, Utah. As of July 2018, it uses more than 1,400 subject-matter experts as authors, and offers more than 7,000 courses in its catalog. Since first moving its courses online in 2007, the company has expanded, developing a full enterprise platform, and adding skills assessment modules. (Source: Wikipedia).
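If you need to create the dataset file yourself, the two texts can be written into a single-column CSV file. The sketch below shows one minimal way to do that with Python's standard library. The column header `Text`, the helper names `write_dataset` and `read_dataset`, and the shortened sample strings (only the first sentence of each text) are assumptions made for illustration; the file name `ner.csv` is the one used in this guide.

```python
import csv

# First sentence of each text only; paste the full paragraphs in practice.
TEXTS = [
    "Avengers: Endgame is a 2019 American superhero film based on the "
    "Marvel Comics superhero team the Avengers, produced by Marvel Studios "
    "and distributed by Walt Disney Studios Motion Pictures.",
    "Pluralsight, Inc. is an American publicly held online education company "
    "that offers a variety of video training courses for software developers, "
    "IT administrators, and creative professionals through its website.",
]


def write_dataset(path, texts, header="Text"):
    """Write one text per row under a single column (header name is assumed)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # quotes fields that contain commas
        writer.writerow([header])
        for text in texts:
            writer.writerow([text])


def read_dataset(path):
    """Read the file back as a list of rows to check its shape."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


write_dataset("ner.csv", TEXTS)
rows = read_dataset("ner.csv")
print(len(rows) - 1, "text rows,", len(rows[0]), "column")
```

Because the texts contain commas, writing them through `csv.writer` rather than joining strings by hand keeps each text in a single field.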
Start by loading the data into the workspace.
Once you have logged into your Azure Machine Learning Studio account, click on the EXPERIMENTS option listed on the left sidebar, followed by the NEW button.
Next, click on the blank experiment and a new workspace will open. Give the name Named Entity Recognition to the workspace.
Next, you will load the data into the workspace. Click NEW, and select the DATASET option shown below.
This selection opens a window, shown below, from which you can upload the dataset from your local system.
Once the data is loaded, you can see it in the Saved Datasets option. The file name is ner.csv. The next step is to drag it from the Saved Datasets list into the workspace. To explore this data, right-click and select the Visualize option as shown below.
You can see there is a single column with rows containing the two texts highlighted above.
The Named Entity Recognition module is used to identify things such as names, organizational entities, places, etc. Start by typing "named entity" in the search bar to find the module, and then drag it into the workspace.
In the output above, you can see the Story port. This is the input port to which you connect the text data from which entities will be extracted. Connect the data to this port as shown below, and then run the experiment.
Once the module run is complete, you can right-click and select the Visualize option to look at the results.
The output below shows the result of the processing done in the Named Entity Recognition module. The output contains 14 rows and five columns. The Article variable identifies the input text row each entity came from, one each for the Avengers and Pluralsight texts. The Mention variable indicates the part of the sentence identified. The Type variable contains the type of the entity. For example, Marvel Studios is recognized as ORG, which stands for organization. Similarly, Chris Evans is recognized as a person, denoted as PER.
Scroll down to look at the output for the second text, on Pluralsight. Pluralsight, Inc is recognized as ORG, while Aaron Skonnard is correctly identified as PER. Also, you'll notice that the module has correctly recognized Farmington and Utah as locations, denoted as LOC.
This simple dataset gives insight into the power of the Named Entity Recognition module to identify names, locations, and organizations.
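Once the results are copied or exported out of the Studio, the rows can be summarized by entity type. The following is a minimal sketch in Python and is not part of the module itself: the rows are typed in by hand from the mentions discussed above, the zero-based Article values are an assumption, and `group_by_type` is a hypothetical helper name.

```python
from collections import defaultdict

# (Article, Mention, Type) rows copied by hand from the results described above.
# The Article numbering (0 for Avengers, 1 for Pluralsight) is an assumption.
ROWS = [
    (0, "Marvel Studios", "ORG"),
    (0, "Chris Evans", "PER"),
    (1, "Pluralsight, Inc", "ORG"),
    (1, "Aaron Skonnard", "PER"),
    (1, "Farmington", "LOC"),
    (1, "Utah", "LOC"),
]


def group_by_type(rows):
    """Map each entity type to the list of mentions carrying that type."""
    grouped = defaultdict(list)
    for _article, mention, entity_type in rows:
        grouped[entity_type].append(mention)
    return dict(grouped)


grouped = group_by_type(ROWS)
for entity_type, mentions in sorted(grouped.items()):
    print(entity_type, mentions)
```

Grouping by the Type column in this way gives a quick view of which people, organizations, and locations each run surfaced, which is often the first step before using the entities in a downstream task.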
Named entity recognition is an advanced area of natural language processing, and businesses are using it to extract information about named entities from text data. Content recommendation, improvement in website interactivity, concept extraction, and text classification are some common applications of named entity recognition. You can learn more with this guide on Python.
To learn more about data science and machine learning using Azure Machine Learning Studio, please refer to the following guides: