As users of software applications, we can interact with technology in many ways. We can now talk to a computer and have it understand what we are telling it. We can use computer vision to have a machine recognize facial expressions. But by and large, we still communicate with computers mostly through plain old text, as the enormous amount of text data in existence shows. How can we extract meaningful and actionable insights from that text data?
Azure Cognitive Services includes the Text Analytics service, which gives developers an easy-to-use REST API and client libraries in several mainstream languages, including C#, JavaScript, and Python. The Text Analytics service provides several features that are commonly needed when working with text in software applications today. In this guide, we will discuss three: sentiment analysis, named entity recognition, and language detection. When using Azure Cognitive Services, you as a developer don't need to know anything about machine learning or natural language processing. If you can call a REST API or use one of the client libraries, you can integrate text analytics into your next project! This guide demonstrates how to use the Text Analytics service with the C# client library.
The Azure Text Analytics cognitive service has a common setup regardless of the features you need. In the Azure portal, search for the Text Analytics service and create a new resource. To experiment without incurring charges, select the Free pricing tier. When the resource is provisioned, click the Keys and Endpoints link on the left. This shows the API keys and endpoint needed to access the service. Treat the API keys like passwords!
First, you'll need to add the Azure.AI.TextAnalytics NuGet package to the project dependencies. The latest version as of the publication of this guide is 5.0.0.
Add variables for your API key and endpoint.
```csharp
static string API_KEY = "your-api-key";
static string ENDPOINT = "your-endpoint";
```
The API key is used to create an instance of AzureKeyCredential, which authenticates requests to your resource. The endpoint is used to create an instance of Uri, which is the base address for every request.
```csharp
// AzureKeyCredential lives in the Azure namespace; Uri lives in System.
var azureCredentials = new AzureKeyCredential(API_KEY);
var endpoint = new Uri(ENDPOINT);
```
All features of Text Analytics use methods on the TextAnalyticsClient class, so I'll create an instance:
```csharp
// TextAnalyticsClient lives in the Azure.AI.TextAnalytics namespace.
var client = new TextAnalyticsClient(endpoint, azureCredentials);
```
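Because the API key should be treated like a password, you may prefer not to hard-code it in source. The following is a minimal sketch that assumes the key and endpoint are stored in environment variables. The variable names TEXT_ANALYTICS_KEY and TEXT_ANALYTICS_ENDPOINT and the TextAnalyticsSetup class are hypothetical and were chosen only for this example.

```csharp
using System;
using Azure;
using Azure.AI.TextAnalytics;

public static class TextAnalyticsSetup
{
    // Hypothetical environment variable names; use whatever your deployment defines.
    public const string KeyVariable = "TEXT_ANALYTICS_KEY";
    public const string EndpointVariable = "TEXT_ANALYTICS_ENDPOINT";

    // Reads a required setting and fails fast with a clear message if it is missing.
    public static string GetRequiredSetting(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Environment variable '{name}' is not set.");
        }
        return value;
    }

    // Builds the client from the environment instead of hard-coded strings.
    public static TextAnalyticsClient CreateClient()
    {
        var credential = new AzureKeyCredential(GetRequiredSetting(KeyVariable));
        var endpoint = new Uri(GetRequiredSetting(EndpointVariable));
        return new TextAnalyticsClient(endpoint, credential);
    }
}
```

With the two variables set, `var client = TextAnalyticsSetup.CreateClient();` replaces the hard-coded setup. The key then never appears in source control.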
The first feature of the Text Analytics service this guide will discuss is sentiment analysis. As the name suggests, the service analyzes a piece of text and predicts whether its sentiment is positive, negative, or neutral. A whole document can also be labeled mixed when its sentences disagree. To analyze the sentiment of a piece of text, call the AnalyzeSentiment method.
```csharp
DocumentSentiment sentimentAnalysisResults = client.AnalyzeSentiment("Azure Cognitive Service is fantastic when you need to add AI to an application quickly");
```
Notice that while I have been using type inference to declare the client and other instances, here I explicitly declared the type as DocumentSentiment. This is because the actual return type of AnalyzeSentiment is Response<DocumentSentiment>, which converts implicitly to DocumentSentiment. With var, the variable would hold the Response wrapper instead, and you would need its Value property to reach the result. The DocumentSentiment has a SentenceSentiment for each sentence in the analyzed text, stored in the Sentences property. The SentenceSentiment has three properties of interest:
Text - the sentence itself
Sentiment - the prediction of Positive, Negative, or Neutral
ConfidenceScores - the model's confidence in each sentiment
The ConfidenceScores property has three values, one for each sentiment, each between 0.0 and 1.0 inclusive:
Positive
Negative
Neutral
For this text, the predicted Sentiment is Positive, with a score of 1.0 for Positive and 0.0 for both Negative and Neutral.
If I change the text to "It's not so great if you have specialized needs," the predicted Sentiment is Negative, and the score for Negative is 1.0.
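You can loop over the Sentences property to see the per-sentence results described above. The following is a minimal sketch, in which the SentimentReport class and its FormatScores helper are hypothetical names for illustration. It reads the Value property of the Response<DocumentSentiment> explicitly, which has the same effect as declaring the variable as DocumentSentiment.

```csharp
using System;
using System.Globalization;
using Azure;
using Azure.AI.TextAnalytics;

public static class SentimentReport
{
    // Formats the three confidence scores; a pure helper, so it can be checked offline.
    public static string FormatScores(double positive, double neutral, double negative) =>
        string.Format(CultureInfo.InvariantCulture,
            "positive={0:F2} neutral={1:F2} negative={2:F2}", positive, neutral, negative);

    public static void Print(TextAnalyticsClient client, string text)
    {
        // Keep the Response wrapper, then unwrap the DocumentSentiment via Value.
        Response<DocumentSentiment> response = client.AnalyzeSentiment(text);
        DocumentSentiment document = response.Value;

        Console.WriteLine($"Document: {document.Sentiment}");
        foreach (SentenceSentiment sentence in document.Sentences)
        {
            SentimentConfidenceScores scores = sentence.ConfidenceScores;
            Console.WriteLine($"{sentence.Sentiment}: \"{sentence.Text}\"");
            Console.WriteLine("  " + FormatScores(scores.Positive, scores.Neutral, scores.Negative));
        }
    }
}
```

Calling `SentimentReport.Print(client, "It's not so great if you have specialized needs.");` prints the document-level label, followed by one line per sentence with its three scores.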
The Text Analytics service can also recognize fourteen different categories of entities in text. These include the names of people, geographic locations, email addresses, and phone numbers in the United States and European Union. This is called named entity recognition (NER). Using NER is as simple as using sentiment analysis: you just call a different method on the TextAnalyticsClient instance. Provide the RecognizeEntities method with the text to analyze.
```csharp
var recognizeEntitiesResult = client.RecognizeEntities("Microsoft Azure is used all over the world, from Australia to Zimbabwe.");
```
The return value of RecognizeEntities has a Value property: a collection containing a CategorizedEntity for each entity detected in the text. The CategorizedEntity has three properties of interest:
Text - the entity itself
Category - the predicted category of the entity, from the list of fourteen
ConfidenceScore - a value between 0.0 and 1.0 inclusive, with 1.0 being the most certain about the predicted category
If you look at the list of categories in the Cognitive Services docs, you'll see that some categories have subcategories. These are exposed through the SubCategory property.
If the Text Analytics service is asked to find entities in the string "Microsoft Azure is used all over the world, from Australia to Zimbabwe," it finds three entities: "Microsoft Azure", "Australia", and "Zimbabwe". It recognizes "Australia" and "Zimbabwe" as geographic locations with high certainty, 0.91 and 0.87 respectively. However, it predicts that "Microsoft Azure" is an "Organization." This seems odd, as Azure is a software product. The service isn't very certain about the label either, with a score of just 0.51. If I modify the text to "Microsoft sells software all over the world from Australia to Zimbabwe," it predicts that "Microsoft" is an organization with a score of 0.79. The scores for Australia and Zimbabwe remain high, and it also predicts "software" to be a skill with a score of 0.8.
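Scores like the 0.51 for "Microsoft Azure" suggest you may want to ignore low-confidence entities. This sketch prints each entity's Text, Category, SubCategory, and ConfidenceScore, then keeps only the entities at or above a cutoff. The EntityReport class and the 0.7 cutoff are hypothetical choices, not service defaults.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Azure.AI.TextAnalytics;

public static class EntityReport
{
    // Hypothetical cutoff; tune it for your own data.
    public const double DefaultMinimumScore = 0.7;

    // Pure helper: true when a score is at or above the cutoff.
    public static bool MeetsThreshold(double score, double minimumScore = DefaultMinimumScore) =>
        score >= minimumScore;

    public static List<CategorizedEntity> Confident(
        TextAnalyticsClient client, string text, double minimumScore = DefaultMinimumScore)
    {
        // The Response<CategorizedEntityCollection> converts implicitly to the collection.
        CategorizedEntityCollection entities = client.RecognizeEntities(text);
        foreach (CategorizedEntity entity in entities)
        {
            // SubCategory is null when the category has no subcategory.
            Console.WriteLine($"{entity.Text}: {entity.Category}/{entity.SubCategory ?? "-"} ({entity.ConfidenceScore:F2})");
        }
        return entities.Where(e => MeetsThreshold(e.ConfidenceScore, minimumScore)).ToList();
    }
}
```

With this cutoff and the scores reported above, "Microsoft Azure" (0.51) would be dropped, while "Australia" (0.91) and "Zimbabwe" (0.87) would be kept.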
While this guide is written in English, English is only one of many languages that the Text Analytics service recognizes. Calling the DetectLanguage method on a TextAnalyticsClient instance returns a predicted language for the text. I've used Microsoft Translator to translate the string "Microsoft sells software all over the world" into Spanish, Russian, and Japanese.
```csharp
var spanish = "Microsoft vende software en todo el mundo.";
var russian = "Microsoft продает программное обеспечение по всему миру.";
var japanese = "マイクロソフトは世界中でソフトウェアを販売しています。";
```
As with sentiment analysis, I explicitly declare the type of the return value so that the Response<DetectedLanguage> is converted to a DetectedLanguage.
```csharp
DetectedLanguage detectSpanish = client.DetectLanguage(spanish);
DetectedLanguage detectRussian = client.DetectLanguage(russian);
DetectedLanguage detectJapanese = client.DetectLanguage(japanese);
```
The DetectedLanguage has a Name property, which is the predicted language for the text.
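Besides Name, DetectedLanguage also exposes Iso6391Name (the two-letter ISO 639-1 code, such as "es") and a ConfidenceScore. The following minimal sketch prints all three for each sample string. The LanguageReport class and its Describe helper are hypothetical names.

```csharp
using System;
using System.Globalization;
using Azure.AI.TextAnalytics;

public static class LanguageReport
{
    // Pure helper: a one-line summary of a detection result.
    public static string Describe(string name, string iso6391Name, double confidenceScore) =>
        string.Format(CultureInfo.InvariantCulture,
            "{0} ({1}), score {2:F2}", name, iso6391Name, confidenceScore);

    public static void Print(TextAnalyticsClient client, params string[] texts)
    {
        foreach (string text in texts)
        {
            // The Response<DetectedLanguage> converts implicitly to DetectedLanguage.
            DetectedLanguage language = client.DetectLanguage(text);
            Console.WriteLine(Describe(language.Name, language.Iso6391Name, language.ConfidenceScore));
        }
    }
}
```

For example, `LanguageReport.Print(client, spanish, russian, japanese);` prints one summary line for each translated sentence.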
If you check the results of analyzing the three translated sentences, you will find that Azure correctly predicts them to be Spanish, Russian, and Japanese. So the service works with languages written in the Latin alphabet, in the Cyrillic alphabet used for Russian, and in non-alphabetic writing systems like Japanese, which mixes kanji with kana.
Applications can use Azure Cognitive Services to add sentiment analysis, entity recognition, and language detection. The Text Analytics service requires no knowledge of machine learning and costs a fraction of a penny per transaction. Using the REST API or client libraries, developers can integrate the service into web and mobile applications in almost any language. The hard part, natural language processing, is handled by Microsoft. That is usually cheaper and more reliable than training machine learning models yourself. These problems have been solved before, so you get to focus on what makes your applications great! Thanks for reading!