As users of software applications, we can interact with technology in many ways. We can now talk to a computer and have it understand what we are telling it. We can use computer vision to have a machine recognize facial expressions. But by and large, we still communicate with computers through plain old text. The incredible amount of text data that exists is evidence of that. How can we extract meaningful and actionable insights from that text data?
The Azure Text Analytics cognitive service has a common setup regardless of the features you need. In the Azure portal, search for the Text Analytics service and create a new resource. To experiment without incurring charges, select the Free pricing tier. When the resource is provisioned, click the Keys and Endpoint link on the left. This will show you the API keys and endpoint needed to access the service. Treat the API keys like passwords!
First, you'll need to add the Azure.AI.TextAnalytics NuGet package to the project dependencies. The latest version as of the publication of this guide is 5.0.0.
Add variables for your API key and endpoint.
static string API_KEY = "your-api-key";
static string ENDPOINT = "your-endpoint";
The API key is used to create an instance of AzureKeyCredential, which authenticates the account owner. The endpoint is used to create an instance of Uri, which is the base address of the service.
var azureCredentials = new AzureKeyCredential(API_KEY);
var endpoint = new Uri(ENDPOINT);
All features of Text Analytics use methods on the TextAnalyticsClient class, so I'll create an instance:
var client = new TextAnalyticsClient(endpoint, azureCredentials);
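Since the API keys should be treated like passwords, it may be safer to keep them out of source code. The following is a minimal sketch of the same setup that reads the key and endpoint from environment variables. The variable names TEXT_ANALYTICS_KEY and TEXT_ANALYTICS_ENDPOINT are hypothetical, chosen only for illustration; use whatever names fit your environment.

```csharp
using System;
using Azure;
using Azure.AI.TextAnalytics;

class Program
{
    static void Main()
    {
        // Hypothetical environment variable names; set them before running.
        string apiKey = Environment.GetEnvironmentVariable("TEXT_ANALYTICS_KEY");
        string endpointUrl = Environment.GetEnvironmentVariable("TEXT_ANALYTICS_ENDPOINT");

        if (string.IsNullOrEmpty(apiKey) || string.IsNullOrEmpty(endpointUrl))
        {
            Console.Error.WriteLine("Set TEXT_ANALYTICS_KEY and TEXT_ANALYTICS_ENDPOINT first.");
            return;
        }

        // The credential wraps the API key; the Uri is the base address of the resource.
        var azureCredentials = new AzureKeyCredential(apiKey);
        var endpoint = new Uri(endpointUrl);

        // Every Text Analytics feature in this guide is a method on this client.
        var client = new TextAnalyticsClient(endpoint, azureCredentials);
        Console.WriteLine("Client created for " + endpoint.Host);
    }
}
```

Creating the client does not call the service, so this sketch only fails at this stage if the variables are missing or the endpoint is not a valid URI.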
The first feature of the Text Analytics service this guide will discuss is sentiment analysis. The name is self-explanatory. The service analyzes a piece of text and returns a prediction of whether the sentiment is positive, neutral, or negative. To analyze the sentiment of a piece of text, call the AnalyzeSentiment method on the client:
DocumentSentiment sentimentAnalysisResults = client.AnalyzeSentiment("Azure Cognitive Service is fantastic when you need to add AI to an application quickly");
Notice that where I have been using type inference to declare instances of the client and other classes, here I explicitly used DocumentSentiment. This is because the actual return type of AnalyzeSentiment is Response<DocumentSentiment>, and declaring the variable as DocumentSentiment implicitly converts the response to the value it wraps. The DocumentSentiment will have a SentenceSentiment for each sentence in the analyzed text. These are stored in the Sentences property. The SentenceSentiment has three properties of interest.
Text- the sentence itself
Sentiment- the prediction of 'Positive', 'Neutral', or 'Negative'
ConfidenceScores- the confidence score for each sentiment
The ConfidenceScores property has three values, Positive, Neutral, and Negative, each between 0.0 and 1.0 inclusive.
The prediction for this text is a positive Sentiment, with a score of 1.0 for Positive and 0.0 for Neutral and Negative. If I change the text to "It's not so great if you have specialized needs," the predicted Sentiment is negative and the score for Negative is 1.0.
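To inspect these properties yourself, something like the following sketch may help. It assumes the key and endpoint are available in two hypothetical environment variables, TEXT_ANALYTICS_KEY and TEXT_ANALYTICS_ENDPOINT, and it prints the prediction and the three confidence scores for each sentence. The scores you see may differ from the ones quoted in this guide.

```csharp
using System;
using Azure;
using Azure.AI.TextAnalytics;

class Program
{
    static void Main()
    {
        // Hypothetical environment variable names; new Uri throws if the endpoint is missing.
        var client = new TextAnalyticsClient(
            new Uri(Environment.GetEnvironmentVariable("TEXT_ANALYTICS_ENDPOINT")),
            new AzureKeyCredential(Environment.GetEnvironmentVariable("TEXT_ANALYTICS_KEY")));

        // Two sentences, so the Sentences collection should contain two items.
        DocumentSentiment result = client.AnalyzeSentiment(
            "Azure Cognitive Service is fantastic when you need to add AI to an application quickly. " +
            "It's not so great if you have specialized needs.");

        Console.WriteLine($"Document sentiment: {result.Sentiment}");

        foreach (SentenceSentiment sentence in result.Sentences)
        {
            SentimentConfidenceScores scores = sentence.ConfidenceScores;
            Console.WriteLine($"\"{sentence.Text}\" -> {sentence.Sentiment}");
            Console.WriteLine(
                $"  positive={scores.Positive:F2} neutral={scores.Neutral:F2} negative={scores.Negative:F2}");
        }
    }
}
```

Passing both sentences in one call is just a convenience for the demonstration; calling AnalyzeSentiment once per string, as the guide does, works the same way.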
The Text Analytics service can also parse fourteen different categories of entities out of text. These include the names of people, geographic locations, email addresses, and phone numbers in the United States and European Union. This is called named entity recognition (NER). Using NER is as simple as using sentiment analysis. You just call a different method on the TextAnalyticsClient instance: provide the RecognizeEntities method with the text to analyze.
var recognizeEntitiesResult = client.RecognizeEntities("Microsoft Azure is used all over the world, from Australia to Zimbabwe.");
The return value of RecognizeEntities has a Value property, which is a collection containing one CategorizedEntity for each entity detected in the text. The CategorizedEntity has three properties of interest:
Text- the entity itself
Category- the predicted category of the entity from the list of fourteen
ConfidenceScore- a value between 0.0 and 1.0 inclusive, with 1.0 being the most certain in the predicted category
If you look at the list of categories in the Cognitive Services docs, you will see that some of the categories have subcategories, so CategorizedEntity also has a SubCategory property.
If the Text Analytics service is asked to find entities in the string "Microsoft Azure is used all over the world, from Australia to Zimbabwe," it will find three entities: "Microsoft Azure", "Australia", and "Zimbabwe". It recognizes "Australia" and "Zimbabwe" as geographic locations with high certainty, 0.91 and 0.87 respectively. However, it predicts that "Microsoft Azure" is an "Organization." This seems odd as Azure is a software product. And Azure isn't that certain about the label, either, with just a 0.51 score. If I modify the text to "Microsoft sells software all over the world from Australia to Zimbabwe," it predicts "Microsoft" is an organization with a score of 0.79. The scores for Australia and Zimbabwe are still quite high. And it also predicts "software" to be a skill with a score of 0.8.
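A short sketch for listing the detected entities is shown below. As before, it assumes the key and endpoint live in the hypothetical environment variables TEXT_ANALYTICS_KEY and TEXT_ANALYTICS_ENDPOINT. The categories and scores the service returns can change over time, so do not expect an exact match with the numbers above.

```csharp
using System;
using Azure;
using Azure.AI.TextAnalytics;

class Program
{
    static void Main()
    {
        // Hypothetical environment variable names; new Uri throws if the endpoint is missing.
        var client = new TextAnalyticsClient(
            new Uri(Environment.GetEnvironmentVariable("TEXT_ANALYTICS_ENDPOINT")),
            new AzureKeyCredential(Environment.GetEnvironmentVariable("TEXT_ANALYTICS_KEY")));

        Response<CategorizedEntityCollection> response = client.RecognizeEntities(
            "Microsoft sells software all over the world from Australia to Zimbabwe.");

        // Value holds one CategorizedEntity per entity detected in the text.
        foreach (CategorizedEntity entity in response.Value)
        {
            // SubCategory is null when the category has no subcategory.
            string subCategory = entity.SubCategory ?? "(none)";
            Console.WriteLine(
                $"{entity.Text}: {entity.Category} / {subCategory} ({entity.ConfidenceScore:F2})");
        }
    }
}
```

In a real application you might discard entities below a confidence threshold of your choosing, since a low score such as the 0.51 for "Microsoft Azure" signals an uncertain label.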
While this guide is written in English, English is only one of many languages that the Text Analytics service recognizes. Calling the DetectLanguage method on a TextAnalyticsClient instance will return a value with a predicted language. I've used Microsoft Translator to translate the string "Microsoft sells software all over the world" into Spanish, Russian, and Japanese.
var spanish = "Microsoft vende software en todo el mundo.";
var russian = "Microsoft продает программное обеспечение по всему миру.";
var japanese = "マイクロソフトは世界中でソフトウェアを販売しています。";
As with the sentiment analysis, I must explicitly declare the type of the return value.
DetectedLanguage detectSpanish = client.DetectLanguage(spanish);
DetectedLanguage detectRussian = client.DetectLanguage(russian);
DetectedLanguage detectJapanese = client.DetectLanguage(japanese);
DetectedLanguage has a Name property, which is the predicted language for the text.
If you check the results of analyzing the three translated sentences, you will find that Azure correctly predicts them to be Spanish, Russian, and Japanese. So Azure can work with languages written in the Latin alphabet, the Cyrillic alphabet used in Russia, and languages like Japanese, whose writing system is not alphabetic.
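One way to check the results is to loop over the sample strings and print what the service returns, as in the sketch below. It again assumes the hypothetical environment variables TEXT_ANALYTICS_KEY and TEXT_ANALYTICS_ENDPOINT. Besides Name, it prints two other DetectedLanguage properties: Iso6391Name, the two-letter language code, and ConfidenceScore.

```csharp
using System;
using Azure;
using Azure.AI.TextAnalytics;

class Program
{
    static void Main()
    {
        // Hypothetical environment variable names; new Uri throws if the endpoint is missing.
        var client = new TextAnalyticsClient(
            new Uri(Environment.GetEnvironmentVariable("TEXT_ANALYTICS_ENDPOINT")),
            new AzureKeyCredential(Environment.GetEnvironmentVariable("TEXT_ANALYTICS_KEY")));

        string[] samples =
        {
            "Microsoft vende software en todo el mundo.",
            "Microsoft продает программное обеспечение по всему миру.",
            "マイクロソフトは世界中でソフトウェアを販売しています。"
        };

        foreach (string sample in samples)
        {
            // The explicit type converts the Response<DetectedLanguage> to its value.
            DetectedLanguage language = client.DetectLanguage(sample);
            Console.WriteLine(
                $"{language.Name} ({language.Iso6391Name}), confidence {language.ConfidenceScore:F2}");
        }
    }
}
```

Only the language name and code are printed, so the output should display correctly even in a console that cannot render Cyrillic or Japanese characters.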
Applications can use Azure Cognitive Services to add sentiment analysis, entity recognition, and language detection. The Text Analytics service requires no knowledge of machine learning and costs a fraction of a penny per transaction. Using the REST API or client libraries, developers can integrate the service into web and mobile applications in almost any language. The hard part, natural language processing, is handled by Microsoft. It's cheaper and more reliable than trying to train machine learning models yourself. These problems have been solved before, so you get to focus on what makes your applications great! Thanks for reading!