Using the Speech Recognition and Synthesis .NET APIs

This is an introductory course on how to use the speech recognition and synthesis APIs in the .NET Framework.
Course info
Rating: (11)
Level: Beginner
Updated: Mar 3, 2016
Duration: 3h 16m
Description

This course introduces the Speech Recognition and Synthesis APIs provided by the .NET Framework, which allow developers to bring new accessibility experiences to their .NET applications. This course guides developers from the core recognition and synthesis concepts to actually implementing them in C# using a simple WPF application. XML standards, such as the Speech Recognition Grammar Specification (SRGS) and the Speech Synthesis Markup Language (SSML), are covered in depth as well.

About the author

Tony has been a .NET developer for 9½ years, and loves to learn and teach. Most of Tony's experience is as a back-end developer, with some experience in web technologies and creating mobile apps.

More from the author
Using the Web Speech API with AngularJS (Intermediate, 3h 10m, Jul 12, 2016)
Section Introduction Transcripts

Course Overview
Hello everyone, my name is Tony Thorsen, and welcome to my course, Using the Speech Recognition and Synthesis .NET APIs. I'm a developer, author, and owner at Mark 8 Technologies with about 10 years of development experience. With the advent of digital assistants like Siri, Cortana, and Alexa, speech recognition and synthesis have been on the rise in recent years, giving users a more hands-free way to interact with technology. In this course, we're going to explore how to add similar functionality to .NET applications using the speech recognition and synthesis APIs the framework offers. Some of the major topics that we will cover include what speech recognition and synthesis actually are, building custom grammars for speech recognition, customizing and selecting different speech synthesis voices, and XML standards such as the Speech Recognition Grammar Specification and the Speech Synthesis Markup Language. By the end of this course, you will have the foundation you need to give your users a more hands-free approach to using your applications. Before beginning the course, you should be familiar with C#, Visual Studio, and WPF. I hope you'll join me on this journey to learn speech recognition and synthesis fundamentals with the Using the Speech Recognition and Synthesis .NET APIs course at Pluralsight.

Introduction
Hi, I'm Tony Thorsen, and welcome to the course on Using the Speech Recognition and Synthesis .NET APIs. As you're probably already aware, systems using speech recognition and synthesis technologies have been on the rise over the years, especially with the introduction of digital assistants like Siri, Cortana, and Amazon's Echo, whose assistant is known as Alexa. There are certainly more examples out there, but these are some of the more recent and popular applications you might already be familiar with that use the speech recognition and synthesis concepts we will be discussing throughout this course. While this course will not teach you how to build a full-fledged intelligent assistant like these, it will at least get you headed in that direction using the .NET Framework APIs. In this module, we're going to start off by covering the basics of what speech recognition is, along with a little background on how it works, at least from Microsoft's standpoint. If you're at all familiar with machine learning concepts, you may have heard of natural language processing; we'll quickly cover how it differs from speech recognition and how the two can work together to build more intelligent applications, in case you want to dig deeper into that later on, since it is really outside the scope of this course. We'll follow up the speech recognition overview with a little background on what speech synthesis is and how it works. We'll then take a look at some of the benefits of using these technologies, followed by examples of where you can find them being used, some of the different options Microsoft provides for bringing these technologies into your own applications, and finally, a summary of the key points from this module.

Using the Speech Recognition API
In this module, we're going to kindly bid farewell to speech synthesis for a bit and focus solely on the speech recognition API. We'll start off by looking at a couple of different speech recognition engine types, their respective benefits, and how they can be used to add speech recognition capabilities to our applications. We'll also get a high-level overview of grammars and how they're used, and then we'll dig deeper into them in module three. We'll then move on to the different events that the recognition engine fires and how to handle them, so that we know when speech has been recognized and converted into text. After that, we'll explore some of the engine properties and how we can use them to make our engine just a little more efficient. And then finally, we'll wrap up the module with a summary of some of the key points.
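
As a taste of what this module builds toward, here is a minimal sketch (my own illustration, not one of the course demos) of an in-process SpeechRecognitionEngine loaded with the built-in DictationGrammar and handling its recognition events. It assumes a console project with a reference to the System.Speech assembly.

```csharp
// Minimal sketch: in-process recognition with the built-in DictationGrammar.
// Assumes a reference to the System.Speech assembly.
using System;
using System.Speech.Recognition;

class RecognitionDemo
{
    static void Main()
    {
        using (var engine = new SpeechRecognitionEngine())
        {
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(new DictationGrammar());

            // Raised once speech has been recognized and converted into text.
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine($"Recognized: {e.Result.Text} ({e.Result.Confidence:P0})");

            // Raised when audio could not be matched to any loaded grammar.
            engine.SpeechRecognitionRejected += (s, e) =>
                Console.WriteLine("Sorry, I didn't catch that.");

            // RecognizeMode.Multiple keeps the engine listening instead of
            // stopping after the first recognition.
            engine.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Listening... press Enter to stop.");
            Console.ReadLine();
        }
    }
}
```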

Building and Using Speech Recognition Grammars
In this module, we are going to take a look at how we can build and load our own custom grammars, rather than relying on the built-in DictationGrammar we looked at in the previous module. We're going to first get a better understanding of what grammars are and how we can benefit from using them. We'll then look at a couple of classes in the .NET Framework, Grammar and GrammarBuilder, which will help us construct our grammars. Within the context of these objects, we'll take a look at how to add individual words and phrases to our grammars, and then I'll introduce the concept of choices, which allows for some flexibility in recognition pattern matching by indicating that any one of a number of defined words can be spoken at a given point. We'll then look at how to account for repeated phrases, and how to tell the recognition engine to match any spoken words or phrases through the use of wildcards. We'll also see how meaning can be added to the words and phrases in our grammars through the use of semantics. We'll then wrap up the demos on these two classes with a discussion of how we can use the Priority and Weight properties to determine which grammar wins when a phrase is matched in multiple loaded grammars. I'll then branch outside of the .NET Framework just a little bit and introduce you to the Speech Recognition Grammar Specification, a standard that allows you to construct and share grammars across multiple platforms using XML. Each of the items listed under the Grammar and GrammarBuilder bullet will be demonstrated with SRGS as well. And, of course, this module would not be complete without a summary of the key takeaways.
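
To preview these ideas, here is a small sketch of a custom grammar built with GrammarBuilder, Choices, and a semantic key. The phrase set and key names ("device", "state") are illustrative assumptions of mine, not taken from the course demos.

```csharp
// Minimal sketch: a custom grammar that matches phrases like
// "turn the lights on" or "turn the fan off".
using System;
using System.Speech.Recognition;

class GrammarDemo
{
    static void Main()
    {
        // Choices: any one of these words may be spoken at this point.
        var device = new Choices("lights", "fan", "thermostat");
        var state  = new Choices("on", "off");

        var builder = new GrammarBuilder("turn the");
        // Semantic keys attach meaning to whichever word was matched.
        builder.Append(new SemanticResultKey("device", device));
        builder.Append(new SemanticResultKey("state", state));

        // Name, Priority, and Weight help disambiguate when a phrase
        // matches in more than one loaded grammar.
        var grammar = new Grammar(builder) { Name = "HomeControl", Priority = 10 };

        using (var engine = new SpeechRecognitionEngine())
        {
            engine.SetInputToDefaultAudioDevice();
            engine.LoadGrammar(grammar);
            engine.SpeechRecognized += (s, e) =>
                Console.WriteLine($"{e.Result.Semantics["device"].Value} -> " +
                                  $"{e.Result.Semantics["state"].Value}");
            engine.RecognizeAsync(RecognizeMode.Multiple);
            Console.ReadLine();
        }
    }
}
```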

Using the Speech Synthesis API
So far throughout this course, our focus has been on taking speech input and turning it into text through the speech recognition API. The next two modules focus on the opposite process, demonstrating how we can use the .NET speech synthesis API to take text and turn it into speech. We'll start this module by taking a look at the core class of the API, the speech synthesis engine. More specifically, we'll look at how to perform basic speech synthesis, the events we can handle on the engine, and how to redirect speech output to wave files. We'll then dig a little deeper by looking at a couple of classes, Prompt and PromptBuilder, and how we can use them to speak from different forms of input, such as strings, SSML (another XML standard), and audio files. We'll also look at how to use these classes to supply the synthesizer with aliases, hints, and even custom word pronunciations. Then we'll look at how to add pauses to the speech so the output sounds more natural, using breaks and compositional elements such as sentences and paragraphs. And finally, we'll look at how to use bookmarks to notify us through events when certain points in the speech have been reached, which allows us to kick off processes or perform other work at defined points. Just as with speech recognition, there is an XML standard for speech synthesis. The standard is called Speech Synthesis Markup Language, or SSML, and we will see how to build out most of the same concepts that we explore with the Prompt and PromptBuilder classes: how to include basic text to be spoken, how to define aliases, hints, and pronunciations, how to add pauses to speech with breaks and compositional elements, and how to define bookmarks. And then finally, we'll wrap the module up with a nice little summary.
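
As a rough preview, here is a minimal sketch of basic synthesis with SpeechSynthesizer and PromptBuilder, including a break, sentence elements, and a bookmark. The spoken text and bookmark name are illustrative, not from the course demos.

```csharp
// Minimal sketch: text-to-speech with SpeechSynthesizer and PromptBuilder.
using System;
using System.Speech.Synthesis;

class SynthesisDemo
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            synth.SetOutputToDefaultAudioDevice();
            // synth.SetOutputToWaveFile("output.wav") would redirect
            // the speech to a wave file instead.

            var prompt = new PromptBuilder();
            prompt.StartSentence();
            prompt.AppendText("Your report is ready.");
            prompt.EndSentence();
            prompt.AppendBreak(TimeSpan.FromMilliseconds(500)); // pause for natural pacing
            prompt.AppendBookmark("reportSpoken");              // raises BookmarkReached
            prompt.AppendText("Opening it now.");

            // Raised when playback reaches the named bookmark, letting us
            // kick off other work at a defined point in the speech.
            synth.BookmarkReached += (s, e) =>
                Console.WriteLine($"Bookmark reached: {e.Bookmark}");

            synth.SpeakAsync(prompt);
            Console.WriteLine("Speaking... press Enter to exit.");
            Console.ReadLine();
        }
    }
}
```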

Customizing the Synthesized Voice
In our final module, we are going to continue working with the speech synthesis engine and see how we can modify some of the properties of the synthesized voice, as well as select different voices for the synthesis engine to use. More specifically, we're going to look at how to modify the speech rate and volume through properties on the SpeechSynthesizer class itself, through methods on the PromptBuilder class, and even through SSML. We'll then see how to retrieve a listing of the voices installed on our machines and how to select a different voice from that listing. Along those lines, we'll look at some of the other attributes on the voices, such as name, gender, and age, and how we can use those to select different voices as well. The demonstrations on these attributes will be done with the PromptBuilder class as well as SSML. And then finally, we'll summarize all of the voice-changing goodies we picked up along the way.
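
Here is a minimal sketch of these ideas (my own illustration, not a course demo): enumerating the installed voices, selecting one by gender and age hints, and adjusting the Rate and Volume properties.

```csharp
// Minimal sketch: listing installed voices, selecting one by hints,
// and adjusting the Rate and Volume properties.
using System;
using System.Speech.Synthesis;

class VoiceDemo
{
    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // Enumerate the voices installed on this machine, with some
            // of the attributes we can filter on.
            foreach (InstalledVoice voice in synth.GetInstalledVoices())
            {
                VoiceInfo info = voice.VoiceInfo;
                Console.WriteLine($"{info.Name}: {info.Gender}, {info.Age}, {info.Culture}");
            }

            // Select a voice by attribute hints rather than by exact name.
            synth.SelectVoiceByHints(VoiceGender.Female, VoiceAge.Adult);

            synth.Rate = 2;     // -10 (slowest) through 10 (fastest)
            synth.Volume = 80;  // 0 through 100

            synth.Speak("This voice was selected by hints.");
        }
    }
}
```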