What will you do if you are tired of manual data entry, running programs, writing e-mail, and doing other things by hand on your Windows 8/8.1 computer? You could take a break and then pick the tasks up again - or you could turn on Windows Speech Recognition to help you with all of these things. Windows Speech Recognition lets you do everything that can be done with a mouse and keyboard using your voice. And if you're wondering where to find speech recognition on your Windows computer, this post will show you 3 ways to enable speech recognition in Windows and tell you how to disable it.

1. Three ways to enable speech recognition in Windows 8/8.1:

Method 1: Turn on speech recognition in Control Panel.

Step 1: Open the quick access menu with the keyboard shortcut Windows key + X, then select Control Panel.

Step 2: In Control Panel, select Ease of Access.

Step 3: In the window that opens, select Launch speech recognition under the Speech Recognition option.

Method 2: Open Speech Recognition from the Applications screen.

Step 1: On the Start screen (or Metro interface), right-click any empty area and select All applications in the lower-right corner.

Step 2: On the Applications screen, find Windows Speech Recognition and open it with your mouse.

Method 3: Enable it via the search bar.

Step 1: Open the search bar with the keyboard shortcut Windows + F, type speech recognition into the search field, and select Applications from the search list.

Step 2: When the search results appear to the left of the search bar, select Windows Speech Recognition.

2. Two ways to turn off speech recognition in Windows 8/8.1

Method 1: Click the Close button in the Speech Recognition window to turn it off.

Method 2: Press Alt+F4 to close it.

Now you can feel free to explore the speech recognition feature in Windows 8/8.1.

I'm trying to create a dynamic speech recognizer, but for some reason it's not working. I tried using the EmulateRecognize function and the app works fine, but it doesn't work when I speak. This means the word list is added correctly and the SpeechRecognized event handler works correctly, but it is never called without EmulateRecognize. Any help would be appreciated. Below is the code I'm using.

using System;
using System.Collections.Generic;
using System.Windows.Forms;
using System.Speech.Recognition;

namespace HotKeyApp
{
    public partial class Form1 : Form
    {
        // Initialize the speech recognizer
        SpeechRecognitionEngine sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
        // Initialize the grammar builder
        GrammarBuilder gb = new GrammarBuilder();
        // Choices will contain the words from the first column
        Choices jargon = new Choices();
        // words will hold the array passed to Choices
        string[] words;
        // A speech recognition grammar is a set of rules or constraints that define what a speech recognition engine can recognize as meaningful input.
        Grammar g;
        private int columns = 2;
        private int rows;
        Dictionary<string, string> HotKeys = new Dictionary<string, string>();

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
        }

        private void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
        {
            MessageBox.Show("ping");
            // Loop through the words array; if there is a match, call the appropriate method
            for (int i = 0; i < words.Length; i++)
            {
                if (e.Result.Text == words[i])
                {
                    MessageBox.Show(words[i]);
                }
            }
        }

        private void btnCreate_Click(object sender, EventArgs e)
        {
            // Get the number of rows/words
            rows = Convert.ToInt32(txtNum.Text);
            // The words array length equals the number of rows
            words = new string[rows];
            GenerateTable(columns, rows);
        }

        private void GenerateTable(int columnCount, int rowCount)
        {
            // Clear out the existing rows and columns
            myGridView.Rows.Clear();
            myGridView.Columns.Clear();
            myGridView.Columns.Add("WordColumn", "Word");
            myGridView.Columns.Add("HotKeyColumn", "HotKey");
            // Create as many rows as needed
            for (int y = 0; y < rowCount; y++)
            {
                myGridView.Rows.Add();
            }
        }

        private void btnSubmit_Click(object sender, EventArgs e)
        {
            int i = 0;
            foreach (DataGridViewRow r in myGridView.Rows)
            {
                string Instructions = r.Cells[0].Value.ToString();
                string Command = r.Cells[1].Value.ToString();
                HotKeys.Add(Instructions, Command);
                words[i] = Instructions;
                i++;
            }
            // Give jargon the words array
            jargon.Add(words);
            // Give the grammar builder the jargon choices
            gb.Append(jargon);
            // Build the grammar, load it, and enable voice recognition
            g = new Grammar(gb);
            sre.RequestRecognizerUpdate();
            sre.LoadGrammarAsync(g);
            // Register a handler for the SpeechRecognized event
            sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
            // Set sre to use the default audio device
            sre.SetInputToDefaultAudioDevice();
            sre.RecognizeAsync(RecognizeMode.Multiple);
            MessageBox.Show("Recognition enabled");
            //sre.EmulateRecognize("Hello");
        }
    }
}

I tried converting it to a console application and it works there, but I need it in a Windows Forms application. Here is the console code:

using System;
using System.Speech.Recognition;

class Program
{
    static SpeechRecognitionEngine sre;
    // words will hold the array passed to Choices
    static string[] words;

    static void Main(string[] args)
    {
        // Initialize the speech recognizer
        sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US"));
        // Initialize the grammar builder
        GrammarBuilder gb = new GrammarBuilder();
        // Choices will contain the words from the first column
        Choices jargon = new Choices();
        // A speech recognition grammar is a set of rules or constraints that define what a speech recognition engine can recognize as meaningful input.
        Grammar g;
        string input;

        Console.WriteLine("Input words separated by a comma ,");
        input = Console.ReadLine();
        words = input.Split(new char[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string s in words)
        {
            Console.WriteLine(s);
        }
        Console.ReadKey();

        // Give jargon the words array
        jargon.Add(words);
        // Give the grammar builder the jargon choices
        gb.Append(jargon);
        // Build the grammar, load it, and enable voice recognition
        g = new Grammar(gb);
        sre.RequestRecognizerUpdate();
        sre.LoadGrammarAsync(g);
        // Set sre to use the default audio device
        sre.SetInputToDefaultAudioDevice();
        sre.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(SpeechRecognized);
        sre.RecognizeAsync(RecognizeMode.Multiple);
        Console.ReadLine();
    }

    static void SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
    {
        Console.WriteLine("Recognized Word");
        // Loop through the words array; if there is a match, call the appropriate method
        for (int i = 0; i < words.Length; i++)
        {
            if (e.Result.Text == words[i])
            {
                Console.WriteLine(words[i]);
            }
        }
    }
}
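As a debugging aid, a stripped-down check like the sketch below can help isolate whether microphone audio reaches the engine at all. This is not part of the original code: it assumes a desktop .NET Framework project referencing System.Speech and an installed en-US recognizer, and the three test words are arbitrary placeholders. It loads a tiny fixed grammar and logs every engine event.

using System;
using System.Speech.Recognition;

class MicCheck
{
    static void Main()
    {
        using (var sre = new SpeechRecognitionEngine(new System.Globalization.CultureInfo("en-US")))
        {
            // A tiny fixed grammar; the three words are arbitrary test choices.
            sre.LoadGrammar(new Grammar(new GrammarBuilder(new Choices("red", "green", "blue"))));

            // Log every stage of the pipeline so it is obvious where things stop.
            sre.AudioStateChanged += (s, e) => Console.WriteLine("Audio state: " + e.AudioState);
            sre.SpeechDetected += (s, e) => Console.WriteLine("Speech detected at " + e.AudioPosition);
            sre.SpeechRecognitionRejected += (s, e) => Console.WriteLine("Rejected: " + e.Result.Text);
            sre.SpeechRecognized += (s, e) => Console.WriteLine("Recognized: " + e.Result.Text);

            sre.SetInputToDefaultAudioDevice();
            sre.RecognizeAsync(RecognizeMode.Multiple);

            Console.WriteLine("Say red, green or blue. Press Enter to quit.");
            Console.ReadLine();
        }
    }
}

If audio-state and speech-detected messages appear but nothing is ever recognized or rejected, the grammar is the more likely suspect; if no audio events appear at all, the input device or recognizer installation is.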

Friends, the other day we looked at one of the innovations brought to Windows 10 by the Fall Creators Update. Microsoft promises support for Russian-language voice input in the future, but does not specify whether that future is near or far. Perhaps it will arrive when Cortana is able to speak, and more importantly understand, Russian. There is no point in waiting for Microsoft to teach Windows 10 to understand our language: if something is missing from the system itself, it can almost always be added with third-party software. That is what this article is about. Below we will look at various ways to use a microphone, built into a laptop or connected to a PC, to enter search queries by voice and to dictate the text of documents.

1. “Ok Alice” and Google voice search for search queries

Inexplicably, Google has not yet integrated this technology into the YouTube interface. But you can still search for videos in the search engine itself by speaking your queries: just switch to the “Video” tab in the search results. For obvious reasons, the lion's share of those results will still come from YouTube.

Those who have already gotten acquainted with Alice do not need to open the search engine's website in a browser at all. After installing the program, a Yandex search field with voice input appears directly on the Windows taskbar, and Alice can answer simple questions herself, without leaving her dialog box.

2. Google's Web Speech API web service

The Web Speech API technology, which powers voice input of queries in the Google search engine, has its own web interface at:

https://www.google.com/intl/ja/chrome/demos/speech.html

The functionality of the service is minimalistic: it contains a button to turn on the microphone and a result field, where the recognized text will then be displayed.

You cannot edit text in this result field, so you get the recognition results exactly as they are and can only edit them later in a text editor or data entry form. The “Copy and Paste” button below the result field ends the current input session and automatically selects all of the recognized text, making it easy to copy to the clipboard.

Another option for the recognized text is the create email button. It launches the default mail client in Windows, creates a new message, and transfers the recognized text into it.

It is noteworthy that the Web Speech API can recognize some punctuation marks, at least the period and the comma. So during dictation, wherever a period or comma belongs, you can simply pronounce it.

The inability to edit text within the result field makes the Web Speech API's web interface inconvenient for large amounts of typing. For long dictations, it is better to use the Google Docs web interface, which has Web Speech API technology built in: in Google Docs you can enter text by voice, edit it immediately, and format the document as you go.

3. “Voice notepad” on Speechpad.Ru

Another site based on Web Speech API technology is “Voice Notepad”, the most popular and most functional voice input service on the RuNet. Its main functions include:

  • Support for multiple languages, including Russian and Ukrainian;
  • A results field where you can edit the recognized text, translate it into other languages, and export the results to a TXT file;
  • Output of recognized phrases to the clipboard;
  • Transcription;
  • Integration into web forms in Chromium-based browsers;
  • Integration into the Windows and Linux environments.

On top of all this, in Voice Notepad the voice input option is turned on and off only by pressing the corresponding button. It does not deactivate itself the moment we pause to search for the exact wording of a thought, as happens in other services based on the Web Speech API.

The recognized text is tracked in the results field.

4. Integrating Speechpad into browser web forms

After installing this extension, a “Speechpad” item appears in the context menu of web text entry forms. Select it and speak into the microphone. This way you can, for example, dictate notes in Google Keep.

5. Integration of Speechpad into the Windows environment

The capabilities of the Voice Notepad web service can be integrated into the Windows environment, letting you type text by voice in any program - standard Notepad, Microsoft Word, or other text editors. Recognized speech is inserted directly into the documents being edited, without web services or the clipboard as intermediaries. However, this Speechpad.Ru feature is not free: it costs 100 rubles per month. There are ways to save: paying for a quarter at once costs 250 rubles, and a year's prepayment costs 800 rubles. Every registered user can first test the functionality integrated into their operating system: the creators of Speechpad.Ru offer a free two-day trial period. How Voice Notepad is integrated directly into the OS, in particular into Windows, is described in detail on the Speechpad.Ru website itself - click the question mark next to the integration option.

Then go through all the steps described in the instructions:

  • Install the above service extension;
  • Download the package of integration files;
  • Unpack the archive and run the install_host.bat file;
  • Go to your user account on the Speechpad.Ru website;

Click the “Enable test period” button.

Repeat this each time you need to activate voice input. That's all, actually. Now you can open Microsoft Word, LibreOffice Writer, or other text editors and start dictating. The recognized text will appear in the window of any active application that accepts text entry.

Important: to use Speechpad integrated into the system, do not close its website tab in the browser; closing the tab deactivates voice input.

6. Free alternatives for integrating voice input into the Windows environment

What free alternatives are there for integrating Russian-language voice input into the Windows environment?

Option #1

On the Speechpad.Ru website you can use the option to output recognized speech to the clipboard completely free of charge. Click the “Enable Recording” button on the site and switch to any Windows application.

Now you can pronounce individual phrases and paste them from the clipboard with Ctrl+V. As soon as you pause, you will hear Speechpad's beep, indicating that the phrase has been recognized and copied to the clipboard. This way of working with voice input has its advantages: while inserting individual phrases, you can edit the text as you go.

Option #2

For those working with office suite applications, Microsoft offers its own take on voice input - the Dictate add-in, which adds an extra menu tab with a speech recognition tool to Word, Outlook, and PowerPoint. The add-in can recognize speech in 20 languages, including Russian, and can simultaneously translate the text into 60 languages.

Another free way to enter text by voice is to record speech into an audio file and have it automatically transcribed into text afterwards. Not everyone can immediately express their thoughts in structured, literary language while also correcting recognition errors and adding punctuation marks. When recording speech on a voice recorder, you can concentrate fully on the substance of the material; during transcription, you can then direct all your attention to the eloquence and literacy of its presentation. But, friends, automating the transcription of audio recordings is a topic for another, separate article.

Since deep learning entered the speech recognition scene, word error rates have decreased dramatically. But despite all the articles you may have read, we still don't have human-level speech recognition. Speech recognizers have many failure modes, and further improvement requires identifying them and trying to eliminate them. This is the only way to move from recognition that works for some people most of the time to recognition that works for all people all of the time.

Improvements in word error rate over time. The test set was collected on a telephone switchboard in 2000 from 40 random conversations between pairs of people whose native language is English.

To say that we have reached human-level speech recognition in conversation based only on a set of switchboard conversations is like saying that a robotic car drives as well as a person after testing it in a single city on a sunny day with no traffic. The recent progress in conversational speech recognition is impressive, but the claims of human-level recognition are too bold. Here are a few areas where improvements are still needed.

Accents and noise

One of the obvious weaknesses of speech recognition is handling accents and background noise. The main reason is that most of the training data consists of American-accented speech with a high signal-to-noise ratio. For example, the switchboard set contains only conversations between native English speakers (mostly Americans) with little background noise.

But adding more training data alone will likely not solve this problem. There are many languages with many dialects and accents, and it is unrealistic to collect labeled data for every case. Building a high-quality speech recognizer for American English alone requires up to 5 thousand hours of audio transcribed into text.


Comparison of human transcribers with Baidu's Deep Speech 2 on different types of speech. The humans are worse at recognizing non-American accents, perhaps because most of them are American. People who grew up in a particular region would likely make far fewer errors on that region's accent.

With background noise in a moving car, the signal-to-noise ratio can fall as low as -5 dB. People cope easily with recognizing another person's speech in such conditions, while automatic recognizers degrade much faster as noise increases. The graph shows how much the gap between humans and the model widens as noise grows (that is, at low SNR, signal-to-noise ratio, values).

Semantic errors

Often the word error rate is not the real objective of a speech recognition system. What matters is the semantic error rate: the proportion of utterances whose meaning is recognized incorrectly.

An example of a semantic error is when someone says “let's meet up Tuesday” and the recognizer returns “let's meet up today.” There can also be word errors without semantic errors: if the recognizer misses “up” and returns “let's meet Tuesday,” the meaning of the sentence is unchanged.

We need to be careful when using word error rate as a criterion. To illustrate, consider the worst possible case: a 5% word error rate corresponds to one wrong word out of 20. If each sentence has 20 words (roughly the average for English), the sentence error rate can approach 100%. One can only hope that the misrecognized words do not change the meaning of the sentences; otherwise the recognizer could garble every sentence even with a 5% word error rate.
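A quick back-of-the-envelope calculation makes this concrete. The sketch below (illustrative only, assuming 20-word sentences and a 5% word error rate) compares the worst case, where every wrong word lands in a different sentence, with the optimistic case where word errors are independent.

using System;

class SentenceErrorRates
{
    static void Main()
    {
        double wordErrorRate = 0.05;   // 1 wrong word out of 20
        int wordsPerSentence = 20;     // roughly average for English

        // Worst case: every wrong word falls in a different sentence,
        // so each 20-word sentence contains exactly one error.
        double worstCase = Math.Min(1.0, wordErrorRate * wordsPerSentence);

        // Optimistic case: word errors are independent, so a sentence is
        // correct only if all of its words are correct.
        double independent = 1.0 - Math.Pow(1.0 - wordErrorRate, wordsPerSentence);

        Console.WriteLine("Worst-case sentence error rate:   " + worstCase.ToString("P0"));   // 100 %
        Console.WriteLine("Independent-errors sentence rate: " + independent.ToString("P0")); // about 64 %
    }
}

Even under the optimistic independence assumption, roughly 64% of sentences contain at least one wrong word; in the worst case, every sentence does.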

When comparing models with humans, it is important to look at the nature of the errors and not just track the word error rate. In my experience, human transcribers make fewer errors, and those errors are less serious than the ones made by computers.

Researchers at Microsoft recently compared the errors made by humans and by computer recognizers of similar accuracy. One difference they found is that the model confuses “uh” with “uh huh” much more often than people do. The two terms have very different semantics: “uh” fills a pause, while “uh huh” signals acknowledgment from the listener. The models and humans also made many errors of the same types.

Many voices in one channel

Recorded telephone conversations are also easier to recognize because each speaker was recorded on a separate microphone, so multiple voices never overlap in one audio channel. People, on the other hand, can understand several speakers even when they talk at the same time.

A good speech recognizer should be able to segment the audio stream by speaker (diarization). It should also be able to extract meaning from an audio recording with two overlapping voices (source separation). And it should do this without a microphone placed right at each speaker's mouth, so that the recognizer works well wherever it happens to be located.

Recording quality

Accents and background noise are just two factors that a speech recognizer must be robust to. Here are a few more:

  • Reverberation in different acoustic conditions.
  • Equipment-related artifacts.
  • Artifacts of the codec used to record and compress the signal.
  • The sampling rate.
  • The age of the speaker.

Most people cannot tell the difference between mp3 and wav recordings. Before they can claim performance comparable to that of humans, recognizers must become robust to these sources of variation.

Context

Notice that the error rate people achieve on the switchboard test recordings is actually quite high. If you were talking to a friend who misunderstood 1 word out of 20, you would have a very difficult time communicating.

One reason for this is that the recognition is done without context. In real life we use many different additional cues to understand what another person is saying. Some examples of context that humans use and speech recognizers ignore:

  • The history of the conversation and the topic being discussed.
  • Visual cues about the speaker: facial expressions and lip movements.
  • Knowledge about the person we are talking to.

Today, Android's speech recognizer knows your contact list, so it can recognize your friends' names. Voice search in maps uses your geolocation to narrow down the possible destinations you might want a route to.

The accuracy of recognition systems increases with the inclusion of such signals in the data. But we are just beginning to delve into the type of context we might include in processing and how we can use it.

Deployment

Recent advances in conversational speech recognition cannot yet be deployed. When thinking about deploying a speech recognition algorithm, you need to keep latency and computing power in mind. The two are related, because algorithms that increase compute requirements usually increase latency as well, but for simplicity we will discuss them separately.

Latency: the time from the end of the user's speech until the transcription is received. Low latency is a typical requirement for recognition, and it strongly affects the user's experience with the product. Limits of tens of milliseconds are common. This may seem overly strict, but remember that producing a transcript is usually only the first step in a chain of complex computations. For example, with voice web search, the actual search still has to be performed after speech recognition.

Bidirectional recurrent layers are a typical example of an improvement that makes latency worse. All of the latest high-quality transcription results are obtained with their help. The problem is that nothing can be computed after the first bidirectional layer until the person has finished speaking, so the latency grows with the length of the utterance.


Left: a forward-only recurrence lets transcription begin immediately. Right: a bidirectional recurrence has to wait until the end of the speech before it can start transcribing.
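The toy sketch below (not a real recognizer, just the dependency structure, with made-up numbers standing in for audio features) shows why: with a forward-only recurrence each output depends only on frames already heard, while a bidirectional recurrence also needs a backward pass over frames that have not arrived yet.

using System;

// Toy illustration of the dependency structure only; this is not a real recognizer.
class RecurrenceLatencyDemo
{
    static void Main()
    {
        double[] frames = { 0.1, 0.4, 0.2, 0.9, 0.3 };  // stand-in for incoming audio features

        // Forward-only recurrence: each output depends only on frames already heard,
        // so it can be emitted as soon as each frame arrives (streaming).
        double forward = 0.0;
        foreach (double x in frames)
        {
            forward = 0.5 * forward + x;                // toy recurrence
            Console.WriteLine("streamed output: " + forward.ToString("F2"));
        }

        // Bidirectional recurrence: each output also depends on a backward pass
        // over *future* frames, so nothing can be emitted until the speech ends.
        double[] backward = new double[frames.Length];
        double b = 0.0;
        for (int t = frames.Length - 1; t >= 0; t--)
        {
            b = 0.5 * b + frames[t];
            backward[t] = b;
        }
        double f = 0.0;
        for (int t = 0; t < frames.Length; t++)
        {
            f = 0.5 * f + frames[t];
            // This line can only run after the last frame has been received.
            Console.WriteLine("buffered output: " + (f + backward[t]).ToString("F2"));
        }
    }
}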

A good way to effectively incorporate future information into speech recognition is still being sought.

Computing power: this parameter is governed by economic constraints. Every improvement in recognizer accuracy has to be weighed against its computational cost. If an improvement does not clear the economic threshold, it will not be deployed.

A classic example of a consistent improvement that is never deployed is an ensemble of models: reducing errors by 1-2% rarely justifies a 2-8x increase in computing power. Modern recurrent network language models also fall into this category, since they are too expensive to use when searching over many candidate transcriptions, although I think the situation will change in the future.

To be clear, I am not saying that improving recognition accuracy at a serious computational cost is useless. We have seen the principle of “first slow but accurate, then fast” work in the past. The point is simply that until an improvement is fast enough, it cannot be used.

In the next five years

In the field of speech recognition, there are still many unresolved and complex problems. Among them:

  • Expanding capabilities to new domains, accents, and speech against strong background noise.
  • Incorporating context into the recognition process.
  • Diarization and source separation.
  • Semantic error rates and new methods for evaluating recognizers.
  • Very low latency.

I look forward to the progress that will be made over the next five years on these and other fronts.

Handwriting and speech recognition are, in my opinion, among the most convenient accessibility features of Windows 8.1. And they are not only convenient but also easy to set up. In fact, Windows 8.1 recognizes handwriting quite well with the default settings, but if you are not happy with the results, you can do some additional training.

Open the “Language” section in the Control Panel, highlight the language you want to train, and click the “Options” link on its right.

A training window will appear. Here you can choose the required action: retrain Windows on specific recognition errors, or teach it your handwriting style from scratch. Note that the second option can take quite a long time.

Speech recognition in Windows 8.1.

Windows 8.1 lets you control your PC with your voice, using a microphone built into your tablet, laptop, or ultrabook, or an external headset. Speech recognition can be launched from the Start screen by typing speech recognition into the search bar; when it starts, you will be asked which audio device you want to use.

Next, you will be asked a series of questions and then invited to read through the training tutorial. Following the steps in this guide makes training Windows much easier, and it is worth spending some time teaching Windows 8.1 to recognize your particular speech.

You will be asked to view help, a printout of which can be very useful for remembering various voice commands. During operation, the speech recognizer floats on the desktop and can be docked at the top or bottom of the screen.

Basically, the speech recognition program in Windows 8.1 works great.

Accessing all of the speech recognizer's controls is easy: just right-click on its window.

There you will find options to continue training the recognizer and to configure both it and your microphone.

Basic speech recognition controls:

  • Launch by program name: saying, for example, Calculator, Word, or Excel launches the corresponding program.
  • Switch by program name: switches to a program that is already running.
  • You can control programs that have drop-down menus by saying the name of the menu and then the name of the desired option. This also works with ribbon controls in Windows 8.1, Microsoft Office, and other programs that use them.
  • Show numbers: displays numbers superimposed on the controls, which you can then say aloud to activate them.
  • On a web page, you can follow a link simply by saying its name; for example, contact us.
  • You can click an element by saying double click, or alternatively right click, followed by the element's name; for example, double click cart.
  • Start listening / Stop listening: turns the speech recognition system on or off.
  • What can I say?: displays help.
  • Show speech options: displays the recognizer's list of options; also available via right-click.
  • Show/hide speech recognition: minimizes the recognizer to the system tray or returns it to the desktop.

If the speech recognizer does not understand something, it displays the “Alternatives panel”, which contains its best guesses at what was said. You can choose among them by saying the number to the left of the correct item. This also helps train the Windows 8.1 speech recognition system.

Using the handwriting and speech recognition features will make your work much easier, more convenient, and faster. The handwriting feature lets you enter text by hand, which is very convenient on mobile devices, and speech recognition lets you control your PC with your voice.