Tutorial: How to add the Web Speech API to your project!

Davide Cariola
Jul 21, 2021

Hi everyone!

For beginners, one of the most useful and fun things to do is learn how to implement APIs in various projects.

There are a lot of super cool APIs, and many of them are free as well (take a look here, for example).

But, what’s an API?

API stands for Application Programming Interface and is, generally speaking, a connection between computers or between computer programs. It's a type of software interface that offers a service to other software, making it available for programmers to use.

One of the most widely used is the Web Speech API, which allows us to write or, more commonly, to search using our voice (as you do with Cortana, Siri or Alexa).

Let’s get started!

-Intro

First of all, let’s create a simple HTML file, with a JS script and a CSS stylesheet.

For convenience, let’s also insert the CDNs for Bootstrap and Fontawesome. Don’t forget to link JS and CSS.
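As a sketch of that boilerplate, the head could look like this (the CDN URLs and version numbers here are assumptions; grab the current links from the Bootstrap and Font Awesome sites):

```html
<head>
  <meta charset="utf-8">
  <!-- Bootstrap CSS (version assumed; check getbootstrap.com for the current link) -->
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@5.0.2/dist/css/bootstrap.min.css">
  <!-- Font Awesome (version assumed) -->
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.3/css/all.min.css">
  <!-- our own files -->
  <link rel="stylesheet" href="style.css">
  <script src="script.js" defer></script>
</head>
```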

<div class="container">
  <div class="row justify-content-center align-items-center">
    <div class="col-12 text-center mt-5">
      <form id="formSearch" action="https://www.google.com/search" method="GET" target="_blank">
        <input id="inputSearch" class="length" name="q" placeholder="Search on Google..." autofocus>
        <button id="vocalSearch" type="button">
          <i class="fas fa-microphone fs-5 my-1 text-secondary"></i>
        </button>
      </form>
    </div>
  </div>
</div>

We’ll have something like this:

A simple web page, with a plain searchbar

The form will Google anything we type. IDs are very important, so try to make them as clear as possible!
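Because the form uses method="GET" with an input named q, submitting it is equivalent to opening a Google search URL. A quick sketch of what the browser builds on submit (the helper name is ours, for illustration):

```javascript
// Build the URL the form submits: action + "?" + URL-encoded fields.
function buildSearchUrl(query) {
  const params = new URLSearchParams({ q: query });
  return 'https://www.google.com/search?' + params.toString();
}

console.log(buildSearchUrl('web speech api'));
// → https://www.google.com/search?q=web+speech+api
```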

-JavaScript

Let’s get on the juicy part!

Open the JS script and save some elements through their IDs:

let formSearch = document.getElementById('formSearch');
let inputSearch = document.getElementById('inputSearch');
let vocalSearch = document.getElementById('vocalSearch');

The Web Speech API, sadly, works only in Chrome, which supports it through prefixed properties. So, at the start of our code, we need the following line to pick up whichever constructor the browser exposes:

let SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
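The same fallback can be expressed as a small helper taking a window-like object, which also makes the support check easy to follow (the function name is ours, for illustration):

```javascript
// Return the available SpeechRecognition constructor, or null if unsupported.
function getSpeechRecognition(win) {
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// In the browser: let SpeechRecognition = getSpeechRecognition(window);
```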

We also need to set a condition to check whether the browser our user is, well, using supports it:

if (SpeechRecognition) {
  console.log('WebSpeech is supported by the browser!');
} else {
  console.log('WebSpeech is not supported by the browser');
}

Let's find out if our browser supports the Web Speech API

That settled, we can continue with our code! We want to start the speech recognition when we click on the microphone:

// vocalSearch was already declared above, so we re-point it (no second "let",
// which would throw a redeclaration error) at the <i> icon inside the form
vocalSearch = formSearch.querySelector('i');
let recognition = new SpeechRecognition();

vocalSearch.addEventListener('click', () => {
  console.log('click');
  if (vocalSearch.classList.contains('fa-microphone')) {
    recognition.start();
  } else {
    recognition.stop();
  }
});

With this, we're telling the code to start speech recognition when the microphone icon is the plain one, and to stop it when the icon is crossed out. We can do that with Font Awesome, which renders icons through specific classes on an <i> tag.
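Before calling start(), a SpeechRecognition instance also exposes a few standard properties worth setting: lang, interimResults and maxAlternatives are all part of the API. A sketch, wrapped in a helper of our own naming so the defaults are explicit:

```javascript
// Apply common Web Speech API settings to a recognition object.
// The property names are standard; the helper itself is just for illustration.
function configureRecognition(recognition, lang = 'en-US') {
  recognition.lang = lang;            // language the engine listens for
  recognition.interimResults = false; // deliver only final results
  recognition.maxAlternatives = 1;    // one transcript per result
  return recognition;
}

// e.g. configureRecognition(recognition, 'it-IT'); for an Italian-speaking user
```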

Going forward, we obviously need to toggle between the plain icon and the slashed one.

recognition.addEventListener('start', () => {
  vocalSearch.classList.remove('fa-microphone');
  vocalSearch.classList.add('fa-microphone-slash');
  inputSearch.focus();
  console.log('START!');
});

recognition.addEventListener('end', () => {
  vocalSearch.classList.add('fa-microphone');
  vocalSearch.classList.remove('fa-microphone-slash');
  inputSearch.focus();
  console.log('STOP!');
});

We’ll have something like this:

Clicking the icon will make it change from plain to slashed…
…and the opposite!
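The toggle is just a two-state switch between two Font Awesome classes; isolated as a pure function (the name is ours, for illustration) it reads:

```javascript
// Given the current mic icon class, return the class to show next.
function nextMicClass(current) {
  return current === 'fa-microphone' ? 'fa-microphone-slash' : 'fa-microphone';
}

console.log(nextMicClass('fa-microphone'));       // → "fa-microphone-slash"
console.log(nextMicClass('fa-microphone-slash')); // → "fa-microphone"
```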

So! We're almost there! Now we need to use what we say. Specifically, the Web Speech API turns our words into strings, so we can work with those! We only need to save the results:

recognition.addEventListener('result', (event) => {
  console.log(event);
  let result = event.results[0][0].transcript;
  inputSearch.value = result;
  setTimeout(() => {
    formSearch.submit();
  }, 700);
});

With this piece of code, we're actually saving our "stringed voice". The console.log lets us see it and check that everything is all right.

Having saved the transcript in the result variable, we can now modify the value of the input with it. We have also set a timeout of 700ms after which the search will be carried out.
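event.results is a list of results, each of which is a list of alternatives; mimicking that shape with plain arrays shows exactly what the handler reads (the helper name and mock values are ours, for illustration):

```javascript
// Mirror of event.results[0][0].transcript from the handler above.
function firstTranscript(results) {
  return results[0][0].transcript;
}

// Mock of the nested structure the API delivers:
const mockResults = [[{ transcript: 'hello world', confidence: 0.92 }]];
console.log(firstTranscript(mockResults)); // → "hello world"
```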

Logging the "event" parameter, we can see a lot of interesting things, such as the confidence score and the actual transcript:

The SpeechRecognitionEvent is a huge object, full of data. The most important are confidence and transcript, which we can find in event.results[0][0]
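If maxAlternatives were raised above 1, the handler could pick the alternative with the highest confidence rather than always taking index 0. A sketch (the helper name and sample values are ours):

```javascript
// Return the alternative with the highest confidence score.
function bestAlternative(alternatives) {
  return [...alternatives].sort((a, b) => b.confidence - a.confidence)[0];
}

const alts = [
  { transcript: 'web speech', confidence: 0.71 },
  { transcript: 'web speed', confidence: 0.43 },
];
console.log(bestAlternative(alts).transcript); // → "web speech"
```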

And that's it! We've implemented web speech recognition inside a project. Using it in more complex, or already started, projects is really simple.

-Final considerations

The Web Speech API is a really nice way to play with APIs and JavaScript with minimum effort, and you can actually make a great impact with it.

I recommend that you still read the complete documentation, linked above, as today we have only seen the tip of the iceberg.

Feel free to share your attempts, advice or criticism with me! I welcome them all.

Bye! See you soon!


Davide Cariola

Backend and Laravel Specialist @ Aulab | Scrum Fundamentals Certified™ — follow me at davidecariola.it