About Nesta

Nesta is an innovation foundation. For us, innovation means turning bold ideas into reality and changing lives for the better. We use our expertise, skills and funding in areas where there are big challenges facing society.


Common Voice

Crowdsourcing voices to train speech recognition software

The challenge

Most of the software and voice data that power the personal assistants in our smart devices are locked up in privately owned systems. Getting access to good-quality data takes time and money. As a result, the cost of developing speech recognition and other software that relies on voice data is prohibitively high, giving a few companies a monopoly on these services. There is also little transparency about what data has been used to develop smart assistants, meaning that certain populations can remain underserved. These limitations make the technology less effective for some groups, such as non-native speakers with accents, or for languages spoken by small populations.

The AI and CI solution

Common Voice is a Mozilla initiative that addresses this challenge by developing the world’s first open-source voice dataset and a speech recognition engine called Deep Speech. The concept is simple. Common Voice crowdsources voice contributions through an online platform where users are invited to record themselves reading sentences. All sentences are sourced from texts under a Creative Commons license, to ensure they can be freely reused by researchers and entrepreneurs in the future. Users can also listen to and validate contributions from others, to ensure that the data is of high enough quality to train an AI algorithm. The market’s leading voice technologies are powered by deep learning algorithms, which can require up to 10,000 hours of validated data to train.

So what?

As of January 2020, users had recorded almost 2,500 hours of their voices in 29 different languages for Common Voice. The aim of the project is to ensure that the data used to train voice recognition tools represents the full diversity of real people’s voices. Each data entry contains an audio file with the linked text, as well as any associated metadata about the contributor, if it is available. By making the datasets open, Mozilla is creating opportunities for a wider range of researchers, developers and public sector actors to develop voice technologies that can benefit a wider range of people. This accessibility can help to incentivise innovation and healthy competition for better tools. Mozilla released the first version of Deep Speech in 2017.
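
To make the shape of a data entry more concrete, the description above (an audio file, its linked text, optional contributor metadata and a validation step) can be sketched in Python. This is an illustrative sketch only: the class name, field names, file names and example values are assumptions made for this example and are not taken from the actual Common Voice dataset format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceEntry:
    """One crowdsourced recording (hypothetical structure, not the real schema)."""
    audio_path: str                               # the recorded audio file
    sentence: str                                 # the text the contributor read aloud
    duration_seconds: float                       # length of the recording
    validated: bool = False                       # True once other users have approved it
    contributor_metadata: Optional[dict] = None   # only present if the contributor shared it


def validated_hours(entries):
    """Total duration, in hours, of the entries that passed validation."""
    total_seconds = sum(e.duration_seconds for e in entries if e.validated)
    return total_seconds / 3600.0


# Made-up example entries, for illustration only.
entries = [
    VoiceEntry("clip_0001.mp3", "The concept is simple.", 3.6,
               validated=True, contributor_metadata={"accent": "example"}),
    VoiceEntry("clip_0002.mp3", "Users are invited to read sentences.", 7.2,
               validated=True),
    VoiceEntry("clip_0003.mp3", "This clip is awaiting validation.", 5.0),
]

hours = validated_hours(entries)
print(f"Validated hours: {hours:.4f}")
```

Only validated recordings count towards the total, which reflects why the listening and validation step matters: the hours available for training are the hours that other contributors have checked.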

Common Voice is an example of how a collective intelligence (CI) approach to data collection – that emphasises diversity and open access – can be used to improve the development of AI, which in turn has the opportunity to be used for other CI purposes.
