Whether your digital voice assistant is called Siri, Google or Alexa, you have probably come across some funny suggestions as to what the assistant thinks you’ve told it.
Speech recognition is complicated, which means it hasn’t been super lucrative to invest in developing models for a small language like Danish. But in 2019, a group of researchers and thesis students at DTU set out to improve Danish speech recognition with the project “Danspeech”.
“Danspeech is an open-source project that springs from my and Rasmus Arpe Fogh Egebæk’s master’s thesis. It culminates in some models for speech recognition in Danish, and it has become a very nice showcase of how deep tech can move the needle: When we started three years ago, we were very much in doubt about whether you could compete with the big tech companies when you did not have large amounts of data and a lot of money to throw at training the models. But today our model is better at Danish than Google,” says Martin Carsten Nielsen.
When the thesis was finished, he took a regular job. But then he was brought back to DTU with an offer to explore the business potential of the speech recognition technology he had helped develop. And after several grants from DTU and the Innovation Fund to explore the market potential and build the software, he is now co-founder of the startup Alvenir, which wants to make speech recognition mainstream in Danish.
The broad potential is in the niche
With Danspeech, Alvenir got a language model that is better at Danish than the big tech giants. But at the same time, the model is published as ‘open-source’ and therefore available to everyone – which makes it a smaller competitive advantage than one might think.
On the other hand, the founders of the startup themselves have been responsible for the research behind the language models. They know it better than anyone else, and this has enabled them to build a platform around the models, which allows them to quickly target and train speech recognition for particular niches.
“Transcription in itself is rarely super value-creating – it is the analysis of the transcript that makes the difference. That is why we build everything modularly, and over the past year, we have built a flexible and scalable Machine Learning Operations platform. This enables us to quickly and efficiently specialize our language recognition and conduct analysis in various domains – e.g. the healthcare industry, the financial sector, etc.,” says Nielsen.
While Alvenir will probably offer simple transcription in a broad sense, they expect the biggest business potential to lie in their ability to easily update the database and thus train speech recognition for niche purposes, including the financial sector. That sector is heavily regulated and therefore highly dependent on speech recognition being correct if the technology is to be used to make the sector more efficient. This gives Alvenir a competitive edge, since their platform can learn “finance-Danish” fast.
“Right now, for example, the banks are recording broker calls between each other. This generates an enormous amount of audio data, which we can add huge value to just by having it transcribed reliably so that it is searchable,” explains Nielsen.
Open-source supports the business
While releasing the language models as open source doesn’t add to Alvenir’s competitiveness, the co-founder still applauds it. Not only because they were developed with public funding, but also because he has long been an enthusiastic member of the open-source movement and has seen its advantages first-hand.
“It is also a statement. We think that the basic language models should be open. We might be able to hold on to it for a while, but at some point, someone else will just make it readily available. Instead, it benefits us to be a part of the open-source community – and then we would rather compete on other parameters,” Nielsen says.
In addition, he gives DTU a lot of credit for the fact that Alvenir exists as a startup today.
“In theory, we could have developed it ourselves, but I think a very big part of doing deep tech is that someone provides security. Another fact is that we basically started out with a cool technology for speech recognition – and had no idea about doing business. It has taken some maturing for me to even be able to sit here today and know what a product/market fit is. And I do not think that maturation would have come without DTU,” says Martin Carsten Nielsen.
This article is made in collaboration with ‘Digital Tech Summit’ – one of the amazing partners making the magazine ‘From University to Unicorn 2021’ possible. You can read the full magazine here.