Whisper, a new ASR engine

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The developer of Whisper, OpenAI, shows that the use of such a large and diverse dataset leads to improved robustness to accents, background noise and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. The nine models are open source and can be downloaded.

To my knowledge, Whisper is a very good (and probably the best) ASR engine at the moment, and it can be used as a foundation for building useful applications and for further research on robust speech processing. It was launched in September 2022 and has been receiving a very positive response.

Privacy

In many cases where AV recordings are made, privacy can be an important issue. Especially when the recordings are "sensitive", the interviewer (or owner of the recordings) must handle them carefully. Whisper can easily run on a big, fast server, on your own small laptop and on any device in between. The recognition will give an equal result. The only real difference is the processing speed: the better your computer (and especially when it has a graphics card), the faster the recognition.
So, certainly for people who have a fast computer and who occasionally have sensitive data, we always recommend installing Whisper on your own system as well, in order to avoid the risk of a data breach.

Is ASR ready?

Yes and no! At this moment (April 2024) there are still a few drawbacks to Whisper. In February 2024 the problem of diarization (knowing which speaker is speaking) was solved by WhisperX. In the same update, Whisper was combined with faster-whisper, which results in a 10x speed improvement (at least if you use a GPU).

However, Whisper still tends to hallucinate, although this happens relatively infrequently. The VAD filter in WhisperX helps to prevent these hallucinations: it detects whether there is speech or not, and if there is none, no recognition is done. Another disadvantage of Whisper is that the transcription is sometimes too polished. For most people this is no problem at all, but for academic research you sometimes want to know whether people hesitate, or what they started to say but swallowed halfway through. Whisper usually turns the speech into a nice, grammatically correct sentence that reads well, but that is not always what you want. For example: "I um I, I thought I'd to do that for a moment" is usually recognised by Whisper as "I thought I'd to do that for a moment".
This last effect is probably caused by the fact that Whisper uses a ChatGPT-like language model to "translate" the recognition into a fluent sentence. Again, this is excellent for transcribing most speech, but it may not be desirable for research on speech and/or dialogues where hesitations, pauses, repetitions and other disfluencies are the topic of study.
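To get a rough impression of how much Whisper polishes away, you can compare a manually made verbatim transcript with the Whisper output. The following is a minimal sketch using only Python's standard library; the function names `tokens` and `dropped_tokens` are hypothetical, the tokeniser only handles plain ASCII letters, and it only reports words that were deleted outright (not words that were replaced by something else).

```python
import difflib
import re


def tokens(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def dropped_tokens(verbatim, asr_output):
    """Return the words present in the verbatim transcript but
    deleted from the ASR output, in their original order."""
    a, b = tokens(verbatim), tokens(asr_output)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    dropped = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "delete":  # words that exist only in the verbatim text
            dropped.extend(a[i1:i2])
    return dropped


verbatim = "I um I, I thought I'd to do that for a moment"
whisper_output = "I thought I'd to do that for a moment"
print(dropped_tokens(verbatim, whisper_output))
```

For the example sentence above, this lists the filler "um" and the repeated "I"s that Whisper left out.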

Set-up Whisper

Whisper was released as a Python package. After installing Python (version 3.9 - 3.10) you need to install PyTorch (1.10.1) and FFmpeg.
Once that is done, you can download and install (or update to) the latest release of Whisper with the following command:
pip install -U openai-whisper
For more information about this, see here.
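Once installed, Whisper can also be called from Python. The following is a minimal sketch, assuming the `openai-whisper` package from the command above is installed; the file name `interview.mp3` and the helper names `pick_device`, `format_segments` and `transcribe_locally` are hypothetical and used only for illustration. The audio stays on your own machine; only the model weights are downloaded the first time a model is used.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


def format_segments(result):
    """Turn a Whisper result dict into "[start - end] text" lines."""
    lines = []
    for seg in result.get("segments", []):
        text = seg["text"].strip()
        lines.append(f"[{seg['start']:.2f} - {seg['end']:.2f}] {text}")
    return lines


def transcribe_locally(path, model_name="base", language=None):
    """Transcribe one audio file on this machine."""
    import whisper  # imported lazily so the helpers above work without it

    model = whisper.load_model(model_name, device=pick_device())
    # language=None lets Whisper detect the spoken language itself.
    return model.transcribe(path, language=language)


# Example call (requires openai-whisper and FFmpeg to be installed):
# result = transcribe_locally("interview.mp3", language="nl")
# print("\n".join(format_segments(result)))
```

On a machine without a GPU the same code runs on the CPU, only more slowly.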

Available models and languages

There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x

The .en models for English-only applications tend to perform better, especially the tiny.en and base.en models. OpenAI observed that the difference becomes less significant for the small.en and medium.en models.
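As an illustration of how the table above can guide the choice of model, the sketch below picks the largest model whose approximate VRAM requirement fits the memory you have. The values are copied from the table; the function name `choose_model` is hypothetical, and real memory use may differ from these approximate figures.

```python
# (name, has an English-only variant, approximate VRAM in GB), smallest first.
# Values copied from the table above.
MODELS = [
    ("tiny", True, 1),
    ("base", True, 1),
    ("small", True, 2),
    ("medium", True, 5),
    ("large", False, 10),
]


def choose_model(vram_gb, english_only=False):
    """Return the name of the largest model that fits in vram_gb."""
    fitting = [m for m in MODELS if m[2] <= vram_gb]
    if not fitting:
        raise ValueError("less than ~1 GB available; no model fits")
    name, has_en, _ = fitting[-1]
    # There is no English-only variant of the large model.
    return f"{name}.en" if english_only and has_en else name


print(choose_model(4))                      # a 4 GB card
print(choose_model(6, english_only=True))   # a 6 GB card, English audio only
```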

Performance

Whisper's performance varies widely depending on the language and the model. The figure below shows a WER (Word Error Rate) breakdown by language for the Fleurs dataset using the large-v2 model (the smaller the number, the better the performance). Additional WER scores for the other models and datasets can be found on the Whisper website. For more information, see here.
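For readers unfamiliar with WER: it is the number of word substitutions, deletions and insertions needed to turn the recognised text into the reference text, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance calculation. It is a simplified illustration: the function name `wer` is our own, and it does no text normalisation (case, punctuation), which published scores usually apply first.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One of six reference words is missing from the hypothesis.
print(wer("de kat zit op de mat", "de kat zit op mat"))
```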

License

Important: Whisper's code and model weights are released under the MIT License. See LICENSE for further details.

Windows

MacOS

Whisper Results

Whisper, especially with the medium and large models, gives very good recognition results. However, the text is sometimes a little too good: this certainly improves readability, but it makes the results slightly less suitable for linguistic research.

Translations

Whisper has the ability to translate the recognised speech directly into English. However, it turns out that if we recognise Dutch speech while telling Whisper that the language is German, English, or Italian, we get transcriptions in those languages as well. Apparently, Whisper can do more in this area than OpenAI indicates. However, as you can see below, the quality varies: the German result (a language close to Dutch) is good, the English result is acceptable, but the Italian result fails.
To compare with a state-of-the-art translator, we have added DeepL translations of the Dutch text in the second table.
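An experiment of this kind could be set up roughly as follows. This is a sketch under assumptions, not the exact procedure used for the tables: it assumes the language was forced through the `language` option of Whisper's `transcribe()` method, the choice of the `large` model is a guess, and the names `decode_options` and `run_experiment` are hypothetical. The only officially documented translation route is `task="translate"`, which always produces English.

```python
def decode_options(target):
    """Map a desired output to keyword arguments for transcribe()."""
    if target == "english-translation":
        # Documented route: recognise Dutch speech, translate into English.
        return {"language": "nl", "task": "translate"}
    # Undocumented route described in the text: plain transcription,
    # but with the language set to something other than the spoken one.
    return {"language": target, "task": "transcribe"}


def run_experiment(path, targets=("nl", "de", "en", "it"), model_name="large"):
    """Transcribe the same Dutch recording once per target language."""
    import whisper  # imported lazily; requires openai-whisper and FFmpeg

    model = whisper.load_model(model_name)
    return {
        target: model.transcribe(path, **decode_options(target))["text"]
        for target in targets
    }


print(decode_options("de"))
print(decode_options("english-translation"))
```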

Recognition and Translations by Whisper

Dutch (recognition)	German (recognition/translation?)	English (recognition/translation?)	Italian (recognition/translation?)
Dank u wel, wethouder. Na de introductie ga ik door een aantal slides heen, waarin wellicht wat overlaps zit. Als u denkt, ze gaat te snel of ze gaat te langzaam, roep dan even. Dan kunnen we langer met elkaar in gesprek. Ik denk dat ik de heer Meijer niet moe voor te stellen. Op mijn rechterhand voor u links zit Mark van de Vliet. Hij is één van de externe projectleiders van Hefo. In dit geval dé externe projectleider van Hefo. die ons begeleiden bij de scenario-studie om te komen tot een in ieder geval gedegen verhaal. Mijn naam is Beert de Metz en ik heb in ieder geval voor hier het stokje overgenomen van Jan Derksen die u bij de vorige beeldvormende avond heeft gezien. En ik neem nu even de rol als projectleider van dit deel van het traject op. Even kijken, dit hebben we net al wel een soort van behandeld. Dit gaat over het ZBB-besluit en de besparing van 2 ton op de exploitatielasten van de huisvesting van de gemeentelijke organisatie in 2023 en vanaf 2024 2,5 ton. Onderzoek is verricht naar meer vierkante meters verhuren, dat bleek niet haalbaar. En toen is er onderzoek gedaan naar alternatieve mogelijkheden om toch tot die besparing te komen. Daar bent u op 15 november over geïnformeerd. Er waren er zes. Die zijn teruggebracht naar drie. Drie voorkeursscenario's. En daar is een marktconsultatie op losgelaten met als doel te kijken of iets haalbaar was. We hebben niet al die scenario's de markt opgebracht en gekeken. Gooi, hoe komt u eens met alle mogelijke offertes om het helemaal door te rekenen. Maar wat is haalbaar? En wat draagt bij aan het creëren van waarde in welke zin dan ook? Net zo goed als wat draagt bij aan het besparen op de exploitatielasten? 1 en 3 zijn in principe haalbaar. 2 is zeer onzeker. Dus zijn we verder gegaan met het uitwerken van in ieder geval 1 en 3. En bij 3 zullen we zo meteen laten zien waarom 3 in ons optiek het meeste de voorkeur geniet. Wat waren nou uitgangspunten waar wij mee aan de slag zijn gegaan? 
Dat is uiteraard het verlagen van die structurele last op de exploitatie. Dat was dus het ZBB-besluit uit 2021.	Dankeschön, Vizepräsidentin. Nach der Introduktion gehe ich durch ein paar Slides, in denen vielleicht etwas überlappt ist. Wenn ihr denkt, sie geht zu schnell, oder sie geht zu langsam, dann ruft dann kurz. Das können wir länger miteinander in Gesprächen. Ich denke, dass ich den Herrn Mayer nicht mehr vorstelle. Auf meiner rechten Hand, vor euch links, steht Mark van der Vliet. Er ist einer der externe Projektleiter von Hefo. In diesem Fall der externe Projektleiter von Hefo. die uns begeleiten bei der Szenario-Studie, um zu kommen, dass es ein gedegenes Verhältnis gibt. Mein Name ist Behrte Metz, und ich habe hier den Stokken von Jan Derksen, die Sie bei der vorigen Bildformende Abend gesehen haben, übernommen. Und ich nehme nun die Rolle als Projektleiter von diesem Teil des Trajekts auf. Schauen wir mal, das haben wir gerade schon einiges behandelt. Es geht um die ZBB-Beschläge, eine Besparung von 2 Tonnen auf den Exploitationen von der Häusfestung der öffentlichen Organisation in 2023 und ab 2024 2,5 Tonnen. Die Untersuchung wurde nach mehr Vierkantemetern verhüren, das war nicht erlaubt. Und dann wurde die Untersuchung nach alternativen Möglichkeiten gemacht, um trotzdem zu dieser Besparung zu kommen. Da sind Sie auf den 15. November darüber informiert. Es waren sechs. Die sind zurückgebracht nach drei. Drei Vorcursszenario's. Da ist eine Marktkonsultation losgelaten, mit als Ziel, zu schauen, ob etwas erhaltbar war. Wir haben nicht alle die Szenario's der Markt aufgebracht und geschaut, gooi, oh, komm, du bist mit allen möglichen Offertes, um es komplett durchzureknen, aber was ist erhaltbar? Und was trägt bei an das Krieren von Werten, in welchem Sinne auch? Net so gut als, was trägt bei an das Sparen auf die Explotatielasten? 1 und 3 sind in Prinzip erhebbar, 2 sehr unsicher, also sind wir weitergegangen mit dem Ausarbeiten von 1 und 3. 
Bei 3 werden wir gleich sehen, warum 3 in unserer Optik den meisten Vorkur genießen. Was waren nun die Ausgangspunkte, mit denen wir an den Schlag gegangen sind? Das ist natürlich das Verlagen von den strukturellen Kosten auf die Exploitation. Das war das ZBB-Bescheid aus 2021.	After the introduction, I will go through a number of slides in which there may be some overlap. If you think she's going too fast or too slowly, just call her. We can discuss that for a long time. I don't think I can introduce Mr. Mayer. On my right hand for your left is Mark van de Vliet. He is one of the external project leaders of HEVO. In this case, the external project leader of HEVO. who will accompany us in the scenario study to come up with a well-designed story. My name is Beert de Metz and I have taken over the role of Jan Derksen, who you saw in the previous video. I will now take over the role as project leader of this part of the project. Let's have a look. We have already dealt with this a bit. This is about the ZBB decision, a savings of 2,000,000 on the exploitation burden of the housing of the municipal organization in 2023 and from 2024 2,500,000. Research has been conducted to rent more square meters, which was not feasible. And then research has been conducted on alternative possibilities to get to that savings. You were informed about this on the 15th of November. There were six of them. They were brought back to three. Three pre-cursus scenarios. A market consultation was released to see if something was achievable. We didn't look at all those scenarios in the market. You come with all kinds of offers to calculate it, but what is achievable? And what contributes to creating value, in whatever sense? Just as well as what contributes to saving on the exploitation burden. 1 and 3 are in principle achievable. 2 is very uncertain. So we have continued to work out at least 1 and 3. And with 3 we will soon show why 3 in our view enjoys the most advantage. 
What were the starting points that we started with? That is of course the reduction of the structural burden on the exploitation. That was the ZBB decision from 2021.	Grazie, leggera. Dopo l'introduzione andrò in un paio di slides in cui forse c'è alcun risultato. Se pensate che si fa troppo velocemente o troppo velocemente, chiedeteci. Ci parleremo più tempo. Penso che non posso mostrare il signore Mayer. A mia right-hand for your left is Mark van der Vliet. He is one of the external project leaders of HEVO. In this case, THE external project leader of HEVO. che ci aiuterà alla studia di scenario per arrivare a un vero vero storico. Mi chiamo Beert de Metz, e in questo modo ho preso il posto di Jan Derkse, che vi ha visto nel precedente video. E io ho preso il posto come leader di questa parte del trajetto. Guardate, questo abbiamo già trattato un po'. Questo riguarda il deciso della ZBB, una spara di 2.000 euro per la costa di esploitazione della riuscita della comunità in 2023, e da 2024 2.500 euro. La risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risposta è che la risp L'ho informato il 15 novembre. Ci erano sei, che sono trasportati a tre, tre scenario di pre-cursi, in cui una consultazione del mercato è stata risolta, con l'obiettivo di guardare se una cosa era pericolosa. Non abbiamo presentato tutti questi scenario al mercato e ci siamo guardati, con tutte le possibili offerte per raccogliere tutto, ma cosa è pericolosa? 
E cosa contribuisce a creare di valore, in qualsiasi modo, nello stesso modo come cosa contribuisce a risparmiare le coste di esploitazione? 1 e 3 sono in principio vissibili. 2 è molto non sicuro. Quindi siamo andati a sviluppare, in ogni caso, 1 e 3. E con 3 vedremo subito perché 3, in nostra optica, gira di più. Quali erano i punti per cui siamo andati a sviluppare? C'è ovviamente l'uploadio delle strutture di risposta sull'exploitazione. Questo era il scelto della ZBB del 2021.

Recognition by Whisper and Translations with DeepL

Dutch (recognition)	German (translation)	English (translation)	Italian (translation)
Dank u wel, wethouder. Na de introductie ga ik door een aantal slides heen, waarin wellicht wat overlaps zit. Als u denkt, ze gaat te snel of ze gaat te langzaam, roep dan even. Dan kunnen we langer met elkaar in gesprek. Ik denk dat ik de heer Meijer niet moe voor te stellen. Op mijn rechterhand voor u links zit Mark van de Vliet. Hij is één van de externe projectleiders van Hefo. In dit geval dé externe projectleider van Hefo. die ons begeleiden bij de scenario-studie om te komen tot een in ieder geval gedegen verhaal. Mijn naam is Beert de Metz en ik heb in ieder geval voor hier het stokje overgenomen van Jan Derksen die u bij de vorige beeldvormende avond heeft gezien. En ik neem nu even de rol als projectleider van dit deel van het traject op. Even kijken, dit hebben we net al wel een soort van behandeld. Dit gaat over het ZBB-besluit en de besparing van 2 ton op de exploitatielasten van de huisvesting van de gemeentelijke organisatie in 2023 en vanaf 2024 2,5 ton. Onderzoek is verricht naar meer vierkante meters verhuren, dat bleek niet haalbaar. En toen is er onderzoek gedaan naar alternatieve mogelijkheden om toch tot die besparing te komen. Daar bent u op 15 november over geïnformeerd. Er waren er zes. Die zijn teruggebracht naar drie. Drie voorkeursscenario's. En daar is een marktconsultatie op losgelaten met als doel te kijken of iets haalbaar was. We hebben niet al die scenario's de markt opgebracht en gekeken. Gooi, hoe komt u eens met alle mogelijke offertes om het helemaal door te rekenen. Maar wat is haalbaar? En wat draagt bij aan het creëren van waarde in welke zin dan ook? Net zo goed als wat draagt bij aan het besparen op de exploitatielasten? 1 en 3 zijn in principe haalbaar. 2 is zeer onzeker. Dus zijn we verder gegaan met het uitwerken van in ieder geval 1 en 3. En bij 3 zullen we zo meteen laten zien waarom 3 in ons optiek het meeste de voorkeur geniet. Wat waren nou uitgangspunten waar wij mee aan de slag zijn gegaan? 
Dat is uiteraard het verlagen van die structurele last op de exploitatie. Dat was dus het ZBB-besluit uit 2021.	Vielen Dank, Herr Stadtrat. Nach der Einführung werde ich eine Reihe von Folien durchgehen, wobei es zu Überschneidungen kommen kann. Wenn Sie der Meinung sind, dass sie zu schnell oder zu langsam vorgeht, rufen Sie bitte laut. Dann können wir ein längeres Gespräch führen. Ich glaube nicht, dass ich Herrn Meijer vorstellen muss. Zu meiner Rechten, vor Ihnen auf der linken Seite, sitzt Mark van de Vliet. Er ist einer der externen Projektleiter von Hefo. In diesem Fall ist es der externe Projektleiter von Hefo. Er wird uns durch die Szenariostudie führen, damit wir eine zumindest solide Geschichte entwickeln können. Mein Name ist Beert de Metz, und ich habe, zumindest für diesen Fall, die Nachfolge von Jan Derksen angetreten, den Sie beim letzten Bilderbuchabend gesehen haben. Und ich übernehme jetzt kurz die Rolle des Projektleiters für diesen Teil der Route. Mal sehen, damit haben wir uns ja gerade schon beschäftigt. Es geht um den ZBB-Beschluss und die Einsparung von 2 Tonnen bei den Betriebskosten für die Unterbringung der Gemeindeorganisation im Jahr 2023 und ab 2024 2,5 Tonnen. Es wurden Untersuchungen durchgeführt, um mehr Quadratmeter zu vermieten, was sich als nicht machbar erwies. Daraufhin wurden alternative Möglichkeiten untersucht, um diese Einsparungen dennoch zu erreichen. Darüber wurden Sie am 15. November informiert. Es gab sechs. Sie wurden auf drei reduziert. Drei bevorzugte Szenarien. Und zu diesen wurde eine Marktkonsultation eingeleitet, um zu sehen, ob etwas machbar ist. Wir haben nicht alle diese Szenarien auf den Markt gebracht und geprüft. Werfen Sie mal einen Blick darauf, wie Sie überhaupt auf alle möglichen Angebote kommen, um das alles durchzurechnen. Aber was ist machbar? Und was trägt in irgendeinem Sinne zur Wertschöpfung bei? Ebenso wie das, was zur Einsparung von Betriebskosten beiträgt? 1 und 3 sind im Prinzip machbar. 
2 ist höchst unsicher. Also haben wir zumindest an 1 und 3 weiter gearbeitet. Und zu 3 werden wir gleich zeigen, warum 3 aus unserer Sicht am besten geeignet ist. Was waren also die Ausgangspunkte, an denen wir zu arbeiten begannen? Es geht natürlich darum, die strukturelle Belastung der Betriebe zu verringern. Das war also der ZBB-Beschluss aus dem Jahr 2021.	Thank you, councillor. After the introduction, I will go through a number of slides, which may contain some overlap. If you think, she's going too fast or she's going too slow, please call out. Then we can have a longer conversation. I don't think I need to introduce Mr Meijer. On my right in front of you on the left is Mark van de Vliet. He is one of Hefo's external project leaders. In this case, Hefo's external project leader. who will guide us through the scenario study to come up with an at least solid story. My name is Beert de Metz and, at least for here, I have taken over from Jan Derksen, whom you saw at the previous image-forming evening. And I am now briefly taking on the role of project leader for this part of the route. Let's see, we did sort of deal with this just now. This is about the ZBB decision and the saving of 2 tonnes on the operating costs of housing the municipal organisation in 2023 and from 2024 onwards 2.5 tonnes. Research was done to rent out more square metres, which proved not feasible. And then alternative options were investigated to still achieve those savings. You were informed about that on 15 November. There were six. They were reduced to three. Three preferred scenarios. And a market consultation was launched on those with the aim of seeing if something was feasible. We didn't take all those scenarios into the market and look. Throw, how do you even come up with all the possible bids to calculate it all the way through. But what is feasible? And what contributes to value creation in any sense? As much as what contributes to saving on operating costs? 
1 and 3 are feasible in principle. 2 is highly uncertain. So we have continued to work on at least 1 and 3. And on 3, we will show in a moment why 3 is most preferable in our view. So what were starting points that we started working on? That is, of course, to reduce that structural burden on operations. So that was the ZBB decision from 2021.	Grazie, consigliere. Dopo l'introduzione, passerò in rassegna una serie di diapositive, che potrebbero contenere alcune sovrapposizioni. Se pensate che stia andando troppo veloce o che stia andando troppo piano, vi prego di segnalarlo. Poi potremo avere una conversazione più lunga. Non credo di dover presentare il signor Meijer. Alla mia destra, di fronte a voi sulla sinistra, c'è Mark van de Vliet. È uno dei responsabili dei progetti esterni di Hefo. In questo caso, il capo progetto esterno di Hefo. che ci guiderà attraverso lo studio dello scenario per arrivare a una storia almeno solida. Mi chiamo Beert de Metz e, almeno per ora, ho preso il posto di Jan Derksen, che avete visto nella precedente serata di formazione delle immagini. E ora assumo brevemente il ruolo di capo progetto per questa parte del percorso. Vediamo, ce ne siamo occupati poco fa. Si tratta della decisione della ZBB e del risparmio di 2 tonnellate sui costi di gestione degli alloggi dell'organizzazione comunale nel 2023 e dal 2024 in poi di 2,5 tonnellate. È stata fatta una ricerca per affittare più metri quadrati, che si è rivelata non fattibile. Sono state quindi studiate opzioni alternative per ottenere comunque questi risparmi. Ne siete stati informati il 15 novembre. Erano sei. Sono state ridotte a tre. Tre scenari preferiti. E su questi è stata avviata una consultazione di mercato con l'obiettivo di verificare se qualcosa fosse fattibile. Non abbiamo portato tutti questi scenari sul mercato e abbiamo cercato. Come si fa a trovare tutte le offerte possibili e a calcolarle fino in fondo? Ma cosa è fattibile? 
E cosa contribuisce alla creazione di valore in un certo senso? Quanto quello che contribuisce a risparmiare sui costi operativi? 1 e 3 sono fattibili in linea di principio. Il punto 2 è altamente incerto. Abbiamo quindi continuato a lavorare almeno su 1 e 3. E per quanto riguarda il 3, mostreremo tra poco perché, a nostro avviso, il 3 è preferibile. Quali sono stati i punti di partenza su cui abbiamo iniziato a lavorare? Ovviamente, ridurre l'onere strutturale sulle operazioni. Questa è stata la decisione della ZBB a partire dal 2021.
