EchoNet++: A multilingual soccer match audio commentary dataset

Sci Rep. 2026 Feb 17;16(1):8884. doi: 10.1038/s41598-026-39884-8.

ABSTRACT

We present a robust, multilingual, and multistage audio analysis pipeline for automatic processing of professional soccer match broadcasts. Rather than proposing a new standalone ASR model, this work introduces a reproducible end-to-end system and systematic component-level analysis that reveal how preprocessing, segmentation, transcription, and translation jointly affect performance under real-world broadcast conditions. The pipeline performs audio extraction, denoising, speech segmentation, transcription, and translation in a unified framework, enabling scalable, language-agnostic audio understanding. Audio is extracted from match videos using FFmpeg, transformed into the frequency domain via FFT, and filtered with a Butterworth bandpass filter (300 Hz-7 kHz) to isolate speech-relevant frequencies. We apply deep learning-based denoising using Demucs and segment speech regions using Silero VAD, followed by categorization of commentator and spectator streams based on temporal and density cues. Each speech segment is transcribed using multiple Automatic Speech Recognition (ASR) models, including Whisper variants (Medium, Large, Turbo) and Insanely Fast Whisper, and non-English transcripts are translated into English. All results, including transcriptions, translations, and metadata (timestamps, segment labels, and language codes), are exported as match-structured JSON files. We evaluate our pipeline on full-match videos from Europe's top leagues, achieving low word error rates, accurate language detection, and consistent segmentation across diverse acoustic conditions. To support reproducible research and downstream tasks such as summarization and analytics, we will publicly release the dataset and processing pipeline.
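The first and last stages described above (audio extraction with FFmpeg and export to match-structured JSON) can be sketched as follows. This is a minimal illustration, not the authors' released code: the 16 kHz sample rate, the function names `ffmpeg_extract_cmd` and `export_match`, the `Segment` fields, and the JSON layout are all assumptions made for the example. Only the 300 Hz-7 kHz passband and the kinds of metadata (timestamps, segment labels, language codes) come from the abstract. The Nyquist check reflects a general constraint: a 7 kHz upper band edge needs a sample rate above 14 kHz.

```python
import json
from dataclasses import dataclass, asdict
from typing import List, Optional

SAMPLE_RATE_HZ = 16_000   # assumption: the abstract does not state a sample rate
BAND_HZ = (300, 7_000)    # bandpass edges given in the abstract


def ffmpeg_extract_cmd(video_path: str, wav_path: str,
                       sample_rate: int = SAMPLE_RATE_HZ) -> List[str]:
    """Build (but do not run) an FFmpeg command that writes mono PCM audio."""
    # The upper band edge must lie below the Nyquist frequency (sample_rate / 2),
    # otherwise the 300 Hz-7 kHz filter could not be applied afterwards.
    if BAND_HZ[1] >= sample_rate / 2:
        raise ValueError("sample rate too low for the 7 kHz upper band edge")
    return [
        "ffmpeg", "-y",
        "-i", video_path,
        "-vn",                   # drop the video stream
        "-ac", "1",              # downmix to mono
        "-ar", str(sample_rate), # resample
        "-c:a", "pcm_s16le",     # 16-bit PCM WAV
        wav_path,
    ]


@dataclass
class Segment:
    """Hypothetical per-segment record; field names are illustrative."""
    start: float                       # seconds from start of the audio
    end: float
    label: str                         # e.g. "commentator" or "spectator"
    language: str                      # detected language code
    transcript: str
    translation: Optional[str] = None  # None when no translation is needed


def export_match(match_id: str, segments: List[Segment]) -> str:
    """Serialize one match as JSON, with segments ordered by start time."""
    ordered = sorted(segments, key=lambda s: s.start)
    payload = {"match_id": match_id, "segments": [asdict(s) for s in ordered]}
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

The command list can be passed to `subprocess.run(cmd, check=True)` on a machine with FFmpeg installed. Keeping `ensure_ascii=False` leaves non-English transcripts readable in the exported file rather than escaping them.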

PMID:41703173 | PMC:PMC12988046 | DOI:10.1038/s41598-026-39884-8