Bringing video into General MT evaluation at WMT26
For anyone working in dubbing, the WMT26 General MT question is simple: how do you test translation when the source is spoken dialogue and a transcript is not enough? A good line has to fit the speaker, the timing, the image, and the performance signals a transcript will miss.
"It's so much harder than just text translation," says Anton Dvorkovich, Dubformer's CEO and co-founder.
Dubformer is participating in the Spoken Dialogue domain of the WMT26 General MT task, the main shared task at WMT. For Dubformer, this is where the video question belongs: translation tested against spoken material with original video and machine-generated transcripts.
Since 2022, WMT's main findings papers have tracked the field through the LLM era. Anton co-authored the WMT22 findings paper. Anton and Sergey Dukanov, Dubformer's CTO and co-founder, later co-authored the WMT25 findings paper. At WMT26, Sergey is helping organize the Spoken Dialogue domain by collecting the test set, helping annotate the data, and writing guidelines.
Why WMT matters
WMT gives the machine translation field a common place to test systems under shared rules. That matters because translation quality is easy to describe and hard to measure. A system can look strong on short text samples and still fail when the source is spoken dialogue, a noisy transcript, or a video scene where meaning comes from what the viewer can see.
Anton describes the problem WMT is built to address: "It measures machine translation quality, which is a kind of a hard thing to define exactly, but we all have general understanding of what is a good translation, what is a bad translation. WMT is solving the problem of how do you put this feeling into numbers."
For media translation, speech includes "stuff we communicate when speaking which is not put into words." A dubbing translation has to respect timing, voice, speaker intent, and scene context.
Dubformer and WMT
The paper titles tell the main-task story. In 2022, Anton co-authored Findings of the 2022 Conference on Machine Translation (WMT22). At WMT23, he co-authored Findings of the 2023 Conference on Machine Translation (WMT23): LLMs Are Here But Not Quite There Yet. WMT24 continued with Findings of the WMT24 General Machine Translation Shared Task: The LLM Era Is Here but MT Is Not Solved Yet. WMT25, co-authored by Anton and Sergey, named the next evaluation problem: Findings of the WMT25 General Machine Translation Shared Task: Time to Stop Evaluating on Easy Test Sets.
The WMT record is more useful than any single ranking because it shows what works, what fails, and which test conditions still need research.
Dubformer's WMT record spans submissions, partnership, and organizer work on General MT. Anton organized General MT in 2024 and 2025. Sergey joined the WMT25 organizing team, co-authored that edition's findings paper, and continues on WMT26 General MT, where his work is focused on the Spoken Dialogue domain.
The WMT24 findings paper notes that a proprietary Dubformer engine was used in the preparation of English-language speech material. In WMT24 General MT, our submission ranked among the top systems across five language pairs. On the speech-domain tests, it placed first among machine translation systems for English to Spanish and English to Russian.
Even perfect fluency can be wrong. Anton says, "In a lot of cases people are not perfect when they speak. And if you're doing the perfectly fluent version, that's actually going to be really weird." A dub should sound "very natural, not robotic."
What changes at WMT26
WMT26 will be co-located with EMNLP 2026 in Budapest. The General MT task continues the move toward harder evaluation with document-level test sets and a Spoken Dialogue domain released with original video and machine-generated transcripts.
Human evaluation for the Spoken Dialogue domain will use the original video, so judges can assess translations with the context that makes speech difficult. The transcript is still provided, but it is generated by speech recognition and can contain errors. That brings the task closer to dubbing work: systems need to handle imperfect transcripts, speaker intent, visual context, and document-level context.
This is the relevant WMT26 task for Dubformer. The question is not only whether a sentence is fluent. It is whether the translation fits the spoken moment.
Sergey Dukanov, Dubformer CTO and co-founder and WMT26 General MT organizer, says:
"Spoken dialogue is where machine translation stops being only a text problem. A transcript alone cannot tell you why a line should be hesitant, clipped, emotional, or timed to a face on screen. By helping organize this WMT26 domain, I want to make those constraints part of the shared test, so teams building for speech, voice, and video are measured against the same hard conditions."
The result is a clearer shared test for anyone building machine translation for speech, voice, and video.
WMT26 details for participants
WMT26 General MT: test data release on 18 June 2026.
Translation submissions close on 2 July 2026.
The task includes document-level test sets.
The Spoken Dialogue domain is released with original video and machine-generated transcripts.
Human evaluation uses the original video for the Spoken Dialogue domain.
The task is relevant for teams working on MT for speech, voice, video, and document-level context.
Take part
Researchers building machine translation systems can join the WMT26 General MT task. Teams working on dubbing, voice, and video translation should pay particular attention to the Spoken Dialogue domain.