Circle U, NLPL, & OpenEuroLLM 2026 Winter School - Skeikampen, Norway

In February, I gave a lecture at the Circle U, NLPL, & OpenEuroLLM 2026 Winter School on Multilinguality in LLM Development and Evaluation, held at Skeikampen, Norway.

The title of the talk was “Challenges in Evaluating Generative Models”. In case you are interested, here are the slides.

Abstract: In this talk, we will discuss the evaluation of generative models, in particular Large Language Models (LLMs). Given that such models produce open-ended output, their evaluation requires different techniques from static evaluations such as simple question-answering benchmarks. We will first discuss human annotations and their use in leaderboards such as LMArena and ComparIA. We will then focus on automatic evaluation relying on LLM judges. In particular, we will describe current challenges with LLM judges before discussing their application in multilingual settings.

Thanks a lot to the organizers for inviting me!

Written on February 4, 2026