National Museums Liverpool case study: A guide to producing cost-effective multilingual audio guides with Text-to-Speech

12 Mar

Scaling inclusivity: How Podego worked with National Museums Liverpool to launch five-language audio guides in under a month.

National Museums Liverpool’s 2025 campaign

As stated in their campaign, National Museums Liverpool aims to become a museum for everyone. Our team was honored to work with National Museums Liverpool (NML), an organization that is bold, inclusive, and welcoming. In line with our mission at Podego, we worked with NML toward this goal by expanding their audio guides into five additional languages across three of their venues: the World Museum, Walker Art Gallery, and the Museum of Liverpool.

“Fantastic, brilliant. We appreciate the level of service and speed provided, especially when we need a quick turnaround. The pricing is reasonable for a public organization like ours with a tight budget.”
- Gianna Gomes, Digital team lead, Museum of Liverpool

In this case study, we explain how Podego used Large Language Models (LLMs) together with native speaker reviews in a cost-effective workflow. If you are considering using AI or Text-to-Speech (TTS) in your process, here are some insights worth knowing.

Working together towards inclusivity with multilingual content

National Museums Liverpool (NML) consists of seven free museums and galleries in the Liverpool area of England. Together, they boast collections that range from modern to antique.

To expand its educational content into multiple languages for its diverse community, NML faced significant hurdles. Translation agencies quoted prices at quadruple the cost of other solutions, while individual freelance translators estimated a long completion time.

Podego aims to lower these hurdles while delivering comparable results. Here’s the workflow, which you can implement too:

Write the scripts first. Our use of LLMs and Natural Language Processing (NLP) allows for faster turnaround times. This lets us draft translations quickly, giving everyone more breathing room to perfect the content.
Produce audio with Text-to-Speech voices. This keeps the process agile, scalable, and affordable.
Finally, have native speakers review the content to ensure high-quality results.

The Findings

Prompt systems made with native speakers are vital for cultural nuance
The first step was to translate the scripts from English into French, Mandarin, German, Arabic, and Polish. At Podego, we don’t just feed a script into ChatGPT or Google Translate and say “translate.” The team consulted native speakers of each language and developed a prompting structure to write and adapt the text with sensitivity to cultural nuance and metaphors.
Large Language Model and Text-to-Speech performance varies by language
We tested several LLMs and learned that some are more effective at certain languages than others. After the scripts were created, we reviewed them again with native speakers and used their feedback to tighten up the translations so they could sound even more authentic.

Expressions, idioms, and imagery change drastically between cultures. The same concept can be conveyed differently depending on the language.
For example, in the Chinese script, one translation read: “they often serve as models for their father’s paintings.” We revised this to “they are often the muses on their father’s canvas.” In Chinese, describing beautiful people as muses is a familiar metaphor, while calling someone a “model” comes across as overly blunt.
Choose Text-to-Speech voices with a consistent tone to ensure brand consistency
We shared a few voice options for each language with NML. The options were selected based on how well they fit the NML brand. We were delighted to find that the folks at NML had a flair for the fun stuff.
Avoid language-mixing and always double-check the Text-to-Speech output
For example, in the Polish phrase “Ten ołtarzowy,” the word is written correctly, but the model interpreted “Ten” as the English number 10 (dziesięć) rather than the Polish word for “this.”
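One way to catch this kind of slip before the audio is generated is to flag words in the translated script that also exist in English, so a reviewer knows exactly which spots to listen to. The snippet below is a minimal sketch, not Podego's actual tooling: the function name `flag_lookalikes` and the word list `ENGLISH_LOOKALIKES` are hypothetical, and a native speaker would need to maintain the real list for each language.

```python
import re

# Assumption: a small, hand-maintained set of Polish words that are also
# English words and could be read in English by a multilingual TTS voice.
ENGLISH_LOOKALIKES = {"ten", "to", "on", "do"}


def flag_lookalikes(script, lookalikes=ENGLISH_LOOKALIKES):
    """Return (word, character offset) pairs for a reviewer to listen to."""
    flagged = []
    # \w+ is Unicode-aware for Python 3 strings, so "ołtarzowy" stays one word.
    for match in re.finditer(r"\w+", script):
        if match.group().lower() in lookalikes:
            flagged.append((match.group(), match.start()))
    return flagged


print(flag_lookalikes("Ten ołtarzowy"))  # [('Ten', 0)]
```

The check does not fix anything on its own. It only narrows down where a human should listen closely to the generated audio.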
Avoid abbreviations in Text-to-Speech scripts
For instance, in Polish, the abbreviation “św.” is a common shorthand for święty, święta, or świętego (holy/saint). The form changes with the gender and grammatical case of the noun it describes. While humans use context to read “św.” correctly, a Text-to-Speech model often gets confused about which specific form of the word to pronounce. Therefore, abbreviations should be written out in full.
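A simple pre-check can list abbreviations that still need to be written out before a script goes to the Text-to-Speech voice. This is a minimal sketch under stated assumptions: `find_abbreviations` and the `ABBREVIATIONS` set are hypothetical names, the set covers only the Polish example above (with and without the diacritic), and choosing the correct full form is still left to a native speaker.

```python
import re

# Assumption: a hand-maintained set of abbreviations (without the trailing
# period) that a native speaker wants expanded before audio is produced.
ABBREVIATIONS = {"św", "sw"}


def find_abbreviations(script, abbreviations=ABBREVIATIONS):
    """Return every listed abbreviation found in the script, period included."""
    pattern = re.compile(r"\b(\w+)\.")
    return [
        match.group(0)
        for match in pattern.finditer(script)
        if match.group(1).lower() in abbreviations
    ]


print(find_abbreviations("Kościół św. Jana"))  # ['św.']
```

The sketch deliberately stops at reporting. Because the right expansion depends on gender and case, replacing the abbreviation automatically would risk the same mistake the Text-to-Speech model makes.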

This process still requires human input to direct the flow of speech, tone, and voice quality. However, thanks to the inherent speed of the technology, testing and iterating were easy.

NML received a complete catalogue of audio guides in five languages (French, Polish, Arabic, Mandarin, and German) in less than a month, with content that sounds like it was recorded by a local, not AI.

Curious about the result? Check it out at Bloomberg Connects.

Want to give your audio guides a wider audience?

If your museum's content is only available in one language, there's an audience you're not reaching. We can change that - quickly, affordably, and easily.

Drop us a line at hello@podego.com and we’ll provide you with free trial access to our CMS, which uses the workflow described above. We can also put together a sample in your preferred language.

We're already keeping an eye on the NML results to see how the new guides land with their diverse visitors. We'll update this piece as the data comes in!

Danna Mulya
