Museums today face mounting pressure to deliver accessible, engaging, and multilingual interpretation, all while grappling with shrinking budgets and leaner teams. The traditional audio guide, though beloved, is increasingly becoming impractical for many institutions. But can artificial intelligence (AI) truly step in?

Let's explore how AI can responsibly expand museum interpretation without completely replacing the human touch.

Rising Pressure, Limited Budget

Museums worldwide are feeling the pinch:

A recent overview of the SHIFT surveys in heritage institutions found that 57% of respondents named budget constraints as a key barrier to adopting new digital technologies.
Visitor expectations have evolved: audiences demand inclusive, flexible content accessible from personal devices.
Producing comprehensive audio guides, especially multilingual, audio-described, or BSL-interpreted versions, is notoriously costly and time-consuming.

This is leading museums to ask: “How can we deliver inclusive interpretation on a shrinking budget?”

What traditional audio guides offer

Audio guides have been core to museum interpretation for decades, offering polished content but at a steep cost:

High Production Costs: With studio rentals, voice talent, and editing, costs spiral quickly, especially when creating inclusive exhibitions with multiple languages.
Lengthy Production Timelines: Scriptwriting and curatorial approval can stretch weeks or even months, limiting flexibility and making it difficult to iterate, localize, and scale content.
Limited Audience Adaptation: Research (Koide et al., 2015; Pihko et al., 2011) shows novices and experts interact differently with exhibitions, suggesting one-size-fits-all content isn’t optimal.

What we know AI can do today

AI isn't yet a full substitute for traditional processes, but it can significantly:

Cut costs (depending on implementation and scale, production costs could fall by as much as 80%)*
Accelerate timelines
Improve accessibility

Tools such as ChatGPT can be used to draft and iterate scripts quickly, while ElevenLabs or Descript can generate audio in minutes. Together they are already transforming exhibition project management. These technologies make iterative, multilingual, and accessible content feasible, particularly for temporary exhibitions.

If you're considering AI for your interpretation strategy, it may well save you time, but you’ll also encounter some specific challenges along the way.

Can AI fully replace audio guides?

While the promise is real, generative AI also brings well-documented challenges, especially in sensitive museum contexts where accuracy, tone, and trust matter.

At Podego, we've seen firsthand where AI still needs human oversight, and where we’ve had to build solutions through trial and error. Here are the most common issues we’ve faced:

1. Quality and Tone Issues with Synthetic Voice

One of the most immediate challenges is pronunciation. Synthetic voices often stumble over unfamiliar names, regional accents, and loanwords, all of which matter for visitor engagement. Our team has spent hours perfecting single words to ensure quality, which shows how important human oversight remains.

*Hencas, our Audio Tech lead, contemplating his life choices after attempting to recreate a regional accent pronunciation.*

While AI voice technology is improving, there’s still a gap when it comes to intonation, warmth, and cultural nuance, all of which are vital in engaging interpretation.

2. Risks of Generic or Non-Contextual Content

Asking AI to "just write" an interpretation doesn't work, especially in a museum context. Without being grounded in curatorial sources or exhibition-specific material, AI tends to produce bland, overly generic, or outright inaccurate content. It can also hallucinate, meaning it generates false information that sounds plausible. It might cite sources that don’t exist, misattribute facts, or fabricate quotes. In some cases, it even provides a reference link or book title, but when you follow up, the source simply isn’t there.

*Example of a hallucination: an AI claiming that parachutes don’t actually prevent injuries during skydiving.*

Accuracy is non-negotiable in museums. Podego addresses this by integrating Retrieval-Augmented Generation (RAG) and verifying each AI-produced sentence against curated sources.
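To make the idea of sentence-level verification concrete, here is a minimal sketch in Python. It is not Podego's implementation: it assumes a simple word-overlap score as a stand-in for real retrieval (a production system would more likely use embeddings and a vector store), and the function names, the 0.6 threshold, and the sample texts are all hypothetical, made up purely for illustration.

```python
import re

# Small illustrative stopword list (an assumption, not a standard set).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "is",
             "was", "by", "for", "with", "it", "this", "that", "as", "at"}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return {t for t in tokens if t not in STOPWORDS}


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def support_score(sentence, passage):
    """Fraction of the sentence's content words that appear in the passage."""
    words = content_words(sentence)
    if not words:
        return 0.0
    return len(words & content_words(passage)) / len(words)


def check_script(script, sources, threshold=0.6):
    """Return (sentence, best_score, supported) for each script sentence.

    A sentence counts as supported when at least one curated passage
    covers `threshold` of its content words (hypothetical cut-off).
    """
    report = []
    for sentence in split_sentences(script):
        best = max((support_score(sentence, p) for p in sources), default=0.0)
        report.append((sentence, best, best >= threshold))
    return report


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sources = [
        "The vase was made in Jingdezhen around 1720 and is decorated in cobalt blue.",
        "It entered the museum collection in 1902 as a gift.",
    ]
    script = ("The vase was made in Jingdezhen around 1720. "
              "It was once owned by a famous emperor.")
    for sentence, score, supported in check_script(script, sources):
        label = "OK" if supported else "NEEDS REVIEW"
        print(f"[{label}] {score:.2f} {sentence}")
```

In this sketch, any sentence that no curated passage supports is flagged for a human editor rather than silently dropped, which keeps the final decision with the curatorial team.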

Concerned about data privacy? At Podego, your internal content remains confidential and never feeds into general-purpose AI models.

3. The Ongoing Need for Human Editing and Cultural Sensitivity

AI-generated text often falls into repetitive patterns and struggles with cultural awareness and with shifts in tone across different exhibitions or visitor communities. Our human writers, prompters, and audio editors step in to refine and adapt the content, ensuring that it feels genuinely authored, appropriately voiced, and contextually respectful.

For example, one AI-generated Chinese translation read: “they often serve as models for their father’s paintings.” We revised this to “they are often the muses on their father’s canvas.” In Chinese, describing beautiful people as muses is a familiar metaphor, while calling someone a “model” comes across as overly blunt.

4. Accessibility Compliance Isn't Automatic

While AI can accelerate the production of audio, transcripts, or even BSL-ready scripts, it won’t automatically meet accessibility standards. Formatting, pacing, and navigation all require human quality control to ensure they truly support blind or partially sighted visitors.

This is why Podego follows a hybrid approach. We don’t believe in full automation when it comes to accessibility. Every piece of generated content is reviewed and refined by our native audio specialists, ensuring it meets both technical standards and real-world user needs. AI helps us move faster, but humans make sure we get it right.

5. Audience Perception of Authenticity

Visitors can tell when something feels robotic or disconnected. Authenticity matters, especially when telling stories, representing lived experiences, or amplifying underrepresented voices. AI can support that process, but it can’t replace human judgment and emotional intelligence.

Conclusion: It's not about replacement, it's about expanding interpretation

So, can AI fully replace human-produced audio guides?

The answer is no. AI isn't here to replace audio guides; it’s here to expand what’s possible.

By dramatically lowering production barriers and costs, AI allows museums to:

Reach previously excluded audiences
Offer multilingual and diverse interpretation
Experiment with content differentiation

The future of interpretation isn’t just faster or cheaper; it’s broader, deeper, and more inclusive than we imagined.

If you think we’re building something helpful for you, we’d love to hear from you.

At Podego, we're dedicated to simplifying your interpretation workflow and helping you scale content creatively and inclusively.

We'd love your insights at hello@podego.com.

What are your current interpretation challenges? How might AI best support your team? What features would you find most helpful in our platform?

Your input directly shapes our solutions, and to thank you, we'll offer three months of free access to our platform upon its release.

Let’s build better guides, together.

*Sources:

https://heritagemanagement.org/embracing-technology-in-cultural-heritage-overcoming-barriers-to-engagement-and-accessibility/

https://mysmartjourney.com/en-ca/post/how-much-does-it-cost-to-create-a-professional-tour

https://centersmarttourism.world/for-tourism-organisations/audio-guides/prices/

https://www.nubart.eu/audio-guides/implementing-pwa-museum-guide.html
