What is multimodality?
This post explores how short viral videos use multiple modes – sound, image, and text – to tell powerful little stories, and why recognising this can help us make smarter use of video in the classroom.

A confident pig
Last year, a video appeared on TikTok.
It shows a pig walking purposefully across a field. With each step, its large body bounces from side to side and its ears flap up and down.
👉 Click on the image to watch the video on TikTok.
There’s something comical about the way it moves – which is probably why it was shared in the first place.
Even if we watch a video like this in silence, we are not just passive viewers. Human beings are creatures of semantics, and when we engage with a video, we make meaning.
Making meaning involves telling stories. Whether you realise it or not, you ask questions about the pig, and your imagination provides you with answers:
- Is this a farm animal or someone’s pet?
- Where is the pig going and what is it thinking?
- What is the pig’s story?
My own story is a simple one: Someone is standing at the edge of the field, filming the scene. Whoever it is, they’ve met this pig before and they’re visiting it to give it a treat – a juicy apple, perhaps. The pig is happy about this and is walking purposefully across the field to see its human friend and get its reward.
Of course, I have no way of knowing if this is true. But that’s how making meaning works: we fill the gaps with stories of our own.
What about you? What sort of story did you create when you watched the video? Whatever it was, I guarantee it will change when you see a second video of the same pig.
This one was uploaded to Instagram a few days later by the same user (@Angelo_dorny), who revealed a great discovery: the pig moves perfectly in time to ‘Stayin’ Alive’, the 1977 pop song that provides the new audio track.
👉 Click on the image to watch the video on Instagram.
Notice how the music influences the story we create. Suddenly, the pig becomes a strutting, confident character. Maybe he’s heading out for a night on the town. Maybe he’s got a date and a night of dancing and sweet piggy love ahead.
Combined elements
Video combines different elements or ‘modes’ to create meaning – most commonly, moving images with sound.
In our pig video, the visuals and the music each contribute their own layer of meaning. But together, they create a story greater than either could tell alone.
This is multimodality at work – the interplay of multiple modes (images, text and sound, for example) to co-construct meaning.
As a concept, it can apply to all kinds of media: websites, picture books, posters and packaging. But for now, we’re sticking with video.
A self-defeating sheep
Let’s switch from pigs to sheep.
In this next video, a helpful boy pulls a sheep out of a ditch. Once freed, the sheep runs in a circle, then takes a spectacular leap – diving headfirst back into the same ditch.
A caption at the top of the video reads: ‘Me trying to save my friend from a toxic relationship.’
👉 Click on the image to watch the video on Instagram.
In this case, it’s the moving images and the text that co-construct meaning.
The moving images provide a strong visual narrative involving a self-defeating sheep. Meanwhile, the caption gives us a new context and a clever metaphor for a very human situation.
Teaching English
So how does all this relate to our profession? Why is multimodality an up-and-coming buzzword in English language teaching? How important is it to teach multimodal literacy?
As language teachers, we already have enough on our plate. Personally, I’m not sure we need to add to an ever-growing list of professional responsibilities – especially ones that go beyond just teaching English.
That said, recognising how video combines different modes of meaning can help us make better use of it in the classroom. In other words, it has practical implications – and this is something we’ll explore in more detail on my course. It would be great to have you on board 🚢

For a clear example of multimodality in action, I recommend this lesson plan:



