Schibsted invests in text-to-speech to engage audiences

By Paula Felps

INMA

Nashville, Tennessee, United States

As audio opportunities evolve, news publishers are looking at new ways to include them as part of their content strategy. At the same time, audiences are becoming increasingly dependent on audio experiences, further driving the need for news media organisations to explore how to implement audio as an option.

During this week’s Webinar, INMA members heard from Lena Beate Hamborg Pedersen, product manager at the Norwegian media company Schibsted, who shared the company’s text-to-speech journey with Aftenposten, its largest publication. That journey began more than three years ago, at a time when few companies were using text-to-speech. Now, Pedersen said, it is something more media companies are starting to offer.

The audio explosion

In 2020, a national report on podcasts in Norway found 38% of people listened to podcasts at least once a year. By 2023, that number had jumped to 46%. In 2022, 15% of Norwegians said they listened to podcasts daily; this year, that has increased to 21%.

“When we look at who is listening, most listeners in Norway are between 30 and 39 years old,” Pedersen said. “We have, increasingly, growth among the younger audience, and that’s an important target group for us.”

Podcasts are driving the popularity of other audio formats. During this week's Webinar, Lena Beate Hamborg Pedersen explained how Schibsted is leveraging text-to-speech technology.

But it isn’t just young audiences clamouring for audio; the biggest growth so far in 2023 is among listeners 50+. What is particularly interesting, Pedersen said, is that Aftenposten is a subscription newspaper with many customers in that age group. Because these listeners are also more financially secure and more willing to pay for news and content, they are a winning combination for news media companies.

“Another fact … that we have learned through talking to our subscribers and others who are interested in audio is that listeners stay subscribed for longer,” Pedersen said. “You can read your newspaper, but if you also can listen to the news, then it’s more opportunities in your life to use your subscription. There are so many opportunities where you can listen where you can’t read. So you feel that you get more value for money.”

If news publishers have access to text-to-speech technology, it makes sense for them to offer an audio alternative when they publish an article. It can, for example, keep audiences engaged on long drives — a time when they could not consume anything on a screen or in print.

Betting on audio

Schibsted placed its bets on text-to-speech because the technology allows the company to produce more content while saving time and money. One question Pedersen said she hears often is why the company doesn’t have journalists read their own stories.

“The fact is, [they] are not trained in reading. They are experts in writing stories. And you need to have a pleasant, trained voice that makes it easy for you to record without stumbling in the words.”

Human-recorded tracks also need to be edited and sound-adjusted before being uploaded. These are all time-consuming tasks that take journalists away from what they’re best at: researching and writing about problems in the world around us and exploring what can be done.

Another factor is the transient nature of news itself, which changes minute by minute. Making updates would require journalists to record new content, then put the audio through the same editing and uploading process. With text-to-speech technology, an editor only needs to change a word or a title in the text, and the audio is updated automatically.

Changing consumption habits

To illustrate how Aftenposten uses text-to-speech, Pedersen showed the audience an unpublished story and explained that when it is published, an audio file created using a cloned voice will also be available. On the desktop, listeners can see how long it will take to listen to the article and can choose the speed at which they listen to it. That is an important feature, she said, because it allows people to consume more content.

Providing the option of an audio track gives audiences more ways to consume content and can keep them engaged longer.

“[Listeners] prefer to consume news on high speed. So it’s interesting to follow if this is increasing habits,” Pedersen said. In addition to listening faster, they are also getting more out of the story, she said: “People tell us that when they read an article, they skim it, but when they listen to an article, they get more details.”

This is beneficial for longer stories, but it is also resonating with different audience segments, such as those with learning disabilities who struggle with reading or immigrants who are learning the language: “It’s much easier to learn [to speak Norwegian] if you can listen and not read.”

Students with attention disorders also fare better with audio stories, but Pedersen said kids and teens overall prefer listening to reading — which is something publishers should pay attention to.

“This is a really important fact because kids and teens tend to take their media habits with them into adulthood. So looking into the future when these kids become adults and are going to buy a subscription, maybe they just want to listen.”

How Aftenposten found its voice

Once it decided to use text-to-speech technology instead of humans, Aftenposten had to determine what its voice should sound like. Rather than using a synthetic voice, it decided to clone the voice of its well-known podcast host Anne Lindholm. Working with technology partner Beyond Words, she recorded 6,812 sentences, which were then used to create a new voice that could read the articles.

Aftenposten partnered with Beyond Words to clone the voice of its popular podcast host, Anne Lindholm.

When it launched, the voice was instantly familiar to listeners, and Pedersen said they had to be told it was a cloned voice. Aftenposten will continue improving the pronunciations and clarity of the voice “because if people are going to listen for half an hour, 15 minutes, then it needs to be super perfect.”

One of the next steps it will take is to look at creating playlists and making it possible for listeners to queue up audio articles so they don’t have to look at their audio device and find the next story when one ends.

Currently, about 8% of subscribers are using the text-to-speech feature. While that’s not a huge number, it remains stable: “Those who are interested are continuing to use it and we have learned that if we are making a good opportunity to queue, people are interested in listening more. So that’s our main focus.”

The next thing Schibsted will explore is whether people are willing to pay for audio articles: “There is an increasing willingness to pay for audio, and what we are looking into now is how willing are they to pay for text-to-speech? We haven’t learned enough there yet.” 

