
The Internet Is Not Forever: 38% of Web Pages From 2013 No Longer Exist

A Pew report on 'digital decay' also finds tweets to be an increasingly ephemeral medium.

(Credit: Igor Nikushin/Getty Images)

A new study on the state of link rot suggests that a floppy disk might have better odds of surviving the next decade with its information intact than a web page published today. 

The Pew Research Center finds that 38% of the pages that existed in 2013 were no longer accessible as of October 2023, and that a quarter of all pages that were online at some point between 2013 and 2023 have since vanished.

(Credit: Pew Research Center)

Pew’s study, based on a sample of almost a million pages recorded by the nonprofit archive Common Crawl, also documents how this “digital decay” has eroded the utility of news and government sites as well as Wikipedia, where links increasingly lead only to 404 error messages.

In a sample of 500,000 pages from government sites, 21% featured at least one broken link. Across pages from 2,063 news sites, that share was 23%. And among 50,000 English-language Wikipedia pages sampled, 54% harbored at least one busted link in their “References” section.

(Credit: Pew Research Center)

Twitter Is Even Worse

In addition to the web, Pew’s researchers inspected X, still called Twitter at the time of their survey, and found it even more ephemeral. Of 4.8 million tweets Pew collected from March 8 to April 27 last year using a platform API, 18% were no longer publicly visible by June 15.

Just over 60% of those missing tweets disappeared because the accounts behind them had also vanished from public sight; the rest came from accounts that remained accessible. (The report doesn’t break down how many of the vanished accounts had been set to private or deleted by their owners versus suspended by the platform.)

Tweets in certain languages had a shorter half-life than others, with 49% of Turkish tweets and 42% of Arabic tweets disappearing over the study period. And tweets from accounts with default bios or profile pictures were also more likely to have gone poof over those weeks. 

But Pew’s report also found that 6% of the tweets it collected first disappeared and then blinked back into public view, either because the account holder switched the account from public to private and back to public or because Twitter itself reinstated the account.

Why Is This Happening?

Pew’s short report doesn’t explore why so much content vanishes so rapidly, but I’ve seen a huge chunk of my own online work go down the drain for two common reasons: The publication switches to a new content management system without preserving links created under the old CMS, or the publication itself goes out of business.   

Sometimes, a news site’s collapse is followed immediately or within hours by its owner deleting the site, a last act of corporate vandalism that leaves newly unemployed journalists struggling to share their latest work with possible future employers.

Web archives such as Common Crawl and the Internet Archive, however, can often serve up a copy of a deleted page (the latter also has an impressive collection of digitized copies of analog media that includes this publication’s oldest print editions). Many Wikipedia articles point to both original links and Internet Archive copies of them, one of many sound practices at that collectively maintained reference site.

The Internet Archive also collects tweets; for example, you can use it to browse tweets from the @ElonJet account that shared real-time flight data of Elon Musk’s Gulfstream G650ER private jet until Musk banned that account. And if you want to ensure that your own tweets remain viewable even if you let your account go dormant and risk possible deletion, you can download an archive of them from X and then upload that file to the Archive.

About Rob Pegoraro