Category Archives: Wayback Machine – Web Archive

New Ways to Search Archived Music News

Posted on July 10, 2024 by Mark Graham

First crawl of CMT News on January 10, 2002.

When MTVNews.com went offline in late June, Internet users were quick to discover that some (but sadly, not all) of the site had been archived in the Internet Archive’s Wayback Machine. While you can no longer browse MTV News directly on the web, the archived pages are available via the Wayback Machine, starting with the first crawl of the site on July 5, 1997.

The same is true for CMT (Country Music Television) News, which was first crawled by the Internet Archive on January 10, 2002.

In response to patron requests, our engineers have created new search indexes for each site:

Search more than 470,000 pages from MTV News (here’s a sample search for Peter Gabriel): https://web.archive.org/mtv.com/search/%22Peter%20Gabriel%22

Search more than 70,000 pages from CMT News (here’s a sample search for Dolly Parton): https://web.archive.org/cmt.com/search/%22Dolly%20Parton%22
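Both sample searches follow the same URL shape: `https://web.archive.org/`, the site's domain, `/search/`, and then the percent-encoded query. An exact phrase in quotes therefore becomes `%22Peter%20Gabriel%22`. The short Python sketch below builds such links. The pattern is inferred from these two examples rather than from published documentation, and `site_search_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def site_search_url(domain: str, phrase: str, exact: bool = True) -> str:
    """Build a site-scoped Wayback Machine search link.

    Assumes the pattern seen in the two sample links:
    https://web.archive.org/<domain>/search/<percent-encoded query>
    """
    query = f'"{phrase}"' if exact else phrase
    # quote() encodes '"' as %22 and spaces as %20, matching the samples.
    return f"https://web.archive.org/{domain}/search/{quote(query, safe='')}"


print(site_search_url("mtv.com", "Peter Gabriel"))
# https://web.archive.org/mtv.com/search/%22Peter%20Gabriel%22
print(site_search_url("cmt.com", "Dolly Parton"))
# https://web.archive.org/cmt.com/search/%22Dolly%20Parton%22
```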

Why provide search indexes to music news? Because, as Michael Alex, founding editor of MTV News Digital, wrote in an op-ed for Variety, “the archives of MTV News and countless other news and entertainment organizations have a similar value: They’re a living record of entertainment history as it happened.”

It’s important to remember that these collections were captured as a routine part of the daily work conducted by more than one thousand libraries and archives collaborating with the Internet Archive to archive the web. For centuries, libraries have been the trusted repositories of culture and knowledge. As our news and information sources become increasingly digital, the role of libraries like the Internet Archive and our partners has changed to meet these new demands. This is why libraries like ours exist, and why web archiving is critical for preserving our shared digital culture.

Internet Archive and the Wayback Machine under DDoS cyber-attack

Posted on May 28, 2024 by Chris Freeland

The Internet Archive, the nonprofit research library that’s home to millions of historical documents, preserved websites, and media content, is currently in its third day of warding off an intermittent DDoS (distributed denial-of-service) cyber-attack. According to library staff, the collections are safe, though service remains inconsistent. Access to the Internet Archive Wayback Machine – which preserves the history of more than 866 billion web pages – has also been impacted.

Since the attacks began on Sunday, the attackers have been sending tens of thousands of fake information requests per second. The source of the attack is unknown.

“Thankfully the collections are safe, but we are sorry that the denial-of-service attack has knocked us offline intermittently during these last three days,” explained Brewster Kahle, founder and digital librarian of the Internet Archive. “With the support from others and the hard work of staff we are hardening our defenses to provide more reliable access to our library. What is new is this attack has been sustained, impactful, targeted, adaptive, and importantly, mean.”

Cyber-attacks are increasingly frequent against libraries and other knowledge institutions, with the British Library, the Solano County Public Library (California), the Berlin Natural History Museum, and Ontario’s London Public Library all being recent victims.

In addition to a wave of recent cyber-attacks, the Internet Archive is also being sued by the US book publishing and US recording industries associations, which are claiming copyright infringement and demanding combined damages of hundreds of millions of dollars and diminished services from all libraries.

“If our patrons around the globe think this latest situation is upsetting, then they should be very worried about what the publishing and recording industries have in mind,” added Kahle. “I think they are trying to destroy this library entirely and hobble all libraries everywhere. But just as we’re resisting the DDoS attack, we appreciate all the support in pushing back on this unjust litigation against our library and others.”

End of Term Web Archive – Preserving the Transition of a Nation

Posted on May 8, 2024 by Antoine

It’s that time again. The 2024 End of Term crawl has officially begun! The End of Term Web Archive (#EOTArchive) runs the End of Term crawl, an initiative to archive U.S. government websites in the .gov and .mil web domains, as well as harder-to-find government websites hosted on .org, .edu, and other top-level domains (TLDs), as one administrative term ends and a new term begins.

End of Term crawls have been completed for term transitions in 2004, 2008, 2012, 2016, and 2020. The results of these efforts are preserved in the End of Term Web Archive. In total, over 500 terabytes of government websites and data have been archived through the End of Term Web Archive efforts. These archives can be searched full-text via the Internet Archive’s collections search and also downloaded as bulk data for machine-assisted analysis.

The purpose of the End of Term Web Archive is to preserve a record of government websites for historical and research purposes. It is important to capture these websites because they can provide a snapshot of government messaging before and after the transition of terms. The End of Term Web Archive preserves information that may no longer be available on the live web for open access.

The End of Term Archive is a collaborative effort by the Internet Archive along with the University of North Texas (UNT), Stanford University, Library of Congress (LC), U.S. Government Publishing Office (GPO), and National Archives and Records Administration (NARA). Past partners include the University of California’s California Digital Library (CDL), George Washington University, and the Environmental Data and Governance Initiative (EDGI).

*Whitehouse.gov captures from 2008 Sept. 15; 2013 Mar. 21; 2017 Feb. 3; and 2021 Feb. 25.*

We are committed to preserving a record of U.S. government websites. But we need your help to complete the 2024 End of Term crawl.

*NASA/KSC “We Need You” Mars poster*

How can you help?!

We have a list of top-level domains from the General Services Administration (GSA) and from previous End of Term crawls, but we need volunteers to help us out. We are currently accepting nominations for websites to be included in the 2024 End of Term Web Archive.

Submit a URL nomination by going to digital2.library.unt.edu/nomination/eth2024/.
We encourage you to nominate any and all U.S. federal government websites that you want to make sure get captured. Nominating URLs deep within .gov/.mil websites helps make our web crawls as thorough and complete as possible.

Individuals and institutions nominating seed URLs are recognized on the individual contributors leaderboard and the institutions leaderboard!

Explore the End of Term Web Archive with full text search and download the data!

Reliving the Past & Redesigning the Present with Animated GIFs

Posted on March 25, 2024 by Chris Freeland

As an editorial strategist and tech journalist, JD Shadel spends a lot of time thinking about how the content on the internet continues to rapidly evolve. One telling example they’ve followed closely is the evolution of GIFs. Two decades ago, the web was filled with millions of jittery, pixelated, handmade GIFs wherever you looked. And for many of us, there’s a nostalgia for the early days of the web when things felt a bit wilder and untamed.

That nostalgia for the version of the internet they grew up with is what first sparked Shadel’s interest in collecting old-school GIFs. During the first months of pandemic lockdowns in 2020, Shadel started spending a lot of their spare time diving deep into the Internet Archive’s GifCities collection. Shadel’s personal fascination began with “under construction” GIFs, a rich niche in the GifCities collection full of animated construction workers and tools. Then came seeking out GIFs of Furbies, Tamagotchi, and other cultural touchstones that the 33-year-old came of age with online. Over the next few years, downloading and organizing GIFs became a hobby for Shadel.

Recently, it came time to update Shadel’s professional website. “It’s one of those evergreen chores it’s easy to obsess over as a freelancer, when your website is your calling card for new work,” said Shadel, who found themself digging back through the hundreds of GIFs they’ve curated thanks to the Internet Archive.

Early cyberspace-themed GIFs became the motif of their new and somewhat unconventional portfolio, which features more than two dozen images sourced entirely from GifCities. Users can, for example, click on a spinning globe for an introduction or on a British Furby to learn about Shadel’s background as an American now based in London, including editorial work for outlets such as Vice, The Washington Post, and Condé Nast Traveler and consulting for clients including Airbnb and Adidas.

“I’m so happy GifCities exists to capture that specific snapshot of the internet,” Shadel said. “It really relates, metaphorically, to a lot of my work where the real world and the internet blur, where the digital and the physical intersect.”

In addition to GifCities, the Wayback Machine has also been useful to Shadel. Professionally, it is a resource when reporting and fact checking stories. Personally, they recently found material from a band they played in years ago.

“The Internet Archive just touches my digital life in so many different ways,” Shadel said. “As a journalist, it’s a fact-checking tool. Having the ephemeral internet preserved for future researchers, writers, reporters and editors is a huge service to democracy. And it’s also just fun.”

JD Shadel

On the website with its Space Jam-like navigation, Shadel wanted to reference the history of the internet — and maybe even inspire visitors to think more actively about their own role in charting the future. “I think we can reclaim our digital lives and rekindle the notion of ourselves as ‘netizens’—citizens of the internet and not just passive participants,” Shadel said.

“That’s why the work of the Internet Archive is so important,” they continued. “Despite the fact that we have access to more information than ever before, it’s really easy to forget digital histories and the lessons that we can learn from that.”

Shadel’s writing touches on a range of intersecting topics—such as tech, travel and queerness—but the one thing they hope everyone takes away from their work is the idea that we’re all netizens with a role to play in shaping what we want these shared public spaces to be.

“If we all have some shared sense of ownership of the internet, which is so involved in our lives, I believe we have a greater chance to make it better.” Sometimes, that can start in simple ways—in this case, building a DIY website with a bunch of old GIFs reminded one tech journalist in London that there are lessons we can take from the early internet. “We all have a part to play in making the internet a better place.” And at the least, they hope you enjoy the GIFs they’ve selected.

Digital archives: a time machine for the web

Posted on March 5, 2024 by Chris Freeland

This post was originally published in a newsletter by Project Liberty, February 20, 2024. Image by Project Liberty.

In the summer of 2023, the New York Times ran an article titled “Ways You Can Still Cancel Your Federal Student Loan Debt.”

The article outlined six ways to cancel student debt, with the final being:

“Death
This is not something that most people would choose as a solution to their debt burden.”

At least that was the sixth way until the New York Times revised it with a stealth edit. When you read the article today, the line about choosing death as a solution to a debt burden has been replaced, but there’s no mention that the article was revised. The timestamp is still the day it was originally published.

If not for the Internet Archive’s Wayback Machine, this discrepancy wouldn’t have been caught. The Wayback Machine is a digital archive of the internet, and as such, it captured multiple earlier versions of the article.

The internet is constantly being revised in ways that allow history to be rewritten and a shared sense of truth to be questioned. With AI-generated disinformation, the potential to exert control over the future by rewriting the past has never been greater.

This week we’re exploring how digital archives are crucial in developing a record of truth in an ever-changing web.

The need for digital archives

Mark Graham, Director of the Wayback Machine, spoke with the Project Liberty Foundation and shared the key reasons why there’s an even greater need for digital archives:

The importance of the internet. So much of what humanity publishes and makes available lives only on the internet. Given how much time we spend online, the internet has become a central medium of human expression, history, and culture.

The fragile and ephemeral nature of the internet. Graham shared two stats that underscore how fragile today’s internet is:

A study found that of the two million hyperlinks in New York Times articles from 1996 to 2019, 25% were broken (a phenomenon known as link rot).
The Wayback Machine has repaired 20 million broken links in Wikipedia articles by replacing them with links to archived copies.

“The web itself is a living thing. Webpages change. They go away on quite a frequent basis. There’s no backup system or version control system for the web,” Graham explained. That is, except for archives like the Wayback Machine.

The Wayback Machine

The Wayback Machine is a “time machine for the web,” in Graham’s words. It allows users to trace the evolution (or disappearance) of a webpage over time, enabling them to establish a record of what happened on the internet.

For example, the Apple.com URL has been archived 539,000 times since its first archived page in October 1996.
The Wayback Machine has archived over 866 billion webpages in its 28-year history. Today, it archives hundreds of millions of webpages every day and has become one of the most important archives of online content in the world.

How it works

The Wayback Machine “crawls” the web and downloads publicly accessible information. Webpages, documents, and data are stored with a time-stamped URL.
For information that’s not publicly accessible, Internet Archive offers web archiving services through Archive-It for 1,200 organizations in 24 countries around the world (from libraries to research institutions).
The Wayback Machine lets everyday people help archive the internet. Anyone can go to Save Page Now to archive a webpage or article.
The Wayback Machine partners with 1,200 fact-checking organizations globally to help it reference material on the web that was the source of disinformation. It has built a library of more than 200,000 examples where a claim has been made and the Wayback Machine has provided additional context on whether that claim is true (known as a review of the claim).
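To make the time-stamped URL idea concrete, here is a minimal Python sketch. It assumes the Wayback Machine's public URL conventions as we understand them: a replay link is `https://web.archive.org/web/` plus a 14-digit UTC timestamp (`YYYYMMDDhhmmss`) plus the original URL, and the service redirects to the capture closest to that moment; a Save Page Now request is `https://web.archive.org/save/` plus the URL. The helper names are hypothetical, and timestamps carrying extra suffixes are not handled.

```python
from datetime import datetime, timezone

WAYBACK = "https://web.archive.org"


def capture_url(original: str, when: datetime) -> str:
    """Replay link for the capture of `original` closest to `when`.

    `when` should be timezone-aware; astimezone() treats naive values as local time.
    """
    # Wayback timestamps are 14 digits in UTC: YYYYMMDDhhmmss.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK}/web/{stamp}/{original}"


def parse_capture_url(url: str) -> tuple[datetime, str]:
    """Split a plain replay link into its UTC timestamp and original URL."""
    prefix = f"{WAYBACK}/web/"
    if not url.startswith(prefix):
        raise ValueError(f"not a Wayback replay URL: {url}")
    stamp, original = url[len(prefix):].split("/", 1)
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return when, original


def save_page_now_url(original: str) -> str:
    """Link that asks Save Page Now to archive `original`."""
    return f"{WAYBACK}/save/{original}"
```

Because the timestamp is part of the URL itself, a citation such as `capture_url("https://example.com/", when)` stays meaningful even after the live page changes.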

Archive of facts

Fixing links, archiving webpages, and fact-checking digital articles are part of a deeper, more important project to chronicle digital history and establish a record of facts.

Last month, the archive of press releases from a sitting member of Congress, New York’s Elise Stefanik, vanished after she came under scrutiny. The Wayback Machine documented this erasure and provided a time-stamped record of past versions of her website and press releases.
In 2018, a US Appeals court ruled that the Wayback Machine’s archive of webpages can be used as legitimate legal evidence.
The Internet Archive has countless examples of when the press has referenced the Wayback Machine to correct disinformation and dispel rumors. In one example from last year, the Associated Press relied on the Wayback Machine to set the record straight: the CDC did not say the polio vaccine gave millions of Americans a “cancer virus.”

With the rise of AI-generated disinformation, there’s reason to believe that such attempts at rewriting history (even if that history is just yesterday) will become more prevalent, and that the social contract that has governed web crawlers is coming to an end.

A citizen-powered web

Building digital archives is a bulwark against those attempting to rewrite history and spread misinformation. An archived, time-stamped webpage is not just unimpeachable evidence, it’s a foundational building block of a shared sense of reality.

In 2014, when Malaysia Airlines Flight 17 went down over Ukraine, the Wayback Machine captured evidence that a pro-Russian group was behind the missile attack. But it wasn’t the Wayback Machine’s algorithms that captured the evidence by crawling the internet; it was an individual who found an obscure blog post from a Ukrainian separatist leader touting the shooting down of a plane. That individual identified the blog post as important enough to be archived, and it became a critical piece of evidence, even after that post disappeared from the internet.

As Graham said, “You don’t know what you got until it’s gone. If you see something, save something.”

What pages can you help archive? Archive them with the Wayback Machine on Save Page Now.

Fair Use in Action at the Internet Archive

Posted on March 1, 2024 by Lila Bailey

As we celebrate Fair Use/Fair Dealing Week, we are reminded of all the ways these flexible copyright exceptions enable libraries to preserve materials and meet the needs of the communities they serve. Indeed, fair use is essential to the functioning of libraries, and underlies many of the ordinary library practices that we all take for granted. In this blog post, we wanted to describe a few of the ways the fair use doctrine has helped us build our library.

Fair use in action: Web Archives and the Wayback Machine

The Internet Archive has been archiving the web since the mid-1990s. Our web collection now includes more than 850 billion web pages, with hundreds of millions added each day. The Wayback Machine is a free service that lets people visit these archived websites. Users can type in a URL, select a date range, and then begin surfing on an archived version of the web.

Web archives are used for a variety of important purposes, many of which are themselves fair uses. News reporting and investigative journalism is one such use of the Wayback Machine. Indeed, thousands of news articles have relied upon historical versions of the web from the Wayback Machine. Just last week, 13 links to the Wayback Machine were used in a CNN story about an Ohio GOP Senate candidate’s previous statements that were critical of former President Trump. Our web archive also becomes an urgent backup for media sites that are shut down suddenly, whether by authoritarian governments or for other reasons, often becoming the only accessible source both for the authors of these stories and for the public. Another important purpose web archives can serve is as evidence in legal disputes. Attorneys use the Wayback Machine in their daily practice for evidentiary and research purposes. In 2023 alone, the Internet Archive attested to 450 affidavits in cases where Wayback Machine captures were used as evidence in court.

The Wayback Machine also makes other parts of the web, such as Wikipedia, more useful and reliable. To date, the Internet Archive has been able to repair over 19 million broken links (URLs that had returned a 404 “Page Not Found” error) across 320 different Wikipedia language editions. Links break in many ways, including link rot (pages that vanish) and content drift (pages whose content changes). Restoring links ensures that Wikipedia remains an accurate and verifiable source of information for the public good. And we hope to build new tools and partnerships to help create a more dependable knowledge ecosystem as more and more content on the web is created by generative AI.

The Fair Use doctrine is broadly considered to be what makes web archiving possible. Without it, much of our knowledge and cultural heritage (huge amounts of which now exist as digital artifacts) would be at risk. In today’s chaotic information ecosystem, safeguarding this material in an open, accessible, and transparent way is vital for history and vital for democracy.

Fair use in action: Manuals collection

Whether you are an individual who has rendered an appliance useless because you lost the instructions, or a professional mechanic looking to fix an old vehicle, owners’ manuals are invaluable. As the right to repair movement has amply demonstrated, copyright should not stand as an obstacle to using machines you’ve bought and paid for. This is a place where fair use can shine.

Over the years, the Internet Archive has received manuals, instruction sheets, and informational pamphlets of all kinds. The Manuals collection has well over a million items for users to access 24/7 at no cost. This resource helps people exercise the right to repair and extend the life of their products. Whether you are a rocket scientist needing to operate your space shuttle, a mechanic who needs to repair a vintage VW Bug, or a curious kid trying to fix up your mom’s old computer, having free online access to the technical documentation you need is essential. And in many cases, there would appear to be no other way to get access to this crucial information.

Some preserved manuals are a single printed page with poorly constructed diagrams. Others are multi-volume tomes that give exacting details on operation of a complex piece of machinery. These materials are more than instructions or a list of components. They reflect the priorities and approaches that companies and individuals take with products, as well as the artistic and visual efforts to make an item clear to the reader.

This collection is a cool example of how fair use provides a framework for the Internet Archive to share critical knowledge with consumers. At the same time, it provides a historical timeline of sorts for innovation and the development of technology.

From preserving our digital history to providing access to manuals of obsolete devices, fair use helps libraries like ours serve our community. And while there are no doubt a variety of commercial projects that properly rely on fair use, fair use is at heart about the public good. As we celebrate Fair Use Week, we should remember the crucial role it plays, and ensure that we preserve and protect fair use for the good of future generations. For more on events and news on Fair Use/Fair Dealing Week, visit FairUseWeek.org.

Genealogist uncovers family histories with help of Internet Archive

Posted on December 11, 2023 by Caralee Adams

In tracing her family history, Taneya Koonce discovered stories about her African American ancestors in records going back to the late 1700s. Many were enslaved. She followed the path of some descendants from North Carolina to New York in the Great Migration.

Taneya Koonce

The Internet Archive is among the many sources that Koonce has relied on in her research. From her home in Tampa, Florida, she regularly accesses the collection’s online yearbooks, newspapers, location histories, and government records to piece together her family’s story—and has also contributed material in hopes of helping others.

“As a genealogist and family historian, the breadth of digitized materials in the Internet Archive is essential to my research and an invaluable source of information in my family history quest,” said Koonce, who works as an information scientist at an academic medical center.

Koonce began to record stories in her family by interviewing her grandmothers nearly 30 years ago. She learned about several of her maternal grandmother’s siblings who died in infancy, and about the hardships the family faced in life. After her grandmothers died, Koonce rediscovered her notes from those conversations and, in 2005, began to dive into genealogy in earnest.

Her interest turned from a hobby to a passion in recent years. Koonce maintains a family genealogy website, created a web database for research of Koonce surnames from all over the country, publishes on her genealogy blog, and runs a collaborative genealogy-focused online community, the Academy of Legacy Leaders.

Having found so many historical items on the Internet Archive, Koonce teaches others how to use the collection in their own research. She’s active in genealogy societies, frequently presenting to others about the wealth of materials online.

Koonce applauded the Archive for preserving New York voter lists that helped her find one of her ancestors. After researching slaveholders by the name of Koonce, she connected with a man in Wisconsin who had published a “Koonce to Koonce” newsletter on the family’s history. With his cooperation, Koonce digitized and uploaded the newsletter to the Archive to preserve it for others. She always documents her findings, should they be of interest to others pursuing their family history.

“I specialize in helping family historians be very cognizant about planning for the future and leaving a legacy,” said Koonce, who has presented about the importance of saving family history research for the next generation. “One strategy is sharing material on the Internet Archive. I want to help educate people that it is a library. It’s dedicated to preserving content for the future. If we can contribute information to the collection, we can spread the word about what we’re doing and make sure it’s long lasting.”

Using the Wayback Machine to Understand the Cultural Roots of New Technologies

Posted on November 6, 2023 by Caralee Adams

As an academic librarian helping connect students and faculty with the research materials they need, Sanjeet Mann has turned to the Internet Archive many times.

“I really value having the Wayback Machine as an additional tool in my librarian’s toolbox,” Mann said. “Information preservation is an essential, but often overlooked, part of the infrastructure for teaching and learning.”

Mann, currently working as the Systems & Discovery Librarian at California State University, San Bernardino (CSUSB), said he first learned about the value of the Internet Archive in 2006 during his library science master’s program.

Over his career, Mann has worked at various libraries, tapping into the Archive on the job.

Assisting budding writers, composers and artists as Arts Librarian at University of Redlands, Mann found that the vast amount of free information online, including biographies, can shape students’ projects.

“We can draw on the Archive whenever we need inspiration for creative work, or when we need to understand how current scholarship and the issues that we’re facing now aren’t completely new—they’re based on this history of work by scholars, by politicians, by citizens active in the public interest,” he said. “These issues tend to recur over time. As a society, we need to know where we have been in order to meet the challenges of the future.”

At CSUSB, Mann also helps computer science and business students use the Archive’s collections to better understand the cultural roots of new technologies—the historical context for their innovations.

“It is the only entity I’m aware of that preserves the Internet’s scholarly and historical record at this scale,” Mann said.


On a practical note, Mann leveraged information through the Wayback Machine when he was researching how to set up a campus laptop loaner program for University of Redlands. This can be an essential service that libraries provide students who have trouble with their computers.

Mann wanted to understand policies at other universities, such as how they handled the return of damaged laptops. Looking at archived versions of university library websites through the Wayback Machine, Mann was able to learn about other approaches and find contacts to follow up for additional details.

The Internet Archive is a source to verify information that is no longer listed on websites, he said.

“Companies themselves don’t have any incentive to archive the history of their website. New products get launched. The platform gets migrated from one platform to another,” Mann said. “An organization like the Internet Archive, being a library, is uniquely positioned to meet the need in society of ensuring some kind of continuity of memory and having a public record. Especially with the government being very partisan these days, I think there’s value in the Internet Archive being an independent, not-for-profit that operates in the public interest.”

Mann added: “Without the Archive, we would lose decades of information about our society at a crucial turning point in its development, eroding trust in online systems and requiring educators, students and researchers to reconsider the way we do our work and share it with others.”

Moving Getty.edu “404-ward” With Help From The Internet Archive API

Posted on November 2, 2023 by jefferson

This is a guest post from Teresa Soleau (Digital Preservation Manager), Anders Pollack (Software Engineer), and Neal Johnson (Senior IT Project Manager) from the J. Paul Getty Trust.

Project Background

Getty pursues its mission in Los Angeles and around the world through the work of its constituent programs—Getty Conservation Institute, Getty Foundation, J. Paul Getty Museum, and Getty Research Institute—serving the general interested public and a wide range of professional communities to promote a vital civil society through an understanding of the visual arts.

In 2019, Getty began a website redesign project, changing the technology stack and updating the way we interact with our communities online. The legacy website contained more than 19,000 web pages and we knew many were no longer useful or relevant and should be retired, possibly after being archived. This led us to leverage the content we’d captured using the Internet Archive’s Archive-It service.

We’d been crawling our site since 2017, but had treated the results more as a record of institutional change over time than as an archival resource to be consulted after a page was deleted. We needed to direct traffic to our Wayback Machine captures, thus ensuring that deleted pages remain accessible when a user requests a deprecated URL. We decided to dynamically display a link to the archived page from our site’s 404 error “Page not found” page.

*Getty.edu 404 error “Page not found” message including the dynamically generated instructions and Internet Archive page link.*

The project to audit all existing pages required us to educate content owners across the institution about web archiving practices and purpose. We developed processes for completing human reviews of large amounts of captured content. This work is described in more detail in a 2021 Digital Preservation Coalition blog post that mentions the Web Archives Collecting Policy we developed.

In this blog post we’ll discuss the work required to use the Internet Archive’s data API to add the necessary link on our 404 pages pointing to the most recent Wayback Machine capture of a deleted page.

Technical Underpinnings

Implementation of our Wayback Machine integration was very straightforward from a technical point of view. The first example on the Wayback Machine APIs documentation page gave us the technical guidance needed for our use case: displaying a link to the most recent capture of any page deleted from our website. With no requirements for authentication, key management, or platform-specific software development kit (SDK) dependencies, our development process was simplified. We chose to incorporate the Wayback API using Nuxt.js, the web framework used to build the new Getty.edu site.

Because the Wayback Machine API responds quickly to simple queries, typically within milliseconds, we can query it before rendering the page, using a Nuxt route middleware module. We added API error handling and a request timeout so that edge cases such as API failures or network timeouts do not block rendering of the 404 page.
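The post doesn’t show its middleware, and the Nuxt-specific wiring is left out here. As a framework-neutral sketch, assuming a hypothetical helper named withTimeoutFallback, the “never block the 404 page” rule can be expressed as a race between the lookup and a timer, where any failure or timeout yields a fallback value instead of an error.

```typescript
// Generic guard (hypothetical helper) so a slow or failing archive lookup
// can never block rendering of the 404 page.
async function withTimeoutFallback<T>(
  task: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
  fallback: T,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => {
      resolve(fallback); // settle first so the race resolves with the fallback
      controller.abort(); // then cancel the request, if the task honors the signal
    }, timeoutMs);
  });
  try {
    // Whichever settles first wins: the lookup result or the timeout fallback.
    return await Promise.race([task(controller.signal), timeout]);
  } catch {
    // Network errors, malformed JSON, or any other failure in the lookup.
    return fallback;
  } finally {
    clearTimeout(timer);
  }
}
```

In a route middleware, the task could be (signal) => fetch(availabilityUrl, { signal }).then((r) => r.json()) with a fallback of null, so the page simply renders without an archive link when the lookup fails. The timeout value is a tuning choice the post does not specify.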

The only feature from our initial list of requirements that the Internet Archive API lacked was access to snapshot page thumbnails in the JSON payload it returns. Access to these images would allow us to enhance our 404 page with a visual cue of the archived page’s content.

Results and Next Steps

Our ability to include a link to an archived version of a deleted web page on our 404 response page helped ease the tough decisions content stakeholders were obliged to make about what content to archive and then delete from the website. We could guarantee availability of content in perpetuity without incurring the long-term cost of maintaining the information ourselves.

By default, the API returns the most recent Wayback Machine capture, which is sometimes one we did not create and which hasn’t necessarily passed through our archive quality assurance process. We intend to develop our application further so that it gives priority to Getty’s own page captures, ensuring we deliver the highest-quality capture to users.
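One way to express the planned preference is as a selection rule over a list of captures: take the newest capture from Getty’s own collection if one exists, otherwise the newest capture overall. This is a hypothetical sketch. In particular, how a capture gets flagged as Getty’s own (ownCollection below), for example by matching it against Archive-It collection records, is an assumption the post does not describe.

```typescript
// Hypothetical capture record; `ownCollection` marks captures made by Getty's
// own Archive-It crawls (how that flag is derived is assumed, not shown).
interface Capture {
  timestamp: string; // 14-digit YYYYMMDDhhmmss, so string order = time order
  url: string;
  ownCollection: boolean;
}

// Prefer Getty's own captures; among equally preferred captures, pick the newest.
function pickPreferredCapture(captures: Capture[]): Capture | null {
  let best: Capture | null = null;
  for (const c of captures) {
    if (best === null) {
      best = c;
    } else if (c.ownCollection !== best.ownCollection) {
      if (c.ownCollection) best = c; // own capture beats any third-party capture
    } else if (c.timestamp > best.timestamp) {
      best = c; // same class: newer wins
    }
  }
  return best;
}
```

Keeping the rule as a pure function makes the preference easy to adjust later, for instance to accept a newer third-party capture when Getty’s own capture is very old.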

Google Analytics has been configured to report on traffic to our 404 pages and will track clicks on links pointing to Internet Archive pages, providing useful feedback on what portion of archived page traffic is referred from our 404 error page.
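The post doesn’t say how the click tracking is set up, and it may well be configured in the Google Analytics interface rather than in code. As an illustrative sketch only: a click handler can check whether a link points to an archive.org host and, if so, send an event. The event name archive_link_click and its parameters are hypothetical, and send is injected so the sketch doesn’t depend on a particular analytics library.

```typescript
type SendEvent = (name: string, params: Record<string, string>) => void;

// True when the link targets an Internet Archive host such as web.archive.org.
function isArchiveLink(href: string): boolean {
  try {
    const host = new URL(href).hostname;
    return host === "archive.org" || host.endsWith(".archive.org");
  } catch {
    return false; // relative or malformed hrefs are not archive links
  }
}

// Report a click from the 404 page if it leads to an archived copy.
// Returns whether an event was sent.
function trackArchiveClick(href: string, pagePath: string, send: SendEvent): boolean {
  if (!isArchiveLink(href)) return false;
  send("archive_link_click", { link_url: href, page_path: pagePath }); // hypothetical event name
  return true;
}
```

With gtag.js loaded on the page, send could be (name, params) => gtag("event", name, params).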

To work around the challenge of providing navigational affordances to legacy content, and to ensure that the titles of old web pages remain accessible to search engines, we intend to provide an up-to-date index of all archived getty.edu pages.
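The post doesn’t say how the index will be built. One possible starting point, sketched under that assumption, is the Wayback CDX server API, which lists captures matching a URL pattern. With matchType=domain it covers getty.edu and its subdomains, and output=json returns rows whose first entry is a header. The CDX listing carries URLs and timestamps, not page titles, so titles would still have to come from elsewhere (for example, Getty’s own audit records). The helper names are hypothetical, and for a site of this size the full listing may be large.

```typescript
const CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx";

// Build a CDX query listing successful HTML captures across a whole domain.
function buildCdxIndexQuery(domain: string): string {
  const params = new URLSearchParams([
    ["url", domain],
    ["matchType", "domain"], // include subdomains such as www.getty.edu
    ["output", "json"],
    ["fl", "original,timestamp"], // only the fields the index needs
    ["filter", "statuscode:200"],
    ["filter", "mimetype:text/html"],
  ]);
  return `${CDX_ENDPOINT}?${params.toString()}`;
}

// Reduce CDX JSON rows (header row first) to the latest capture timestamp per URL.
function latestPerUrl(rows: string[][]): Map<string, string> {
  const latest = new Map<string, string>();
  if (rows.length === 0) return latest; // no captures at all
  const [header, ...data] = rows;
  const iOrig = header.indexOf("original");
  const iTs = header.indexOf("timestamp");
  if (iOrig < 0 || iTs < 0) throw new Error("unexpected CDX header");
  for (const row of data) {
    const url = row[iOrig];
    const ts = row[iTs];
    const seen = latest.get(url);
    // 14-digit timestamps compare correctly as strings.
    if (seen === undefined || ts > seen) latest.set(url, ts);
  }
  return latest;
}
```

The original column preserves each URL as crawled, so http:// and https:// variants of the same page appear as separate keys unless they are normalized first.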

As we continue to retire obsolete website pages and complete this monumental content archiving and retirement effort, we’re grateful for the Internet Archive API which supports our goal of making archived content accessible in perpetuity.

Grad Student Finds Nostalgic ‘Treasure Trove of Goodies’ Through the Internet Archive

Posted on October 23, 2023 by Caralee Adams

As Elena Rowan researches the ways that activist archivers gather and make sense of data, she often relies on the Internet Archive. She is a graduate student in sociology at Concordia University in Montreal, Canada, with an interest in the debate around copyright and e-books in public libraries.

“I look at why archives and libraries are important to society and culture as a whole,” said Rowan, who uses materials preserved in the Wayback Machine and the Internet Archive. “Without the Internet Archive, so much of the knowledge and information on the Internet would be lost, and most of my research would be impossible.”

Rowan is in the second year of her master’s program and works as a research assistant at the Data Justice Hub, a collaborative research project that pursues data-related skills development for social activists, critical researchers and the general public, and aims to understand how data activists gather and make sense of data.

The Internet Archive has been valuable, she said, in providing information for the project and its podcast, Data Decoded.

For a recent class on sociological theory, Rowan said she’s found it useful to search the Internet Archive’s collection for work by early researchers such as W.E.B. Du Bois. Her university library has a wealth of materials, but she says there are times when she can find an older book only through the Archive, and because it is digital, it’s easier to locate.

At an event sponsored by the Milieux Institute, which offers programs at the intersection of fine arts, digital culture, and information technology, Rowan leveraged the Internet Archive in another way. She created a one-hour Curating Nostalgia workshop where participants could explore resources in the digital collection to create their own personal nostalgia archive.

Logging into the Internet Archive, Rowan taught participants how to search for historical documents and pop culture items. For example, she found a beloved video game from her childhood that had come in a cereal box, as well as an audio walking tour of her neighborhood from a decade earlier, before gentrification changed the landscape. Other workshop participants found books they read as kids, Club Penguin memorabilia and a Nancy Drew game.

“For scholarly work and nostalgia researchers, it’s a treasure trove of goodies,” Rowan says of the Internet Archive.

In her personal life, Rowan said she’s enjoyed perusing old magazines and obscure cookbooks. She’s found recipes for ambitious cakes, sewing patterns and vintage designs that give her ideas for how to pull together her eclectic mix of old furniture.

“The colors, writing and patterns of the past offer infinite inspiration for creative hobbies and help cultivate domestic bliss,” she said. “I am grateful to everyone at the Internet Archive for creating, maintaining and continuing to expand and fight for this truly amazing public resource!”

Internet Archive Blogs

A blog from the team at archive.org