What’s in a Digital Archive? An Entire Petabyte of Knowledge Preserved at UGA Libraries

Submitted by Camie on

In the 21st century, libraries don’t just store information on shelves — they also use servers. UGA Libraries recently reached a major milestone in its digital preservation of unique materials, eclipsing 1 petabyte of storage in its ARCHive.

The digital preservation storage ARCHive was established in 2017 and named for the beloved symbol of the country’s first state-chartered public university. It serves as a digital gateway to materials from UGA’s three Special Collections units — the Brown Media Archives and Peabody Awards Archive, Hargrett Rare Book and Manuscript Library, and Russell Library for Political Research and Studies. It’s also the virtual home to the Digital Library of Georgia, New Georgia Encyclopedia, and Map and Government Information Library, all of which provide materials freely on the Internet to users across the world.

The ARCHive’s 1 millionth gigabyte was reached in December 2023, likely with the upload of a news film reel from the WSB Newsfilm Collection, one of the largest publicly available collections of newsfilm in the country. Or the byte may have been taken by a digitized edition of the Atlanta Georgian, a Hearst-owned newspaper from the turn of the 20th century, preserved through the Georgia Newspaper Project.

“I can't tell for sure which files put us over the top,” said Adriane Hanson, head of digital stewardship for UGA Libraries. “Anecdotally, I know of a few places with 1 PB or more,” she said, pointing to the huge storage capabilities of the Library of Congress and National Archives along with a few of the nation’s largest media libraries. “But I also get a lot of shocked expressions when I share how much we have when I meet people at conferences. I think it's pretty rare still.”

To put it in perspective, the standard iPhone holds 128 GB of storage, so it would take about 7,800 iPhones to reach the capacity of the ARCHive. In terms of 1 GB flash drives, if you line them up end to end, a petabyte’s worth would stretch over 92 football fields. That’s more yardage than Stetson Bennett’s career passing yards at UGA.
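The iPhone comparison can be checked with a few lines of arithmetic. The sketch below assumes decimal units (1 PB = 1,000,000 GB), which is consistent with the "1 millionth gigabyte" figure above; the function and constant names are illustrative only.

```python
# Assumption: decimal (SI) units, so 1 PB = 1,000,000 GB.
GB_PER_PB = 1_000_000


def devices_needed(total_gb: float, device_gb: float) -> float:
    """Return how many devices of `device_gb` capacity are needed to hold `total_gb`."""
    return total_gb / device_gb


# 128 GB is the iPhone capacity quoted in the article.
iphones = devices_needed(GB_PER_PB, 128)
print(f"{iphones:,.1f} iPhones")  # 7,812.5 iPhones, i.e. about 7,800
```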

According to Hanson, the largest chunk of digital storage at UGA Libraries goes to the Brown Media Archives and Peabody Awards Collection. The library is the third-largest archive devoted to audio and moving images in the country, and it preserves everything from 100-year-old home movies to radio programs from the 1940s, local and national television programming from the beginning of the technology, video tapes in a variety of formats, and the visual and audio history of the University of Georgia.

“Digitizing audiovisual content is vitally important because formats become obsolete as technology changes. The machines and the playback units just aren’t made anymore, and this means a real need to save what we can,” said Ruta Abolins, director of the Brown Media Archives and Peabody Awards Collection. “Digitizing content is also needed to provide access to the unique home movies, local news content, folk recordings, and other special items in our collections. It is what all faculty, students, and researchers expect from a university like UGA.”

Digital preservation keeps historic images accessible today. Through the work of the Brown Media archivists, film clips have survived to be viewed again in documentaries and other projects, including the Academy Award-winning documentary Summer of Soul in 2021 and the new multi-part documentary James Brown: Say It Loud, streaming on Hulu beginning this month. In addition, a new exhibition on display in the National Baseball Hall of Fame entitled "The Souls of the Game: Voices of Black Baseball" includes rare footage from the Pebble Hill Plantation Film Collection of black players on a baseball field in Georgia from 1919 or 1920.

The ARCHive also stores hundreds of oral histories telling personal stories related to music, politics, and the everyday lives of Georgians; historic maps and newspapers; and high-quality images of diaries, journals, photos, and other unique materials that otherwise would only be accessible by a trip to the Special Collections Libraries in Athens. It also provides a storage place for 21st century born-digital materials such as websites, digital art, and blogs and vlogs that are no longer available through their original creators due to the cost of upkeep.

The work of digital preservation doesn’t begin and end when the materials are uploaded onto a server, Hanson explained.

“In practical terms, preservation storage means having good metadata, having multiple copies in multiple locations and kinds of storage media, and proactively checking that the files are unchanged,” she said, giving credit to a team of archivists, catalogers, and IT professionals who served important roles in reaching the petabyte milestone. “This all gives us the best chance of still having a good copy when an error is detected or a portion of the hardware is damaged, such as when a server room floods during a storm. We use this system for our highest priority digital content and the goal is to keep the files usable indefinitely.”
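The idea of "checking that the files are unchanged" across multiple copies is often implemented as a checksum comparison, sometimes called a fixity check. The following is a minimal sketch of that general technique, not a description of how the ARCHive itself works; the function names, the choice of SHA-256, and the file layout are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Optional, Sequence


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute a file's SHA-256 checksum, reading in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_good_copy(copies: Sequence[Path], expected: str) -> Optional[Path]:
    """Return the first copy whose checksum matches the recorded value, or None.

    `expected` is the checksum recorded in the metadata when the file was
    first stored (hypothetical workflow).
    """
    for copy in copies:
        if copy.is_file() and sha256_of(copy) == expected:
            return copy
    return None
```

If one copy is damaged or missing, `find_good_copy` falls through to the next location, which is why keeping several copies on different media matters: a recorded checksum can detect a change, but only an intact second copy can repair it.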

Keeping a petabyte of archival information safe and accessible for future generations of researchers and scholars requires a lot of work, and the job will only get bigger as the ARCHive continues to grow with new material.

“It is a major accomplishment to have this much content identified as important, prepared and described, and safely put into storage,” Hanson said. “To keep it safe and usable, we'll need to keep upgrading the system, replacing components as they break or stop being supported. And we'll need to address formats that become unusable over time, usually by making new copies in something more modern.”

In the end, the digital archives are just as important as the books on the shelf and the ephemera in the Special Collections vault to the UGA Libraries’ mission of preservation of knowledge, said Toby Graham, associate provost and university librarian.

“Our digital archives have become as necessary to preserve as any of our rare or unique special collections,” Graham said. “The 1 PB matters because of what it contains: A vast record of American broadcasting history and Georgia history, much of it derived from sources that won’t survive in their original physical forms.”