Web and Social Media Preservation: Capturing Today’s Websites for Future Archival Research

Websites are important records of institutional history, but they are also constantly being updated, redesigned, or taken down. How do we access important information from outdated versions of websites? The Archives is currently using Archive-It, a tool created by the Internet Archive, to capture Smithsonian websites and social media accounts for future use. Archive-It uses a crawler - a program that browses the Internet much as Google's does - to replicate a website as it exists at a specific moment. These “crawls” are later accessible using the Wayback tool. While the research potential for these crawls is enormous, two uses stand out in particular: documenting the evolution of website features, and capturing public participation in a specific event or program through social media.

Crawls show how the use of technology and the design of websites have evolved over time. Above and below, we have two examples from the National Museum of Natural History (NMNH). The first is the Virtual Echinoderm Newsletter, which was last updated in 2002. Though it may seem simplistic to us today, it is representative of a typical website from the early 2000s.

Fast-forward to 2014 and the new Human Origins Initiative website. Here we have a slideshow of features, live updates from Facebook and Twitter, and a text box that allows visitors to participate in the project - all located on the first page. While both sites are fairly typical of the years in which they were created, they also demonstrate how much websites have changed in just over a decade.

The Archive-It tool is also being used to capture certain programs and events through social media. A great example of this is the crawl of the National Museum of American History’s #HistoryTalkBack Tumblr page. This site documented an ongoing project at the museum in which curators invited visitors to respond to a question every day and to post their answers on a wall at the museum. The Tumblr page broadcast some of the favorite posts and then invited commenters to respond to the question as well. We were pleased with the amount of public participation captured in our crawl - not only do we have the visitors’ comments, but because the site is Tumblr-based, we also captured the number of likes and re-blogs. Now that this site is defunct, this crawl becomes important for documenting the scope and impact of the project.

I especially like these social media crawls. Social media - instantaneous, constantly updated, and therefore often thought of as transient - is transformed into something more lasting. By looking at crawls from blogs, Facebook, Twitter, Tumblr, and Flickr, we can examine the public’s response to a project and the strategies museums use to engage with their audiences. The #HistoryTalkBack crawl shows this. Tumblr users spread these images, sharing the posts to express their own love of history to friends and followers, while the National Museum of American History used the platform to engage both its real-life and virtual visitors. Capturing these social media moments gives us a greater understanding of how the public participates in museum programs, and also of how museums reach out to people.

The Archive-It tool holds incredible potential for the coming years, especially as the Archives continues to grow. If you’d like to learn more, you can check out the Archives’ Archive-It crawls.

Related Resources

Smithsonian Now Using Archive-It to Crawl Websites, The Bigger Picture blog, Smithsonian Institution Archives
Connecting the Dots: Issues with Preserving Complex Websites, The Bigger Picture blog, Smithsonian Institution Archives
Saving the Smithsonian’s Web, The Bigger Picture blog, Smithsonian Institution Archives

Related Collections

Accession 14-039 - National Museum of American History, Website Records, 2011-2013, Smithsonian Institution Archives
Accession 14-079 - National Museum of Natural History, Website Records, 2013, Smithsonian Institution Archives

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.
