Web and Social Media Preservation: Capturing Today’s Websites for Future Archival Research

Websites are important records of institutional history, but they are also always being updated, redesigned, or taken down. How do we access important information from outdated versions of websites? The Archives is currently using Archive-It, a tool created by the Internet Archive, to capture Smithsonian websites and social media accounts for future use. Archive-It uses a crawler - a program that browses the Internet the way a search engine like Google does - to replicate a website as it exists at a specific moment. These “crawls” are later accessible using the Wayback tool. While the research potential for these crawls is enormous, two areas stand out in particular: documenting the evolution of website features and capturing public participation during a specific event or program through social media.
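For readers curious about what a crawler does under the hood, here is a minimal sketch in Python. It is only an illustration of the general idea (start at one page, save it, follow its links, repeat) and not a description of how Archive-It's crawler is actually built. The names `LinkCollector` and `crawl` are hypothetical, and the `pages` dictionary is an assumption that stands in for the live web so the example can run without a network connection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, start_url):
    """Walk a site breadth-first and return a copy of each page visited.

    `pages` is a hypothetical dict of URL -> HTML standing in for the web.
    Only links on the same host as `start_url` are followed.
    """
    host = urlparse(start_url).netloc
    captured = {}
    queue = [start_url]
    while queue:
        url = queue.pop(0)
        if url in captured or url not in pages:
            continue  # already saved, or nothing to fetch
        captured[url] = pages[url]  # the "snapshot" of this page
        parser = LinkCollector()
        parser.feed(pages[url])
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            if urlparse(target).netloc == host:
                queue.append(target)
    return captured
```

Given a small set of pages that link to one another, `crawl(pages, start_url)` returns every page on the starting site that can be reached by following links, and skips links that lead to other hosts. A production crawler adds much more, such as fetching over the network, recording the capture time, and saving images and stylesheets.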

A screenshot of the website for the Virtual Echinoderm Newsletter, crawled June 25, 2014, Accession

Crawls show how the use of technology has progressed and how websites have evolved over time. Above and below, we have two examples from the National Museum of Natural History (NMNH). The first is the Virtual Echinoderm Newsletter, which was last updated in 2002. Though it may seem simple to us today, it is very representative of a typical website from the early 2000s.

A screenshot of the website for the Virtual Echinoderm Newsletter, crawled June 25, 2014, Accession

Fast-forward to 2014 and the new Human Origins Initiative website. Here we have a slideshow of features, live updates from Facebook and Twitter, and a text box that allows visitors to participate in the project - all located on the first page. While both of these sites are fairly typical of the years in which they were created, they also demonstrate how much websites have changed in just over a decade.

A screenshot of the website for the Human Origins Initiative, crawled November 22, 2013, Accession 1

The Archive-It tool is also being used to capture certain programs and events through social media. A great example of this is the crawl of the National Museum of American History’s #HistoryTalkBack Tumblr page. This site documented an ongoing project at the museum in which curators invited visitors to respond to a question every day and to post their answers on a wall at the museum. The Tumblr page broadcast some of the favorite posts and then invited commenters to respond to the question as well. We were pleased with the amount of public participation captured in our crawl - not only do we have the visitors’ comments, but because the site is Tumblr-based, we also captured the number of likes and re-blogs. Now that this site is defunct, this crawl becomes important for documenting the scope and impact of this project.

A screenshot of the website for the NMAH #TalkBackHistory Tumblr, crawled June 6, 2013, Accession 14

I especially like these social media crawls. Social media - instantaneous, constantly updated, and therefore often thought of as transient - is transformed into something more lasting. By looking at crawls from blogs, Facebook, Twitter, Tumblr, and Flickr, we can examine the public’s response to a project and the strategies museums use to engage with their audiences. The #HistoryTalkBack crawl shows this. Tumblr users spread these images, sharing the posts to express their own love of history to friends and followers, while the National Museum of American History used the platform to engage both its real-life and virtual visitors. Capturing these moments on social media gives us a greater understanding of how the public participates in museum programs, and also of how museums reach out to people.

A screenshot of the website for the NMAH #TalkBackHistory Tumblr, crawled June 6, 2013, Accession 14

The Archive-It tool promises incredible potential in the coming years, especially as the Archives continues to grow. If you’d like to learn more, you can check out the Archives’ Archive-It crawls.

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.