Why would a fashion video appear on an environmental blog? I found myself asking that question while performing web quality assurance (aka web QA) on the blog of the Smithsonian Environmental Research Center (SERC), a blog that has more to do with fish than with fashion.
As an intern in the Digital Services Division at the Smithsonian Institution Archives, I work in web preservation. That work often involves reviewing crawls (digital captures of a website), using the Wayback QA tool, and running patch crawls (retrieving documents that were not captured). While I was reviewing the crawl of the SERC blog, I noticed a problem. A video on river herring conservation is supposed to appear under the article “eDNA emerges as powerful tool for tracking threatened river herring in Chesapeake Bay.” In its place, however, is an eighteen-minute video from the Fashion HD Channel. Where did it come from? And where is the original video?
By consulting the list of videos captured in the Archive-It Wayback QA tool, I found that the video on river herring conservation had not been archived. After I patched the site and returned to review it, the video was finally listed. This means it is now part of the archived blog, even though the fashion video still covers it up. Sadly, it is still not visible in the Internet Archive Wayback Machine. I suppose the source of the Fashion HD Channel video will have to remain a mystery for now.
Embedded YouTube videos are difficult to crawl. If specific rules are not set up in the Archive-It software, a crawl can very easily capture endless content that has nothing to do with the Smithsonian. On the other hand, the crawl sometimes misses content we actually need, like the herring video, which is why quality assurance is essential.
Before my internship, I was unaware of this troublesome part of the job and of just how frustrating it can be. Now I understand that because the Internet supplies a seemingly endless amount of information and entertainment, web preservation will always have to fight the invasion of unrelated content.