The Bigger Picture: Visual Archives and the Smithsonian
Posts tagged with: Web/Tech
- We have a date - The Renwick Gallery will be reopening on November 13, 2015! [via EyeLevel blog, SAAM]
- Years in the making - Sir Arthur C. Clarke's personal papers have been acquired by the National Air and Space Museum Archives. [via AirSpace blog, NASM]
- Revolutionary War veterans - These are the few who lived long enough to have their portraits taken. [via PetaPixel]
- Your questions are answered about nests and their avian architects. [via Smithsonian Science News]
- Now online from Louisiana State University Libraries Special Collections: the collaborative digital collection Free People of Color in Louisiana: Revealing an Unknown Past, a project "to digitize, index, and provide free access to family papers, business records, and public documents pertaining to free people of color in Louisiana and the lower Mississippi Valley." [via Jennifer Wright, SIA]
- Digital preservation - A sneak peek inside the Digital Art Vault at the Museum of Modern Art. [via Inside/Out blog, MoMA]
- Vindication at last - John Harrison, one of the world's greatest clockmakers, invented what he claimed was the perfect pendulum clock in the mid-18th century. His peers at the time ridiculed Harrison's plan. Clock B, as it is known, was recently constructed to Harrison's specifications at the Royal Observatory in Greenwich, and it has vindicated him by losing only five-eighths of a second over 100 days, setting a Guinness World Record. [via The Verge]
- Mark your calendars, Fall 2015 - The Smithsonian hopes to open the Arts and Industries Building to host special events. [via Washington Post]
- The American Library Association released its State of America's Libraries Report for 2014. [via InfoDocket]
- Unintended consequences - New technology embraced in the past can have unforeseen effects: paintings conservator Dawn Rogala discovered that cracks in some mid-century paintings resulted from artists' use of commercial paints that were newly available at the time. [via The Torch, SI]
- Getting dirty - April is National Garden Month, and Smithsonian Gardens has a website, Community of Gardens, that aims to serve as a place for people to share stories about their gardens. [via Smithsonian Gardens blog]
- The Archive of Recorded Poetry and Literature debuted on the Library of Congress website this week. [via InfoDocket]
- An important question - "Where is my flying car?" gets an answer, at least for now: the dream is still very much alive. [via AirSpace blog, NASM]
- The Papers of Abraham Lincoln is a project to identify, image, transcribe, annotate, and publish all documents written by or to Abraham Lincoln during his lifetime. [via InfoDocket]
- The journey of digital collections from donor to repository as told by the Library of Congress. [via The Signal: Digital Preservation, LC]
- In case you haven't visited your local public library lately, hopefully the video below will inspire you to check it out! [via InfoDocket]
Archive-It 5.0 Changes and New Features
As a web preservation intern at the Smithsonian Institution Archives, I capture and preserve the Smithsonian’s web presence using the Archive-It crawling service. In October 2014, Archive-It released Phase 1 of Archive-It 5.0, which rolled out a new interface and more robust data collection for post-crawl reports. Currently, the service allows users to switch between versions 4.9 and 5.0. Archive-It offers ten new features for reports, including a quick text box filter, infographics, the ability to add notes, and the option to compare two crawl reports side by side. The reports generated by web crawls play a large part in the Archives’ web collection packages and quality assurance (QA). We therefore need to understand the changes between versions 4.9 and 5.0 as we work to preserve the record of the ephemeral web.
Version 4.9 Crawl Report
Version 5.0 Crawl Report
Archive-It One-Time IDs
The snapshots above show the same crawl report, one in 4.9 and the other in 5.0. The new format and interface are not the only differences: the one-time IDs (identifiers) also differ. For this crawl, version 4.9 assigned the ID 20150320165024358, while version 5.0 assigned 149112. The Archives does not rely entirely on these numbers to identify crawls, but they are attached to the file name when a summary/overview report or the WARC files are downloaded for our collections. For now, the ability to switch between 4.9 and 5.0 makes this issue moot. Once that capability is removed, however, reports and WARC files downloaded with a 4.9 ID will be harder to locate and identify in the new Archive-It reports. Archive-It does not mention this change on the wiki pages it has provided about the roll-out of 5.0, which could be a problem for organizations that use these IDs to identify crawls.
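One way to keep 4.9-era downloads findable after the switch is to record a crosswalk from each 4.9 ID to its 5.0 ID while both interfaces are still available. The sketch below is a hypothetical illustration, not an Archive-It feature. The only ID pair it contains is the one from the crawl above. It assumes that downloaded file names embed the 4.9 ID as a standalone run of 17 digits. The mapping table and the function name `new_id_for_file` are invented for this example.

```python
import re

# Hypothetical crosswalk from Archive-It 4.9 one-time IDs to 5.0 IDs.
# Only the pair below comes from the post; other pairs would have to be
# recorded by hand while both versions can still be viewed.
CROSSWALK_49_TO_50 = {
    "20150320165024358": "149112",
}

# Assumption: file names carry the 4.9 ID as a standalone 17-digit run
# (not part of a longer number).
OLD_ID_PATTERN = re.compile(r"(?<!\d)\d{17}(?!\d)")


def new_id_for_file(filename):
    """Return the 5.0 ID for a file named with a known 4.9 ID, else None."""
    match = OLD_ID_PATTERN.search(filename)
    if match is None:
        return None  # no 4.9-style ID in the file name
    return CROSSWALK_49_TO_50.get(match.group(0))  # None if not yet recorded


if __name__ == "__main__":
    print(new_id_for_file("overview-20150320165024358.csv"))
```

With the sample (hypothetical) file name above, the lookup returns `149112`. A name without a 17-digit ID, or with an ID not yet in the table, returns `None`. Such files would still need to be matched by hand, for example by crawl date.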
Report Summary Data
Assembling our web collection packages includes downloading the host data and the report summary from the post-crawl report. The host download lists the URLs archived from each host, along with other information such as new data, documents blocked by robots.txt, and out-of-scope documents. Between 4.9 and 5.0, the only changes here are the interface and the ability to browse hosts by seed for more robust data.
In 5.0, the report summary is now called an overview, but it contains the same type of data. However, I noticed a few discrepancies: the data is not consistent between the two versions. The snapshots of the same crawl above show different figures for Total Documents Archived. Version 4.9 reports 12,440 documents archived, while version 5.0 reports 12,386. It is unclear why the two versions report different numbers.
New Features Overall
The interface of the 5.0 reports page is an improvement. The one-time IDs are now visible. However, a collection name that is too long gets cut off, and users must hover over it to see the full name. The quick text box filter on the reports page is a helpful feature. The search function is more flexible than in 4.9, which only allowed searches by collection name or date.
The new view feature links directly from the reports page to the Wayback Machine, so users can view a URL without navigating there through the access tab. This feature can help improve our QA workflow. QA involves making sure our crawl and capture of a site accurately represents what the website displayed at the time of the crawl. Wayback lets us view the crawl results visually, as a website. The reports and hosts, by contrast, provide numerical data about the crawl.
Overall, the 5.0 features are an improvement on this service, which is an important tool for archiving the record of the Smithsonian today.
- Acquisitions of note - The Library of Congress acquired 500 images from the collection of Robin Stanford of Houston that depict the Civil War-era United States and slavery. Yale's Beinecke Library and the Library of Congress acquired the Meserve-Kunhardt Collection of more than 73,000 items that document American history from the Civil War through the end of the nineteenth century and record the rise of photography as a common practice. [via Washington Post and InfoDocket]
- Check out the beta version of the redesigned records section of The National Archives (United Kingdom), Help With Your Research, which is streamlined into eleven new categories to help researchers find what they are looking for. [via The National Archives blog]
- The art and science of conservation at the Freer|Sackler Galleries. [via The Guardian]
- New York City is on display in an online gallery of over 900,000 historical images of the city. [via Open Culture]
- Going to great lengths - The Frontier Nursing Service helped provide medical services for pregnant mothers and their babies who otherwise were at risk of death or serious medical complications. [via O Say Can You See? blog, NMAH]
- Google doubles its online database of street art with 5,000 new pieces. [via The Verge]
- A rare peek at the archives and artifacts held at the NBCUniversal Archives & Collections. [via LAWeekly]
- Workflows for born-digital collections involve continually evolving processes that must be revised and reworked as new file formats are discovered or file sizes increase. [via The Signal: Digital Preservation, LOC]
- You probably noticed it, but this past week Google recognized Anna Atkins, regarded as the world's first woman photographic artist, with a Google Doodle. [via Washington Post]
- Digital information in the form of email, documents, spreadsheets, PDFs, and so on is easy to create, store, and destroy, which puts the future historical record at risk of loss. [via The New York Times]
- Help a brother out . . . by voting for your favorite name for the National Zoo's newly born Andean bear brother cubs. [via NZP]
- It's in your hands - Tell your story with StoryCorps' new app, which lets you conduct the entire interview on your smartphone with easy-to-use tools. You can record high-quality audio on your device and upload the conversation to an archive at the Library of Congress, where other app users can listen to it. [via Cool Hunting]
- Perfect for Women's History Month - An app that makes your phone buzz when you get near a place where women made history. [via Good Magazine]
- Graduate students in the University of Delaware's art conservation program help restore family photographs after a horrible tragedy. [via PetaPixel]