The Bigger Picture: Visual Archives and the Smithsonian
"Smithsonian Enters Cyberspace with Information-Packed World-Wide Web Home Page" announced the press release.
Tomorrow marks the 20th anniversary of the launch of the Smithsonian's first "internet 'web' site" on May 8, 1995. The website included more than 1,500 pages, and overviews of the site were available in Spanish, German, and French. In addition to text and graphics, the pages also included images, audio, and video. Peter House, the National Science Foundation staff member who was detailed to the Smithsonian for the technical development of the website, considered the site to be very large at the time.
The Smithsonian Home Page was designed to allow users to visit the Smithsonian in much the same way as they would in person. Users could begin by viewing general information pages, just as many visitors begin with the information center in the Castle, or they could go directly to the page for an individual museum. Many of the Smithsonian's museums and other facilities established home pages at the same time.
A sneak preview of "Ocean Planet On-Line" was available several weeks ahead of the Smithsonian Home Page. It was demonstrated during the press preview of the "Ocean Planet" exhibition at the National Museum of Natural History, held April 20, 1995. The website was a joint project of the Smithsonian's Environmental Awareness Program and the National Aeronautics and Space Administration (NASA). Gene Feldman, an oceanographer at the Goddard Space Flight Center and creator of the online exhibition, described the site as "one of the most comprehensive and advanced exhibitions available through the Internet via the World Wide Web." He believed it had "capabilities that will amaze even the tekkies." Although hosted on a NASA server, "Ocean Planet On-Line" was considered to be a component of the larger Smithsonian website. It still exists today in close to its original form.
The Smithsonian Home Page included multimedia messages from the Secretary, general information, frequently asked questions (known as "Encyclopedia Smithsonian"), press releases, museum highlights, online exhibitions, virtual museum tours, a staff directory, and the "electronic Shopping Mall." The "Perspectives" section of the site allowed users to search for specific topics across the entire website. Many of these features still exist, in an updated form, in the current Smithsonian website.
In the first 24 hours after the home page was launched, it received approximately 100,000 hits, some from as far away as Japan. By May 17, nine days after the launch, there had been over 600,000 hits.
Secretary Heyman noted that "James Smithson's goal of the 'increase and diffusion of knowledge' has been reborn for a new century."
According to House, "The Smithsonian has been waiting 150 years for the Internet. What we do here is perfect for it."
- Tracking Down the Elusive 'Treasure House of Learning', The Bigger Picture blog, Smithsonian Institution Archives
- Accession 98-094 - Office of the Secretary, Smithsonian Website Records, 1995, Smithsonian Institution Archives
- Accession 01-081 - Smithsonian Institution, Office of Public Affairs, The Torch, 1994-1999, Smithsonian Institution Archives
- Accession 12-545 - National Museum of Natural History, Office of Public Affairs, Press Releases, 1992-2002, Smithsonian Institution Archives
- Historic Smithsonian Home Pages on the Internet Archive Wayback Machine and on Archive-It
- For the first time - The National Air and Space Museum lowers the Bell X-1 to the floor for the first time since the museum opened in 1976. [via SI Newsdesk]
- From the National Postal Museum - The 10 most common and preventable problems that can damage collections, both in a museum and at home. [via Pushing the Envelope blog, NPM]
- In 2008, photographer Anita Corbin embarked on a 10-year project to take portraits of women in the United Kingdom who were the first to achieve something in their field. The project will celebrate the 100th anniversary of women's suffrage in the UK in 2018. [via PetaPixel]
- Everyone needs some guidance - Helping Congress archive their personal digital records. [via The Signal: Digital Preservation, LC]
- Now available as Linked Open Data - The Getty's Union List of Artist Names. [via The Getty Iris]
- May is Asian Pacific Heritage Month, and the Smithsonian will kick off the month with "Korea Day: A Family Festival," hosted by the Freer Gallery of Art and Arthur M. Sackler Gallery on Sunday, May 3, from 11 a.m. to 4 p.m. [via SI Newsdesk]
- The University of Virginia, in partnership with the Fred W. Smith National Library for the Study of George Washington at Mount Vernon, will embark on a project to publish Martha Washington's letters in fully edited and annotated volumes. [via InfoDocket]
- Opening today at the Whitney Museum of American Art is its inaugural exhibition in its new home: "America is Hard to See." [via Cool Hunting]
- We have a date - The Renwick Gallery will be reopening on November 13, 2015! [via EyeLevel blog, SAAM]
- Years in the making - Sir Arthur C. Clarke's personal papers are acquired by the National Air and Space Museum Archives. [via AirSpace blog, NASM]
- Revolutionary War veterans - These are the few who lived long enough to have their portraits taken. [via PetaPixel]
- Your questions are answered about nests and their avian architects. [via Smithsonian Science News]
- Now online from Louisiana State University's Libraries, Special Collections is the collaborative digital collection: Free People of Color in Louisiana: Revealing an Unknown Past, a project "to digitize, index, and provide free access to family papers, business records, and public documents pertaining to free people of color in Louisiana and the lower Mississippi Valley." [via Jennifer Wright, SIA]
- Digital preservation - A sneak peek inside the Digital Art Vault at the Museum of Modern Art. [via Inside/Out blog, MOMA]
- Vindication at last - John Harrison, one of the world's greatest clockmakers, invented what he claimed to be the perfect pendulum clock in the mid-18th century. His peers at the time chastised and ridiculed Harrison's plan. At the Royal Observatory in Greenwich, Clock B, as it's known, recently constructed to Harrison's specifications, has vindicated him by losing only five-eighths of a second over a period of 100 days, setting a Guinness World Record. [via The Verge]
- Mark your calendars, Fall 2015 - The Smithsonian hopes to open the Arts and Industries Building to host special events. [via Washington Post]
- The American Library Association released their State of America's Libraries Report for 2014. [via InfoDocket]
- New technology embraced in the past leads to unintended consequences - Paintings conservator Dawn Rogala discovers that cracks in some mid-century paintings were caused by the artists' use of then newly available commercial paints. [via The Torch, SI]
- Getting dirty - April is National Garden Month and Smithsonian Gardens has a website, Community of Gardens, that hopes to serve as a place for people to share their stories about their gardens. [via Smithsonian Gardens blog]
- The Archive of Recorded Poetry and Literature debuted on the Library of Congress website this week. [via InfoDocket]
- An important question: "Where is my flying car?" gets answered for the time being as the dream is still very much alive. [via AirSpace blog, NASM]
- The Papers of Abraham Lincoln is a project to identify, image, transcribe, annotate, and publish all documents written by or to Abraham Lincoln during his lifetime. [via InfoDocket]
- The journey of digital collections from donor to repository as told by the Library of Congress. [via The Signal: Digital Preservation, LC]
- In case you haven't visited your local public library lately, hopefully the video below will inspire you to check it out! [via InfoDocket]
Archive-It 5.0 Changes and New Features
As a web preservation intern at the Smithsonian Institution Archives, I capture and preserve the Smithsonian’s web presence using the Archive-It crawling service. In October 2014, Archive-It released Phase 1 of Archive-It 5.0, which featured the roll-out of a new interface and more robust data collection for post-crawl reports. Currently the service allows users to switch between the 4.9 and 5.0 versions. Archive-It offers ten new features for reports, which include a quick text box filter, infographics, the ability to add notes, and the option to compare two crawl reports side by side. The reports generated by web crawls play a large part in the Archives’ web collection packages and quality assurance (QA), so the changes between versions 4.9 and 5.0 are important for us to understand as we attempt to preserve the record of the ephemeral web.
Version 4.9 Crawl Report
Version 5.0 Crawl Report
Archive-It One-Time IDs
The snapshots above were taken of the same crawl report, one in 4.9 and the other in 5.0. The new format and interface are not the only differences: the one-time IDs (identifiers) are also different. For this crawl, version 4.9 assigned the ID 20150320165024358, and version 5.0 assigned 149112. While the Archives does not fully rely on these numbers as identifiers for crawls, they are attached to the file name when a summary/overview report and the WARC files are downloaded for our collections. Currently, the ability to switch back and forth between 4.9 and 5.0 makes this issue moot, but once this capability is removed, reports and WARC files downloaded with the 4.9 ID will be more difficult to locate and identify in the new Archive-It reports. Archive-It does not mention this change on the wikis it has provided regarding the roll-out of 5.0. This change could be problematic for organizations that use these IDs to identify crawls.
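One way an organization might guard against losing track of older downloads is to keep its own crosswalk between the two ID schemes. The following is a minimal sketch, not anything Archive-It provides: the `ID_CROSSWALK` table and both function names are hypothetical, and reading the 17-digit 4.9 ID as a date and time with milliseconds is my assumption based only on how the number looks, since Archive-It does not document it in the material I have seen.

```python
from datetime import datetime

# Hypothetical, locally maintained crosswalk: 4.9 one-time ID -> 5.0 one-time ID.
# The single pair below is the one quoted in this post.
ID_CROSSWALK = {"20150320165024358": "149112"}


def parse_legacy_id(legacy_id):
    """Read a 17-digit 4.9 ID as yyyyMMddHHmmss plus milliseconds.

    This interpretation is an assumption, not documented behavior.
    """
    if len(legacy_id) != 17 or not legacy_id.isdigit():
        raise ValueError("expected a 17-digit ID, got %r" % legacy_id)
    stamp = datetime.strptime(legacy_id[:14], "%Y%m%d%H%M%S")
    return stamp.replace(microsecond=int(legacy_id[14:]) * 1000)


def new_id_for(legacy_id):
    """Return the recorded 5.0 ID for a 4.9 ID, or None if none was recorded."""
    return ID_CROSSWALK.get(legacy_id)
```

Recording each pair while both versions are still available would let a file named with the 4.9 ID be matched to its 5.0 report later, even if the date reading of the old ID turns out to be wrong.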
Report Summary Data
Part of assembling our web collection packages is downloading the host data and the report summary from the post-crawl report. The host download provides the URLs that were archived from each host as well as other information such as new data, documents blocked by robots.txt, and out-of-scope documents. When switching between 4.9 and 5.0, the only changes are the interface and the ability to browse hosts by seed for more robust data.
In 5.0, the report summary is now called an overview, but it contains the same type of data. However, I noticed a few discrepancies: the data is not consistent when switching between the two versions. The snapshots of the same crawl above show different numbers for Total Documents Archived: version 4.9 reports 12,440 documents, while version 5.0 reports 12,386. It is unclear why the data differs between the two versions.
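To put the size of that gap in perspective, here is a small sketch of the arithmetic using the two totals quoted above. The function name `compare_totals` is my own and is not part of any Archive-It tool.

```python
def compare_totals(old_total, new_total):
    """Return the absolute gap and the gap as a percentage of the old total."""
    diff = old_total - new_total
    pct = diff / old_total * 100
    return diff, pct


# Total Documents Archived as shown for the same crawl in 4.9 and in 5.0.
diff, pct = compare_totals(12440, 12386)
print(f"{diff} documents ({pct:.2f}% of the 4.9 total)")
# prints: 54 documents (0.43% of the 4.9 total)
```

A gap of 54 documents is well under one percent of the crawl, but it is enough that a summary downloaded from one version will not match one downloaded from the other.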
New Features Overall
The interface of the 5.0 reports page is an improvement. The one-time IDs are now visible; however, a collection name that is too long is cut off, requiring users to hover over the name to see it in its entirety. The quick text box filter on the reports page is a helpful feature. The search function is more flexible than in 4.9, which only allowed searches by collection name or date.
The new view feature provides users with a link from the reports page directly to the Wayback Machine to view the URL, without having to navigate to this resource through the access tab. This feature can help improve our quality assurance (QA) workflow. QA involves ensuring that our crawl and capture of the site accurately represent what the website displayed at the time of the crawl. Wayback allows us to view the crawl results visually in website form, unlike the reports and hosts, which provide numerical data about the crawl.
Overall, the 5.0 features are an improvement on this service, which is an important tool for archiving the record of the Smithsonian today.