Smithsonian Institution Archives

The Bigger Picture: Visual Archives and the Smithsonian

Archive: 08/2011

Link Love: 8/26/2011

by Catherine Shteynberg on August 26, 2011

Middlegate Japanese Gardens, Pass Christian, Mississippi.

  • How photos from the Smithsonian’s Archives of American Gardens help preserve the memory of gardens (such as the Middlegate Japanese Gardens pictured above) that are now gone.
  • The Museum of the Future has a great roundup of videos and blogs about museums, technology, and media.
  • An update on earthquake damage at the Smithsonian, plus Smithsonian Secretary (and earthquake expert) Wayne Clough speaking about the earthquake.
  • And an update on the Smithsonian’s Haiti Cultural Recovery Project, in which Smithsonian experts are helping to restore Haitian artwork, artifacts, documents, media and architecture that were damaged in the earthquake there.
  • A very interesting post from the American Social History Project Blog looks at how photographs were distorted when they were translated into the engravings used to illustrate 19th-century newspapers, demonstrating “the discrepancies between photographs and their adaptations into mass-produced formats.” (In this case, the translation of a Civil War photo into a newspaper illustration poignantly expresses the racial biases of the period.)
  • How an important letter written by President Lincoln after the Battle of Antietam, and then stolen from the War Department records, was recently returned to the National Archives:

"Missing Lincoln Documents Returned to National Archives." The letter and Lincoln's endorsement had apparently been removed from Edwards' Commission Branch file at some unknown time in the past, perhaps while the records were still in the custody of the War Department. When informed that the documents were part of a file at the National Archives, Bill Panagopulos of Alexander Auctions, Inc. agreed to return them. Courtesy of the National Archives YouTube Channel.

 

Categories: What Gets Saved
Tags: American History, Web/Tech, Archive, Photo History, Film/Video, Link Love

See Here: 8/26/2011

by The Bigger Picture on August 26, 2011

Second Annual Women's Week: LaVerne Love, Women's Program Coordinator and Wilma Scott Heide (NOW) on 26 August 1974, by Unidentified photographer, Photographic print, Smithsonian Institution Archives Record Unit 371 Box 2, Negative Number: 74-9447-6A.

PS: Happy Women's Equality Day!

Categories: Collections in Focus
Tags: American History, See Here, Politics/Government, World History

Saving the Smithsonian’s Web

by Robin C. Davis, Intern on August 25, 2011

This post is an update to Lynda Schmitz Fuhrig's post “Archiving the Smithsonian’s Presence on the Internet” from September 2, 2010.

The evolution of the websites of the National Museum of Natural History, National Portrait Gallery, National Air and Space Museum, and Hirshhorn Museum. From left to right, how they looked in 1998/2000, 2003, and 2011. Credit: Smithsonian Institution Archives and the Internet Archive.

The Smithsonian Institution has had a presence on the Internet for more than sixteen years. It’s come a long way since then. Documenting the Smithsonian’s various websites falls under the purview of the Smithsonian Institution Archives...but how do we do it?

As a web preservation intern at the Archives this summer, I’ve helped to develop the workflow for preserving Smithsonian-affiliated web content. Our goal is to take an annual “snapshot” of all Smithsonian public websites to be kept in the Archives.

Why do we preserve websites?

Institutional websites are important to preserve because they are:

  • records of institutional activity;
  • publications exposed in the public sphere; and
  • artifacts of historical and heritage value.

(Adapted from PoWR: Preservation of Web Resources Handbook, JISC, 2008.)

How do we preserve websites?

While each unit or office within the Smithsonian maintains and backs up the web content they create, the best way for the Archives to get a comprehensive snapshot of all the websites as they appear online is to use a web crawler. Crawlers, or spiders, are programs that browse the Internet by following trails of links, typically to index or save the content they encounter.
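To make "following trails of links" concrete, here is a minimal breadth-first crawl sketch in Python. It is not Heritrix and not the Archives' setup: to stay self-contained it walks a hypothetical in-memory link graph (`SITE`, with placeholder `example.org` URLs) instead of fetching pages over HTTP, and it simply stays on the seed's host.

```python
from collections import deque
from urllib.parse import urlparse

# Hypothetical link graph standing in for live pages: URL -> links found on that page.
SITE = {
    "http://museum.example.org/": ["http://museum.example.org/about",
                                   "http://museum.example.org/news"],
    "http://museum.example.org/about": ["http://museum.example.org/"],
    "http://museum.example.org/news": ["http://museum.example.org/news/2011",
                                       "http://elsewhere.example.com/"],
    "http://museum.example.org/news/2011": [],
}

def crawl(seed, get_links, max_pages=1000):
    """Breadth-first crawl from seed, staying on the seed's host.

    Returns URLs in the order visited; a real crawler would fetch and save each one."""
    host = urlparse(seed).netloc
    queue = deque([seed])
    seen = {seed}          # never queue the same URL twice
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            # Follow only on-host links we have not queued before.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

order = crawl("http://museum.example.org/", lambda u: SITE.get(u, []))
```

The `seen` set is what keeps a crawler from looping forever when pages link back to each other, as `/about` does here.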

We use Heritrix, open-source crawling software developed by the Internet Archive, to conduct focused captures of individual websites according to our specifications and schedule. Heritrix bundles all the web content it crawls into WARC (Web ARChive) files, an archival file format.
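Heritrix writes WARC files itself, but it may help to see what a single record inside one looks like. The sketch below follows the WARC/1.0 record layout (a version line, named header fields, a blank line, the captured bytes, then two CRLFs). The `warc_record` function and the example URL are hypothetical; real crawlers typically gzip each record and also write request, metadata, and warcinfo records.

```python
import uuid
from datetime import datetime, timezone

def warc_record(target_uri, http_response: bytes, when=None) -> bytes:
    """Serialize one simplified WARC/1.0 'response' record wrapping a raw HTTP response."""
    when = when or datetime.now(timezone.utc)
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Target-URI", target_uri),
        ("WARC-Date", when.strftime("%Y-%m-%dT%H:%M:%SZ")),   # UTC timestamp
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),     # unique per record
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_response))),          # length of the content block
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # The content block is followed by two CRLFs that end the record.
    return head.encode("utf-8") + http_response + b"\r\n\r\n"

raw = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>Hello</html>"
record = warc_record("http://museum.example.org/", raw,
                     when=datetime(2011, 8, 25, tzinfo=timezone.utc))
```

Because each record stores the full HTTP response along with when and where it was captured, a replay tool can later reconstruct the page as it looked on that date.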

A screenshot of Heritrix’s progress crawling the Smithsonian Marine Station website.

We need special software to view the content of the WARC files and perform a quality control check to make sure everything looks right and nothing is missing. We’re using the Wayback application, also developed by the Internet Archive. The local application looks and acts just like the Wayback Machine online. Once we’re satisfied with the captured website, we accession the WARC files and they’re officially part of the Archives’ holdings.
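Part of that quality-control check is making sure nothing is missing. As a rough sketch of one automated piece of it (not how Wayback works, and not the Archives' procedure), the code below reads the headers of an uncompressed WARC byte string and lists expected URLs that have no record. The parser is deliberately simplified: it assumes well-formed records and exact header-name casing, and `sample_record` builds hypothetical, stripped-down records just for the demonstration.

```python
def iter_warc_headers(data: bytes):
    """Yield the header fields of each record in an uncompressed WARC byte string."""
    pos = 0
    while pos < len(data):
        head_end = data.index(b"\r\n\r\n", pos)
        lines = data[pos:head_end].decode("utf-8").split("\r\n")
        headers = {}
        for line in lines[1:]:  # lines[0] is the version line, e.g. "WARC/1.0"
            name, _, value = line.partition(":")
            headers[name.strip()] = value.strip()
        yield headers
        # Skip the blank line, the content block, and the CRLF pair that ends the record.
        pos = head_end + 4 + int(headers["Content-Length"]) + 4

def missing_urls(expected, warc_data: bytes):
    """Return the expected URLs that have no record in the capture."""
    captured = {h.get("WARC-Target-URI") for h in iter_warc_headers(warc_data)}
    return sorted(set(expected) - captured)

def sample_record(uri: str, body: bytes) -> bytes:
    # Hypothetical minimal record for demonstration; real records carry more fields.
    head = (f"WARC/1.0\r\nWARC-Type: resource\r\nWARC-Target-URI: {uri}\r\n"
            f"Content-Length: {len(body)}\r\n\r\n")
    return head.encode("utf-8") + body + b"\r\n\r\n"

capture = (sample_record("http://museum.example.org/", b"<html>home</html>")
           + sample_record("http://museum.example.org/logo.jpg", b"\xff\xd8\xff"))
gaps = missing_urls(["http://museum.example.org/",
                     "http://museum.example.org/logo.jpg",
                     "http://museum.example.org/contact"], capture)
```

Here `gaps` holds only the contact page, which is the kind of hole a reviewer would then confirm visually in Wayback.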

A screenshot of the American Art Museum’s Eyelevel Blog as reviewed in Wayback.

Future researchers will also have to use Wayback or other WARC-reading software to view preserved web collections. They might be interested in the content of web-published news releases, the structure of the Smithsonian’s extensive online image collections, or what was deemed worthy of a blog post (!).

Issues encountered

The road to web preservation is not without a few bumps. A few issues we’ve encountered are:

  • Estimating the size of a site. Seemingly small, innocuous websites can actually contain many thousands of documents. One of the largest single crawls so far was the website of the National Museum of Natural History’s Botany department, which took 49 hours and 57 minutes to capture 78,922 files. To budget our time, we need to estimate how big a website is, and we use software tools such as link validation programs to do that.
  • Deciding what external content to capture. How do you tell a web crawler that you want it to follow a link in a blog post to a useful article elsewhere on the Smithsonian website, but not to follow a link to a spam site in the comments? For blogs, we configure Heritrix to accept embedded off-domain content, like photos from Flickr, but not to scrape linked off-domain sites. For non-blog Smithsonian sites, we don’t capture any off-domain content at all. In both instances, we can also specify any URL patterns that are acceptable.
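The scoping policy for external content can be written down as a small decision function. This is a hypothetical Python sketch of the rules as described (on-domain content always, accepted URL patterns for any site, off-domain embeds only for blogs); Heritrix expresses scope through its own crawl configuration, which this does not reproduce. The `is_embed` flag assumes the crawler can tell an embedded resource (such as an image source) from an ordinary clickable link, and all hosts are placeholders.

```python
import re
from urllib.parse import urlparse

def in_scope(url, seed_host, is_blog, is_embed, accept_patterns=()):
    """Hypothetical scope rule: should a discovered URL be captured?"""
    if urlparse(url).netloc == seed_host:
        return True  # on-domain content is always captured
    if any(re.search(p, url) for p in accept_patterns):
        return True  # explicitly accepted URL patterns apply to every site
    # Off-domain: a blog keeps embedded resources (e.g. photos hosted elsewhere)
    # but does not follow ordinary off-domain links; other sites keep nothing.
    return is_blog and is_embed
```

For example, on a blog a comment link to a spam site is rejected, while a photo embedded from an image-hosting service is kept.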

We’re still learning how best to use these tools to fit the needs of the Archives, and in the past two months, we’ve made a lot of progress:

  • 114 crawls performed
  • 541 hours of crawling
  • 684,264 pieces of content captured (including HTML pages, JPEG images, MP3 audio, etc.)

That means that so far, we’ve reached about two-thirds of this year’s snapshot goal.
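The figures in this post also allow a back-of-the-envelope crawl-time estimate once a site's size has been estimated. The sketch below is only arithmetic on the reported numbers; it assumes throughput is roughly constant and that the 541 hours are simply additive, neither of which real crawls guarantee (server speed, politeness delays, and file sizes vary). The 10,000-file site is hypothetical.

```python
def files_per_hour(files, hours):
    """Average capture throughput."""
    return files / hours

# Figures reported in this post.
botany_rate = files_per_hour(78_922, 49 + 57 / 60)  # Botany crawl, 49 h 57 min: ~1,580 files/hour
overall_rate = files_per_hour(684_264, 541)         # all 114 crawls: ~1,265 files/hour
files_per_crawl = 684_264 / 114                      # ~6,002 files per crawl on average

def estimated_hours(file_count, rate):
    """Rough crawl time for a site whose size was estimated in advance."""
    return file_count / rate

# Hypothetical site estimated (e.g. by a link checker) at 10,000 files: ~7.9 hours.
guess = estimated_hours(10_000, overall_rate)
```

The Botany crawl ran faster than the overall average, which is a reminder that a single average rate gives only a rough budget.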

 

Categories: What Gets Saved
Tags: Web/Tech, Archive, Digitization, Behind the Scenes

Sneak Peek: 8/24/2011

by Marguerite Roby on August 24, 2011

Peccary skeleton being assembled for exhibit, c. 1966, by Unidentified photographer, Photographic negative, Smithsonian Institution Archives, Negative Number: OPA-878-A.

 

Categories: Collections in Focus
Tags: Science, Sneak Peek

New Field Book Sets on the Smithsonian Flickr Commons

by Catherine Shteynberg on August 24, 2011

If you happen to follow the Smithsonian’s Flickr Commons stream very closely, you may have noticed that two new sets of photos were uploaded last week: a set from the Pacific Ocean Biological Survey Program, as well as a set of Field Book Lantern Slides.

While the name may sound dry, the biological survey photos, as you can see above, are full of strikingly beautiful gems—abstract patterns of frigates fluttering across the horizon off the coast of the Phoenix Islands, and elegantly curved bird profiles. The photos document a biological survey of plants and animals of the Pacific completed by Smithsonian employees during the 1960s and 70s.

And the Field Book Lantern Slides above are a series of image slides used by researchers to present their work to colleagues and the general public. They include some especially colorful slides documenting the Smithsonian-Roosevelt African Expedition of 1909 (and the “specimens” it collected), as well as an incredible series of early 20th-century slides showing the preparation and installation of dinosaur and mammal specimens from the Smithsonian’s Division of Vertebrate Paleontology.

Both sets of photos come from our collections at the Archives and are part of the Field Book Project, a joint venture of the National Museum of Natural History and us, the Smithsonian Institution Archives, to create one online location for scholars and others to search for field books and other field research materials. Summer interns for the Field Book Project curated both sets and write in detail about their content on the Field Book Blog. Read more in their post, “On Land and at Sea: Two Intern Flickr Sets on The Commons.” You can follow the progress of the project on the Field Book blog.

PS: Did you know that you can subscribe to an RSS feed for the Smithsonian’s Flickr Commons stream?

 

Categories: Collections in Focus
Tags: American History, Flickr Commons, Science, Environment, slideshow, Field Book Project

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.
