Smithsonian Institution Archives

The Bigger Picture: Visual Archives and the Smithsonian

Archive: 11/2011

Sneak Peek: 11/30/2011

by Marguerite Roby on November 30, 2011
"The Torpedo Man" transports nitroglycerine on a horse-drawn cart.
Categories: Collections in Focus
Tags: American History, Sneak Peek
All comments are moderated and subject to approval. Further information is available in The Bigger Picture’s Commenting Guidelines.

A Peek into an Electronic Records Archivist’s Toolbox

by Lynda Schmitz Fuhrig on November 29, 2011

When it comes to electronic records, there is no magic button that makes them readable or usable on a computer. Electronic records archivists rely on all types of hardware, software, and operating systems. Many pieces of software, which function as an archivist’s toolbox, can help files remain available or become usable again. Here is a small list of some open-source and/or freely available software we use at the Smithsonian Institution Archives. Keep in mind that tools are not perfect and should be used with caution. Don’t forget to have backups of your files. Before we incorporate a piece of software into our processes at the Archives, we research it to confirm that it comes from a reputable group, and we test it thoroughly on copies of sample sets. This post is not an endorsement by the Smithsonian Institution of any products listed.

The Collaborative Electronic Records Project (CERP) parser outputs XML preservation copies of email.

The CERP parser

From 2005 to 2008, the Smithsonian Institution Archives and the Rockefeller Archive Center conducted the grant-funded Collaborative Electronic Records Project. The two institutions researched the long-term preservation challenges of messages and attachments within email collections.

CERP also worked with another email project, the Preservation of Electronic Mail Collaboration Initiative (EMCAP), and the two groups co-developed an XML preservation schema for email accounts. Essentially, this work made it possible to take an email account in a proprietary format, such as PST from Microsoft Outlook, and create an XML preservation copy of the entire account, including messages and attachments. XML was chosen as the preservation format because it is human-readable, open, and self-describing.

CERP developed a parsing tool, written in the Smalltalk programming language, that creates the XML preservation copy following the schema noted above, which includes sender, date, subject, message body, and other fields. The CERP parser was created so that small- to mid-sized organizations could download the software and use it with their email accounts.
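
To give a feel for what an XML preservation copy of a message involves, here is a minimal Python sketch. It is not the CERP parser (which is written in Smalltalk), and the element names (`Message`, `Sender`, `Body`, and so on) are made up for illustration; they are not the CERP/EMCAP schema. It also reads a single plain-text message rather than a PST file, and it skips attachments.

```python
from email import message_from_string
import xml.etree.ElementTree as ET


def message_to_xml(raw: str) -> ET.Element:
    """Convert one plain-text email message into a small XML element.

    Element names are illustrative only, NOT the CERP/EMCAP schema.
    """
    msg = message_from_string(raw)
    root = ET.Element("Message")
    # Map a few standard headers onto hypothetical element names.
    for header, tag in (("From", "Sender"), ("Date", "Date"), ("Subject", "Subject")):
        ET.SubElement(root, tag).text = msg.get(header, "")
    # Use the first text/plain part as the body; attachments are ignored here.
    body = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            body = payload.decode(charset, errors="replace")
            break
    ET.SubElement(root, "Body").text = body
    return root


# Hypothetical message, written by hand for this example.
SAMPLE = (
    "From: archivist@example.org\n"
    "Date: Tue, 29 Nov 2011 10:00:00 -0500\n"
    "Subject: Budget files\n"
    "\n"
    "The transfer is ready.\n"
)

if __name__ == "__main__":
    print(ET.tostring(message_to_xml(SAMPLE), encoding="unicode"))
```

Running the script prints one XML element per message. Because the output is plain, tagged text, it can still be read in a text editor if the original mail software is no longer available.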

A crawl of the Archives of American Art’s website with Heritrix.

Heritrix

We have written about Heritrix previously on the blog. This tool crawls websites and packages the output in preservation containers known as WARC (Web ARChive) files. The Archives uses Heritrix to crawl the nearly one hundred public websites maintained by the Smithsonian’s various museums, research centers, and other offices.

Benefits of Heritrix include:

  • WARCs are an international archival standard
  • WARCs contain useful information such as date, record id, content type, and other data
  • WARCs are easier to manage than hundreds of thousands of separate documents, pages, and assets from a website that was downloaded or copied
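
The fields mentioned in the list above (date, record id, content type) sit in a short text header at the start of each WARC record. As a rough illustration, the Python sketch below parses one such header block into a dictionary. The sample header is written by hand with placeholder values, not taken from a real crawl, and a production workflow would more likely use an established WARC library than this hand-rolled parser.

```python
def parse_warc_headers(block: str) -> dict:
    """Parse the header block of one WARC record into a dict.

    The first line is the version (for example "WARC/1.0"); the rest are
    "Name: value" fields, ending at the first blank line.
    """
    lines = block.split("\r\n")
    headers = {"version": lines[0]}
    for line in lines[1:]:
        if not line:  # a blank line marks the end of the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


# Hypothetical record header with placeholder values, for illustration only.
SAMPLE_HEADER = (
    "WARC/1.0\r\n"
    "WARC-Type: response\r\n"
    "WARC-Date: 2011-11-29T15:00:00Z\r\n"
    "WARC-Record-ID: <urn:uuid:00000000-0000-0000-0000-000000000000>\r\n"
    "Content-Type: application/http; msgtype=response\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

if __name__ == "__main__":
    for name, value in parse_warc_headers(SAMPLE_HEADER).items():
        print(f"{name}: {value}")
```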

BWF MetaEdit allows metadata entry with audio files.

BWF MetaEdit

Our collections at the Archives also include audio, covering everything from Smithsonian concerts to workshop planning files to oral histories. These files are preserved as WAV (Waveform Audio File Format) files. WAV is considered a preservation format for audio because it is uncompressed; works in Windows, Mac, and Linux; and is widely used. BWF, or Broadcast Wave Format, is the European adaptation of Microsoft’s WAV and contains embedded metadata, which makes it more desirable as a preservation option.

BWF MetaEdit, which was developed by the Federal Agencies Digitization Guidelines Initiative (FADGI), allows users to create Broadcast Wave files from WAVs. Metadata can be added through its graphical user interface (GUI) or command line to create a valid Broadcast Wave file. These metadata fields include organization name, description, origination time of the file, and the software used to create the original WAV.
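
The embedded metadata in a Broadcast Wave file lives in an extra chunk named `bext` inside the ordinary WAV (RIFF) container. The Python sketch below is not BWF MetaEdit; it is a small reader, under the assumption that the file follows the standard chunk layout, that walks the chunks and pulls out a few `bext` text fields. The helper `make_sample_wav` and all of its values are hypothetical and exist only so the example can run without a real audio file.

```python
import struct
from typing import Optional


def read_bext(data: bytes) -> Optional[dict]:
    """Return selected fields from the 'bext' chunk of a WAV file, or None."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id, size = struct.unpack("<4sI", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"bext":
            def text(start: int, length: int) -> str:
                # Fixed-width fields are padded with null bytes.
                raw = body[start:start + length].split(b"\x00")[0]
                return raw.decode("ascii", "replace")
            return {
                "description": text(0, 256),
                "originator": text(256, 32),
                "originator_reference": text(288, 32),
                "origination_date": text(320, 10),
                "origination_time": text(330, 8),
            }
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return None


def _chunk(chunk_id: bytes, body: bytes) -> bytes:
    pad = b"\x00" if len(body) & 1 else b""
    return chunk_id + struct.pack("<I", len(body)) + body + pad


def make_sample_wav(with_bext: bool) -> bytes:
    """Build a tiny in-memory WAV with hypothetical values, for demonstration."""
    fmt = _chunk(b"fmt ", struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16))
    audio = _chunk(b"data", b"\x00\x00" * 4)
    bext = b""
    if with_bext:
        fields = (b"Oral history interview".ljust(256, b"\x00")
                  + b"Example Archives".ljust(32, b"\x00")
                  + b"".ljust(32, b"\x00")
                  + b"2011-11-29" + b"10:00:00")
        bext = _chunk(b"bext", fields.ljust(602, b"\x00"))
    payload = b"WAVE" + bext + fmt + audio
    return b"RIFF" + struct.pack("<I", len(payload)) + payload


if __name__ == "__main__":
    print(read_bext(make_sample_wav(with_bext=True)))
    print(read_bext(make_sample_wav(with_bext=False)))
```

A plain WAV returns `None` because it has no `bext` chunk, while the Broadcast Wave version returns the description, originator, and origination date and time.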

JHOVE and DROID

JHOVE and DROID are both useful file format identification tools used by archives, libraries, and other organizations. JHOVE is a collaboration between JSTOR and the Harvard University Library while DROID was developed by The National Archives of the United Kingdom.

These tools can be used together in some cases to determine an unknown file format. For example, when we receive digital files from other Smithsonian offices, sometimes older files are missing the three-letter identifying extension at the end of the file name. Without this information, it’s difficult to know whether a file called “budget” is a WordPerfect file or a spreadsheet.

The Archives also developed a Java-based script that automates analyses of digital files using both JHOVE and DROID. The script generates outputs and file lists that help an archivist determine possible issues, such as a file with the wrong identifying extension.

Note: This script uses older versions of JHOVE and DROID but newer versions are currently being tested at the Archives.
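
For readers curious about how format identification can work in principle, the Python sketch below checks the first bytes of a file against a handful of well-known signatures and flags a missing or mismatched extension. It is a simplified illustration only: it is neither the Archives' script nor the method used inside JHOVE or DROID, and the signature table and function names are assumptions chosen for this example.

```python
from pathlib import Path

# A tiny, hand-picked signature table; real tools rely on far larger registries.
SIGNATURES = [
    (b"%PDF-", "PDF", {".pdf"}),
    (b"\xffWPC", "WordPerfect", {".wpd", ".wp", ".wp5"}),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 (older Word/Excel)", {".doc", ".xls", ".ppt"}),
    (b"PK\x03\x04", "ZIP container (e.g. OOXML)", {".zip", ".docx", ".xlsx", ".pptx"}),
    (b"\x89PNG\r\n\x1a\n", "PNG", {".png"}),
]


def identify(header: bytes):
    """Return (format name, expected extensions) for the first matching signature."""
    for magic, name, extensions in SIGNATURES:
        if header.startswith(magic):
            return name, extensions
    return None


def check_file(filename: str, header: bytes) -> str:
    """Describe a file from its leading bytes and note any extension problem."""
    match = identify(header)
    if match is None:
        return "unknown format"
    name, extensions = match
    suffix = Path(filename).suffix.lower()
    if not suffix:
        return f"{name} (missing extension)"
    if suffix not in extensions:
        return f"{name} (extension {suffix} looks wrong)"
    return name


if __name__ == "__main__":
    # Hypothetical headers; in practice, read the first bytes of each file.
    print(check_file("budget", b"\xffWPC" + b"\x00" * 12))
    print(check_file("report.doc", b"%PDF-1.4\n"))
```

With this table, a file named "budget" that begins with the WordPerfect signature is reported as WordPerfect with a missing extension, which is the kind of question the example above raises.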

The Duke Data Accessioner tool assists with copying and analyzing digital files.

Duke Data Accessioner

Many electronic files come to the Archives on removable media (CDs, DVDs, and, yes, 3.5” diskettes), which require that we transfer the content to our backed-up servers for preservation and access. The Duke Data Accessioner (DDA) from Duke University is software that assists us with the initial work of ingesting (copying) the files off the media. After we enter some information about the collection and media, the tool recreates the directory structure found on the media and copies the records into it. DDA also runs JHOVE and DROID (see above) against the files for analysis and creates an XML file of this output, along with some additional preservation metadata in a format known as PREMIS.
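
The copying step described above can be sketched in a few lines of Python. This is not the Duke Data Accessioner and does not reproduce its output. The function names are hypothetical, and the SHA-256 comparison after each copy is an assumption added for this example as one common way to confirm that a copy matches its source.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def accession(source: Path, destination: Path) -> list:
    """Copy every file under source to destination, keeping the directory layout.

    Returns (relative_path, checksum) pairs; raises if a copy does not match.
    """
    manifest = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src.relative_to(source)
        dst = destination / relative
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also tries to keep timestamps
        before, after = sha256_of(src), sha256_of(dst)
        if before != after:
            raise IOError(f"checksum mismatch for {relative}")
        manifest.append((relative.as_posix(), after))
    return manifest


if __name__ == "__main__":
    # Hypothetical paths; replace with the mounted media and a server folder.
    for name, checksum in accession(Path("media"), Path("server_copy")):
        print(checksum, name)
```

The returned list doubles as a simple file inventory that can be kept alongside the copied records.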

Other tips

If you are interested only in viewing a file rather than opening it for editing, try searching the Internet for “viewer” and “old files.” For more information on detecting file formats, search for “file identification.” Some online tools will attempt to detect what a mystery file might be.

Software that enables digital files to last for the long term with authenticity and integrity intact can be a lifesaver. Nevertheless, it is not a replacement for copies, backups, and migrations of important files to new software and hardware.

Categories: Behind the Scenes, What Gets Saved
Tags: Web/Tech, Digitization, Behind the Scenes, Archives

See Here: 11/28/2011

by The Bigger Picture on November 28, 2011
Dr. Elisabeth Gantt, by Hofmeister, Richard K, 1979, Smithsonian Archives - History Div, SIA2011-1158 and 79-14206-12A.
Categories: Collections in Focus
Tags: See Here, Science

Link Love: 11/25/2011

by Catherine Shteynberg on November 25, 2011
Plymouth Rock Piece, 1620, Photo: Harold Dorwin.
  • Link Love Thanksgiving Edition: Learn more about the origins of your favorite Thanksgiving dishes from a Smithsonian anthropologist and check out the Smithsonian’s piece of Plymouth Rock.
  • The National Archives has a new Tumblr blog about preservation at the Archives.
  • How do we make sure that the electronic material we access is trustworthy? The Library of Congress tackles this question over at their digital preservation blog.
  • When one of your mentors shows up in the archives—over at the Archives of American Art blog.
  • Very exciting! The Smithsonian has a new mobile app, which is perfect for planning your visit this weekend or during the upcoming holidays!
  • I can’t resist including these prints from some of William Blake’s books of poetry, as they were favorites of mine growing up. Who can resist his tigers?
  • This fall and winter, the Smithsonian will present two exhibitions about Thomas Jefferson, and in honor of this, we’re highlighting Jefferson objects around the Smithsonian, including a letter written by him in the collections of the Archives. (You may remember the letter from this blog post some time ago.)
  • And speaking of the Smithsonian’s incredible Jefferson objects, check out the history of the Jefferson Bible, which recently underwent conservation treatment at the Smithsonian’s National Museum of American History:

Smithsonian Magazine Video, www.smithsonianmag.com/video
Categories: What Gets Saved
Tags: American History, Link Love, Conservation

See Here: 11/25/2011

by The Bigger Picture on November 25, 2011
Ann Leven Using the First ATM in the National Air and Space Museum, by Avino, Mark, July 14, 1986, Smithsonian Archives - History Div, SIA2011-1372 and 86-12001-2.
Categories: Collections in Focus
Tags: See Here, Behind the Scenes

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.

About

Connecting you to America’s past with a behind-the-scenes exploration of the Smithsonian’s history, treasures, and the challenges that Archives face preserving collections. More details...
