Three Cheers for Embedded Metadata

I love metadata (data about data) because it makes my professional life as a digital archivist, as well as my personal life, easier. Of course, creating and embedding it also requires time and effort. As a case in point, Mike Ashenfelder from the Library of Congress has written about how challenging the process of adding metadata to digital photos can be. Indeed, the challenges are numerous.

For example, files from only a few years ago can become forgotten or obsolete as formats, operating systems, or software change. Adding to the difficulty is the fact that these files are everywhere: online, on computers and servers, and on CDs, DVDs, and thumb drives tucked in drawers and closets.

Image information is often lost as digital items take on multiple lives, repurposed on various websites, blogs, and social media sites.

Digital images appear in unexpected places, and Internet searches can take unexpected turns. For instance, my own recent Internet search for images of Midcentury Modern decorating led me to images of a restaurant I used to frequent (the explanation: a blogger with a penchant for the Midcentury was also a fan of the kitschy décor of the familiar Cozy Dog Drive In). In these cases, if you don’t have any context, you might not know what you are looking at.

A file name can help if it is descriptive enough, such as "2012_01_09_Smithsonian_Final_Report.doc". Captions with images also are important. Taking this one step further is embedded metadata, that is, metadata that is part of the file itself. This information will live with a photo or file no matter where that digital asset is stored, unless the data is removed. Embedded metadata can tell you who shot the photo, the camera model, f-stop, and ISO speed, where the photo was taken, who holds the copyright, and, if you are lucky, who or what is in the photo.
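To make the idea of metadata living inside the file concrete, here is a minimal sketch in Python. It is not how our own tools work; it simply uses the PNG format's text chunks, which are one real mechanism for embedding a keyword and value in an image. The helper names (embed_text, read_text, tiny_png) and the photographer name are made up for illustration, and the one-pixel image stands in for a real photo.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC-32 of type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk in a PNG byte string."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        yield chunk_type, data
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC


def embed_text(png: bytes, keyword: str, text: str) -> bytes:
    """Return a copy of the PNG with a tEXt chunk inserted right after IHDR."""
    payload = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    ihdr_end = len(PNG_SIGNATURE) + 12 + 13  # IHDR data is always 13 bytes
    return png[:ihdr_end] + make_chunk(b"tEXt", payload) + png[ihdr_end:]


def read_text(png: bytes) -> dict:
    """Collect every tEXt keyword/value pair embedded in the PNG."""
    found = {}
    for chunk_type, data in iter_chunks(png):
        if chunk_type == b"tEXt":
            keyword, _, text = data.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
    return found


def tiny_png() -> bytes:
    """Build a valid 1x1 grayscale PNG to experiment on (a stand-in image)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


# Hypothetical values, for illustration only.
tagged = embed_text(tiny_png(), "Author", "Example Photographer")
print(read_text(tagged))
```

Because the author field is stored between the image header and the pixel data, it travels with the file when it is copied or renamed, and it disappears only if some program rewrites the file without it.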

Here is an example of an image from Smithsonian Institution Archives Accession 11-281 with its embed

At the Smithsonian Institution Archives, we receive a variety of born-digital and scanned images that become part of our permanent collections and tell stories about the Smithsonian behind the scenes. These images range from objects and exhibition spaces, to museum special events, to researchers and students working in the field. Some file names are descriptive and helpful, while others are just the generic name assigned by the camera (IMG_0001). In some cases, the actual media (CDs, DVDs, disks) that house the files contain some useful information. And sometimes a separate word-processing or PDF file with captions is provided.

The more information we have from the source, the better. While technical metadata is provided by the digital camera (for example, the camera manufacturer and photo date, if set properly), descriptive information (keywords/tags, captions) has to be added manually. This can be done with specialized software once the images are transferred to a computer. Many of these programs offer batch processing that makes the work more efficient, so that we do not, for instance, have to enter the photographer’s name for every image.
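As a rough illustration of what batching means in practice, the sketch below builds a single command for ExifTool, a free command-line metadata editor, that writes the same photographer and keywords into a whole list of files. This is an assumption-laden example rather than a description of our workflow: the function name build_exiftool_command, the file names, and the field values are all hypothetical, and the code only assembles the command without running it.

```python
from typing import Dict, List


def build_exiftool_command(files: List[str], shared: Dict[str, str],
                           keywords: List[str]) -> List[str]:
    """Build one ExifTool call that writes the same fields into every file."""
    command = ["exiftool", "-overwrite_original"]
    for tag, value in shared.items():
        command.append(f"-{tag}={value}")     # e.g. -Artist=Example Photographer
    for word in keywords:
        command.append(f"-Keywords+={word}")  # += adds to the keyword list
    return command + list(files)


# Hypothetical batch: two camera-named files that share one photographer.
command = build_exiftool_command(
    ["IMG_0001.jpg", "IMG_0002.jpg"],
    {"Artist": "Example Photographer"},
    ["whale", "museum"],
)
print(" ".join(command))

# To actually write the metadata (assumes ExifTool is installed and the
# files exist), the list could be passed to subprocess.run(command, check=True).
```

The photographer's name is typed once and applied to every file in the list, which is the whole point of batching.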

Embedded metadata is not only for images. Many word-processing, presentation, and spreadsheet applications also have the ability to include the author, keywords, title, and other information. In a Windows 7 environment, right-click on a file and select Properties and then Details to view and edit fields. Older Windows operating systems require opening the file itself in software that allows viewing of Description metadata. On a Mac, right-click on the file and select Get Info for some basic information. By taking the time to add metadata to your own digital files, you’ll ensure that important and identifying information remains at your fingertips.
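For readers who would rather script this than click through dialogs, here is a small Python sketch that reads the title, author, and keywords from a modern Office file (.docx, .xlsx, .pptx). These formats are ZIP packages that keep such fields in a part named docProps/core.xml. The function names and the sample values are hypothetical, and the in-memory sample below contains only that one part, so it is a stand-in rather than a complete document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}
FIELDS = {"title": "dc:title", "author": "dc:creator", "keywords": "cp:keywords"}


def read_core_properties(source) -> dict:
    """Read title, author and keywords from an Office file (path or file object)."""
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    result = {}
    for name, path in FIELDS.items():
        element = root.find(path, NAMESPACES)
        result[name] = element.text if element is not None else None
    return result


def sample_package() -> io.BytesIO:
    """Build an in-memory stand-in containing only docProps/core.xml."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:title>Final Report</dc:title>"
        "<dc:creator>Example Author</dc:creator>"
        "<cp:keywords>metadata; archives</cp:keywords>"
        "</cp:coreProperties>" % (NAMESPACES["cp"], NAMESPACES["dc"])
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


print(read_core_properties(sample_package()))
```

With a real file, passing its path (for example, a report saved as .docx) in place of sample_package() should return whatever the author filled in, with None for any field left blank.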

Here is the same image with its embedded metadata.

There is even an embedded metadata group made up of photography and advertising organizations that has released the Embedded Metadata Manifesto. The five principles follow:

1) Metadata is essential to describe, identify and track digital media and should be applied to all media items which are exchanged as files or by other means such as data streams.
2) Media file formats should provide the means to embed metadata in ways that can be read and handled by different software systems.
3) Metadata fields, their semantics (including labels on the user interface) and values, should not be changed across metadata formats.
4) Copyright management information metadata must never be removed from the files.
5) Other metadata should only be removed from files by agreement with their copyright holders.

Thanks to embedded metadata in an image, a researcher looking at a food blog in 2015 will know who those seven attendees at a 2005 scientific conference were.

Here is the embedded information from the image: Photograph by Chip Clark, Smithsonian Institution. Preparations and move of the Right whale model and skull into the Natural History Museum through the Mall entrance and steps on the evening of 18 March 2008. Involves removing the doors, big flatbed trucks, a crane and traffic control. SIA Accession 11-281, National Museum of Natural History, Office of Public Affairs, Images, c. 1992-2010.

This photo is part of a series of images taken during the National Museum of Natural History’s construction of The Sant Ocean Hall, which opened in September 2008. Thanks to the Library of Congress for the inspiration for this “test.”

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.