Art image from Project Star: Teacher to Teacher Series. Smithsonian Institution Archives, Acc. 17-067.

Archiving Born-Digital Media from the Smithsonian Astrophysical Observatory

We’ve been processing audiovisual media collections from the Science Media Group division of the Smithsonian Astrophysical Observatory. 

The Smithsonian Institution Archives received a large collection (just under eight terabytes) of born-digital video and other formats from the Science Media Group, which was responsible for educational programming at the Harvard-Smithsonian Center for Astrophysics. As a digital archiving contractor for the Archives, I just completed a two-year process of archiving this collection and making a large portion of it available on the Smithsonian Digital Asset Management System (DAMS).

Students learning science from the Essential Science for Teachers project.

The Science Media Group

The Science Media Group (SMG) was an educational program of the Smithsonian Astrophysical Observatory (SAO) from 1989 to 2013. The team was tasked with creating materials for educators, both for classroom use and for their own professional development. All of the content was created to support science education and supply educators with information and resources.

The collection contains a wide variety of videos, audio, images, and animations. Some of these appeared in the final productions, while others are supporting material and B-roll footage not available anywhere else. SMG produced many different projects, some of which you can view here. It also operated a television service called the Annenberg Channel.

Expedition photo by Lonnie Thompson, Senior Research Scientist, Ohio State University from The Habit

The Archiving Process

Archiving this collection, which arrived in three separate accessions, started with transferring the files from physical media like CDs, hard drives, DVDs, and even a few floppy disks to temporary external hard drives. Bagger was the program of choice for this, since it generated an MD5 hash for each file and let us verify that each file was transferred intact. Other software tools helped us determine file types, creation dates, file sizes, and other important information. We also found and removed duplicates, following a workflow that we kept as consistent as possible. We used all of this information to create a comprehensive database describing every single file. Any information that came from the creators of the content, such as project names, copyright information, and descriptions, was also added to the database.
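To make the checksum idea concrete, here is a minimal Python sketch of how MD5 hashes can confirm a transfer and flag duplicate files. It is not how Bagger works internally, and the function names (`md5_of`, `build_manifest`, `verify_transfer`, `find_duplicates`) are hypothetical names chosen for illustration only.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its MD5 digest."""
    return {
        p.relative_to(root).as_posix(): md5_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_transfer(source: dict[str, str], copy: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or altered in the copy."""
    return sorted(
        name for name, digest in source.items() if copy.get(name) != digest
    )


def find_duplicates(manifest: dict[str, str]) -> list[list[str]]:
    """Group paths whose contents hash to the same digest."""
    groups = defaultdict(list)
    for name, digest in manifest.items():
        groups[digest].append(name)
    return sorted(sorted(names) for names in groups.values() if len(names) > 1)
```

Comparing a manifest built on the original media with one built on the destination drive gives a list of files to re-copy, and an empty list suggests the transfer was clean. Files that share a digest are only candidates for removal; whether a duplicate should actually be dropped is still an archival decision.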

Next, it was time to start watching some videos, more than 5,000 of them. This step is called quality assurance, or QA. The goal of the QA process was to see if there were any problems in the playback of videos. Issues such as pixelation, audio problems, and skipping were noted in the database, along with a short description of each video. Many were short clips or animations that could be watched in full. Some longer videos were viewed only at the beginning, middle, and end, or just long enough to write a general description.

Because these are older born-digital videos, playback was a concern. Many videos had some sort of error or perceived error. Playback errors can have different causes and do not necessarily mean that the video itself is bad. As you can probably imagine, there are a lot of digital video production programs, and there have been for many years. They each have their own way of doing things. Some of them might not even be around anymore, and the codec they used, meaning the method of encoding and decoding the video, may not be readable by all video players. This makes things difficult for the digital archivist, or anyone working with video, because there is no standard format. VLC Media Player, an open-source media player from VideoLAN, was able to play the majority of the videos in the collection.

File names were a significant issue in processing the collection. While it is nice to have a long, descriptive file name, especially when scanning a folder for the clip you want, if the name is too long and exceeds the character limit, you may be unable to do basic things like moving and copying. Many file names were shortened using Metamorphose, Adobe Bridge, or manually. In one instance, the file name was so long that the computer would not even allow it to be shortened, which caused some problems. Special characters and spaces were also abundant. These needed to be removed before transferring to the DAMS. Original file names and the shortened file names were recorded in the database as a cross-reference, should there ever be any confusion. Overall, the SMG did a good job naming files with the program codes, numbers, and keywords and keeping items organized in descriptive folders.
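As an illustration of this kind of cleanup, the following Python sketch removes spaces and special characters, shortens long names, and keeps an original-to-new cross-reference. It is not the workflow we ran with Metamorphose or Adobe Bridge. The 64-character limit, the set of allowed characters, and the function names are all assumptions made for the example, not DAMS requirements.

```python
import os
import re

# Hypothetical limit for illustration; the real limit depends on the target system.
MAX_NAME_LENGTH = 64


def clean_file_name(name: str, max_length: int = MAX_NAME_LENGTH) -> str:
    """Replace spaces, drop special characters, and shorten a file name."""
    stem, suffix = os.path.splitext(name)
    stem = re.sub(r"\s+", "_", stem.strip())        # whitespace runs -> underscore
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)      # keep letters, digits, _ and -
    suffix = re.sub(r"[^A-Za-z0-9.]", "", suffix)   # keep the extension simple
    stem = stem or "unnamed"                        # never return a bare extension
    stem = stem[: max(1, max_length - len(suffix))]
    return f"{stem}{suffix}"


def build_cross_reference(names: list[str]) -> dict[str, str]:
    """Map each original name to a unique cleaned name."""
    mapping: dict[str, str] = {}
    used: set[str] = set()
    for original in names:
        candidate = clean_file_name(original)
        stem, suffix = os.path.splitext(candidate)
        counter = 1
        while candidate in used:                    # avoid collisions after cleaning
            counter += 1
            candidate = f"{stem}_{counter}{suffix}"
        used.add(candidate)
        mapping[original] = candidate
    return mapping
```

The mapping returned by `build_cross_reference` plays the role of the cross-reference in the database: every cleaned name can be traced back to the name the creators gave the file. Note that the numeric suffix added for collisions can push a name slightly past the limit, so a production version would need to account for that.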

Art image from Project Star: Teacher to Teacher Series.


One of the most popular programs the SMG produced was a professional development program for educators called A Private Universe, which explored students' common misconceptions about science. I have to admit, I also had some of these misconceptions, so the topic is still relevant today, some thirty years after its debut. You can view the Annenberg Channel’s A Private Universe Series website here, along with its follow-up program, Minds of Our Own. Other popular programs include The Habitable Planet: A Systems Approach to Environmental Science, which provided classroom content to educators, and Project Star: Teacher to Teacher Series, a professional development program for educators. The collection images included in this post are representative of the main themes of education, astronomy, and earth science.

The Science Media Group was a part of the Smithsonian mission to increase and diffuse knowledge, and the results of its work are now preserved at the Smithsonian Institution Archives.

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.