The Bigger Picture: Visual Archives and the Smithsonian
Category: Behind the Scenes
Volunteers have been an integral part of the Smithsonian since the beginning. As our historian Pamela Henson likes to say, we have always relied on the kindness of strangers. Our first Secretary, Joseph Henry, coordinated a group of about 600 people across North America to send in weather data, which he posted on a map in the Smithsonian’s Castle (this program eventually led to the founding of the National Weather Service). Our second Secretary, Spencer Fullerton Baird, created a network of collecting volunteers who sent biological specimens to the Smithsonian for study and inclusion in its first U.S. National Museum.
Today, on-site volunteers number almost the same as staff: 6,373 staff and 5,451 volunteers. Since kicking off our first pan-Smithsonian digital volunteer website, the Smithsonian Transcription Center, in June 2013, we have nearly doubled our volunteer base by adding 4,919 digital volunteers! The number climbs steadily, and digital volunteers are likely to soon outnumber our in-person volunteers, and eventually our staff.
Although digital volunteers work from all over the world, there is a sense of community among them through social media and the Transcription Center itself. I regularly field questions and comments from volunteers in very different time zones. It also seems that serving as a digital volunteer yields the same sense of purpose as volunteering in person:
“…I was also keen because anything I do helps to open up access to the Smithsonian collections and this results in improved connections and knowledge for everyone. Scientists, citizen scientists, historians, school children, from anywhere with an internet connection. The fact that anyone can view the transcriptions and that there is open access to the transcribed data was a very important factor in me donating my time,” Transcription Center Volunteer
“What drives me, in particular, is the preservation of the study of astronomy. There were countless hours spent in freezing observatories with eyes glued to instruments and eyepieces hoping for good tracking and sky conditions. All during this was the painstaking logging of notes - figures and frustrations alike. This must never be lost, for it shows determination, drive, perseverance . . . and a great deal of hope. Thank you for the opportunity,” Transcription Center Volunteer
From working our information desks to transcribing primary source documents, our volunteers are major contributors to making the Smithsonian all that it is. It is delightful to think that people all over the world now have more opportunities to contribute from wherever they are. Below is a list of Smithsonian projects that rely on the kindness of strangers (a.k.a. crowdsourcing projects) that I compiled back in September 2014. If one appeals to you, come aboard and help us achieve our mission of increasing and diffusing knowledge. And please know that we very much appreciate your work, not just during Volunteer Appreciation Month, but throughout the year. Please listen to a message of thanks from our Director, Anne Van Camp.
- Digital Volunteer Certificate, Smithsonian Institution Archives
- Baird’s Network, Bigger Picture Blog
- Volunteer for the Smithsonian Institution Archives
- Growing to a Community of Volunpeers: Communication & Discovery, Bigger Picture Blog
Volunteer now for any of these Smithsonian projects!
- Access American Stories – Crowdsourcing audio descriptions of exhibition for accessibility
- Agriculture Innovation and Heritage Archive – Crowdsourcing oral histories related to agriculture
- Biodiversity Heritage Library Machine Tagging – Crowdsourcing machine tags for inclusion in the Encyclopedia of Life
- Community of Gardens – Crowdsourcing oral histories and media related to community gardens
- eMammal – Crowdsourcing camera-trap images to survey wildlife
- Global Treebanding Project – Crowdsourcing scientific data about tree biomass and the impact of climate change
- Leafsnap – Crowdsourcing tree data set for mobile app
- Our American Journey – Crowdsourcing oral histories of American experience
- People and the Post: A Digital Memory Book – Crowdsourcing oral histories from postal workers
- Smithsonian Transcription Center – Crowdsourcing transcriptions of historic documents and collection records
- Stories from Main Street – Crowdsourcing oral histories of rural America
- Wikipedia edit-a-thons – Crowdsourcing Wikipedia articles about Smithsonian collections and resources
- Will to Adorn – Crowdsourcing oral histories about dress
Archive-It 5.0 Changes and New Features
As a web preservation intern at the Smithsonian Institution Archives, I capture and preserve the Smithsonian’s web presence using the Archive-It crawling service. In October 2014, Archive-It released Phase 1 of Archive-It 5.0, which featured the roll-out of a new interface and more robust data collection for post-crawl reports. Currently the service allows users to switch between the 4.9 and 5.0 versions. Archive-It offers ten new features for reports, including a quick text box filter, infographics, the ability to add notes, and the option to compare two crawl reports side by side. The reports generated by web crawls play a large part in the Archives’ web collection packages and quality assurance (QA), so the changes between versions 4.9 and 5.0 are important for us to understand as we attempt to preserve the record of the ephemeral web.
Version 4.9 Crawl Report
Version 5.0 Crawl Report
Archive-It One-Time IDs
The snapshots above were taken of the same crawl report, one in 4.9 and the other in 5.0. The new format and interface are not the only differences: the one-time IDs (identifiers) are different as well. For this crawl, version 4.9 assigned the ID 20150320165024358, and version 5.0 assigned 149112. While the Archives does not fully rely on these numbers as identifiers for crawls, they are attached to the file name when a summary/overview report and the WARC files are downloaded for our collections. Currently, the ability to switch back and forth between 4.9 and 5.0 makes this issue moot, but once this capability is removed, reports and WARC files downloaded with the 4.9 ID will be more difficult to locate and identify in the new Archive-It reports. Archive-It does not mention this change on the wikis it has provided regarding the roll-out of 5.0. This change could be problematic for organizations that use these IDs to identify crawls.
Report Summary Data
Part of building our web collection packages involves downloading the host data and the report summary from the post-crawl report. The host download provides the URLs that were archived from each host as well as other information such as new data, documents blocked by robots.txt, and out-of-scope documents. When switching between 4.9 and 5.0, the only changes are the interface and the ability to browse hosts by seed for more robust data.
In 5.0, the report summary is now called an overview but contains the same type of data. However, I noticed a discrepancy: the data is not consistent when switching between the two versions. The snapshots of the same crawl above show different numbers for the Total Documents Archived. Version 4.9 reports 12,440 documents archived, while version 5.0 reports 12,386. It is unclear why the figures differ between the two versions.
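A quick check like the following can flag such discrepancies when both versions of a report are on hand. It is only a sketch in Python: the dictionary layout and the metric label are assumptions made for illustration, not an Archive-It export format, and the two totals are the ones quoted above.

```python
def compare_summaries(old, new):
    """Return {metric: (old_value, new_value, difference)} for metrics that disagree."""
    differences = {}
    # Only metrics present in both reports can be compared.
    for metric in sorted(set(old) & set(new)):
        if old[metric] != new[metric]:
            differences[metric] = (old[metric], new[metric], old[metric] - new[metric])
    return differences


# Totals quoted above for the same crawl; the key name is a hypothetical label.
report_4_9 = {"total_documents_archived": 12440}
report_5_0 = {"total_documents_archived": 12386}

diffs = compare_summaries(report_4_9, report_5_0)
for metric, (old_value, new_value, gap) in diffs.items():
    share = gap / old_value * 100
    print(f"{metric}: 4.9={old_value}, 5.0={new_value}, gap={gap} ({share:.2f}% of the 4.9 total)")
```

For this crawl the gap is 54 documents, less than half a percent of the 4.9 total.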
New Features Overall
The interface of the 5.0 reports page is an improvement. The one-time IDs are now visible; however, the collection name is cut off if it is too long, requiring users to hover over the name to see it in its entirety. The reports page quick text box filter is a helpful feature. The search function is more flexible than in 4.9, which only allowed searches by collection name or date.
The new view feature provides users with a link from the reports page directly to the Wayback Machine to view the URL without having to navigate to this resource through the access tab. This feature can help improve our quality assurance (QA) workflow. QA involves ensuring that our crawl and capture of the site accurately represent what the website displayed at the time of the crawl. Wayback allows us to view the crawl results visually in website form, unlike the reports and hosts, which provide numerical data about the crawl.
Overall, the 5.0 features are an improvement on this service, which is an important tool for archiving the record of the Smithsonian today.
Cue the music! We invite you to "She Blinded Me with Science III," our third Women in Science Wikipedia Edit-a-thon.
As was the case for the last two edit-a-thons, you can participate both in person at the Archives and online by joining us in a Google Hangout and etherpad (links to come on the event page linked above). By participating, you will receive a tour of the Archives, a talk on popular media's role in the history of women in science, an introduction for beginners on editing in Wikipedia, coffee & lunch (if you join us in person), and the satisfaction of writing a female scientist into digital history.
In years past, we have focused on women in the history of science, which has resulted in the creation of more than 50 new articles on groundbreaking geologists, anthropologists, botanists, and more. Let's take a look at some of these women:
Ursula B. Marvin, planetary geologist from the Harvard-Smithsonian’s Astrophysical Observatory, won several awards for her research (1997 Lifetime Achievement Award from Women in Science and Engineering, 1986 History of Geology Award from the Geological Society of America, and the 2005 Sue Tyler Friedman Medal), and had an Antarctic mountain named after her.
Ornithologist Roxie Laybourne basically founded the field of forensic ornithology. Laybourne was very interested in aeronautics and even took an aeronautics correspondence course after not being able to attend aviation school because she was female. She used the Smithsonian's vast bird collection and scanning electron microscopy to identify birds involved in plane crashes. She helped to improve air travel safety working in conjunction with the Federal Aviation Administration and the National Transportation Safety Board.
For this year, we have added 35 more female scientists to our to-do list. Some of them were uncovered by our digital volunteers while transcribing scientific field books in the Smithsonian's Transcription Center. The list also contains many current female scientists at the Smithsonian who are working on everything from the conservation of wild canids to high-energy astrophysics. Join us in writing these women into digital history.
- Sign up for the "She Blinded Me with Science III," Women in Science Wikipedia Edit-a-thon III, Friday, March 27.
- Roxie Collie Laybourne: Remembering a Groundbreaker, Bigger Picture Blog
- Documenting a Geologist's Adventures, Bigger Picture Blog
- Women in Science Wednesdays, Bigger Picture Blog
- Science Service Records at the Smithsonian Institution Archives
When asked what the Smithsonian Institution Archives collects, we say we hold records about the history of the Smithsonian and its people, programs, research, and activities. While accurate, this doesn't really give anyone a clue about what is actually in those records.
The Smithsonian Institution Archives Reference Team handles about 6,000 queries per year on average, and if you ask us what people have been researching at the Archives recently, you'll get some pretty interesting responses. Although not comprehensive, here's a snapshot of the diverse range of information encompassed by the history of the world's largest museum complex!
Over the past three months, researcher queries have included:
- Smithsonian taxidermists
- Smokey the Bear
- Tropical botany in southeast Asia and the Pacific
- The Smithsonian Gallery of Art design competition
- The Tunguska meteorite
- Mary Henry diaries
- Alexander von Humboldt
- The cultural history of the dingo
- Russian fur trade
- The Enola Gay exhibit
- Smithsonian Office of International Relations
- Latino scientists
- 20th century small arms
- The Davenport Tablets
- Access and interpretation in museum gardens
Permissions granted for upcoming publications using our photos or documents include:
- Wouter Montfrooij, Astronomy! A conceptual introduction from the Big Bang…
- Anthony Burton, The Locomotive Pioneers, 1801-1851
- Norton & Co. for Glenda Gilmore's These United States
- Edward R. Landa, Assessment of Atmospheric Sulfate Deposition and its Historical Roots in Soil Science
- Left/Right Productions, The History Channel's Search for Lost Giants
- Michael G. Littman, Journal of the Washington Academy of Sciences
- The Avemco Insurance Co., for its digital newsletter On Approach
- Penguin Books, No Way, Way Road Trip!
- Reference services at the Smithsonian Institution Archives
What happens when an organization turns to the Internet 'crowd' for help to make its online collections as accessible as possible? The Archives is several years into its crowd-sourcing initiatives: tagging photographs and solving mysteries on Flickr Commons and transcribing text-oriented materials on the Smithsonian Transcription Center. Our goals are focused on enabling people to virtually look inside these materials and apply data mining and other techniques, enriching and speeding their own work.
In just the past 18 months, over two thousand new volunteers plus an untold number of anonymous contributors have given us a big boost, and the results are remarkable. While the quality and quantity of the effort are impressive (over 300 transcription projects and hundreds more photos available to tag on the Flickr Commons), I am more excited to see volunteers' passion for knowledge grow, with an empowering domino effect.
Looking for the Inside Stories
As the institutional archives documenting the Smithsonian's history of acquiring and disseminating knowledge, we hold a wide variety of both scientific and humanities-oriented primary source material that reflects the diversity of the Smithsonian's activities from its earliest days over 169 years ago.
As we selected material for our digital volunteers, I expected them to engage with it, gaining insight into and appreciation for the personal efforts and experiences of the individuals behind it. However, volunteers soon uncovered additional, noteworthy individuals and events buried inside those texts.
Going one step further, they began to find connections between different Archives projects, such as the professional and personal relationships between scientists and examples of their work.
Amidst all of these discoveries, the depth of access these volunteers have helped us create has enabled researchers to include these historical sources in computer-driven longitudinal studies.
#WeLearnTogether: The Domino Effect
#WeLearnTogether is a Twitter hashtag these 'volunpeers' have taken to when discussing the projects they are working on. It reflects the community culture we have striven for since the first days of our crowd-sourcing initiatives. So what's this domino effect?
Domino 1: Our volunpeers are using the information they have found, finding links to data held by museums, libraries, and archives at the Smithsonian and helping us to connect those resources to each other.
Domino 2: The volunteers are reaching out to other organizations and sharing what they have learned so those organizations, too, can update and enrich their own information catalogs. These include JSTOR and the United States National Herbarium.
In the end, knowledge of our collections has grown and their accessibility has improved, resulting in tangible benefits for today’s and tomorrow’s Smithsonian collections users. It is so rewarding to watch these volunteers’ voyages of discovery stoke a passion to discover more and fire an enthusiasm about these collections that has proven to be contagious.
- Record Unit 7148 - David Crockett Graham Papers, 1923-1936, Smithsonian Institution Archives
- Record Unit 7272 - Frederick Vernon Coville Papers, 1888-1936 and undated, Smithsonian Institution Archives
- Record Unit 7267 - Vernon Orlando Bailey Papers, 1889-1941 and undated, Smithsonian Institution Archives
- Record Unit 7417 - Florence Merriam Bailey Papers, 1865-1942, Smithsonian Institution Archives