The Bigger Picture: Visual Archives and the Smithsonian
Category: Behind the Scenes
Archive-It 5.0 Changes and New Features
As a web preservation intern at the Smithsonian Institution Archives, I capture and preserve the Smithsonian’s web presence using the Archive-It crawling service. In October 2014, Archive-It released Phase 1 of Archive-It 5.0, which featured the roll-out of a new interface and more robust data collection for post-crawl reports. Currently the service allows users to switch between the 4.9 and 5.0 versions. Archive-It offers ten new features for reports, including a quick text box filter, infographics, the ability to add notes, and the option to compare two crawl reports side by side. The reports generated by web crawls play a large part in the Archives’ web collection packages and quality assurance (QA), so the changes between versions 4.9 and 5.0 are important for us to understand as we attempt to preserve the record of the ephemeral web.
Version 4.9 Crawl Report
Version 5.0 Crawl Report
Archive-It One-Time IDs
The snapshots above were taken of the same crawl report, one in 4.9 and the other in 5.0. The new format and interface are not the only differences: the one-time IDs (identifiers) are also different. For this crawl, version 4.9 assigned 20150320165024358 and version 5.0 assigned 149112. While the Archives does not fully rely on these numbers as identifiers for crawls, they are attached to the file name when a summary/overview report and the WARC files are downloaded for our collections. Currently, the ability to switch back and forth between 4.9 and 5.0 makes this issue moot, but once this capability is removed, reports and WARC files downloaded with the 4.9 ID will be more difficult to locate and identify in the new Archive-It reports. Archive-It does not mention this change on the wikis it has provided regarding the roll-out of 5.0. This change could be problematic for organizations that use these IDs to identify crawls.
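One way to hedge against this would be to record both IDs for each crawl while the toggle between versions still exists. The Python sketch below is only an illustration, not an Archive-It feature: the crosswalk is a hand-maintained table, the 17-digit pattern is an assumption based on the single 4.9 ID quoted in this post, and the file name is a hypothetical example of a download with the 4.9 ID attached.

```python
import re

# Hand-maintained crosswalk: 4.9 one-time ID -> 5.0 one-time ID.
# The single pair below is the crawl discussed in this post.
ID_CROSSWALK = {
    "20150320165024358": "149112",
}


def find_new_id(file_name, crosswalk=ID_CROSSWALK):
    """Return the 5.0 ID for a file whose name contains a known 4.9 ID.

    Returns None when no known 4.9 ID appears in the name.
    """
    # Assumption: 4.9 IDs are 17-digit strings, as in the example above.
    # This is not a documented Archive-It rule.
    for candidate in re.findall(r"\d{17}", file_name):
        if candidate in crosswalk:
            return crosswalk[candidate]
    return None


# Hypothetical file name, for illustration only.
example = "crawl-summary-20150320165024358.csv"
print(find_new_id(example))
```

Keeping the table as plain data means it could just as easily live in a spreadsheet alongside the collection package.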
Report Summary Data
Part of building our web collection packages involves downloading the host data and the report summary from the post-crawl report. The host download provides the URLs that were archived from each host as well as other information such as new data, documents blocked by robots.txt, and out-of-scope documents. When switching between 4.9 and 5.0, the only changes are the interface and the ability to browse hosts by seed for more robust data.
In 5.0, the report summary is now called an overview but contains the same type of data. However, I noticed a few discrepancies: the data is not consistent between the two versions. The snapshots of the same crawl above show different numbers for the Total Documents Archived. Version 4.9 reports 12,440 documents archived, while version 5.0 reports 12,386. It is unclear why the data differs between the two versions.
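For this crawl, the gap works out to 54 documents, a little under half a percent of the 4.9 total. A small comparison like the Python sketch below could flag such differences when checking both versions of a report. The dictionary keys are illustrative labels, not field names from an Archive-It export.

```python
def compare_summaries(old, new):
    """Return {field: (old_value, new_value, difference)} for fields that differ."""
    differences = {}
    # Only compare fields present in both summaries.
    for field in old.keys() & new.keys():
        if old[field] != new[field]:
            differences[field] = (old[field], new[field], old[field] - new[field])
    return differences


# Totals reported for the same crawl in each version of the interface.
summary_49 = {"total_documents_archived": 12440}
summary_50 = {"total_documents_archived": 12386}

result = compare_summaries(summary_49, summary_50)
old_total, new_total, gap = result["total_documents_archived"]
print(f"{gap} documents differ ({gap / old_total:.2%} of the 4.9 total)")
```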
New Features Overall
The interface of the 5.0 reports page is an improvement. The one-time IDs are now visible; however, a collection name that is too long is cut off, requiring users to hover over the name to see it in its entirety. The quick text box filter on the reports page is a helpful feature. The search function is more flexible than in 4.9, which only allowed searches by collection name or date.
The new view feature provides users with a link from the reports page directly to the Wayback Machine to view the URL without having to navigate to this resource through the access tab. This feature can help improve our quality assurance (QA) workflow. QA involves ensuring that our crawl and capture of the site accurately represent what the website displayed at the time of the crawl. Wayback allows us to view the crawl results visually in website form, unlike the reports and hosts, which provide numerical data about the crawl.
Overall, the 5.0 features are an improvement on this service, which is an important tool for archiving the record of the Smithsonian today.
Cue the music! We invite you to "She Blinded Me with Science III," our third Women in Science Wikipedia Edit-a-thon.
As was the case for the last two edit-a-thons, you can participate both in person at the Archives and online by joining us in a Google Hangout and etherpad (links to come on the event page linked above). By participating, you will receive a tour of the Archives, a talk on popular media's role in the history of women in science, an introduction for beginners on editing in Wikipedia, coffee & lunch (if you join us in person), and the satisfaction of writing a female scientist into digital history.
In years past, we have focused on women in the history of science, which has resulted in the creation of more than 50 new articles on groundbreaking geologists, anthropologists, botanists and more. Let's take a look at some of these women:
Ursula B. Marvin, planetary geologist from the Harvard-Smithsonian’s Astrophysical Observatory, won several awards for her research (1997 Lifetime Achievement Award from Women in Science and Engineering, 1986 History of Geology Award from the Geological Society of America, and the 2005 Sue Tyler Friedman Medal), and had an Antarctic mountain named after her.
Ornithologist Roxie Laybourne basically founded the field of forensic ornithology. Laybourne was very interested in aeronautics and even took an aeronautics correspondence course after not being able to attend aviation school because she was female. She used the Smithsonian's vast bird collection and scanning electron microscopy to identify birds involved in plane crashes. She helped to improve air travel safety working in conjunction with the Federal Aviation Administration and the National Transportation Safety Board.
For this year, we have added 35 more female scientists to our to-do list. Some of them were uncovered by our digital volunteers while transcribing scientific field books in the Smithsonian's Transcription Center. The list also contains many current female scientists at the Smithsonian who are working on everything from the conservation of wild canids to high-energy astrophysics. Join us in writing these women into digital history.
- Sign up for "She Blinded Me with Science III," the Women in Science Wikipedia Edit-a-thon, Friday, March 27.
- Roxie Collie Laybourne: Remembering a Groundbreaker, Bigger Picture Blog
- Documenting a Geologist's Adventures, Bigger Picture Blog
- Women in Science Wednesdays, Bigger Picture Blog
- Science Service Records at the Smithsonian Institution Archives
When asked what the Smithsonian Institution Archives collects, we say we hold records about the history of the Smithsonian and its people, programs, research, and activities. While accurate, this doesn't really give anyone a clue about what is actually in those records.
The Smithsonian Institution Archives Reference Team handles an average of around 6,000 queries per year, and if you ask us what people have been researching at the Archives recently, you'll get some pretty interesting responses. Although not comprehensive, here's a snapshot of the diverse range of information encompassed by the history of the world's largest museum complex!
Over the past three months, researcher queries have included:
- Smithsonian taxidermists
- Smokey the Bear
- Tropical botany in Southeast Asia and the Pacific
- The Smithsonian Gallery of Art design competition
- The Tunguska meteorite
- Mary Henry diaries
- Alexander von Humboldt
- The cultural history of the dingo
- Russian fur trade
- The Enola Gay exhibit
- Smithsonian Office of International Relations
- Latino scientists
- 20th-century small arms
- The Davenport Tablets
- Access and interpretation in museum gardens
Permissions granted for upcoming publications using our photos or documents include:
- Wouter Montfrooij, Astronomy! A conceptual introduction from the Big Bang…
- Anthony Burton, The Locomotive Pioneers, 1801-1851
- Norton & Co. for Glenda Gilmore's These United States
- Edward R. Landa, Assessment of Atmospheric Sulfate Deposition and its Historical Roots in Soil Science
- Left/Right Productions, The History Channel's Search for Lost Giants
- Michael G. Littman, Journal of the Washington Academy of Science
- The Avemco Insurance Co., for its digital newsletter On Approach
- Penguin Books, No Way, Way Road Trip!
- Reference services at the Smithsonian Institution Archives
What happens when an organization turns to the Internet 'crowd' for help to make its online collections as accessible as possible? The Archives is several years into its crowd-sourcing initiatives: tagging photographs and solving mysteries on Flickr Commons and transcribing text-oriented materials on the Smithsonian Transcription Center. Our goals are focused on enabling people to virtually look inside these materials and apply data mining and other techniques, enriching and speeding their own work.
In just the past 18 months, over two thousand new volunteers plus an untold number of anonymous contributors have given us a big boost, and the results are remarkable. While the quality and quantity of the effort are impressive (over 300 transcription projects and hundreds more photos available to tag on the Flickr Commons), I am more excited to see volunteers' passion for knowledge grow, with an empowering domino effect.
Looking for the Inside Stories
As the institutional archives documenting the Smithsonian's history of acquiring and disseminating knowledge, we hold a wide variety of both scientific and humanities-oriented primary source material that reflects the diversity of the Smithsonian's activities from its earliest days over 169 years ago.
As we selected material for our digital volunteers, I expected them to engage with it, gaining insight into and appreciation for the personal efforts and experiences of the individuals behind it. However, volunteers soon uncovered additional noteworthy individuals and events buried inside those texts.
Going one step further, they began to find connections between different Archives projects, such as the professional and personal relationships between scientists and examples of their work.
Amidst all of these discoveries, the depth of access these volunteers have helped us create has enabled researchers to include these historical sources in computer-driven longitudinal studies.
#WeLearnTogether: The Domino Effect
#welearntogether is a Twitter hashtag these 'volunpeers' have adopted when discussing the projects they are working on. It reflects the community culture we have striven for since the first days of our crowd-sourcing initiatives. So what's this domino effect?
Domino 1: Our volunpeers are using the information they have found, finding links to data held by museums, libraries, and archives at the Smithsonian and helping us to connect those resources to each other.
Domino 2: The volunteers are reaching out to other organizations and sharing what they have learned so those organizations, too, can update and enrich their own information catalogs. These include JSTOR and the United States National Herbarium.
In the end, the knowledge of our collections has grown and their accessibility has improved, resulting in tangible benefits for today’s and tomorrow’s Smithsonian collections users. It is so rewarding to watch these volunteers’ voyages of discovery stoke a passion to discover more and fire an enthusiasm about these collections that has proven to be contagious.
- Record Unit 7148 - David Crockett Graham Papers, 1923-1936, Smithsonian Institution Archives
- Record Unit 7272 - Frederick Vernon Coville Papers, 1888-1936 and undated, Smithsonian Institution Archives
- Record Unit 7267 - Vernon Orlando Bailey Papers, 1889-1941 and undated, Smithsonian Institution Archives
- Record Unit 7417 - Florence Merriam Bailey Papers, 1865-1942, Smithsonian Institution Archives
Hidden in Plain Sight: Reading Between the Lines with the Smithsonian Transcription Center Volunteers
The Smithsonian Transcription Center volunteers have been busy unlocking the hidden stories from the Smithsonian's collections - including the women in science hiding in plain sight in these digitized pages. From amateur collectors to seasoned gardeners, women made valuable contributions to the Smithsonian's collections. Here's what we're learning and doing together with their information.
Last July, I shared some of the progress of volunteers and their growth as a community. The highlights? Over 450 volunteers transcribed 13,412 pages including 46 different Archives projects. Since then, we have grown our community to over 4,500 volunteers and the completed text of 66,598 pages can now be indexed. This remarkable growth includes 247 completed Archives projects as well.
Updated statistics: Over 450 volunteers at that point had transcribed 46 different Archives projects. They were part of the total 956 volunteers who had completed 13,412 pages by July 2014. Since then, 2,147 volunteers have helped wrap up a whopping 247 Archives projects! The community has grown to over 4,700 volunteers total; they have worked together to completely transcribe and review 67,205 pages, creating searchable text in the Smithsonian's Collections Search Center.
Through mysteries, connections, and achievements, the Archives continues to recognize the women in science in its collections. The Archives also shares field notes and books in the Smithsonian Transcription Center, where we have fully transcribed field notes and photo albums from women scientists including Doris Cochran, Cleofé Calderón, Florence Bailey, and Mary Agnes Chase. Volunteers, whom we call #volunpeers, have also been able to identify at least 25 women who contributed specimens and were recorded in field notes by Joseph Nelson Rose.
Rose was a botanist with the U.S. Department of Agriculture and the Smithsonian Institution; his work was prolific and highlighted his great commitment to botanical work and cooperative discovery. How fitting that by transcribing his detailed notes, volunpeers would open a window onto private citizens and researchers alike sharing specimens with the Smithsonian Institution. Many of the women in science we've uncovered in the pages were involved in science through informal work or non-institutional roles. The collectors in Rose's pages were professional botanists as well as collecting sisters, wives, and amateurs.
In addition to notes on women cultivating botanical collections, we also see women in science in the entomological specimen labels and botanical specimen sheets that volunpeers transcribe. One challenge arises: what can we do with the knowledge that emerges from the digitized pages? How can we acknowledge the effort of all of the collectors and honor the work of volunteers?
As Smithsonian staff begin to incorporate that information into official records and institutional narrative, we can discuss the challenges openly in Google Hangouts, blog posts, and social media. By working with volunpeers and others, we might open the problem to group solutions. We can also acknowledge the scientific work in spaces like Wikipedia, where challenges remain with representation of women. In this way, the knowledge generated from the Archives' and other Smithsonian collections can be shared with the public. As we approach Women's History Month, we have another opportunity to connect the women in science in these pages to the body of knowledge held at the Smithsonian Institution Archives and the energy of Wikipedia editors.
You can let science talk and help the stories of these women unfold in two ways. First, you can join the Archives in a Wikipedia edit-a-thon on March 27, 10:00 am to 4:00 pm EST. Here is the running list of women from Joseph Nelson Rose’s field notes:
Women Without Wikipedia Representation
- Wilmatte Porter Cockerell
- Helen S. Conant
- Grace M. Cole
- Mrs. Anna W. Kidder
- Miss Jesse P. Rose
- Ruth C. Ross
- Miss Gertrude Sinscheimer
- Sister Mary Regina of St. Mary’s Convent (NY)
- Elsie McElroy Slater
- Mrs. Florence A. Standley
- Miss Nellie Standley
- Miss V. Tasker of Pennsylvania
- Miss F. N. Vasey
- Mrs. Irene Vera
Women With Articles in Wikipedia
Women Currently Identified by Names Other Than Their Own
- Mrs. Charles Bly
- Mrs. D. D. Gaillard
- Lady Hanbury
- Mrs. Dan Hansen
- Mrs. Eugene A. Harris
- Mrs. S. L. Pattison
- Mrs. L. L. Roller
- Mrs. G. M. Wolfe
Or you can share your passion for Smithsonian collections by transcribing with other volunpeers in the Transcription Center.