The Bigger Picture: Visual Archives and the Smithsonian
Category: Behind the Scenes
Over the last several weeks, the Archives has welcomed Heather Weiss, an intern with Project SEARCH. Heather Weiss came to us from successful experiences at the Hirshhorn Museum and Sculpture Garden, the Office of Fellowships and Internships, and the National Portrait Gallery, among others, and has been assisting the archivists and the conservators with a pair of different ongoing initiatives: a finding aid data entry project with the archivists and a rehousing project with the conservators. We wanted to highlight Heather's valuable contribution to our work at the Archives, and have invited her to share her thoughts about working with us.
Hi, my name is Heather Weiss. I am an intern at a program called Project SEARCH at the Smithsonian Institution. Project SEARCH, or PSSI, is a 10-month program designed for people with disabilities who are looking to find full-time jobs. As a part of the PSSI program, I have recently been gaining a positive experience in learning about the art of preservation. So far, I have discovered that preservation comes in many different forms, such as repairing an art sculpture, checking the lighting in an art gallery, dusting picture frames, and polishing the Plexiglas on artworks. But, my most recent positive experience to date is learning about preservation at the Smithsonian Institution Archives. Archiving is important, because when you preserve art and documents, then that means that you’re preserving a part of history. And while I am learning about archiving, I have also learned about data entry and rehousing folders into new boxes. When you’re rehousing folders, that means that you’re transferring older historical documents from older boxes and folders, and then putting those documents into new and more stable boxes and folders that will last longer. Data entry is when you take the data from a paper source and then digitize that source by putting it on the computer. Eventually, people will be able to look at the information once it is available. My favorite part of this experience is getting to see the history that’s been stored from different decades within the folders. I find it very amazing.
Heather's data entry for the archivists was a testament to her detail-oriented nature. It was meticulous work, and Heather's efforts will lead to improved finding aids for our collections. She moved through that project so quickly that the archival team had to work speedily to keep her busy! Heather also completed the conservators' first rehousing assignment in record time, replacing all of the nearly one hundred acidic boxes of one collection (Record Unit 158: United States National Museum, Curators' Annual Reports, 1881-1964). After completing both of these tasks, Heather moved on to the next portion of the finding aid project, as well as a more complex rehousing assignment (Record Unit 137: Office of the Under Secretary, Records, 1958-1973) that involved replacing both boxes and folders. That work required carefully copying folder information from old to new, removing bulky and harmful clips and staples, safely rehousing photographs in photo-safe enclosures, and checking the condition of documents and flagging them for later attention as needed.
We have appreciated Heather's willingness to learn new skills, attention to detail, and inquisitive mind. It has been a pleasure to watch her take on more difficult tasks as her time with us has progressed, and to play a part in her personal growth. We wish her all the best following her graduation from Project SEARCH, and know that she will be successful at whatever she puts her mind to. Good luck, Heather!
What better way to celebrate National Rose Month than by showcasing the Kathrine Dulin Folger Rose Garden, which is located in front of the Arts and Industries Building to the east of the Smithsonian Castle. Dedicated in fall 1998, the Folger Rose Garden welcomes visitors not only with roses, but also with perennials, annuals, bulbs, and woody evergreens, as well as with an original 19th century, three-tiered fountain manufactured by the J. W. Fiske Iron Works Company in New York.
- Kathrine Dulin Folger Rose Garden, Smithsonian Gardens
A little under a year ago, we rolled out a new search for our site which is powered by the Google Search Appliance. The goal of implementing this new search was to make our content and collections more accessible, to make discovery easier, and to generally improve the user experience.
Work towards that goal didn't end a year ago.
Over the summer of 2014, our staff began making PDFs of the Smithsonian staff newsletter, The Torch, text-searchable. Because these PDFs can now be read by our Google Search Appliance's bots, their content can be indexed. This means that our site search will return any Torch issue that matches your search string.
Let's say you're doing some research on Smokey the Bear. So you head over to our website and search for "Smokey." You'll be presented with a familiar search results screen (one of the results is actually a link to a Torch PDF). But let's say you didn't want to see finding aids or collection items, just the PDFs. Don't worry, you can do that too.
You may have noticed there's a new link at the top of the content type filters, labeled "PDFs." In the above example, the site would return only PDFs that match the search string "Smokey," such as an article about whether Smokey should be retired and the original Smokey's obituary.
- You Asked, We Listened: Introducing the Archives New Site Search, The Bigger Picture blog, Smithsonian Institution Archives
- Smithsonian Institution Archives Moves to Drupal 7, The Bigger Picture blog, Smithsonian Institution Archives
Volunteers have been an integral part of the Smithsonian since the beginning. As our historian Pamela Henson likes to say, we have always relied on the kindness of strangers. Our first Secretary, Joseph Henry, coordinated a group of about 600 people across North America to send in weather data, which he posted on a map in the Smithsonian’s Castle (this program eventually led to the founding of the National Weather Service). Our second Secretary, Spencer Fullerton Baird, created a network of collecting volunteers who sent biological specimens to the Smithsonian for study and inclusion in its first U.S. National Museum.
Today, on-site volunteers number almost the same as staff: 6,373 staff and 5,451 volunteers. Since kicking off our first pan-Smithsonian digital volunteer website, the Smithsonian Transcription Center, in June 2013, we have nearly doubled our volunteer base, adding 4,919 digital volunteers! That number climbs steadily, and it is likely to soon outnumber our in-person volunteers, and eventually our staff.
Although digital volunteers work from all over the world, there is a sense of community among the volunteers through social media and the Transcription Center itself. I regularly field questions and comments from volunteers in very different time zones. It also seems that serving as a digital volunteer yields the same sense of purpose that our in-person volunteers feel:
“…I was also keen because anything I do helps to open up access to the Smithsonian collections and this results in improved connections and knowledge for everyone. Scientists, citizen scientists, historians, school children, from anywhere with an internet connection. The fact that anyone can view the transcriptions and that there is open access to the transcribed data was a very important factor in me donating my time,” Transcription Center Volunteer
“What drives me, in particular, is the preservation of the study of astronomy. There were countless hours spent in freezing observatories with eyes glued to instruments and eyepieces hoping for good tracking and sky conditions. All during this was the painstaking logging of notes - figures and frustrations alike. This must never be lost, for it shows determination, drive, perseverance . . . and a great deal of hope. Thank you for the opportunity,” Transcription Center Volunteer
From working our information desks to transcribing primary source documents, our volunteers are major contributors to making the Smithsonian all that it is. It is delightful to think that people all over the world now have more opportunities to contribute from wherever they are. Below is a list of Smithsonian projects that rely on the kindness of strangers (a.k.a. crowdsourcing projects) that I compiled back in September 2014. If one appeals to you, come aboard and help us to achieve our mission of increasing and diffusing knowledge. And please know that we very much appreciate your work, not just during Volunteer Appreciation Month, but throughout the year. Please listen to a message of thanks from our Director, Anne Van Camp.
- Digital Volunteer Certificate, Smithsonian Institution Archives
- Baird’s Network, Bigger Picture Blog
- Volunteer for the Smithsonian Institution Archives
- Growing to a Community of Volunpeers: Communication & Discovery, Bigger Picture Blog
Volunteer now for any of these Smithsonian projects!
- Access American Stories – Crowdsourcing audio descriptions of exhibition for accessibility
- Agriculture Innovation and Heritage Archive – Crowdsourcing oral histories related to agriculture
- Biodiversity Heritage Library Machine Tagging – Crowdsourcing machine tags for inclusion in the Encyclopedia of Life
- Community of Gardens – Crowdsourcing oral histories and media related to community gardens
- eMammal – Crowdsourcing camera-trap images to survey wildlife
- Global Treebanding Project – Crowdsourcing scientific data about tree biomass and the impact of climate change
- Leafsnap – Crowdsourcing tree data set for mobile app
- Our American Journey – Crowdsourcing oral histories of American experience
- People and the Post: A Digital Memory Book – Crowdsourcing oral histories from postal workers
- Smithsonian Transcription Center – Crowdsourcing transcriptions of historic documents and collection records
- Stories from Main Street – Crowdsourcing oral histories of rural America
- Wikipedia edit-a-thons – Crowdsourcing Wikipedia articles about Smithsonian collections and resources
- Will to Adorn – Crowdsourcing oral histories about dress
Archive-It 5.0 Changes and New Features
As a web preservation intern at the Smithsonian Institution Archives, I capture and preserve the Smithsonian’s web presence using the Archive-It crawling service. In October 2014, Archive-It released Phase 1 of Archive-It 5.0, which featured the roll-out of a new interface and more robust data collection for post-crawl reports. Currently the service allows users to switch between the 4.9 and 5.0 versions. Archive-It offers ten new features for reports, including a quick text box filter, infographics, the ability to add notes, and the option to compare two crawl reports side by side. The reports generated by web crawls play a large part in the Archives’ web collection packages and quality assurance (QA), so the changes between versions 4.9 and 5.0 are important for us to understand as we attempt to preserve the record of the ephemeral web.
Version 4.9 Crawl Report
Version 5.0 Crawl Report
Archive-It One-Time IDs
The snapshots above were taken of the same crawl report, one in 4.9 and the other in 5.0. The new format and interface are not the only differences: the one-time IDs (identifiers) are different as well. For this crawl, version 4.9 assigned the ID 20150320165024358, and version 5.0 assigned 149112. While the Archives does not fully rely on these numbers as identifiers for crawls, they are attached to the file name when a summary/overview report and the WARC files are downloaded for our collections. Currently, the ability to switch back and forth between 4.9 and 5.0 makes this issue moot, but once this capability is removed, reports and WARC files downloaded with the 4.9 ID will be more difficult to locate and identify in the new Archive-It reports. Archive-It does not mention this change in the wikis it has provided regarding the roll-out of 5.0. This change could be problematic for organizations that use these IDs to identify crawls.
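One way to guard against losing track of older downloads is to keep a small crosswalk between the two ID schemes while both versions are still available. The sketch below is only an illustration: the names `ID_CROSSWALK`, `new_id_for`, and `guess_crawl_time` are hypothetical, the only real data is the single ID pair quoted above, and the reading of the 17-digit 4.9 ID as a date and time stamp is an unverified assumption based on how the number looks, not something Archive-It documents here.

```python
from datetime import datetime
from typing import Optional

# Hypothetical crosswalk: 4.9 one-time ID -> 5.0 one-time ID.
# The only entry is the pair quoted in this post; more would be
# recorded by hand while both versions can still be viewed.
ID_CROSSWALK = {
    "20150320165024358": "149112",
}


def new_id_for(old_id: str) -> Optional[str]:
    """Return the 5.0 ID recorded for a 4.9 ID, or None if unknown."""
    return ID_CROSSWALK.get(old_id)


def guess_crawl_time(old_id: str) -> Optional[datetime]:
    """Read a 17-digit 4.9 ID as yyyyMMddHHmmss plus milliseconds.

    ASSUMPTION: this interpretation is a guess from the ID's shape.
    Returns None when the ID does not fit that pattern.
    """
    if len(old_id) != 17 or not old_id.isdigit():
        return None
    try:
        # Only the first 14 digits are parsed; milliseconds are ignored.
        return datetime.strptime(old_id[:14], "%Y%m%d%H%M%S")
    except ValueError:
        return None


if __name__ == "__main__":
    old = "20150320165024358"
    print(old, "->", new_id_for(old))
    print("possible crawl time:", guess_crawl_time(old))
```

If the date reading holds, a file named with a 4.9 ID could still be matched to a crawl in the 5.0 reports by date, even without a crosswalk entry.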
Report Summary Data
Part of assembling our web collection packages is downloading the host data and the report summary from the post-crawl report. The host download provides the URLs that were archived from each host as well as other information such as new data, documents blocked by robots.txt, and out-of-scope documents. When switching between 4.9 and 5.0, the only changes are the interface and the ability to browse hosts by seed for more robust data.
In 5.0, the report summary is now called an overview but contains the same type of data. However, I noticed a discrepancy: the data is not consistent between the two versions. The snapshots of the same crawl above show different numbers for Total Documents Archived. Version 4.9 reports 12,440 documents, while version 5.0 reports 12,386. It is unclear why the data differs between the two versions.
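To put the gap in perspective, the short sketch below works out the difference between the two Total Documents Archived figures quoted above. The function name `count_discrepancy` is hypothetical, and treating the 4.9 total as the baseline for the percentage is simply a choice made for illustration.

```python
def count_discrepancy(old_count: int, new_count: int) -> tuple:
    """Return the absolute gap and the gap as a percentage of old_count."""
    diff = old_count - new_count
    pct = diff / old_count * 100
    return diff, pct


# Totals reported for the same crawl in versions 4.9 and 5.0.
diff, pct = count_discrepancy(12_440, 12_386)
print(f"{diff} documents ({pct:.2f}% of the 4.9 total)")
# prints: 54 documents (0.43% of the 4.9 total)
```

The gap is small, under half a percent, but the arithmetic does not explain why the two versions disagree.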
New Features Overall
The interface of the 5.0 reports page is an improvement. The one-time IDs are now visible; however, a collection name that is too long is cut off, requiring users to hover over the name to see it in its entirety. The reports page's quick text box filter is a helpful feature. The search function is more flexible than in 4.9, which only allowed searches by collection name or date.
The new view feature provides users with a link from the reports page directly to the Wayback Machine to view the URL without having to navigate to this resource through the access tab. This feature can help improve our quality assurance (QA) workflow. QA involves ensuring that our crawl and capture of the site accurately represents what the website displayed at the time of the crawl. Wayback allows us to view the crawl results visually in website form, unlike the reports and hosts, which provide numerical data about the crawl.
Overall, the 5.0 features are an improvement on this service, which is an important tool for archiving the record of the Smithsonian today.