The Bigger Picture: Visual Archives and the Smithsonian
Category: Behind the Scenes
Last week, we celebrated two years of using Archive-It for documenting the Smithsonian Institution's web presence. Previously, we had been using an in-house software and hardware installation in order to crawl websites and had cobbled together various less-than-ideal methods for capturing social media. Our hope was that a subscription to Archive-It would allow us to capture our web presence in a more efficient manner as well as allow us to provide better access to our crawled web content.
So how are we doing?
The Smithsonian currently has a total of 349 distinct websites and blogs. In the last year, we've crawled 170 of them, or approximately 49% of the total. Altogether, we've crawled 327 websites and blogs, about 94% of the total, since we began using Archive-It two years ago. In addition, a significant number have been crawled more than once. Of those that have yet to be crawled, the majority have underlying code that makes them nearly impossible to crawl using the technology currently available to us.
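For readers who want to check the arithmetic, here is a minimal sketch in Python. The three counts come straight from the paragraph above; the variable and function names are made up for illustration.

```python
# Figures quoted in the post above.
TOTAL_SITES = 349
CRAWLED_LAST_YEAR = 170
CRAWLED_SINCE_START = 327


def coverage_percent(crawled: int, total: int) -> float:
    """Return the crawled sites as a percentage of all sites."""
    return 100.0 * crawled / total


last_year = coverage_percent(CRAWLED_LAST_YEAR, TOTAL_SITES)
since_start = coverage_percent(CRAWLED_SINCE_START, TOTAL_SITES)
not_yet_crawled = TOTAL_SITES - CRAWLED_SINCE_START

print(f"Crawled in the last year: {last_year:.1f}%")
print(f"Crawled since we started: {since_start:.1f}%")
print(f"Not yet crawled: {not_yet_crawled} sites")
```

Rounded to whole numbers, the two percentages match the 49% and 94% quoted above.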
By this point, we had hoped to be crawling our websites and blogs annually. Although we haven't reached that goal, we've certainly improved from approximately one-half of our websites in 2 ½ years prior to using Archive-It, to nearly all of our websites and blogs in less than two years with Archive-It. And there's the added bonus of most of our crawled content from the last two years being available online via our Smithsonian Institution Websites Collection on Archive-It.
We continue to take steps to improve our efficiency. One of our next steps will be to evaluate the websites we've already crawled to determine which ones do not need to be crawled again because they are no longer being updated. An example might be an online exhibition that was launched in its final format and was never intended to be modified. The fewer websites that need to be crawled, the more frequently we'll be able to capture those that do.
- Web Archiving Update, The Bigger Picture, Smithsonian Institution Archives
- Smithsonian Now Using Archive-It to Crawl Websites, The Bigger Picture, Smithsonian Institution Archives
- Connecting the Dots: Issues with Preserving Complex Websites, The Bigger Picture, Smithsonian Institution Archives
It's time again to celebrate all the wonderful things archives have and do! The Society of American Archivists declares each October American Archives Month, and the Smithsonian theme for this year is "Discover and Connect." At the Smithsonian Institution Archives, we handle over 6,000 reference requests per year and have an ambitious digitization plan to serve people worldwide through our website. This past year alone, over 380,000 people accessed our resources online.
Archivists and conservators at the Smithsonian are top-notch, and to celebrate the occasion, we make them available to you for an entire day to answer your questions about preserving your own collections. Our 4th Facebook Q&A will be held on October 27th, from 10am-4pm EST, on the Smithsonian's Facebook page. Four of our staff members will be there with skills in a/v, digital, and paper archives. Here's the line-up:
- Joe Hursey, National Museum of American History's Archives Center, Reference Coordinator (updated 10/20 due to conflict)
- Michael Pahn, National Museum of the American Indian Archives Center, Head Archivist (A/V specialty)
- Marguerite Roby, Photo Archivist, Smithsonian Institution Archives (added 10/20 since original posting)
- Lynda Schmitz Fuhrig, Smithsonian Institution Archives, Electronic Records Archivist
- Dave Walker, Center for Folklife and Cultural Heritage, Ralph Rinzler Folklife Archives, Audio Digitization Specialist
Also check out past Q&As to see if your question has already been answered!
There are several other ways to connect with the Smithsonian's 16 archives this month:
- Blogs across the Smithsonian will give an inside look at collections and practices.
- If you're a Pinterest fan, check out the Smithsonian's October is Archives Month board.
- Archivists across the Smithsonian will share sound, video, and film on the Smithsonian AV Archivists Tumblr.
- Digital volunteers can explore and help us transcribe letters, diaries, and field books on the SI Transcription Center.
- And if you're local to DC, a selection of artists' diaries from the Archives of American Art is on exhibit in the Lawrence A. Fleischman Gallery, 11:30am-7pm daily.
We'd love to have you participate in any way you can. Three cheers for archives!
For the past few months I've been walking around the office telling my coworkers my latest project was upgrading our site's search. In actuality, the word "upgrading" wasn't really the best description of what I was doing; "extreme overhaul" would have been a better fit.
The project really got kicked off in June when we started looking at the results from our website survey. Yes, we do read those! So if you happen to have one pop up while you're browsing our site, please fill it out and let us know how we're doing!
The thing that immediately stuck out when reviewing the surveys was that people were not overly happy with our site searching capabilities.
Our old site search was actually three separate searches that had been linked together with tabs to make them appear as one. This was done because some of our site content is stored in a separate database, which has its own way of searching.
If you had typed "Wetmore" into the search box at the upper right-hand corner of the page, you would have been presented with results consisting mostly of blog posts and a few pages from our Smithsonian history content. To find actual collection items related to Wetmore, you would have had to click on one of the tabs (either collections or finding aids, depending on what you were looking for) and load yet another page with the search results.
The process of searching was clunky, limited, and not terribly helpful for researchers. In fact, we had three respondents say they would just use Google to search for pages on our site instead of using our actual site search. We had known the search wasn't very good, but the survey results opened our eyes to how much higher a priority fixing it had to be.
Our site now relies on a single search, powered by a Google Search Appliance, with a contributed Drupal module and a custom module providing the wiring that hooks the site up to the Search Appliance.
Now performing a search for "Wetmore" will provide you with not just blog posts and pages, but also anything else on our site related to your search (collection guides, images, chronologies, legal documents, etc.). To paraphrase J. R. R. Tolkien, we now have "one search to find them."
The Google Search Appliance also indexes metadata for us. Not only does the metadata get factored into our searches, which makes them more accurate, but we can also filter on it. As a result, our site visitors can now filter by subject, creator, and date range.
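To give a rough idea of what filtering on metadata means, here is a minimal sketch in Python. It is not the Search Appliance's API or our Drupal code; the `Record` fields, the `filter_records` function, and the sample records are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    """A hypothetical search result with a few metadata fields."""
    title: str
    subject: str
    creator: str
    year: int


def filter_records(
    records: List[Record],
    subject: Optional[str] = None,
    creator: Optional[str] = None,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
) -> List[Record]:
    """Keep only the records that match every filter that was supplied."""
    results = []
    for record in records:
        if subject is not None and record.subject != subject:
            continue
        if creator is not None and record.creator != creator:
            continue
        if year_from is not None and record.year < year_from:
            continue
        if year_to is not None and record.year > year_to:
            continue
        results.append(record)
    return results


# Placeholder records, invented purely for illustration.
sample = [
    Record("Example field notes", "Ornithology", "Creator A", 1920),
    Record("Example photograph", "Ornithology", "Creator B", 1950),
    Record("Example memo", "Administration", "Creator A", 1948),
]

matches = filter_records(sample, subject="Ornithology", year_from=1940, year_to=1960)
print([record.title for record in matches])
```

Each filter is optional, so a visitor can narrow by subject alone, by date range alone, or by any combination.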
Certain keywords are also flagged to provide suggested search results. During the process of rewriting our search, we looked at our top site search queries. We paired each of those searches up with pages that provide general information on the subject of the search. Using the same example as before, a search for "Wetmore" (who was the sixth Smithsonian Secretary) will provide the user with a light grey box containing a link to Alexander Wetmore's biography along with a brief excerpt from that page.
But let's say you just did a search for an item that you know is in the Smithsonian's collection, but it turns out it isn't in the Archives. What then? Are you doomed to search all of the other Smithsonian units until you find that one collection item you're looking for? Not at all! On most of our site search results there will be a link to the Smithsonian Collections Search Center (it's located in the left column under the date range filter). If our search doesn't have what you’re looking for, there's a good chance it exists somewhere in the Smithsonian.
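Conceptually, the pairing of top queries with general-information pages works like a lookup table. The sketch below, in Python, illustrates the idea only; it is not how the Search Appliance is configured, and the `SUGGESTED_PAGES` table, the `suggest` function, the URL path, and the excerpt text are hypothetical placeholders.

```python
from typing import Dict, Optional

# Hypothetical table pairing a frequent query with a general-information page.
SUGGESTED_PAGES: Dict[str, Dict[str, str]] = {
    "wetmore": {
        "title": "Alexander Wetmore biography",
        "url": "/example/alexander-wetmore",  # placeholder path, not a real URL
        "excerpt": "Placeholder excerpt from the biography page.",
    },
}


def suggest(query: str) -> Optional[Dict[str, str]]:
    """Return the suggested page for a flagged keyword, or None if there is none."""
    return SUGGESTED_PAGES.get(query.strip().lower())


suggestion = suggest("Wetmore")
if suggestion is not None:
    print(f"Suggested: {suggestion['title']} ({suggestion['url']})")
    print(suggestion["excerpt"])
```

Normalizing the query (trimming whitespace and lowercasing) means "Wetmore", "wetmore", and " WETMORE " all land on the same suggestion, while queries that were never flagged simply get no suggestion box.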
- Smithsonian Institution Archives Moves to Drupal 7, The Bigger Picture blog, Smithsonian Institution Archives
- Search results for "Wetmore" on Smithsonian Institution Archives website
- Smithsonian Collections Search Center
James Smithson’s original purpose in bequeathing his estate to the United States was to establish the Smithsonian Institution for the “increase and diffusion of knowledge.” And as the saying goes, learn by doing. To this end, the Smithsonian has been increasing its interactive opportunities across its entire network of museums and research centers, and the Archives is no exception.
This summer, I had the opportunity to work as an intern with the Digital Services Division (DSD), mainly working on the digitization of special collections. During my time here, I noticed that both the DSD and the Archives as a whole place a heavy emphasis on public engagement with the Archives' collections. People's use of collections definitely plays an important role in guiding the Archives' decisions on what to prioritize for digitization. This aspect, combined with other factors such as the physical condition and size of the collection, the available information about the materials, and the use of digitized collections for special projects, informs the Archives' choice of which collections to digitize.
This summer, I worked alongside a few other interns and volunteers to digitize some of the Archives' collections for special projects. All of us came from different backgrounds and had varying degrees of experience with digitization. With the ever-increasing demand for digitized materials, the Archives is constantly in need of as many helping hands as possible. As a result, a great deal of the digitization work is done by interns or volunteers.
The digitization of materials allows the Archives to share its collections with those who are not able to physically come to the Archives. By making its content as widely available as possible on the Archives' website, in the Collections Search Center, and in the Smithsonian Institution Research Information System (SIRIS), the Archives helps its collections be discovered by as many people as possible. A new avenue through which people can interact with the Archives' collections is the Smithsonian Transcription Center, where “volunpeers” can help transcribe text from digitized materials. Meghan Ferriter, Project Coordinator, Smithsonian Transcription Center, talks about the role of volunpeers in her blog post Growing to a Community of Volunpeers: Communication & Discovery.
Making archival collections available online and engaging people to help make them more accessible are just some of the many steps towards connecting people to collections. As more museums, libraries, and archives put their collections online, there will be more opportunities for people to see materials from across the country and from across the world. The Smithsonian has made great strides in the past few years in getting its collections online and is now better poised than ever to work with other institutions and organizations to make its collections more readily discoverable.
Indeed, Smithsonian Secretary G. Wayne Clough mentions in his e-book, Best of Both Worlds: Museums, Libraries, and Archives in a Digital Age, that one of the Smithsonian’s next endeavors is collaboration with other institutions. With the expanding role of Wikipedia in research, archives around the world are recognizing that collaboration with sites that get heavy traffic is highly beneficial in making people aware of their collections. The Archives itself continues to experience an increase in traffic on its website and in the use of its collections as a result of hosting regular Wikipedia edit-a-thons.
The Archives is currently engaged in work with Gale Cengage Learning, and also often cooperates with other folks at the Smithsonian such as the National Museum of Natural History and the National Museum of American History. By collaborating with other museums and institutions in making its collections available, the Archives is following the tenet laid out by James Smithson for the "increase and diffusion of knowledge."
- Growing to a Community of Volunpeers: Communication & Discovery, The Bigger Picture blog, Smithsonian Institution Archives
- Location! Location! Location!, The Bigger Picture blog, Smithsonian Institution Archives
- Best of Both Worlds: Museums, Libraries, and Archives in a Digital Age, by G. Wayne Clough, Smithsonian Institution
The Smithsonian Transcription Center has been around for over a year and the community of #volunpeers who expertly transcribe and review texts has grown and grown. This summer, my project was to get to know the community of #volunpeers who contributed to the Smithsonian Institution Archives’ projects by looking through pages and pages of data reflecting the quantity and frequency of completed transcription and review activities. Here is a graph of the activity of Archives #volunpeers during the first 6 months of the year.
To understand why activity was high during certain moments and low during others, and to explore how the Transcription Center operates as a system with multiple moving parts, I took a systems approach and a landscape ecology perspective.
The three parts of the system that I explored are:
- #Volunpeer behavior - The frequency and quantity of transcription and review activities completed by users
- Project landscape - The amount and type of Archives projects available for activity
- Social media communication - Transcription Center special events and social media posts by Smithsonian units and the #volunpeer community
Each of these components is related to the others. For instance, #volunpeer behavior is affected by the types of projects available for activity and the type and quantity of social media communication at a given moment. #Volunpeers generally gravitate towards projects with a narrative component, like diaries or field notes written poetically. Furthermore, events like #7DayReviewChallenge and #CandC (Contribute&Connect) foster the re-engagement of formerly dormant #volunpeers and boost the activity of existing active contributors.
The most prominent characteristic of the Archives community of #volunpeers is that the majority of all activity is completed by a handful of top contributors. Does this matter? Is this trait good, bad, or both? The answer is both.
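One simple way to put a number on this kind of concentration is to compute the share of all activity completed by the top few contributors. The sketch below is in Python, and the contributor names and counts are invented placeholders, not Transcription Center data.

```python
from typing import Dict


def top_share(activity: Dict[str, int], top_n: int) -> float:
    """Return the fraction of all activity completed by the top_n contributors."""
    total = sum(activity.values())
    if total == 0:
        return 0.0
    top_counts = sorted(activity.values(), reverse=True)[:top_n]
    return sum(top_counts) / total


# Invented counts of completed transcription and review activities.
example_activity = {
    "volunpeer_a": 500,
    "volunpeer_b": 300,
    "volunpeer_c": 50,
    "volunpeer_d": 30,
    "volunpeer_e": 20,
}

share = top_share(example_activity, top_n=2)
print(f"Top 2 contributors account for {share:.0%} of activity")
```

The closer this share is to 100% for a small `top_n`, the more the community depends on a handful of people.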
The Archives is incredibly lucky and thankful to have such amazing power #volunpeers, whose work translates into a high volume of transcription and review activity and opportunities for knowledge discovery. But this trait has the potential to threaten the overall health of the Transcription Center system. Why? Let’s turn to landscape ecology and Smithsonian Secretary Spencer Fullerton Baird’s Index of Correspondence to understand.
A healthy and sustainable system, meaning that it is productive and exists long term, requires resiliency, meaning that if threatened or damaged, the system can recover quickly and fully. If a system is not resilient, it is vulnerable and fragile, meaning that its vitality is at risk if the system suffers a loss.
As his Index shows, Baird corresponded with hundreds and hundreds of natural history collectors and citizen scientists, some of whom collected the same things from similar geographical locations. For example, there was a redundancy of shell collectors from Grand Rapids, Michigan listed in Baird’s Index.
One of the many benefits of having a large, diverse, and redundant network of collectors was that if one collector stopped collecting, or his/her items were damaged during transport to the Smithsonian, Baird could draw upon the collections of another correspondent who had a similar collection. Seemingly redundant collectors become the saviors of the system! This allows it to continue uninterrupted, which increases its sustainability and stability.
The same is true for the Transcription Center.
Having a large and diverse group of #volunpeers who complete activity instead of a tiny group of power #volunpeers contributes to a healthy, resilient, stable, and sustainable system. Since the Archives still has numerous projects that need transcribing and reviewing, striving for the sustainability of the Transcription Center is a top priority for us and we hope that you feel the same way!
Check out the Transcription Center for yourself!
And if you want to know more about Baird’s Index, check out this interview with Smithsonian historian Pam Henson.
- Accession 91-069 - Spencer Fullerton Baird Index of Correspondence, 1850s-1870s, Smithsonian Institution Archives
- Increasing Access: The Smithsonian Transcription Center, by Kristin Conlin, The Bigger Picture blog, Smithsonian Institution Archives
- Paper Painting: Using Acrylics to Repair Leather Bindings, by Breann Young, The Bigger Picture blog, Smithsonian Institution Archives
- Transcription Beyond Description: Engaging Opportunities and Weaving Webs of Knowledge, by Meghan Ferriter, The Bigger Picture blog, Smithsonian Institution Archives