The Bigger Picture: Visual Archives and the Smithsonian
Posts tagged with: Web/Tech
- I see you - a new satellite image of the National Portrait Gallery portrait commission, One of Many, One, by Jorge Rodriguez-Gerada. [via AirSpace blog, NASM]
- Yes, you heard right, the Smithsonian is on its way to raising $1.5 Billion to support its museums, research centers, and programs. [via The Torch, SI]
- Getting toned, book style - Toning Japanese paper hinges for reattaching boards to leather bindings. [via Unbound blog, Smithsonian Libraries]
- Announced this week - The papers of Nobel laureate Toni Morrison will reside at Princeton University Library. [via InfoDocket]
- Not just go-go or punk - The new D.C. Vernacular Music Archive at George Washington University encompasses the variety of music found in our nation's capital. [via DCist]
- Challenge accepted - Flickr created a site to tell you if your picture has a park or a bird in it in response to a challenge laid out in the XKCD webcomic. [via PetaPixel]
- A new tool called Colloq is coming from Rhizome that allows you to preserve the dynamic content found on social media sites. [via Bits blog, The New York Times]
- Walk in the steps of Jane Goodall - The Jane Goodall Institute and Google teamed up to bring Street View Trekker cameras to Gombe National Park in Tanzania, allowing you to explore and experience it. [via PetaPixel]
- A bold plan from the National Archives - Digitize their analog records, all 12 billion pages of them. [via AOTUS blog, NARA]
- An epic road trip - Collecting on the road with Jason Stieber, National Collector, Archives of American Art. [via Smithsonian Collections Blog]
- Now available - DigDC, a new online archive of Washington D.C. history created by the D.C. Public Library’s Special Collections department. [via Washington Post]
- Documenting events as they are happening - A conversation with Howard Besser about the efforts of Activist Archivists in saving the records of the "Occupy" movement. [via The Signal: Digital Preservation, LOC]
- From the stacks - Exhibits writer-editor, David Romanowski, talks about his adventures in doing research in the National Air and Space Museum Archives' Technical Files for the Hawaii by Air exhibition. [via AirSpace blog, NASM]
- Some thoughts on archival appraisal in the age of distant reading and computational analysis of large sets of electronic records. [via The Signal: Digital Preservation, LOC]
- Gale/Library Journal 2014 Library of the Year - Edmonton Public Library - presents this cool video timeline of their 101-year history. [via InfoDocket]
Last week, we celebrated two years of using Archive-It for documenting the Smithsonian Institution's web presence. Previously, we had been using an in-house software and hardware installation to crawl websites and had cobbled together various less-than-ideal methods for capturing social media. Our hope was that a subscription to Archive-It would allow us to capture our web presence more efficiently and to provide better access to our crawled web content.
So how are we doing?
The Smithsonian currently has a total of 349 distinct websites and blogs. In the last year, we've crawled 170 of them, or approximately 49% of the total. Altogether, we've crawled 327 websites and blogs, about 94% of the total, since we began using Archive-It two years ago. In addition, a significant number have been crawled more than once. Of those that have yet to be crawled, the majority have underlying code that makes them nearly impossible to crawl using the technology currently available to us.
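For readers who want to check the arithmetic, here is a minimal sketch that recomputes the percentages from the counts given above. The function name is hypothetical; only the three counts come from the post.

```python
# Counts are taken from the post; coverage_percent is a hypothetical helper.
def coverage_percent(crawled: int, total: int) -> int:
    """Return crawled/total as a percentage rounded to the nearest whole number."""
    return round(100 * crawled / total)


TOTAL_SITES = 349

print(coverage_percent(170, TOTAL_SITES))  # crawled in the last year -> 49
print(coverage_percent(327, TOTAL_SITES))  # crawled since adopting Archive-It -> 94
print(TOTAL_SITES - 327)                   # websites and blogs not yet crawled -> 22
```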
By this point, we had hoped to be crawling our websites and blogs annually. Although we haven't reached that goal, we've certainly improved: from crawling approximately one-half of our websites in the 2 ½ years before Archive-It to crawling nearly all of our websites and blogs in less than two years with it. And there's the added bonus that most of our crawled content from the last two years is available online via our Smithsonian Institution Websites Collection on Archive-It.
We continue to take steps to improve our efficiency. One of our next steps will be to evaluate the websites we've already crawled to determine which ones do not need to be crawled again because they are no longer being updated. An example might be an online exhibition that was launched in its final format and was never intended to be modified. The fewer websites that need to be crawled, the more frequently we'll be able to capture those that do.
- Web Archiving Update, The Bigger Picture, Smithsonian Institution Archives
- Smithsonian Now Using Archive-It to Crawl Websites, The Bigger Picture, Smithsonian Institution Archives
- Connecting the Dots: Issues with Preserving Complex Websites, The Bigger Picture, Smithsonian Institution Archives
- I've read about trying out historic recipes, but historic deodorant recipes? [via O say can you see? blog, NMAH]
- Recognition of the need for better disaster preparedness, art conservation, and historic preservation got a boost after the 1966 flood of the Arno River in Florence, Italy. [via Pushing the Envelope blog, NPM]
- Collaborations towards tools to access and preserve email. [via The Signal: Digital Preservation, LOC]
- Have a little one? Here is some great advice for getting kids to really explore museum exhibitions. [via O say can you see? blog, NMAH]
- Photographic inspiration - 7-year-old gets deep in the mud to get the shot at a cyclocross race in Colorado. [via PetaPixel]
- From Europeana - #OpenCollections - highlights of some of the most interesting and high-quality collections from around Europe. [via Europeana blog]
- A look at what it means when one inherits a collection, particularly one which may have significant monetary value. [via The New York Times]
- Is Pluto a planet? A discussion and vote on the definition of a "planet." [via Smithsonian Science]
For the past few months I've been walking around the office telling my coworkers my latest project was upgrading our site's search. In actuality, the word "upgrading" wasn't really the best indication of what I was doing; "extreme overhaul" would have been a better fit.
The project really got kicked off in June when we started looking at the results from our website survey. Yes, we do read those! So if you happen to have one pop up while you're browsing our site, please fill it out and let us know how we're doing!
The thing that immediately stuck out when reviewing the surveys was that people were not overly happy with our site searching capabilities.
Our old site search was actually three separate searches that had been linked together with tabs to make them appear as one. This was done because some of our site content is actually stored in a separate database which has its own way of searching.
If you had typed "Wetmore" into the search box at the upper right-hand corner of the page, you would have been presented with results made up mostly of blog posts and a few pages from our Smithsonian history content. To find actual collection items related to Wetmore, you would have had to click on one of the tabs (either collections or finding aids, depending on what you were looking for) and load yet another page with the search results.
The process of searching was clunky, limited, and not terribly helpful for researchers. In fact, we had three respondents say they would just use Google to search for pages on our site instead of using our actual site search. We had known the search wasn't very good, but the survey results opened our eyes to how much higher a priority fixing it had to be.
Our site search now relies on one search, which is powered by a Google Search Appliance, with a contributed Drupal module and a custom module providing the wiring to hook the site up to the Search Appliance.
Now performing a search for "Wetmore" will provide you with not just blog posts and pages, but also anything else on our site related to your search (collection guides, images, chronologies, legal documents, etc.). To paraphrase J. R. R. Tolkien, we now have "one search to find them."
The Google Search Appliance also indexes metadata for us. Not only does the metadata get factored into our searches, which makes them more accurate, but we can also filter on the metadata. As a result, our site visitors can now filter by subject, creator, and date range.
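To illustrate the general idea of filtering on metadata, here is a minimal sketch over a small in-memory list. It is not the Google Search Appliance's API or our Drupal module's code; the `Record` class, the `filter_records` function, and the sample data are all hypothetical, and only the three filter fields (subject, creator, date range) come from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Record:
    """One indexed item carrying the three metadata fields named in the post."""
    title: str
    subject: str
    creator: str
    year: int


def filter_records(
    records: Iterable[Record],
    subject: Optional[str] = None,
    creator: Optional[str] = None,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
) -> List[Record]:
    """Keep records matching every filter supplied (None means 'no filter')."""
    kept = []
    for record in records:
        if subject is not None and record.subject != subject:
            continue
        if creator is not None and record.creator != creator:
            continue
        if year_from is not None and record.year < year_from:
            continue
        if year_to is not None and record.year > year_to:
            continue
        kept.append(record)
    return kept


# Made-up sample data, for illustration only.
sample = [
    Record("Example letter", "Ornithology", "Creator A", 1925),
    Record("Example photograph", "Ornithology", "Creator B", 1948),
    Record("Example report", "Exhibitions", "Creator A", 1950),
]

hits = filter_records(sample, subject="Ornithology", year_from=1940, year_to=1960)
print([r.title for r in hits])  # ['Example photograph']
```

Each filter is optional, so a visitor can combine them in any order, which mirrors how the filters on the results page narrow a search step by step.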
Certain keywords are also flagged to provide suggested search results. During the process of rewriting our search, we looked at our top site search queries. We paired each of those searches up with pages that provide general information on the subject of the search. To return to the earlier example, a search for "Wetmore" (Alexander Wetmore was the sixth Smithsonian Secretary) will provide the user with a light grey box containing a link to Alexander Wetmore's biography along with a brief excerpt from that page.
But let's say you just did a search for an item that you know is in the Smithsonian's collection, but it turns out it isn't in the Archives. What then? Are you doomed to search all of the other Smithsonian Units until you find that one collection item you're looking for? Not at all! On most of our site search results there will be a link to the Smithsonian Collection Search Center (it's located in the left column under the date range filter). If our search doesn't have what you’re looking for, there's a good chance it exists somewhere in the Smithsonian.
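The pairing of popular queries with suggested pages can be pictured as a simple lookup table. The sketch below is an assumption about the general technique, not the Search Appliance's actual configuration; the `SUGGESTIONS` table, the `suggest` function, the path, and the excerpt text are all placeholders.

```python
from typing import Dict, Optional

# Hypothetical lookup table: normalized query -> suggested page.
# The path and excerpt are placeholders, not real site content.
SUGGESTIONS: Dict[str, Dict[str, str]] = {
    "wetmore": {
        "title": "Alexander Wetmore biography",
        "path": "/placeholder/wetmore-biography",
        "excerpt": "Placeholder excerpt from the biography page.",
    },
}


def suggest(query: str) -> Optional[Dict[str, str]]:
    """Return the suggested page for a flagged keyword, or None if there is none."""
    # Lowercase and collapse whitespace so "  Wetmore " matches "wetmore".
    normalized = " ".join(query.lower().split())
    return SUGGESTIONS.get(normalized)


match = suggest("  Wetmore ")
if match is not None:
    print(match["title"])  # Alexander Wetmore biography
```

A query with no entry in the table simply returns `None`, in which case only the regular search results would be shown.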
- Smithsonian Institution Archives Moves to Drupal 7, The Bigger Picture blog, Smithsonian Institution Archives
- Search results for "Wetmore" on Smithsonian Institution Archives website
- Smithsonian Collections Search Center