Smithsonian Institution Archives

The Bigger Picture: Visual Archives and the Smithsonian

Five Tips for Designing Preservable Websites

by Robin C. Davis, Intern, on August 2, 2011

 A few of the Smithsonian’s many websites.

Here at the Smithsonian Institution Archives, we take pride in preserving the Institution’s history, including its sizable web presence. While various offices at the Smithsonian create and back up the contents of their websites, the Archives also crawls each website using Heritrix, an open-source tool created by the Internet Archive, to capture content in an archival format. Our aim is to preserve the ABCs of digital objects: appearance, behavior, and content. We take care to tailor crawl configurations to each specific website to capture as much of its ABCs as possible while adhering to our collections policy. Sometimes, though, the structure of the site itself makes a perfect crawl difficult or impossible.

Based on our experience, and because the preservation of a digital object starts at its creation, here are some suggestions for web developers that can help ensure that the websites they create and maintain are easier to crawl, remain accessible, and can be preserved.

1. Follow accessibility standards
Adhering to accessibility standards makes your site usable by everyone and accessible to more devices and software, including the Heritrix crawler and the Wayback Machine. Here are some useful resources:

  • the W3C’s Web Accessibility Initiative (WAI)
  • a Best Practices guide from the University of Illinois’s Center for Information Technology and Web Accessibility
  • a description of and standards for Section 508, a law that requires federal agencies’ electronic and information technology to be accessible to people with disabilities
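One of the easiest parts of these guidelines to check by machine is whether every image has a text alternative. As a minimal sketch, assuming your pages are available as HTML strings, the Python below uses only the standard-library html.parser module to list <img> tags that have no alt attribute at all. It is a single illustrative check (the names MissingAltFinder and images_missing_alt are hypothetical), not a substitute for the resources listed above.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us, and its
        # default handle_startendtag routes <img ... /> here as well.
        if tag != "img":
            return
        attributes = dict(attrs)
        # alt="" is allowed: it marks a purely decorative image.
        # Only a completely absent alt attribute is reported.
        if "alt" not in attributes:
            self.missing.append(attributes.get("src") or "(no src)")


def images_missing_alt(html_text):
    """Return the src values of images that lack an alt attribute."""
    finder = MissingAltFinder()
    finder.feed(html_text)
    finder.close()
    return finder.missing


if __name__ == "__main__":
    page = '<img src="logo.png" alt="Logo"><img src="chart.png"><img src="spacer.gif" alt="">'
    print(images_missing_alt(page))  # ['chart.png']
```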

2. Avoid proprietary formats for important content or provide alternate versions
There’s no assurance that proprietary formats used in web design will stick around in the long run. If the software manufacturer retires the product or goes out of business, it will be much harder in the future for archives and libraries to display the digital object, since they’ll need to obtain a copy of software that might be old, rare, or difficult to implement. Instead, stick to open standards like HTML and CSS. If you decide to use Flash, offer a text-only version too, and strive to provide equivalent content and experience.

Great example of Flash and HTML versions of a site: “Native Words, Native Warriors” from the National Museum of the American Indian, in Flash (left) and HTML (right).

3. Maintain stable URLs and redirect when necessary
Avoid linkrot! Linkrot is the tendency of links on the internet to point to resources that are no longer available. Carefully plan and implement a URL design scheme with a policy of persistence. In our test crawls, we’ve come across websites on which as many as 40% of the links are broken. When updating a website, be sure to provide redirects for relocated documents. Your users will appreciate having continued access to the information. And in the same vein...
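A redirect policy can start as a simple table mapping old paths to new ones. The Python below is a minimal, framework-agnostic sketch under that assumption: the paths and the function names flatten_redirects and respond are hypothetical, and on a real site the same table would live in the web server or content management system configuration. Retired URLs answer with a permanent (301) redirect, chains are collapsed so each old URL needs only one hop, and anything else falls through to a 404, which is what a crawler records as a broken link.

```python
def flatten_redirects(redirects):
    """Point every old path straight at its final destination.

    A chain such as A -> B -> C becomes A -> C and B -> C, so visitors
    and crawlers need only one redirect. A cycle raises ValueError.
    """
    flat = {}
    for old, target in redirects.items():
        seen = {old}
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {old!r}")
            seen.add(target)
            target = redirects[target]
        flat[old] = target
    return flat


def respond(path, redirects, live_pages):
    """Return (HTTP status, Location header or None) for a requested path."""
    if path in live_pages:
        return 200, None
    if path in redirects:
        return 301, redirects[path]  # 301 Moved Permanently
    return 404, None  # a broken link, from the crawler's point of view


if __name__ == "__main__":
    # Hypothetical history: one page moved twice across two redesigns.
    moves = {
        "/exhibits/flight.html": "/exhibitions/flight/",
        "/exhibitions/flight/": "/exhibitions/air-and-space/",
    }
    table = flatten_redirects(moves)
    live = {"/exhibitions/air-and-space/"}
    print(respond("/exhibits/flight.html", table, live))
    # (301, '/exhibitions/air-and-space/')
```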

4. Design navigation carefully and include a sitemap
Our crawler is usually set to six “hops”, which means it will grab content up to six links away from a given seed URL. We won’t capture pages buried more than six levels deep. To help the crawler (and your readers) discover your entire website, provide a sitemap. For large collections of documents that are listed across many pages, provide a “view all” link, too.

Great example of a sitemap: Smithsonian Education.
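The hop limit can be pictured as a breadth-first search that stops following links at a fixed depth. The Python toy model below assumes a hypothetical site whose pages are nested one link apart and uses max_hops=6 to mirror the setting described above; it is a simplification, not Heritrix’s actual scoping logic. It shows the two deepest pages being missed, and then shows a single sitemap page, linked from the home page, bringing every page within two hops of the seed.

```python
from collections import deque


def crawl(links, seed, max_hops=6):
    """Breadth-first crawl that does not follow links past max_hops.

    links maps each page to the pages it links to. Returns a dict of
    every page reached and the number of hops it took to reach it.
    """
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        if hops[page] == max_hops:
            continue  # captured, but its outgoing links are not followed
        for target in links.get(page, []):
            if target not in hops:
                hops[target] = hops[page] + 1
                queue.append(target)
    return hops


def nested_site(levels):
    """Hypothetical site: home -> level1 -> level2 -> ... -> level<levels>."""
    pages = ["home"] + [f"level{i}" for i in range(1, levels + 1)]
    links = {a: [b] for a, b in zip(pages, pages[1:])}
    return links, pages


if __name__ == "__main__":
    links, pages = nested_site(8)
    print("level7" in crawl(links, "home"))  # False: seven hops from the seed

    # Add a sitemap page, linked from home, that links to every page.
    links["home"].append("sitemap")
    links["sitemap"] = pages[1:]
    with_sitemap = crawl(links, "home")
    print(max(with_sitemap.values()))  # 2
```

The same reasoning applies to a long, paginated list of documents: each “next page” link adds a hop, while a “view all” page puts every item a single hop from the list.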

5. Allow browsing of collections, not just searching
Sometimes archived websites contain a lot of good content, but it’s not accessible through the archival interface because the search function, which depends on the live site’s server, doesn’t work in the archived copy. If your website contains a searchable collection of documents or images, make sure it’s also browsable, e.g., by arranging images by genre. This way, a crawler can at least capture content by category, and current users can wander through the collections without having to know what they’re looking for.

Great example of categories for browsing: Smithsonian Collections Search Center.
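Browse pages of this kind can be generated from the same records that feed a site’s search index. As a minimal sketch, assuming each record is a Python dict with hypothetical title, url, and genre fields, the code below writes one static HTML page per genre plus an index page; the function name build_browse_pages and the /browse/ paths are illustrative, and the markup is deliberately bare.

```python
from collections import defaultdict
from html import escape


def build_browse_pages(items):
    """Group records by genre and render plain HTML browse pages.

    Returns {page_path: html}. Every record is reachable through ordinary
    <a href> links, so a crawler can reach it without running a search.
    """
    by_genre = defaultdict(list)
    for item in items:
        by_genre[item["genre"]].append(item)

    pages = {}
    index_entries = []
    for genre in sorted(by_genre):
        # Naive slug for illustration; a real site needs a stricter rule.
        path = "/browse/" + genre.lower().replace(" ", "-") + ".html"
        index_entries.append(f'<li><a href="{escape(path)}">{escape(genre)}</a></li>')
        records = sorted(by_genre[genre], key=lambda record: record["title"])
        entries = "".join(
            f'<li><a href="{escape(r["url"])}">{escape(r["title"])}</a></li>'
            for r in records
        )
        pages[path] = f"<h1>{escape(genre)}</h1><ul>{entries}</ul>"

    pages["/browse/index.html"] = (
        "<h1>Browse by genre</h1><ul>" + "".join(index_entries) + "</ul>"
    )
    return pages
```

If the home page links to /browse/index.html, every record sits three hops from the seed (home, index, genre page, record), well within a six-hop crawl.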

Your users, present and future, will thank you for making your site more accessible and crawlable, and you’ll have the added bonus of being more discoverable by other crawlers, such as Google’s.

The ephemeral nature of the web is both a blessing and a curse. While it’s easy to produce and publish digital content, it’s just as easy to delete it or lose it. By designing with preservation in mind, you help web archivists safeguard your work for the future. It’s part of our cultural legacy!

Related post:
Archiving the Smithsonian’s presence on the Internet, by Lynda Schmitz Fuhrig, Electronic Records Archivist


Categories: What Gets Saved
Tags: Web/Tech, Digitization, Conservation

Comments (3)

Brian Kelly

Thanks for providing these tips.

The advice that web site owners and developers should "Avoid proprietary formats for important content or provide alternate versions" appears sensible. However we need to remember that open standards may fail to become widely accepted, leading to a lack of tools to render open standards.

Two years ago I wrote a position paper which described "An Opportunities and Risks Framework For Standards", which highlights the risk that people may feel safe in using open standards and fail to assess the risk that the open standards may become marginalised - as has been the case with W3C's SMIL standard.

I would also say that there will be times when proprietary formats will provide user benefits - the popularity of the MP3 audio format provides an illustration of this point, with the format being widely supported in consumer MP3 players.

Perhaps this point could be phrased:

2. Use mature, well-supported open standards which can deliver benefits to users of your web site today as well as minimising risks that proprietary formats and associated tools may become obsolete in the future.

Brian Kelly February 8, 2012 at 4:49 am
Robin Davis

Brian, thank you for your sensible input! You're quite right that user interface designers and digital content producers should keep the users in mind and utilize widely-accepted formats rather than little-used ones. Archivists must make decisions about the ingestion and preservation of these formats at their institutions, taking into account the risks associated with requiring proprietary software to read archived files saved in formats such as MP3, PDF, Flash, etc.

Note that formats and standards can come and go, though, despite once-widespread use, and so part of a preservation workflow may involve the migration or emulation of a given format (see http://www.lockss.org/lockss/How_It_Works#Format_Migration, for example).

Robin Davis February 15, 2012 at 6:31 pm
Web Design Brisbane

Point 3 is crucial, especially for older sites, with regard to SEO. Sometimes those broken URLs have inbound links that can help with SEO. I would recommend using a 301 redirect to a page on your website talking about a similar topic. It also provides a better user experience in the long run. Great article about web design.

Web Design Brisbane April 9, 2012 at 7:04 pm


Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.
