The Bigger Picture: Visual Archives and the Smithsonian
Five Tips for Designing Preservable Websites
Here at the Smithsonian Institution Archives, we take pride in preserving the Institution’s history, including its sizable web presence. While various offices at the Smithsonian create and back up the contents of their websites, the Archives also crawls each website using Heritrix, an open-source tool created by the Internet Archive, to capture content in an archival format. Our aim is to preserve the ABCs of digital objects: appearance, behavior, and content. We take care to tailor crawl configurations to each specific website to capture as much of its ABCs as possible while adhering to our collections policy. Sometimes, though, the structure of the site itself makes a perfect crawl difficult or impossible.
Based on our experience, and because the preservation process of a digital object starts at its creation, here are some suggestions for web developers that can help ensure that the websites they create and maintain will be easier to crawl, can remain accessible, and will be preserved.
1. Follow accessibility standards
Adhering to accessibility standards renders your site usable by everyone and accessible by more devices, including Heritrix and the Wayback Machine. Here are some useful resources:
- the W3C’s Web Accessibility Initiative (WAI)
- a Best Practices guide from the University of Illinois’s Center for Information Technology and Web Accessibility
- description of and standards for Section 508, a law that requires federal agencies’ electronic and information technology to be accessible to people with disabilities
2. Avoid proprietary formats for important content or provide alternate versions
There’s no assurance that proprietary formats used in web design will stick around in the long run. If the software manufacturer retires the product or closes, it will be much harder in the future for archives and libraries to display the digital object, since they’ll need to obtain a copy of software that might be old, rare, or difficult to implement. Instead, stick to open standards like HTML and CSS. If you decide to use Flash, offer a text-only version, too, and strive to provide equal content and experience.
3. Maintain stable URLs and redirect when necessary
Avoid linkrot! Linkrot is the tendency of links on the internet to point to resources that are no longer available. Carefully plan and implement a URL design scheme with a policy of persistence. In our test crawls, we’ve come across websites on which as many as 40% of the links are broken. When updating a website, be sure to provide redirects for relocated documents. Your users will appreciate having continued access to the information. And in the same vein...
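As a rough illustration of the redirect advice above, here is a minimal Python sketch of a redirect table for relocated documents. The function name `resolve_redirect`, the example paths, and the `max_hops` limit are all hypothetical and not part of any Smithsonian tool. The idea is to map each old URL path to its new location, follow chains of moves to the final address, and refuse to loop forever if two entries point at each other.

```python
def resolve_redirect(path, redirects, max_hops=10):
    """Follow old-to-new path mappings until a path with no redirect is reached.

    `redirects` is a plain dict of {old_path: new_path} (a hypothetical format).
    Raises ValueError on a redirect loop or an overly long chain.
    """
    seen = {path}
    for _ in range(max_hops):
        target = redirects.get(path)
        if target is None:
            # No further move recorded: this is the current location.
            return path
        if target in seen:
            raise ValueError(f"redirect loop at {target}")
        seen.add(target)
        path = target
    raise ValueError("too many redirects")


# Hypothetical example: a report that has been moved twice.
redirects = {
    "/reports/2009/annual.html": "/publications/annual-2009.html",
    "/publications/annual-2009.html": "/about/annual-reports/2009/",
}

final = resolve_redirect("/reports/2009/annual.html", redirects)
print(final)  # /about/annual-reports/2009/
```

In practice, a web server would typically answer a request for an old path with a permanent (301) redirect to the resolved location. Pointing old paths directly at the final address, rather than through a chain, keeps the number of requests down for both readers and crawlers.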
4. Design navigation carefully and include a sitemap
Our crawler is usually set to six “hops”, which means it will grab content six links away from a given seed URL. We won’t capture pages buried more than six levels deep. To help the crawler (and your readers) discover your entire website, provide a sitemap. For large collections of documents which may be listed over several or many pages, provide a “view all” link, too.
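To see why a sitemap helps with a hop limit, the following Python sketch counts hops from a seed page using a breadth-first search over a made-up link graph. It is only an illustration of the idea, not how Heritrix itself is implemented; the function names, the `links` dictionary, and the page paths are assumptions invented for the example.

```python
from collections import deque


def hop_counts(links, seed):
    """Return {page: number of link hops from seed} using breadth-first search."""
    depth = {seed: 0}
    queue = deque([seed])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


def beyond_limit(links, seed, max_hops=6):
    """List reachable pages that sit more than max_hops links from the seed."""
    depth = hop_counts(links, seed)
    return sorted(page for page, d in depth.items() if d > max_hops)


# Hypothetical site: a listing split over eight pages, each linking only to the next.
links = {"/": ["/p1"]}
for i in range(1, 8):
    links[f"/p{i}"] = [f"/p{i + 1}"]

print(beyond_limit(links, "/"))  # ['/p7', '/p8']

# Add a sitemap page, linked from the home page, that lists every listing page.
links["/"].append("/sitemap")
links["/sitemap"] = [f"/p{i}" for i in range(1, 9)]

print(beyond_limit(links, "/"))  # []
```

With the sitemap in place, every page in this toy site is at most two hops from the seed, so a six-hop crawl reaches all of it. A “view all” link has the same effect for a paginated listing.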
5. Allow browsing of collections, not just searching
Sometimes archived websites contain a lot of good content, but it’s not accessible through the archival interface because the search function doesn’t work offline. If your website contains a searchable collection of documents or images, make sure it’s also browsable, e.g. by arranging images by genre. This way, a crawler can at least capture content by categories — and current users can wander through the collections without having to know what they’re looking for.
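One way to make a searchable collection browsable is to generate a static page of plain links for each category. The Python sketch below shows the idea under stated assumptions: the record fields `genre` and `url`, the function names, and the sample items are hypothetical, and a real site would draw these from its own database or content management system.

```python
from collections import defaultdict
from html import escape


def build_browse_index(items):
    """Group item URLs by genre, returning {genre: sorted list of URLs}."""
    index = defaultdict(list)
    for item in items:
        index[item["genre"]].append(item["url"])
    return {genre: sorted(urls) for genre, urls in sorted(index.items())}


def render_category_page(genre, urls):
    """Render one category as simple HTML: a heading plus a list of ordinary links."""
    rows = "\n".join(
        f'<li><a href="{escape(u, quote=True)}">{escape(u)}</a></li>' for u in urls
    )
    return f"<h1>{escape(genre)}</h1>\n<ul>\n{rows}\n</ul>"


# Hypothetical image records.
items = [
    {"genre": "portraits", "url": "/img/2"},
    {"genre": "landscapes", "url": "/img/1"},
    {"genre": "portraits", "url": "/img/3"},
]

index = build_browse_index(items)
page = render_category_page("portraits", index["portraits"])
print(page)
```

Because each item is reachable through an ordinary link on a category page, a crawler can follow it without needing to submit a search form.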
Your users, present and future, will thank you for making your site more accessible and crawlable, and you’ll have the added bonus of being more discoverable to other crawlers like Google.
The ephemeral nature of the web is both a blessing and a curse. While it’s easy to produce and publish digital content, it’s just as easy to delete it or lose it. By designing with preservation in mind, you help web archivists safeguard your work for the future. It’s part of our cultural legacy!
Related post:
Archiving the Smithsonian’s presence on the Internet, By Lynda Schmitz Fuhrig, Electronic Records Archivist
Comments (3)
Thanks for providing these tips.
The advice that web site owners and developers should "Avoid proprietary formats for important content or provide alternate versions" appears sensible. However, we need to remember that open standards may fail to become widely accepted, leading to a lack of tools to render them.
Two years ago I wrote a position paper describing "An Opportunities and Risks Framework For Standards". It highlights the risk that people may feel safe in using open standards and fail to consider that those standards may become marginalised, as has been the case with W3C's SMIL standard.
I would also say that there will be times when proprietary formats will provide user benefits - the popularity of the MP3 audio format provides an illustration of this point, with the format being widely supported in consumer MP3 players.
Perhaps this point could be phrased:
2. Use mature, well-supported open standards which can deliver benefits to users of your web site today as well as minimising risks that proprietary formats and associated tools may become obsolete in the future.
Brian, thank you for your sensible input! You're quite right that user interface designers and digital content producers should keep the users in mind and utilize widely-accepted formats rather than little-used ones. Archivists must make decisions about the ingestion and preservation of these formats at their institutions, taking into account the risks associated with requiring proprietary software to read archived files saved in formats such as MP3, PDF, Flash, etc.
Note that formats and standards can come and go, though, despite once-widespread use, and so part of a preservation workflow may involve the migration or emulation of a given format (see http://www.lockss.org/lockss/How_It_Works#Format_Migration, for example).
Point 3 is crucial, especially for older sites, with regard to SEO. Sometimes those broken URLs have inbound links that can help with SEO. I would recommend using a 301 redirect to a page on your website talking about a similar topic. It also provides a better user experience in the long run. Great article about web design.
Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.