Making Sense of Data That’s Linked and Open

 The author's father working with a Liquid Scintillation Spectrometer which apparently measured chem

If you are a regular reader, or someone who works for a museum, library, or archive, you intimately understand the difficulty of managing big collections. Even if you’re not in this world, you know how hard it is to manage family photographs, a collection of email love letters, or the folder of old college papers tucked in the bottom of your closet. Multiply that by, oh, say, six thousand people (the current number of employees at the Smithsonian, which was founded in 1846), and you’ll get the level of complexity we’re dealing with at the Smithsonian Institution Archives.

Something I think about often (and also happen to be responsible for at the Archives) is making these big collections more accessible and engaging online. Because of that, I was happy to be accepted to attend the recent Linked Open Data in Libraries, Archives, and Museums Summit to get a better grasp on how we can share our information and resources not only on our websites, but with other cultural heritage institutions.

I had a vague notion of what Linked Open Data meant, mostly that it relates to this thing called the semantic web: the concept of a web of linked data that can be easily accessed and processed by machines. Still, I was looking for something more tangible, so I went back to a notable TED talk (see bottom of this post) by the semantic web guru, Tim Berners-Lee. In 1989, Berners-Lee defined the basic building blocks for the web: HTML and URLs. He sent the simple and brilliant idea in a memo to his boss at the European Particle Physics Laboratory in Geneva, Switzerland, and was given permission to work on it… which he did, and here we are today.

Recognize this cloud? It's the

In his fascinating talk, he tells us how hard it was to get people to understand the concept of the World Wide Web in the 1980s. In fact, his first demonstration (a page with a hyperlink that led to another page) was not exactly an attention-grabber. But as we increasingly go online to make phone calls, read books, and collaborate, we feel the power of his idea daily.

Now, Berners-Lee wants us to put not just our documents online, but also our data, so it can be accessed, repurposed, and understood. He explains it much better than I can, but he starts to reveal the possibilities as he defines three rules for putting data on the web:

  1. HTTP names should refer to people, places, events, products, etc., and not just to documents.
  2. When people look up those HTTP names, they should get important information back in a standardized format so it’s comprehensible and shareable.
  3. The data people get back should also define relationships to other people, events, places, and things, each of which has an HTTP name of its own.

Sounds as simple as his notion for URLs and HTML? We can only guess at how little we yet understand of where his suggestions could lead.
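To make the three rules a little more tangible, here is a minimal sketch in Python. It is only an illustration under some loud assumptions: the `example.org` addresses are made-up HTTP names, the `DATA` dictionary stands in for the web (no real requests are made), and JSON is just one possible "standardized format." It is not how Berners-Lee or any particular institution implements linked data.

```python
import json

# Rule 1: HTTP names that refer to things, not documents.
# ASSUMPTION: these example.org names are hypothetical, not real identifiers.
BERNERS_LEE = "http://example.org/person/tim-berners-lee"
WEB_PROPOSAL = "http://example.org/event/web-proposal-1989"
GENEVA = "http://example.org/place/geneva"

# Stand-in for the web: each HTTP name maps to the data a server might return.
DATA = {
    BERNERS_LEE: {"type": "Person", "name": "Tim Berners-Lee",
                  "proposed": WEB_PROPOSAL},
    WEB_PROPOSAL: {"type": "Event", "name": "Proposal for the web",
                   "year": 1989, "tookPlaceIn": GENEVA},
    GENEVA: {"type": "Place", "name": "Geneva", "country": "Switzerland"},
}


def look_up(http_name):
    """Rule 2: looking up a name returns data in a standard format (JSON here)."""
    return json.dumps(DATA[http_name], sort_keys=True)


def linked_names(http_name):
    """Rule 3: find the other HTTP names that this thing's data points to."""
    record = json.loads(look_up(http_name))
    return [value for value in record.values() if value in DATA]


def follow(start, max_hops=5):
    """Hop from one thing to the next by following the first link each time."""
    path, current = [start], start
    for _ in range(max_hops):
        links = linked_names(current)
        if not links:
            break
        current = links[0]
        path.append(current)
    return path


if __name__ == "__main__":
    for name in follow(BERNERS_LEE):
        print(name, "->", look_up(name))
```

Starting from the name for a person, a program can reach an event and then a place without anyone having written a page that connects the three. That hop from thing to thing is the part that plain documents and hyperlinks don't give a machine.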

At LOD-LAM, we were trying to figure out what this all means for cultural heritage organizations. We do have tons of data, and data on the same subject is often spread across several institutions. However, I found myself wondering what a large aggregation of data would do for our visitors. Sometimes too much is just too much.

There are some initial examples out there to wrap your head around. Check out the Civil War Data 150 project. Several institutions have records on the Civil War, but they seldom bring them together in one place. This project will aggregate data from various institutions and use that data to define a common language for things like battles, regiments, and officers. Basically, it will take all the work these individual institutions have done and create a standard vocabulary. Then, the project members are going to enlist classrooms to help tag collection objects and records with these names, which will enable these different collections to play nicely together. And since the data is open, other people and organizations will be able to understand it, ingest it, and put it out in different formats: maps, online publications, and other forms we haven’t even imagined.
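As a rough illustration of why a shared vocabulary helps, here is a small Python sketch. Everything in it is an assumption made for the example: the institutions, the records, and the `example.org` names are invented, and the matching is a simple lookup of known label variants. It is not the Civil War Data 150 project's actual data or method.

```python
from collections import defaultdict

# ASSUMPTION: a hypothetical shared vocabulary, one HTTP name per battle,
# each with the label variants that local catalogs might use.
VOCABULARY = {
    "http://example.org/battle/antietam": {
        "battle of antietam", "battle of sharpsburg", "antietam"},
    "http://example.org/battle/gettysburg": {
        "battle of gettysburg", "gettysburg"},
}

# ASSUMPTION: invented sample records from three made-up institutions.
RECORDS = [
    {"institution": "Archive A", "item": "Letter, 1862",
     "battle": "Battle of Sharpsburg"},
    {"institution": "Museum B", "item": "Photograph", "battle": "Antietam"},
    {"institution": "Library C", "item": "Map",
     "battle": "Battle of Gettysburg "},
    {"institution": "Library C", "item": "Diary",
     "battle": "Unknown skirmish"},
]


def tag(label):
    """Return the shared HTTP name for a local label, or None if nothing matches."""
    key = " ".join(label.lower().split())  # normalize case and stray spaces
    for http_name, variants in VOCABULARY.items():
        if key in variants:
            return http_name
    return None


def aggregate(records):
    """Group items from every institution under the shared name for their battle."""
    grouped = defaultdict(list)
    for record in records:
        shared_name = tag(record["battle"])
        grouped[shared_name].append((record["institution"], record["item"]))
    return dict(grouped)


if __name__ == "__main__":
    for shared_name, items in aggregate(RECORDS).items():
        print(shared_name, items)
```

Once the letter cataloged under "Battle of Sharpsburg" and the photograph cataloged under "Antietam" carry the same name, they land in the same group even though they live in different institutions. Records that match nothing end up under `None`, which is roughly where human taggers, such as those classrooms, would come in.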

If you think about all the data available in the world, the possibilities are endless. As Berners-Lee summarizes, you don’t even have to be a big player to contribute. It’s about people doing their bit to create a bigger resource (see OpenStreetMap for one example).

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.