As part of its five-year strategic plan, 2017–2022, the Smithsonian committed to reaching one billion people each year through a digital-first strategy. A big part of that strategy is making our collections available through digitization, the process of transforming analog material into digital form.
Our staff prioritizes digitization for access, often in answering reference requests for researchers, and for preservation. But a lot more work—far beyond scanning, storage, and upload—goes on behind the scenes to make even one digitized document ready for our web audiences.
As the Archives’ social media manager, I often find it tempting to knock on my colleagues’ doors every time I’d like to get something digitized for a quick, one-off social media holiday post. Easy, right? Well, actually, not so much. I thought today’s blog post might be a good opportunity to give our experts a chance to patiently explain to us just how much work goes into the digitization process.
I’m going to start off with one question that might send your eyes rolling back or cause steam to flow out of your ears, but why aren’t all of our collections digitized and online?
Kira: The simplest answer, before you even get to all of the technical reasons surrounding digitization capacity, etc., is that not all of our collections are unrestricted, and we don’t necessarily own the copyright for everything we have. So even if we digitized everything, it still wouldn’t all be online.
Jessica: The process of digitization is not simply “point-and-shoot.” Creating an image is just step one of a multi-step process. Image processing, quality assurance, metadata creation (both embedded and in collections management software), and uploading to digital asset management systems all have to happen before an image is posted online for the public to view. Not to mention that there are back-end procedures in place to keep everything organized and retrievable by Archives staff. The other reason? Digitization takes a lot of resources. It takes an incredible amount of money, staffing, and specialized imaging equipment to have a streamlined digitization workflow. It is also, perhaps surprisingly, a very manual process: archival materials are too rare and fragile for automated digitization workflows, so each item must be carefully and methodically handled by digitization staff.
What are the key steps in the journey of a single reference request from staff or a researcher to a digital asset appearing on our website?
Marguerite: Aside from the research required to identify appropriate materials for a reference request, extra research is required to create the metadata for such requests. Unlike digitization projects where we systematically go through a large body of similar material, on-demand digitization requires on-demand cataloging and on-demand research. Creation and application of metadata is a crucial step that makes digitized assets discoverable and accessible.
Kira: Our digitization tracking form has eleven different statuses before something is considered complete. These statuses are: request submitted, staging, conservation (if necessary), digitizing, QA, prep for researcher (cropping, etc.), metadata (cataloging), DAMS, rights research, prep for online access, and complete.
Jessica: Ditto to what Kira said! For some extra clarification, “QA” means quality assurance, and the “prep for researcher” stage can also include making an access copy of the requested images, as the original images are often too large to send electronically, especially in cases where multiple files are requested. The “prep for online access” stage is where image files are uploaded to a variety of different web platforms (our website, BHL, etc.).
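For readers who think in code, the eleven statuses Kira lists can be sketched as a small ordered workflow. This is only an illustration, assuming the statuses are worked through in the order given and that conservation is the one optional stage; the names `Status`, `next_status`, and `remaining_steps` are hypothetical and do not describe the Archives' actual tracking form.

```python
from enum import Enum
from typing import List, Optional


class Status(Enum):
    """The eleven statuses, in the order they are listed above."""
    REQUEST_SUBMITTED = "request submitted"
    STAGING = "staging"
    CONSERVATION = "conservation"  # only if necessary
    DIGITIZING = "digitizing"
    QA = "QA"
    PREP_FOR_RESEARCHER = "prep for researcher"
    METADATA = "metadata"
    DAMS = "DAMS"
    RIGHTS_RESEARCH = "rights research"
    PREP_FOR_ONLINE_ACCESS = "prep for online access"
    COMPLETE = "complete"


# Enum members iterate in definition order, so this is the workflow order.
ORDER = list(Status)


def next_status(current: Status, needs_conservation: bool = False) -> Optional[Status]:
    """Return the status that follows `current`, or None once complete."""
    index = ORDER.index(current)
    if index == len(ORDER) - 1:
        return None
    following = ORDER[index + 1]
    # Assumption: conservation is skipped unless the item needs it.
    if following is Status.CONSERVATION and not needs_conservation:
        following = ORDER[index + 2]
    return following


def remaining_steps(current: Status, needs_conservation: bool = False) -> List[Status]:
    """List every status still ahead of `current`."""
    steps = []
    status = next_status(current, needs_conservation)
    while status is not None:
        steps.append(status)
        status = next_status(status, needs_conservation)
    return steps
```

Calling `remaining_steps(Status.PREP_FOR_RESEARCHER)` returns the five statuses that still follow that stage, from metadata through complete.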
What is something you wish your colleagues or researchers knew about the work that goes into the digitization process?
Marguerite: Image capture, once decisions are made, is just one step. Managing the digital object requires ingest of digital assets and metadata into a gauntlet of systems that do not talk to each other. Also, cataloging is an especially time-consuming process.
Kira: I wish people knew that the digitization process does not end once something is delivered to a requestor. That is often what people think because that’s the only part they care about, but all of the other steps happen after delivery and take a LOT more time to complete, and those steps have to be completed for every single image that was generated as part of the request. Currently, the same staff who digitize the material also have to do the rest of the steps, so we aren’t just spending our days waiting for something to scan.
Jessica: A lot of technical knowledge and decision-making goes into every single image we create. Lighting, positioning of the material, color calibration, and the choice of image resolution and imaging equipment are just a few of the factors to consider when making decisions to achieve the best possible reproduction.
In a dream world, we’d be able to drop everything and answer a single reference request, right? But let’s talk reality. About how long does it take to properly digitize and catalog a single photograph or document?
Jessica: This is an interesting question since we usually digitize batches of material for efficiency’s sake. A single archival folder of documents, with both the front and back sides imaged, can generate hundreds of files. The time needed to image a single document may also depend on the imaging equipment required. Imaging stationery-size paper on a flatbed scanner will not require the same setup time as capturing architectural drawings or bound volumes on an overhead imaging station. That being said, I have fulfilled reference requests of single images from beginning to end, and it can take anywhere from two hours to perhaps a day to image, edit, generate metadata, upload to our digital asset management system, and create access copies of the files for delivery to a researcher.
Marguerite: Depends on the single thing. A group portrait can take hours to catalog as we create index terms for each person, so that names are standardized with Library of Congress or local authorities.
Heidi: The worst answer for anything is “it depends,” but it really does depend. For something standard/normal/simple (whatever you want to call the “easy ones”), it still takes days to complete. There are systems in place to make sure everything is done properly. For example, a patron emails and wants [insert piece of film] scanned. Awesome. We have it, no preservation issues, no global pandemics, no rights issues, it’s something 100% we can deliver in a reference request, and we have the negative number. By following even the most basic steps it would take thirty hours. Realistically, it takes around five working days if no issues happen. This is the absolute best-case scenario for my initial answer of “it depends.” Now insert all the weird and wonky things that could happen. And with each kink, twist, turn, wave, up, down, and sideways…add time.
How hard is researching the rights and reproduction limits of a photograph or record?
Marguerite: It depends how much information is available about the “thing.” If we don’t know the creator of a thing, and it can’t definitively be credited to the Smithsonian, then we spend a bit of time researching to see if we can figure out who the creator is. If a folder or series of items that are scanned have multiple creators or creation dates, an exploration of rights has to be performed at the item level. To accurately determine copyright of an item, we have to be able to pinpoint the date and creator, and if that information isn’t readily available, we have to do extra research.
What’s important to know about adding metadata and properly storing files?
Jessica: Thankfully a lot of metadata can be generated as part of a batch process and uploaded to our collections management system in bulk. However, verification that the batch process worked correctly is still required, and when you multiply that by hundreds or thousands of image files, working with metadata comprises a large fraction of my job.
It’s also tempting when doing bulk upload of metadata to provide as much information as possible, but it’s important to strike a balance between being complete in your descriptive information and utilizing your time as wisely as possible. This may include making certain metadata fields required for each image but leaving others as optional since a user will still be able to locate an item during a search using the metadata that already exists.
Every image file created as part of a reference request or long-term digitization project follows the same file naming convention. Each file contains a unique identification number that is searchable in our internal collections management system. Once all work on an image is complete (including metadata generation and placement in our digital asset management system), all duplicate or working files are deleted to save precious storage space on our network servers and prevent redundancy.
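The verification step Jessica describes, checking that a bulk metadata upload worked while keeping some fields required and others optional, might look something like the sketch below. It is a minimal illustration only: the field names in `REQUIRED_FIELDS` and the functions `missing_required` and `verify_batch` are hypothetical, and nothing here reflects the Archives' real collections management system.

```python
from typing import Dict, List

# Hypothetical field names, chosen only for illustration.
REQUIRED_FIELDS = ("identifier", "title", "rights")


def missing_required(record: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank in one record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def verify_batch(records: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each incomplete record to its missing required fields.

    Records are keyed by their identifier when they have one, otherwise
    by their position in the batch. Optional fields are never flagged.
    """
    problems = {}
    for position, record in enumerate(records):
        missing = missing_required(record)
        if missing:
            key = record.get("identifier") or "row {}".format(position)
            problems[key] = missing
    return problems
```

An empty result means every record in the batch has its required fields; anything else is a short list of items to fix by hand, which is far quicker than reviewing hundreds of records one at a time.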
Marguerite: There are some materials that can be cataloged in bulk, and metadata can be repurposed from the finding aids. Some materials, specifically photographs, require greater care. Identifying people, places, events, or artifacts that appear in a photograph has tremendous research value. This information is not usually included in the finding aid.
Why is it important to carry out all of these steps of the digitization process?
Heidi: Without all the steps you just have a digital file. And that digital file has little to no descriptive/searchable information. So, by following all the steps of digitization that file now can be found in multiple ways. It can be managed. It also means we aren’t repeating work, over, and over, and over, and over again. By having the mentality that we do it right now, we up the life expectancy of not just the digital file but the physical thing too.
Marguerite: If all of our material was magically digitized tomorrow, it would be impossible to find anything without metadata. Standards-based digitization and metadata practices allow us to effectively manage digital objects in the same way we manage physical objects.
Jessica: To put it frankly, there isn’t much point in digitizing something only to discover it wasn’t done according to an organization’s best practices, and then have to re-do it. That’s why the quality assurance step is so crucial. Another reason to complete the seemingly administrative steps of adding metadata, catalog information, and uploading to a digital asset management system is to allow the image files to be discoverable in the near future and long-term. To not be able to search for or have descriptive information about an image file is essentially equivalent to losing that image file. This could lead to the unnecessary repetition of digitization work.