Detailed timed observations on an aurora viewed from 9 p.m. August 3 to 2 a.m. August 9, 1872, in Holt County, Missouri, on an Aurora Borealis, 1872, Smithsonian Institution Archives, Record Unit 60, Image no. SIA2010-0731.

The Increase and Diffusion of Data

Research has been at the core of the Smithsonian’s mission from the beginning, and sharing that research through activities like publishing papers and data is still key to fulfilling that mission for the “increase and diffusion of knowledge.”

Though many are familiar with the Smithsonian’s museums and collections, research (particularly scientific research) has been a key activity of the Institution from its very beginning.

The Smithsonian’s first Secretary, Joseph Henry, was a well-respected scientist who championed the idea that the Institution should be a center of research. To support the mission for the “increase and diffusion of knowledge” he promoted the publication of scientific research and participation in the international exchange of publications. Throughout the early years of the Institution, staff participated in programs and expeditions, collecting specimens and observational data—the raw material needed to further research. In 1849, Secretary Henry himself started the Smithsonian Meteorological Project.


As part of this project, the Institution aggregated data collected by a network of volunteers from across the United States, many of whom we would now call “citizen scientists.” Their observations, such as those in the letter above detailing the appearance of an aurora in Holt County, Missouri, were aggregated with others and then formally published and distributed widely through newspapers and peer-reviewed publications.

Section about "Auroras" written in German.

Historically, the primary way to disseminate scientific information has been through the published (and later peer-reviewed) article, informally called a paper. The underlying data for a paper, such as the individual observations in the Meteorological Project letters or those recorded in field books, were never published on their own. The original “raw” data may have been copied and shared informally, but only the analyzed and synthesized conclusions from those observations would be made available through the published paper. Only recently, with the advent of mass digitization and through projects like the Smithsonian Transcription Center, have scholars been able to take advantage of the important historical data locked away in old field books.

Modern scientific communication practice has begun to fully acknowledge the separate usefulness, and sometimes the necessity, of having access to the raw data that are the basis for papers. Scientists still go into the field to take observations and make measurements manually in notebooks and in spreadsheets on laptops, but now digital data that are easier to share and reuse can come directly from sources like environmental sensors, radio collars, and drone video. Sharing these data can enable other researchers, sometimes in completely different fields, to build on the research and potentially produce new and innovative science. Providing access to the data can also ensure scientific rigor by enabling other researchers to verify and reproduce the results of experiments and analyses found in papers.

A grid background, covered in green, yellow, pink, blue, purple, and pink dots.

For many years Smithsonian researchers have been sharing their data in a variety of ways. Sometimes the data are published on department or museum websites, like the datasets from the Smithsonian Migratory Bird Center. Some funding agencies require data to be shared in repositories that specialize in particular kinds of data, like genomic data stored in the National Center for Biotechnology Information’s Sequence Read Archive (NCBI-SRA). The Smithsonian’s Office of Research Computing also provides centralized support for use of general-purpose repositories like Smithsonian’s Figshare for Institutions. This general-purpose repository enables Smithsonian researchers to provide basic access (including viewing, downloading, and citing via digital object identifier, or DOI) to a wide variety of research data, including video, tabular data, and images related to subjects like ecology, animal behavior, geology, bioinformatics, and environmental monitoring.

Just like the amazing collections held and cared for at the Smithsonian, research data are invaluable Institutional assets. The Office of Research Computing is committed to working with researchers, librarians, and archivists at the Smithsonian to improve long-term access to these and other data collected at the Institution to further our mission.

Produced by the Smithsonian Institution Archives. For copyright questions, please see the Terms of Use.