
How Archival Research Morphs in the Digital Age

August 21, 2019
The Media Wall at The Lewis and Ruth Sherman Centre for Digital Scholarship in the Mills Memorial Library at McMaster University. (Photo: Ron Scheffler/McMaster University)

Our society’s historical record is undergoing a dramatic transformation.

Think of all the information you create today that will become part of tomorrow’s record. More than half of the world’s population is online and may be doing at least some of the following: communicating by email, sharing thoughts on Twitter or other social media, or publishing on the web.

Governments and institutions are no different. The U.S. National Archives and Records Administration, responsible for the American government’s official records, “will no longer take records in paper form after December 31, 2022.”

This article by Ian Milligan originally appeared at The Conversation, a Social Science Space partner site, under the title “Historians’ archival research looks quite different in the digital age.”

In Canada, under Library and Archives Canada’s Digital by 2017 plan, records are now preserved in the format in which they were created: a Word document or an email, for example, will become part of our historical record as a digital object.

Traditionally, exploring archives largely meant physically collecting, searching and reviewing paper records. Today, and increasingly in the future, consulting archival documents means reading them on a screen.

This brings with it opportunity — imagine being able to search for keywords across millions of documents, leading to radically faster search times — but also challenge, as the number of electronic documents increases exponentially.

As I’ve argued in my recent book History in the Age of Abundance, digitized sources present extraordinary opportunities as well as daunting challenges for historians. Universities will need to rethink how they train historians, whether through history programs or through newly emerging interdisciplinary programs in the digital humanities.

The ever-growing scale and scope of digital records poses technical challenges: historians need new skills to plumb these records for meaning, trends, voices and other currents, and to piece together an understanding of what happened in the past.

There are also ethical challenges which, although not new to the field of history, now deserve particular attention and scrutiny.

Historians have long relied on librarians and archivists to bring order to information. Part of their work has involved ethical choices about what to preserve, curate, catalogue and display, and how to do so. Today, many digital sources are at our fingertips, albeit in a raw, often uncatalogued format. Historians are entering uncharted territory.

Digital abundance

Traditionally, as the late, great American historian Roy Rosenzweig of George Mason University argued, historians operated in a scarcity-based economy: we wished we had more information about the past. Today, the hundreds of billions of web pages preserved by the Internet Archive alone represent more archival information than scholars have ever had access to. People who would never before have been included in archives are now part of these collections.

Take web archiving, for example, which is the preservation of websites for future use. Since 2005, Library and Archives Canada’s web archiving program has collected over 36 terabytes of information, comprising more than 800 million items.

Even historians who study the Middle Ages or the 19th century are being affected by this dramatic transformation. They now frequently consult records that began life as traditional parchment or paper but were subsequently digitized.

Historians’ digital literacy

Our research team at the University of Waterloo and York University, collaborating on the Archives Unleashed Project, uses sources like the GeoCities.com web archive. This is a collection of websites published by users between 1994 and 2009. We have some 186 million web pages to use, created by seven million users.

Our traditional approaches to examining historical sources simply won’t work at the scale of hundreds of millions of documents created by a single website. We can’t read page by page, nor can we simply count keywords or outsource our intellectual labour to a search engine like Google.

First, as historians examining these archives, we need a fundamental understanding of how records were produced, preserved and accessed. Such questions and modes of analysis are continuous with historians’ traditional training: Why were these records created? Who created or preserved them? And what wasn’t preserved?

Second, historians who confront such voluminous data need to develop more contemporary skills to process it. These skills range from knowing how to capture images of documents and make them searchable using optical character recognition (OCR), to being able not only to count how often given terms appear, but also to see the contexts in which they appear and how concepts begin to appear alongside other concepts.
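To make the difference between counting terms and reading them in context concrete, here is a minimal Python sketch using only the standard library. The function names and the three sample “pages” are hypothetical illustrations, not code from the Archives Unleashed Project, and the naive tokenizer is an assumption that a real archive of messy web pages would quickly outgrow. It shows three levels of analysis: a raw term count, keyword-in-context (KWIC) lines, and a count of which words appear near the term.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into simple word tokens (a naive, assumed tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def count_term(docs, term):
    """Counting: how often the term appears across all documents."""
    return sum(tokenize(doc).count(term) for doc in docs)


def kwic(docs, term, window=3):
    """Context: keyword-in-context lines with `window` words on each side of every hit."""
    lines = []
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == term:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                lines.append(f"{left} [{tok}] {right}")
    return lines


def cooccurrences(docs, term, window=3):
    """Association: how often each other word appears within `window` tokens of the term."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == term:
                counts.update(tokens[max(0, i - window):i])
                counts.update(tokens[i + 1:i + 1 + window])
    return counts


if __name__ == "__main__":
    # Hypothetical sample "pages"; a real web archive holds millions of them.
    pages = [
        "Welcome to my homepage about the war and my family history.",
        "My grandfather fought in the war and never spoke about it.",
        "Join the war game club every Friday night!",
    ]
    print(count_term(pages, "war"))  # 3
    for line in kwic(pages, "war"):
        print(line)
    print(cooccurrences(pages, "war").most_common(2))  # [('the', 3), ('and', 2)]
```

A count of 3 says only that the word occurs; the context lines reveal that one of the three uses concerns a game club rather than a conflict. The co-occurrence counts are topped by function words such as “the”, which is why real analyses typically filter out such stopwords before looking for meaningful associations.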

You might be interested in finding the “Johnson” in “Boris Johnson,” but not the one in “Johnson & Johnson Company.” Simply searching for “Johnson” will return many misleading results; keyword searching won’t get you there. Emerging research in natural language processing, however, might!
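A crude way to see why context matters is a rule that inspects the words around each hit. The Python sketch below is a hypothetical, hand-written heuristic, not a natural language processing model and not any particular project’s method: it labels a “Johnson” as the company when it is part of “Johnson & Johnson”, as the politician when it directly follows “Boris”, and leaves everything else as unknown.

```python
import re

# Hypothetical, hand-written context rules; a trained named-entity
# recognition model would learn cues like these from annotated text instead.
COMPANY = re.compile(r"\bJohnson\s*&\s*Johnson\b")
PERSON = re.compile(r"\bBoris\s+Johnson\b")
KEYWORD = re.compile(r"\bJohnson\b")


def keyword_hits(text):
    """Plain keyword search: every 'Johnson' counts, whatever it refers to."""
    return len(KEYWORD.findall(text))


def classify_johnsons(text):
    """Label each 'Johnson' as 'company', 'person' or 'unknown' from its surrounding words."""
    # Character spans covered by the company name, so both halves of
    # 'Johnson & Johnson' are attributed to the company.
    company_spans = [m.span() for m in COMPANY.finditer(text)]
    # Start positions of 'Johnson' when it directly follows 'Boris'.
    person_starts = {m.end() - len("Johnson") for m in PERSON.finditer(text)}
    labels = []
    for m in KEYWORD.finditer(text):
        if any(start <= m.start() < end for start, end in company_spans):
            labels.append("company")
        elif m.start() in person_starts:
            labels.append("person")
        else:
            labels.append("unknown")
    return labels


if __name__ == "__main__":
    sample = "Boris Johnson spoke about Johnson & Johnson Company products."
    print(keyword_hits(sample))       # 3
    print(classify_johnsons(sample))  # ['person', 'company', 'company']
```

Keyword search reports three hits for the sample sentence, while the rules separate one person from the two halves of the company name. Rules like these break as soon as the phrasing changes (“PM Johnson”, “J&J”), which is why natural language processing research relies instead on techniques such as named-entity recognition trained on annotated text.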

Historians need to develop basic algorithmic and data fluency. They don’t need to be programmers, but they do need to think about how code and data operate, how digital objects are created and stored, and the role humans play at every stage.

Deepfakes vs. history

As historical work is increasingly defined by digital records, historians can contribute to critical conversations around the role of algorithms and truth in the digital age. While both tech companies and some scholars have advanced the idea that technology and the internet will strengthen democratic participation, historical research can help uncover the impact of socio-economic power throughout the history of communications and media. Historians can also help amateurs navigate the sea of historical information and sources now on the web.

One of the defining skills of a historian is an understanding of historical context. Historians instinctively read documents, whether newspaper columns, government reports or tweets, and contextualise them in terms not only of who wrote them, but also of their environment, culture and time period.

As societies lose their physical paper trails and increasingly rely on digital information, historians, and their grasp of context, will become more important than ever.

As deepfakes (products of artificial intelligence that can alter images or video clips) increase in popularity online, both our media environment and our historical record will increasingly be full of misinformation.

Western societies’ traditional archives — such as those held by Library and Archives Canada or the National Archives and Records Administration — contain (and have always contained) misinformation, misrepresentation and biased worldviews, among other flaws.

Historians are specialists in reading documents critically and then seeking to corroborate them. They synthesise their findings with a broad array of additional sources and voices. Historians connect individual findings to the bigger picture, which helps us understand today’s world.

The work of a historian might look very different in the 21st century, with more time spent exploring databases and parsing data, but applying the fundamental skills of seeking context and accumulating knowledge will serve both society and historians themselves well in the digital age.

Ian Milligan is an associate professor in the Department of History at the University of Waterloo and is the principal investigator of the Web Archives for Historical Research group. His primary research focus is on how historians can use web archives. He teaches courses in historical methodology, postwar Canada, and digital history, and supervises graduate students in diverse areas including postwar Canadian history, video games, and childhood studies. In 2016, Milligan was awarded the Canadian Society for Digital Humanities Outstanding Early Career Award, and he holds an Ontario Early Researcher Award.
