Recordkeeping Roundcasts Episode 2: Project ARCHANGEL and distributed ledger experimentation

In this episode of Recordkeeping Roundcasts, we pick up the conversation with TNA’s John Sheridan, turning to the experimentation that he and colleagues at allied institutions are doing in distributed ledger technologies.

For background, take a look at this recent post by Alex Green on the TNA blog about Project ARCHANGEL.


CF: It is a critical juncture, I think, for archives in all forms. Certainly, as you’ve articulated, the expectations around digital records, digital information and discoverability are exerting pressures on archives. I could talk about this stuff all day, but I need, tragically, to move on to the next subject, which is linked to this notion of reputation. Another project that you’ve been engaged in recently has to do with trust, and this is an area that I’ve been very interested in. Archives, I feel, have for too long relied on the legislated or culturally allocated notion that they are trusted institutions. In some cases, I think, it’s allowed a certain laziness to creep in around things like demonstrating the work that goes into maintaining authentic, reliable records.

There’s a sort of veil drawn across some of the processes, the mysteries that go on behind the walls of the archive, and that’s been to the detriment, I think, of archival institutions. So it’s terrific to see The National Archives, for example, being so open about what it’s doing. For the listeners of the podcast, this project is called ARCHANGEL, and it is breaking new ground by using blockchain technologies to allow verification of born-digital records over, really, any timespan. The data is permanently preserved using this technology, which relies on peer-to-peer distribution and consensus checking, and thus removes the need for a trusted third party. I think it offers some very exciting opportunities for archives to move from institutional trust by mandate to technologically enabled transparency of record-keeping processes. So could you briefly tell us a little about the project? I’m also interested to know whether you see it as having wider application for record-keeping solutions outside the traditional archival sphere.

JS: So it’s a fascinating juncture. The arrival of blockchain technology, I think, is important for anyone who’s interested in record-keeping. This is fundamentally a new record-keeping technology; I think it was The Economist, back in 2015, that described blockchain as “the trust machine”. A big part of our motivation is a concern about leaning solely on an institutional basis of trust. That works fine when your physical collection is effectively immutable by virtue of its size and its physical properties, but your digital collection is fundamentally changeable. In the case of born-digital records that are closed for a significant period, we’re essentially asking people to trust, over quite long periods of time, that the record is the thing we claim it to be.

Now, we are absolutely not complacent about the institutional basis of trust. We think we need to augment it with new archival practice, and blockchain is a really interesting candidate technology for how we might do that. It relates quite closely, though, to our previous conversation about context. Blockchain may provide us with the technological means of assuring, over time, that a record hasn’t been altered. Essentially, you compute a hash, running an algorithm over a record to turn it into a number, and you put that number into a blockchain. The blockchain gives you a date-time stamp for when you wrote that number onto the distributed ledger. We then imagine multiple archives all participating in curating and sustaining this archival distributed ledger over time, essentially guarding and guaranteeing each other’s collections. Writing these hashes to the ledger with a date-time stamp means that at a later date, when you have the same object, you can re-compute the hash and verify that yes, this is the same thing the archive said it had at that earlier point in time.
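To make the hash-and-timestamp mechanism concrete, here is a minimal single-node sketch in Python. It is not ARCHANGEL’s implementation: the `ToyLedger` class and its methods are hypothetical, SHA-256 is an assumed choice of algorithm, and the distribution and consensus checking across multiple archives described above are not modeled at all. Each entry also stores the hash of the previous entry, a common hash-chaining technique that makes tampering with earlier entries detectable.

```python
import hashlib
import json
from datetime import datetime, timezone


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class ToyLedger:
    """Hypothetical single-node, append-only hash chain (illustration only)."""

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def append(self, record_id: str, content: bytes) -> dict:
        prev = self.entries[-1]["entry_hash"] if self.entries else "0" * 64
        entry = {
            "record_id": record_id,
            "algorithm": "sha256",        # assumed algorithm
            "digest": sha256_hex(content),  # the "number" written to the ledger
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": prev,
        }
        # Hash the entry itself (including prev_hash) so that altering any
        # earlier entry breaks every later link in the chain.
        entry["entry_hash"] = sha256_hex(json.dumps(entry, sort_keys=True).encode())
        self.entries.append(entry)
        return entry

    def verify_record(self, record_id: str, content: bytes) -> bool:
        """Re-compute the hash of an object and check it against the ledger."""
        digest = sha256_hex(content)
        return any(
            e["record_id"] == record_id and e["digest"] == digest for e in self.entries
        )

    def verify_chain(self) -> bool:
        """Check that no stored entry has been modified since it was written."""
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "entry_hash"}
            if e["prev_hash"] != prev:
                return False
            if sha256_hex(json.dumps(body, sort_keys=True).encode()) != e["entry_hash"]:
                return False
            prev = e["entry_hash"]
        return True


if __name__ == "__main__":
    ledger = ToyLedger()
    ledger.append("REC-001", b"original bytes")
    print(ledger.verify_record("REC-001", b"original bytes"))  # True
    print(ledger.verify_record("REC-001", b"tampered bytes"))  # False
    print(ledger.verify_chain())                               # True
```

Verification is just re-computing the digest and looking it up. In a genuinely distributed ledger, the protection against an archive quietly rewriting its own entries would come from other nodes holding copies, which this sketch leaves out.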

That isn’t the only thing you need to know about a record, though. You need to know who created it, when they created it and what they were doing. So you’ve got this really interesting question of assuring the record and also trying to assure the context of the record, its archival context. There are some interesting questions about how much of the information you hold about a record you should put onto the ledger, and the point of ARCHANGEL is to research all of that: to understand, from an archival practice point of view, what information you put onto a distributed ledger. If you have a digital archive based on the Open Archival Information System (OAIS) reference model, where in your process do you write these hashes onto a ledger? Where does it fit in the context of a digital preservation system? If you do this, how long might you expect the hash to remain valid?

Are we talking about five years, or 10 years, or 20 years? What happens at the point where your hashing algorithm becomes provably breakable mathematically? How might you then move your ledger to a new, stronger form of hash? Then there are questions like: what happens when you’ve taken a content object like a video? We are preserving quite a lot of video at The National Archives in the UK, and in order to preserve it you may have re-encoded it into another format that’s easier to preserve. Well, if you just hash the sequence of bytes, you’re going to get a different hash. So are there ways in which we can create content hashes, as opposed to bitstream or bytestream hashes, to verify content objects through forms of migration?
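The difference between bytestream and content hashing can be illustrated with a deliberately simple analogy, assuming the “content” is plain text rather than video. Migrating a text record from UTF-16 to UTF-8 changes every byte, so its bytestream hash changes; hashing a normalised form of the decoded text instead survives the migration. The function names and the normalisation rules (Unicode NFC, `\n` line endings) are illustrative assumptions; content hashing for audiovisual material would need quite different, likely perceptual, techniques.

```python
import hashlib
import unicodedata


def bytestream_hash(data: bytes) -> str:
    """Hash of the exact byte sequence: changes whenever the encoding changes."""
    return hashlib.sha256(data).hexdigest()


def content_hash(data: bytes, encoding: str) -> str:
    """Hash of a normalised representation of the content, not of its bytes.

    Toy assumption: the content is text, so decode it, apply Unicode NFC
    normalisation and canonical line endings, then hash the result as UTF-8.
    """
    text = data.decode(encoding)
    text = unicodedata.normalize("NFC", text).replace("\r\n", "\n")
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The "same" record before and after a hypothetical format migration.
original = "Example record text\r\n".encode("utf-16")
migrated = "Example record text\n".encode("utf-8")

if __name__ == "__main__":
    # Bytestream hashes differ after migration ...
    print(bytestream_hash(original) == bytestream_hash(migrated))
    # ... while the content hashes still match.
    print(content_hash(original, "utf-16") == content_hash(migrated, "utf-8"))
```

A similar re-hashing step is one way the algorithm-migration question might be approached: while the old algorithm is still trusted, compute a digest of the same object with a stronger function, such as `hashlib.sha3_512`, and write it to the ledger alongside the existing entry.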

So these are all really important research questions that we’re exploring through ARCHANGEL, and it’s giving us the opportunity to try this stuff for real: roll our sleeves up and set up a distributed ledger. If anyone out there wants to try working with a distributed ledger, just get in touch, because we can potentially get you up and running really quickly as a node on the ARCHANGEL research network, our archival distributed ledger, so you can experiment with some of this for yourself. We’re asking and answering these questions to get a sense of whether this truly is a technology that we, and others, can reliably use as part of digital preservation infrastructure: one that adds that extra cryptographic assurance that the records we hold are not being altered over time, and lets you prove it cryptographically.

CF: Is it being used or experimented with in agencies across UK government? And could you see some applicability of the approaches you’re developing to the earlier engagement and intervention piece mentioned in the strategy, around identifying and registering records that are out in agency land?

JS: So, that’s a great question. The UK government is, for a lot of reasons, very interested in distributed ledger technology. In London we’re obviously at the heart of FinTech, so there’s a huge amount of work that happens around developing all kinds of novel financial technologies, and the government has funded quite a lot of research in a wide variety of contexts. It’s been great for The National Archives, because we’re joining a UK government community of practice around distributed ledger technology, where we get some visibility of how other people in government are looking at distributed ledgers for all of the use cases you’ll be familiar with: things like land and property transactions or intellectual property rights assurance, through to use cases around health data and all sorts of others where, from my perspective, I’m not altogether sure I understand how they would work.

It gives us the opportunity to swim in that sea. I have to say, the archival use case, from the ones that I see, looks like one of the strongest use cases for blockchain going. I feel that we’re really onto something for archives, because the jeopardy’s real.

So it’s a great opportunity for us to, as I say, roll our sleeves up, explore what this looks like, see what other people are doing and learn from their experiences. There’s also, I think, an opportunity for archives here. If we think about the value that an archivally assured distributed ledger might have, then collectively, as institutions, we have an unrivaled prospect of longevity. So you might imagine there actually being some incentive for archives in allowing other kinds of transaction onto an archivally assured distributed ledger, and that may be a way in which archives are incentivized to curate and sustain a distributed ledger.

So we’re very interested in whether there are some new economic models for archives through distributed ledger technology. Again, that’s from a research perspective, but we’re asking some of those questions, ’cause someone has to pay. Someone has to do the work that keeps the distributed ledger alive, and we know that archives are typically pretty poor as institutions. So is there some opportunity to trade our relatively high level of trust as actors, and our unrivaled prospect of longevity, into some form of incentive mechanism through distributed ledger technology? One that means we get some benefit ourselves, in terms of strong cryptographic assurance for the records we’re preserving, but that is also economically sustainable, because there may be a route for other people to benefit from, use or work with the distributed ledger that we’re standing up. Again, in ARCHANGEL that’s very much part and parcel of what we’re trying to research.

CF: I wonder, then, whether by having such services, economically viable and desired by government organizations, there would be opportunities for the archive to get visibility, if not registration, of high-value classes of records that perhaps otherwise wouldn’t come into the transfer process.

JS: Absolutely.

CF: So that might be another benefit.

JS: That’s exactly right. And there’s a very interesting use case, again one we’re interested in researching, around using smart contracts for timed release. I’m not suggesting that record creators are yet in a place where they’re willing to go, “Yes, let’s… ”

CF: Yes, yes, let’s automate…

JS: “Let’s automate digital transfer using… Automate it right up front, smart contracts, timed release 20 years, there you go.”

CF: Off you go, yes.

JS: We could potentially use that ourselves, where we have records that we keep closed until a point in time when they will be made available. So you might imagine the archive itself using smart contracts for that, and we’re interested in exploring timed release more generally and whether it’s something we could make work. Again, it’s research, so we can do the technical assessment. We’re also interested in doing the business assessment and finding the sweet spot where we actually understand how this technology can be viably and practically deployed in our context. That’s against a wider context where the level of trust we imagine there being in digital archives in future is significantly less, because of the skepticism that is rising and rising in any form of digital evidence, for very good reasons. Against that backdrop, and with a sense that the archive finds itself in an arms race with the forces of fakery, we’re thinking about privileging our collection with new forms of trust, and shining a light on what the opportunities here are.
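A timed-release rule is simple enough to sketch. The Python dataclass below is a hypothetical stand-in for a smart contract, not code that runs on any ledger: a real contract would typically be written in a ledger-specific language and would compare against a block timestamp rather than a date passed in as an argument. The closure arithmetic, including opening on 1 March when a closure starting on 29 February ends in a non-leap year, is an assumption for illustration and not The National Archives’ actual closure rules.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimedRelease:
    """Hypothetical stand-in for a timed-release smart contract.

    'today' is passed in explicitly so the release rule is deterministic
    and testable; a real contract would read the ledger's own clock.
    """

    record_id: str
    closed_on: date
    closure_years: int

    @property
    def opens_on(self) -> date:
        target_year = self.closed_on.year + self.closure_years
        try:
            return self.closed_on.replace(year=target_year)
        except ValueError:
            # 29 February in a non-leap target year: assume 1 March instead.
            return date(target_year, 3, 1)

    def is_open(self, today: date) -> bool:
        """The record becomes available on or after its opening date."""
        return today >= self.opens_on


if __name__ == "__main__":
    release = TimedRelease("REC-001", date(2018, 6, 1), 20)
    print(release.opens_on)                   # 2038-06-01
    print(release.is_open(date(2038, 5, 31)))  # False
    print(release.is_open(date(2038, 6, 1)))   # True
```

Making the contract immutable (`frozen=True`) mirrors the idea that the release terms are fixed once written; changing them would mean writing a new contract rather than editing the old one.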

CF: Look, it’s a fascinating project, and I’ll be keeping a close eye on it, and I know that you’ve mentioned that it will be the subject of an iPRES paper later this year, so I’m sure that many of us in the community will be watching with great interest to see what happens next.

JS: The other thing to say, we’re very much committed to working in the open, and we’re very keen to collaborate with other institutions. So if anybody wants to try being part of an archival distributed ledger as part of their research and dipping their own toes in the water, then we are well placed to be able to make that pretty easy to do. So just get in touch and we’d be really glad to have that conversation.

About John Sheridan

John Sheridan is the Digital Director at The National Archives, with overall responsibility for the organisation’s digital services and digital archiving capability. His role is to provide strategic direction, developing the people and capability needed for The National Archives to become a disruptive digital archive.

John’s academic background is in mathematics and information technology, with a degree in Mathematics and Computer Science from the University of Southampton and a Master’s Degree in Information Technology from the University of Liverpool.

Prior to his current role, John was the Head of Legislation Services at The National Archives where he led the team responsible for creating, as well as overseeing the operation of the official Gazette. John recently led, as Principal Investigator, an Arts and Humanities Research Council funded project, ‘big data for law’, exploring the application of data analytics to the statute book, winning the Halsbury Legal Award for Innovation.

John has a strong interest in the web and data standards and is a former co-chair of the W3C e-Government Interest Group. He serves on the UK Government’s Data Leaders group and Open Standards Board which sets data standards for use across government. John was an early pioneer of open data and remains active in that community.


About Cassie Findlay

Digital archivist and recordkeeping professional, co-founder of the Recordkeeping Roundtable. @CassPF on Twitter.
