Report on ‘Hacking the archives: Archival description in an online world’

How do we ensure that “meaning as well as content lies at the end of the road to discovery?”

On Wednesday August 24 we got together to discuss how we can ensure that “meaning as well as content lies at the end of the road to discovery”. Our speakers were Chris Hurley, Tim Sherratt and Richard Lehane.

Recordkeepers strive to contextualise, authenticate and preserve evidence. They create detailed descriptive tools and infrastructures to describe and manage records and to facilitate their access and use through time.

But are recordkeepers losing the battle to translate the meaning and value of this skill to an online world? As Chris Hurley asks, in the rapidly expanding information universe, are carefully contextualised archival collections at risk of ‘becoming just another quarry for digitised content, often indistinguishable, depending on how it has been googled, from other information resources available on the net’?

L-R: Kate Cumming (MC), Tim Sherratt, Richard Lehane, Chris Hurley

It was a brilliant evening, with three unique but complementary perspectives from our speakers. As we usually do, we recorded the speakers for a podcast but *very annoyingly* the recording device ran out of memory 10 minutes in, something we only discovered when it was too late. Sorry, all. We are looking at upgrading our equipment.

We do, however, have a copy of Chris’s presentation: Hurley Description in an online world (PDF, 260 KB), and you can check out Tim’s work here: http://discontents.com.au/. State Records NSW’s Open Data project and API, which Richard spoke about, are here: http://data.records.nsw.gov.au

I also live-tweeted the evening’s proceedings as @RkRoundtable. Here’s the tweetstream:

3:57:12	Looking forward to seeing @wragge @richardlehane and Chris Hurley speaking about better online #archives access, context & use tonight!
6:43:44	Hi! We’ll be live tweeting #Hacking the #archives: Archival description in an online world from 5.30pm Sydney time http://t.co/nElNagH
7:39:20	Getting ready for #Hacking the #archives… standby
7:40:44	Introductions from @kateandthegirlz – our speakers are @wragge @richardlehane and Chris Hurley – an impressive line up
7:42:44	. @kateandthegirlz giving a plug for our series system workshop in October.
7:45:53	First speaker Chris Hurley “Things have to change in order to stay the same” (The Leopard). What do we need to change to honour our mission
7:46:39	Hurley: What happens in an archive? Reading or searching? The beauty of the rummage
7:47:17	Hurley: The doer, the document and the deed – the 3 entities around which all our descriptive practices are built #archives
7:48:22	Hurley explaining the scalability of our descriptive practices
7:50:03	Hurley: Structuration “the interrelation of parts in an organised whole” Even if users don’t care about it (which they don’t) is what we do
7:50:42	Hurley: Structuration gives meaning, authentication and authority. “What’s your source for that?” #archives
7:51:04	Hurley: Archivists manage meaning through time. #archives
7:52:21	Hurley: Our search tools encompass our context – we think it’s “good for the users” to go through this pathway (cf – it’s not) #archives
7:52:54	Hurley: The custodial model has to change. Banish the little truck! #archives
7:55:39	Hurley: We archivists must come to terms with digital / digitised records and data reuse. Redefine our services & efforts #archives
7:56:35	Hurley: You can’t go online without significantly improving your description. Reading room standards won’t do #archives
7:57:16	Hurley: Mergers of #archives with libraries and other memory institutions are happening. We need to acknowledge.
7:58:17	Hurley: We need a collective approach so we can look at records from various archival institutions in one place, with one search #archives
8:00:11	Hurley: The internet removes the power of the information provider to control the narrative. Users can now fashion their own approaches
8:01:37	Hurley: The Australian Info Commissioner wants better records management for better use/reuse PSI. That’s our stuff, postcustodial model
8:02:10	Hurley: The UK National Archives are the lead agency on government data use / reuse. Not 30 years from now, but now #archives
8:03:23	Hurley: Responsible agencies can digitise records more quickly and at no expense to the #archives – let them do it!
8:04:41	Hurley: Gateways (Digests) – contextual information that sits above the archival documentation program of the archive. A super context.
8:05:41	Hurley: We should be reference sites. Our data is reusable and can become authority for others to use for their work #archives
8:06:47	Hurley: All of these things are not about getting our hands on ‘stuff’ – is about adding value in a postcustodial / online world #archives
8:07:45	Hurley: An archivist’s gotta do what an archivist’s gotta do. Get over it. #archives
8:09:01	Hurley: Work with recordkeeping. Get involved in describing records at creation (CF: It’s the continuum, stupid!)
8:10:58	Gosh he’s good #Hurley #legend #archives
8:11:40	Now it’s @richardlehane – apologising to the podcast listeners because most of his talk will be demo based
8:13:45	. @richardlehane the API is a thin layer that sits above our archival control systems & offers a new way into our data #archives
8:15:14	. @richardlehane is against names for search interfaces. It’s just a search, why do you need to know the name? Is it so special? #archives
8:16:57	.@richardlehane taking us through using Archives Investigator. On starting, you are presented with lots of info – still no search! #archives
8:18:37	Now @richardlehane shows us search results using typical #archives search tools require you to understand archival descriptive models
8:20:00	. @richardlehane – if we focus on the deed, the doer and the document for search and present results that way, is much better for users
8:21:45	. @richardlehane – users should be presented with options to refine their search – eg dates, series (archival model conveyed subliminally)
8:23:12	. @richardlehane – we have also worked to make our URLs as precise as possible #archives
8:24:19	Relationships! (DRINK) we want them up front to allow fossicking by our users #archives
8:25:57	Commenting and tagging allows users to add information to our catalogue (but to a separate dbase, archivists, don’t freak out) #archives
8:26:29	By the way, this famous api is here: api.records.nsw.gov.au #archives (Have a play!)
8:27:18	And this is the open data website: data.records.nsw.gov.au #archives
8:29:20	. @richardlehane is now showing his Mashing the Ministries timeline using Ministry entity data + Wikipedia + Trove data #archives
8:30:16	It’s one thing to release data but another to allow ppl to easily use it to connect with other stuff – this is what api does @richardlehane
8:31:58	.@richardlehane You can get archival data in various formats for diff uses eg XML, JSON – interoperability for agencies, users #archives
8:33:11	The api supports EAC CPF standard for agency descriptions and protocols like open #archives ingest – so ppl like Trove can use our stuff
8:33:39	This all makes federated search much easier (yay) #archives
8:35:36	@kateandthegirlz doing a little recap before introducing @wragge – he’s into abundance and how to grapple with it
8:38:02	.@wragge starts by talking about a project he’s doing, ‘Invisible Australians’ – about the records of admin of the White Australia Policy
8:38:49	The National Archives’ holdings of the White Australia records en masse are moving, disturbing and compelling. #archives
8:39:57	.@wragge – So how can you extract data from this body of (paper) records and connect it up? To track what occurred to these ppl #archives
8:41:59	.@wragge thought that this could be a crowdsourced project – transcribing the White Aust forms & linking with other records #archives
8:43:10	.@wragge Other records of relevance would be the guidelines used to administer the dictation test etc. #archives
8:44:11	.@wragge notes that to do this project he doesn’t need the participation of the National #Archives. He can haz the data via his hacks
8:45:13	The divisions between the users and the control of the info / discovery are breaking down because of the evolution of the web, linked data
8:46:18	In fact, @wragge notes, the National Archives could pull descriptive info from his project rather than the other way around #archives
8:48:59	There are various tools that allow researchers to manage data / research eg Zotero and share it – maybe via groups #archives
8:50:56	People in a group in Zotero are in effect creating their own finding aids – ways into #archives based on their interests
8:51:44	.@wragge – What about a ‘gems and strays’ group in Zotero where you could drop interesting / unusual finds? (CF: awesome) #archives
8:53:02	.@wragge We should harness the research that people are doing and reuse, share – social systems for sharing research #archives
8:54:37	Mining footnotes from JSTOR – heaps of nice structured data there including #archives refs. Feed them back to enhance archival description
8:55:32	Research into text mining, text analysis has been going on for some time. Tools that can pull out dates, names. This is not pie in the sky
8:57:40	Aim shd be to create infinite number of pathways through collective memory – opening our culture for exploration and greater participation
8:58:34	This is now entirely achievable says @wragge – via the semantic web #archives
8:59:34	.@wragge was at lod-lam.net summit in San Francisco earlier this year (lucky Tim)
9:01:37	Linked data is about enabling powerful relationships via sharing vocabularies Eg use FOAF to overcome family name / surname #archives
9:03:24	.@wragge Or use the same identifier from an authority source to indicate a place, person etc. #Archives could be an authority source
9:04:14	Persistent URLs (DRINK!) #archives
9:05:35	.@wragge would like to see #archives exhibitions have persistent identifiers, as well as core archives data
9:06:34	Hello. Let’s provide #archives data in machine readable form please. No more HTML please. – @wragge
9:07:14	He also makes a plea for sensible licensing regimes #archives
9:08:32	And let’s create a culture amongst cultural institutions where you get access and then decide on what you want to play with #archives
9:09:25	Phew. Three AMAZING speakers. Thank you!!!! Questions now #archives
9:14:12	Question about Freebase, DBPedia – all purpose identifiers #archives
9:15:23	Google linked data cloud shows you different linked open data options
9:17:09	The stuff that #archives people are very familiar with sits at the core of exploiting linked data – the non tech ppl and techs need to collab
9:18:21	Wait, there’s an authority source for ‘is part of’?? #relationships #sexy
9:19:48	.@apicot with a question about whether terms like Function Activity Transaction are meaningful in linked data #archives
9:20:40	.@wragge – you can define relationships between vocabularies if you want. No need to define it all, is socially contingent #archives
9:21:17	.@wragge – we need more archivists getting into the modelling #archives
9:22:30	Chris Hurley noting that businesses will (unknowingly) adopt archival practices – eg 3 entity model (doer, deed, document) #archives
9:23:40	Question from Barbara Reed: there are very good metadata sets out there. Why did @richardlehane not use ISO23081? #archives
9:24:42	.@richardlehane – we basically replicated exactly what was in the catalogue already. B Reed – write a canonical XML schema! #archives
9:27:04	@richardlehane saying govt #archives can offer stable long term URLs that we know will last – permanence, while agencies come and go
9:29:40	.@apicot not impressed with terminology / taxonomies used by ppl like the Bureau of Statistics on census- we have better models! #archives
9:30:31	Postcustodialism (Drink!!!) #archives
9:31:49	Straying into #opengov and the GIPA Act in NSW and that most govt agencies haven’t put their docs online in a useful way #archives
9:32:26	Barbara: we MUST have scalability and linking of recordkeeping systems and archival systems #archives
9:33:19	Plug for the series system workshop ..winding up. Fingers tired. Yay. Awesomeness, thanks all #archives
9:36:17	One last plug – by @wragge for thatcampcanberra.org
23:21:25	Hey, we doubled our followers overnight! (33! Wo0t!) Welcome all 🙂