Measuring our digital archives

Wouldn’t it be great if we had a standard unit of measurement for archival finding aids?  Surely there’s one already, right?  Well, before I answer that, let me back up a little bit…

A recent post by Michele Combs on the EAD Listserv has me thinking again about the large colleciton of EAD records that I work with on a daily basis.

Michele’s question was a seemingly simple one:

what percentage of your collections (that have “finding aids”) are encoded in EAD?

This, then, raised the question of how exactly do we define a finding aid, and also implied questions about whether all instances of “finding aids” should be encoded in EAD (my answer would be YES, if only for the format).  But that’s not the part that interested me during the discussion.

What interested me was when someone else in the list mentioned that though they had a certain percentage of their finding aids in EAD, they also had some finding aids that were extremely long (up to 1000 pages!), and that almost none of the collections that went over 100 pages were in EAD format.  This makes some sense, as it would take a lot of time to type that information into digital format (if it only exists on paper), and the OCR process/clean-up might take even longer.  That said, eventually these collections will have to be converted to EAD:  certainly their current length already suggests the importance of the collection!

But the introduction of “page count” is what really interested me, and gave me some good ideas.  Here’s what I mean:

“Page counts” are not a very good unit of measurement, since the format, font type, font size, margins, spacing, etc., can all affect what length you’ll end up with.  However, any finding aid that’s in a digital format (be that EAD or even MS Word) can easily be measured by the unit of character count (sans the EAD tags in regards to the XML format, though, of course).   This way, archives/archivists can do a quick and accurate count of the size of ALL of their finding aids.

What’s more, this measurement would then be accurate when compared to collections at other institutions (which would certainly not be the case if it were just based on page counts).

Of course, it’s important to note that I’m only talking about the “size” of the finding aid, and not the physical size of the collection.  However, once you have the “descriptive” size of the finding aid, you could then compare that information with the physical extent of the collection.

But why would you want to do that?  Well, for one, it could be a useful tool to visualize not only the size of our collections, but the lengths that we go toward describing them (and, in a lot of cases, the lengths that we still need to go, in regards to collections that may be physically large but nearly bereft of descriptive attention).

So, I’m thinking about starting to develop a simple toolset to do just that on our local collection (assuming I ever have the time) in hopes that it could then be extended to other archival institutions that are also using EAD.  Hopefully such a large-scale assesment would have some unintended effects as well, but at the very least I think that it could be an interesting way to pinpoint collections — or even areas of collections — that are in need of more processing to increase their visibility (and this, I’m thinking, could be an ideal step to take after the wave of  “more product, less processing” approaches in order to help archivists prioritize their time).


One response to “Measuring our digital archives

  1. Pingback: Measuring our digital drchives, part 2 « Mark Custer | defenestrated blog

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s