Category Archives: EAD

I want my, I want my EAD

Controversial lyrics (from the song referenced in this post’s title) by Mark Knopfler aside, I have a few questions regarding the state of EAD in the state of N.C.

  1. How widespread is the adoption of EAD?
  2. Who’s not using the standard who would like to?
  3. Are there any plans afoot to create a regional EAD consortium (such as OAC, NWDA, RIAMCO, etc.)?  Please say yes; and secondly, wouldn’t it be great if such a consortium had an XForms-powered admin interface for users to create and update records online? (On this last point, I’m thinking of the “EADitor” Orbeon XForms project that Ethan Gruber has recently started working on.)
  4. Is anyone carrying the torch of NCBHIO and creating EAC records, now that that standard has arrived? (And since that standard essentially demands a consortium, in my mind, how is EAC affecting question #3?)

Unfortunately, I cannot even begin to answer questions 2–4 on my own, but I will attempt to quickly provide the start of an answer to my first question.  To that end, I went to the Society of North Carolina Archivists’ website, clicked on their links page, and looked through the “SNCA Affiliated Repositories” section.

And so, here’s an unannotated list of direct links to (mostly EAD) finding aids at North Carolina institutions (and, if anyone knows of a much better list, please let me know!): (Bowdoin was listed on the SNCA Member Links page, but it’s not a college in N.C.) (added to the list on 2009-12-09)

Also, if you’re using EAD in N.C. and don’t see your website listed here, please let me know.


LITA Camp presentation on EAD

And here’s a much shorter post about LITA Camp so that I can post my presentation, post-hoc style.

I had arrived in Dublin ready to talk about the EAD redesign project that I’m currently involved with at ECU.  However, no one in attendance worked exclusively in Special Collections or Archives, so I opted to attend a breakout session on Institutional Repositories rather than host my own on EAD.

After the conference, I figured that I’d just post a link to my PowerPoint presentation.  However, the PowerPoint that I prepared was pretty useless without my notes attached, so I then decided that I should record a shortened version as a screencast.  And here’s the result:

And, in order to be a good Creative Commons citizen — since I skipped my last PowerPoint slide in the screencast — here’s a list of the images that I used:

If you have any comments or questions, just let me know.

Hello, Columbus

Not sure what to expect yet, but I’ve just recently arrived in Columbus, Ohio. Here’s the reason why:

LITA Camp 2009

I plan to do a presentation regarding my attempts/plans to integrate our EAD records with our Digital Repository. I was hoping to have a functioning beta by now (aside from just a few web page examples and a PPT presentation), but a lot of other work has come up that hasn’t permitted that to happen. Nevertheless, I still hope to launch everything in July.

And, after this weekend, I’ll go ahead and post my presentation and a detailed conference report. I’m looking forward to it…

Subjective Access

“Subjective access may not guarantee that I’m right about the character of the state I’m conscious of myself as being in, but on this view it does ensure that I’m the one who’s in the state if anybody is.”

— David M. Rosenthal, from Consciousness and Mind (OCLC 61200643, page 355)

In EAD, we file subjects under a tag known as <controlaccess>, which is short for “controlled access headings.”  And, since we’re using these for access (just as they were used in the card catalogs of old, a few of which are still in service), they should certainly be hyperlinks, right?

So, what state are the subjects of our finding aids actually in, and what state should they be in?

In our case, at ECU, we’re in the process of updating all of our old subjects into Library of Congress Subject Headings, like this:

Sinbad (dog) + more

This isn’t an easy process, but it does mean that, once done, all of our finding aids will “play” nicely with all of the objects in our Library Catalog as well as all of the objects in our Digital Repository. But, for the time being, everything that’s listed in our “controlled access headings” is listed as plain text (without even, at this time, the ability to restrict a keyword search to those fields).

After the update, however, not only will we feature more advanced search options on an advanced search page, but we’ll also turn all of our subjects into hyperlinks.  This raises the question:

to what should we link?

At first, the answer was “obvious” to me, but having now looked around at other institutions, it seems that there are a few different solutions, which I’ll list below:

(1) Link nowhere (The option that’s most often employed, and the one that we’ll be moving away from)

(2) Link to the rest of the finding aid database (subject to subject)

(3) Link to the rest of the finding aid database (subject to keywords)

(4) Link to the library catalog, which would include the rest of the finding aid database (subject to subject)

Right now, I’m leaning toward a fifth option, at least for the time being (which is just a combination of options 2 and 4):

(5) Link to finding aid database (subject to subject), and then on the page of search results, also include a link that will extend that same subject search to our library catalog and also to our digital repository.

I’d love to hear other options and what other people think may be the best route (though, in my opinion, that may largely come down to the size of your collections and also to the extent that they’ve been cataloged in some normalized fashion).
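Option 5 could be wired up with a little glue code. Here’s a minimal sketch in Python: it pulls the terms out of <controlaccess> and builds subject-search URLs for both the finding aid database and the library catalog. The search endpoints, the sample EAD fragment, and its headings are all invented for illustration (and real EAD files are usually namespaced, which this sketch ignores):

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote_plus

# A tiny, hypothetical EAD fragment -- real finding aids are namespaced
# and far larger; element names follow EAD 2002.
EAD_SNIPPET = """
<ead>
  <archdesc level="collection">
    <controlaccess>
      <subject>Photojournalism--North Carolina--Greenville</subject>
      <persname>Sinbad (dog)</persname>
    </controlaccess>
  </archdesc>
</ead>
"""

# Hypothetical search endpoints -- substitute your own systems' URLs.
FINDING_AID_SEARCH = "https://example.edu/findingaids/search?subject={}"
CATALOG_SEARCH = "https://example.edu/catalog/search?su={}"

def subject_links(ead_xml):
    """Return (term, finding-aid URL, catalog URL) for each access heading."""
    root = ET.fromstring(ead_xml)
    links = []
    for heading in root.iter("controlaccess"):
        for term in heading:  # each child element is one heading
            text = "".join(term.itertext()).strip()
            if text:
                links.append((text,
                              FINDING_AID_SEARCH.format(quote_plus(text)),
                              CATALOG_SEARCH.format(quote_plus(text))))
    return links

for term, fa_url, cat_url in subject_links(EAD_SNIPPET):
    print(term, "->", fa_url)
```

The same list of (term, URL, URL) triples could then drive the “extend this search to the catalog” link on the results page.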

My final thought about all of this is how to extend it beyond our own collections, into larger EAD databases like ArchiveGrid.  Wouldn’t it be useful to a researcher if a subject in a local finding aid could be extended to repositories worldwide?  In this case, though, it would definitely be easier to follow Columbia’s example so as not to get involved with messy crosswalks and the like.

Gangling Container Lists

Linotype operator

— or, on faking a neologism

What’s a “gangling container list”*, you might reasonably wonder?  Well, I’m using the term “GCL” to refer to a “container list” (or inventory) in a finding aid that is particularly hard to encode/potentially confusing to the user/online viewer.  The main GCL at ECU belongs to the Manuscript Collection numbered 741.  Let me explain, in a less cryptic fashion:

Right now, the only collection that we have that’s both heavily described and digitized is our Daily Reflector Negative Collection.

Though the encoding for this collection isn’t divided into thematic series (it’s arranged chronologically instead), it is arranged/subdivided by:

  1. Box
  2. Folder
  3. Sleeve
  4. Item (when digitized).

Here’s an example of our EAD encoding for that, where the component level in the EAD corresponds to the ordered-list numbers above:

Snippet of the EAD container list for the Daily Reflector Negative Collection
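To make that four-level arrangement concrete, here’s a hypothetical sketch of the nesting, along with a few lines of Python that walk it. The identifiers, titles, and the <dao> link are all invented, and the markup is simplified from real EAD 2002:

```python
import xml.etree.ElementTree as ET

# Hypothetical four-level container list: <c01> box, <c02> folder,
# <c03> sleeve, <c04> digitized item (EAD 2002 numbered components).
CONTAINER_LIST = """
<dsc>
  <c01 level="file"><did><container type="box">1</container></did>
    <c02 level="file"><did><container type="folder">a</container></did>
      <c03 level="file"><did><container type="sleeve">741.1.a.1</container>
          <unittitle>Negatives, 1949 Jan. 2</unittitle></did>
        <c04 level="item"><did><unittitle>Item 741.1.a.1.1</unittitle>
            <dao href="https://example.edu/repo/741.1.a.1.1"/></did></c04>
      </c03>
    </c02>
  </c01>
</dsc>
"""

def item_paths(xml_text):
    """For every <c04> item, return its title plus the box/folder/sleeve
    containers above it (ElementTree has no parent pointers, so build a map)."""
    root = ET.fromstring(xml_text)
    parents = {child: parent for parent in root.iter() for child in parent}
    paths = []
    for item in root.iter("c04"):
        containers = []
        node = item
        while node in parents:            # climb toward the root
            node = parents[node]
            for c in node.findall("./did/container"):
                containers.append((c.get("type"), c.text))
        title = item.findtext("./did/unittitle")
        paths.append((title, list(reversed(containers))))
    return paths

for title, containers in item_paths(CONTAINER_LIST):
    print(title, containers)
```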

If you’re familiar with EAD, you might look at this and have a lot of questions/criticisms.  However, I don’t want to focus on how this finding aid is encoded (as it’s not typical for our collections, and it isn’t ideal yet), but instead what I want to focus on is its physical arrangement, its display, and how we’re going to connect it to the portions that are digitized.

Until now, we’ve only been linking digitized objects in our finding aids at the item level (in this case, that’d be the <c04> tag).  However, we have received an LSTA grant for this collection that will shortly result in the digitization and description of over 7000 images.   And, in preparation for this grant, the container list (or GCL) has grown from a relatively short list, which contained information about its 45 boxes, to an exceptionally long one, which now contains information for over 13000 described sleeves.

Presently, the online finding aid has every box, folder, and sleeve listed on just one page of output.  It also includes just over 100 images that were digitized prior to the grant for testing purposes.  But, if the finding aid were to include all of the images, this would result in over 20000 lines being added just to the container list!

So, we have two dilemmas:

  1. How to deal with this “one page display”
  2. How to deal with so many items (which will only increase after the grant).

As for problem number 1, we’re going to continue with our one page display option for the time being (though we may eventually employ other types of interfaces) in order to keep our search processes as simple as possible.  This could/should be an entire blog post on its own, however, so I’ll save that for another time.

That leaves problem number two. One potential solution, though not yet employed, will adhere to the following principles:

  • Encode everything (all 7000+ items, and add new items as they’re requested for digitization)
  • Do not provide item-level links in the finding aid (at least in the initial display) if the collection has too many items (rather than setting an arbitrary item-count limit, this decision will be made at the collection level, and it might apply only to this particular collection, for the reason given in the next point)
  • When possible, only scan and catalog at the lowest level of granularity already described in the finding aid (this means that when future items are requested by a patron for digitization, we might scan all of the other items in that folder at the same time, and only describe the “digital object” at the same level as is described in the finding aid).  See this object for a pilot example (but note that the display is not finished and that it hasn’t yet been cataloged).
  • Create a new stylesheet that can differentiate between providing links at the box, folder, sleeve, and item levels when necessary.
  • Create a new template that helps to address issue number 1 until that issue can be more thoroughly examined.

For this finding aid, then, the stylesheet will only output links at the “folder” and “sleeve” levels.  The individual items will only be accessible from these two levels (of folder and/or sleeve).  In some cases, then, the sleeve link will take you to a display with only one item and in others it will take you to a display with multiple items (it just depends on how many of the negatives were selected from that particular sleeve).  Each of these “sleeves” has a description that includes the total number of physical negatives included, though, so it should hopefully be somewhat clear to the user whether the sleeve is partially or fully digitized.
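That per-collection link-level decision could be driven by a small lookup that the stylesheet consults, sketched here in Python (the collection numbers, the mapping of EAD component tags to physical levels, and the defaults are all assumptions for illustration):

```python
# Which EAD component levels receive hyperlinks, decided per collection
# rather than by a global item-count threshold. Collection "741" is the
# hypothetical Daily Reflector entry: folder (c02) and sleeve (c03) only.
LINK_LEVELS = {
    "741": {"c02", "c03"},
}

# Collections without an entry get links at every level, items included.
DEFAULT_LEVELS = {"c01", "c02", "c03", "c04"}

def should_link(collection_id, component_tag):
    """True if this component level gets a hyperlink in this collection."""
    return component_tag in LINK_LEVELS.get(collection_id, DEFAULT_LEVELS)

print(should_link("741", "c03"))  # sleeve-level link for the GCL
print(should_link("741", "c04"))  # no item-level link for the GCL
```

Keeping the decision in data like this means the stylesheet itself stays generic, and adding another exceptional collection is a one-line change.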

Check back next week for a mock-up of a newly improved Daily Reflector finding aid (this ambitious deadline, I’m hoping, will give me some incentive to finally write that stylesheet).   The mock-up won’t look like the final format, however, as there will still be some work that needs to be done to more fully integrate our digital repository with our finding aid database, but it should present a pretty clear idea.

In the meantime, please leave comments, suggestions, or even examples of your own GCLs.  I’ve certainly seen some instances of innovative displays for extremely large collections, but what I’m more interested in seeing is a display method for such a collection that also fits in with the overall delivery and “search” of the rest (that is, not just a finding aid that’s like an online exhibit, but a mutable sort of finding aid that integrates well with every EAD at that institution).

*Though the phrase is abbreviated as GCL, the recommended pronunciation is actually “Gackl”**, which is utilized instead of “G.C.L.” in order to better emphasize the electronic awkwardness of its referent.
**That’s not an e-typographical error. The preferred spelling is “Gackl” rather than something like “Gackle” for two important reasons:

  1. Obviously, to mock all things Web 2.0
  2. So as to not confuse the term with (nor raise awareness of) Gackle, North Dakota.

Measuring our digital archives, part 2

Well, I’ve seen a few attempts at visualizing archival holdings in the past, but this one by Mitchell Whitelaw at the University of Canberra is somewhat similar to what I was discussing in my previous post:

Click on image to see the blogpost "Packing Them In"

This is working with a very LARGE dataset, which includes some 57k series from the National Archives of Australia [see Mitchell Whitelaw’s post for more information].  The visualization, in this case, highlights the ratio between the size of a series (in linear meters) and the number of registered items that belong to that series (the emptier the square, the fewer the registered items).

So, my idea is certainly not all that original (and I certainly didn’t think that it would be).  But this find still encourages me to pursue my particular path, since the one thing unique about my idea is using the length of the EAD itself for comparative purposes (though this may not prove ideal, I think it’s still worth checking into).  It would also be nice to see how this and other similar processes compare when used on the same collections…  but, before that can happen, the proper toolsets will need to be developed.

Measuring our digital archives

Wouldn’t it be great if we had a standard unit of measurement for archival finding aids?  Surely there’s one already, right?  Well, before I answer that, let me back up a little bit…

A recent post by Michele Combs on the EAD Listserv has me thinking again about the large collection of EAD records that I work with on a daily basis.

Michele’s question was a seemingly simple one:

what percentage of your collections (that have “finding aids”) are encoded in EAD?

This, then, raised the question of how exactly we define a finding aid, and also implied questions about whether all instances of “finding aids” should be encoded in EAD (my answer would be YES, if only for the format).  But that’s not the part that interested me during the discussion.

What interested me was when someone else on the list mentioned that, though they had a certain percentage of their finding aids in EAD, they also had some finding aids that were extremely long (up to 1000 pages!), and that almost none of the collections that went over 100 pages were in EAD format.  This makes some sense, as it would take a lot of time to type that information into digital form (if it only exists on paper), and the OCR process/clean-up might take even longer.  That said, eventually these collections will have to be converted to EAD:  certainly their current length already suggests the importance of the collection!

But the introduction of “page count” is what really interested me, and gave me some good ideas.  Here’s what I mean:

“Page counts” are not a very good unit of measurement, since the format, font type, font size, margins, spacing, etc., can all affect the length you end up with.  However, any finding aid that’s in a digital format (be that EAD or even MS Word) can easily be measured by character count (excluding the EAD tags, of course, in the case of XML).   This way, archives/archivists can do a quick and accurate count of the size of ALL of their finding aids.
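Assuming the finding aids parse as XML, that markup-free character count takes only a few lines. Here’s a sketch (the sample fragments are invented); whitespace runs are collapsed so that a pretty-printed encoding and a minified encoding of the same finding aid measure identically:

```python
import xml.etree.ElementTree as ET

def descriptive_size(ead_xml):
    """Character count of a finding aid's text content, markup excluded.

    Runs of whitespace are collapsed to single spaces so that indentation
    and line breaks in the encoding don't inflate the measurement.
    """
    root = ET.fromstring(ead_xml)
    text = " ".join("".join(root.itertext()).split())
    return len(text)

# Two encodings of the same content should measure the same:
compact = "<ead><unittitle>Sinbad (dog)</unittitle></ead>"
pretty = "<ead>\n  <unittitle>\n    Sinbad (dog)\n  </unittitle>\n</ead>"
assert descriptive_size(compact) == descriptive_size(pretty)
print(descriptive_size(compact))  # 12
```

Run over a whole directory of EAD files, this would give exactly the kind of comparable, format-independent “descriptive size” figure discussed below.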

What’s more, this measurement would then be accurate when compared to collections at other institutions (which would certainly not be the case if it were just based on page counts).

Of course, it’s important to note that I’m only talking about the “size” of the finding aid, and not the physical size of the collection.  However, once you have the “descriptive” size of the finding aid, you could then compare that information with the physical extent of the collection.

But why would you want to do that?  Well, for one, it could be a useful tool to visualize not only the size of our collections, but the lengths to which we go in describing them (and, in a lot of cases, the lengths that we still need to go, for collections that may be physically large but nearly bereft of descriptive attention).

So, I’m thinking about starting to develop a simple toolset to do just that on our local collection (assuming I ever have the time) in hopes that it could then be extended to other archival institutions that are also using EAD.  Hopefully such a large-scale assessment would have some unintended benefits as well, but at the very least I think that it could be an interesting way to pinpoint collections — or even areas of collections — that are in need of more processing to increase their visibility (and this, I’m thinking, could be an ideal step to take after the wave of “more product, less processing” approaches, in order to help archivists prioritize their time).