Category Archives: Libraries

Zombies vs. Libraries

The website for the 7th annual Joyner Library Paraprofessional conference is now live:

http://events.lib.ecu.edu/paraprof/2010/

Our theme this year is Zombies vs. Libraries: Integrating Pop-culture with Library Tradition. Thanks to the sponsorship of the North Carolina Library Paraprofessional Association (and the help of Gloria Bradshaw, Tracie Hampton, and Christopher Turner from HR), this will be the first year that we will be offering two preconference workshops!

We hope to see you in Greenville this May!

Symphony’s problem with operators

I'd be quite alarmed if this hadn't already been reported, since SirsiDynix's Symphony OPAC has been out in the wild for quite some time, but here's an annoying "bug" that I just discovered today.

Search for near in any out-of-the-box Symphony OPAC and you'll get yourself an error.  Now try with, adj, and same, as well as the Boolean operators or and not, and even and itself (I'll ignore xor, since I can't think of any example where I'd ever type that).

[digression: If you try to search for but (for, by, etc.), however, it will tell you that your search contains all stopwords.  I'll try to forgo the argument about whether or not a library catalog should remove so-called stopwords in this day and age, but suffice it to say that a user can't find any albums by "The The" without moving beyond the default search form; and, in my opinion, a user should never have to do such a thing for such a simple search.]

So, yes, the Symphony OPAC seems to have a problem with operators, but it's certainly not likely that someone will search for adj.  If someone does, however, whether by accident or not, they shouldn't be greeted with an error.  Instead, their query should be run as is, and the results page should include a new <div>, placed unobtrusively, that informs the user that "adj" is also a proximity operator and shows how to use it, should they want or need to.

What's worse, though, is that you cannot use near, with, same, or, not, or and itself at the beginning OR end of any of your queries (exception: you can use not at the beginning of a query without getting an error, since that operator doesn't require a left-hand argument, but it will still treat not as an operator).  And this, in my mind, is the real bug here.  You cannot, then, search for:

  • Near a thousand tables
  • Near eastern archaeology
  • The singularity is near
  • With wings like eagles
  • Same differences
  • Same river twice (but you’re fine if you include the stopword “the” at the beginning, since “same” will no longer be the first word in the query)
  • I love you just the same
  • Or else my lady keeps the key
  • Ready or not
  • Not philosophy (won’t search for the query “not philosophy” but will instead search for any record that doesn’t contain the keyword “philosophy”…  so, you won’t get an error, but you’ll get a LOT of results).
  • And then there were none
  • And the band played on
  • And you get the idea…

Of course, you can move beyond the default search values and use any of those proximity operators in conjunction with the “browse” (or “begins with”) radio button, but that should NOT be a requirement for using a select few query terms.  Or, worse, you could work around this bug, for now, by altering your search to something like this:

“and” then there were none

or even

the and then there were none

but that’s a pretty silly solution, as well.  In any event, I have no idea if this bug has been reported or not, but I am quite certain that it would be a very easy fix for SirsiDynix to implement, so I hope that they do so soon — that is, if they don’t already have a patch for this in the works.

Anyhow, if you want to try this out, or if you're really ambitious and think that you can find any other bugs worth reporting, here's a list of libraries using Symphony that I've compiled:

http://www.twine.com/twine/12vl6vpd0-19f/libraries-using-symphony-elibrary-as-an-opac

Unfortunately, it's hard to determine a static link to Symphony OPACs, so most of those links will take you to a timed-out session.  Once there, though, you can usually get back to the main search page just by clicking on "OK" and then starting a new search.

[update: I just checked a Sirsi Unicorn library catalog, and it also seems to have this same issue on default keyword searches.  So apparently this is a carryover from that legacy system (we were previously on Dynix's Horizon, which did not have the same issue by default; at least not that I'm aware of).  So, in hindsight, I guess this is a Unicorn bug, which makes me certain that it's already been reported, but I really wonder why it exists.  Indexing a default query in this manner seems very strange to me.  Certainly they could just require their operators to be followed by a special character, such as "#", or simply not treat any Boolean or proximity operators as operators when they appear at the beginning or end of a query.]
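
To illustrate that last suggestion, here's a minimal sketch in Python (not based on Symphony's or Unicorn's actual code; the operator list and the quoting convention are my own assumptions for illustration) of a preprocessing step that would quote an operator word at either edge of a query so that it gets searched as a literal term:

```python
# A rough sketch of the workaround proposed above, written in Python.
# It is NOT based on Symphony's actual code; the operator list and the
# quoting convention are assumptions for illustration only.

OPERATORS = {"and", "or", "not", "xor", "near", "with", "adj", "same"}

def neutralize_edge_operators(query: str) -> str:
    """Quote operator words at the start or end of a query so they are
    searched as literal terms instead of raising an error."""
    tokens = query.split()
    if not tokens:
        return query
    if tokens[0].lower() in OPERATORS:
        tokens[0] = f'"{tokens[0]}"'
    if len(tokens) > 1 and tokens[-1].lower() in OPERATORS:
        tokens[-1] = f'"{tokens[-1]}"'
    return " ".join(tokens)

print(neutralize_edge_operators("And then there were none"))
# '"And" then there were none'
print(neutralize_edge_operators("Ready or not"))
# 'Ready or "not"'
```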

GNU Wget NCBHIO

If that doesn’t qualify as a strange blog-post title, I don’t know what does.

Anyhow, I just wanted to say how nice it was to visit the Carolina Digital Library and Archives department (CDLA) at the Louis Round Wilson Library at UNC-Chapel Hill.

Louis Round Wilson Library

Natasha Smith, the head of the Digital Publishing group, invited our entire Digital Collections department to visit them on the morning of May 20th. While there, we were able to demonstrate and discuss with them the behind-the-scenes processes of our Digital Repository, and we were also able to hear and see a lot of interesting projects that CDLA is working on. To mention just one, for brevity’s sake, I’ll say that I was very excited to see a superbly designed online template for a finding aid to a collection that has now been digitized in its entirety. So, definitely keep an eye out for the new Thomas E. Watson Papers finding aid once it is unveiled.

During our discussions, we also got on the topic of EAC (or Encoded Archival Context), which is a standard that I’ve definitely wanted to see developed more fully after first hearing about it just one year ago. It was nice, too, that Richard Szary was in the room, since he was one of the original working group members of the EAC standard while at Yale (though I wasn’t aware of that at the time).

In my mind, EAC would be a perfect candidate to be deployed with something like Metaweb’s Freebase.  Sure, we could still export valid XML files for the preservation of the information (as the standard should still adhere to the original Toronto Tenets), but I think it would go a long way to have this information available on an easily editable and transferable platform.  I’d love, for instance, to be able to pull biographical and relational information about people mentioned in our finding aids via something like their Metaweb Query Language, and also to dynamically generate a list of “related collections” available elsewhere, at other institutions.

Anyhow, Maggie Dickson, the Watson-Brown Project Librarian, mentioned the NCBHIO project, which is something that I had never heard of before.  Here's a link to their website, NCBHIO, which isn't entirely functional anymore, but it is the only website devoted exclusively to a collection of EAC records that I'm currently aware of.  If there are more out there, though, I'd love to hear about them.  I have heard of a few European institutions already incorporating EAD and EAC, but I'm definitely not aware of anything else like this.

In any event, in order to learn more about the standard, I went ahead and used Wget to download all 59 EAC records that are still hosted on the NCBHIO website (hence the strange title for this post).  Hopefully I'll have some time this summer to study those files some more and perhaps even create a few EAC records myself.
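
For the curious, here's roughly what that kind of bulk retrieval looks like, sketched in Python rather than Wget (I actually just used Wget's recursive options; the index URL and the ".xml" link pattern below are placeholders, since the site isn't fully functional anymore and I'm not reproducing its real layout):

```python
# A rough Python equivalent of the recursive Wget run described above.
# The index URL and the ".xml" link pattern are placeholder assumptions;
# the actual NCBHIO site layout may differ.
import os
import re
import urllib.request
from urllib.parse import urljoin

INDEX_URL = "http://www.example.org/ncbhio/records/"   # hypothetical index page
OUT_DIR = "eac_records"

os.makedirs(OUT_DIR, exist_ok=True)

with urllib.request.urlopen(INDEX_URL) as resp:
    html = resp.read().decode("utf-8", errors="replace")

# Grab every link that points to an XML file (i.e., an EAC record) and save it.
for href in re.findall(r'href="([^"]+\.xml)"', html):
    url = urljoin(INDEX_URL, href)
    filename = os.path.join(OUT_DIR, os.path.basename(href))
    urllib.request.urlretrieve(url, filename)
    print("saved", filename)
```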

Until then, if anyone else is working with EAC, or anything like the EATS project (here’s a blog post about that), which was developed by the New Zealand Electronic Text Centre, I’d love to hear more about it.

Measuring our digital archives

Wouldn’t it be great if we had a standard unit of measurement for archival finding aids?  Surely there’s one already, right?  Well, before I answer that, let me back up a little bit…

A recent post by Michele Combs on the EAD Listserv has me thinking again about the large collection of EAD records that I work with on a daily basis.

Michele’s question was a seemingly simple one:

what percentage of your collections (that have “finding aids”) are encoded in EAD?

This, then, raised the question of how exactly we define a finding aid, and also implied questions about whether all instances of "finding aids" should be encoded in EAD (my answer would be YES, if only for the format).  But that's not the part that interested me during the discussion.

What interested me was when someone else on the list mentioned that, though they had a certain percentage of their finding aids in EAD, they also had some finding aids that were extremely long (up to 1000 pages!), and that almost none of the collections that went over 100 pages were in EAD format.  This makes some sense, as it would take a lot of time to type that information into digital format (if it only exists on paper), and the OCR process/clean-up might take even longer.  That said, eventually these collections will have to be converted to EAD:  certainly their current length already suggests the importance of the collection!

But the introduction of “page count” is what really interested me, and gave me some good ideas.  Here’s what I mean:

"Page counts" are not a very good unit of measurement, since the format, font type, font size, margins, spacing, etc., can all affect the length you end up with.  However, any finding aid that's in a digital format (be that EAD or even MS Word) can easily be measured by character count (excluding the EAD tags themselves, of course, in the case of XML).  This way, archives/archivists can do a quick and accurate count of the size of ALL of their finding aids.
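
To sketch what I mean (in Python; the file path below is just a placeholder), stripping the tags out of an EAD file and counting the characters that remain takes only a few lines:

```python
# A minimal sketch of the measurement described above: strip the EAD tags
# from an XML finding aid and count the characters of description that remain.
# The file path is a placeholder.
import xml.etree.ElementTree as ET

def descriptive_size(ead_path: str) -> int:
    """Return the character count of an EAD finding aid, tags excluded."""
    tree = ET.parse(ead_path)
    text = "".join(tree.getroot().itertext())
    # Collapse whitespace so formatting choices don't inflate the count.
    return len(" ".join(text.split()))

# You could then compare this "descriptive size" against the collection's
# physical extent to spot large collections with very little description.
print(descriptive_size("finding_aids/watson-papers.xml"))
```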

What’s more, this measurement would then be accurate when compared to collections at other institutions (which would certainly not be the case if it were just based on page counts).

Of course, it’s important to note that I’m only talking about the “size” of the finding aid, and not the physical size of the collection.  However, once you have the “descriptive” size of the finding aid, you could then compare that information with the physical extent of the collection.

But why would you want to do that?  Well, for one, it could be a useful tool to visualize not only the size of our collections, but the lengths that we go toward describing them (and, in a lot of cases, the lengths that we still need to go, in regards to collections that may be physically large but nearly bereft of descriptive attention).

So, I’m thinking about starting to develop a simple toolset to do just that on our local collection (assuming I ever have the time) in hopes that it could then be extended to other archival institutions that are also using EAD.  Hopefully such a large-scale assessment would have some unintended effects as well, but at the very least I think that it could be an interesting way to pinpoint collections — or even areas of collections — that are in need of more processing to increase their visibility (and this, I’m thinking, could be an ideal step to take after the wave of  “more product, less processing” approaches in order to help archivists prioritize their time).

Computers in Libraries 2008

In order to do a bit of sight-seeing before the conference-proper, I drove up to our nation’s capital on Saturday, April 5th. One of my favorite things about Washington D.C. is the opportunity to while away an afternoon in air-conditioned museum after air-conditioned museum, completely free of charge. And so, I made Sunday no exception, frequenting a few favorite locales and also attending the National Portrait Gallery for my very first time.

By the time I made it to the Crystal City Hyatt Convention Center on Monday morning, I quickly realized that this conference was going to be even more crowded than the museums that I visited the day before.

By Wednesday evening, I had attended 14 sessions and 2 opening keynote sessions. I will list all of those sessions at the end of this report, so if anyone has any specific questions or would like me to send them my notes, please let me know. For the remainder of this report, however, I will briefly discuss 3 of those sessions.

  1. Mobile Search with Megan Fox and Gary Price:
    The two very knowledgeable presenters went through a veritable PowerPoint compendium of all things mobile search. To see just some of what I mean (including, even, a list of academic libraries that provide websites optimized for mobile access), definitely check out this online directory provided by Megan Fox: http://web.simmons.edu/~fox/pda/.

    This was probably one of my favorite presentations due to the fact that I couldn't help but hear about new things (and some old developments that were new to me, like 2D barcodes) that got me thinking about how libraries might change in the future. Sadly, this wasn't the case for all of the attendees, since one in particular asked the presenters what any of this had to do with libraries. Granted, the answer provided (which, poorly paraphrased by me, went something like: "It is all about making use of new ways to direct information to your users") did not offer any concrete examples in response to that valid question. However, there are already a lot of libraries making use of mobile technologies, from letting users text bib records to their phones to letting them browse the entire catalog from a mobile device (NCSU's MobiLIB).

  2. Information Commons with Barbara Tierney:
    If you attend a conference in the humanities, you'll hear presenters read from their thoughtful papers rather than just conduct a presentation with the aid of PowerPoint. It was quite interesting, I must admit, to see how many people started to leave this presentation after the presenter began reading from printed paper and they found themselves looking to the blank screen again and again in hopes of seeing something, anything, projected. Nevertheless, I was happy to sit and listen to her detail the history and the concept of the "information commons" in libraries. It was a topic I knew little about and, after all, she promised to show some pictures at the end of her talk, so of course I stayed to take notes and to take a break from the familiar style of PowerPoint.

    If you’re interested, we have two copies of her book, Transforming Library Service Through Information Commons, here at Joyner Library.

  3. Open Source Software for Superior Solutions, focusing on the Smithsonian Presentation:
    And here it is:

    http://siris-collections.si.edu/search/

    I could say so much about this resource, but I think that it speaks much more eloquently for itself. So definitely try it out. It sets the bar for all library catalogs and future refinements to come. It's a proof of concept for what quality, if somewhat superficial, metadata can get you, and for the new discoveries it can help you make. If anyone would like to talk to me about this resource, please let me know. It does a lot of things well (even some things not so noticeably), but it also points to a lot of refinements that can and will be made in the future. And best of all, it was created by library employees (from Erik Hatcher at UVA to the full team at the Smithsonian, who have adapted his and other open source software to produce the result that you see here). Very exciting stuff, in my opinion.

    All in all, I have to say that the conference occurred at a very strange, almost ominous, post-April Fools' time. First of all, the Library of Congress was essentially closed off the entire time that I was there (see this press release for more info). To bookend that unfortunate closure, Stephen Colbert's portrait had been moved from the National Portrait Gallery just days before I made my inaugural, admission-free visit; and it also just so happened that D.C.'s newest museum, the Newseum, wouldn't be opening until Friday, April 12th. That last coincidence, though, was probably for the best, since that particular museum's 2002 closure in Virginia and recent reopening in D.C. at a $450 million facility is certainly a controversial issue, even if one is to ignore the $20 admission fee. Being on a budget, I probably would've opted out of visiting it even if it had opened before my arrival – heck, I even decided not to pay the meager $6 to run around the Butterfly Pavilion at the National Museum of Natural History.

    Despite missing out on those four extracurricular events, though, the conference itself was a very good experience. For one thing, I learned that it’s best not to attend conference sessions on topics with which you are already quite familiar – it’s unlikely that you’ll learn much new. That said, if nothing of particular interest is scheduled at the same time, it certainly doesn’t hurt to attend such a session because you can nevertheless use it as a great networking opportunity. And finally, just as discovering an interesting book by wandering the stacks in a library is far more rewarding than completing a known-item search, the most rewarding aspects of the conference occurred unexpectedly, meeting colleagues in between sessions and learning about exciting new projects that have in turn reinvigorated my own work on projects back here at ECU.

Sessions Attended:

  • “Hi Tech and Hi Touch” with Jenny Levine, the Shifted Librarian
  • “Mobile Search” with Megan Fox & Gary Price (discussed above)
  • “Library Web Presence: Engaging the Audience” panel, Penn State and Temple
  • “Widgets, Tools, & Doodads for Library Webmasters” with Darlene Fichter and Frank Cervone
  • “Wikis: Managing, Marketing, and Making them Work” with Chad Boeninger
  • “Mashups for Non-Techies” with Jody Fagan (about Yahoo! Pipes)
  • “Drupal and Libraries” with Ellyssa Kroski
  • “The Library Sandbox” with Barbara Tierney (discussed above)
  • “Harnessing New Data Visualization Tools” with Darlene Fichter
  • “Catalog Effectiveness: Google Analytics and OPAC 2.0” panel, Ohio State and College of New Jersey
  • “Learning from Video Games” with Chad Boeninger
  • “One Click Ahead: Best of Resource Shelf” with Gary Price
  • “Google Tracking” with Greg Notess
  • “Open Source Solutions” panel, Smithsonian and Howard County Public Library (discussed above)

“Audio Preservation in the Digital Age” — a conference report

As a new member of the Digital Collections department, I was very excited to attend this conference in order to learn more about digital preservation. Having also been interested in the “history of sound recording” ever since my first exposure to the website www.tinfoil.com, I had also hoped that this event would impart new perspectives from experienced professionals in order to guide me to learn more about audio engineering and its importance for cultural stewardship. Luckily, the presentations (especially those by George Blood and Mike Casey) did not disappoint; rather, they supplied many recommended resources to pursue long after the conference was over, and much inspiration to share with my colleagues at Joyner as well as with our profession in general.

In a concerted effort to keep this report somewhat short, though, I will restrict my comments to the following format:

[1-4] Name of the presenter, name of their segment

a) One hyperlink (generally to that presenter’s current “project”)

b) One thing that I learned during their talk, or a thought that they expressed exceptionally well.

Following those 4 sections, I will close with a few hypothetical musings about how the ideas gained at this conference might also be used here at ECU.

To begin, then, in the order that the presenters spoke on November 2nd:

[1] Sam Brylawski, Audio Preservation Basics Part I

a) Rather than just provide a link to where Sam currently works (the Special Collections dept. at UC Santa Barbara), here's a link to a "podcast" that he did for NPR about "Music Insiders" picking old analog recordings that "should be issued on CD". You'll need RealPlayer, or another method to play RealAudio files, though:

http://www.npr.org/ramfiles/wesun/20021215.wesun.brylawski.ram

b) Sam relayed quite a number of sayings that doubled as good archival advice, but the one that I’ll mention here is as follows: “Preservation should begin even before you acquire a collection”. With this maxim in mind, you should remind potential donors that:

1. If the rights are transferred to the archival institution along with the materials, then the archive will have a better chance of getting grant money in the future, as projects with digital exhibits are easier to "sell" (and this, they should be reminded, will bring more attention to their donated materials).

2. Remind the donors that you cannot give them the full market value for their wonderful collection since you will also have to spend money to sustain their materials in the proper way. This should be narrated as an advantage, however, as the longevity of a collection will hopefully outweigh any slight, immediate monetary gain.

[2] George Blood, Audio Preservation Basics Part II

a) A link to George’s company that provides archival audio services:

http://www.safesoundarchive.com/

b) A book can be physically examined somewhat quickly. Not as quickly, perhaps, as the robot in the movie “Short Circuit” can read an entire encyclopedia, but it can still be given a rather quick visual examination to determine if it needs any immediate preservation work. The same can be done for most audio formats too (to see if there is mold on wax cylinders, warped records, tangled tape, etc). And yet, this ease of examination doesn’t automatically transfer to the digital domain of binary digits. Fortunately, there is an easy solution thanks to the science of cryptography.

Every digital file can be given an MD5 checksum (or any other cryptographic hash value) in order to check the integrity of that file. So, whenever a TIFF, WAV, or any other digital file is created, an algorithm can be used to process the variable-length file into a fixed-length value of 128 bits (which will be its MD5 checksum). Periodically, and especially both before and after a file is moved, you should regenerate those 128 bits. If they match, you're good; when they don't, you know you have a corrupted file that needs further preservation attention. And so, some sort of hash value should be stored in the administrative metadata of every digital file.
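
For anyone curious, here's a minimal sketch of that fixity check using Python's standard library (the file path is just a placeholder):

```python
# A minimal sketch of the fixity check described above, using Python's
# standard library. The file path is a placeholder.
import hashlib

def md5_checksum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 checksum (128 bits, shown as 32 hex digits) of a file."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()

# Record this value in the file's administrative metadata when it is created;
# regenerate it periodically (and before & after every move) and compare.
original = md5_checksum("masters/interview_001.wav")
later = md5_checksum("masters/interview_001.wav")
print("file intact" if original == later else "file corrupted -- needs attention")
```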

[3] Chad Hunter, Small Scale Audio Preservation & Digitization Projects

a) A link to Appalshop, a small non-profit center in Kentucky that is dedicated to preserving the local arts.

http://www.appalshop.org/

b) Do not be afraid to make mistakes in your projects, and do not be afraid to share those mistakes. Granted, even Brylawski mentioned, among other things, that the Library of Congress has improvised with storage materials in the past, but it was Hunter who gave the most detailed information about a few different projects conducted at Appalshop and the lessons that they've learned and are continuing to learn.

One lesson: make sure you have a clear timetable and communication points set up with a vendor if outsourcing. You certainly don’t want your materials being mailed back to you without any warning and without any discussion about the archival process and the state of those materials.

[4] Mike Casey, Large Scale Audio Preservation & Digitization Projects

a) A link to the Sound Directions Project, which is a nationally-funded collaboration between Indiana University and Harvard to develop best practices and standards for the digital preservation and interoperability of archival audio formats.

http://www.dlib.indiana.edu/projects/sounddirections

b) As is widely known, TIFF (revision 6.0) is the file format of choice right now for storing archival masters of image files. However, I wasn't sure what the equivalent was for audio files. Quite simply, then, I learned that the target preservation format for audio files is the Broadcast Wave Format (confusingly, the file extension is still ".wav"). Based on the lossless Microsoft WAVE format, a Broadcast Wave file has been standardized by the European Broadcasting Union and (like normal .wav files) includes space for supplemental metadata. Just one piece of metadata that Mike Casey recommended including is the original filename in the description field (just in case the name ever gets accidentally changed!).
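
As a quick, hypothetical illustration of that advice, here's a rough Python sketch that pulls the Description field out of a Broadcast Wave file's "bext" chunk so you could confirm the original filename was recorded there (the file path is a placeholder, and this assumes a standard RIFF layout with a bext chunk present):

```python
# A rough sketch (not production code) of reading the Description field from
# a Broadcast Wave file's "bext" chunk. The file path is a placeholder.
import struct

def read_bext_description(path: str) -> str:
    with open(path, "rb") as f:
        riff, _, wave = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no bext chunk found")
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id == b"bext":
                # Per EBU Tech 3285, the first 256 bytes of the bext chunk
                # hold the Description field.
                return f.read(256).rstrip(b"\x00").decode("ascii", "replace")
            f.seek(size + (size % 2), 1)  # skip this chunk (padded to even length)

print(read_bext_description("masters/interview_001.wav"))
```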

Here at ECU there are numerous audio resources that will have to be considered for digitization in the near future – with technological changes in playback devices, there is simply no getting around this. The Oral Histories that both Joyner and Laupus Library collect are a good example. These collections primarily consist of audiotapes, which any good vendor has exceptional experience with. Because of this, as well as their relatively small size, these collections could also serve as ideal pilot projects for any further audio digitization efforts (be those outsourced, done locally, or a combination).

More local resources to consider, of course, would primarily be housed at the Music Library. According to their website, they currently have about 11,000 CDs, 6500 LPs, and 1800 audiotapes. Of course, we do not own the rights to the majority of these materials, but anything that is a local resource or in the public domain should certainly be inventoried with digitization in mind (the Field Audio Collection Evaluation Tool open-source application that IU will release next year could help to prioritize this process).

The digital collections department has already conducted one “music” project of local importance – the Alice Person: Good Medicine and Good Music project – and I have no doubt that there are many more potential projects waiting in the wings.

Finally, before I close, I want to mention one last piece of advice that George Blood shared with the entire group. When you are writing a grant, remember that you can write into that grant site visits to a select number of potential vendors. Mr. Blood confessed that he was surprised at how few of his customers ever made a visit. Of course, making phone calls to the vendor as well as their references and your colleagues is also invaluable and affordable, but if you are able to get a grant to cover it, you should definitely make it a point to personally visit a select number of vendors before fully committing to any outsourcing project.

Here is Johnny 5 – the robot from “Short Circuit” – to close this report and to wish everyone a wonderful Thanksgiving:

Johnny 5

Thanks, Johnny… No. 5 is alive!