Category Archives: Information Retrieval

Symphony’s problem with operators

I’d be quite alarmed if this hasn’t already been reported, since SirsiDynix’s Symphony OPAC has been out in the wild for quite some time, but here’s an annoying “bug” that I just discovered today.

Search for “near” in any out-of-the-box Symphony OPAC and you’ll get yourself an error. Now try “with”, “adj”, “same”, and even the Boolean operators “or”, “not”, and “and” (I’ll ignore “xor”, since I can’t think of any case in which I’d ever type that).

[digression: If you try to search for “but” (or “for”, “by”, etc.), however, it will tell you that your search contains only stopwords. And I’ll try to forgo the argument about whether or not a library catalog should remove so-called stopwords in this day and age, but suffice it to say that a user can’t find any albums by “The The” without moving beyond the default search form; and, in my opinion, a user should never have to do such a thing for such a simple search.]
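One alternative to rejecting an all-stopword query is to fall back to the literal words. Here is a minimal sketch of that policy, assuming a hypothetical stopword list built only from the words mentioned in this post; it illustrates the idea and is not how Symphony itself filters stopwords.

```python
# Stopwords named in this post; a real catalog's list would be longer.
STOPWORDS = {"the", "but", "for", "by"}


def keyword_terms(query: str) -> list[str]:
    """Drop stopwords, but fall back to the literal words if nothing is left."""
    words = query.lower().split()
    kept = [w for w in words if w not in STOPWORDS]
    # Returning the original words keeps "The The" searchable instead of
    # answering with an "all stopwords" message.
    return kept if kept else words


print(keyword_terms("The The"))          # ['the', 'the']
print(keyword_terms("the singularity"))  # ['singularity']
```

Ordinary queries still lose their stopwords; only a query that would otherwise be empty is searched as typed.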

So, yes, the Symphony OPAC seems to have a problem with operators, but it’s not very likely that someone will search for “adj”. If someone does, however, whether by accident or not, they shouldn’t be greeted with an error. Instead, their query should be run as is, and the results page should also include a new <div>, placed unobtrusively, that informs the user that “adj” is also a proximity operator and explains how to use it, should they want or need to.
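The hint described above could be generated separately from the search itself. A minimal sketch, assuming a hypothetical operator table (the words discussed in this post, with “with” and “same” assumed to be proximity operators) and a hypothetical `search-hint` CSS class; running the literal query is left to whatever search back end is in use.

```python
import html

# Hypothetical operator table; "with" and "same" are assumed to be proximity operators.
OPERATORS = {
    "near": "proximity", "adj": "proximity", "with": "proximity", "same": "proximity",
    "and": "Boolean", "or": "Boolean", "not": "Boolean", "xor": "Boolean",
}


def operator_hint(query: str) -> str:
    """Return an unobtrusive HTML note for each reserved word in the query, or ""."""
    # dict.fromkeys keeps the first occurrence of each word, in query order.
    words = dict.fromkeys(w for w in query.lower().split() if w in OPERATORS)
    if not words:
        return ""
    note = "; ".join(f'"{w}" is also a {OPERATORS[w]} operator' for w in words)
    # "search-hint" is a hypothetical CSS class for styling the note quietly.
    return f'<div class="search-hint">{html.escape(note)}</div>'


print(operator_hint("adj"))
print(repr(operator_hint("philosophy")))  # ''
```

Keeping the hint separate from query parsing means the search never fails just because a reserved word was typed; the note only adds information.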

What’s worse, though, is that you cannot use “near”, “with”, “same”, “or”, “not”, or “and” at the beginning OR end of any of your queries (exception: you can use “not” at the beginning of your query without getting an error, since that operator doesn’t require a left-hand operand, but it will still be treated as an operator). And this, in my mind, is the real bug here. You cannot, then, search for:

  • Near a thousand tables
  • Near eastern archaeology
  • The singularity is near
  • With wings like eagles
  • Same differences
  • Same river twice (but you’re fine if you include the stopword “the” at the beginning, since “same” will no longer be the first word in the query)
  • I love you just the same
  • Or else my lady keeps the key
  • Ready or not
  • Not philosophy (won’t search for the phrase “not philosophy” but will instead search for any record that doesn’t contain the keyword “philosophy”… so, you won’t get an error, but you’ll get a LOT of results).
  • And then there were none
  • And the band played on
  • And you get the idea…
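One plausible way to get exactly this behavior is a parser that tags every reserved word as an operator, wherever it appears, and then rejects any operator that is missing an operand at the edge of the query. This is an illustrative guess at the mechanism, not Symphony’s actual parser; `naive_parse` and `QuerySyntaxError` are hypothetical names.

```python
# Illustrative guess at the failure mode, not Symphony's actual parser.
BINARY_OPS = {"and", "or", "xor", "near", "adj", "with", "same"}
UNARY_OPS = {"not"}


class QuerySyntaxError(ValueError):
    """Raised when an operator is missing an operand."""


def naive_parse(query: str) -> list[tuple[str, str]]:
    """Tag every reserved word as an operator, wherever it appears."""
    tokens = query.lower().split()
    tagged = [("OP", t) if t in BINARY_OPS | UNARY_OPS else ("TERM", t) for t in tokens]
    last = len(tagged) - 1
    for i, (kind, tok) in enumerate(tagged):
        if kind != "OP":
            continue
        # Only the edges are checked: a binary operator at the start has no
        # left operand, and any operator at the end has no right operand.
        if (tok in BINARY_OPS and i == 0) or i == last:
            raise QuerySyntaxError(f"operator {tok!r} is missing an operand")
    return tagged


for q in ["Near eastern archaeology", "Ready or not", "Not philosophy",
          "the and then there were none"]:
    try:
        print(q, "->", naive_parse(q))
    except QuerySyntaxError as err:
        print(q, "-> error:", err)
```

With these rules, “Near eastern archaeology” and “Ready or not” fail, “Not philosophy” parses as a negation, and “the and then there were none” slips through, which matches the behavior and the workaround described in this post.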

Of course, you can move beyond the default search values and use any of those operators in conjunction with the “browse” (or “begins with”) radio button, but that should NOT be a requirement for using a select few query terms. Or, worse, you could work around this bug, for now, by altering your search to something like this:

“and” then there were none

or even

the and then there were none

but that’s a pretty silly solution, as well.  In any event, I have no idea if this bug has been reported or not, but I am quite certain that it would be a very easy fix for SirsiDynix to implement, so I hope that they do so soon — that is, if they don’t already have a patch for this in the works.

Anyhow, if you want to try this out, or if you’re really ambitious and think that you can find any other bugs worth reporting, here’s a list of libraries using Symphony that I’ve compiled:

Unfortunately, it’s hard to determine a static link to Symphony OPACs, so most of those links will take you to a timed-out session. Once there, though, you can usually get back to the main search page just by clicking “OK” and then starting a new search.

[update: I just checked a Sirsi Unicorn library catalog, and it also seems to have this same issue on default keyword searches. So apparently this is a carryover from that legacy system (we were previously on Dynix’s Horizon, which did not have the same issue by default, at least not that I’m aware of). So, in hindsight, I guess this is a Unicorn bug, which makes me certain that it’s already been reported, but I really wonder why it exists. Handling a default query in this manner seems very strange to me. Surely they could just require their operators to be marked with a special character, such as “#”, or simply not treat any Boolean or proximity operators as operators when they appear at the beginning or end of a query.]
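Both fixes suggested in the update can be sketched in a few lines. A minimal sketch, assuming whitespace tokenization and “#” used as a prefix marker (the post only says operators could require a special character; the prefix position is an assumption): by default, reserved words at the start or end of a query are searched as ordinary keywords, and with the hypothetical `require_marker=True` option only marked words such as “#near” act as operators.

```python
# Hypothetical reserved words, taken from the operators discussed in this post.
OPERATORS = {"and", "or", "xor", "not", "near", "adj", "with", "same"}
MARKER = "#"  # assumed prefix; not an actual Symphony feature


def tolerant_parse(query: str, require_marker: bool = False) -> list[tuple[str, str]]:
    """Tag tokens as ("OP", word) or ("TERM", word) without ever raising."""
    tokens = query.lower().split()
    last = len(tokens) - 1
    tagged = []
    for i, tok in enumerate(tokens):
        if require_marker:
            # Only marked words such as "#near" are operators; bare words are keywords.
            if tok.startswith(MARKER) and tok[len(MARKER):] in OPERATORS:
                tagged.append(("OP", tok[len(MARKER):]))
            else:
                tagged.append(("TERM", tok))
        elif tok in OPERATORS and 0 < i < last:
            # A reserved word with something on both sides can still be an operator.
            tagged.append(("OP", tok))
        else:
            # Everything else, including reserved words at either edge, is a keyword.
            tagged.append(("TERM", tok))
    return tagged


print(tolerant_parse("And then there were none")[0])             # ('TERM', 'and')
print(tolerant_parse("war and peace")[1])                        # ('OP', 'and')
print(tolerant_parse("war #and peace", require_marker=True)[1])  # ('OP', 'and')
```

The edge rule is the smaller change for existing users, but a reserved word in the middle still acts as an operator, so “Ready or not” becomes ready OR not. The marker rule avoids that ambiguity entirely, at the cost of asking users to learn a new syntax.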


Subjective Access

“Subjective access may not guarantee that I’m right about the character of the state I’m conscious of myself as being in, but on this view it does ensure that I’m the one who’s in the state if anybody is.”

— David M. Rosenthal, from Consciousness and mind (oclc: 61200643, page 355)

In EAD, we file subjects under a tag known as <controlaccess>, which is short for “controlled access headings.” And, since we’re using these for access (just as they were used in the card catalogs of old, some of which are still in use), they should certainly be hyperlinks, right?

So, what state are the subjects of our finding aids actually in, and what state should they be in?

In our case, at ECU, we’re in the process of converting all of our old subject terms to Library of Congress Subject Headings, like this:

Sinbad (dog)

This isn’t an easy process, but it does mean that, once done, all of our finding aids will “play” nicely with all of the objects in our Library Catalog as well as all of the objects in our Digital Repository. But, for the time being, everything that’s listed in our “controlled access headings” is listed as plain text (without even, at this time, the ability to restrict a keyword search to those fields).

After the update, however, not only will we offer more advanced search options on an advanced search page, but we’ll also turn all of our subjects into hyperlinks. But this raises the question:

to what should we link?

At first, this was “obvious” to me, but having now looked around at other institutions, it seems that there are a few different solutions, which I’ll list below:

(1) Link nowhere (The option that’s most often employed, and the one that we’ll be moving away from)

(2) Link to the rest of the finding aid database (subject to subject)

(3) Link to the rest of the finding aid database (subject to keywords)

(4) Link to the library catalog, which would include the rest of the finding aid database (subject to subject)

Right now, I’m leaning toward a fifth option, at least for the time being (which is just a combination of options 2 and 4):

(5) Link to finding aid database (subject to subject), and then on the page of search results, also include a link that will extend that same subject search to our library catalog and also to our digital repository.
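Option 5 can be sketched as two steps: pull the headings out of each finding aid’s <controlaccess> element, then build a subject-to-subject link for the finding aid database plus links that extend the same subject search to the catalog and the repository. The three base URLs and the `subject` query parameter are hypothetical placeholders, the set of heading tags is an assumption, and the `{*}` wildcard (Python 3.8+) lets the same code read EAD files with or without a namespace.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical endpoints; substitute the real search URLs of each system.
FINDING_AIDS = "https://findingaids.example.edu/search"
CATALOG = "https://catalog.example.edu/search"
REPOSITORY = "https://repository.example.edu/search"

# Assumed set of heading elements to link from inside <controlaccess>.
HEADING_TAGS = {"subject", "geogname", "persname", "corpname"}


def controlled_headings(ead_xml: str) -> list[str]:
    """Collect unique heading strings from every <controlaccess> element."""
    root = ET.fromstring(ead_xml)
    headings: list[str] = []
    for block in root.iterfind(".//{*}controlaccess"):  # any (or no) namespace
        for el in block.iter():
            tag = el.tag.rsplit("}", 1)[-1]  # drop a "{namespace}" prefix if present
            text = " ".join("".join(el.itertext()).split())
            if tag in HEADING_TAGS and text and text not in headings:
                headings.append(text)
    return headings


def subject_links(heading: str) -> dict[str, str]:
    """Option 5: subject-to-subject link plus links that extend the search."""
    query = urlencode({"subject": heading})  # "subject" is a hypothetical parameter
    return {
        "finding_aids": f"{FINDING_AIDS}?{query}",
        "catalog": f"{CATALOG}?{query}",
        "repository": f"{REPOSITORY}?{query}",
    }


sample = """<ead><archdesc><controlaccess>
  <subject source="lcsh">Sinbad (dog)</subject>
</controlaccess></archdesc></ead>"""
for heading in controlled_headings(sample):
    print(heading, subject_links(heading)["finding_aids"])
```

The finding aid link would be the hyperlink on the heading itself; the catalog and repository links belong on the results page, as option 5 describes.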

I’d love to hear other options and what other people think may be the best route (though, in my opinion, that may largely come down to the size of your collections and also to the extent that they’ve been cataloged in some normalized fashion).

My final thought about all of this is how to extend it beyond our own collections, into larger EAD databases like ArchiveGrid. Wouldn’t it be useful to a researcher if a subject in a local finding aid could be extended to repositories worldwide? In this case, though, it would definitely be easier to follow Columbia’s example so as not to get involved with messy crosswalks and the like.

Normalized Dates in EAD

A recent post to the EAD listserv has me thinking once again about how dates are used (or not used) in EAD records. As far as I can tell, RLG’s ArchiveGrid doesn’t permit searching by date. (I could be wrong on this, though, since I don’t have full access to it; it does use Lucene to index its records, though I suppose that most of these are just MARC records?) ProQuest’s Archive Finder does permit searching by date, but it doesn’t really let you do very much with it (e.g., there’s no way to rank your results by “relevancy”).

This leads me to a question: what sort of back-end systems are archives using for their EAD records? (Are there any surveys out there that have this information, or should we start one???)

At ECU, we’re using an XML database only, but we aren’t doing any advanced searching by date (primarily because, at this time, a search for something like “1912” isn’t going to limit your results very much, and then, really, you’re just back at the whole “browse by collection name” situation). However, you can do a keyword search for “1912”, and the results that are returned will be ordered by the number of hits in each document. In my mind, that is only a small difference in functionality, but it is perhaps more useful (on most occasions) than simply limiting your results to any and all collections whose date ranges contain the year “1912”.
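The difference between the two approaches is easy to see on toy data. A sketch, assuming each finding aid has a collection-level range taken from a `normal` attribute such as “1890/1950” and a flattened full text; the records below are made up for illustration, and ranking by raw hit count stands in for whatever scoring the XML database actually uses.

```python
import re
from dataclasses import dataclass


@dataclass
class FindingAid:
    title: str
    normal: str  # collection date range, e.g. the value of normal="1890/1950"
    text: str    # flattened full text of the finding aid


def covers(aid: FindingAid, year: int) -> bool:
    start, _, end = aid.normal.partition("/")
    end = end or start  # a single date such as "1912" is its own range
    return int(start[:4]) <= year <= int(end[:4])


def filter_by_year(aids: list[FindingAid], year: int) -> list[str]:
    """Date limiting: every collection whose range contains the year, unranked."""
    return [a.title for a in aids if covers(a, year)]


def rank_by_hits(aids: list[FindingAid], term: str) -> list[str]:
    """Keyword search: documents that mention the term, most hits first."""
    pattern = re.compile(rf"\b{re.escape(term)}\b")
    scored = [(len(pattern.findall(a.text)), a.title) for a in aids]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [title for hits, title in ranked if hits > 0]


aids = [  # made-up records for illustration
    FindingAid("Smith family papers", "1890/1950", "Letters, 1912. Diary, 1930."),
    FindingAid("Jones collection", "1910/1915", "Ledger 1912; photographs 1912; 1912 flood."),
    FindingAid("Brown papers", "1950/1960", "Scrapbook."),
]
print(filter_by_year(aids, 1912))  # ['Smith family papers', 'Jones collection']
print(rank_by_hits(aids, "1912"))  # ['Jones collection', 'Smith family papers']
```

The range filter returns every collection that spans the year in no particular order, while the keyword search puts the collection that actually says the most about 1912 first.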

This leads me to another set of questions: is anyone out there using the “bulk” attribute value (type="bulk" on <unitdate>) as part of their information retrieval process?… is anyone using dates beyond the collection range (those dates associated with a series, folder, or even an item) in the information retrieval process?… has anyone attempted to test their corpus of EAD records with their current search operations vs. indexing and searching those records with different IR tools, such as Nutch, INDRI, Solr, or even just Google Custom Search???
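As one possible answer to the first two questions, a date score could count every <unitdate> with a `normal` attribute (at the collection, series, folder, or item level) and weight `type="bulk"` ranges more heavily. The weights are arbitrary assumptions, and a missing `type` is treated as inclusive; this is a sketch of the idea, not a tested ranking model.

```python
import xml.etree.ElementTree as ET
from collections.abc import Iterator


def unitdates(ead_xml: str) -> Iterator[tuple[str, int, int]]:
    """Yield (type, start_year, end_year) for each <unitdate> with a normal attribute."""
    root = ET.fromstring(ead_xml)
    for el in root.iterfind(".//{*}unitdate"):  # collection, series, folder, item level
        normal = el.get("normal")
        if not normal:
            continue
        start, _, end = normal.partition("/")
        end = end or start
        # A missing type is assumed to mean an inclusive range.
        yield el.get("type", "inclusive"), int(start[:4]), int(end[:4])


def date_score(ead_xml: str, year: int, bulk_weight: float = 2.0,
               inclusive_weight: float = 1.0) -> float:
    """Sum the weights of every normalized date range that contains the year."""
    score = 0.0
    for kind, start, end in unitdates(ead_xml):
        if start <= year <= end:
            score += bulk_weight if kind == "bulk" else inclusive_weight
    return score


sample = """<ead><archdesc>
  <did>
    <unitdate type="inclusive" normal="1890/1950">1890-1950</unitdate>
    <unitdate type="bulk" normal="1910/1920">bulk 1910-1920</unitdate>
  </did>
  <dsc><c01><did><unitdate normal="1912">1912</unitdate></did></c01></dsc>
</archdesc></ead>"""
print(date_score(sample, 1912))  # 4.0
print(date_score(sample, 1940))  # 1.0
```

A score like this could be combined with a keyword score rather than used as a hard filter, so that a date query still ranks results instead of simply limiting them.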

I think it’s great that we’re encoding our documents so well, but I keep wondering if we’re harnessing that information in the best possible ways yet (and perhaps the best solutions won’t be tied to our encoding practices at all).