Report on Library and Museum Digitization Projects

James Moses posted to the SHARP-L list some interesting numbers from the Primary Research Group’s new report on library and museum digitization projects. The first one was probably the most surprising to me: 60% of the funds for digitization projects come from the library’s budget; in the U.S., the number is closer to 64%. I’d have thought grants accounted for a lot more of that.

You know what would be interesting? To see whether Google and Microsoft’s digitization projects were cheaper in real terms. I wouldn’t be surprised to find that they’re more efficient and knowledgeable and therefore wind up spending less in order to digitize comparable amounts.

An insult to the concept of “training”

Via The Chronicle of Higher Education: apparently every staff member at the University of Iowa, including faculty, will now learn how not to be a jerk. Great news! Obviously it’s only ignorance of social norms that would lead a professor to offer higher grades to his female students if they let him feel their breasts. It couldn’t possibly be the case that he knew it was wrong and did it anyway. What kind of appalling free-will hell of a universe would that be to live in?

That said, I’m not at all opposed to requiring people to spend a few minutes or an hour being told about university policies and punishments, even quizzed on such policies and punishments. Oooh, let me write the quiz!

The University of Iowa punishes professors who trade grades for sex, money, or any consideration whatsoever besides merit by

A. Suspending them.
B. Firing them, tenure or no tenure.
C. Depriving them of copying privileges.
D. Relegating them to offices where new carpet has been installed.
E. Setting a trap baited with jailbait and posting the resulting video on YouTube.

Two things are wrong with this “sexual harassment training,” though. First of all, I’ve been through similar things, and usually the questions are more like the following:

For a professor to trade grades for sex, money, or any consideration whatsoever besides merit is wrong because

A. The Berne convention of 1886 declared it illegal.
B. You could lose your job.
C. It might offend the student.
D. You did not state these considerations in your syllabus’s Course Requirements.
E. All of the above.

However, I haven’t seen their proposed test. I’m sure they haven’t written it yet. Perhaps it will be sensible, and perhaps I’m being unfair. But the second thing that’s wrong? Don’t call it “sexual harassment training.” That dilutes the very idea of education. “Training,” especially at a university, should mean teaching someone to do something. Unless they’re offering a certificate in CYA, I don’t see what they’re teaching. Call it a “University Policy Review” or something.

Current mood: disgusted.

UPDATE 8/15: Others are also disgusted.

Networks & literary influence

Last night I read Six Degrees: The Science of a Connected Age, by Duncan Watts. It’s a few years old now; it’s interesting to read a book about networks, including social networks, that came out in 2003, right before the Web 2.0 explosion. Watts even added a chapter just a year later strictly in order to take into account SARS, Howard Dean, and Friendster. A while ago I also read (via audiobook, but let’s not quibble) the 2002 Linked: The New Science of Networks, by Albert-Laszlo Barabasi, and of course I positively slavered over Here Comes Everybody a month or two ago. The Watts and Barabasi books are much more mathematical, though still very accessible, and among other things it’s amusing to read them as a narrative of scientific competition. Watts and his team build a brand-new general model of a network, but, oh no! they forget to take degree distribution into account, upon which sly Barabasi and his team triumphantly scoop them in Nature (or was it Science?) with that forgotten, important adjustment. Gripping stuff. Intellectual agon at its finest.

I love reading about networks, and I’m starting to see power laws everywhere. (Which reminds me that I should really read the book The Long Tail; I’ve only read the article.) I still don’t quite get what a scale-free network is, or rather, I know what it is, but I don’t quite get what factors make a network scale-free instead of random. I’m also wondering whether hub-and-spoke networks are really a kind of hierarchy with only two levels. I’m almost at the point (god save me) of wanting to go read a bunch of business books about organizational structure. Which organizations, which structures, which networks, best allow for communication, learning, creativity, happiness and puppies? A very interesting question.

My dissertation was also based on a network question, I think, though a different one: How does one idea (“the villanelle”) travel through a network? Or, really, more simply (I’m almost ashamed to say how simple), What is the network that this idea traveled through? What was its exact route of textual transmission? I began to get very incensed at some of the fuzzy, romantic claims of those who would airily suggest that the villanelle just, you know, happened. Peasants sang merrily in the fields surrounding Naples, and next thing you know all the courtiers in France are wild for A’bA” abA’ abA” abA’ abA” abA’A”. Dude, there’s always a Patient Zero.

Last night after I shut the covers of Six Degrees, I started dreaming about building a database (or maybe it would only need to be a web service if it could hook into existing textual repositories via some kind of search protocol or APIs) that tracks literary influence. Surely someone must have done something like that? All you need to do to start, at least, is to get access to a bunch of full corpusses. Corpi? All the works of a single author; as many sets of those as you can get. Start with Shakespeare. Take every word Shakespeare ever wrote (and yes, I know that that right there is contentious, because which editions?) and do a basic concordance. There’s one at www.opensourceshakespeare.org/concordance/, for instance. Then take someone else: T. S. Eliot, say. Same thing: basic concordance. (Apparently there isn’t one online any longer.) Then you link the individual words, so that if you look up the word “flower” you can find out that Shakespeare used it 64 times and Eliot used it 28 times, for instance. Make sure to calculate that as a percentage of the total number of words in that author’s corpus. Then use some algorithm to determine the degree of similarity: how many of Shakespeare’s top 500 words appear in Eliot’s top 500 words, for instance. And it’d also need to take into account how many times the word “Shakespeare” appears in Eliot’s writing.
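
For what it’s worth, the first pass could be sketched in a few lines of Python. This is only a rough sketch under my own assumptions: the function names (concordance, relative_frequency, shared_top_words, name_mentions) are made up for illustration, the tokenizer is deliberately naive, and the two sample strings are invented placeholders, not quotations from anybody.

```python
import re
from collections import Counter


def concordance(text):
    """Count how often each lowercased word occurs in a text (naive tokenizer)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def relative_frequency(counts, word):
    """Share of the whole corpus taken up by one word (0.0 for an empty corpus)."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0


def shared_top_words(counts_a, counts_b, n=500):
    """Words that appear in both authors' n most frequent words."""
    top_a = {word for word, _ in counts_a.most_common(n)}
    top_b = {word for word, _ in counts_b.most_common(n)}
    return top_a & top_b


def name_mentions(counts, name):
    """How many times an author's name turns up in another author's corpus."""
    return counts[name.lower()]


# Invented placeholder "corpora"; a real run would load complete works instead.
author_a = concordance("the flower and the thorn and the flower")
author_b = concordance("a flower in the dust")

flower_share_a = relative_frequency(author_a, "flower")
flower_share_b = relative_frequency(author_b, "flower")
overlap = shared_top_words(author_a, author_b, n=10)
```

Counting the overlap of the top 500 words is just the measure I floated above; with real corpora, the most common words will mostly be “the” and “and,” so some kind of stop-word filtering would probably be needed before the number meant anything.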

The more authors whose total oeuvre you put in, the more detailed your model of the literary network would become. Shakespeare would be a hub, duh, the Google, the Tokyo Station. You couldn’t stop there, of course. You’d have to eventually incorporate the literary criticism, for the simple reason that Jane Smiley’s A Thousand Acres never once mentions the word “Shakespeare,” let alone “Lear” or “a-cold.” If she uses “forked” and “jelly” it’s surely in a kitchen sense. So you’d take all the works (at that point you need a “Work” field as well as an “Author” field) that mention both “Shakespeare” and “Smiley,” and there you go. Wait, no, then you’d need to build in geography and chronology. Does Marlowe ever mention Shakespeare? Does Shakespeare ever mention Marlowe? I don’t think so, but you can’t say they didn’t influence each other. They must have.
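
The criticism step, with its “Work” and “Author” fields, might look something like the following. Again, this is a sketch under assumptions of my own: the Work record, the co_mention_edges function, and the sample essays are all hypothetical, and plain substring matching is a crude stand-in for real name recognition. It ignores geography and chronology entirely.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Work:
    author: str  # who wrote this work
    title: str   # which work it is
    text: str    # the full text to search


def co_mention_edges(works, names):
    """For each pair of names, count how many works mention both of them."""
    edges = Counter()
    for work in works:
        lowered = work.text.lower()
        # Crude assumption: a name counts as "mentioned" if it appears as a substring.
        present = sorted(name for name in names if name.lower() in lowered)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Invented placeholder criticism, not real essays.
works = [
    Work("Critic A", "Essay 1", "Smiley reworks Shakespeare on an Iowa farm."),
    Work("Critic B", "Essay 2", "Shakespeare and nothing but Shakespeare."),
]
edges = co_mention_edges(works, ["Shakespeare", "Smiley"])
```

Each pair with a nonzero count becomes a link in the network, weighted by how many works mention both names.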

There’d be lots of interesting insights to come out of it if it were built right, had a great interface, and had lots of good, clean, upright, loyal, chaste data behind it. Boy, wouldn’t that beat stupid clinamen, tessera, kenosis blah blah blah all hollow?

Sigh. The fantasies I have.