Networks & literary influence

Last night I read Six Degrees: The Science of a Connected Age, by Duncan Watts. It’s a few years old now; it’s interesting to read a book about networks, including social networks, that came out in 2003, right before the Web 2.0 explosion. Watts even added a chapter just a year later strictly in order to take into account SARS, Howard Dean, and Friendster. A while ago I also read (via audiobook, but let’s not quibble) the 2002 Linked: The New Science of Networks, by Albert-Laszlo Barabasi, and of course I positively slavered over Here Comes Everybody a month or two ago. The Watts and Barabasi books are much more mathematical, though still very accessible, and among other things it’s amusing to read them as a narrative of scientific competition. Watts and his team build a brand-new general model of a network, but, oh no! they forget to take degree distribution into account, whereupon sly Barabasi and his team triumphantly scoop them in Nature (or was it Science) with that forgotten, important adjustment. Gripping stuff. Intellectual agon at its finest.

I love reading about networks, and I’m starting to see power laws everywhere. (Which reminds me that I should really read the book The Long Tail; I’ve only read the article.) I still don’t quite get what a scale-free network is. Or rather, I know what it is, but I don’t quite get what factors make a network scale-free instead of random. I’m also wondering whether hub-and-spoke networks are really a kind of hierarchy with only two levels. I’m almost at the point (god save me) of wanting to go read a bunch of business books about organizational structure. Which organizations, which structures, which networks, best allow for communication, learning, creativity, happiness and puppies? A very interesting question.
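One well-known partial answer to my own question, in Barabasi's model: a network tends to come out scale-free when it grows over time and newcomers prefer to link to nodes that already have many links (preferential attachment). In a plain random graph, by contrast, every pair of nodes is equally likely to be connected, so almost every node ends up near the average number of links. Here's a toy sketch of the two side by side, using only Python's standard library; the sizes, seeds and function names are my own arbitrary choices, and the degree-weighted sampling is a simple approximation, not Barabasi's actual code.

```python
import random
from collections import Counter

def random_graph(n, m, seed=0):
    """Random graph G(n, m): m distinct edges, every pair of nodes equally likely."""
    rng = random.Random(seed)
    edges = set()
    while len(edges) < m:
        a, b = rng.sample(range(n), 2)
        edges.add((min(a, b), max(a, b)))
    return edges

def preferential_attachment(n, k, seed=0):
    """Growth + preferential attachment (Barabasi-Albert style): each new node
    links to k distinct existing nodes, picked with probability roughly
    proportional to how many links they already have."""
    rng = random.Random(seed)
    # Seed network: a small complete graph on nodes 0..k.
    edges = {(i, j) for i in range(k + 1) for j in range(i + 1, k + 1)}
    # Every node appears here once per edge it touches, so a uniform pick
    # from this list is a degree-weighted pick of a node.
    ends = [v for edge in edges for v in edge]
    for new in range(k + 1, n):
        targets = set()
        while len(targets) < k:
            targets.add(rng.choice(ends))
        for t in targets:
            edges.add((t, new))
            ends.extend((t, new))
    return edges

def degrees(edges):
    """Node -> number of edges touching it."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

ba = preferential_attachment(2000, 2, seed=1)
er = random_graph(2000, len(ba), seed=1)  # same nodes, same number of edges
print("biggest hub, preferential attachment:", max(degrees(ba).values()))
print("biggest hub, random graph:", max(degrees(er).values()))
```

Run it and the grown network's biggest hub should come out much bigger than anything in the random graph of the same size; that handful of huge hubs is what a long-tailed degree distribution looks like up close.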

My dissertation was also based on a network question, I think, though a different one: How does one idea (“the villanelle”) travel through a network? Or, really, more simply (I’m almost ashamed to say how simple), What is the network that this idea traveled through? What was its exact route of textual transmission? I began to get very incensed at some of the fuzzy, romantic claims of those who would airily suggest that the villanelle just, you know, happened. Peasants sang merrily in the fields surrounding Naples, and next thing you know all the courtiers in France are wild for A′bA″ abA′ abA″ abA′ abA″ abA′A″. Dude, there’s always a Patient Zero.
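In graph terms, the simple version of my question is a path problem: make each text a node, draw an arrow from a text to every later text known to have drawn on it, and the route of transmission from Patient Zero to a given poem is a path through that graph. A minimal sketch using breadth-first search; the graph is an invented toy, and the names (`read_by`, `transmission_route`, the node labels) are hypothetical placeholders, not real texts.

```python
from collections import deque

def transmission_route(read_by, source, target):
    """Breadth-first search for one shortest chain source -> ... -> target.
    read_by maps each text to the later texts known to have drawn on it.
    Returns the chain as a list, or None if target is unreachable."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        text = queue.popleft()
        if text == target:
            # Walk the back-pointers from target to source, then reverse.
            route = []
            while text is not None:
                route.append(text)
                text = previous[text]
            return route[::-1]
        for later in read_by.get(text, []):
            if later not in previous:
                previous[later] = text
                queue.append(later)
    return None

# Hypothetical toy graph: labels are placeholders, not historical texts.
read_by = {
    "patient_zero": ["manuscript"],
    "manuscript": ["anthology", "parody"],
    "anthology": ["imitation"],
    "parody": [],
    "imitation": ["later_poem"],
}
print(transmission_route(read_by, "patient_zero", "later_poem"))
```

One caveat built into the choice of algorithm: breadth-first search returns a shortest chain, and when the evidence allows several routes, the shortest one isn't necessarily the historical one.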

Last night after I shut the covers of Six Degrees, I started dreaming about building a database (or maybe it would only need to be a web service, if it could hook into existing textual repositories via some kind of search protocol or APIs) that tracks literary influence. Surely someone must have done something like that? All you need to do to start, at least, is to get access to a bunch of full corpuses. Corpora? All the works of a single author; as many sets of those as you can get. Start with Shakespeare. Take every word Shakespeare ever wrote (and yes, I know that that right there is contentious, because which editions?) and do a basic concordance. There’s already one online, for instance. Then take someone else: T. S. Eliot, say. Same thing: basic concordance. (Apparently there isn’t one online any longer.) Then you link the individual words, so that if you look up the word “flower” you can find out that Shakespeare used it 64 times and Eliot used it 28 times, for instance. Make sure to calculate that as a percentage of the total number of words in that author’s corpus. Then use some algorithm to determine the degree of similarity: how many of Shakespeare’s top 500 words appear in Eliot’s top 500 words, for instance. And it’d also need to take into account how many times the word “Shakespeare” appears in Eliot’s writing.
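The first few steps of that fantasy are simple enough to sketch, assuming each author's complete works are available as one plain-text string (which is the hard part, and which edition is exactly the contentious part). A concordance is just a word count; dividing by the corpus size gives each word as a percentage of the author's total; and the crude similarity I describe is the size of the overlap between two authors' top-N word lists. The function names and the letters-and-apostrophes tokenizer are my own illustrative choices, not any existing concordance's method.

```python
import re
from collections import Counter

def concordance(text):
    """Crude concordance: lowercase word -> number of occurrences.
    Tokens are runs of letters and apostrophes, so "Shakespeare's"
    is counted separately from "Shakespeare"."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def percentages(counts):
    """Each word's count as a percentage of the author's total word count."""
    total = sum(counts.values())
    return {word: 100 * c / total for word, c in counts.items()}

def top_words(counts, n):
    """The n most frequent words (ties broken by first appearance)."""
    return {word for word, _ in counts.most_common(n)}

def top_n_overlap(a, b, n=500):
    """How many of author A's top-n words are also among author B's top-n."""
    return len(top_words(a, n) & top_words(b, n))

def mentions(counts, name):
    """How many times one corpus uses another author's name as a word."""
    return counts[name.lower()]

# Toy stand-ins for two authors' collected works.
a = concordance("The flower fades. The flower falls, and the rose.")
b = concordance("Shakespeare, a flower; the flower of Shakespeare.")
print(percentages(a)["flower"], top_n_overlap(a, b, n=10), mentions(b, "Shakespeare"))
```

With real corpora you'd read each author's collected works from whatever repository you can get at; the toy strings only make the sketch runnable. Raw top-N overlap will be dominated by "the," "and," and friends, so in practice you'd probably want to drop such function words first.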

The more authors whose total oeuvre you put in, the more detailed your model of the literary network would become. Shakespeare would be a hub, duh, the Google, the Tokyo Station. You couldn’t stop there, of course. You’d have to eventually incorporate the literary criticism, for the simple reason that Jane Smiley’s A Thousand Acres never once mentions the word “Shakespeare,” let alone “Lear” or “a-cold.” If she uses “forked” and “jelly” it’s surely in a kitchen sense. So you’d take all the works (at that point you need a “Work” field as well as an “Author” field) that mention both “Shakespeare” and “Smiley,” and there you go. Wait, no, then you’d need to build in geography and chronology. Does Marlowe ever mention Shakespeare? Does Shakespeare ever mention Marlowe? I don’t think so, but you can’t say they didn’t influence each other. They must have.
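The Smiley problem suggests a second kind of edge: two authors get linked when some work of criticism mentions both of them, and chronology decides which way influence can flow. A minimal sketch, assuming each work is a record with "work", "author" and "text" fields (only the text is actually used here) and that you have rough career spans for each author. When two careers overlap, as with Marlowe and Shakespeare, it allows influence in both directions rather than guessing. The function names and toy snippets are hypothetical, and the career years are approximate and only there to make the example run. Summing link weights per author then picks out the hubs.

```python
import re
from collections import Counter
from itertools import combinations

def co_mentions(works, authors):
    """For each pair of authors, count the works whose text mentions both names."""
    pairs = Counter()
    for work in works:
        words = set(re.findall(r"[a-z]+", work["text"].lower()))
        named = sorted(a for a in authors if a.lower() in words)
        for pair in combinations(named, 2):
            pairs[pair] += 1
    return pairs

def orient(pairs, careers):
    """Turn co-mentions into directed influence edges using chronology.
    careers maps author -> (first_year, last_year). Influence only runs
    forward in time; overlapping careers get an edge in both directions."""
    edges = Counter()
    for (a, b), weight in pairs.items():
        (a_start, a_end), (b_start, b_end) = careers[a], careers[b]
        if a_end < b_start:
            edges[(a, b)] += weight
        elif b_end < a_start:
            edges[(b, a)] += weight
        else:  # contemporaries: influence may have run either way
            edges[(a, b)] += weight
            edges[(b, a)] += weight
    return edges

def hubs(edges, k=3):
    """Authors ranked by the total weight of influence edges touching them."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree.most_common(k)

# Toy data: invented critical snippets and rough, illustrative career spans.
works = [
    {"work": "Essay 1", "author": "Critic X",
     "text": "Smiley's A Thousand Acres retells Shakespeare's King Lear."},
    {"work": "Essay 2", "author": "Critic Y",
     "text": "Reading Marlowe beside Shakespeare."},
    {"work": "Essay 3", "author": "Critic Z", "text": "Shakespeare alone."},
]
careers = {"Marlowe": (1587, 1593), "Shakespeare": (1589, 1613),
           "Smiley": (1980, float("inf"))}  # inf: still active
edges = orient(co_mentions(works, set(careers)), careers)
print(hubs(edges, k=1))  # [('Shakespeare', 3)]
```

Geography could be bolted on the same way, as another filter on which edges are plausible before counting.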

There’d be lots of interesting insights to come out of it if it were built right, had a great interface, and had lots of good, clean, upright, loyal, chaste data behind it. Boy, wouldn’t that beat stupid clinamen, tessera, kenosis blah blah blah all hollow?

Sigh. The fantasies I have.