Notes on Freebase workshop at THATCamp SoCal

This post is cross-posted from the THATCamp SoCal (The Humanities and Technology Camp Southern California) blog at socal2011.thatcamp.org/01/12/notes-on-freebase-bootcamp-session/

***

I’ve been hearing about Freebase for a while now, especially from Jon Voss, who organized and ran THATCamp Bay Area, so I figured I’d go to that BootCamp session here at THATCamp SoCal. I’m very, very glad I did. It was taught by Kirrily Robert, who’s Skud on Twitter. As I said on Twitter, I had thought that Freebase was simply a place where people could upload their datasets, and it is that. But it’s also a rather amazing project that’s a bit difficult to explain if you don’t know what open linked data is. And if you don’t know what open linked data is, why then the rather charming animated video that Kirrily showed us might be of use (it’s about “Metaweb,” which is the name of the company that owned Freebase before Google recently bought it, but it gives the idea; web.archive.org/web/20100528142644/http://www.metaweb.com:80/ will now resolve to freebase.com):

[youtube tBSdYi4EY3s]

Kirrily is the developer liaison for Freebase, but I thought she did a great job of pitching the workshop to us non-developer humanist types, and I think that the actual developers who were there (including Joyce Ouchida from USC) probably also got a good idea of what Freebase is all about and what they could do with it. We started by looking at the Freebase page for William Blake:

You may notice (I did) that a good bit of Freebase content comes from Wikipedia; one of the things that struck me like a hammer about Freebase is how purely factual it is. And, later, how it’s the relations between things that constitute Freebase’s “entity graph,” not prose; the video above even begins by evoking what a pain words are and how their meanings are contingent. It’s all very poststructuralist. I love it.

We moved quickly into editing, which wasn’t any harder (in fact quite a bit easier) than editing Wikipedia. I did a good bit of work on my pet go-to topic, the villanelle, adding several instances of “poems of this form” (Bishop’s “One Art,” for instance, for which I also had to create a page in Freebase, though others, such as Plath’s “Mad Girl’s Love Song,” already had pages). We then looked at how to construct Freebase queries in MQL, the Metaweb Query Language, and we talked about how to use Google Refine to clean up Excel data sets for use in Freebase. (That alone was a terrific tool to learn about.)
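For anyone who hasn’t seen MQL, here’s a rough sketch (mine, not something from the session) of what a query for villanelles might look like. MQL is query-by-example written as JSON: you describe the thing you want and leave blanks for Freebase to fill in, with `null` asking for a single value and an empty list asking for all values. I’ve written it in Python only to build and print the JSON. The type id `/book/poetic_verse_form`, the property key `poems_of_this_form`, the `{"query": ...}` envelope, and the function name are my assumptions for illustration, so check them against the Freebase schema before relying on them. Nothing here talks to Freebase.

```python
import json


def build_verse_form_query(form_name="Villanelle"):
    """Build an MQL-style query-by-example as plain Python data.

    Hypothetical helper: the type id and property key below are
    assumptions for illustration, not verified schema names.
    """
    return [{
        "type": "/book/poetic_verse_form",  # assumed type id
        "name": form_name,                  # the form we are asking about
        "poems_of_this_form": [],           # empty list: "fill in every value"
    }]


query = build_verse_form_query()

# Assumed envelope shape: the query wrapped in a {"query": ...} object.
envelope = json.dumps({"query": query}, indent=2)
print(envelope)
```

The nice thing about this shape is that the answer comes back looking just like the question, only with the blanks filled in, so a filled-in `poems_of_this_form` list would be where “One Art” and “Mad Girl’s Love Song” turn up.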

What I’m wondering now is whether Freebase might even be a better site than Wikipedia to send students to when they’re researching factual information; I’m not sure. In the session, I asked what Freebase is for: whether it’s a destination research site or a provider of structured semantic data for developers. Kirrily said that they had discussed that very question rather a lot at Freebase, and that their usage statistics show that the latter use is by far the more common. If I did more development, I can definitely see how I’d be all over Freebase’s linked data, which would be so, so useful in building applications. Kirrily mentioned one example at conflicthistory.com. It made me think seriously about building something I’ve had in mind for some time: a site backed by a database of poems that are tagged with their forms (sonnet, triolet, villanelle, etc.) and other features, and I can see that sucking some of the existing Freebase data into it would save a load of work. I went out and registered poeticforms.org right away, in fact.

Anyway, thanks, Kirrily and THATCamp SoCal. This was a great session.