Imagine a National Digital Library: I Wonder If We Can

Here’s a paper and accompanying slides about the National Digital Public Library planning initiative I wrote for the Electronic Resources and Libraries Annual Meeting in Austin, TX. I append the plain text below. See also the bibliography.

Imagine a National Digital Library: I Wonder If We Can by Amanda French

 

*****

Recently, unexpectedly, I’m completely keen on going to Korea. Why?

Because I’m dying to see these guys in their natural habitat. These are D.to, N.to, and U.to. They are the mascots for the National Digital Library of Korea, also called the “dibrary,” which opened its doors (yes, doors) in Seoul on May 25th, 2009. According to the dibrary’s website, “Smart D.to is a digital knowledge messenger who searches for the information you require. D.to is blue, a futuristic color, and represents the messenger delivering necessary digital knowledge.” N.to, the green mascot, “loves nature and green ideas,” and “symbolizes the freedom of the world of knowledge.” And the red mascot is “warm-hearted U.to,” who “symbolizes the guidance to the ubiquitous world, who will be with us at anywhere and any time.”

You have to admire the charm of ditching a dry mission statement for a trio of brightly-colored 21st-century allegorical figures as rococo as any be-draped nymph or undraped cherub in the Library of Congress.

You also have to admire people who would literally build a digital library with a sign out in front identifying it as such. The National Digital Library of Korea took seven years to build, at a cost to the Korean government of about $112 million US, and by some accounts it contains over 116 million “pieces of digital content,” which would make it almost eight times as large as the Europeana digital library, which claims 15 million items. That 116 million number, however, is probably based on a definition of “pieces of digital content” that includes (say) database records, and is therefore not measured in the same units as most digital libraries. But reports also testify, more believably, that the dibrary has digitized 380,000 books, and that is a very respectable number, one larger than the 300,000 ebooks offered for lease by NetLibrary, for instance.

The National Digital Library of Korea is an eight-story building (five of those stories underground) that seats 550 patrons, and it runs 300 TB of server space. The physical space and the equipment are so advanced as to seem almost fictional. On the main floor, pictured here, there are touch-screen help kiosks. There are 3D monitors that do not require viewers to wear 3D glasses. There is a Global Lounge running PCs in English, Chinese, Japanese, French, and Vietnamese. There are multimedia viewing and creation and editing spaces as well as meeting and café spaces.

There are more touch-screen kiosks, these dedicated to the sole purpose of reading digital newspapers. There are electronic tables with touch-screen surfaces, and using those tables you can see digital surrogates of historic Korean books as they lie open flat before you, seemingly in the table rather than on it.

There is a permanent art installation that “displays customized videos based on a user recognition function.” There is an enormous screen reserved only for 3D text, including “user messages.” There is a Laptop Zone, and there is a “Productivity Computer Cluster” whose desktop computers have large monitors and multiple monitors.

There is a bridge called the Way of Knowledge that connects the National Digital Library of Korea with the National Library of Korea, and projected on the walls of the Way of Knowledge are “motion-sensitive interactive contents.”

And there are, of course, D.to, N.to, and U.to. Get my drift? Feel a sudden longing to go to Korea?

But the Korean dibrary is not just about fancy physical spaces or symbolic cartoon characters: it’s very much about providing a whole set of national library services for Korea. In September 2009, just a few months after the dibrary first opened, Korean law was altered in order to give Korean dibrarians the authority to collect and indeed responsibility for collecting Korean data from the open web. Certain kinds of data were legally required to be deposited in the national digital library so as to enable not only preservation but also “the production and distribution of alternative materials for the disabled.” Now centrally coordinated by the National Digital Library of Korea are all kinds of digital services, from training programs to inter-library loan. The dibrary is even charged with creating a “one card system that gives access to 699 public libraries nationwide,” a system scheduled to go live in 2012. And once Korea has fully nationalized as many library materials and services as it can, it’s apparently not going to stop there: last summer a meeting was held to plan a China-Japan-Korea Digital Library, an Asian digital library or portal modeled after The European Library project. To me it sounds like the second step toward the single digital library filed contentedly away in the humming systems of the starship Enterprise, waiting to be addressed with a question: “Computer . . .”

In fact, the first article lobbed into the recent discussion of a U.S. national digital library is titled “A Library Without Walls,” indicating that its author is using the traditional definition of a digital library, the one that defines a digital library as strictly digital. Robert Darnton, that piece’s author, is the director of the Harvard University Library. In October of 2010, he convened a meeting at Harvard of “42 top-level representatives from foundations, cultural institutions, and the library and scholarly worlds” to discuss how to create a national digital library for the United States: “That is, a comprehensive library of digitized books that will be easily accessible to the general public,” as he wrote on the New York Review of Books blog afterward. Darnton made no mention of a building or of Korea, and user messages displayed on a large 3D monitor were apparently the farthest thing from his mind. He evoked instead “the Republic of Letters,” Voltaire and Jefferson and their Enlightenment ideals of widely shared knowledge.

Similarly, in an October interview with Jennifer Howard for the Chronicle of Higher Education, Darnton said, “One of the first things we discussed was the financial problem. It didn’t take long for people there to arrive at a conclusion, which is: We can do it. Everyone seemed convinced that this is certainly within the scope of a funding campaign by foundations.” Grant-making government foundations such as the National Endowment for the Humanities and the National Science Foundation were probably included in that category as well as private grantors such as the Andrew W. Mellon Foundation, but apart from that there seems to have been little hope that the U.S. government would step in directly to fund such a project. Nevertheless, what the group of research library leaders pictured sounded more like a public library than like a research library.

Said Darnton, “The agreement was very solid about the desirability of this thing, and then there was discussion about what ‘this’ was. In general, I think it fair to say, everyone thought the library should be one for the American people, by which I mean not an exclusive research library but a grand collection of books that could be used in junior colleges and high schools and institutions of every sort throughout the country.” Public librarians began to notice that they were being left out of the discussion of how to create this thing that sounded a good bit like a public library, and Darnton and his group began to make changes in response.

By December, when Harvard’s Berkman Center announced that it was officially taking on the planning initiative as a project, the National Digital Library had become “The National Public Digital Library of America.” Public librarians were also invited to participate in the discussions, and public library groups such as LibraryCity with similar goals began to join the general public dialogue in academicky forums such as the Chronicle of Higher Education. Initial meetings of the DPLA are still being funded by the Sloan Foundation, and although it’s early days yet, there’s no talk of seeking Congressional funds.

Andrew W. Mellon’s father, Judge Thomas Mellon, whose financial success would eventually result in the formation of the Mellon Foundation, would have approved of leaving governments out of it, as David Cannadine reports:

In November 1881, Andrew Carnegie offered $250,000 for the construction of a municipal library, on the condition that the corporation [of the city of Pittsburgh] commit $15,000 a year to its maintenance. To meet this condition, state law would have to be changed, enabling the corporation to earmark public funds for this purpose. The Judge believed in reading, he was acquainted with Carnegie, and they shared Scottish ancestry, a devotion to Burns and Spencer, and a passion for free enterprise. But he was vehemently against this gift, fearing that any such statutory alteration would open still wider the floodgates of municipal profligacy, civic debt, and caucus corruption. He proposed an alternative scheme, whereby a library would be built and maintained by the public subscriptions of rich individuals, and for a time his plans carried the Select Council. But the Flinn-Magee machine was determined that the Carnegie scheme should prevail, and the ensuing battle would last five years. Eventually, in October 1886, the Select Council accepted Carnegie’s terms, with the Judge casting the only dissenting vote. Meanwhile, Carnegie had increased his gift to one million dollars, to finance not only a library but also more extensive buildings devoted to the arts, science, and technology. As a result, the city’s annual obligation for maintenance rose to $40,000 a year, confirming the Judge’s worst fears about municipal profligacy and waste.

It always surprises me a little, naïve fool that I am, that there can be any doubt that a library is a public good that contributes to a more informed and happier citizenry, and that it is therefore a legitimate expense of government. But of course, tax-supported public libraries as we know them today have really only been around since the middle of the nineteenth century, and in fact it was that same Andrew Carnegie who would do the most of anyone to create a national system of public libraries in America. Had the Judge prevailed, heaven only knows whether we’d have public libraries at all today – the Pittsburgh library under discussion was the very first of over 1600 libraries Carnegie would fund in the United States on the condition that local governments commit to supporting them. (Though Carnegie had built one library previously in the tiny Scottish town where he had grown up.)

And of course, governments have slashed public library budgets in both the U.S. and the U.K. lately, to the point where one group of library users in Milton Keynes checked out all the books in their local library as a protest against its planned closing.

So, then, Darnton is apparently wise not to seek federal funding for a National Digital Public Library of America, although leaders at the National Archives and the Library of Congress are indeed involved in the planning, and although appeals to Congress may yet be on the menu. Consider, too, that we have already had a “National Digital Library” initiative, and while it was not a failure, it was certainly not widely transformative. Some of you may even remember all the way back to 1990 when the Library of Congress’s American Memory pilot project began, ending in 1994 having digitized (and put on CD-ROM) some 200,000 public domain items related to American history. In 1994, the National Digital Library took over the same work, and beginning with $60 million over five years ($45 million of it donated by technology corporations) eventually digitized about 9 million archival items in the public domain for American Memory. One critique of this project, influenced by Michel Foucault and titled, unoriginally enough, Library of Walls, points out that it was very far from being in the American public’s interest: “[T]he ‘National Digital Library’ is anything but the ‘plain vanilla’ presentation of historical material,” writes Samuel Collins. “Rather, the entire American Memory project from its inception in 1990 to its continued development today shows […] a careful selection and organization of materials designed to both highlight the institution of the Library of Congress and appeal to the Library’s ‘clients,’ especially Congress.” The Library of Congress, let us remember, is not de jure the national library of the United States.

And yet surely the largest problem with creating a Digital Library of America is the province of both Congress and the Library of Congress. Let me hear you say it: copyright. We saw in the case of the National Digital Library of Korea (which is physically linked by the Way of Knowledge to the existing National Library of Korea, a national library de jure) that the Korean government was willing to change its laws in order to better enable the digital library to do its work of preservation. Is our government willing to do that? Does anyone know how much it costs to hire a lobbyist, and does anyone know whether Mellon or Sloan can pay for that?

Well. Copyright. In any case, you might be asking yourself: What does this have to do with me, a humble Electronic Resources Librarian? A few things. First of all, pie in the sky, imagine an American national library consortium, and imagine the bargaining power such a consortium would have with STEM journal publishers. As it happens, Korea is again an illuminating example. In 2002, an assessment of a Korean digital library effort for university researchers called the Research Information Service System (RISS) discovered that 95% of its users were seriously frustrated by their inability to access the full text of foreign journal articles. Korean libraries simply could not afford to pay the permission fees. Four years later, in 2006, Korea had formed a consortium: the Korea Electronic Site License Initiative (KESLI), partly modeled on OhioLink, increased “the use levels of scholarly information to six times higher than average than before.” Imagine that.

More realistically, if there really does come to be a serious national initiative in which academic and public librarians actually partner on providing broad access to electronic resources, that’s a change electronic resources librarians in research libraries should be aware of, too, and foster or resist as judgment suggests.

Lastly, and most importantly, whether or not this ever winds up affecting Electronic Resources librarians, if it does come to pass, it will affect us as citizens.

It is therefore good to know that there are ways in which we can share our thoughts. The intrepid director of the Center for History and New Media, Dan Cohen, is ON THIS VERY DAY attending a meeting of the DPLA, and he is more than willing to receive the wisdom of the crowd via Twitter. Hashtag: #dpla.

NDPLA also has an open e-mail list that anyone can join, and a wiki that anyone can edit. I for one plan to follow this project closely as it evolves and share my opinions via the listserv and other means just as soon as said opinions descend voraciously upon me like a Cooper’s hawk diving for a mouse in the reading room of the Library of Congress. You should do likewise.

(And tell them you want the National Digital Public Library of America to have a cool building like they have in Korea. We can put it in Detroit.)

*****

Addendum: In the course of researching this paper, I put together a spreadsheet with some basic information about some large digital libraries – in that category I include commercial products such as NetLibrary and OverDrive and Audible as well as university and foundation projects such as HathiTrust and Europeana and government initiatives such as the National Library of Norway and Gallica, the national digital library of France. Feel free to browse and analyze. j.mp/lg-dig-lib

Notes on Freebase workshop at THATCamp SoCal

The below is cross-posted from the THATCamp SoCal (The Humanities and Technology Camp Southern California) blog at socal2011.thatcamp.org/01/12/notes-on-freebase-bootcamp-session/

***

I’ve been hearing about Freebase for a while now, especially from Jon Voss, who organized and ran THATCamp Bay Area, so I figured I’d go to that BootCamp session here at THATCamp SoCal. I’m very, very glad I did. It was taught by Kirrily Robert, who’s Skud on Twitter. As I said on Twitter, I had thought that Freebase was simply a place where people could upload their datasets, and it is that. But it’s also a rather amazing project that’s a bit difficult to explain if you don’t know what open linked data is. And if you don’t know what open linked data is, why then the rather charming animated video that Kirrily showed us might be of use (it’s about “Metaweb,” which is the name of the company that owned Freebase before Google recently bought it, but it gives the idea; web.archive.org/web/20100528142644/http://www.metaweb.com:80/ will now resolve to freebase.com):

[youtube tBSdYi4EY3s]

Kirrily is the developer liaison for Freebase, but I thought she did a great job of pitching the workshop to us non-developer humanist types, and I think that the actual developers who were there (including Joyce Ouchida from USC) probably also got a good idea of what Freebase is all about and what they could do with it. We started by looking at the Freebase page for William Blake:

the William Blake Freebase page

You may notice (I did) that a good bit of Freebase content comes from Wikipedia; one of the things that struck me like a hammer about Freebase is how purely factual it is. And, later, how it’s the relations between things that constitute Freebase’s “entity graph,” not prose; the video above even begins by evoking what a pain words are and how their meanings are contingent. It’s all very poststructuralist. I love it.

We moved quickly into editing, which wasn’t any harder (in fact quite a bit easier) than editing Wikipedia. I did a good bit of work on my pet go-to topic, the villanelle, adding several instances of “poems of this form” (Bishop’s “One Art,” for instance, for which I also had to create a page in Freebase, though others, such as Plath’s “Mad Girl’s Love Song,” already had pages). We then looked at how to construct Freebase queries in MQL, Meta Query Language, and we talked about how to use Google Refine to clean up Excel data sets for use in Freebase. (That alone was a terrific tool to learn about.)

What I’m wondering now is whether Freebase might even be a better site to send students to for factual information research than Wikipedia; I’m not sure. In the session, I asked what Freebase is for: whether it’s a destination research site or a provider of structured semantic data for developers. Kirrily said that they had discussed that very question rather a lot at Freebase, and that their usage statistics show that the latter use is by far the more common. If I did more development, I can definitely see how I’d be all over Freebase’s linked data: so, so useful in building applications. Kirrily mentioned one example at conflicthistory.com. It made me think seriously about building something I’ve had in mind for some time: a site backed by a database of poems that are tagged with their forms (sonnet, triolet, villanelle etc.) and other features, and I can see that sucking some of the existing Freebase data into that would save a load of work. I went out and registered poeticforms.org right away, in fact.

Anyway, thanks Kirrily and THATCamp SoCal — this was a great session.

Your Twitter followers and Facebook friends won’t read your peer-reviewed article if they have to pay for it, and neither will strangers

Here’s the paper I’m giving today at the Modern Language Association convention in Los Angeles at the panel “The Open Professoriat: Public Intellectuals on the Social Web.” You can see the slides on Google Docs and embedded below; the text of the talk (also given below) is in the speaker notes.

 

***

The question before today’s panel is “Can social media help broaden the audience for academic work?” I’m going to talk about a more specific version of this question, namely, “Can Twitter and Facebook help earn more readers for peer-reviewed articles?”

The answer is “Yes, but those readers will not pay to read peer-reviewed articles.”

In December of 2010, I tweeted a link to a PDF of an article from the recently published proceedings of the 2010 meeting of the American Society for Information Science and Technology titled “How and Why Scholars Cite on Twitter.” It was one of my most clicked-on links for the year, with 118 views; many of the links I tweet to news articles and so on get only thirty or so clicks. The authors studied a sample of 46,515 tweets from twenty-eight scholars (seven scientists, fourteen social scientists, and seven humanists) and reported that “In our sample of tweets containing hyperlinks, 6% were citations. Of these, 52% were first-order links and 48% were second-order.” By this, they meant that 52% of the links went directly to peer-reviewed work, while 48% were links that went to non-peer-reviewed work about peer-reviewed work: blog posts and news articles, for instance.

One of the main reasons that scholars tweeted these “second-order” links was that they worked for everyone: “[S]cholars may prefer to link directly to the article when it is open access but will resort to second-order links to bypass paywall restrictions. Participants were attracted to open-access articles for Twitter citations; Ben said ‘I would certainly be much more likely to link to things if they were more readily available.’ ”

This article doesn’t study who exactly was clicking on the links the scholars tweeted, although it does report that scholars regarded Twitter as a way to share information with members of their discipline. Certainly this is one of the chief things I use Twitter for, myself: sharing and acquiring information from my colleagues in the digital humanities. In July of 2010, however, I used Twitter for a slightly different purpose: to let members of my network know that a peer-reviewed article of my own had just appeared: “The Summer 2010 issue of *Victorian Poetry* with my article ‘Edmund Gosse and the Villanelle Blunder’ in it is out,” I wrote, and included a link to the article’s landing page in Project Muse. Several friends, scholars themselves in entirely different fields, replied with congratulations on Twitter: one at least showed that she had read at least a little of the article, because she mentioned a word I used in the first paragraph. And one complete stranger, a follower named Robert Withers, wrote that he couldn’t “find Victorian Poetry on his local newsstand,” with what degree of seriousness I simply don’t know. I replied with a link to my (openly available) dissertation, on which the article draws, and we had a short exchange about poetic form.

I feed my Twitter updates to Facebook, and so my Facebook friends saw the tweet as well. There, two friends, one an anthropologist in Aberdeen and one a poet in New England, also expressed interest and congratulations: my friend Alex the anthropologist, however, complained that his university didn’t subscribe to the journal and that he therefore wouldn’t be able to read the article.

So this is six people, five of them my friends. Hardly earth- or academy-shattering. But of that small sample, two were not scholars, and they are terrific examples of a broader audience for peer-reviewed scholarly work. I haven’t spoken to my friend Leigh Palmer the New England poet in person for years, and I would never have thought to (say) e-mail her my article, but as you can see, she was very interested in and engaged with what I wrote. Robert Withers is a stranger to me, but I looked him up for this piece and discovered that he is an independent filmmaker by trade; he was interested in the article for its own sake, but could not read it because it was behind a paywall. The “broader audience” that is indeed reachable via Twitter and Facebook was in this case halved because the article is not openly available. I might mention, too, that when the article was accepted by Victorian Poetry, I negotiated with them to be allowed to post the article openly online, but I did not gain that right.

The audience for an article on Edmund Gosse and the villanelle, of course, is small to begin with – the link that appeared on Twitter and Facebook was clicked on only twenty-four times. How many people might be reading the article through Project Muse or in print, of course, I do not know and have no way of telling: my article is as yet uncited (unsurprisingly, given how recent it is) by anyone else writing about the villanelle or a related scholarly topic. But it’s also clearly the case that Twitter and Facebook can indeed help earn more readers for peer-reviewed articles, as long as those articles are openly and freely available on the web.