Open Shakespeare at OKCon 2011
OKCon 2011, at the Kalkscheune buildings in Berlin, was fantastic, and I thought it would be a good idea to publish a few reflections on some of what went on there, both for the benefit of those who neither made it nor watched the live feeds, and for the chance it offers of mapping Open Shakespeare’s position in the wider Open Knowledge community.
Rufus Pollock provided the opening address, pointing out how the convergence of two phenomena, greater data availability and advanced computing power, had created the perfect conditions for openness to flourish. He announced one such flourishing in the form of datacatalogs.org, which came online at the start of the conference. His next point was that the focus of activities in the community was moving from making data accessible to providing tools for that data and building communities around it. Of course, the quantity problem is only half solved (a later speaker pointed out the small quantities of open government data in Asia, for example), but the community has nevertheless reached a point where data cycles (ecosystems of community, tools and data) can be founded. This last point fits neatly with Open Shakespeare, since the project is slowly forming just such a cycle: early editions of Shakespeare’s plays are open data, and a small community is either building tools (like the annotator) or using them to create more content about Shakespeare’s works, which in turn offers new programming challenges and so completes the circle.
Glyn Moody’s keynote talk, immediately following Rufus’, approached the topic of Open Knowledge from a different angle, analysing the current situation in terms of a new abundance that places pressure on systems, such as the UK’s copyright law, designed for eighteenth-century conditions of scarcity. Although Moody did not mention it, Shakespeare himself was something of a forerunner in this domain: the “fourteen years plus fourteen more” model of copyright established in 1710 was the result of bookseller lobbying, not least that of Jacob Tonson, eager to protect his monopoly on the works of Shakespeare and others (notably Milton, and Dryden’s translations of Virgil). Having sketched out his model of abundance and scarcity, Moody concluded with the provocative question of how open projects would function without copyright, pointing out that many in fact depend upon restrictive legislation as their raison d’être. The only answer that I can give is that open projects would perhaps continue as the first models of communities where exchange and collaboration are well established (as in Open Shakespeare); in other words, they would continue as those “data cycles” and “ecosystems” that Pollock had described as the successors to the victories of open data availability.
Later on in the conference, in the second track of talks, a panel on ‘Data Journalism: What Next?’ provided considerable food for thought on the topic of communities, much of it served up by the Guardian’s Simon Rogers. It was he, for example, who questioned the merits of crowd-sourcing, arguing that it did not provide objective data, since its contributors could be extremely biased: an MP, for instance, might participate in the crowd-sourced analysis of his own expenses. This point was backed up by Stefan Candea, and both he and Simon Rogers emphasised the important labour that remained for the journalist when it came to looking over crowd-sourced responses and shaping them into a story. A neat example of this was the Guardian’s exploration of Sarah Palin’s emails, where users were directed to a random email and then asked to signal anything of interest. Although not flawless (one imagines a Palin aide slaving away to hide significant correspondence), the random assignment nevertheless provided an even coverage of the files. Such randomness might be an important tool for Open Shakespeare’s own crowd-sourcing of annotations, as a way of directing users to annotate less-appreciated works. As regards the verifiability of these annotations, Open Shakespeare has the problematic luxury of considering subjective opinion on the Bard’s art as valid as objective facts about it, since these opinions map the contours of contemporary attitudes to Shakespeare. Further, the intense subjectivity of responses to art means that such subjective annotations do not suffer from the problem of verifiability, because no such critical response has ever been verifiable (for those interested, this line of argument is behind Kant’s description of “universal subjective validity” in his Critique of the Power of Judgment).
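For the curious, the idea of steering annotators towards neglected works by chance can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how Open Shakespeare or the Guardian actually do it: the function name `pick_work` and the annotation counts are invented for the example, and the weighting (each work's chance is proportional to 1 / (1 + its annotation count)) is my own variation on the Guardian's uniformly random assignment.

```python
import random


def pick_work(annotation_counts, rng=random):
    """Pick one work at random, favouring works with fewer annotations.

    annotation_counts maps a work's title to its number of annotations.
    The counts used below are hypothetical, made up for this sketch.
    """
    works = list(annotation_counts)
    # Weight each work by 1 / (1 + count): a work with no annotations
    # gets weight 1, a heavily annotated one gets a weight close to 0.
    weights = [1.0 / (1 + annotation_counts[w]) for w in works]
    return rng.choices(works, weights=weights, k=1)[0]


# Hypothetical annotation counts, for illustration only.
counts = {"Hamlet": 120, "Timon of Athens": 3, "King John": 0}

if __name__ == "__main__":
    print(pick_work(counts))
```

Every work keeps some chance of being chosen, so coverage stays broad, but over many visits the less-annotated plays should be offered far more often than the popular ones.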
It is on this idea of subjective annotation, the generation of subjective data, that I would like to bring this summary to a close. The conference was on Open Knowledge, but it is significant that I found the adjective to have been discussed far more often than the noun. Open Shakespeare’s annotation system, the tool that generates its data cycle, provides both verifiable information (“mirth in funeral” is an example of “synoeciosis” in Hamlet) and subjective opinion (“Words, words, words” is, for one user, “one of the most human lines in the play”). Is the second still data? I would argue that it is, but it is of a kind rarely discussed in Berlin. After all, what are we to do with it in order to integrate it back into the system of open data? Such opinion does not atomise easily, just as Shakespeare’s own words, with their context and their double meanings, resist computerised analysis. We can count the instances of the word “prune”, but it takes an article on the subject to bring out the humour from the information generated by the open-source tool. That article is itself data and can itself be the launch pad for new responses, but it moves the axis of the cycle away from developers’ tools and their data and towards the perspective of the user and, more broadly, that of the community. Rufus Pollock was right to argue for the existence of ecosystems of open data, but the case of Open Shakespeare shows that they can only be fully functional if all three elements are given their full weight: tools, data, and users together.