
‘O brave new world…’: The Future of Open Shakespeare is Open Literature

- March 12, 2013 in Community, Musings, News, Releases, Technical

At the start of March 2013, the Open Shakespeare website went offline. Fear not: it will return in all its annotating, comparing, analysing, searching, publishing glory soon, as an integral part of this website, where all its data, not least its introductions to individual plays, now lives.

This post will set out the reasons why we decided to make this move, and what our vision is for the project in the months and years ahead.

First, the previous incarnation of Open Shakespeare had several problems, largely invisible to most visitors but extremely frustrating for those of us working behind the scenes.

  • No easy way to upload content such as introductions and essays. This was because we were mixing a Pylons back end with a WordPress-powered front end. One of the saddest parts of this situation was that we never managed to get certain introductions live. Now, I’m happy to report that you can read Professor emeritus Hugh Macrae Richmond’s thoughts on Henry VI part 2 for the first time on this website.
  • Open Shakespeare had the potential to be something much bigger than it ever was, as evinced by its sister-project Open Milton, which put Milton’s texts inside the same framework as we were using for Shakespeare. Rather than proliferate parallel projects, it made sense to bring them all together under an ‘Open Literature’ platform: uploading the Milton data is thus one of our next big priorities.

Now from these criticisms comes our vision for Open Literature, an adaptable platform for appreciating literature online. We are creating it with the following principles:

  • Ease of use: many of our Open Shakespeare volunteers, myself included, struggled with the intricacies of the website, so the vast majority of Open Literature’s administration can be done through the WordPress interface, whether this is the uploading of texts or the publishing of comments, essays or words of the day.
  • Reuse of existing technology: both the Open Knowledge Foundation and other partner organisations have several projects which overlap with Open Literature. We intend to use Textus to power our annotations here, and we will certainly also be making use of the FinalsClub annotations incorporated into Open Shakespeare through the AnnotateIt system.

So there you have it, the groundings of a website where:

  • Anyone can get involved with little technical knowledge.
  • Literary texts from any authors can be uploaded, annotated, searched and analysed.
  • Quality content about these authors can be made open, available to use, re-use and redistribute.

If you’d like to get involved in setting up this platform, the evolution of all our work on Open Shakespeare, do drop in to the Open Humanities mailing lists, either its general or developer variants.

As Miranda says, “O brave new world / That has such people in’t!”.

‘Touching this vision’: Comments on Producing Shakespeare Visualisations

- April 27, 2012 in Community, Essay, Musings, Technical, Texts

This post is written by Pat Lockley, who has put together a set of data visualisations for both Shakespeare’s plays and Middleton’s. These public-domain visualisations were discussed on Open Shakespeare recently, and Pat has kindly written the following description of his own methodology, with some thoughts on how such e-resources are perceived.

I’ve worked in either e-learning or education now for over five years – and one of the main things I have often noticed is the time and effort required to make new resources. People often dream of having a magical button that will make e-learning materials for you, but this, surprisingly perhaps, still remains very much a pipe dream. Often though, as a developer (I am more developer than scholar, or even teacher), you find something in a form which can be converted in order to create e-learning resources. If we ignore the idea that all elearning has to be drag and drop activities or quizzes, then there is a lot of material on the internet from which teaching materials can be made.

So where did the Shakespeare idea come from? Well, I found the text online, and noticed that the web pages had a structure to them: you could see in the underlying HTML who was a speaker, the act, the scene and what the line number was. Hence I didn’t have to do anything with the HTML, bar write a little bit of code to read it and turn it into a database. Effectively, this code was looking for repeating patterns in the HTML, and then converting them into entries to store in a database.
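To give a flavour of what this kind of pattern-matching can look like, here is a minimal sketch in Python. It is not the code used for the visualisations: the markup in `SAMPLE_HTML`, the class names, the `act.scene.line` id format and the table layout are all assumptions made up for illustration, and a real page would need its own patterns.

```python
import re
import sqlite3

# Hypothetical markup, invented for this sketch. Real pages will differ.
SAMPLE_HTML = """
<b class="speaker">FIRST SPEAKER</b>
<span class="line" id="1.1.1">A first placeholder line.</span>
<span class="line" id="1.1.2">A second placeholder line.</span>
<b class="speaker">SECOND SPEAKER</b>
<span class="line" id="1.1.3">A placeholder reply.</span>
"""

# The repeating patterns: a speaker heading, then lines tagged act.scene.line.
SPEAKER = re.compile(r'<b class="speaker">(.*?)</b>')
LINE = re.compile(r'<span class="line" id="(\d+)\.(\d+)\.(\d+)">(.*?)</span>')


def parse_lines(html):
    """Yield (act, scene, line_no, speaker, text) for each spoken line."""
    speaker = None
    for raw in html.splitlines():
        match = SPEAKER.search(raw)
        if match:
            speaker = match.group(1)  # remember who is speaking
            continue
        match = LINE.search(raw)
        if match:
            act, scene, number, text = match.groups()
            yield int(act), int(scene), int(number), speaker, text


def build_database(html):
    """Store every parsed line in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lines "
        "(act INTEGER, scene INTEGER, line_no INTEGER, speaker TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO lines VALUES (?, ?, ?, ?, ?)", parse_lines(html))
    conn.commit()
    return conn
```

Calling `build_database(SAMPLE_HTML)` returns a connection whose `lines` table holds one row per spoken line, each tagged with its speaker, act, scene and line number.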

Now that I had the text in a database, I could write queries on the database to extract and present the data in a variety of ways. All of the data and code were produced by me, and some of it is now online on the OKF’s Datahub and GitHub. I’d also be interested in hearing if people would like the data served in any other way. As I said at the start of this post, people seem to like magic buttons which do all the hard work, and so perhaps making the data available isn’t that helpful for a general audience? Further, I’d like to think that maybe there is some scope in building services around the text, but again, as someone who isn’t a Shakespeare scholar or teacher I think I’d struggle to come up with useful ones in advance.
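As an illustration of the sort of query this makes possible, the sketch below counts lines per speaker and per scene. The table layout and the placeholder rows are assumptions for the example only, not the schema of the published dataset.

```python
import sqlite3

# A tiny stand-in table; the rows are placeholders, not real play data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lines "
    "(act INTEGER, scene INTEGER, line_no INTEGER, speaker TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO lines VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 1, "SPEAKER A", "placeholder"),
        (1, 1, 2, "SPEAKER A", "placeholder"),
        (1, 2, 1, "SPEAKER B", "placeholder"),
    ],
)


def lines_per_speaker(conn):
    """Return (speaker, line count) pairs, busiest speaker first."""
    return conn.execute(
        "SELECT speaker, COUNT(*) AS n FROM lines "
        "GROUP BY speaker ORDER BY n DESC, speaker"
    ).fetchall()


def speakers_by_scene(conn):
    """Return (act, scene, speaker, line count) rows in play order."""
    return conn.execute(
        "SELECT act, scene, speaker, COUNT(*) FROM lines "
        "GROUP BY act, scene, speaker ORDER BY act, scene, speaker"
    ).fetchall()
```

Results in this shape can be handed straight to a charting tool, which is roughly the step between a database and a picture.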

Shakespeare Visualised

- April 7, 2012 in Community, Essay, Musings, Technical, Texts

How can computers read Shakespeare? It’s a tricky one, not least because ‘reading Shakespeare’ is a bit of a tricky term: I am certain that everyone who reads a Shakespeare play or poem (let alone seeing them performed), reads them in a different way, with different associations and preferences running through their neurons. If ‘reading Shakespeare’ is such a personal, human thing, then it may well be fair to say that computers are not very well equipped to do it. That said, some recent, public domain images by Pat Lockley, entitled ‘The Science of Shakespeare’ present an interesting way to rethink the relation between computers and the act of reading Shakespeare. A computer cannot in any way read as a human does, but that does not make its contribution worthless. Instead, it makes a computer’s reading of Shakespeare something complementary, something that might challenge or confirm our own impressions of Shakespeare.

One thing that many of the images do, for example, is to flatten Shakespeare: the ‘Shakespeare Connections’ sequence shows us who speaks to whom over the course of the play but not at what times; similarly, the ‘Shakespeare Fingerprints’ sequence shows us when someone speaks, but not to whom. When a human reads a play, these two dimensions, the moment and the direction of a speech, cannot easily be filtered out, and I’m yet to find the human reader capable of mapping in his notebook such images as the ‘Science of Shakespeare’ pages provide. In this respect the computer’s view is unique, because non-human.

Let us concentrate now on ‘Shakespeare Connections’. As I mentioned, many of these computer-generated windows on the play confirm things that we already know. In The Winter’s Tale picture, it is unsurprising that Leontes, the jealous and suspicious king of Sicilia who banishes his baby daughter and comes close to killing his wife, is the character who interacts with the largest number of people.

The Winter's Tale

Similarly, it is no surprise that Caius Martius, a.k.a. Coriolanus, is at the heart of Coriolanus.


However, some plays surprise us with their diagrams. It is Falstaff, and not Prince Hal, who is at the centre of the web of King Henry IV part I, and Portia, not the merchant Antonio or Shylock the Jew, who sits at what might also be called the emotional centre of The Merchant of Venice.

Henry IV part I

The Merchant of Venice

One final point. These images show us neither the character who speaks most, nor the most important character in the story. The former is a job for a different program, and the latter one for a human. The ‘Shakespeare Connections’ simply show the character who speaks with whom, and who, out of all these characters, has the largest number of interlocutors. This focus makes the pictures well-suited to showing us the complexity of Shakespeare’s history plays, plays often criticised for their complex plots and excessive numbers of events.
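For the curious, here is a rough sketch of how such a count of interlocutors might be computed. It is only a guess at the idea, not the method behind the images: it assumes that two characters are "connected" whenever one speaks directly after the other in the same scene, and the speaker names and scene labels are placeholders.

```python
from collections import defaultdict


def interlocutors(speeches):
    """Map each speaker to the set of characters they exchange speeches with.

    `speeches` is an ordered list of (scene, speaker) pairs. Assumption:
    consecutive speakers within one scene are talking to each other.
    """
    links = defaultdict(set)
    for (scene_a, a), (scene_b, b) in zip(speeches, speeches[1:]):
        if scene_a == scene_b and a != b:
            links[a].add(b)
            links[b].add(a)
    return links


def most_connected(speeches):
    """Return the speaker with the most interlocutors (first found on a tie)."""
    links = interlocutors(speeches)
    return max(links, key=lambda name: len(links[name]))


# Placeholder data: scene labels and speaker names are invented.
SAMPLE = [
    ("1.1", "A"), ("1.1", "B"), ("1.1", "A"), ("1.1", "C"),
    ("1.2", "B"), ("1.2", "D"), ("1.2", "A"),
]
```

With the sample above, `most_connected(SAMPLE)` picks out "A", who exchanges speeches with three other characters. Note that this says nothing about how much "A" speaks, which is exactly the distinction drawn in the paragraph above.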

I would like to conclude therefore with a triptych, composed of those images that represent the Henry VI trilogy. Here, the lines in red show us what a tangled web Shakespeare weaves, and how the trilogy descends from the high martial nobility of Talbot, to the bitter struggle led by York and his sons for control of the English throne, until we reach the last convulsions of the war, where Warwick (and the Lancastrian army) is betrayed and killed at the battle of Barnet.

‘That store of power you have’: Repositories

- February 17, 2012 in Community, News, Technical

No Word of the Day this week, but an announcement instead. All the code behind Open Shakespeare, as well as the data, is now freely available on GitHub. You can get to it with the following links:

This “store of power”, as Helena puts it at the end of All’s Well that Ends Well, has been around for a while, but the addition of the data puts the entirety of the project in one place. As well as the plays and poems, you will also find the Droeshout engraving of the bard, material from the Encyclopedia Britannica, and some useful scripts (capable, amongst other things, of using XSL to produce high-quality PDFs via LaTeX).

If you have any questions about using the repository, check out the readme or get in touch with us on our mailing list. Making this stuff freely available is a key part of our belief in openness, and it would be truly wonderful to see other projects grow out of our own.

Success in Inventare il Futuro Competition

- November 8, 2011 in Community, Essay, News, Publicity, Technical, Texts

By James Harriman-Smith and Primavera De Filippi

On the 11th July, the Open Literature (now Open Humanities) mailing list got an email about a competition being run by the University of Bologna called ‘Inventare il Futuro’ or ‘Inventing the Future’. On the 28th October, having submitted an application on behalf of the OKF, we got an email saying that our idea had won us €3 500 of funding. Here’s how.

The Idea: Open Reading

The competition was looking for “innovative ideas involving new technologies which could contribute to improving the quality of civil and social life, helping to overcome problems linked to people’s lives.” Our proposal, entered into the ‘Cultural and Artistic Heritage’ category, proposed joining the OKF’s Public Domain Calculators and Annotator together, creating a site that allowed users more interaction with public domain texts, and those texts a greater status online. To quote from our finished application:

Combined, the annotator and the public domain calculators will power a website on which users will be able to find any public domain literary text in their jurisdiction, and either download it in a variety of formats or read it in the environment of the website. If they chose the latter option, readers will have the opportunity of searching, annotating and anthologising each text, creating their own personal response to their cultural literary heritage, which they can then share with others, both through the website and as an exportable text document.

As you can see, with thirty thousand Euros for the overall winner, we decided to think very big. The full text, including a roadmap, is available online. Many thanks to Jason Kitkat and Thomas Kandler who gave up their time to proofread and suggest improvements.

The Winnings: Funding Improvements to OKF Services

The first step towards Open Reading was always to improve the two services it proposed marrying: the Annotator and the Public Domain Calculators. With this in mind we intend to use our winnings to help achieve the following goals, although more ideas are always welcome:

  • Offer bounties for flow charts regarding the public domain in as yet unexamined jurisdictions.
  • Contribute, perhaps, to the bounties already available for implementing flowcharts into code.
  • Offer mini-rewards for the identification and assessment of new metadata databases.
  • Modify the annotator store back-end to allow collections.
  • Make the importation and exportation of annotations easier.

Please don’t hesitate to get in touch if any of this is of interest. An Open Humanities Skype meeting will be held on 20th November 2011 at 3pm GMT.

Open Shakespeare at OKCon 2011

- July 3, 2011 in Musings, News, Shakespeare, Technical

OKCon 2011, at the Kalkscheune buildings in Berlin, was fantastic, and I thought it would be a good idea to publish a few reflections on some of the stuff that was going on there, both for the benefit of those who did not make it or watch the live feeds, and for the chance it offers of mapping Open Shakespeare’s position in the wider Open Knowledge community.

Rufus Pollock provided the opening address, pointing out how the convergence of the two phenomena of greater data availability and advanced computing power had created the perfect conditions for openness to flourish. He announced one such flourishing, which came online at the start of the conference. His next point was to argue that the focus of activities in the community was moving from making data accessible to providing tools for and building communities around that data. Of course, the quantity problem is only half solved (a later speaker pointed out the small quantities of open government data in Asia, for example), but the community was still at a point where data cycles (ecosystems of community, tools and data) could be founded. This last point fits neatly with Open Shakespeare, since the project is slowly forming just such a cycle: early editions of Shakespeare’s plays are open data, and a small community is either building tools (like the annotator) or using them to create more content about Shakespeare’s works, which in turn offers new programming challenges and so completes the circle.

Glyn Moody’s keynote talk, immediately following Rufus’, approached the topic of Open Knowledge from a different angle, by analysing the current situation in terms of a new abundance which placed pressure on systems, such as the UK’s copyright law, designed for eighteenth-century conditions of scarcity. Although Moody did not mention it, Shakespeare himself was something of a forerunner in this domain: the “fourteen years plus fourteen more” model of copyright established in 1710 was the result of bookseller lobbying, not least that of Jacob Tonson, eager to protect his monopoly on the works of Shakespeare and others (notably Milton, and Dryden’s translations of Virgil). Having sketched out his model of abundance and scarcity, Moody concluded with the provocative question of how open projects would function without copyright, pointing out that many in fact depend upon restrictive legislation as their *raison d’être*. The only answer that I can give is that open projects would perhaps continue as the first models of communities where exchange and collaboration are well established (as in Open Shakespeare), that is to say, as those “data cycles” and “ecosystems” that Pollock had described as the successors to the victories of open data availability.

Later on in the conference, in the second track of talks, a panel on ‘Data Journalism: What Next?’ provided considerable food for thought on the topic of communities, much of it served up by the Guardian’s Simon Rogers. It was he, for example, that questioned the merits of crowd-sourcing, arguing that it did not provide objective data, since its contributors could be extremely biased, an MP participating, for instance, in the crowd-sourced analysis of his own expenses. This point was backed up by Stefan Candea, with both he and Simon Rogers emphasising the important labour that remained for the journalist when it came to looking over crowd-sourced responses and shaping them into a story. A neat example of this was the Guardian’s exploration of Sarah Palin’s emails, where users were directed to a random email and then asked to signal anything of interest. Although not flawless (one imagines a Palin aide slaving away to hide significant correspondence), its randomness nevertheless provided an even coverage of the files. This randomness might be an important tool for Open Shakespeare’s own crowd-sourcing of annotations, as a way of directing users to annotate less-appreciated works. As regards the verifiability of these annotations, Open Shakespeare has the problematic luxury of considering subjective opinion on the Bard’s art as valid as objective facts about it, since these opinions map the contours of contemporary attitudes to Shakespeare. Further, the intense subjectivity of responses to art means that such subjective annotations do not suffer from the problem of verifiability, because no such critical response has ever been verifiable (for those interested, this line of argument is behind Kant’s description of “universal subjective validity” in his *Critique of the Power of Judgment*).
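To make the idea of directed randomness concrete, here is a small sketch of how a site might send a visitor to a randomly chosen work while favouring those with fewer annotations. Everything in it is an assumption for illustration: the weighting of 1 / (count + 1), the function name and the annotation counts are invented, and nothing like this is known to exist in the Open Shakespeare code.

```python
import random


def pick_work(annotation_counts, rng=random):
    """Pick a work at random, favouring those with fewer annotations.

    `annotation_counts` maps a title to its number of annotations.
    The weight 1 / (count + 1) is an arbitrary choice for this sketch.
    """
    titles = list(annotation_counts)
    weights = [1 / (annotation_counts[title] + 1) for title in titles]
    return rng.choices(titles, weights=weights, k=1)[0]


# Invented counts, purely for demonstration.
counts = {"Hamlet": 999, "Pericles": 0}
suggestion = pick_work(counts, random.Random(2011))
```

Every work keeps some chance of being chosen, so popular plays are not shut out, but the quieter corners of the canon come up far more often.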

It is on this idea of subjective annotation, the generation of subjective data, that I would like to bring this summary to a close. The conference was on Open Knowledge, but it is significant that I found the adjective to have been discussed far more often than the noun. Open Shakespeare’s annotation system, the tool that generates its data cycle, provides both verifiable information (“mirth in funeral” is an example of “synoeciosis” in *Hamlet*) and subjective opinion (“Words, words, words” is, for one user, “one of the most human lines in the play”). Is the second still data? I would argue that it is, but it is of a kind rarely discussed in Berlin. After all, what are we to do with it in order to integrate it back into the system of open data? Such opinion does not atomise easily, just as Shakespeare’s own words resist, with their context and their double meanings, computerised analysis. We can count the instances of the word “prune”, but it takes an article on the subject to bring out the humour from the information generated by the open-source tool. That article itself is data and can be itself the launch pad for new responses, but it moves the axis of the cycle away from developers’ tools and their data and towards the perspective of the user and, more broadly, that of the community. Rufus Pollock was right to argue for the existence of ecosystems of open data, but the case of Open Shakespeare shows that they can only be fully functional if all three elements are given their full weight: tools, data, and users together.
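The counting half of that division of labour is easy to sketch. The snippet below is a minimal example, not the open-source tool mentioned above: the tokenising rule is a simplification (lower-cased runs of letters and straight apostrophes) and the sample sentence is made up.

```python
import re
from collections import Counter


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)[word.lower()]


# An invented sentence, not a quotation from any play.
sample = "A prune, a Prune; pruned trees and one more prune."
total = count_word(sample, "prune")
```

Here "pruned" is deliberately not counted as "prune", which already hints at the kind of judgement (is it the same word or not?) that the number alone cannot settle.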

How to Participate in the Annotation Sprint

- February 5, 2011 in Community, Publicity, Technical

The votes are in! We are annotating Hamlet

Until 11:30am you can: Vote for the play to be annotated

Any feedback, or thoughts? Use the etherpad to leave your thoughts about the event.

## How to Participate

### Step Zero: Check your browser

To participate in the annotation sprint, you will **need a recent version of Firefox or Chrome or Safari**.

### Step One: Log in to Open Shakespeare [optional]

**You don’t need to log in, but if you don’t, your contributions will be anonymous.**

To log in you’ll need to obtain an OpenID if you don’t have one. Here’s how:

1. Visit

2. Click on the button ‘Sign up for an OpenID’

3. Follow their instructions to create an OpenID by which you will be known when annotating

Now you’ve got an OpenID you can log in:

1. Go to our login page

2. Click on the ‘OpenID’ button

3. Copy and paste, or type out your OpenID, which looks like a web address

### Step Two: Start Annotating!

1. Go to our works page and click on ‘annotate’ beneath the chosen play

2. All the instructions are written on the side of the page in the ‘Annotation: Howto’ column

Online Editions of Shakespeare

- January 15, 2011 in Community, Musings, Technical, Texts

The story of Shakespeare on the internet is a tangled tale, and this post is an attempt to unravel it. In expounding the advantages and shortcomings of online editions, I hope also to explain a few of the problems Open Shakespeare faces.

## Editions Used by Open Shakespeare

Every work on the Open Shakespeare website has three possible texts, and it is worth explaining their provenance here in detail:

GUTENBERG FOLIO – These are drawn from Project Gutenberg, with the editorial prefaces removed. Nothing else has been changed. The Gutenberg scanner claims that the text “is as close as I can come in ASCII to the printed text”; however, it is important to record here several features of his methodology.
– Some spelling “mistakes” have been corrected according to a dictionary created from the spellings of the Geneva Bible and Shakespeare’s First Folio.
– Typos and abbreviations have also been “corrected”
– “Elongated S’s have been changed to small s’s and the conjoined ae have been changed to ae.”
– The actual text itself is composite, made from “30 different First Folio editions’ best pages”
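The character substitutions in that list are the kind of normalisation that is simple to express in code. The sketch below is only an illustration of the idea and makes no claim about how the Gutenberg text was actually prepared; the mapping covers the long s and the conjoined ae mentioned above, and the upper-case Æ entry is my own addition.

```python
# Map early-modern letterforms onto plain ASCII equivalents.
ASCII_MAP = str.maketrans({
    "ſ": "s",    # elongated (long) s
    "æ": "ae",   # conjoined ae
    "Æ": "AE",   # upper-case form, added as an assumption
})


def normalise(text):
    """Return `text` with long s and ae ligatures replaced by ASCII letters."""
    return text.translate(ASCII_MAP)


example = normalise("Cæſar")
```

Here `example` comes out as "Caesar". The convenience has a cost, of course: once normalised, the text no longer records which letterforms the printed page used.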

GUTENBERG – Again taken from Project Gutenberg, this time from a more fully edited edition, with a cleaner layout, and the inclusion of 18th century stage directions. Open Shakespeare, as is usual for us, has removed all the prefatory material but kept the edited text as is. Unfortunately, nothing is disclosed about the process of editing or the source texts used except for the single phrase “This etext was prepared by the PG Shakespeare Team, a team of about twenty Project Gutenberg volunteers.”

MOBY – This text comes from the most widely available online edition of Shakespeare, of whose advantages and shortcomings there is a useful summary on the Open Source Shakespeare website.

## Other Online Editions: ISE and Wordhoard


The principal website for online editions of Shakespeare is ISE (Internet Shakespeare Editions) where the following are offered, taking their entry for Hamlet as an example:

TEXT EDITIONS – These cover modern spelling and unmodified spelling versions based on the first folio and quarto 1 and 2, all of which have been edited. In the case of *Hamlet* this editing has been done by David Bevington, a scholar of some note. For other editions, the editors are less well known, and in many cases there has not yet been a peer review.

FACSIMILES – This is perhaps the real strength of ISE: several different First Folios have been scanned, and the results are very impressive. They also have facsimiles of the 1603 and 1604 quartos of Hamlet.

ANNOTATED EDITIONS – One of these does not yet exist for *Hamlet*, but David Bevington has again produced a useful peer-reviewed edition of *As You Like It*, on which one can toggle his annotations and record of collations.

COPYRIGHT – Everything on the ISE is under a variety of copyrights. The copyright for the edited texts is owned by the editor, and the images that make up the facsimiles have a rather ambiguous copyright situation, depending on their source. Although ISE state that “All items published on the site of the Internet Shakespeare Editions…may in all cases…be used for educational, non-profit purposes”, quite where an Open License website like our own fits in is deeply ambiguous, since material published on our website could feasibly be used for commercial purposes.


Provided by Northwestern University, the Wordhoard website offers a set of texts worthy to serve as definitive online editions of Shakespeare. Along with other authors’ works, one can download two versions of Shakespeare’s writings: one encoded in TEI, the other linguistically annotated, which is to say every word in the text is associated with a lemma and part of speech.

For me, the most exciting part of this project is the way in which these lemmatized texts can be manipulated. Northwestern University gives one example: a short program written to answer the question ‘Does Shakespeare use mostly the same vocabulary in each of his works, or does he use different vocabulary?’. I recommend visiting the website for the answer, and for a wealth of other little bits of information about Shakespeare’s vocabulary.
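As a taste of what lemmatised text allows, here is a minimal sketch that compares the vocabularies of two works. It is not Northwestern's program: the (word, lemma, part of speech) tuple layout, the sample tokens and the choice of Jaccard similarity as the measure are all assumptions made for this example.

```python
def vocabulary(tokens):
    """Return the set of lemmas in an iterable of (word, lemma, pos) tuples."""
    return {lemma for _word, lemma, _pos in tokens}


def jaccard(first, second):
    """Shared lemmas divided by all lemmas: 1.0 means identical vocabularies."""
    union = first | second
    return len(first & second) / len(union) if union else 0.0


# Invented tokens standing in for two lemmatised works.
work_one = [("kings", "king", "n"), ("spoke", "speak", "v"), ("crown", "crown", "n")]
work_two = [("king", "king", "n"), ("speaks", "speak", "v"), ("ships", "ship", "n")]

overlap = jaccard(vocabulary(work_one), vocabulary(work_two))
```

Because the comparison runs on lemmas, "kings" and "king" count as the same vocabulary item, which a comparison of raw spellings would miss. In this toy case two of the four distinct lemmas are shared, so `overlap` is 0.5.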

The copyright position of the Wordhoard project is complicated. However, the website’s stance is far more ‘open’ than that of the ISE, so collaboration between Wordhoard and Open Shakespeare may be a possibility in the future.

Shakespeare Quarterly part II

- April 6, 2010 in Community, Musings, Publicity, Technical, Texts

Here, for those interested, is my response to Professor Andrew Murphy’s article in the Shakespeare Quarterly:

“I am a member of the Open Shakespeare Project (not to be confused with Open Source Shakespeare) and found this article extremely interesting. I feel that your conclusion points towards many of the approaches to Shakespeare that our project incorporates, and that are part of a more ‘social’ approach to Shakespeare.

It occurs to me that as well as spreading Shakespeare to a far larger audience, cheap editions of Shakespeare are also a godsend for students, who may write their thoughts all over their pages without fear of ruining something expensive. If all these scribbles were collected, a formidable body of knowledge of Shakespeare would be available, as would an evolving record of responses to this writer.

Our site has recently acquired the ability for anyone to annotate Shakespeare’s works, and soon will add the capacity to attribute, tag, sort, and hide the annotations made. With this we hope to create an ‘open’ edition of Shakespeare’s plays that would grow along similar lines to Wikipedia, harnessing the power of the internet to bring many minds to bear upon a single subject.

Such problems as found with the OSS still pose difficulties for us: we have to use Moby as a source text since all others, including (lamentably) the wordhoard text, are under copyrights that conflict with our Open license. Nevertheless, just as textual problems are flagged up in a critical edition with a footnote, so too could such problems be drawn to the reader’s attention through annotation. As Whitney Trettien’s article points out, the web comes into its own when it is an ‘expressive medium’ itself, and not one which, like the OSS, unthinkingly delivers content.

Essentially, ISE already has this kind of thinking process, displaying an editor’s annotation on each text right down to the textual variants. It even has the ability to sort such annotations. However, the problems you identify – different kinds of editing, slow progress, uneven quality – all inevitably result, I feel, from the fact that each text only has a single editor. More editors would speed progress but it is not, of course, a given that more editors would improve quality. Wikipedia is still notorious for its occasional inaccuracies.

Nevertheless, such inaccuracies can be resolved by the same process that generates them. If anyone can annotate, so anyone can also review annotation and improve it. I realise that this is a rather utopian position and that people can as easily vandalise as beautify, but I feel it to be a more tenable one than that held by the websites here. The internet allows for unprecedented levels of input as well as appreciation, and such potential is not exploited by the sites reviewed in this article.

Talking of input and appreciation brings me to one further aspect of these sites that interests me, namely how easily one can print from them. The OSS shines in this respect, but attempting to print an ISE facsimile is rather more difficult. I must also admit that printing from an annotated text at The Open Shakespeare Project is currently impossible: the tool only went live fairly recently, and the site is still very much under construction. One day we hope to harness the accumulated and peer-reviewed annotations of many to produce a printed text, and thus complete a cycle between internet and ‘real world’ Shakespeare.

Such a cycle is ignored at the peril of digital scholarship, for it is the mix of real events and online responses to them that makes Facebook so addictive. Other addictive qualities, such as the relatively small time commitment and the chance to interact with other users could be profitably replicated by internet Shakespeare projects. After all, anything capable of sustaining those involved in the long task of making productive use of Shakespeare is always welcome and need not be to the detriment of academic rigour.”

Here is the author’s reply:

James: thanks very much for this thoughtful and very interesting response to the review. I’ve had a quick look at your site and think it’s very interesting. It seems to me that you really are pushing forward with a Web 2.0 approach to things, making your site a good deal more interactive than the three I review here.

I like the idea of building up a ‘database’ of annotations, and you’re right, of course: textual annotation might be a way round the problems of having to use an outdated source text. I still tend to worry about Wikipedia as a model, however. I always like to tell my students stories of humorous examples of deliberate tampering with Wikipedia, as a way of warning them off using it in their research (perhaps you may know what happened to Thierry Henry’s page, after France put Ireland out of the World Cup?).

Will OSP be entirely ‘user governed’, or will you have some sort of ‘top down’ quality control mechanisms?


The discussion raises some interesting issues. How bitesize and user friendly is our website? To what extent should ‘Open Shakespeare’ be user-governed? Any comments and suggestions you may have will be very welcome.

Annotation is here!

- March 16, 2010 in Community, News, Releases, Technical, Texts

The fabled ability to annotate any text of Shakespeare is now part of the Open Shakespeare website!
Massive thanks to Nick for all his work on something far too complex for me to even describe its complexity (apparently there were difficulties with there being ‘no TextRange in the DOM’).

Here’s how to get annotating:

> 1. Click ‘read texts’ on the homepage.
> 2. Scroll down to find your play of choice in the list and click on ‘annotate’.
> 3. Find the line you wish to annotate, then highlight it, then click on the little notepad that appears.
> 4. In the newly-present dialogue box, type your words of wisdom.
> 5. Press enter to save your annotation and close the dialogue box.

Work has already begun on *Hamlet*, but feel free to annotate wherever you wish.

As to what you should write in an annotation, we currently have no guidelines: shorter is usually better, and, obviously, offensive comments will be removed – but apart from that, all insights and explications are very welcome.

Improvements to come include: restricting editing and deletion to the owner of each annotation, showing user information on annotations, the ability to filter annotations, and the capacity to use markdown in each comment.
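For readers wondering what those improvements amount to, here is a small sketch of owner-only editing and annotation filtering. It is a hypothetical model, not the annotator's actual code or data format: the `Annotation` fields, the function names and the sample entries are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    owner: str                      # who wrote the annotation
    quote: str                      # the highlighted passage
    text: str                       # the comment itself
    tags: set = field(default_factory=set)


def can_modify(annotation, user):
    """Only the owner of an annotation may edit or delete it."""
    return user is not None and annotation.owner == user


def filter_annotations(annotations, owner=None, tag=None):
    """Keep annotations matching the given owner and/or tag."""
    return [
        a for a in annotations
        if (owner is None or a.owner == owner)
        and (tag is None or tag in a.tags)
    ]


# Placeholder annotations, not real user contributions.
notes = [
    Annotation("alice", "a highlighted phrase", "A factual gloss.", {"rhetoric"}),
    Annotation("bob", "another phrase", "A personal response.", {"opinion"}),
]
```

With these in place, `can_modify(notes[0], "bob")` is false, anonymous users (represented here by `None`) can modify nothing, and `filter_annotations(notes, tag="opinion")` returns only the second note.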