Bringing open tools to public-domain literature

Word of the Day: Baker

February 27, 2010 in Texts, Word of the Day

One of the raving Ophelia’s most mysterious lines goes:

Well, God dild you! They say the owl was a baker’s daughter. (4.v)

Ever wonder what she’s talking about?

This is a reference to the popular medieval legend of Jesus asking for a loaf at a baker’s. The folk story tells us that the mistress dutifully put one in the oven for him, but the daughter said it was too large and halved it. However, it swelled to an enormous size, and the daughter was transformed into an owl as a punishment. The reference to the legend here is possibly also related to the discussion of gratitude and ingratitude; in addition, the metamorphosis, which in Ovid often happens to a woman after some kind of sexual trauma, is linked to Ophelia’s unsure position and degeneration into madness.

And now you know! (courtesy of Jude and Colette)

XML and the Natural Language Toolkit

February 26, 2010 in Technical, Texts

I’ve been playing with the nltk (Natural Language Toolkit) and the really useful Jon Bosak XML-annotated corpus these days, and these are some of the graphs I’ve been able to produce after analyzing the speech of the main characters of Macbeth (characters that speak more than 100 lines):

exclamations and interrogations

Here we can see that Macduff is screaming a lot, and that when most characters talk it is never to question, but to assert… Poor Macbeth and Lady Macduff question everything, while Lady Macbeth questions just as much as she asserts.
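For anyone who wants to try counts like these, here is a minimal sketch using only the Python standard library rather than nltk itself. It assumes the SPEECH / SPEAKER / LINE element layout of the Bosak files; the tiny sample document and the function name punctuation_counts are made up for illustration, and this is not the code that produced the graph above.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Tiny hand-typed stand-in for a Bosak-style play file (assumed layout,
# not the real corpus file).
SAMPLE = """<PLAY>
  <SPEECH><SPEAKER>MACBETH</SPEAKER>
    <LINE>Is this a dagger which I see before me,</LINE>
    <LINE>The handle toward my hand? Come, let me clutch thee.</LINE>
  </SPEECH>
  <SPEECH><SPEAKER>MACDUFF</SPEAKER>
    <LINE>O horror, horror, horror!</LINE>
  </SPEECH>
</PLAY>"""


def punctuation_counts(xml_text):
    """Count '!' and '?' in the lines spoken by each character."""
    counts = defaultdict(Counter)
    root = ET.fromstring(xml_text)
    for speech in root.iter("SPEECH"):
        # A speech can list more than one speaker; credit each of them.
        speakers = [s.text.strip() for s in speech.findall("SPEAKER") if s.text]
        for line in speech.findall("LINE"):
            # itertext() also picks up text inside nested elements.
            text = "".join(line.itertext())
            for speaker in speakers:
                counts[speaker]["!"] += text.count("!")
                counts[speaker]["?"] += text.count("?")
    return counts


counts = punctuation_counts(SAMPLE)
for speaker in sorted(counts):
    print(speaker, counts[speaker]["!"], counts[speaker]["?"])
```

To run it on a whole play, read the XML file into a string and pass that to punctuation_counts instead of SAMPLE.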

Regarding the number of words in the play, Macbeth is by far the one who talks the most:

amount of words spoken by main characters

But what about lexical variety? In this next graph, we can see the variety of the words:

Macbeth - lexical variety

Here we can see the variety in each character’s speech.

The brown-ish words are said just once per character. The light greens are words that are repeated in their speech, and the dark greens are the repetitions of those light green words. I still need to take more measurements to see if this is actually the way everybody speaks: by repeating a lot of small words, with just a few new words once in a while. (There are more words that appear just once than words you will repeat through most of your speech! Think about it!)
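If I read the colour bands right, they can be computed from a simple word-frequency table. The sketch below is only an illustration under that assumption: the function name colour_bands and the crude regex tokenizer are mine, not nltk's, and the sample is one line of Macbeth typed in by hand.

```python
import re
from collections import Counter


def colour_bands(text):
    """Split a speaker's words into the three bands described above.

    Returns (once, repeated, repetitions):
      once        - distinct words used exactly once (brown-ish)
      repeated    - distinct words used more than once (light green)
      repetitions - the extra occurrences of those words (dark green)
    """
    # Crude tokenizer (an assumption): lowercase runs of letters/apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)
    once = sum(1 for n in freq.values() if n == 1)
    repeated = len(freq) - once
    repetitions = len(words) - len(freq)
    return once, repeated, repetitions


speech = ("Tomorrow, and tomorrow, and tomorrow, "
          "creeps in this petty pace from day to day")
print(colour_bands(speech))  # prints (7, 3, 4)
```

The three numbers always add up to the total word count, so the same function also gives the amount of words spoken by a character.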

Shakespeare en Français

February 9, 2010 in News, Texts

Bonsoir tout le monde,

If you’ve ever wondered what Hamlet looks like in French, you can now find out via the Open Shakespeare website. The standalone text, based on Guizot’s translation of Shakespeare, can be found here.

If you want to see how good a job Guizot did, you can compare the English Hamlet with the French one here.

There’s some work to do on streamlining the system to make uploading further translations a bit easier, but hopefully one day you’ll be able to trace Shakespeare’s progress around the globe through our website. (Please forgive the pun).

Pour l’instant, amusez-vous bien de Hamlet!

Introductions!

December 3, 2009 in News, Texts

Members of Open Shakespeare are gradually writing and uploading a series of short introductions for each of the plays. These will eventually be supplemented by longer critical introductions and general essays to enhance your reading. All of these introductions can, like the primary texts themselves, be annotated and edited by visitors to the site.

As an example, here’s the short intro to Measure for Measure:

http://www.openshakespeare.org/work/info/measure_for_measure

Enjoy reading!

by admin

Proof-Editing Shakespeare Entry from Encyclopaedia Britannica 11th Edition

September 19, 2007 in Texts

Since the previous post we’ve succeeded in using tesseract, and we now have a nice plain text version of the EB entry on Shakespeare:

http://knowledgeforge.net/shakespeare/svn/trunk/shksprdata/ancillary/britannica-11th.txt

What we now need to do is ‘proof’ this to correct the OCR errors. This kind of thing is perfect for distributed volunteers, so if you’d like to help out just step up and start correcting one of the sections. To make it especially easy for people to make edits, the text has been put in a temporary location on the Open Knowledge Foundation wiki (only the first five pages for the time being):

http://okfn.org/wiki/tmp/BritannicaShakespeare

by admin

OCRing Shakespeare Entry from Encyclopaedia Britannica 11th Edition

August 14, 2007 in Technical, Texts

One of the next things we want to do for Open Shakespeare is provide an open introduction to his works. The obvious idea for this was to use the Shakespeare entry in the 11th edition of the Encyclopaedia Britannica, as detailed in this ticket:

http://p.knowledgeforge.net/shakespeare/trac/ticket/24

We’ve now written code to grab the relevant tiffs off wikimedia:

http://p.knowledgeforge.net/shakespeare/svn/trunk/src/shakespeare/src/eb.py

You can also find them online (28 pages) starting at:

http://upload.wikimedia.org/wikipedia/commons/scans/EB1911_tiff/VOL24%20SAINTE-CLAIRE%20DEVILLE-SHUTTLE/ED4A800.TIF
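For readers who just want the idea without opening eb.py, here is a rough sketch of a downloader. It is not the code in our repository: the names local_name and fetch_all are hypothetical, and it deliberately takes a list of page URLs from the caller, since only the first page's URL is given above and guessing the other 27 file names would be an assumption.

```python
import os
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit

# The first page, as given above.
FIRST_PAGE = ("http://upload.wikimedia.org/wikipedia/commons/scans/EB1911_tiff/"
              "VOL24%20SAINTE-CLAIRE%20DEVILLE-SHUTTLE/ED4A800.TIF")


def local_name(url):
    """Turn a scan URL into a plain local file name such as 'ED4A800.TIF'."""
    return posixpath.basename(unquote(urlsplit(url).path))


def fetch_all(urls, dest_dir, fetch=urllib.request.urlretrieve):
    """Download each URL into dest_dir, skipping files already present.

    `fetch` is injectable so the function can be exercised without
    touching the network.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        target = os.path.join(dest_dir, local_name(url))
        if not os.path.exists(target):
            fetch(url, target)
        saved.append(target)
    return saved


# Example call (not run here, since it needs network access):
# fetch_all([FIRST_PAGE], "eb_tiffs")
```

Skipping files that already exist means an interrupted run can simply be started again.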

The next step is to OCR this stuff (after that we can move on to proofing, whether by ourselves or via http://pgdp.net). When we first had a stab at this back in April we tried using gocr. Unfortunately the results were so bad that they were unusable. Recently an old OCR engine of HP’s has been released as open source under the name of tesseract:

http://code.google.com/p/tesseract-ocr/

We’re going to have a go using this — though if there is anyone out there with access to an alternative system we’d love to hear about it.
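As a sketch of what a batch run might look like, the snippet below builds the basic command line form, tesseract followed by the image and an output base name, for each page. It assumes a tesseract binary is on the PATH and can read the TIFFs as they are (they may need converting first); the helper names tesseract_command and ocr_pages are made up for this example.

```python
import os
import subprocess


def tesseract_command(image_path):
    """Build the command line: tesseract <image> <output base>.

    tesseract appends '.txt' to the output base itself.
    """
    base, _ext = os.path.splitext(image_path)
    return ["tesseract", image_path, base]


def ocr_pages(image_paths):
    """Run tesseract on each page image, stopping on the first failure."""
    for path in image_paths:
        subprocess.run(tesseract_command(path), check=True)


print(tesseract_command("ED4A800.TIF"))
# prints ['tesseract', 'ED4A800.TIF', 'ED4A800']
```

Calling ocr_pages with the downloaded page files would leave one .txt file next to each image, ready to be joined up and proofed.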