Artificial Textual Tradition: Julius Caesar

This data set was created in a project led by Tuomas Heikkilä and Teemu Roos at the University of Helsinki. You can use it freely in your work as long as you attribute the data to its source (presently this blog). We are working on an article which, once it has appeared, can be cited in published works.

We are (finally) pleased to announce our latest artificial manuscript tradition “Julius Caesar”, based on Shakespeare’s play of the same name. The tradition includes 64 manuscripts with an average of 1626 words each. It is therefore the most extensive artificial tradition available, and by applying our methods to it we hope to learn a lot about how they work and how they can be improved.

The tradition was created by hand-copying a part of the play (mainly Act I, Scene II), and then repeatedly making new copies based on the earlier copies. The total number of manuscripts thus created was 95, out of which 31 were held back to simulate a more realistic scenario where not all of the manuscripts are extant. Furthermore, most of the remaining 64 manuscripts were partially deleted to make them appear as real fragmentary manuscripts.

We provide the data as pre-aligned plain text, as well as in Nexus format where each unique word per position is converted to a different letter.

The aligned text format looks like this:

Eh Hu Ye Ad Zi Vo
JULIUS JULIUS
CAESAR CAESAR
[ [
ACT ACT
.
1 I
. .
]
men men men men men Men
as as as as as as
ever ever ever ever ever ever
trod trod trod trod trod tred
upon upon upon upon upon upon
neat’s neat’s neat’s neat’s neat’s veal’n
leather leather leather leather leather leather

The Nexus-formatted file looks like this:

BEGIN DATA;
      DIMENSIONS NTAX = 64 NCHAR = 4917;
      FORMAT SYMBOLS = "acdefghiklmnpqrstwy" LABELS = LEFT;
      MATRIX
           Eh      rrnrrrrnqnrqncdrnnrrr??????????????????
           Hu      rrnr?nr?qnrqncdnnnrrr????????????dr?rgc
           Ye      ????????qnrqncdnnnrrr????????????dr?rgc
           Ad      ????????qnrqncdnnnrrr????????????dr??qc
           Zi      ????????qnrqncdnnnrrr????????????dr?rgc
           Vo      ????????dnrdnhdnnnrrr????????????drrren
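
To make the coding idea concrete, here is a minimal Python sketch of how aligned readings could be turned into Nexus-style characters. It is not the script used to produce the file. The symbol list and the "?" for missing text are taken from the sample above; the function names, the assignment of letters in order of first appearance, and the case-sensitive comparison of words are assumptions made for illustration, and the letters in the real file are evidently assigned differently.

```python
SYMBOLS = "acdefghiklmnpqrstwy"  # symbol list from the Nexus FORMAT line above
MISSING = "?"                    # marks positions where a manuscript has no text


def encode_position(readings):
    """Map each distinct reading at one aligned position to a symbol.

    Assumption: letters are handed out in order of first appearance and
    words are compared case-sensitively; the real file may differ.
    """
    mapping = {}
    coded = []
    for word in readings:
        if word == "":
            coded.append(MISSING)
            continue
        if word not in mapping:
            if len(mapping) >= len(SYMBOLS):
                raise ValueError("more distinct readings than symbols")
            mapping[word] = SYMBOLS[len(mapping)]
        coded.append(mapping[word])
    return coded


def encode_alignment(rows):
    """rows holds one list per word position, with one reading per manuscript.

    Returns one character string per manuscript, as in a Nexus MATRIX.
    """
    columns = [encode_position(row) for row in rows]
    return ["".join(chars) for chars in zip(*columns)]


if __name__ == "__main__":
    rows = [
        ["trod", "trod", "tred"],
        ["neat's", "", "veal'n"],
    ]
    for label, chars in zip(["Eh", "Hu", "Vo"], encode_alignment(rows)):
        print(label, chars)
```

Because the letters are assigned separately at each position, the same letter in two different columns does not mean the same word; only agreement within a column is meaningful.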

The correct stemma, according to which the manuscripts were copied, is below:

[Figure: juliuscaesar1 (stemma of the Julius Caesar tradition)]

The nodes represent manuscripts (labelled with random labels), and edges indicate the exemplar-copy relations. Manuscripts held back from the data appear as points where two or more arrows touch without a node in between. Note that the graph is slightly misleading just above node Ki, where the arrow overlaps with the arrow leading down to Oy, Id, Aq: there is no intermediate node representing a held-back manuscript at that point. As you can see, there are many instances of contamination and multifurcation. The coloring scheme is chosen only for ease of interpretation; it is advisable to use the same colors in estimated stemmata.

We hope you will find it interesting to apply your favorite stemmatological methods to the data, and would be very happy to hear your comments.

Download:

Note: To properly open the aligned text file in a spreadsheet application (such as Excel or OpenOffice), you may need to “import” the data instead of “opening” it. The import functionality can usually be found under the File or Data menu. Set the file type to “CSV” or “Text”. Select “Tab” as the field delimiter and “None” as the text qualifier. Do not select the option “Treat consecutive delimiters as one”.
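
If you prefer to read the file in a script, the same import settings can be reproduced with Python's standard csv module. The following is a minimal sketch, not an official loader: it assumes that the first row holds the manuscript labels (as in the sample above), and the function name, the file name aligned.txt and the UTF-8 encoding are placeholders.

```python
import csv
import io


def read_aligned(handle):
    """Return (labels, rows) from a tab-delimited aligned text stream.

    Assumption: the first row holds the manuscript labels.
    """
    # Tab as delimiter, no text qualifier: quotes and apostrophes stay literal.
    # Consecutive tabs are kept as separate empty cells, never merged.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = list(reader)
    labels, body = rows[0], rows[1:]
    # Pad short rows so that every row has one cell per manuscript.
    width = len(labels)
    body = [row + [""] * (width - len(row)) for row in body]
    return labels, body


if __name__ == "__main__":
    sample = "Eh\tHu\tVo\nmen\t\tMen\ntrod\ttrod\n"
    labels, body = read_aligned(io.StringIO(sample))
    print(labels)
    print(body)
```

For a file on disk, pass an open handle instead, for example open("aligned.txt", newline="", encoding="utf-8"). An empty cell means that the manuscript has no text at that position, which is why consecutive tabs must not be collapsed.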

Contact:

  • Teemu Roos teemu.roos@cs.helsinki.fi
  • Tuomas Heikkilä tuomas.m.heikkila@helsinki.fi