Encoding verse digitally

(This is one of the two topics for the workshop The structure of verse.)

Now that digital data is easily available, it becomes crucial to develop efficient ways of storing, sharing and exploiting it. Verse is still more often than not stored only in printed form, and analytical tasks that could easily be automated are still performed by hand. Fortunately, more and more verse corpora are being built at different institutions (e.g. Czech, Estonian, Dutch). The main goal of this workshop is to bring together different initiatives and ideas so that encoding standards are developed for the benefit of the whole research community.

For encoding standards to be useful, several requirements should be met. The system should be flexible enough to cover the typological diversity of verse (spoken, chanted and sung verse, quantitative and stress systems, etc.). Flexibility can be ensured by a modular and extensible approach, where researchers develop sub-modules as required by the data. Formats should also be maximally compatible, easily convertible and platform-independent, which supports the circulation of corpora and the reproducibility of research. Encoding standards also make it possible to share and reuse scripts for recurrent analyses, such as the identification of stress mismatches or rhymes.

As a side topic, we are also interested in how to build typological databases of metrical forms. For both the main and the side topics, fruitful insights can be drawn from similar initiatives in two sister disciplines: musicology (MEI, MusicXML, music21), and linguistics (Phoible, Autotyp, WALS).

With this in mind, we invite contributions on the following and related topics.

  • Which features of verse are to be encoded, and how? E.g. phonological layer: phonemes, syllables, intonational phrases, stress, quantity, etc.; metrical layer: metrical positions, feet, lines, stanzas, etc.; musical layer: metrical-grid prominence, durations, pitch-classes, motifs, etc.
  • How to integrate the different layers of information? E.g. one metrical position may contain more than one syllable, or none; one syllable may be associated with several notes, each with a different metrical prominence, etc.
  • Which are the optimal file formats in terms of flexibility, compatibility and extensibility?
  • How to automatise the extraction of analytical features from plain text? E.g. automatic syllabification, scansion.
  • How to automatise the analysis of encoded verse? E.g. rule violations, textsetting mismatches, pattern extraction, statistical tendencies.
  • Typological databases of metrical forms. Which descriptive traits are required? How to encode non-discrete or variable features? Which are the optimal file formats?
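
To make the layer-integration question concrete, here is a minimal Python sketch of one possible encoding. It is not a proposed standard: the class names (Syllable, Position), the field names, the placeholder syllables "ta" and "TUM", and the definition of a stress mismatch (a stressed syllable in a weak metrical position) are all assumptions chosen for illustration. Each metrical position holds zero, one or several syllables, the line is round-tripped through JSON as one example of a platform-independent format, and a small reusable function reports mismatches.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Syllable:
    """Phonological layer: one syllable and its stress value."""
    text: str
    stressed: bool


@dataclass
class Position:
    """Metrical layer: a strong or weak position holding 0..n syllables."""
    strong: bool
    syllables: list = field(default_factory=list)


def stress_mismatches(line):
    """Return (position index, syllable text) for each stressed syllable
    that sits in a weak position. This definition is an assumption."""
    return [
        (i, syl.text)
        for i, pos in enumerate(line)
        for syl in pos.syllables
        if syl.stressed and not pos.strong
    ]


# A made-up line with placeholder syllables (not a real verse).
line = [
    Position(False, [Syllable("ta", False)]),
    Position(True, [Syllable("TUM", True)]),
    Position(False, [Syllable("ta", False), Syllable("ta", False)]),  # two syllables
    Position(True, []),                                               # empty position
    Position(False, [Syllable("TUM", True)]),                         # mismatch
    Position(True, [Syllable("TUM", True)]),
]

# Serialise to JSON and read it back, rebuilding the same objects.
encoded = json.dumps([asdict(pos) for pos in line])
restored = [
    Position(d["strong"], [Syllable(**s) for s in d["syllables"]])
    for d in json.loads(encoded)
]

print(stress_mismatches(restored))
```

A musical layer could be attached in the same way, for instance by giving each syllable a list of notes, but that extension is left open here since the choice of note representation is itself one of the questions above.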