A Data-Ship

Fit to Sail a Sea of Words

I have read Ada Palmer’s Terra Ignota series a few times¹ since it came out - alone or with friends - because on every reread I discover new ideas, new calls forward or back within the text or outside of it, and new structural complexities.

I am, to be frank, obsessed with it, and I have the years-long chat logs, the dedicated Zettelkasten vault, and the nigh-unreadable ebooks bursting with rainbow-hued highlights to prove it.

A Ship Worthy of Apollo’s Call

This project is an attempt to upgrade from those first hollowed logs to a ship worthy of the perils and mysteries we gentle readers face when we heed Apollo’s call to push away from shore and sail toward terra ignota.

The Analyses

I will post demos and examples of the analyses that can be performed on the data generated from this digital edition of the novels. As these accumulate, they should progressively build a map of the feasible… more of a spur to your own explorations than anything definitive or comprehensive.

The Dev Log

I will also keep a dev log along this voyage. This will provide a more sequential view of what we found when, any shoals we hit along the way, and any issues with the crew’s morale or theology which might crop up.

The Dashboard

To help with navigation, I will set up a dashboard page to display the progress made and list any sanity checks & smoke tests that fail against the latest version of the data.

The Data

This website will have links to the data files generated for and used in the analyses. Feel free to download and reuse them for non-commercial purposes, with attribution.

What Is This Ship Made of?

TEI Encoding

I am encoding the novels in the TEI² dialect of XML, as this is the de facto standard for text analysis in the Digital Humanities³.

This is truly a labor of love⁴: going over the text, line by line, and marking it up with various XML elements. This is also the step where yours truly will introduce the most errors, although sanity-check queries can catch some of them.
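To make the idea of a sanity check concrete, here is a minimal sketch. It is written in Python purely for illustration (the project itself uses XQuery and R), the sample text is invented rather than taken from the novels, and the rule being checked (every `persName` should carry a `ref` attribute) is my assumption about what such a check might look like, not the project's actual rule set.

```python
import xml.etree.ElementTree as ET

# The standard TEI namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented placeholder text, NOT from the novels.
# The names and ref values are hypothetical.
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1">
        <p><persName ref="#alice">Alice</persName> greeted
           <persName>Bob</persName> at the door.</p>
      </div>
    </body>
  </text>
</TEI>
"""


def names_missing_ref(root):
    """Return the text of every persName element that lacks a ref attribute."""
    return [
        "".join(el.itertext()).strip()
        for el in root.iterfind(".//tei:persName", NS)
        if not el.get("ref")
    ]


root = ET.fromstring(SAMPLE)
missing = names_missing_ref(root)
print(missing)  # the second name was tagged but never linked to an identifier
```

A check like this cannot tell whether a name was tagged correctly, only whether the markup is internally complete, which is why it catches some errors and not others.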

XQuery

It’s worth it, though, because then I can use XQuery⁵ to ask questions like:

R

To analyze this data further and display the results, I use the R language⁶. In particular, I am applying the Tidy Data paradigm⁷ as best I understand it. R is a new language for me, so this project doubles as practical, hands-on learning.
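As a rough illustration of what "tidy" means for this kind of data, the sketch below turns marked-up text into a table with one row per observation and one column per variable. It is in Python rather than R only to keep the example self-contained. The sample markup is invented, and the choice of observation (one `persName` mention) and of columns (`chapter`, `person`, `n`) is an assumption for the sake of the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented placeholder text, NOT from the novels; refs are hypothetical.
SAMPLE = """\
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1">
        <p><persName ref="#alice">Alice</persName> met
           <persName ref="#bob">Bob</persName>.
           <persName ref="#alice">Alice</persName> smiled.</p>
      </div>
      <div type="chapter" n="2">
        <p><persName ref="#bob">Bob</persName> left.</p>
      </div>
    </body>
  </text>
</TEI>
"""


def mention_rows(root):
    """One row per observation: a single persName mention in a chapter."""
    rows = []
    for div in root.iterfind(".//tei:div[@type='chapter']", NS):
        for el in div.iterfind(".//tei:persName", NS):
            rows.append({"chapter": div.get("n"), "person": el.get("ref")})
    return rows


def count_mentions(rows):
    """Summarise to one row per (chapter, person) pair with a count column."""
    counts = Counter((r["chapter"], r["person"]) for r in rows)
    return [
        {"chapter": chapter, "person": person, "n": n}
        for (chapter, person), n in sorted(counts.items())
    ]


rows = mention_rows(ET.fromstring(SAMPLE))
summary = count_mentions(rows)
for row in summary:
    print(row)
```

Because every row has the same columns, a table like this can be written out as CSV and read straight into a data frame for grouping, filtering, and plotting.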

TEI Publisher

Lastly, I can load this Digital Edition into TEI Publisher⁸, an application running on top of the XML database eXist-db⁹. It displays the text with various enhanced features based on the markup, such as highlighting names and showing popups with their metadata on hover.

For interest, here is a copy of the ODD configuration file for this digital edition.

The novels are under copyright, so there are parts of this project I can share and others I cannot.

The project therefore has two repositories on GitHub:

What I Will Share in the Public Repo

What I Will Not Share


  1. This project counts as my eighth reread of the first book in the series, Too Like the Lightning.

  2. Text Encoding Initiative: https://tei-c.org/

  3. An area of scholarly activity at the intersection of computing and the disciplines of the humanities. https://en.wikipedia.org/wiki/Digital_humanities

  4. It took a month of sustained effort to mark up the first twelve chapters of Too Like the Lightning.

  5. A query and functional programming language that queries and transforms collections of structured and unstructured data, usually in the form of XML. https://en.wikipedia.org/wiki/XQuery

  6. A language and environment for statistical computing and graphics. https://www.r-project.org/

  7. Tidy datasets are easy to manipulate, model and visualize, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table. https://vita.had.co.nz/papers/tidy-data.html

  8. http://teipublisher.com/

  9. http://exist-db.org/

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.