Data Ignota: Chapters

Mathieu Glachant

Origins of This Data

This data is generated by extracting every TEI <div type="chapter"> node nested inside a <body> node in the Digital Edition of Terra Ignota.

What Are `<div type="chapter">` Nodes For?

These <div type="chapter"> nodes contain the text of the chapters in each book.

Example Nodes

Here’s a simplified view of what these nodes look like:

<text type="book" n="1">
  <body>
    <div n="1" type="chapter" resp="#MaG">
      <head type="chapterNumber">Chapter the FIRST</head>
      <head type="chapterTitle">A Prayer to the Reader</head>
      <p>You will criticize me, reader, ...</p>
      ...
    </div>
    ...
  </body>
</text>

The data dictionary below maps each piece of information above to a column in the data file.
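To make that mapping concrete, here is a minimal R sketch that pulls the five column values out of the example above using the XML package's XPath functions (getNodeSet, xmlGetAttr, xmlValue). It is an illustration only, not the XQuery pipeline this page actually uses. It assumes the edition's elements sit in the TEI namespace, so the example is wrapped in a namespaced <TEI> root, and the helper first_value is hypothetical.

```r
library(XML)

# Assumption: the edition's elements live in the TEI namespace
ns <- c(tei = "http://www.tei-c.org/ns/1.0")

# The simplified example above, wrapped in a namespaced <TEI> root
tei <- '<TEI xmlns="http://www.tei-c.org/ns/1.0">
<text type="book" n="1">
  <body>
    <div n="1" type="chapter" resp="#MaG">
      <head type="chapterNumber">Chapter the FIRST</head>
      <head type="chapterTitle">A Prayer to the Reader</head>
      <p>You will criticize me, reader, ...</p>
    </div>
  </body>
</text>
</TEI>'
doc <- xmlParse(tei, asText = TRUE)

# Hypothetical helper: text of the first node matching `path` (evaluated
# relative to `node`), or NA if nothing matches
first_value <- function(node, path) {
  hits <- getNodeSet(node, path, namespaces = ns)
  if (length(hits) == 0) NA_character_ else xmlValue(hits[[1]])
}

# Every chapter <div> anywhere inside a <body>
divs <- getNodeSet(doc, "//tei:body//tei:div[@type = 'chapter']", namespaces = ns)

chapters <- do.call(rbind, lapply(divs, function(div) {
  book <- getNodeSet(div, "ancestor::tei:text[@type = 'book']", namespaces = ns)[[1]]
  data.frame(
    book = as.integer(xmlGetAttr(book, "n")),                   # <text n="...">
    chapter = as.integer(xmlGetAttr(div, "n", default = NA)),  # <div n="...">
    number = first_value(div, "tei:head[@type = 'chapterNumber']"),
    title = first_value(div, "tei:head[@type = 'chapterTitle']"),
    editor = xmlGetAttr(div, "resp", default = NA_character_),  # <div resp="...">
    stringsAsFactors = FALSE
  )
}))

chapters
```

Each column comes from the same place the data dictionary describes: book from the ancestor <text> node's n attribute, chapter and editor from the chapter node's own n and resp attributes, and number and title from its two <head> children.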

Get the Data

The data extracted from these <div type="chapter"> nodes is available as a CSV file.

Download Link

Download the data

Last Updated

This file was last updated on 2022-11-03.

Raw Data Generation

The raw data is first extracted from the <div type="chapter"> nodes using an XQuery script.

XQuery Script

For easy ingestion with the XML package in R, the script’s output has a <records> root node and one <chapter> node per chapter in the original text.

xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare variable $doc := doc("PATH_TO_TEI_FILE");

<records>
  {
  for $chapter in $doc//tei:body//tei:div[@type = "chapter"]
  return 
  <chapter>
    ...
    A node per column in the output file, see below for details
    ...
  </chapter>
  }
</records>

XML Output File

The XQuery script produces an XML file of the following form:

<?xml version="1.0" encoding="UTF-8"?>
<records>
   <chapter>
      <book>1</book>
      <chapter>1</chapter>
      <number>Chapter the FIRST</number>
      <title>A Prayer to the Reader</title>
      <editor>#MaG</editor>
   </chapter>
   ...
</records>

Data Prep

This XML file must then be cleaned before it is saved as the CSV file provided above.

Clean: Fix Data Types and NA Values

First, each column must be given its correct data type, and missing values must be set to NA. The XML and tidyverse packages make this straightforward.

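library(XML)        # xmlToDataFrame()
library(tidyverse)  # %>%, mutate(), na_if()

# Placeholder: path to the XML file produced by the XQuery script
xml_path <- "PATH_TO_XML_OUTPUT_FILE"

# stringsAsFactors = FALSE keeps every column as character, so that
# as.integer() below converts the text rather than factor codes
chapters <- xmlToDataFrame(xml_path, stringsAsFactors = FALSE) %>%
  mutate(
    # Missing values must be NA, not the string "NA"
    book = na_if(book, "NA"),
    chapter = na_if(chapter, "NA"),
    number = na_if(number, "NA"),
    title = na_if(title, "NA"),
    editor = na_if(editor, "NA"),
    # Column data types must be correct
    book = as.integer(book),
    chapter = as.integer(chapter)
  )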
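The same cleaning can be tried without the XQuery output file. Below is a minimal, self-contained sketch, assuming dplyr 1.0 or later (for across()). The XML sample is hypothetical: its first record copies the output example above and its second record is made up to show how missing values arrive. Instead of one na_if() line per column, it applies na_if() to every column at once.

```r
library(XML)
library(dplyr)

# Hypothetical sample: the first record copies the output example above,
# the second is made up to show how missing values arrive
sample_xml <- '<records>
  <chapter>
    <book>1</book><chapter>1</chapter><number>Chapter the FIRST</number>
    <title>A Prayer to the Reader</title><editor>#MaG</editor>
  </chapter>
  <chapter>
    <book>2</book><chapter>NA</chapter><number>NA</number>
    <title>NA</title><editor>NA</editor>
  </chapter>
</records>'

# Every column comes back as character, with missing values as the string "NA"
raw <- xmlToDataFrame(xmlParse(sample_xml, asText = TRUE), stringsAsFactors = FALSE)

# Same cleaning as above, written with across(): turn every "NA" string
# into a real NA, then convert the numeric columns to integers
chapters <- raw %>%
  mutate(
    across(everything(), ~ na_if(.x, "NA")),
    across(c(book, chapter), as.integer)
  )

str(chapters)
```

The across() form has one advantage: a column added to the XQuery output later is cleaned of "NA" strings without touching this code.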

NB: Luckily for you, when you read this data in as a CSV file, the readr package is smart enough to guess all of this correctly: missing values come through as NA, and the book and chapter columns are read as numbers.
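Here is a minimal sketch of that check, assuming readr 2.x (for the show_col_types argument). The file written here is a hypothetical stand-in for the downloaded CSV: its first data row copies the XML example above and its second row is made up.

```r
library(readr)

# Hypothetical stand-in for the downloaded CSV (second row is made up)
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "book,chapter,number,title,editor",
  "1,1,Chapter the FIRST,A Prayer to the Reader,#MaG",
  "2,1,NA,NA,NA"
), csv_path)

# By default read_csv() treats "" and "NA" as missing and guesses each
# column's type from its contents
chapters <- read_csv(csv_path, show_col_types = FALSE)

spec(chapters)
```

readr guesses book and chapter as doubles rather than integers. If integers matter, you can request them explicitly with `col_types = cols(book = col_integer(), chapter = col_integer())` and let readr keep guessing the remaining columns.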

Editing Progress

Too Like the Lightning

Book.Chapter	Title	Edited by
1.01	A Prayer to the Read	#MaG
1.02	A Boy and His God	#MaG
1.03	The Most Important P	#MaG
1.04	A Thing Long Thought	#MaG
1.05	Aristotle’s House	#MaG
1.06	Rome Was Not Built i	#MaG
1.07	Canis Domini	#MaG
1.08	A Place of Honor	#MaG
1.09	Every Soul That Ever	#MaG
1.10	The Sun Awaits His R	#MaG
1.11	Enter Sniper	#MaG
1.12	Neither Earth nor At	#MaG
1.13	… Perhaps the Stars	#MaG
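Because the editor column stays NA until editing begins, you can compute progress figures like the ones behind this table from the data file itself. Below is a minimal sketch, assuming a cleaned chapters data frame. The book 1 rows match the table above, and the book 2 rows are made up for illustration.

```r
library(dplyr)

# Hypothetical cleaned data: book 1 rows match the table above,
# the book 2 rows are made up
chapters <- tibble(
  book = c(1L, 1L, 2L, 2L),
  chapter = c(1L, 2L, 1L, 2L),
  editor = c("#MaG", "#MaG", NA, NA)
)

# editor stays NA until editing begins, so count non-NA editors per book
progress <- chapters %>%
  group_by(book) %>%
  summarise(
    edited = sum(!is.na(editor)),
    total = n(),
    .groups = "drop"
  )

progress
```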

Data Dictionary

A list of the columns in the data file, explaining what each one means and how it was generated.

`book`

The number of the book within the series.

Required, numeric.

Derived from the n attribute of the chapter node’s <text type="book"> ancestor.

<book>
  {
  data($chapter/ancestor::tei:text[@type = "book"]/@n)
  }
</book>

`chapter`

The number of the chapter within its book.

Required, numeric.

Derived from the n attribute of the chapter’s node itself.

<chapter>
  {
  if ($chapter/@n) 
  then data($chapter/@n) 
  else "NA"
  }
</chapter>

`number`

The text giving the chapter’s number within its book, e.g. "Chapter the FIRST". Its wording changes over the course of the series.

Alphanumeric.

Derived from the content of the child <head type="chapterNumber"> node of the chapter’s node.

<number>
  {
  if ($chapter/tei:head[@type = "chapterNumber"]) 
  then data($chapter/tei:head[@type = "chapterNumber"]) 
  else "NA"  
  }
</number>

`title`

The title of the chapter.

Alphanumeric.

Derived from the content of the child <head type="chapterTitle"> node of the chapter’s node.

<title>
  {
  if ($chapter/tei:head[@type = "chapterTitle"]) 
  then data($chapter/tei:head[@type = "chapterTitle"]) 
  else "NA" 
  }
</title>

`editor`

The editor responsible for the chapter’s TEI markup.

Required, alphanumeric. A value of NA indicates that editing has not yet begun on that chapter.

Derived from the resp attribute of the chapter’s node.

<editor>
  {
  if ($chapter/@resp)
  then data($chapter/@resp)
  else "NA"
  }
</editor>
