Chapters

The chapters of Terra Ignota, useful for joins to get chapter titles and other chapter-level details.

Mathieu Glachant
2022-10-14

Origins of This Data

This data is generated by extracting every TEI <div type="chapter"> node found inside the <body> nodes of the Digital Edition of Terra Ignota.

What Are <div type="chapter"> Nodes For?

These <div type="chapter"> nodes contain the text of the chapters in each book.

Example Nodes

Here’s a simplified view of what these nodes look like:

<text type="book" n="1">
  <body>
    <div n="1" type="chapter" resp="#MaG">
      <head type="chapterNumber">Chapter the FIRST</head>
      <head type="chapterTitle">A Prayer to the Reader</head>
      <p>You will criticize me, reader, ...</p>
      ...
    </div>
    ...
  </body>
</text>

The data dictionary below maps each piece of information above to a column in the data file.

Get the Data

The data extracted from these <div type="chapter"> nodes is available as a CSV file.

Download the data
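As a minimal sketch of the kind of join this table is meant for, the base R code below reads a few rows shaped like the CSV and attaches chapter titles to a second table keyed by book and chapter. The inline rows stand in for the downloaded file, and the notes table and its contents are hypothetical.

```r
# A few rows shaped like the chapters CSV (a subset of its columns).
# In practice you would read the downloaded file instead, e.g.
# read.csv("chapters.csv"), where "chapters.csv" is a hypothetical local name.
chapters <- read.csv(text = c(
  "book,chapter,title,editor",
  "1,1,A Prayer to the Reader,#MaG",
  "1,2,A Boy and His God,#MaG"
), stringsAsFactors = FALSE)

# Hypothetical per-chapter table that has no titles of its own
notes <- data.frame(
  book = c(1L, 1L),
  chapter = c(2L, 1L),
  note = c("hypothetical note on 1.02", "hypothetical note on 1.01"),
  stringsAsFactors = FALSE
)

# Join on the (book, chapter) pair to pick up each chapter's title;
# merge() sorts the result by the key columns by default
notes_with_titles <- merge(
  notes,
  chapters[, c("book", "chapter", "title")],
  by = c("book", "chapter")
)
print(notes_with_titles)
```

Joining on both columns matters: chapter numbers restart in every book, so chapter alone is not a key.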

Last Updated

This file was last updated on 2022-11-03.

Raw Data Generation

The raw data is first extracted from the <div type="chapter"> nodes using an XQuery script.

XQuery Script

For easy ingestion with the XML package in R, the script’s output has a <records> root node and one <chapter> node per chapter in the original text.

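xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";
(: Replace PATH_TO_TEI_FILE with the location of the TEI edition :)
declare variable $doc := doc("PATH_TO_TEI_FILE");

<records>
  {
  (: One <chapter> record per TEI chapter div, one child node per output column :)
  for $chapter in $doc//tei:body//tei:div[@type = "chapter"]
  return
  <chapter>
    <book>{ data($chapter/ancestor::tei:text[@type = "book"]/@n) }</book>
    <chapter>{ if ($chapter/@n) then data($chapter/@n) else "NA" }</chapter>
    <number>{
      if ($chapter/tei:head[@type = "chapterNumber"])
      then data($chapter/tei:head[@type = "chapterNumber"])
      else "NA"
    }</number>
    <title>{
      if ($chapter/tei:head[@type = "chapterTitle"])
      then data($chapter/tei:head[@type = "chapterTitle"])
      else "NA"
    }</title>
    <editor>{ if ($chapter/@resp) then data($chapter/@resp) else "NA" }</editor>
  </chapter>
  }
</records>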

XML Output File

The XQuery script produces an XML file of the form:

<?xml version="1.0" encoding="UTF-8"?>
<records>
   <chapter>
      <book>1</book>
      <chapter>1</chapter>
      <number>Chapter the FIRST</number>
      <title>A Prayer to the Reader</title>
      <editor>#MaG</editor>
   </chapter>
   
   etc...
   
</records>

Data Prep

This XML file must then be cleaned before being saved as the CSV file provided above.

Clean: Fix Data Types and NA Values

The correct data type must be set for each column, and missing values, which the XQuery script writes as the string "NA", must be converted to real NA values. This is easy with the XML and tidyverse packages.

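library(XML)        # xmlToDataFrame()
library(tidyverse)  # %>%, mutate(), na_if()

# Placeholder: path to the XML file produced by the XQuery script
xml_path <- "PATH_TO_XML_OUTPUT"

chapters <- xmlToDataFrame(xml_path, stringsAsFactors = FALSE) %>%
  mutate(
    # Missing values must be NA, not the string "NA"
    book = na_if(book, "NA"),
    chapter = na_if(chapter, "NA"),
    number = na_if(number, "NA"),
    title = na_if(title, "NA"),
    editor = na_if(editor, "NA"),
    # Column data types must be correct
    book = as.integer(book),
    chapter = as.integer(chapter)
  )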

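NB: Luckily for you, if you read this data in from the CSV file, readr's read_csv() is smart enough to get all of this right on its own: it reads "NA" as a missing value by default and guesses that book and chapter are numeric columns.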
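For readers who would rather not load the tidyverse, the cleaning step above can be sketched in base R. This assumes the input is a data frame of character columns like the one xmlToDataFrame() returns; the first row comes from the example above, and the second is a hypothetical row in which every optional field was written as "NA".

```r
# Raw extraction as character columns; the XQuery script writes the string "NA"
# for missing values. Row 2 is hypothetical, for illustration only.
raw <- data.frame(
  book    = c("1", "2"),
  chapter = c("1", "1"),
  number  = c("Chapter the FIRST", "NA"),
  title   = c("A Prayer to the Reader", "NA"),
  editor  = c("#MaG", "NA"),
  stringsAsFactors = FALSE
)

clean_chapters <- function(df) {
  # Turn the literal string "NA" into a real missing value in every column;
  # %in% returns FALSE (not NA) for values that are already missing
  df[] <- lapply(df, function(col) replace(col, col %in% "NA", NA))
  # Book and chapter numbers are integers
  df$book <- as.integer(df$book)
  df$chapter <- as.integer(df$chapter)
  df
}

chapters <- clean_chapters(raw)
str(chapters)
```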

Editing Progress

Too Like the Lightning

Chapter  Title                  Edited by
1.01     A Prayer to the Read   #MaG
1.02     A Boy and His God      #MaG
1.03     The Most Important P   #MaG
1.04     A Thing Long Thought   #MaG
1.05     Aristotle’s House      #MaG
1.06     Rome Was Not Built i   #MaG
1.07     Canis Domini           #MaG
1.08     A Place of Honor       #MaG
1.09     Every Soul That Ever   #MaG
1.10     The Sun Awaits His R   #MaG
1.11     Enter Sniper           #MaG
1.12     Neither Earth nor At   #MaG
1.13     … Perhaps the Stars    #MaG
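The listing above looks like it could be derived directly from the data file: book and chapter combined into a label such as 1.01, the title cut short, and the editor column. The base R sketch below rebuilds such a listing under that assumption. The 20-character cut-off is inferred from the truncated titles rather than documented, and the inline rows stand in for the CSV.

```r
chapters <- data.frame(
  book = c(1L, 1L),
  chapter = c(1L, 2L),
  title = c("A Prayer to the Reader", "A Boy and His God"),
  editor = c("#MaG", "#MaG"),
  stringsAsFactors = FALSE
)

progress_listing <- function(df, width = 20) {
  data.frame(
    # e.g. book 1, chapter 1 -> "1.01"
    Chapter = sprintf("%d.%02d", df$book, df$chapter),
    # Keep only the first `width` characters of each title (assumed cut-off)
    Title = substr(df$title, 1, width),
    `Edited by` = df$editor,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

listing <- progress_listing(chapters)
print(listing, row.names = FALSE)
```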

Data Dictionary

The columns in the data file, what they mean, and how they were generated.

book

The number of the book within the series.

Required, numeric.

Derived from the n attribute of the chapter node's <text type="book"> ancestor.

<book>
  {
  data($chapter/ancestor::tei:text[@type = "book"]/@n)
  }
</book>

chapter

The number of the chapter within its book.

Required, numeric.

Derived from the n attribute of the chapter node itself.

<chapter>
  {
  if ($chapter/@n) 
  then data($chapter/@n) 
  else "NA"
  }
</chapter>
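Because book and chapter together locate a chapter in the series, and both are required, every row should have a complete and unique (book, chapter) pair; that is what makes the table safe to join on. A quick base R check of that assumption, with inline sample rows:

```r
chapters <- data.frame(
  book = c(1L, 1L, 1L),
  chapter = c(1L, 2L, 3L)
)

# TRUE when no key is missing and no (book, chapter) pair occurs twice
keys_are_valid <- function(df) {
  !anyNA(df$book) && !anyNA(df$chapter) &&
    anyDuplicated(df[, c("book", "chapter")]) == 0L
}

keys_are_valid(chapters)                        # TRUE
keys_are_valid(rbind(chapters, chapters[1, ]))  # FALSE: 1.01 appears twice
```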

number

The text giving the chapter's number within its book, e.g. "Chapter the FIRST". Its wording changes over the course of the series.

Alphanumeric.

Derived from the content of the child <head type="chapterNumber"> node of the chapter’s node.

<number>
  {
  if ($chapter/tei:head[@type = "chapterNumber"]) 
  then data($chapter/tei:head[@type = "chapterNumber"]) 
  else "NA"  
  }
</number>

title

The title of the chapter.

Alphanumeric.

Derived from the content of the child <head type="chapterTitle"> node of the chapter’s node.

<title>
  {
  if ($chapter/tei:head[@type = "chapterTitle"]) 
  then data($chapter/tei:head[@type = "chapterTitle"]) 
  else "NA" 
  }
</title>

editor

The editor responsible for the TEI markup for the chapter.

Alphanumeric, required. NA indicates editing has not begun for that chapter.

Derived from the resp attribute of the chapter’s node.

<editor>
  {
  if ($chapter/@resp)
  then data($chapter/@resp)
  else "NA"
  }
</editor>
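Since an NA editor means editing has not begun, progress per book can be summarised as the share of chapters whose editor is known. A base R sketch, assuming the cleaned table with real NA values; the rows are inline stand-ins and the third one is hypothetical.

```r
chapters <- data.frame(
  book = c(1L, 1L, 2L),
  chapter = c(1L, 2L, 1L),
  editor = c("#MaG", "#MaG", NA),  # row 3: hypothetical, not yet edited
  stringsAsFactors = FALSE
)

# Proportion of chapters in each book whose editing has begun
editing_progress <- function(df) {
  tapply(!is.na(df$editor), df$book, mean)
}

progress <- editing_progress(chapters)
progress  # book 1: all chapters started; book 2: none
```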

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Glachant (2022, Oct. 14). Data Ignota: Chapters. Retrieved from https://syvwlch.github.io/Data-Ignota/tei/chapters/

BibTeX citation

@misc{glachant2022chapters,
  author = {Glachant, Mathieu},
  title = {Data Ignota: Chapters},
  url = {https://syvwlch.github.io/Data-Ignota/tei/chapters/},
  year = {2022}
}