Data Ignota: Books

Mathieu Glachant

Origins of This Data

This data is generated by extracting all TEI <text type="book"> nodes in the Digital Edition of Terra Ignota.

What Are `<text>` Nodes For?

These <text type="book"> nodes contain the text of the novels in the series.

Example Nodes

Here’s a simplified view of what these nodes look like:

<text xml:id="TLtL" type="book" n="1">
  <front>
    ...
    <div>
      <title type="main">TOO LIKE THE LIGHTNING</title>
      <title type="desc">A Narrative of Events of the year 2454</title>
      <byline>Written by MYCROFT CANNER, at the request of certain parties.</byline>
    </div>
    ...
  </front>
  <body>
    <div n="1" type="chapter"></div>
    ...
  </body>
</text>

The data dictionary below maps each piece of information shown above to a column in the data file.

Get the Data

The data extracted from these <text type="book"> nodes is available as a CSV file.

Download Link

Download the data

Last Updated

This file was last updated on 2022-10-14.

Raw Data Generation

The raw data is first extracted from the <text type="book"> nodes using an XQuery script.

XQuery Script

For easy ingestion with the XML package in R, the script’s output has a <records> root node and one <book> node per book in the original text.

xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare variable $doc := doc("PATH_TO_TEI_FILE");

<records>
  {
  for $book in $doc//tei:text[@type = "book"]
  return 
  <book>
    ...
    A node per column in the output file, see below for details
    ...
  </book>
  }
</records>

XML Output File

The XQuery script produces an XML file of the form:

<?xml version="1.0" encoding="UTF-8"?>
<records>
  <book>
    <book>1</book>
    <id>TLtL</id>
    <title>TOO LIKE THE LIGHTNING</title>
    <subtitle>A Narrative of Events of the year 2454</subtitle>
    <byline>Written by MYCROFT CANNER, at the request of certain parties.</byline>
  </book>
   
   etc...
   
</records>

Data Prep

This XML output file must then be cleaned before being saved to the CSV file provided above.

Clean: Fix Data Types and NA Values

First, the correct data type must be set for each column, and missing values must be set to NA. The XML and tidyverse packages make this straightforward.

library(XML)        # xmlToDataFrame()
library(tidyverse)  # mutate(), across(), na_if(), %>%

# xml_path: path to the XML file produced by the XQuery script
books <- xmlToDataFrame(xml_path, stringsAsFactors = FALSE) %>%
  mutate(
    # The XQuery script writes the string "NA" for missing values;
    # convert it to a true NA in every column
    across(everything(), ~ na_if(.x, "NA")),
    # Column data types must be correct
    book = as.integer(book)
  )

NB: Luckily, when you read this data in from the CSV file, the readr package is smart enough to guess the column types and missing values correctly.
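If you would rather not depend on type guessing, the column types can be declared when the file is read. The following is a minimal base R sketch, not the author's own workflow: it uses read.csv rather than readr, and the temporary file is a hypothetical stand-in for the downloaded CSV, holding only the one row shown in the XML output example.

```r
# Hypothetical stand-in for the downloaded file: a single row copied from
# the XML output example, written to a temporary CSV.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "book,id,title,subtitle,byline",
  paste0(
    "1,TLtL,TOO LIKE THE LIGHTNING,",
    "A Narrative of Events of the year 2454,",
    "\"Written by MYCROFT CANNER, at the request of certain parties.\""
  )
), csv_path)

# Declare the column types explicitly instead of relying on guessing,
# and treat the literal string "NA" as a missing value.
books <- read.csv(
  csv_path,
  colClasses = c(
    book = "integer",
    id = "character",
    title = "character",
    subtitle = "character",
    byline = "character"
  ),
  na.strings = "NA"
)

str(books)
```

Declaring the types this way makes a malformed value in the book column fail loudly at read time instead of silently turning the column into text.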

Editing Progress

There are currently 4 of 4 books present in the Digital Edition.

Data Dictionary

The columns in the data file are listed below, with an explanation of what each one means and how it was generated.

`book`

The number of the book within the series.

Required, numeric.

Derived from the n attribute of the <text type="book"> node itself.

<book>
  {
  if ($book/@n) 
  then data($book/@n) 
  else "NA"
  }
</book>

`id`

A unique identifier used within the Digital Edition.

Required, unique, must conform to xml:id requirements, e.g. can’t start with a number.

Derived from the xml:id attribute of the <text type="book"> node itself.

<id>
  {
  if ($book/@xml:id) 
  then data($book/@xml:id) 
  else "NA"
  }
</id>
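The constraints on id described above (required, unique, valid as an xml:id) can be checked once the data is loaded. The sketch below is an illustration only: is_valid_id and check_ids are hypothetical helper names, and the regular expression is a simplified, ASCII-only approximation of the xml:id naming rule, which in full also admits many non-ASCII characters.

```r
# Simplified, ASCII-only approximation of the xml:id naming rule:
# a letter or underscore first, then letters, digits, '.', '-' or '_'.
is_valid_id <- function(id) {
  !is.na(id) & grepl("^[A-Za-z_][A-Za-z0-9._-]*$", id)
}

# Hypothetical helper: TRUE only if every id is present, well-formed
# and unique.
check_ids <- function(ids) {
  all(is_valid_id(ids)) && anyDuplicated(ids) == 0
}

is_valid_id(c("TLtL", "1TLtL", NA))  # TRUE FALSE FALSE
check_ids(c("TLtL", "TLtL"))         # FALSE, because the id is duplicated
```

With the data loaded into a data frame named books, check_ids(books$id) would run the same checks on the real column.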

`title`

The title of the book.

Alphanumeric

Derived from the content of the <title type="main"> node nested inside the book’s <front> node.

<title>
  {
  if ($book/tei:front//tei:title[@type = "main"]) 
  then data($book/tei:front//tei:title[@type = "main"]) 
  else "NA"
  }
</title>

`subtitle`

The subtitle of the book.

Alphanumeric

Derived from the content of the <title type="desc"> node nested inside the book’s <front> node.

<subtitle>
  {
  if ($book/tei:front//tei:title[@type = "desc"]) 
  then data($book/tei:front//tei:title[@type = "desc"]) 
  else "NA"
  }
</subtitle>

`byline`

The byline of the book.

Alphanumeric

Derived from the content of the <byline> node nested inside the book’s <front> node.

<byline>
  {
  if ($book/tei:front//tei:byline) 
  then data($book/tei:front//tei:byline) 
  else "NA"
  }
</byline>
