Books

The books in the series. Useful for joins that look up a book’s title, subtitle, or byline.

Mathieu Glachant
2022-10-14

Origins of This Data

This data is generated by extracting all TEI <text type="book"> nodes in the Digital Edition of Terra Ignota.

What Are <text> Nodes For?

These <text type="book"> nodes contain the text of the novels in the series.

Example Nodes

Here’s a simplified view of what these nodes look like:

<text xml:id="TLtL" type="book" n="1">
  <front>
    ...
    <div>
      <title type="main">TOO LIKE THE LIGHTNING</title>
      <title type="desc">A Narrative of Events of the year 2454</title>
      <byline>Written by MYCROFT CANNER, at the request of certain parties.</byline>
    </div>
    ...
  </front>
  <body>
    <div n="1" type="chapter"></div>
    ...
  </body>
</text>

The data dictionary below maps each piece of information above to a column in the data file.

Get the Data

The data extracted from these <text type="book"> nodes is available as a CSV file.

Download the data
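As a minimal sketch of the kind of join this table is meant for, the snippet below attaches book titles to a second table. The `chapters` table, its column names, and its values are hypothetical placeholders and not part of this data file; the `books` row copies only the first book shown in the example node above.

```r
library(dplyr)
library(tibble)

# First row of the books table, copied from the example node above.
books <- tibble(
  book = 1L,
  id = "TLtL",
  title = "TOO LIKE THE LIGHTNING"
)

# Hypothetical table keyed by book number (illustration only).
chapters <- tibble(
  book = c(1L, 1L),
  chapter = c(1L, 2L)
)

# Look up each chapter's book title via the shared `book` column.
chapters_with_titles <- chapters %>%
  left_join(books, by = "book")

print(chapters_with_titles)
```

Joining on `book` assumes the other table stores the book number; a table that stores the `xml:id` instead would join on `id`.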

Last Updated

This file was last updated on 2022-10-14.

Raw Data Generation

The raw data is first extracted from the <text type="book"> nodes using an XQuery script.

XQuery Script

For easy ingestion with the XML package in R, the script’s output has a <records> root node and one <book> node per book in the original text.

xquery version "3.1";

declare namespace tei = "http://www.tei-c.org/ns/1.0";
declare variable $doc := doc("PATH_TO_TEI_FILE");

<records>
  {
  for $book in $doc//tei:text[@type = "book"]
  return 
  <book>
    ...
    A node per column in the output file, see below for details
    ...
  </book>
  }
</records>

XML Output File

The XQuery script produces an XML file of the form:

<?xml version="1.0" encoding="UTF-8"?>
<records>
  <book>
    <book>1</book>
    <id>TLtL</id>
    <title>TOO LIKE THE LIGHTNING</title>
    <subtitle>A Narrative of Events of the year 2454</subtitle>
    <byline>Written by MYCROFT CANNER, at the request of certain parties.</byline>
  </book>
   
   etc...
   
</records>
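To see why this shape is convenient, here is a small sketch that parses an inline copy of the record above with the XML package. The inline string stands in for the real output file, and the variable names `xml_text` and `books_raw` are illustrative assumptions.

```r
library(XML)

# Inline copy of the single <book> record shown above.
xml_text <- '<records>
  <book>
    <book>1</book>
    <id>TLtL</id>
    <title>TOO LIKE THE LIGHTNING</title>
    <subtitle>A Narrative of Events of the year 2454</subtitle>
    <byline>Written by MYCROFT CANNER, at the request of certain parties.</byline>
  </book>
</records>'

# Each child of the root becomes a row; each grandchild becomes a column.
doc <- xmlParse(xml_text, asText = TRUE)
books_raw <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Every column arrives as character, including the book number.
str(books_raw)
```

Because every value comes back as text, the data types still have to be set in a separate cleaning step.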

Data Prep

This XML output file must then be cleaned before being saved to the CSV file provided above.

Clean: Fix Data Types and NA Values

First, the correct data types must be set for each column and missing values set to NA. This is very easy using the XML and tidyverse packages.

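library(XML)        # provides xmlToDataFrame()
library(tidyverse)  # provides %>%, mutate(), and na_if()

# xml_path is the path to the XML file produced by the XQuery script.
# stringsAsFactors = FALSE keeps every column as character before conversion.
books <- xmlToDataFrame(xml_path, stringsAsFactors = FALSE) %>%
  mutate(
    # Missing values must be NA
    book = na_if(book, "NA"),
    id = na_if(id, "NA"),
    title = na_if(title, "NA"),
    subtitle = na_if(subtitle, "NA"),
    byline = na_if(byline, "NA"),
    # Column data types must be correct
    book = as.integer(book)
  )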

NB: When you read this data in as a CSV file, the readr package guesses suitable column types and treats NA as missing by default, so none of this cleanup needs to be repeated.
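A short sketch of reading the CSV with readr follows. The inline text stands in for the downloaded file and assumes its header uses the column names from the data dictionary; the second row is a hypothetical placeholder added only to show how NA is handled. readr parses whole numbers as double by default, so the sketch requests an integer `book` column explicitly for anyone who wants to match the type set during cleaning.

```r
library(readr)

# Stand-in for the downloaded CSV file (second row is hypothetical).
csv_text <- 'book,id,title,subtitle,byline
1,TLtL,TOO LIKE THE LIGHTNING,A Narrative of Events of the year 2454,"Written by MYCROFT CANNER, at the request of certain parties."
2,PLACEHOLDER,PLACEHOLDER TITLE,NA,NA
'

# I() tells readr to treat the string as literal data rather than a path.
# Only `book` is specified; the other columns are guessed as character.
books <- read_csv(
  I(csv_text),
  col_types = cols(book = col_integer()),
  show_col_types = FALSE
)

print(books)
```

With the real file, replace `I(csv_text)` with the path to the downloaded CSV.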

Editing Progress

There are currently 4 of 4 books present in the Digital Edition.

Data Dictionary

The columns in the data file, with what each one means and how it was generated.

book

The number of the book within the series.

Required, numeric.

Derived from the n attribute of the <text type="book"> node itself.

<book>
  {
  if ($book/@n) 
  then data($book/@n) 
  else "NA"
  }
</book>

id

A unique identifier used within the Digital Edition.

Required, unique. Must conform to xml:id requirements, e.g. it can’t start with a digit.

Derived from the xml:id attribute of the <text type="book"> node itself.

<id>
  {
  if ($book/@xml:id) 
  then data($book/@xml:id) 
  else "NA"
  }
</id>

title

The title of the book.

Alphanumeric

Derived from the content of the <title type="main"> node nested within the book’s <front> node.

<title>
  {
  if ($book/tei:front//tei:title[@type = "main"]) 
  then data($book/tei:front//tei:title[@type = "main"]) 
  else "NA"
  }
</title>

subtitle

The subtitle of the book.

Alphanumeric

Derived from the content of the <title type="desc"> node nested within the book’s <front> node.

<subtitle>
  {
  if ($book/tei:front//tei:title[@type = "desc"]) 
  then data($book/tei:front//tei:title[@type = "desc"]) 
  else "NA"
  }
</subtitle>

byline

The byline of the book.

Alphanumeric

Derived from the content of the <byline> node nested within the book’s <front> node.

<byline>
  {
  if ($book/tei:front//tei:byline) 
  then data($book/tei:front//tei:byline) 
  else "NA"
  }
</byline>
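The constraints stated in this dictionary can be checked in a few lines of base R. This is a sketch under stated assumptions: `check_books()` and `valid_xml_id()` are hypothetical helpers, and the regular expression is a simplified ASCII-only approximation of the xml:id rules, not the full XML name grammar.

```r
# Simplified xml:id check (assumption): starts with an ASCII letter or
# underscore, followed by letters, digits, '.', '_' or '-'.
valid_xml_id <- function(x) {
  grepl("^[A-Za-z_][A-Za-z0-9._-]*$", x)
}

# Stops with an error if a constraint from the data dictionary is violated.
check_books <- function(books) {
  stopifnot(
    is.numeric(books$book),          # book: numeric
    !anyNA(books$book),              # book: required
    !anyNA(books$id),                # id: required
    anyDuplicated(books$id) == 0,    # id: unique
    all(valid_xml_id(books$id))      # id: xml:id-like
  )
  invisible(TRUE)
}

# Example row copied from the XML output shown earlier in this page.
books <- data.frame(
  book = 1L,
  id = "TLtL",
  title = "TOO LIKE THE LIGHTNING",
  stringsAsFactors = FALSE
)

check_books(books)
```

The title, subtitle, and byline columns are not marked as required, so the sketch leaves them unchecked.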

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Glachant (2022, Oct. 14). Data Ignota: Books. Retrieved from https://syvwlch.github.io/Data-Ignota/tei/books/

BibTeX citation

@misc{glachant2022books,
  author = {Glachant, Mathieu},
  title = {Data Ignota: Books},
  url = {https://syvwlch.github.io/Data-Ignota/tei/books/},
  year = {2022}
}