The characters in the series, useful for joins to get names, emojis, etc…
This data is generated by extracting all TEI <person> descendant nodes of a <listPerson type="characters"> node in the <standOff> node of the Digital Edition of Terra Ignota.
What Are <person> Nodes For?
These <person> nodes contain information about characters in the series. This is metadata added by the editor, like everything else in <standOff>.
The <listPerson type="characters"> node lists persons who are present in the text, “on stage” as it were. This list has sub-lists for convenience in editing, e.g. the Saneer-Weeksbooth ’bash or Danaë’s brood, hence the need to include all descendant <person> nodes, not just children.
Note that there are other lists of persons in the metadata, e.g. lists of fictional people added as part of world-building or of actual historical people referenced in the text. These lists are siblings of the list of characters within <standOff>.
Here’s a simplified view of what these nodes look like:
    <standOff>
      <listPerson type="characters">
        <person xml:id="Mycroft">
          <persName type="emoji">✍</persName>
          <persName type="short">Mycroft</persName>
          <persName type="primary">
            <forename>Mycroft</forename>
            <surname>Canner</surname>
          </persName>
        </person>
        ...
      </listPerson>
      ...
    </standOff>
The data dictionary below maps each piece of information above to a column in the data file.
The data extracted from these <person> nodes is available as a CSV file.
This file was last updated on 2022-11-03.
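As a sketch of the kind of join this file is meant for: the example below assumes the CSV has been read into a data frame with the columns described in the data dictionary. The `dialog` table and its `speaker` and `words` columns are hypothetical, and the inline `characters` table stands in for something like `readr::read_csv("characters.csv")`, where the file name is also an assumption.

```r
library(dplyr)
library(tibble)

# Stand-in for the characters CSV (in practice, read it with readr::read_csv()).
characters <- tribble(
  ~id,        ~emoji, ~name,     ~fullName,        ~sameAs,
  "#Mycroft", "✍",    "Mycroft", "Mycroft Canner", NA_character_
)

# Hypothetical table of dialog lines, keyed by the speaker's reference.
dialog <- tribble(
  ~speaker,   ~words,
  "#Mycroft", 12L,
  "#Nobody",  3L
)

# A left join keeps every dialog line, even when the speaker has no row
# in the characters file; those rows get NA for name, emoji, etc.
labelled <- dialog %>%
  left_join(characters, by = c("speaker" = "id"))
```

The join key works without further cleaning because the `id` column already carries the leading `#`.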
The raw data is first extracted from the <person> nodes using an XQuery script.
For easy ingestion with the XML package in R, the script’s output has a <records> root node and one <character> node per character in the original text.
    xquery version "3.1";
    declare namespace tei = "http://www.tei-c.org/ns/1.0";
    declare variable $doc := doc("PATH_TO_TEI_FILE");

    <records>
    {
      for $character in $doc//tei:listPerson[@type = "characters"]//tei:person
      return
        <character>
          ...
          A node per column in the output file, see below for details
          ...
        </character>
    }
    </records>
The XQuery script produces an XML file of the form:
    <?xml version="1.0" encoding="UTF-8"?>
    <records>
      <character>
        <id>#Mycroft</id>
        <emoji>✍</emoji>
        <name>Mycroft</name>
        <fullName>Mycroft Canner</fullName>
        <sameAs>NA</sameAs>
      </character>
      ...
    </records>
This XML output file must then be cleaned before being saved to the CSV file provided above.
First, missing values are set to NA. This is very easy using the XML and tidyverse packages.
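A minimal sketch of this step, assuming the script’s output is parsed with `xmlToDataFrame()` from the XML package and that the literal string "NA" marks a missing value. The inline sample stands in for the real output file, and the variable names are illustrative only.

```r
library(XML)
library(dplyr)

# Inline stand-in for the XQuery output file, which has one <character>
# node per character.
xml_text <- '<?xml version="1.0" encoding="UTF-8"?>
<records>
  <character>
    <id>#Mycroft</id>
    <emoji>✍</emoji>
    <name>Mycroft</name>
    <fullName>Mycroft Canner</fullName>
    <sameAs>NA</sameAs>
  </character>
</records>'

# Parse the XML and turn each <character> node into one row.
doc <- xmlParse(xml_text, asText = TRUE)

characters <- xmlToDataFrame(doc, stringsAsFactors = FALSE) %>%
  # Replace the literal "NA" strings with real missing values.
  mutate(across(everything(), ~ na_if(.x, "NA")))
```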
NB: Luckily for you, when you read in this data as a CSV file, the readr package is smart enough to guess all of this correctly.
Second, any rows containing delimited lists must be separated into multiple rows.
In the case of our characters, this would only happen if a <person> node had more than one sameAs alternate identity. There are currently 2 characters with alternate identities but 0 of them have more than one.
    library(tidyverse)  # provides %>% and separate_rows()

    characters <- characters %>%
      # Break space-delimited columns across multiple rows
      separate_rows(sameAs, sep = " ")
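Since no character currently has more than one alternate identity, the call above leaves the real data unchanged. To illustrate what it would do otherwise, here is a small sketch on made-up data: the ids `#A`, `#B`, `#X` and `#Y` are hypothetical and do not come from the Digital Edition.

```r
library(dplyr)
library(tidyr)

# Hypothetical rows: "#A" has two space-delimited alternate identities,
# "#B" has none.
toy <- tibble(
  id     = c("#A", "#B"),
  sameAs = c("#X #Y", NA)
)

# Each space-delimited value gets its own row; the other columns are repeated.
long <- toy %>%
  separate_rows(sameAs, sep = " ")
```

In this sketch `#A` ends up on two rows, one per alternate identity, while `#B` keeps its single row with a missing `sameAs`.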
There are currently 82 characters in the file, including 12 generic ones like ‘unknown Junior Scientist’ or ‘unknown Servicer’.
Names are required and there are currently 0 characters without one.
Emojis are optional and there are currently 31 characters with one.
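Counts like these can be recomputed from the data file with a few lines of base R. The sketch below uses a two-row stand-in for the real data; the id `#unknownServicer` is a hypothetical placeholder, and the variable names are illustrative.

```r
# Stand-in for the real data file; the second id is hypothetical.
characters <- data.frame(
  id    = c("#Mycroft", "#unknownServicer"),
  emoji = c("✍", NA),
  name  = c("Mycroft", "unknown Servicer"),
  stringsAsFactors = FALSE
)

# Total number of characters in the file.
n_characters <- nrow(characters)

# Characters with no name (should be 0, since names are required).
n_unnamed <- sum(is.na(characters$name))

# Characters that have an emoji.
n_with_emoji <- sum(!is.na(characters$emoji))
```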
The columns in the data file are listed below, with an explanation of what they mean and how they were generated.
id
A unique identifier used within the Digital Edition.
Required, unique, must conform to xml:id requirements, e.g. can’t start with a number.
Derived from the xml:id attribute of the <person> node itself, with a # prepended to match the ref attribute syntax used when pointing to this character, e.g. in a line of dialog.
    <id>#{data($character/@xml:id)}</id>
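The constraints on this column can be spot-checked in R. The sketch below is only an approximation: the regular expression is a simplified, ASCII-only stand-in for the real xml:id naming rules (which also allow many non-ASCII characters), and the id `#9lives` is a made-up invalid example.

```r
# Hypothetical ids; the second one is deliberately invalid because the part
# after "#" starts with a digit.
ids <- c("#Mycroft", "#9lives")

# Simplified, ASCII-only approximation of the xml:id naming rules
# (an assumption, not the full specification).
is_valid <- grepl("^#[A-Za-z_][A-Za-z0-9._-]*$", ids)

# TRUE when no id is repeated.
is_unique <- anyDuplicated(ids) == 0

# Strip the leading "#" to recover the original xml:id values.
xml_ids <- sub("^#", "", ids)
```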
emoji
An emoji evocative of the character, used for visualizations where space is at a premium.
Optional, unique when it exists.
Derived from the <persName type="emoji"> child node of the <person> node itself.
    <emoji>
    {
      if ($character/tei:persName[@type = "emoji"])
      then normalize-space(data($character/tei:persName[@type = "emoji"]))
      else "NA"
    }
    </emoji>
name
A short name for the character, attested in the text.
Alphanumeric, required. NA means the character has neither short nor full name in the metadata, an omission by the editor.
Derived from the <persName type="short"> child node of the <person> node itself or, if it does not exist, from the full name (see below).
    <name>
    {
      if ($character/tei:persName[@type = "short"])
      then normalize-space(data($character/tei:persName[@type = "short"]))
      else if ($character/tei:persName[@type = "primary"])
      then normalize-space(data($character/tei:persName[@type = "primary"]))
      else "NA"
    }
    </name>
fullName
A full name for the character, attested in the book.
Alphanumeric, required. NA means the character has no full name in the metadata, an omission by the editor.
Derived from the <persName type="primary"> child node of the <person> node itself.
    <fullName>
    {
      if ($character/tei:persName[@type = "primary"])
      then normalize-space(data($character/tei:persName[@type = "primary"]))
      else "NA"
    }
    </fullName>
sameAs
The unique identifier of another character that is actually the same person, aka an alternate identity. Alternate identities result in multiple rows for the same person.
Optional, NA indicates no alternate identity exists for that person.
Derived from the sameAs attribute of the <person> node itself.
    <sameAs>
    {
      if ($character/@sameAs)
      then data($character/@sameAs)
      else "NA"
    }
    </sameAs>
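Because sameAs holds the id of another row in the same file, an alternate identity can be resolved with a self-join. This is a sketch on made-up data: the ids `#A` and `#B`, the names, and the `sameAsName` column are all hypothetical.

```r
library(dplyr)

# Hypothetical data: "#A" is an alternate identity of "#B".
characters <- tibble(
  id     = c("#A", "#B"),
  name   = c("Alias", "Real"),
  sameAs = c("#B", NA)
)

# Look up the name of the character each sameAs value points to.
resolved <- characters %>%
  left_join(
    select(characters, id, sameAsName = name),
    by = c("sameAs" = "id")
  )
```

Rows without an alternate identity simply get a missing `sameAsName`.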
If you see mistakes or want to suggest changes, please create an issue on the source repository.
For attribution, please cite this work as
Glachant (2022, Oct. 15). Data Ignota: Characters. Retrieved from https://syvwlch.github.io/Data-Ignota/tei/characters/
BibTeX citation
    @misc{glachant2022characters,
      author = {Glachant, Mathieu},
      title = {Data Ignota: Characters},
      url = {https://syvwlch.github.io/Data-Ignota/tei/characters/},
      year = {2022}
    }