A map of where all this data comes from…
This project rests on a personal digital edition of the Terra Ignota series. All of the analyses use data that ultimately come from a single file containing the text of all four books marked up in a dialect of XML called TEI [1].
This article explains the high-level structure of that file and serves as an orientation for those interested in where the data lives in relation to the text itself. Other articles in this collection will drill down into specific XML elements from the TEI schema.
All TEI documents must have the following structure:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    Metadata about this Digital Edition
  </teiHeader>
  <text>
    Ada Palmer's text(s)
  </text>
</TEI>
The root is a <TEI> node with an xmlns attribute that declares the TEI namespace. The metadata about the file itself (sources, editorial and encoding decisions, change control, contributors, etc.) goes into the required <teiHeader> element, and the text(s) go into the aptly named <text> node.
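As a minimal sketch of what that namespace declaration means in practice, the snippet below parses a stand-in for the skeleton above and lists its top-level nodes. It assumes Python's standard-library xml.etree.ElementTree, which is my choice for illustration and not necessarily the tooling used elsewhere in this project; SAMPLE and local_name are hypothetical names.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Stand-in for the real edition: same skeleton, placeholder contents.
SAMPLE = f"""
<TEI xmlns="{TEI_NS}">
  <teiHeader>Metadata about this Digital Edition</teiHeader>
  <text>Ada Palmer's text(s)</text>
</TEI>
""".strip()


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on every tag."""
    return element.tag.split("}", 1)[-1]


root = ET.fromstring(SAMPLE)
top_level = [local_name(child) for child in root]

print(root.tag)    # the tag is qualified with the TEI namespace
print(top_level)   # the required header, then the text
```

Because every element inherits the default namespace, queries against the file have to name that namespace too; a bare "text" will not match.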
I have also included an optional <standoff> node. This contains anything I want to add alongside the text to support my analysis, while keeping it separate from the text itself:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    Metadata about this Digital Edition
  </teiHeader>
  <standoff>
    My many long lists of things, people, places, and events I'm tracking
  </standoff>
  <text>
    Ada Palmer's text(s)
  </text>
</TEI>
This is particularly useful for creating lists of <person>, <place>, <object>, or <event> nodes which I can then point to from within the text. For example, each time a character is mentioned, I can mark that mention with a <persName> tag pointing to that character's <person> entry in <standoff>, to disambiguate who is being referred to:
<standoff>
  <person xml:id="Mycroft">
    <name>Mycroft Canner</name>
  </person>
</standoff>
<text>
  ....
  "Where hast thou been, <persName ref="#Mycroft">stray</persName>?"
  ....
</text>
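The pointer mechanism above can be followed programmatically. The sketch below, again assuming Python's standard-library xml.etree.ElementTree purely for illustration, looks up each <persName> mention and resolves its ref to the matching <person> entry. The sample document and the resolve_mentions helper are hypothetical stand-ins, not the project's actual code.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# ElementTree exposes the xml:id attribute under the reserved XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
NS = {"tei": TEI_NS}

# <standoff> is spelled as in this article; the TEI Guidelines write it <standOff>.
SAMPLE = f"""
<TEI xmlns="{TEI_NS}">
  <teiHeader/>
  <standoff>
    <person xml:id="Mycroft">
      <name>Mycroft Canner</name>
    </person>
  </standoff>
  <text>
    "Where hast thou been, <persName ref="#Mycroft">stray</persName>?"
  </text>
</TEI>
""".strip()


def resolve_mentions(root):
    """Return (mention text, full name) pairs for every <persName> in <text>."""
    people = {
        person.get(XML_ID): person.findtext("tei:name", namespaces=NS)
        for person in root.iterfind("tei:standoff/tei:person", NS)
    }
    mentions = []
    for mention in root.iterfind("tei:text//tei:persName", NS):
        target = mention.get("ref", "").lstrip("#")  # "#Mycroft" -> "Mycroft"
        mentions.append((mention.text, people.get(target)))
    return mentions


mentions = resolve_mentions(ET.fromstring(SAMPLE))
print(mentions)
```

A mention whose ref matches no <person> entry comes back paired with None, which makes dangling pointers easy to spot.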
I opted to group the four novels into a single file to simplify data retrieval and analysis across the series. I may change this later if the file just gets too cumbersome.
Because Terra Ignota is a series, I have structured the contents of the top-level <text> node as a <group> node for the series containing a child <text> node for each novel. Each of these nodes has a unique xml:id attribute we can query or point to.
<text>
  <group xml:id="TerraIgnota">
    <text xml:id="TooLikeTheLightning"></text>
    <text xml:id="SevenSurrenders"></text>
    <text xml:id="TheWillToBattle"></text>
    <text xml:id="PerhapsTheStars"></text>
  </group>
</text>
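To illustrate how those xml:id values can serve as lookup keys, here is a small sketch that indexes the novels by id. It assumes Python's standard-library xml.etree.ElementTree as a stand-in for whatever query tool you prefer; SAMPLE and novels_by_id are hypothetical names, and the sample omits everything but the nodes shown above.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # ElementTree's key for xml:id
NS = {"tei": TEI_NS}

SAMPLE = f"""
<TEI xmlns="{TEI_NS}">
  <teiHeader/>
  <text>
    <group xml:id="TerraIgnota">
      <text xml:id="TooLikeTheLightning"></text>
      <text xml:id="SevenSurrenders"></text>
      <text xml:id="TheWillToBattle"></text>
      <text xml:id="PerhapsTheStars"></text>
    </group>
  </text>
</TEI>
""".strip()

root = ET.fromstring(SAMPLE)
series = root.find("tei:text/tei:group", NS)

# Document order is preserved, so the novels stay in series order.
novels_by_id = {
    novel.get(XML_ID): novel for novel in series.iterfind("tei:text", NS)
}

print(series.get(XML_ID))
print(list(novels_by_id))
```

With the index in hand, novels_by_id["SevenSurrenders"] returns that novel's node directly, without walking the tree again.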
Lastly, I have broken down each novel into separate nodes for the front matter, the main body of the text, and the back matter. The <front> node contains the title page, dedication, epigraph, and permissions; the <body> node contains the chapters; and the <back> node contains the acknowledgments, author bio, etc.
<text xml:id="TooLikeTheLightning">
  <front>
    Dedication
    Permissions
    Epigraph
    Title Page
  </front>
  <body>
    Chapter the First
    Chapter the Second
    etc...
  </body>
  <back>
    Acknowledgments
    Author's Bio
    Copyright Notice
  </back>
</text>
Putting it all together, we get the following structure against which to run our queries to extract that sweet, sweet data:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    Metadata about this Digital Edition
  </teiHeader>
  <standoff>
    My many long lists of things, people, places, and events I'm tracking
  </standoff>
  <text>
    <group xml:id="TerraIgnota">
      <text xml:id="TooLikeTheLightning">
        <front></front>
        <body></body>
        <back></back>
      </text>
      <text xml:id="SevenSurrenders">
        <front></front>
        <body></body>
        <back></back>
      </text>
      <text xml:id="TheWillToBattle">
        <front></front>
        <body></body>
        <back></back>
      </text>
      <text xml:id="PerhapsTheStars">
        <front></front>
        <body></body>
        <back></back>
      </text>
    </group>
  </text>
</TEI>
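As a rough sketch of what running queries against this structure can look like, the snippet below rebuilds the empty skeleton and then walks it from the root down to one novel's <body>. It assumes Python's standard-library xml.etree.ElementTree, chosen only for illustration; build_skeleton, outline, and get_part are hypothetical helpers, and the real file would of course be loaded from disk.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # ElementTree's key for xml:id
NS = {"tei": TEI_NS}
NOVEL_PATH = "tei:text/tei:group/tei:text"  # root -> series -> each novel

NOVEL_IDS = [
    "TooLikeTheLightning",
    "SevenSurrenders",
    "TheWillToBattle",
    "PerhapsTheStars",
]


def build_skeleton():
    """Build an empty copy of the structure shown above (a stand-in file)."""
    novels = "".join(
        f'<text xml:id="{novel_id}"><front/><body/><back/></text>'
        for novel_id in NOVEL_IDS
    )
    return ET.fromstring(
        f'<TEI xmlns="{TEI_NS}"><teiHeader/><standoff/>'
        f'<text><group xml:id="TerraIgnota">{novels}</group></text></TEI>'
    )


def outline(root):
    """Map each novel's xml:id to the local names of its child nodes."""
    return {
        novel.get(XML_ID): [child.tag.split("}", 1)[-1] for child in novel]
        for novel in root.iterfind(NOVEL_PATH, NS)
    }


def get_part(root, novel_id, part):
    """Return the <front>, <body>, or <back> node of one novel, or None."""
    for novel in root.iterfind(NOVEL_PATH, NS):
        if novel.get(XML_ID) == novel_id:
            return novel.find(f"tei:{part}", NS)
    return None


root = build_skeleton()
structure = outline(root)
body = get_part(root, "SevenSurrenders", "body")

print(structure)
print(body.tag)
```

Note that an element with no children is falsy in ElementTree, so results from get_part are best compared against None and not tested for truthiness.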
[1] Text Encoding Initiative: https://tei-c.org/
If you see mistakes or want to suggest changes, please create an issue on the source repository.
For attribution, please cite this work as
Glachant (2022, Oct. 2). Data Ignota: TEI - High Level Structure. Retrieved from https://syvwlch.github.io/Data-Ignota/tei/tei-structure/
BibTeX citation
@misc{glachant2022tei,
  author = {Glachant, Mathieu},
  title = {Data Ignota: TEI - High Level Structure},
  url = {https://syvwlch.github.io/Data-Ignota/tei/tei-structure/},
  year = {2022}
}