Added Some Data Files

Useful lists for joins to add book or chapter titles, or character names…

Mathieu Glachant https://github.com/syvwlch
2022-10-15

It took more effort to explain the files here on the website than to generate them via XQuery from the Digital Edition itself… but decent documentation is not a nice-to-have; it’s a must-have.

What It Took to Get Here

Enhancements:

For all three, that meant:

  1. Write an XQuery script, which
  2. Generates an XML output file, which
  3. Is read by an R Markdown post here, which
     1. Grabs, cleans, and saves this output to a *.csv file,
     2. Provides the download link,
     3. Displays some stats on the contents,
     4. Explains the origin of the data, and
     5. Provides a data dictionary explaining each column in the data.
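The "grab, clean, and save" step above can be sketched in a few lines. The posts themselves do this in R Markdown; the version below is only a Python illustration of the same idea, and the XML element names, attribute names, column names, and sample titles are all hypothetical placeholders rather than the actual XQuery output.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XQuery output: the real element and attribute names may differ.
SAMPLE_XML = """
<books>
  <book n="1"><title>  First Book </title></book>
  <book n="2"><title>Second Book</title></book>
</books>
"""


def xml_to_rows(xml_text):
    """Turn each <book> element into a flat dict, collapsing stray whitespace."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for book in root.findall("book"):
        rows.append({
            "book_number": int(book.get("n")),
            "book_title": " ".join(book.findtext("title", default="").split()),
        })
    return rows


def rows_to_csv(rows, fieldnames):
    """Serialize the rows as CSV text; write this string to a file to save it."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = xml_to_rows(SAMPLE_XML)
csv_text = rows_to_csv(rows, ["book_number", "book_title"])
print(csv_text)
print(f"{len(rows)} rows, {len(rows[0])} columns")
```

The last line stands in for the "some stats on the contents" step; a real post would presumably report more than row and column counts.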

Lessons Learned

It’s tempting to put data about chapters in the books file, or data about pages in both, but with tidy data that would add a ton of rows. It’s cleaner to keep a separate file for the pages and have those rows point up to books and chapters.
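As a rough illustration of pages "pointing up" to books and chapters, here is a small Python sketch that joins titles onto page rows only when they are needed. The column names (`book_number`, `chapter_number`, and so on) and the sample values are assumptions made for this example, not the actual data dictionaries.

```python
import csv
import io

# Hypothetical contents: the real files' columns and values may differ.
BOOKS_CSV = "book_number,book_title\n1,First Book\n"
CHAPTERS_CSV = (
    "book_number,chapter_number,chapter_title\n"
    "1,1,Opening Chapter\n"
    "1,2,Next Chapter\n"
)
PAGES_CSV = (
    "page_number,book_number,chapter_number\n"
    "1,1,1\n"
    "2,1,1\n"
    "3,1,2\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# Index the lookup tables by their keys: one row per book, one per chapter.
books = {row["book_number"]: row for row in read_rows(BOOKS_CSV)}
chapters = {
    (row["book_number"], row["chapter_number"]): row
    for row in read_rows(CHAPTERS_CSV)
}

# Each page row points up to its book and chapter; titles are joined on demand.
pages_with_titles = []
for page in read_rows(PAGES_CSV):
    book = books[page["book_number"]]
    chapter = chapters[(page["book_number"], page["chapter_number"])]
    pages_with_titles.append({
        **page,
        "book_title": book["book_title"],
        "chapter_title": chapter["chapter_title"],
    })

for row in pages_with_titles:
    print(row)
```

Each title is stored once in its own file, so the pages file stays narrow and a corrected title only has to change in one place.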

Open Questions?

The only time it makes sense to have multiple rows per unit of observation is when the relationship is inherently many-to-many, I think?

The current structure of the said.csv file, for example, might be warranted, or it might be better to have a file for speakers, another for addressees, and certainly one for mentions of persons or orgs (since those can happen outside quoted speech), and have those point to said.csv.

To be revisited once I have a sturdy unique reference structure for the lines that isn’t generated on the fly when building said.csv.
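To make the alternative concrete, here is a hypothetical Python sketch of splitting a table with one row per (line, speaker) pair into a one-row-per-line table plus a separate speakers table that points back to it. The `line_id`, `text`, and `speaker` columns and their values are invented for illustration, and the split only works if a stable line identifier like this exists.

```python
# Hypothetical denormalized rows: one row per (line, speaker) pair.
# The real said.csv columns are not shown in the post and may differ.
said_wide = [
    {"line_id": "L1", "text": "Hello.", "speaker": "A"},
    {"line_id": "L2", "text": "We agree.", "speaker": "A"},
    {"line_id": "L2", "text": "We agree.", "speaker": "B"},
]

# One row per line: the unit of observation appears exactly once.
said = {}
for row in said_wide:
    said.setdefault(
        row["line_id"],
        {"line_id": row["line_id"], "text": row["text"]},
    )

# One row per (line, speaker) link, pointing back to said by line_id.
speakers = sorted({(row["line_id"], row["speaker"]) for row in said_wide})

print(list(said.values()))
print(speakers)
```

Addressees and mentions could get link tables of the same shape, which keeps the many-to-many relationships out of the main file.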

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Citation

For attribution, please cite this work as

Glachant (2022, Oct. 15). Data Ignota: Added Some Data Files. Retrieved from https://syvwlch.github.io/Data-Ignota/posts/2022-10-15-added-some-data-files/

BibTeX citation

@misc{glachant2022added,
  author = {Glachant, Mathieu},
  title = {Data Ignota: Added Some Data Files},
  url = {https://syvwlch.github.io/Data-Ignota/posts/2022-10-15-added-some-data-files/},
  year = {2022}
}