A few visualizations based on the `speaker` and `addressee` columns.
Our `said.csv` data file contains information about who said a line of dialog, and to whom. What can we do with this data for the chapters edited to date, i.e. the first twelve?
EDIT 2022-10-16: Let’s also use the new `characters.csv` file to pull in names and emojis for characters.
knitr::opts_chunk$set(dev = "ragg_png") # Use Ragg device so emojis work
library(tidyverse)
library(gt) # For pretty tables
library(tidygraph) # For network graphs
library(ggraph) # To plot the network graphs
PATH_TO_SAID <- "../../data/said.csv"
PATH_TO_CHARACTERS <- "../../data/characters.csv"
said <- read.csv(PATH_TO_SAID) %>%
  filter(
    book == 1,
    chapter <= 12,
    # Let's remove the asides to the reader from this analysis
    speaker != "#reader",
    addressee != "#reader",
  )
dram_pers <- read.csv(PATH_TO_CHARACTERS) %>%
  select(-fullName)
Let’s build a list of characters with a speaking part.
We need to:
Here are the top five out of 38 speakers:
Name | Emoji | speaks |
---|---|---|
Carlyle | 🙏 | 246 |
Mycroft | ✍️ | 218 |
Bridger | 🧸 | 89 |
Thisbe | 🦨 | 80 |
Vivien | 🧮 | 77 |
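Counts like these can come from a short pipeline. Here is a minimal sketch, assuming the speaker count follows the same steps as the `top_addressees` chunk below: keep one row per line and speaker, count lines per speaker, then sort. The `said_toy` data frame is made up for illustration, and the idea that one line can span several rows (one per addressee) is an assumption about `said.csv`.

```r
library(dplyr)

# Made-up stand-in for said.csv: one row per (line, speaker, addressee),
# so a line with two addressees appears twice (an assumption).
said_toy <- tibble(
  line      = c(1, 1, 2, 3, 4),
  speaker   = c("Carlyle", "Carlyle", "Mycroft", "Carlyle", "Bridger"),
  addressee = c("Bridger", "Mycroft", "Carlyle", "Bridger", "Carlyle")
)

top_speakers_toy <- said_toy %>%
  # Keep one row per line and speaker, dropping addressee duplicates
  distinct(line, speaker) %>%
  # Count lines per speaker, most talkative first
  count(id = speaker, name = "speaks", sort = TRUE)

top_speakers_toy
```

Using `distinct()` and `count()` here is a shorthand for the two group-and-summarize steps; it should give the same counts under the assumptions above.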
We can do the same for characters who are directly addressed.
top_addressees <- said %>%
  # Group & summarize
  # to drop rows irrelevant to `addressee`
  group_by(line, addressee) %>%
  summarize(.groups = "drop") %>%
  # Group & summarize
  # to count lines addressed to character
  group_by(id = addressee) %>%
  summarize(spokenTo = n()) %>%
  # Arrange by count
  arrange(desc(spokenTo))
Here are the top five out of 50 addressees:
Name | Emoji | spokenTo |
---|---|---|
Carlyle | 🙏 | 258 |
Mycroft | ✍️ | 235 |
Bridger | 🧸 | 99 |
Vivien | 🧮 | 81 |
Ganymede | 🌞 | 80 |
We can now combine the two lists using a join¹ by the `character` column.
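As a rough illustration of that join, here is a sketch on made-up counts. The key column is called `id` here, as in `top_addressees`; the toy tables, the zero-fill for missing counts, and the `total` column are assumptions for the example, not necessarily what the original chunk does.

```r
library(dplyr)
library(tidyr)

# Made-up counts: "A" only speaks and "C" is only spoken to
speakers_toy   <- tibble(id = c("A", "B"), speaks   = c(3L, 1L))
addressees_toy <- tibble(id = c("B", "C"), spokenTo = c(2L, 4L))

combined_toy <- full_join(speakers_toy, addressees_toy, by = "id") %>%
  # full_join() leaves NA where a character is missing from one list;
  # here we treat those as zero lines (an assumption)
  replace_na(list(speaks = 0L, spokenTo = 0L)) %>%
  # Total lines of dialog the character is involved in
  mutate(total = speaks + spokenTo) %>%
  arrange(desc(total))

combined_toy
```

A full join keeps characters who appear on only one list, which is why the combined list can be longer than either the speaker or the addressee list.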
Tables are nice and all, but this seems like a good time for a graph. Here are the top 20 out of 51 characters by lines of dialog:
No real surprise there: Carlyle and Mycroft are our two chatterboxes, even when we remove the reader from the conversation, and the counts fall off pretty fast after that. Another power-law distribution?
It looks like some characters speak more or less than they are spoken to. Anyone stand out in particular?
Looks like MASON speaks less than he is spoken to. This probably reflects how much ’splaining Ganymede and Andō have to do once he arrives on stage. At the other extreme, the Major speaks much more than he is spoken to. He can be rather intimidating, after all!
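One simple way to put a number on this is the ratio of lines spoken to lines received. The sketch below uses only the counts from the two top-five tables above, for the four characters who appear in both; the ratio itself is an assumed measure, and MASON and the Major are left out because their counts are not listed.

```r
# Counts copied from the two top-five tables above
both <- data.frame(
  name     = c("Carlyle", "Mycroft", "Bridger", "Vivien"),
  speaks   = c(246, 218, 89, 77),
  spokenTo = c(258, 235, 99, 81)
)

# Ratio below 1: spoken to more than speaking; above 1: the reverse
both$ratio <- both$speaks / both$spokenTo

# Sort from most "spoken to" to most "speaking"
both[order(both$ratio), ]
```

All four of these characters land a little below 1, so the outliers have to be further down the list.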
Ok, looking at the `speaker` and `addressee` columns separately is nice, and so is comparing their totals per character… but we can do more than that, right?
What happens if we look at pairs of characters speaking to each other?
Let’s build a list of the pairs of speakers and addressees with at least one line of dialog. We’ll also keep track of total line and word count.
pair_list <- said %>%
  # Group & summarize
  # to drop rows irrelevant to speaker or addressee
  group_by(line, speaker, addressee, words) %>%
  summarize(.groups = "drop") %>%
  # Group & summarize
  # to add line and word counts
  group_by(speaker, addressee) %>%
  summarize(
    lines = n(),
    words = sum(words),
    .groups = "drop"
  ) %>%
  # Arrange by count
  arrange(desc(lines))
That’s it, it’s that easy!
Here are the top four directed pairs by number of lines, out of 206:
Speaker | Addressee | Lines | Words |
---|---|---|---|
Carlyle | Bridger | 72 | 1864 |
Bridger | Carlyle | 71 | 965 |
Carlyle | Mycroft | 63 | 745 |
Mycroft | Carlyle | 63 | 1520 |
Notice how both conversations are balanced by line count but not by word count? The word counts are closer to two-to-one, reflecting how one character is explaining things and answering questions for the other.
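The arithmetic behind that observation can be checked directly from the table. This is a small sketch in base R; the `words_per_line` column and the variable names are introduced here for illustration.

```r
# The four rows of the table above
top_pairs <- data.frame(
  speaker   = c("Carlyle", "Bridger", "Carlyle", "Mycroft"),
  addressee = c("Bridger", "Carlyle", "Mycroft", "Carlyle"),
  lines     = c(72, 71, 63, 63),
  words     = c(1864, 965, 745, 1520)
)

# Average length of a line for each directed pair
top_pairs$words_per_line <- round(top_pairs$words / top_pairs$lines, 1)

# Word-count ratio within each conversation
carlyle_vs_bridger <- 1864 / 965  # about 1.93
mycroft_vs_carlyle <- 1520 / 745  # about 2.04

top_pairs
```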
We can draw some radar plots showing the total word count our top four speakers direct at our top six addressees:
See below for the emoji mapping for the addressees. We are missing an emoji for Martin at 11 o’clock.
Everybody speaks to Carlyle, but Carlyle mostly speaks to Mycroft and Bridger. Only Mycroft speaks to the outside world.
This works pretty well for small groups of characters, but it won’t scale well to the 38 speaking parts we have so far, a number that is only going to grow as we add chapters.
We have a list of pairs of characters engaged in dialog with a few different measures of how much dialog it was… that’s all we need to build a directed, weighted network graph, but we’ll also pass in our list of characters for convenience.
network_graph <- tbl_graph(
  edges = pair_list,
  nodes = characters,
  node_key = "id"
)
That’s all it takes to make a network graph, which we can then visualize in various ways, e.g. a hairball of characters with at least fifteen spoken lines of dialog:
See below for the emoji mapping for the other characters. We are missing emojis for Lesley, Martin, and Su-Hyeon, going from left to right.
Mycroft is the hinge between several conversations, one dominated by Carlyle currently taking place at the Saneer-Weeksbooth bash’house, another one over at Ganymede’s palace during the Renunciation Day party, and some more isolated pairs around the periphery, like Ockham and Martin or Dominic and Lesley.
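A hairball like the one described above could be drawn roughly as follows. This is a sketch on made-up data, not the original plotting code: `nodes_toy`, `edges_toy`, the `speaks` node column used for the fifteen-line cutoff, and the choice of layout and geoms are all assumptions.

```r
library(dplyr)
library(tidygraph)
library(ggraph)

# Made-up characters and directed pairs
nodes_toy <- tibble(id = c("A", "B", "C"), speaks = c(20L, 18L, 2L))
edges_toy <- tibble(
  speaker   = c("A", "B", "C"),
  addressee = c("B", "A", "A"),
  lines     = c(20L, 18L, 2L)
)

# The first two edge columns are matched against the node_key column
graph_toy <- tbl_graph(nodes = nodes_toy, edges = edges_toy, node_key = "id")

# Keep only characters with at least fifteen spoken lines;
# edges touching a removed node are dropped with it
talkers <- graph_toy %>%
  activate(nodes) %>%
  filter(speaks >= 15)

# Force-directed layout, edge width by line count, arrows for direction
p <- ggraph(talkers, layout = "fr") +
  geom_edge_fan(
    aes(edge_width = lines),
    arrow = grid::arrow(length = grid::unit(2, "mm")),
    end_cap = circle(3, "mm")
  ) +
  geom_node_text(aes(label = id))

# print(p) to draw the plot
```

`geom_edge_fan()` is used here so that the two directions of a conversation are drawn as separate curves rather than on top of each other.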
Emoji | Name |
---|---|
🙏 | Carlyle |
✍️ | Mycroft |
🧸 | Bridger |
🦨 | Thisbe |
🧮 | Vivien |
🌞 | Ganymede |
NA | Martin |
🪒 | Ockham |
👸 | Danaë |
🌸 | Andō |
🤺 | Dominic |
NA | Lesley |
🎯 | Sniper |
💡 | Eureka |
💊 | Kosala |
🪖 | the Major |
👑 | Spain |
🧱 | MASON |
NA | Su-Hyeon |
¹ If we use a `full_join()`, any characters that are not on both lists will be kept, but with an `NA` for the missing count.
For attribution, please cite this work as
Glachant (2022, Oct. 11). Data Ignota: Who Talks to Whom?. Retrieved from https://syvwlch.github.io/Data-Ignota/viz/2022-10-11-who-talks-to-whom/
BibTeX citation
@misc{glachant2022who,
  author = {Glachant, Mathieu},
  title = {Data Ignota: Who Talks to Whom?},
  url = {https://syvwlch.github.io/Data-Ignota/viz/2022-10-11-who-talks-to-whom/},
  year = {2022}
}