Watch the Episode First! It's a friendly suggestion...

The final #GoT episode aired last Sunday, and now we face two years of waiting for the next season... How can HBO be so cruel??? And how can I find interesting content for my future blog posts???
At least European football (not soccer) leagues are back now, so TV-wise I'm covered!


Going back to serious discussions: Game of Thrones' last episode, Yay or Nay? The average sentiment for the episode (taking into account only tweets since Monday) was `-0.012`: negative, but an improvement compared to the two previous episodes (with episode 6 having the most negative sentiment score).

But... Hey! What is that line running across the top over time? The line is due to the external R call, which forces us to include the `Tweet Text` column in the analysis so the function can be evaluated. The sentiment evaluation is applied to `ATTR(Tweet Text)`, which roughly means `SELECT DISTINCT Tweet_Text` in Oracle terms. The line on top is drawn because the same `Tweet Text` was tweeted across several weeks.
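As a rough illustration of that `ATTR`/`DISTINCT` behaviour (a hypothetical Python sketch, not the actual Tableau/R pipeline), the scoring function is evaluated once per distinct text rather than once per row, so a text repeated across weeks carries a single score through time:

```python
# Hypothetical sketch: scoring is applied per DISTINCT text, not per row.
tweets = [
    ("week1", "Winter is coming"),
    ("week2", "Winter is coming"),   # same text, different week
    ("week2", "The North remembers"),
]

def score(text):
    # stand-in for the real syuzhet sentiment call
    return len(text) % 3 - 1

# one score per distinct text: fewer entries than rows when texts repeat
distinct_scores = {text: score(text) for _, text in tweets}
print(len(distinct_scores))  # 2, even though there are 3 rows
```

The repeated text is what draws the line connecting the weeks in the chart.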

Please notice that the three overall sentiments are close (between `0.01` and `0.10`), so when looking in detail at the distribution of sentiment scores across the episodes we can see that, as expected, they are similar.

Zooming in on single characters, we can see the scatterplot of the last episode, with Jon Snow (or should I say Targaryen?) leading in number of mentions and, surprisingly, Littlefinger in second spot and Arya in third: probably Baelish's death scene at Winterfell was highly appreciated by the fans.

On the positive/negative axis almost nothing changed, with Arya and the Night King remaining the negative and positive poles. So far I've been describing the change of leadership on the various axes of the scatterplot by visually comparing today's scatterplot with the previous two. However, the transition of each character's position in the graph can also be visualized with multiple scatterplots.

By creating a scatterplot for each character and assigning each episode a different number (`E05`-`1`, `E06`-`2`, `E07`-`3`), I can clearly see how Davos Seaworth, for example, had a big sentiment variation, going very positive in the last episode, while Jaime Lannister was more stable. Zooming into Davos' position we can see how the sentiment distribution changed across episodes, with E06 being the most negative while E07 has almost all positive tweets.

Looking at the words composing Davos' tweets we can immediately spot a few things:

• `SIR` has a positive sentiment (`Sir Davos` is how several characters call him), which is driving the overall score in the final episode
• The number of tweets mentioning Davos was very small in `E06` compared to the other two (we can see the same from the related scatterplot above)
• In `E07` we see a good number of circles having the same (big) size: possibly the same text tweeted several times.

To verify the last point we can simply show the `Tweet Text` alongside the `# of Tweets` and discover that a single, almost identical positive text accounts for over 99% of all references to the character.
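The duplicate-text check boils down to counting identical strings and looking at the top one's share: a minimal Python sketch (made-up tweet texts, not the real dataset):

```python
from collections import Counter

# Hypothetical data: one dominant repeated text plus a stray tweet
tweets = ["Sir Davos is the best!"] * 99 + ["Poor Davos..."]

counts = Counter(tweets)
text, n = counts.most_common(1)[0]     # most frequent text and its count
share = n / len(tweets)                # its share of all mentions
print(f"{share:.0%}")                  # 99%
```

In the real workbook the same idea is expressed as a `Tweet Text` dimension against a `# of Tweets` measure.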

Emotions

One of the cool functions of the syuzhet package is `get_nrc_sentiment`, which allows the extraction of emotions from a text based on the NRC emotion lexicon. The function takes a text as input and returns a data frame containing a row for each sentence and a column for each emotion or sentiment.
The sentiment can be either `positive` or `negative`, which we already discussed at length previously. The emotion is split into eight categories: anger, fear, anticipation, trust, surprise, sadness, joy, and disgust.

We can extract the eight different emotions into eight separate calculations with the following code:

```
SCRIPT_INT("library(syuzhet);
r<-(get_nrc_sentiment(.arg1))$anger",
ATTR([Text]))
```

To calculate the `Anger Emotion Score` we pass `ATTR(Text)`, the list of tweet texts, and take the output of the `anger` column of the data frame. We can do the same for all the other emotions and create separate graphs to show their average across characters for the last episode. In this case I took `Disgust`, `Anger`, `Fear`, `Joy` and `Trust`.
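The per-character aggregation behind those graphs is just a grouped average over the per-tweet emotion scores. A minimal Python sketch of that step (invented scores, standing in for the syuzhet output):

```python
# Hypothetical per-tweet emotion scores, as if returned by the R call
tweets = [
    {"character": "Bran", "disgust": 2},
    {"character": "Bran", "disgust": 1},
    {"character": "Arya", "disgust": 0},
]

def avg_emotion(rows, character, emotion):
    """Average an emotion score across all tweets mentioning a character."""
    vals = [r[emotion] for r in rows if r["character"] == character]
    return sum(vals) / len(vals)

print(avg_emotion(tweets, "Bran", "disgust"))  # 1.5
```

In Tableau the same grouping happens by placing the character dimension against the averaged emotion measure.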

We can then clearly see that Bran Stark is the character with the most `Disgust` associated with him. Bronn has a special mix of emotions: he's at the top for `Anger`, `Fear` and `Joy`, and such a mix can explain his average sentiment being close to neutral (see scatterplot above). On the `Trust` side we can clearly see that the North wins, with Arya and Sansa at the top; it's also interesting to see Lord Varys here.
Looking into the detail of Bran's `Disgust`, we can see that it is driven by the categorization of the word `BRAN` as disgusting: probably the dictionary doesn't like cereals.

Scene Emotions

In my previous post I talked about the "Game of Couples" and how a single character's sentiment score could be impacted by a reference to a second character. For the last episode of the series I wanted to look at specific scenes: the main characters I want to analyse are Jon Snow, Littlefinger and Sansa. Specifically, I want to understand how people on Twitter reacted to the scenes where these characters had a big impact: the death of Littlefinger, declared by Sansa, and the revelation that Jon is a Targaryen.

The first thing I wanted to check is `Surprise`: how are characters categorized by this emotion? We can see Bronn on top, driven by the word `GOOD` in the related tweets.

We can also notice that Petyr's score is quite high (`0.2590`, 2nd position) while Jon's score is pretty low, probably averaged down by the huge number of tweets. Sansa's score is not very high either, even though she is the character delivering quite a big shock when accusing Littlefinger.

The overall character average surprise doesn't seem very relevant, so we need a way to filter tweets related to those particular scenes: we can do that by keeping only tweets containing a few keywords in the `Tweet Text`. Please note that the filter words create an `OR` condition: if a tweet contains ANY of the mentioned words, it is included.
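That `OR` semantics is worth pinning down, since it's easy to confuse with an `AND` filter. A small Python sketch of the any-keyword match (the keyword list mirrors the one used later for Jon's scene):

```python
import re

keywords = {"TARGARYEN", "SON"}

def matches_any(text, keywords):
    """True if the tweet contains ANY of the keywords (OR condition)."""
    words = set(re.findall(r"[A-Z']+", text.upper()))
    return bool(words & keywords)

tweets = [
    "Jon is a TARGARYEN!",            # matches: TARGARYEN
    "Sansa sentenced Littlefinger",   # no match
    "He is Rhaegar's son",            # matches: SON
]
kept = [t for t in tweets if matches_any(t, keywords)]
print(len(kept))  # 2
```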

First I wanted to check which words in Jon's tweets drive the `Surprise` sentiment, alongside the `# of Tweets`.

However, this only tells us which words are classified as `Surprise` for Jon, nothing really related to the scenes. I can, however, filter only the tweets with an overall `Surprise` sentiment for Jon and check which words are most associated with them. I also added a filter for tweets containing the words `TARGARYEN` OR `SON`, since I assumed those two would be the most frequently used when describing the scene.

We can clearly see some patterns that are correctly recognized by the `Surprise` metric: both Aegon (a reference to Jon's real name) and Aunt (a reference to Lyanna or Daenerys?) are in the top 20, and a little further right in the graph we can also spot Father. There is probably also some surprise in tweets about what's going to happen when Jon finds out he's a Targaryen, since all the related keywords are present in the top 20.

When doing a similar analysis on Sansa I wanted to add another metric to the picture: the `Average Sentence Emotion Score` for all sentences including a word. With this metric we can see how a word (for example `AMAZING`) changes the average emotion of the sentences where it is included. Analysing this metric alone, however, wouldn't be useful: obviously the words having the most impact on an emotion are the ones categorized as such in the related dictionary.
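Conceptually the metric is an average over the sentences containing the word, which a short Python sketch makes explicit (made-up sentences and scores, standing in for the syuzhet `surprise` output):

```python
# Hypothetical (sentence, surprise_score) pairs from the emotion scoring
sentences = [
    ("MURDER most foul", 2),
    ("TREASON and MURDER", 3),
    ("A quiet feast", 0),
]

def avg_score_for_word(sentences, word):
    """Average emotion score of the sentences that contain the word."""
    scores = [s for text, s in sentences if word in text.upper()]
    return sum(scores) / len(scores)

print(avg_score_for_word(sentences, "MURDER"))  # 2.5
```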

I found the following view for Sansa interesting: across all the tweets categorized as surprising, it shows which words are most mentioned (Y-axis) and the average `Surprise` emotion value for the sentences where those words were included.

We can spot that `MURDER` and `TREASON` were included in a big number of tweets (>500) with an average `Surprise` score around `2`. This seems to indicate that the scene of Sansa convicting Lord Baelish wasn't expected by the fans.

One last graph shows how the character couples (remember the game of couples in my previous post?) have been perceived: the square color defines the average `Surprise` score, while the position on the X-axis defines the confidence (based on the `# of Tweets`).

We can spot that the couple Cersei and Sansa has the most `Surprise` emotion, followed by Cersei and Daenerys. Those two couples may be expected, since the individual characters had major parts in the last episode. Something unexpected is the couple Sandor Clegane and Brienne: looking in detail, the surprise is driven by a mention of the word `MURDER`, which is included in 57.76% of the tweets mentioning both.

A last technical note: during the last few weeks I've collected about 700 thousand tweets, and the time to analyse them highly depends on the complexity of the query. For simple counts or sums based only on BigQuery data I could get replies in a few seconds. For other analyses, especially when sentiment or emotion was included, a big portion of the raw dataset was retrieved from BigQuery into Tableau and passed to R, with the function results moved back to Tableau to be displayed. Those queries could take minutes to evaluate.
As written in my previous blog post, the whole process could only be sped up by pre-processing the data and storing the sentiment/emotion in BigQuery alongside the data.
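The pre-processing idea is simply to score each tweet once at ingest time so the dashboard only aggregates stored numbers instead of calling R per query. A hypothetical sketch of that enrichment step (the scoring function and row shape are invented, standing in for the real syuzhet call and BigQuery schema):

```python
# Hypothetical ingest-time enrichment: score once, store alongside the tweet.
def score_sentiment(text):
    # stand-in for the real syuzhet sentiment call
    return 0.1 if "GOOD" in text.upper() else -0.1

raw = ["Good episode!", "Terrible ending"]
enriched = [{"text": t, "sentiment": score_sentiment(t)} for t in raw]
# 'enriched' rows would then be written to BigQuery with the tweets,
# so dashboard queries become plain aggregations over stored columns.
print(enriched[0]["sentiment"])  # 0.1
```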

This concludes my series of blog posts about Game of Thrones tweet and press analysis with Kafka, BigQuery and Tableau! See you in two years for the analysis of the next season, probably with a whole new set of technology!
