Trump Tweet Word Clouds
Mining Twitter Data
Mining Twitter data is rather easy. You have to arrange a developer account with Twitter and set up an app. After that, Twitter gives you access to a consumer key and secret and an access token and access secret. My tool of choice for this is rtweet because it automagically processes tweet elements and makes them easy to slice and dice. I also played with twitteR, but it was harder to work with for what I wanted. The first step is to set up a token for rtweet.
library(rtweet)
# Change the next four lines based on your own consumer_key, consumer_secret, access_token, and access_secret.
# Arguments are passed with `=`; using `<-` here would also create stray variables in the workspace.
token <- create_token(
  app = "MyAppName",
  consumer_key = "CK",
  consumer_secret = "CS",
  access_token = "AT",
  access_secret = "AS")
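Typing the four secrets into a script makes them easy to leak, so here is a minimal sketch that reads them from environment variables instead. The helper name `read_twitter_credentials()` and the variable names (`TWITTER_CONSUMER_KEY` and so on) are my own assumptions, not anything rtweet requires.

```r
# Hypothetical helper: read the four Twitter credentials from environment
# variables. The variable names below are assumptions; pick your own.
read_twitter_credentials <- function() {
  vars <- c(
    consumer_key    = "TWITTER_CONSUMER_KEY",
    consumer_secret = "TWITTER_CONSUMER_SECRET",
    access_token    = "TWITTER_ACCESS_TOKEN",
    access_secret   = "TWITTER_ACCESS_SECRET"
  )
  # unset = NA lets us tell "not set" apart from a real value.
  creds <- Sys.getenv(vars, unset = NA)
  names(creds) <- names(vars)
  absent <- vars[is.na(creds) | creds == ""]
  if (length(absent) > 0) {
    stop("Missing environment variables: ", paste(absent, collapse = ", "))
  }
  as.list(creds)
}

# Intended use (not run here; needs rtweet and real credentials):
# creds <- read_twitter_credentials()
# token <- rtweet::create_token(
#   app = "MyAppName",
#   consumer_key = creds$consumer_key,
#   consumer_secret = creds$consumer_secret,
#   access_token = creds$access_token,
#   access_secret = creds$access_secret)
```

The variables can be set in `~/.Renviron` or in the shell before starting R, which keeps the secrets out of the post and out of version control.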
Now I want to collect some tweets from a particular user’s timeline and look into them. For this example, I will use @realDonaldTrump.
Who does Trump tweet about?
A cool post on sentiment analysis can be found here. The first step is to grab his timeline. rtweet makes this quite easy. I will grab it and then save it in the code below so that I do not spam the API. I will get at the time series characteristics of his tweets and the sentiment stuff in a further analysis. For now, let me just show some wordclouds.
tml.djt <- get_timeline("realDonaldTrump", n = 3200)
save(tml.djt, file="../data/TMLS.RData")
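The grab-once-then-save idea can be wrapped in a small helper so the API is only hit when no saved copy exists. This is a sketch under my own assumptions: `load_or_fetch()` is a hypothetical name, the `.rds` path in the usage comment is made up, and it uses `saveRDS()`/`readRDS()` rather than the `save()`/`load()` pair used in this post.

```r
# Hypothetical helper: return the cached object if the file exists,
# otherwise call fetch_fn() once and cache whatever it returns.
load_or_fetch <- function(path, fetch_fn) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- fetch_fn()
  saveRDS(result, path)
  result
}

# Intended use with rtweet (not run here; needs a valid token):
# tml.djt <- load_or_fetch(
#   "../data/TMLS.rds",   # hypothetical path
#   function() rtweet::get_timeline("realDonaldTrump", n = 3200))
```

Passing the download step as a function means nothing is fetched unless the cache file is missing.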
I start by loading the tml.djt object that I saved above. What does it look like?
library(wordcloud2)
library(tidyverse)
library(tidytext)
library(rtweet)
load(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/TMLS.RData"))
names(tml.djt)
## [1] "user_id" "status_id"
## [3] "created_at" "screen_name"
## [5] "text" "source"
## [7] "display_text_width" "reply_to_status_id"
## [9] "reply_to_user_id" "reply_to_screen_name"
## [11] "is_quote" "is_retweet"
## [13] "favorite_count" "retweet_count"
## [15] "hashtags" "symbols"
## [17] "urls_url" "urls_t.co"
## [19] "urls_expanded_url" "media_url"
## [21] "media_t.co" "media_expanded_url"
## [23] "media_type" "ext_media_url"
## [25] "ext_media_t.co" "ext_media_expanded_url"
## [27] "ext_media_type" "mentions_user_id"
## [29] "mentions_screen_name" "lang"
## [31] "quoted_status_id" "quoted_text"
## [33] "quoted_created_at" "quoted_source"
## [35] "quoted_favorite_count" "quoted_retweet_count"
## [37] "quoted_user_id" "quoted_screen_name"
## [39] "quoted_name" "quoted_followers_count"
## [41] "quoted_friends_count" "quoted_statuses_count"
## [43] "quoted_location" "quoted_description"
## [45] "quoted_verified" "retweet_status_id"
## [47] "retweet_text" "retweet_created_at"
## [49] "retweet_source" "retweet_favorite_count"
## [51] "retweet_retweet_count" "retweet_user_id"
## [53] "retweet_screen_name" "retweet_name"
## [55] "retweet_followers_count" "retweet_friends_count"
## [57] "retweet_statuses_count" "retweet_location"
## [59] "retweet_description" "retweet_verified"
## [61] "place_url" "place_name"
## [63] "place_full_name" "place_type"
## [65] "country" "country_code"
## [67] "geo_coords" "coords_coords"
## [69] "bbox_coords" "status_url"
## [71] "name" "location"
## [73] "description" "url"
## [75] "protected" "followers_count"
## [77] "friends_count" "listed_count"
## [79] "statuses_count" "favourites_count"
## [81] "account_created_at" "verified"
## [83] "profile_url" "profile_expanded_url"
## [85] "account_lang" "profile_banner_url"
## [87] "profile_background_url" "profile_image_url"
I want to first get rid of retweets to render President Trump in his own voice.
DJTDF <- tml.djt %>% filter(is_retweet==FALSE)
With just his tweets, a few things can be easily accomplished. Who does he mention?
library(wordcloud)
## Loading required package: RColorBrewer
MNTDJT <- DJTDF %>% filter(!is.na(mentions_screen_name)) %>% select(mentions_screen_name)
Ments <- as.character(unlist(MNTDJT))
TMents <- data.frame(table(Ments))
pal <- brewer.pal(8,"Spectral")
wordcloud(TMents$Ments,TMents$Freq, colors=pal)
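To make the flatten-and-tabulate step easier to see, here is a toy version in base R. It assumes, as appears to be the case in this data, that `mentions_screen_name` holds one vector of handles per tweet with `NA` for tweets that mention nobody; the handles below are made up.

```r
# Toy stand-in for DJTDF$mentions_screen_name: one entry per tweet,
# NA when the tweet mentions nobody. The handles are invented.
mentions <- list(c("alice", "bob"), NA_character_, "alice", c("carol", "alice"))

# Drop tweets with no mentions, then flatten to one handle per element.
has_mention <- !vapply(mentions, function(m) all(is.na(m)), logical(1))
flat <- unlist(mentions[has_mention])

# Tabulate and sort so the most-mentioned account comes first.
mention_counts <- sort(table(flat), decreasing = TRUE)
print(mention_counts)
```

A tweet that mentions three accounts contributes three rows to the table, so the cloud counts mentions rather than tweets.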
That’s interesting. But those are Twitter accounts, which are far less interesting than his actual text. I want to look at words and bigrams for this segment.
What does Trump tweet about?
Some more stuff from Stack Overflow. There is quite a bit of code in here. I simply wrote a function that takes an input character string and cleans it up. To use more of the components, uncomment them and pipe them together. The sequencing is important, and I found that the three active steps get everything that I wanted.
library(RColorBrewer)
TDF <- DJTDF %>% select(text)
# TDF contains the text of tweets.
library(stringr)
tweet_cleaner <- function(text) {
  # Active steps, applied in order:
  temp1 <- str_replace_all(text, "&", "") %>%   # drop ampersands
    str_replace_all(., "https://t+", "") %>%    # drop the "https://t" prefix of t.co links (the rest of the link stays)
    str_replace_all(.,"@[a-z,A-Z]*","")         # drop "@" plus any letters (or commas) that follow
  # Optional steps, left commented out:
  # str_replace_all(., "[[:punct:]]", "")
  # str_replace_all(., "[[:digit:]]", "") %>%
  # str_replace_all(., "[ \t]{2,}", "") %>%
  # str_replace_all(., "^\\s+|\\s+$", "") %>%
  # str_replace_all(., " "," ") %>%
  # str_replace_all(., "http://t.co/[a-z,A-Z,0-9]*{8}","")
  # str_replace_all(.,"RT @[a-z,A-Z]*: ","") %>%
  # str_replace_all(.,"#[a-z,A-Z]*","")
  return(temp1)
}
# tweet_cleaner() is vectorized, so it can clean the whole text column in one call.
clean_tweets <- data.frame(text = tweet_cleaner(TDF$text), stringsAsFactors = FALSE)
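The patterns above are fairly loose: the link pattern only removes the start of each URL, and the handle pattern stops at digits and underscores. Here is a sketch of a stricter variant in base R. `strict_tweet_cleaner()` is a hypothetical name, and the choice to strip `&amp;` assumes the tweet text arrives with HTML-escaped ampersands. It is not the function used for the clouds in this post, so counts would differ slightly if it were swapped in.

```r
# Hypothetical stricter variant of tweet_cleaner(), base R only.
strict_tweet_cleaner <- function(text) {
  text <- gsub("&amp;", "", text, fixed = TRUE)         # HTML-escaped ampersands
  text <- gsub("https?://\\S+", "", text, perl = TRUE)  # whole URLs, not just the prefix
  text <- gsub("@\\w+", "", text, perl = TRUE)          # handles, including digits and underscores
  text <- gsub("\\s+", " ", text, perl = TRUE)          # collapse runs of whitespace
  trimws(text)                                          # trim leading/trailing space
}

strict_tweet_cleaner("Thanks @Some_User1 &amp; team! https://t.co/AbC123xyz")
```

Like the original, it is vectorized, so it can be applied to a whole column of tweets at once.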
Trumps.Words <- clean_tweets %>% unnest_tokens(., word, text) %>% anti_join(stop_words, "word")
TTW <- table(Trumps.Words)
TTW <- TTW[order(TTW, decreasing = TRUE)]
TTW <- data.frame(TTW)
names(TTW) <- c("word","freq")
wordcloud(TTW$word, TTW$freq)
Well, that is kinda cool. Now, I want to do a bit more with it using more complicated word combinations.
The Wonders of tidytext
The tidytext section on n-grams is great. I will start by adding a tweet identifier, something I should have deployed long ago, before parsing these. I will not need it now, but it will be necessary when the sentiment stuff comes around.
library(tidyr)
CT <- clean_tweets %>% mutate(tweetno= row_number())
DJT2G <- clean_tweets %>% unnest_tokens(bigram, text, token = "ngrams", n=2)
bigrams_separated <- DJT2G %>%
separate(bigram, c("word1", "word2"), sep = " ")
bigrams_filtered <- bigrams_separated %>%
filter(!word1 %in% stop_words$word) %>%
filter(!word2 %in% stop_words$word)
# new bigram counts:
bigram_counts <- bigrams_filtered %>%
count(word1, word2, sort = TRUE)
bigram_counts
## # A tibble: 10,514 x 3
## word1 word2 n
## <chr> <chr> <int>
## 1 fake news 160
## 2 witch hunt 128
## 3 north korea 84
## 4 white house 71
## 5 news media 56
## 6 total endorsement 49
## 7 law enforcement 47
## 8 crooked hillary 43
## 9 supreme court 39
## 10 border security 38
## # … with 10,504 more rows
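To show what the split-then-filter steps are doing, here is a toy bigram builder in base R. It is only a sketch: `toy_bigrams()` and the six-word stop list are my own stand-ins, and tidytext's tokenizer and `stop_words` table do considerably more.

```r
# Stand-in stop-word list (tidytext's stop_words is much longer).
toy_stop_words <- c("the", "is", "a", "of", "and", "on")

# Build consecutive word pairs from one string, then drop any pair
# in which either word is a stop word.
toy_bigrams <- function(text, stop_words = toy_stop_words) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words <- words[words != ""]
  if (length(words) < 2) return(character(0))
  word1 <- head(words, -1)   # every word except the last
  word2 <- tail(words, -1)   # every word except the first
  keep <- !(word1 %in% stop_words) & !(word2 %in% stop_words)
  paste(word1[keep], word2[keep])
}

toy_bigrams("The fake news media is on a witch hunt")
```

Because the pairs are formed before the stop words are filtered, two content words separated by a stop word are never glued together; the pair is simply dropped.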
bigrams_united <- bigrams_filtered %>%
unite(bigram, word1, word2, sep = " ")
my.df <- data.frame(table(bigrams_united))
my.df <- my.df[order(my.df$Freq, decreasing=TRUE),]
my.df <- my.df[c(1:500),]
head(my.df)
## bigrams_united Freq
## 3599 fake news 160
## 10268 witch hunt 128
## 6583 north korea 84
## 10220 white house 71
## 6517 news media 56
## 9375 total endorsement 49
With that, we have the data for the bigram cloud.
library(wordcloud2)
wordcloud2(my.df, color="random-light", backgroundColor = "black")
After seeing a few competing renditions, I prefer wordcloud2. One thing to be careful about is scaling. In this case, the most frequent bigram is missing because its frequency relative to the rest makes it too large to fit. With a smaller size, it can be made to show. It appears that embedding more than one of these in one post does not render, so I will stick with the one correct one.
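Shrinking `size` is what I do here. Another option, which I have not used in this post and offer only as an assumption-laden sketch, is to dampen the frequencies before plotting so the top bigram does not dwarf the others. The square root is an arbitrary choice; it keeps the ordering but compresses the ratio.

```r
# Top bigram counts copied from the head(my.df) output above.
top <- data.frame(
  bigram = c("fake news", "witch hunt", "north korea",
             "white house", "news media", "total endorsement"),
  Freq = c(160, 128, 84, 71, 56, 49)
)

# Ratio of the largest to the smallest count, before and after damping.
ratio_raw <- max(top$Freq) / min(top$Freq)
top$FreqDamped <- sqrt(top$Freq)   # arbitrary damping choice, not from the post
ratio_damped <- max(top$FreqDamped) / min(top$FreqDamped)

c(raw = ratio_raw, damped = ratio_damped)
```

The damped column could then be passed to the cloud in place of the raw counts, at the cost of the word sizes no longer being proportional to the actual frequencies.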
library(wordcloud2)
hhww <- wordcloud2(my.df, color="random-light", backgroundColor = "black", size = 0.5)
library(widgetframe)
## Loading required package: htmlwidgets
frameWidget(hhww, width=600)
Getting this to work with frame widgets is tricky. I started something below but cannot seem to make it work so I am constrained to one wordcloud2 per document because they rely on underlying html rendering.
library(htmlwidgets)
library(webshot)
library(widgetframe)
hw1 <- wordcloud2(my.df, color="random-light", backgroundColor = "black", size = 0.5)
frameWidget(hw1, width=600)
I think that works quite nicely. The use of a jpg for shapes has not worked for me. Nor has letterCloud. I found some code on GitHub that will supposedly solve this, but it does not seem to work either. It is supposed to render as an htmlwidget, but something about that seems not to work properly.
library(htmlwidgets)
library(webshot)
library(widgetframe)
Ments.Tab <- data.frame(table(Ments))
Ments.Tab <- Ments.Tab[order(Ments.Tab$Freq, decreasing=TRUE),]
my.df.short <- my.df[c(1:40),]
hw1 <- letterCloud(Ments.Tab, "@", size=4, color='random-light')
frameWidget(hw1, width=600)