So this Robert Mueller guy wrote a report
I may as well analyse it a bit.
First, let me see if I can get a hold of the data. I grabbed the report directly from the Department of Justice website. You can follow this link.
library(tidyverse)
library(pdftools)
# Download report from link above
mueller_report_txt <- pdf_text("../data/report.pdf")
# Create a tibble of the text with line numbers and pages
mueller_report <- tibble(
  page = seq_along(mueller_report_txt),  # one row per PDF page
  text = mueller_report_txt
) %>%
  separate_rows(text, sep = "\n") %>%    # split each page into one row per line
  group_by(page) %>%
  mutate(line = row_number()) %>%        # line number within each page
  ungroup() %>%
  select(page, line, text)
write_csv(mueller_report, "data/mueller_report.
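The tibble pipeline above turns one string per page into one row per line. The following base-R sketch reproduces that reshaping so it runs without any packages. It uses two hypothetical toy pages in place of the real pdf_text() output and shows what the page and line columns end up holding:

```r
# Hypothetical stand-in for pdf_text() output: one string per PDF page
pages_txt <- c("Line one of page 1\nLine two of page 1",
               "Only line of page 2")

# Split every page on newlines; the result is a list with one element per page
lines_by_page <- strsplit(pages_txt, "\n", fixed = TRUE)

# Build a page/line/text table, mirroring separate_rows() + row_number()
report_lines <- data.frame(
  page = rep(seq_along(lines_by_page), lengths(lines_by_page)),  # repeat page id per line
  line = sequence(lengths(lines_by_page)),                       # 1..k within each page
  text = unlist(lines_by_page),
  stringsAsFactors = FALSE
)
print(report_lines)
```

One detail: strsplit() drops a trailing empty piece, so in this sketch a page ending in "\n" does not produce a blank final row.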
Trump’s Tone
A cool post on sentiment analysis can be found here. In this post I look at the time-series characteristics of his tweets and at their sentiment.
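Lexicon-based sentiment scoring, the idea behind packages such as syuzhet, comes down to looking each word up in a table of positive and negative weights and summing the matches. This is a minimal sketch that assumes a tiny hypothetical lexicon (not any real package's word list) and a crude regular-expression tokenizer:

```r
# Hypothetical mini-lexicon; NOT the syuzhet, Bing, AFINN or NRC word lists
lexicon <- c(great = 1, win = 1, tremendous = 1, bad = -1, fake = -1, sad = -1)

# Score one text: lowercase, split on non-letters, sum the weights of known words
score_text <- function(txt, lex) {
  words <- tolower(unlist(strsplit(txt, "[^A-Za-z']+")))
  sum(lex[words], na.rm = TRUE)  # words missing from the lexicon give NA and are ignored
}

tweets <- c("Great win today!", "Fake news is sad.", "Meeting at noon")
scores <- vapply(tweets, score_text, numeric(1), lex = lexicon, USE.NAMES = FALSE)
print(scores)
```

For real work, syuzhet's get_sentiment() applies established lexicons through its method argument (for example "syuzhet", "bing", "afinn" or "nrc").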
I start by loading the tmls object that I created in the previous post.
Trump’s Overall Tweeting
What does it look like?
library(tidyverse)
library(tidytext)
library(SnowballC)
library(tm)
library(syuzhet)
library(rtweet)
load(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/TMLS.RData"))
names(tml.djt)
## [1] "user_id" "status_id"
## [3] "created_at" "screen_name"
## [5] "text" "source"
## [7] "display_text_width" "reply_to_status_id"
## [9] "reply_to_user_id" "reply_to_screen_name"
## [11] "is_quote" "is_retweet"
## [13] "favorite_count" "retweet_count"
## [15] "hashtags" "symbols"
## [17] "urls_url" "urls_t.
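Because created_at is one of those columns, a simple first time-series view is the number of tweets per day. This base-R sketch runs on a hypothetical three-tweet toy timeline that stands in for tml.djt. It also fills in days with no tweets so the series has no gaps:

```r
# Hypothetical toy timeline; the real tml.djt has many more rows and columns
toy_tml <- data.frame(
  created_at = as.POSIXct(c("2019-03-01 08:15:00",
                            "2019-03-01 21:40:00",
                            "2019-03-03 06:05:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Calendar day of each tweet (UTC)
day <- as.Date(toy_tml$created_at, tz = "UTC")

# Every day in the observed range, so days without tweets get a zero count
all_days <- seq(min(day), max(day), by = "day")
counts <- table(factor(day, levels = as.character(all_days)))

daily <- data.frame(date = all_days, n = as.vector(counts))
print(daily)
```

Filling the zero days matters for any later smoothing or autocorrelation work, since dropping them would silently compress the time axis.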
Mining Twitter Data
It is rather easy. You have to set up a developer account with Twitter and create an app. Twitter then gives you a consumer key and consumer secret, plus an access token and access secret. My tool of choice for this is rtweet because it automagically processes tweet elements and makes them easy to slice and dice. I also played with twitteR, but it was harder to work with for what I wanted.
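Those four credentials are better kept out of scripts. This minimal sketch reads them from environment variables and fails early if any is missing. The variable names and the helper read_twitter_credentials() are hypothetical choices for illustration:

```r
# Hypothetical helper: collect the four Twitter credentials from environment variables.
# The environment-variable names below are illustrative; pick your own.
read_twitter_credentials <- function(vars = c(consumer_key    = "TWITTER_CONSUMER_KEY",
                                              consumer_secret = "TWITTER_CONSUMER_SECRET",
                                              access_token    = "TWITTER_ACCESS_TOKEN",
                                              access_secret   = "TWITTER_ACCESS_SECRET")) {
  values <- Sys.getenv(vars, unset = "")
  absent <- names(vars)[values == ""]
  if (length(absent) > 0) {
    stop("Missing credentials: ", paste(absent, collapse = ", "))
  }
  names(values) <- names(vars)  # key the result by credential, not by variable name
  as.list(values)
}
```

The resulting list holds the four values that rtweet's token functions take: create_token() in versions before 1.0, rtweet_bot() from 1.0 on.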