tidytext

Some Basic Text on the Mueller Report

So this Robert Mueller guy wrote a report; I may as well analyse it a bit. First, let me see if I can get hold of the data. I grabbed the report directly from the Department of Justice website. You can follow this link.

    library(tidyverse)
    library(pdftools)
    # Download report from link above
    mueller_report_txt <- pdf_text("../data/report.pdf")
    # Create a tibble of the text with line numbers and pages
    mueller_report <- tibble(
      page = 1:length(mueller_report_txt),
      text = mueller_report_txt) %>%
      separate_rows(text, sep = "\n") %>%
      group_by(page) %>%
      mutate(line = row_number()) %>%
      ungroup() %>%
      select(page, line, text)
    write_csv(mueller_report, "data/mueller_report.
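As a rough illustration of where a page/line tibble like this can go next, here is a minimal sketch that tokenizes the text into words and counts them with tidytext. The `toy_pages` vector is made-up stand-in text, not the report itself, and the names `toy_report`, `toy_words`, and `toy_counts` are hypothetical; the original post may well proceed differently.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(tidytext)

# Made-up stand-in for the output of pdf_text(): one string per page
toy_pages <- c(
  "The report describes the investigation.\nThe investigation was long.",
  "The report has two volumes."
)

# Same page/line layout as the excerpt: one row per line of text
toy_report <- tibble(page = seq_along(toy_pages), text = toy_pages) %>%
  separate_rows(text, sep = "\n") %>%
  group_by(page) %>%
  mutate(line = row_number()) %>%
  ungroup() %>%
  select(page, line, text)

# One lower-cased word per row, with common stop words dropped
toy_words <- toy_report %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Word frequencies, most common first
toy_counts <- toy_words %>%
  count(word, sort = TRUE)

print(toy_counts)
```

Because `page` and `line` survive tokenization, each word can still be traced back to where it appeared in the document.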

Trump's Tweets, Part II

Trump’s Tone
A cool post on sentiment analysis can be found here. I will now get at the time series characteristics of his tweets and the sentiment stuff. I start by loading the tmls object that I created in the previous post.

Trump’s Overall Tweeting
What does it look like?

    library(tidyverse)
    library(tidytext)
    library(SnowballC)
    library(tm)
    library(syuzhet)
    library(rtweet)
    load(url("https://github.com/robertwwalker/academic-mymod/raw/master/data/TMLS.RData"))
    names(tml.djt)
    ##  [1] "user_id"            "status_id"
    ##  [3] "created_at"         "screen_name"
    ##  [5] "text"               "source"
    ##  [7] "display_text_width" "reply_to_status_id"
    ##  [9] "reply_to_user_id"   "reply_to_screen_name"
    ## [11] "is_quote"           "is_retweet"
    ## [13] "favorite_count"     "retweet_count"
    ## [15] "hashtags"           "symbols"
    ## [17] "urls_url"           "urls_t.
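To make the "time series plus sentiment" idea concrete, here is a small sketch on made-up tweets. Only the column names `created_at` and `text` come from the listing above; the tweet text, the object names, and the choice of the Bing lexicon via tidytext's `get_sentiments()` are assumptions for illustration, and the post itself may rely on syuzhet or another approach.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Made-up tweets; only the column names mirror the real data
toy_tweets <- tibble(
  created_at = as.POSIXct(
    c("2018-01-01 08:00:00", "2018-01-01 21:30:00", "2018-01-02 07:15:00"),
    tz = "UTC"
  ),
  text = c("What a great and wonderful day",
           "Great people",
           "A terrible, terrible deal")
) %>%
  mutate(day = as.Date(created_at))

# Tweet volume per day
daily_volume <- toy_tweets %>%
  count(day, name = "tweets")

# Net tone per day: positive word count minus negative word count
daily_tone <- toy_tweets %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  group_by(day) %>%
  summarise(net = sum(sentiment == "positive") - sum(sentiment == "negative"))

print(daily_volume)
print(daily_tone)
```

A word-count difference like this is a crude tone measure: it ignores negation and any word missing from the lexicon, so it is best read as a first look rather than a finished analysis.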

Trump Tweet Word Clouds

Mining Twitter Data
Is rather easy. You have to arrange a developer account with Twitter and set up an app. After that, Twitter gives you access to a consumer key and secret and an access token and access secret. My tool of choice for this is rtweet because it automagically processes tweet elements and makes them easy to slice and dice. I also played with twitteR but it was harder to work with for what I wanted.