You will create a document showcasing your analysis of two TED talks and presenting a list of 10 qualities, techniques and/or presentation skills that made the presentations you watched inspiring, captivating, creative and effective (from your own perspective). The document should include an analysis and images for each TED Talk.
TED, an acronym for Technology, Entertainment and Design, “is a nonprofit devoted to spreading ideas.” One powerful way TED pursues this mission is through its TED Talks, which are short presentations on any topic imaginable. The common link is that every TED Talk seeks to teach listeners something new.
https://www.ted.com/about/our-organization
This project analyzes whether the positive or negative sentiment of a TED Talk correlates with the popularity of the talk.
I predict that the sentiment of the five most viewed talks will be more positive than negative and that, conversely, the sentiment of the five least viewed talks will be negative. My reasoning is that viewers want to listen to TED Talks that are positive.
In order to run all of my analysis functions, I first had to load the necessary packages.
library(ggthemes)
library(ggplot2)
library(wordcloud2)
library(tidyverse)
library(stringr)
library(tidytext)
library(textdata)
After loading the required packages, I imported two datasets from Kaggle. Together they cover every TED Talk up to September 21, 2017.
Main dataset: https://www.kaggle.com/rounakbanik/ted-talks?select=ted_main.csv
Transcripts dataset: https://www.kaggle.com/rounakbanik/ted-talks?select=transcripts.csv
tedMain <- read.csv("~/Desktop/ted_main.csv", stringsAsFactors=FALSE)
tedTranscripts <- read.csv("~/Desktop/transcripts.csv", stringsAsFactors=FALSE)
After importing the two datasets, I merged them on their shared url column and then selected the five most viewed and the five least viewed talks.
tedTalks <- merge(tedMain, tedTranscripts, by = "url")
# Five most viewed talks
top5views <- top_n(tedTalks, 5, views)
# Five least viewed talks (a negative n selects the smallest values)
bottom5views <- top_n(tedTalks, -5, views)
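The merge and selection steps above can be sketched on toy data. This is a minimal illustration in base R, not the report's actual pipeline: the data frames `toy_main` and `toy_transcripts` and all of their values are made up, and base `order()` with `head()`/`tail()` stands in for `top_n()`. Unlike `top_n()`, this version does not keep extra rows when view counts are tied.

```r
# Toy stand-ins for ted_main.csv and transcripts.csv (all values are made up).
toy_main <- data.frame(
  url   = c("talk-a", "talk-b", "talk-c", "talk-d"),
  views = c(500, 120, 900, 40),
  stringsAsFactors = FALSE
)
toy_transcripts <- data.frame(
  url        = c("talk-a", "talk-b", "talk-c"),
  transcript = c("first text", "second text", "third text"),
  stringsAsFactors = FALSE
)

# merge() performs an inner join by default, so "talk-d", which has no
# transcript, is dropped from the result.
toy_talks <- merge(toy_main, toy_transcripts, by = "url")

# Sort by views, then take the first and last rows as the most and
# least viewed talks.
by_views   <- toy_talks[order(toy_talks$views, decreasing = TRUE), ]
toy_top    <- head(by_views, 2)
toy_bottom <- tail(by_views, 2)

print(toy_top$url)
print(toy_bottom$url)
```

One consequence worth keeping in mind is that any talk without a matching transcript silently disappears in the merge, so the "least viewed" talks are the least viewed among talks that have transcripts.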
Next, I extracted the ten most frequent words from the transcripts of the most viewed and the least viewed TED Talks. Before running sentiment analysis and narrowing the results to five words per category, I wanted a broader view of the vocabulary in each group.
top5views %>%
unnest_tokens(word, transcript) ->top5words
top5words %>%
count(word, sort = TRUE) %>%
anti_join(stop_words) %>%
arrange(desc(n)) %>%
head(10) %>%
ggplot(aes(reorder(word, n), n)) +
geom_col() +
coord_flip() +
theme_calc()
bottom5views %>%
unnest_tokens(word, transcript) -> bottom5words
bottom5words %>%
count(word, sort = TRUE) %>%
anti_join(stop_words) %>%
arrange(desc(n)) %>%
head(10) %>%
ggplot(aes(reorder(word, n), n)) +
geom_col() +
coord_flip() +
theme_calc()
In order to run sentiment analyses with Afinn, Bing, and NRC, I first had to import the data sets, unnest the tokens, and filter out unnecessary words.
top5views %>%
unnest_tokens(word, transcript) %>%
anti_join(stop_words) %>%
filter(!word %in% c("laughter", "la", "music", "ha")) -> top5WordsFiltered
## Joining, by = "word"
bottom5views %>%
unnest_tokens(word, transcript) %>%
anti_join(stop_words) %>%
filter(!word %in% c("laughter", "la", "music", "ha")) -> bottom5WordsFiltered
## Joining, by = "word"
Unnesting the tokens splits the ‘transcript’ column of each dataset into one word per row, so that words can be counted and scored individually. Filtering removes stop words and transcript cues such as “laughter” and “music”, so that this noise is excluded from the analysis.
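To make the unnesting and filtering steps concrete, here is a rough base R sketch. It is only an approximation of what `unnest_tokens()` and `anti_join(stop_words)` do: the transcript text, the three-word stop list `toy_stop_words`, and the splitting rule are all assumptions chosen for illustration.

```r
# Hypothetical one-line transcript; "(Laughter)" mimics the audience cues
# that appear in TED transcripts.
toy_transcript <- "I love the idea. (Laughter) The idea is powerful."

# Lowercase the text and split on anything that is not a letter or an
# apostrophe, which is a rough approximation of word tokenization.
toy_tokens <- strsplit(tolower(toy_transcript), "[^a-z']+")[[1]]
toy_tokens <- toy_tokens[toy_tokens != ""]

# Tiny made-up stop list plus the cue words removed in the report.
toy_stop_words <- c("i", "the", "is")
cue_words      <- c("laughter", "la", "music", "ha")

# Keep only tokens that are neither stop words nor cue words.
toy_filtered <- toy_tokens[!toy_tokens %in% c(toy_stop_words, cue_words)]
print(toy_filtered)
```

Repeated words are kept on purpose, because the later counts and sentiment means depend on how often each word occurs.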
To understand which group of TED Talks is more positive, I ran Afinn sentiment analyses. Afinn scores each word on a scale from -5 (most negative) to 5 (most positive), so the mean score across a group's words indicates how positive the language of that group is overall.
The table top5words_afinn holds every word from the five most viewed talks that appears in the Afinn lexicon. Filtering it into values over 0 and values under 0, and then counting, produces two tables: the five most frequent positive words and the five most frequent negative words in those talks.
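The scoring idea can be sketched with a toy lexicon. The scores in `toy_lexicon` are placeholders on the Afinn scale, not values copied from the real lexicon, and the word list is invented. Base `merge()` plays the role of `inner_join()` here.

```r
# Made-up mini lexicon on the Afinn scale (-5 to 5); these scores are
# placeholders, not values copied from the real lexicon.
toy_lexicon <- data.frame(
  word  = c("love", "powerful", "wrong", "dead"),
  value = c(3, 2, -2, -3),
  stringsAsFactors = FALSE
)

# Hypothetical tokens from a transcript; "idea" is not in the lexicon.
toy_words <- data.frame(
  word = c("love", "idea", "love", "wrong", "powerful", "dead"),
  stringsAsFactors = FALSE
)

# Inner join: only word occurrences that have a score survive.
toy_scored <- merge(toy_words, toy_lexicon, by = "word")

# Mean score across all scored word occurrences.
toy_mean <- mean(toy_scored$value)

# Frequency of positive and negative words, most frequent first.
positive_counts <- sort(table(toy_scored$word[toy_scored$value > 0]),
                        decreasing = TRUE)
negative_counts <- sort(table(toy_scored$word[toy_scored$value < 0]),
                        decreasing = TRUE)

print(toy_mean)
print(positive_counts)
print(negative_counts)
```

Note that the mean is taken over word occurrences, so a word that is repeated many times pulls the mean toward its own score, and words missing from the lexicon do not affect it at all.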
Afinn for the five most popular TED Talks. Mean = 0.38
top5words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) ->top5words_afinn
## Joining, by = "word"
## Joining, by = "word"
mean(top5words_afinn$value)
## [1] 0.3848921
top5words_afinn %>%
filter(value > 0) %>%
count(word, sort = TRUE) %>%
head (5) %>%
knitr::kable()
word | n |
---|---|
love | 21 |
powerful | 15 |
applause | 14 |
feeling | 11 |
god | 7 |
Setting the value to greater than 0 collects all of the words that have a sentiment score above 0 (positive). Setting the value to less than 0 collects all of the words that have a sentiment score less than 0 (negative).
top5words_afinn %>%
filter(value < 0) %>%
count(word, sort = TRUE) %>%
head (5) %>%
knitr::kable()
word | n |
---|---|
vulnerability | 16 |
numb | 10 |
shame | 10 |
wrong | 10 |
dead | 9 |
Afinn for the five least popular TED Talks. Mean = 0.51
bottom5words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("afinn")) ->bottom5words_afinn
## Joining, by = "word"
## Joining, by = "word"
mean(bottom5words_afinn$value)
## [1] 0.505814
bottom5words_afinn %>%
filter(value > 0) %>%
count(word, sort = TRUE) %>%
head (5) %>%
knitr::kable()
word | n |
---|---|
god | 30 |
love | 15 |
compassionate | 8 |
advantage | 6 |
rich | 6 |
bottom5words_afinn %>%
filter(value < 0) %>%
count(word, sort = TRUE) %>%
head (5) %>%
knitr::kable()
word | n |
---|---|
fail | 4 |
wrong | 4 |
bad | 3 |
blah | 3 |
criminal | 3 |
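After comparing the mean Afinn score of each group, it is valuable to see which kinds of sentiment the words in each group convey. The NRC lexicon assigns words to emotion categories (anger, anticipation, disgust, fear, joy, sadness, surprise and trust) as well as to positive and negative sentiment, and the bar charts below count how many words fall into each category.
### NRC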
top5words_nrc <- top5words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
## Joining, by = "word"
ggplot(top5words_nrc) + geom_bar(aes(sentiment))
bottom5words_nrc <- bottom5words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("nrc"))
## Joining, by = "word"
## Joining, by = "word"
ggplot(bottom5words_nrc) + geom_bar(aes(sentiment))
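One detail of the NRC join is easy to miss: a single word can belong to several categories, so the join produces one row per word occurrence per category. The sketch below shows this with a made-up lexicon `toy_nrc`; the category assignments are illustrative and not taken from the real lexicon.

```r
# Made-up category lexicon in which each word carries several labels.
toy_nrc <- data.frame(
  word      = c("love", "love", "shame", "shame", "shame"),
  sentiment = c("joy", "positive", "negative", "sadness", "disgust"),
  stringsAsFactors = FALSE
)

# Hypothetical tokens; "idea" has no entry in the lexicon.
toy_words <- data.frame(
  word = c("love", "shame", "love", "idea"),
  stringsAsFactors = FALSE
)

# Each occurrence of a word is matched with every one of its categories.
toy_tagged <- merge(toy_words, toy_nrc, by = "word")

# Rows per category, which is what a bar chart of sentiment displays.
category_counts <- table(toy_tagged$sentiment)
print(category_counts)
```

Because of this duplication, the bar heights count word and category pairs rather than distinct words, so the bars sum to more than the number of scored words.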
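The Bing lexicon labels each word simply as positive or negative, so the same kind of bar chart gives a direct positive-versus-negative comparison for each group.
top5words_bing <- top5words %>%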
anti_join(stop_words) %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
## Joining, by = "word"
ggplot(top5words_bing) + geom_bar(aes(sentiment))
bottom5words_bing <- bottom5words %>%
anti_join(stop_words) %>%
inner_join(get_sentiments("bing"))
## Joining, by = "word"
## Joining, by = "word"
ggplot(bottom5words_bing) + geom_bar(aes(sentiment))
Word clouds illustrate the most frequent words in each category (top 5, bottom 5) visually: each word is drawn at a size that reflects how often it is used in the talks, so the most common words stand out at a glance.
library(wordcloud2)
top5words_afinn %>%
filter(value > 0) %>%
count(word, sort = TRUE) %>%
wordcloud2()