Analysis Of Two TED Talks ¦ 2022 Best Answer

October 3, 2022

You will create a document showcasing your analysis of two TED talks and presenting a list of 10 qualities, techniques and/or presentation skills that made the presentations you watched inspiring, captivating, creative and effective.

Analysis of two TED talks

Introduction

TED, an acronym for Technology, Entertainment and Design, “is a nonprofit devoted to spreading ideas.” One powerful way TED pursues this mission is through its TED Talks. TED Talks are short presentations on any topic imaginable. The common link, however, is that every TED Talk seeks to teach listeners something new.
https://www.ted.com/about/our-organization

This project aims to analyze whether there is a correlation between the sentiment (positive or negative) of a TED Talk and its popularity, measured here by view count.

Hypothesis

I predict that the sentiment in the five most viewed talks will be more positive than negative. Conversely, I predict that the sentiment in the five least viewed talks will be more negative than positive, because viewers want to listen to TED Talks that are positive.

Setup

In order to run all of my analysis functions, I first had to load the necessary packages.

library(ggthemes)
library(ggplot2)
library(wordcloud2)
library(tidyverse)
library(stringr)
library(tidytext)
library(textdata)

After loading the required packages, I imported two datasets from Kaggle. The datasets include every TED Talk up until September 21, 2017.
Main dataset: https://www.kaggle.com/rounakbanik/ted-talks?select=ted_main.csv
Transcripts dataset: https://www.kaggle.com/rounakbanik/ted-talks?select=transcripts.csv

Data Import

tedMain <- read.csv("~/Desktop/ted_main.csv", stringsAsFactors=FALSE)

tedTranscripts <- read.csv("~/Desktop/transcripts.csv", stringsAsFactors=FALSE)

After importing the two datasets, I merged them on their shared url column and then selected the five most viewed and the five least viewed talks.

tedTalks <- merge(tedMain, tedTranscripts, by = "url")

top_n(tedTalks, 5, views) -> top5views
top_n(tedTalks, -5, views) -> bottom5views
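The merge and selection steps can be tried on a small scale first. The following is only a sketch under stated assumptions: the data frames toy_main and toy_transcripts, their urls, and their view counts are made up for illustration and are not part of the Kaggle data.

```r
library(dplyr)

# Hypothetical stand-ins for the two Kaggle files; urls and view counts are made up.
toy_main <- data.frame(
  url = c("u1", "u2", "u3", "u4"),
  views = c(300, 100, 200, 50),
  stringsAsFactors = FALSE
)
toy_transcripts <- data.frame(
  url = c("u1", "u2", "u3"),
  transcript = c("first talk", "second talk", "third talk"),
  stringsAsFactors = FALSE
)

# merge() keeps only urls present in both data frames, so "u4" is dropped.
toy_talks <- merge(toy_main, toy_transcripts, by = "url")

# A positive n selects the largest values of views, a negative n the smallest.
toy_top2 <- top_n(toy_talks, 2, views)
toy_bottom2 <- top_n(toy_talks, -2, views)
```

Two details are worth keeping in mind when reading the real results: talks without a transcript drop out of the merged data, and top_n() keeps ties, so it can return more rows than requested.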

Top Ten

Next, I extracted the ten most frequent words from the transcripts of the five most viewed and the five least viewed TED Talks. Before running sentiment analysis on the top five words in each category, I wanted to see the top ten to gain a broader understanding before narrowing down to five.

# Split each transcript into one word per row
top5views %>%
  unnest_tokens(word, transcript) -> top5words

# Count words, drop stop words, and plot the ten most frequent
top5words %>%
  count(word, sort = TRUE) %>%
  anti_join(stop_words) %>%
  arrange(desc(n)) %>%
  head(10) %>%
  ggplot(aes(reorder(word, n), n)) +
  geom_col() +
  coord_flip() +
  theme_calc()

bottom5views %>%
  unnest_tokens(word, transcript) -> bottom5words

bottom5words %>%
  count(word, sort = TRUE) %>%
  anti_join(stop_words) %>%
  arrange(desc(n)) %>%
  head(10) %>%
  ggplot(aes(reorder(word, n), n)) +
  geom_col() +
  coord_flip() +
  theme_calc()

Filter

In order to run sentiment analyses with Afinn, Bing, and NRC, I first had to import the data sets, unnest the tokens, and filter out unnecessary words.

top5views %>%
  unnest_tokens(word, transcript) %>%
  anti_join(stop_words) %>%
  filter(!word %in% c("laughter", "la", "music", "ha")) -> top5WordsFiltered

## Joining, by = "word"

bottom5views %>%
  unnest_tokens(word, transcript) %>%
  anti_join(stop_words) %>%
  filter(!word %in% c("laughter", "la", "music", "ha")) -> bottom5WordsFiltered

## Joining, by = "word"

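Unnesting the tokens splits the ‘transcript’ column of each dataset into individual words, one per row, so that the words can be counted and scored. Filtering removes transcript annotations such as ‘laughter’ and ‘music’, which are noises rather than words the speaker said, so that they are excluded from the analysis.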
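To make the effect of these steps concrete, here is a minimal sketch on made-up data. The tibble toy_talks and its two short transcripts are hypothetical. The functions are the same tidytext and dplyr calls used above.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-row stand-in for the merged talks data.
toy_talks <- tibble(
  title = c("talk_a", "talk_b"),
  transcript = c("(Laughter) Love is powerful.", "(Music) Shame is wrong.")
)

toy_words <- toy_talks %>%
  # One lowercase word per row; punctuation such as parentheses is dropped.
  unnest_tokens(word, transcript) %>%
  # Remove common function words such as "is".
  anti_join(stop_words, by = "word") %>%
  # Remove transcript annotations that are noises, not spoken words.
  filter(!word %in% c("laughter", "la", "music", "ha"))

toy_words$word
```

Passing by = "word" to anti_join() is optional. It only suppresses the "Joining, by" message that appears in the output above.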

Sentiment Lexicons

Afinn, Bing, & NRC

In order to understand which group of TED Talks is more positive, it is necessary to run Afinn sentiment analyses. Afinn assigns each word a score from -5 (most negative rating) to 5 (most positive rating). Averaging the scores of all scored words in a group of talks gives the mean sentiment of that group, which shows which group uses more positive language.
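The mean calculation can be illustrated with a small sketch. The lexicon toy_lexicon below is hypothetical: its scores are made up on the same -5 to 5 scale and are not the real Afinn values, and the word list toy_words is made up as well.

```r
library(dplyr)

# Hypothetical mini-lexicon on the -5..5 scale; NOT the real Afinn scores.
toy_lexicon <- tibble(
  word = c("love", "powerful", "shame", "wrong"),
  value = c(3, 2, -2, -2)
)

# Hypothetical tokenized words, one row per occurrence.
toy_words <- tibble(word = c("love", "love", "powerful", "shame", "talk"))

# inner_join() keeps only words that have a score, so "talk" drops out.
toy_scored <- inner_join(toy_words, toy_lexicon, by = "word")

# Mean over the four scored occurrences: (3 + 3 + 2 - 2) / 4
toy_mean <- mean(toy_scored$value)
```

Because unscored words are dropped by the join, the mean describes only the words the lexicon covers, not the whole transcript.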

For each group of talks, the Afinn-scored words are split into two tables, one with sentiment values over 0 and one with values under 0. Counting the words in each table gives the five most frequent positive words and the five most frequent negative words in that group.

Afinn for the five most popular TED Talks. Mean = 0.38

top5words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("afinn")) ->top5words_afinn

## Joining, by = "word"
## Joining, by = "word"

mean(top5words_afinn$value)

## [1] 0.3848921

top5words_afinn %>%
  filter(value > 0) %>%
  count(word, sort = TRUE) %>%
  head(5) %>%
  knitr::kable()

word	n
love	21
powerful	15
applause	14
feeling	11
god	7

Setting the value to greater than 0 collects all of the words that have a sentiment score above 0 (positive). Setting the value to less than 0 collects all of the words that have a sentiment score less than 0 (negative).

top5words_afinn %>% 
  filter(value < 0) %>% 
  count(word, sort = TRUE) %>% 
  head (5) %>% 
  knitr::kable()

word	n
vulnerability	16
numb	10
shame	10
wrong	10
dead	9
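The positive and negative split described above can be sketched on made-up data. The tibble toy_scored and its scores are hypothetical and only mimic the shape of the Afinn-joined data.

```r
library(dplyr)

# Hypothetical scored words, one row per occurrence; scores are toy values.
toy_scored <- tibble(
  word  = c("love", "love", "powerful", "shame", "shame", "shame", "wrong"),
  value = c(3, 3, 2, -2, -2, -2, -2)
)

# Words with a score above 0, ranked by how often they occur.
positive_counts <- toy_scored %>%
  filter(value > 0) %>%
  count(word, sort = TRUE)

# Words with a score below 0, ranked by how often they occur.
negative_counts <- toy_scored %>%
  filter(value < 0) %>%
  count(word, sort = TRUE)
```

Note that count(word, sort = TRUE) ranks words by frequency, not by the size of their score, so the tables show the most used positive and negative words rather than the most extreme ones.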

Afinn for the five least popular TED Talks. Mean = 0.51

bottom5words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("afinn")) ->bottom5words_afinn

## Joining, by = "word"
## Joining, by = "word"

mean(bottom5words_afinn$value)

## [1] 0.505814

bottom5words_afinn %>% 
  filter(value > 0) %>% 
  count(word, sort = TRUE) %>% 
  head (5) %>% 
  knitr::kable()

word	n
god	30
love	15
compassionate	8
advantage	6
rich	6

bottom5words_afinn %>% 
  filter(value < 0) %>% 
  count(word, sort = TRUE) %>% 
  head (5) %>% 
  knitr::kable()

word	n
fail	4
wrong	4
bad	3
blah	3
criminal	3

After finding the mean sentiment of each group of talks through Afinn, it is valuable to see which kinds of sentiment the words in each group express. The NRC lexicon tags words with categories such as joy, fear, and trust, as well as positive and negative, and the bar charts below show how many words in each group of talks fall into each category.

NRC

top5words_nrc <- top5words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments("nrc"))

## Joining, by = "word"
## Joining, by = "word"

ggplot(top5words_nrc) + geom_bar(aes(sentiment))

bottom5words_nrc <- bottom5words %>% 
  anti_join(stop_words) %>% 
  inner_join(get_sentiments("nrc"))

## Joining, by = "word"
## Joining, by = "word"

ggplot(bottom5words_nrc) + geom_bar(aes(sentiment))

Bing

top5words_bing <- top5words %>%
  anti_join(stop_words) %>%
  inner_join(get_sentiments("bing"))

## Joining, by = "word"
## Joining, by = "word"

ggplot(top5words_bing) + geom_bar(aes(sentiment))

bottom5words_bing <- bottom5words %>% 
  anti_join(stop_words) %>%
  inner_join(get_sentiments("bing"))

## Joining, by = "word"
## Joining, by = "word"

ggplot(bottom5words_bing) + geom_bar(aes(sentiment))
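Bar charts of raw counts are hard to compare when the two groups of talks contain different numbers of words. One possible way to put them on the same footing is to compute the share of positive and negative words in each group. This is only a sketch: the lexicon toy_bing, the word lists, and the helper share_by_sentiment() are hypothetical and were not part of the original analysis.

```r
library(dplyr)

# Hypothetical mini-lexicon with the same columns as get_sentiments("bing").
toy_bing <- tibble(
  word = c("love", "rich", "fail", "bad"),
  sentiment = c("positive", "positive", "negative", "negative")
)

# Hypothetical tokenized words for the two groups of talks.
toy_top <- tibble(word = c("love", "love", "fail", "talk"))
toy_bottom <- tibble(word = c("rich", "bad", "bad", "bad", "talk"))

# Hypothetical helper: share of matched words per sentiment label.
share_by_sentiment <- function(words, lexicon) {
  words %>%
    inner_join(lexicon, by = "word") %>%
    count(sentiment) %>%
    mutate(share = n / sum(n))
}

top_share <- share_by_sentiment(toy_top, toy_bing)
bottom_share <- share_by_sentiment(toy_bottom, toy_bing)
```

Applied to top5words_bing and bottom5words_bing, the same count() and mutate() steps would give proportions that can be compared directly against the hypothesis.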

Word Clouds

Word clouds illustrate the most frequent words in a category visually by sizing each word according to how often it is used. The word cloud below shows the positively scored Afinn words from the five most viewed talks.

library(wordcloud2)

top5words_afinn %>% 
  filter(value > 0) %>% 
  count(word, sort = TRUE) %>% 
  wordcloud2()


Sentiment Analysis of TED Talks-Bo Hawkes
