How to transform data with dplyr functions
For each scenario below, describe how you would transform the data and the dplyr function(s) you could use to achieve that.
Write 250 words (maximum) in response to these questions.
Answer each of the three short-answer questions in the space provided. You are not expected to use any specific formatting in your answers. Once you have answered the questions here, click the Submit Quiz button. The live session will be used to discuss the answers to these questions.
Question 1 (0 pts)
You are working with the Salaries dataset from the carData package, which includes salary information for a group of professors. You want to see how salaries differ by sex.
Question 2 (0 pts)
You are working with a parking dataset from the City of Seattle that contains street parking data for the entire city over multiple years, but you want to focus your analysis on only two neighborhoods in the past year.
Question 3 (0 pts)
You are working with employee satisfaction survey data. It currently has a column for each question answered, but you would like to have a calculated summary score rather than looking at each question.
Although many fundamental data manipulation functions exist in R, they have historically been somewhat convoluted, lacking consistent syntax and the ability to flow together easily. This leads to difficult-to-read nested functions and/or choppy code. RStudio is driving a lot of new packages to collate data management tasks and better integrate them with other analysis activities. As a result, many data processing tasks are now packaged in more cohesive and consistent ways.
dplyr is one such package, built for the sole purpose of simplifying the process of manipulating, sorting, summarizing, and joining data frames. This tutorial introduces the basic functions offered by the dplyr package. The fundamental data transformation functions that dplyr offers include:
select(): selects variables
filter(): provides basic filtering capabilities
group_by(): groups data by categorical levels
summarise(): summarizes data by functions of choice
arrange(): orders data
join(): joins separate data frames (in practice a family of functions such as left_join() and inner_join())
mutate(): creates new variables
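
To make the functions listed above more concrete, here is a minimal sketch that chains them together with the pipe operator (%>%). The staff and sites data frames, their column names, and their values are hypothetical and invented purely for illustration; they are not part of this tutorial's data. left_join() is used as one member of the join family.

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
staff <- data.frame(
  id     = 1:6,
  dept   = c("A", "A", "B", "B", "C", "C"),
  salary = c(50, 60, 55, 65, 70, 80),
  bonus  = c(5, 4, 5, 7, 8, 9),
  stringsAsFactors = FALSE
)
sites <- data.frame(
  dept = c("A", "B", "C"),
  city = c("Leeds", "York", "Bath"),
  stringsAsFactors = FALSE
)

result <- staff %>%
  select(dept, salary, bonus) %>%            # keep only the columns needed
  filter(salary >= 55) %>%                   # keep rows that meet a condition
  mutate(total = salary + bonus) %>%         # create a new variable
  group_by(dept) %>%                         # group rows by a categorical variable
  summarise(mean_total = mean(total),        # collapse each group to one row
            n = n()) %>%
  left_join(sites, by = "dept") %>%          # add columns from a second data frame
  arrange(desc(mean_total))                  # order rows, largest mean first

print(result)
```

Each step receives the output of the previous one, so the code reads top to bottom in the order the transformations are applied, rather than inside out as nested function calls would.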