Intro to data.frame
This week we will begin to familiarize ourselves with data frames. We are given mock data representing ABC and CBS polls for the 2016 U.S. presidential election.
Here is the information given to us, exactly as it was presented:
> Name <- c("Jeb", “Donald”, "Ted”, “Marco” “Carly”, “Hillary”, “Berine”)
> ABC political poll results <- c(4, 62,51, 21, 2, 14, 15)
> CBS political poll results <- c(12, 75, 43, 19, 1, 21, 19)
Our goal is to apply what we've learned so far about data frames to organize this data in R.
Let's begin!
First, we assign the candidates to a vector named 'Name'.*
> Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie")
*Note: we couldn't simply copy and paste the line that was provided to us, because of three issues with how it was presented (two main, one minor):
1. Some of the names were wrapped in curly quotation marks (“ ”), which R does not accept as string delimiters; R needs straight quotes ("). See this article for more information on the different quotation marks if interested: https://practicaltypography.com/straight-and-curly-quotes.html
2. The comma (,) between "Marco" and "Carly" was missing.
3. Lastly (the minor issue), Bernie Sanders' first name was misspelled as "Berine".
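For longer pasted text, the curly quotes could also be swapped out in code rather than by hand. This is only a minimal sketch: the `pasted` string is a hypothetical two-name example, not the assignment line itself, and it fixes the quotes only (a missing comma would still have to be added manually).

```r
# Hypothetical snippet pasted from a word processor, with curly quotes.
# \u201C and \u201D are the Unicode escapes for the opening and closing
# curly double quotation marks.
pasted <- "c(\u201CMarco\u201D, \u201CCarly\u201D)"

# Swap each curly quote for a straight one
cleaned <- gsub("\u201C", "\"", pasted, fixed = TRUE)
cleaned <- gsub("\u201D", "\"", cleaned, fixed = TRUE)

# The cleaned text now parses and evaluates as ordinary R code
candidates <- eval(parse(text = cleaned))
print(candidates)
# [1] "Marco" "Carly"
```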
Next, we assign the polling data for ABC and CBS:
> ABC <- c(4, 62, 51, 21, 2, 14, 15)
> CBS <- c(12, 75, 43, 19, 1, 21, 19)
Again, we can't simply copy and paste what was given to us, because ordinary variable names can't contain spaces. We could rename the variables using underscores instead of spaces (ABC_political_poll_results, CBS_political_poll_results), but as we've seen in our reading (Matloff), long, wordy headers can make the printed data frame look odd. I propose we simply use the network names (ABC, CBS). We can add # comments explaining what they mean if the header names are not clear once we are done.
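As a side note, base R can show what a valid version of the original headers would look like. The sketch below is illustrative only; the `given` and `syntactic` variable names are my own and not part of the assignment.

```r
# Headers exactly as given in the assignment, spaces included
given <- c("ABC political poll results", "CBS political poll results")

# make.names() replaces characters that are not allowed in a name with "."
syntactic <- make.names(given)
print(syntactic)
# [1] "ABC.political.poll.results" "CBS.political.poll.results"

# A name with spaces is still usable if it is wrapped in backticks,
# but that is awkward to type, so short names are the better choice here
`ABC political poll results` <- c(4, 62, 51, 21, 2, 14, 15)
print(sum(`ABC political poll results`))
# [1] 169
```

Either form works, but both are clumsier than the short names ABC and CBS.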
Now we can create our data frame!
> df <- data.frame(Name, ABC, CBS, stringsAsFactors = FALSE)
Great! Now we have a data frame that looks like this:
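Printing the data frame at the console shows the table. A minimal sketch that rebuilds the three vectors so it runs on its own:

```r
# Candidate names and mock poll results from the assignment (after fixes)
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)

# stringsAsFactors = FALSE keeps Name as a character column
df <- data.frame(Name, ABC, CBS, stringsAsFactors = FALSE)
print(df)
#      Name ABC CBS
# 1     Jeb   4  12
# 2  Donald  62  75
# 3     Ted  51  43
# 4   Marco  21  19
# 5   Carly   2   1
# 6 Hillary  14  21
# 7  Bernie  15  19
```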
Using our Data Frame
The whole point of a data frame is to allow us to do stats. So let's play around and do some stats!
First, I want to see how many people each network polled.
So we run the sum function on the ABC and CBS columns:
> sum(df$ABC)
> sum(df$CBS)
If we treat each value as a count of respondents, this returns the total number of people each network polled: 169 for ABC and 190 for CBS.
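Both totals can also be computed in a single call with colSums(). This is a sketch of an alternative, not a replacement for the two sum() calls above; the `totals` variable name is my own.

```r
# Rebuild the data frame so this block runs on its own
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
df <- data.frame(Name, ABC, CBS, stringsAsFactors = FALSE)

# colSums() adds up each numeric column; Name is excluded because it is text
totals <- colSums(df[, c("ABC", "CBS")])
print(totals)
# ABC CBS 
# 169 190 
```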
Another thing I'm interested in is seeing how ABC and CBS differ from one another. So I ran this line of code:
> dif <- abs(df$ABC - df$CBS)
Now when we print dif, we get a vector holding the absolute difference between ABC and CBS for each candidate.
[1] 8 13 8 2 1 7 4
It would probably be easier to view these values next to the candidates' names, so why don't we add these differences to our existing data frame as an extra column?
> df$diff <- dif
Now our data frame looks like this:
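Printing the updated data frame shows the new column. As before, this sketch rebuilds everything so it runs on its own; the final which.max() line is my own addition to pick out the candidate the two networks disagree on most.

```r
# Rebuild the data frame from the corrected vectors
Name <- c("Jeb", "Donald", "Ted", "Marco", "Carly", "Hillary", "Bernie")
ABC <- c(4, 62, 51, 21, 2, 14, 15)
CBS <- c(12, 75, 43, 19, 1, 21, 19)
df <- data.frame(Name, ABC, CBS, stringsAsFactors = FALSE)

# Absolute difference between the two networks, stored as a new column
df$diff <- abs(df$ABC - df$CBS)
print(df)
#      Name ABC CBS diff
# 1     Jeb   4  12    8
# 2  Donald  62  75   13
# 3     Ted  51  43    8
# 4   Marco  21  19    2
# 5   Carly   2   1    1
# 6 Hillary  14  21    7
# 7  Bernie  15  19    4

# Candidate with the largest gap between the two polls
biggest_gap <- df$Name[which.max(df$diff)]
print(biggest_gap)
# [1] "Donald"
```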
Much better!
Thank you for reading my post. Please feel free to comment with any concerns you may have.
Anthony