Homework 1 assignment

For this homework, I chose the dataset Donation.Disease to practice making different kinds of graphs.

dd <- read.csv("~/Documents/qmssviz/Lab1/Donation-Disease.csv")

First, let us look at the basic structure of this dataset with str(dd):

## 'data.frame':    8 obs. of  4 variables:
##  $ Name       : Factor w/ 8 levels "Breast Cancer",..: 4 8 1 3 5 7 6 2
##  $ Description: Factor w/ 8 levels "ALS Ice Bucket Challenge",..: 3 6 4 8 7 5 1 2
##  $ MoneyRaised: num  54.1 3.2 257.9 4.2 14 ...
##  $ Death      : int  596577 39518 41374 73831 7683 21176 6849 142942

Three pieces of information are provided for each disease: the name of a fundraising event for that disease, the amount of money raised, and the number of deaths from the disease. We may want to start by looking at the amount of money raised for each disease in a scatterplot. I looked up plotting in R online and followed a post that explains how to make this basic graph.

[Figure: connected scatterplot of money raised for each disease]

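A few notes may help with future plotting. The type argument of plot() in R controls how the data are drawn. I chose type “o” for my graph, which overplots points and connecting lines (a connected scatterplot). The other types are:

“p”: Points

“l”: Lines

“b”: Both points and lines

“c”: Lines only, with gaps at the data points

“h”: Histogram-like vertical lines

“s”: Stair steps

“S”: Stair steps, stepping the other way

“n”: No plotting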
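To compare these options directly, here is a minimal sketch that draws the same vector with each type value in a 3 x 3 grid of panels. Rather than reading the CSV file, it uses the eight Death values printed in the str() output above; the grid layout and margins are my own choices.

```r
# Death counts for the eight diseases, copied from the str(dd) output above
deaths <- c(596577, 39518, 41374, 73831, 7683, 21176, 6849, 142942)

# The type values documented for plot() in base R
plot_types <- c("p", "l", "b", "c", "o", "h", "s", "S", "n")

# One small panel per type; par() returns the old settings so we can restore them
old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
for (type_code in plot_types) {
  plot(deaths, type = type_code,
       main = paste0('type = "', type_code, '"'),
       xlab = "", ylab = "")
}
par(old_par)
```

The “n” panel is still useful: it sets up the axes and title without drawing anything, so points or lines can be added later with points() or lines().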

Similarly, we can graph the number of deaths from each disease. Placing this scatterplot next to the money-raised one makes the two easy to compare:

[Figure: money raised and deaths for each disease, plotted together for comparison]
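A minimal sketch of one way to put the two scatterplots together in base R: split the plotting device into one row and two columns with par(mfrow = c(1, 2)). Because the MoneyRaised column is truncated in the str() output, the vectors below hold only the first five diseases; with the full data you would pass dd$MoneyRaised and dd$Death instead.

```r
# First five values of each column, as printed by str(dd) above
money  <- c(54.1, 3.2, 257.9, 4.2, 14)
deaths <- c(596577, 39518, 41374, 73831, 7683)

# One row, two columns; both panels use the row number as the x position
old_par <- par(mfrow = c(1, 2))
plot(money, type = "o", xlab = "Disease (row number)", ylab = "Money raised",
     main = "Money raised")
plot(deaths, type = "o", xlab = "Disease (row number)", ylab = "Deaths",
     main = "Deaths")
par(old_par)  # restore the single-panel layout
```

Keeping the x positions identical in both panels means a given disease sits in the same horizontal spot in each graph, which is what makes the side-by-side comparison readable.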

While doing this homework, I referred to Yang Yang’s Lab 1 for help with plotting. I did not realize that I could use ggplot2 until I read her section on it, so I decided to try her way of plotting and see whether I could improve the graphs.

Her graph has money raised as the y variable and deaths from the disease as the x variable. I made only minor adjustments, such as the angle at which the text is tilted and the text color, to make it more readable.

[Figure: ggplot2 scatterplot of money raised against deaths, with tilted, colored text]
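The plotting code is not shown here, so the following is only a minimal ggplot2 sketch of the kind of graph described: deaths on the x axis, money raised on the y axis, and text drawn at an angle and in a different color. The angles, the color "darkblue", and the labels "Disease 1" to "Disease 5" are assumptions; the Name column is truncated in the str() output, so the labels are hypothetical placeholders and only the first five rows are used. It requires the ggplot2 package.

```r
library(ggplot2)

# First five rows from the str(dd) output; Label values are hypothetical
df <- data.frame(
  Death       = c(596577, 39518, 41374, 73831, 7683),
  MoneyRaised = c(54.1, 3.2, 257.9, 4.2, 14),
  Label       = paste("Disease", 1:5)
)

p <- ggplot(df, aes(x = Death, y = MoneyRaised, label = Label)) +
  geom_point() +
  # Tilt the point labels and change their color so they stand out
  geom_text(angle = 30, colour = "darkblue", hjust = 0, nudge_x = 5000) +
  # Rotate the x-axis tick labels so large numbers do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = "Deaths", y = "Money raised")

print(p)
```

Setting angle and colour inside geom_text() changes the labels on the points, while theme(axis.text.x = ...) changes the tick labels; either one can be adjusted without touching the other.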

Because this dataset is small, there is only so much we can do with it. I therefore downloaded a dataset from the National Climatic Data Center (NCDC) with JFK’s monthly precipitation and snowfall records from August 1, 2004 to August 1, 2014, to see what other graphs I could create. First I need to read the data into R.

As before, let us look at the basic structure of this dataset:

str(JFK)

## 'data.frame':    121 obs. of  5 variables:
##  $ Months: Factor w/ 121 levels "01/01/2005","01/01/2006",..: 71 82 92 102 112 1 11 21 31 41 ...
##  $ EMXP  : int  1041 869 147 338 185 173 198 650 307 124 ...
##  $ MXSD  : int  0 0 0 0 25 305 152 152 0 0 ...
##  $ TPCP  : int  1869 2095 321 944 892 863 641 1079 1227 451 ...
##  $ TSNW  : int  0 0 0 0 66 336 336 199 0 0 ...

Note that in this dataset EMXP is the extreme maximum daily precipitation total in the month, MXSD is the maximum snow depth in the month, TPCP is the total precipitation, and TSNW is the total snowfall. EMXP, MXSD and TPCP are recorded in tenths of a millimeter, and TSNW in millimeters. The main difference between TPCP and TSNW is that TPCP includes all precipitation, such as sleet and freezing rain, while TSNW counts snowfall only.
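Because EMXP, MXSD and TPCP are stored as integers in tenths of a millimeter, dividing by 10 converts them to millimeters, and dividing millimeters by 25.4 converts them to inches. A minimal sketch using the first five TPCP values from the str(JFK) output; the helper name tenths_to_mm is hypothetical, not part of any package.

```r
# First five TPCP values from str(JFK), in tenths of a millimeter
tpcp_tenths <- c(1869, 2095, 321, 944, 892)

# Hypothetical helper: tenths of a millimeter -> millimeters
tenths_to_mm <- function(x) x / 10

tpcp_mm <- tenths_to_mm(tpcp_tenths)
tpcp_mm                     # 186.9 209.5 32.1 94.4 89.2 (mm)
round(tpcp_mm / 25.4, 2)    # the same totals in inches
```

TSNW needs no such conversion, since it is already in millimeters.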

Now I want to add two time-series plots, one for total precipitation and one for total snowfall.

## Loading required package: xts
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

[Figure: time-series plots of total precipitation (TPCP) and total snowfall (TSNW)]
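The plotting code is not shown, and the console messages above suggest it relied on the xts and zoo packages. As a base-R alternative (not necessarily the approach used here), the sketch below builds monthly ts objects starting in August 2004 and plots TPCP and TSNW in two stacked panels. It uses only the first ten values from the str(JFK) output, which appear to run from August 2004 to May 2005. With the full data you would pass JFK$TPCP and JFK$TSNW; assuming the rows are in date order, August 2004 through August 2014 is 10 x 12 + 1 = 121 months, matching the 121 observations.

```r
# First ten monthly values from the str(JFK) output (August 2004 onward)
tpcp <- c(1869, 2095, 321, 944, 892, 863, 641, 1079, 1227, 451)  # tenths of mm
tsnw <- c(0, 0, 0, 0, 66, 336, 336, 199, 0, 0)                    # mm

# Monthly series: frequency = 12 observations per year, first one in August 2004
tpcp_ts <- ts(tpcp, start = c(2004, 8), frequency = 12)
tsnw_ts <- ts(tsnw, start = c(2004, 8), frequency = 12)

# Two stacked panels sharing the same time axis
old_par <- par(mfrow = c(2, 1), mar = c(4, 4, 2, 1))
plot(tpcp_ts, type = "o", ylab = "TPCP (0.1 mm)", main = "Total precipitation")
plot(tsnw_ts, type = "o", ylab = "TSNW (mm)", main = "Total snowfall")
par(old_par)
```

Using start and frequency lets R place each value on a calendar axis without parsing the Months column at all, which avoids the alphabetical ordering of its factor levels.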

We can do the same with EMXP and MXSD. Finally, I want to put all four graphs together in a single figure.

[Figure: EMXP, MXSD, TPCP and TSNW time series combined in one figure]
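One way to combine the four series in base R is a 2 x 2 panel layout with par(mfrow = c(2, 2)). This is a minimal sketch, again limited to the first ten months from the str(JFK) output; the data frame jfk_head is a hypothetical stand-in for the full JFK data frame.

```r
# First ten rows of the four measurements, from the str(JFK) output
jfk_head <- data.frame(
  EMXP = c(1041, 869, 147, 338, 185, 173, 198, 650, 307, 124),
  MXSD = c(0, 0, 0, 0, 25, 305, 152, 152, 0, 0),
  TPCP = c(1869, 2095, 321, 944, 892, 863, 641, 1079, 1227, 451),
  TSNW = c(0, 0, 0, 0, 66, 336, 336, 199, 0, 0)
)

# One multivariate monthly series, starting August 2004
jfk_ts <- ts(jfk_head, start = c(2004, 8), frequency = 12)

# 2 x 2 grid with one panel per variable
old_par <- par(mfrow = c(2, 2), mar = c(4, 4, 2, 1))
for (v in colnames(jfk_ts)) {
  plot(jfk_ts[, v], type = "o", main = v, xlab = "Year", ylab = v)
}
par(old_par)
```

Looping over the column names keeps the four panels consistent in style; keep in mind that the y axes are on different scales, so the panels should be compared by shape rather than height.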

Yue Long

September 18, 2014