Learn to create a branch, commit, pull, and push on GitHub, and use R Markdown to create an HTML document.

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist    
##  Min.   : 4.0   Min.   :  2  
##  1st Qu.:12.0   1st Qu.: 26  
##  Median :15.0   Median : 36  
##  Mean   :15.4   Mean   : 43  
##  3rd Qu.:19.0   3rd Qu.: 56  
##  Max.   :25.0   Max.   :120

You can also embed plots, for example:

plot of chunk unnamed-chunk-2

Now let’s explore the history of the process.

Now it’s time to make the second commit.

Then we can change the axes of speed and dist.

And make the third commit.

plot of chunk unnamed-chunk-3

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
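
To make the chunk options concrete, here is a minimal sketch that writes a small R Markdown source file from R. The file name `chunk_demo.Rmd` and the chunk labels `show-code` and `hide-code` are made up for illustration; only `echo = FALSE` comes from the text above.

```r
# Build a tiny R Markdown source file. The file name and chunk labels are
# hypothetical; the fence is generated with strrep() to keep this readable.
fence <- strrep("`", 3)
rmd_lines <- c(
  "---",
  "title: \"Chunk options demo\"",
  "output: html_document",
  "---",
  "",
  "The code is shown together with its output:",
  "",
  paste0(fence, "{r show-code}"),
  "summary(cars)",
  fence,
  "",
  "Only the plot is shown, because echo = FALSE hides the code:",
  "",
  paste0(fence, "{r hide-code, echo = FALSE}"),
  "plot(cars)",
  fence
)
rmd_file <- file.path(tempdir(), "chunk_demo.Rmd")
writeLines(rmd_lines, rmd_file)

# Inspect the generated source. To knit it, open the file in RStudio and
# click Knit, or (with the rmarkdown package installed) run:
#   rmarkdown::render(rmd_file)
cat(readLines(rmd_file), sep = "\n")
```

The first chunk prints both the code and its output, while the second shows only the plot.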

Now let’s finish today’s homework (2014-09-18). First, load the sample data.

setwd("C:/Liangquan Zhou/Study/2014 fall/data visualization/hw1")
sample_data= read.csv("sample_health_facilities.csv", stringsAsFactors=TRUE) # read the .csv file; since R 4.0 strings are only read as factors when asked
  1. Select all facilities located in the southern zones of Nigeria.
new_data= subset(sample_data, zone %in% c("Southwest","Southeast","South-South"))
  2. Incorporate the pop_2006 column from the lgas.csv file into the new data.frame containing only those facilities located in Southern Nigeria. (Hint: your id column is lga_id)
lgas= read.csv("lgas.csv", stringsAsFactors=TRUE)
new_data= merge(new_data, lgas[c("lga_id","pop_2006")],by= "lga_id")
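
To see what the `%in%` filter and `merge` do on a small scale, here is a sketch on toy data. The data frames `facilities` and `lga_info` and all their values are made up; they only mimic the columns used above.

```r
# Toy stand-ins for the homework files; all values here are made up.
facilities <- data.frame(
  lga_id = c(1, 2, 3, 4),
  zone   = c("Southwest", "Northwest", "South-South", "Southeast"),
  stringsAsFactors = FALSE
)
lga_info <- data.frame(
  lga_id   = c(1, 2, 3, 5),
  pop_2006 = c(1000, 2000, 3000, 5000)
)

# Keep only rows whose zone is one of the southern zones.
southern <- subset(facilities,
                   zone %in% c("Southwest", "Southeast", "South-South"))

# Inner join on lga_id: facilities without a match in lga_info are dropped.
joined <- merge(southern, lga_info[c("lga_id", "pop_2006")], by = "lga_id")

# all.x = TRUE keeps unmatched facilities and fills pop_2006 with NA.
joined_left <- merge(southern, lga_info[c("lga_id", "pop_2006")],
                     by = "lga_id", all.x = TRUE)

print(joined)
print(joined_left)
```

By default `merge` keeps only ids present in both tables, so it is worth comparing `nrow()` before and after the merge to check that no facilities were silently dropped.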

And we can use str to see the new dataset:

str(new_data)
## 'data.frame':    26 obs. of  11 variables:
##  $ lga_id                : int  49 67 76 101 183 191 218 304 312 316 ...
##  $ lga                   : Factor w/ 50 levels "Aliero","Anaocha",..: 2 3 4 8 12 13 14 19 20 21 ...
##  $ state                 : Factor w/ 24 levels "Abia","Adamawa",..: 3 20 3 6 10 3 10 20 3 19 ...
##  $ zone                  : Factor w/ 6 levels "North-Central",..: 5 6 5 4 5 5 5 6 5 6 ...
##  $ c_section_yn          : logi  FALSE FALSE FALSE FALSE FALSE TRUE ...
##  $ num_nurses_fulltime   : int  2 0 0 3 1 0 7 6 2 0 ...
##  $ gps                   : Factor w/ 50 levels "10.50716994 7.39845258 633.4000244140625 5.0",..: 27 41 30 34 22 26 18 44 25 37 ...
##  $ num_lab_techs_fulltime: int  NA 0 1 2 0 0 1 1 0 0 ...
##  $ management            : Factor w/ 1 level "public": 1 1 1 1 NA NA NA NA NA 1 ...
##  $ num_doctors_fulltime  : int  NA 0 1 0 0 1 0 1 1 0 ...
##  $ pop_2006              : int  285002 68643 158410 105822 130931 158231 165593 96748 302158 284336 ...
  3. Calculate the total number of full-time nurses and doctors for all health facilities in each state.
tapply(new_data$num_doctors_fulltime,  new_data$state,sum)
##        Abia     Adamawa     Anambra      Bauchi       Benue Cross River 
##         308          NA          NA          NA          NA           0 
##       Delta         Edo       Ekiti         Imo      Jigawa      Kaduna 
##           2           0           1           0          NA          NA 
##        Kano     Katsina       Kebbi        Kogi       Lagos       Niger 
##          NA          NA          NA          NA           4          NA 
##        Ogun        Osun     Plateau      Rivers      Taraba     Zamfara 
##           2           1          NA           2          NA          NA
tapply(new_data$num_nurses_fulltime,  new_data$state,sum)
##        Abia     Adamawa     Anambra      Bauchi       Benue Cross River 
##          NA          NA           4          NA          NA           3 
##       Delta         Edo       Ekiti         Imo      Jigawa      Kaduna 
##          10           0           2           8          NA          NA 
##        Kano     Katsina       Kebbi        Kogi       Lagos       Niger 
##          NA          NA          NA          NA           4          NA 
##        Ogun        Osun     Plateau      Rivers      Taraba     Zamfara 
##           0           6          NA           2          NA          NA
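
The NA entries above have two sources: `tapply` reports every level of the state factor, including states that have no rows left after subsetting, and `sum` returns NA as soon as one value in a group is missing. Here is a minimal sketch on made-up data, assuming we want to drop the unused levels and ignore missing counts (the `toy` data frame and its values are hypothetical):

```r
# Made-up data: the factor keeps a level ("Kano") with no rows after subsetting.
toy <- data.frame(
  state   = factor(c("Lagos", "Lagos", "Ogun", "Kano")),
  doctors = c(3, NA, 2, 5)
)
toy_south <- subset(toy, state != "Kano")

# Default behaviour: NA for the empty level and for any group containing NA.
default_totals <- tapply(toy_south$doctors, toy_south$state, sum)

# droplevels() removes unused levels; na.rm = TRUE is passed on to sum().
toy_south <- droplevels(toy_south)
clean_totals <- tapply(toy_south$doctors, toy_south$state, sum, na.rm = TRUE)

print(default_totals)
print(clean_totals)
```

Note that `na.rm = TRUE` treats a missing count as zero, which may or may not be appropriate for the real data.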
  4. Sort the resulting dataset by state population, in descending order.
data1=subset(new_data,select=c(num_doctors_fulltime,num_nurses_fulltime,pop_2006,state))
data1$state=as.factor(as.character(data1$state))
result=data.frame(tapply(data1$num_doctors_fulltime, data1$state,sum),
  tapply(data1$num_nurses_fulltime,  data1$state,sum),
  tapply(data1$pop_2006,  data1$state,sum))
names(result)=c("num_doctors_fulltime","num_nurses_fulltime","pop_2006")
result=result[order(result$pop_2006, decreasing=TRUE),] # largest population first

The result is:

result
##             num_doctors_fulltime num_nurses_fulltime pop_2006
## Lagos                          4                   4  1802377
## Anambra                       NA                   4   903801
## Delta                          2                  10   828912
## Ogun                           2                   0   597659
## Cross River                    0                   3   470167
## Imo                            0                   8   439241
## Rivers                         2                   2   284010
## Abia                         308                  NA   220660
## Osun                           1                   6   165391
## Edo                            0                   0   120813
## Ekiti                          1                   2   113754
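
As an alternative to calling `tapply` once per column, the per-state totals and the sort can be sketched with `aggregate` and `order`. This is only an illustration on made-up data (the `toy` data frame and its values are hypothetical), not the method used for the table above.

```r
# Made-up facility-level data (hypothetical values, not the homework files).
toy <- data.frame(
  state    = c("Lagos", "Lagos", "Ogun", "Ekiti"),
  doctors  = c(3, 1, 2, NA),
  nurses   = c(4, 0, 1, 2),
  pop_2006 = c(900, 800, 600, 100)
)

# Sum every numeric column within each state in one call;
# na.rm = TRUE is passed on to sum() so missing counts are ignored.
totals <- aggregate(toy[c("doctors", "nurses", "pop_2006")],
                    by = list(state = toy$state),
                    FUN = sum, na.rm = TRUE)

# order() sorts ascending by default; decreasing = TRUE puts the largest first.
totals <- totals[order(totals$pop_2006, decreasing = TRUE), ]
print(totals)
```

Unlike `tapply`, `aggregate` returns a data frame with the state as a regular column, which is convenient for further merging or plotting.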