This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
##      speed           dist
##  Min.   : 4.0   Min.   :  2
##  1st Qu.:12.0   1st Qu.: 26
##  Median :15.0   Median : 36
##  Mean   :15.4   Mean   : 43
##  3rd Qu.:19.0   3rd Qu.: 56
##  Max.   :25.0   Max.   :120
You can also embed plots, for example:
Now let’s explore the history of the process.
Then we can change the axes of speed and dist.
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
setwd("C:/Liangquan Zhou/Study/2014 fall/data visualization/hw1")
sample_data = read.csv("sample_health_facilities.csv")  # read the .csv file
# keep only the facilities located in the three southern zones
new_data = subset(sample_data, zone %in% c("Southwest", "Southeast", "South-South"))
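The subsetting step above can be tried on a small stand-in table. This is only a sketch: the `facilities` data frame and its values are made up for illustration, not taken from `sample_health_facilities.csv`. It also shows that a subset keeps the original factor levels unless they are dropped explicitly.

```r
# Toy stand-in for the facility table (hypothetical values).
facilities <- data.frame(
  facility = c("A", "B", "C", "D"),
  zone = factor(c("Southwest", "Northwest", "South-South", "North-Central"))
)

southern_zones <- c("Southwest", "Southeast", "South-South")

# %in% yields one TRUE/FALSE per row, so it works with several target values.
southern <- subset(facilities, zone %in% southern_zones)

nrow(southern)                     # rows A and C are kept
levels(southern$zone)              # still lists all four original zones
levels(droplevels(southern)$zone)  # only the zones that remain
```

Unused levels left behind by `subset` are one reason later per-group summaries can show groups with no data.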
Next, merge the lgas.csv file into the new data.frame containing only those facilities located in Southern Nigeria. (Hint: your id column is lga_id.)
lgas = read.csv("lgas.csv", stringsAsFactors = TRUE)
new_data = merge(new_data, lgas[c("lga_id", "pop_2006")], by = "lga_id")
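As a rough illustration of what `merge` does here, the sketch below uses two made-up tables (`facilities` and `lga_meta`, both hypothetical). By default `merge` performs an inner join, so a facility whose `lga_id` is missing from the metadata would be dropped silently; `all.x = TRUE` keeps it.

```r
# Toy stand-ins for the facility table and lgas.csv (hypothetical values).
facilities <- data.frame(lga_id = c(1, 2, 4), num_nurses_fulltime = c(2, 0, 5))
lga_meta <- data.frame(lga_id = c(1, 2, 3),
                       pop_2006 = c(1000, 2500, 4000),
                       area_sq_km = c(10, 20, 30))

# Default merge() is an inner join: lga_id 4 has no match and is dropped.
inner <- merge(facilities, lga_meta[c("lga_id", "pop_2006")], by = "lga_id")

# all.x = TRUE keeps every facility and fills missing metadata with NA.
left <- merge(facilities, lga_meta[c("lga_id", "pop_2006")],
              by = "lga_id", all.x = TRUE)

nrow(inner)  # matched facilities only
nrow(left)   # every facility, matched or not
```

Comparing `nrow()` before and after the merge is a quick way to check whether any facilities were lost.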
And we can use str to see the new dataset:
str(new_data)
## 'data.frame': 26 obs. of 11 variables:
## $ lga_id : int 49 67 76 101 183 191 218 304 312 316 ...
## $ lga : Factor w/ 50 levels "Aliero","Anaocha",..: 2 3 4 8 12 13 14 19 20 21 ...
## $ state : Factor w/ 24 levels "Abia","Adamawa",..: 3 20 3 6 10 3 10 20 3 19 ...
## $ zone : Factor w/ 6 levels "North-Central",..: 5 6 5 4 5 5 5 6 5 6 ...
## $ c_section_yn : logi FALSE FALSE FALSE FALSE FALSE TRUE ...
## $ num_nurses_fulltime : int 2 0 0 3 1 0 7 6 2 0 ...
## $ gps : Factor w/ 50 levels "10.50716994 7.39845258 633.4000244140625 5.0",..: 27 41 30 34 22 26 18 44 25 37 ...
## $ num_lab_techs_fulltime: int NA 0 1 2 0 0 1 1 0 0 ...
## $ management : Factor w/ 1 level "public": 1 1 1 1 NA NA NA NA NA 1 ...
## $ num_doctors_fulltime : int NA 0 1 0 0 1 0 1 1 0 ...
## $ pop_2006 : int 285002 68643 158410 105822 130931 158231 165593 96748 302158 284336 ...
tapply(new_data$num_doctors_fulltime, new_data$state, sum)
##        Abia     Adamawa     Anambra      Bauchi       Benue Cross River
##         308          NA          NA          NA          NA           0
##       Delta         Edo       Ekiti         Imo      Jigawa      Kaduna
##           2           0           1           0          NA          NA
##        Kano     Katsina       Kebbi        Kogi       Lagos       Niger
##          NA          NA          NA          NA           4          NA
##        Ogun        Osun     Plateau      Rivers      Taraba     Zamfara
##           2           1          NA           2          NA          NA
tapply(new_data$num_nurses_fulltime, new_data$state, sum)
##        Abia     Adamawa     Anambra      Bauchi       Benue Cross River
##          NA          NA           4          NA          NA           3
##       Delta         Edo       Ekiti         Imo      Jigawa      Kaduna
##          10           0           2           8          NA          NA
##        Kano     Katsina       Kebbi        Kogi       Lagos       Niger
##          NA          NA          NA          NA           4          NA
##        Ogun        Osun     Plateau      Rivers      Taraba     Zamfara
##           0           6          NA           2          NA          NA
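The NA entries in these tables can come from two different sources: a state whose factor level is still present but has no rows left after subsetting, and a state where at least one facility has a missing count. The sketch below separates the two on a made-up `staff` table (hypothetical values); whether missing counts should be ignored is a judgment call for the analysis, not something the data decides.

```r
# Toy data (hypothetical): Abia has one missing count, Kano is an unused level.
staff <- data.frame(
  state = factor(c("Abia", "Abia", "Lagos"), levels = c("Abia", "Kano", "Lagos")),
  num_nurses_fulltime = c(3, NA, 4)
)

# Plain sum: one NA makes the Abia total NA, and the empty Kano group is NA.
plain <- tapply(staff$num_nurses_fulltime, staff$state, sum)

# Extra arguments to tapply() are passed on to sum();
# droplevels() removes states that have no rows.
clean <- tapply(staff$num_nurses_fulltime, droplevels(staff$state),
                sum, na.rm = TRUE)

plain
clean
```

With `na.rm = TRUE` a total reflects only the facilities that reported a count, so it is a lower bound rather than a complete figure.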
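# keep only the columns needed for the per-state summary
data1 = subset(new_data, select = c(num_doctors_fulltime, num_nurses_fulltime, pop_2006, state))
# rebuild the factor so that states with no remaining rows are dropped
data1$state = as.factor(as.character(data1$state))
# per-state totals; a state with any missing count sums to NA
result = data.frame(tapply(data1$num_doctors_fulltime, data1$state, sum),
                    tapply(data1$num_nurses_fulltime, data1$state, sum),
                    tapply(data1$pop_2006, data1$state, sum))
names(result) = c("num_doctors_fulltime", "num_nurses_fulltime", "pop_2006")
# sort states by ascending 2006 population
result = result[order(result$pop_2006), ]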
The result is:
result
##             num_doctors_fulltime num_nurses_fulltime pop_2006
## Ekiti                          1                   2   113754
## Edo                            0                   0   120813
## Osun                           1                   6   165391
## Abia                         308                  NA   220660
## Rivers                         2                   2   284010
## Imo                            0                   8   439241
## Cross River                    0                   3   470167
## Ogun                           2                   0   597659
## Delta                          2                  10   828912
## Anambra                       NA                   4   903801
## Lagos                          4                   4  1802377
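A similar per-state table can also be built in one call with `aggregate`. This is an alternative sketch, not the approach used above, and it runs on made-up data that only borrows the column names of `data1`. One difference to keep in mind: the formula interface of `aggregate` drops any row containing an NA before summing, whereas the `tapply` version above returns NA for that state.

```r
# Toy data (hypothetical values) with the same column names as data1.
data1 <- data.frame(
  state = c("Ekiti", "Lagos", "Lagos", "Edo"),
  num_doctors_fulltime = c(1, 3, 1, 0),
  num_nurses_fulltime = c(2, 4, 0, 0),
  pop_2006 = c(100, 500, 300, 200)
)

# Sum all three numeric columns within each state in a single call.
result <- aggregate(
  cbind(num_doctors_fulltime, num_nurses_fulltime, pop_2006) ~ state,
  data = data1, FUN = sum
)

# Sort by ascending population, as in the text.
result <- result[order(result$pop_2006), ]
result
```

Here `state` ends up as an ordinary column instead of row names, which can be more convenient for later merging or plotting.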