Data Description

The dataset comes from a study of teenage gambling in Britain. It has five variables: sex, status, income, verbal and gamble. Sex is coded 0 for male and 1 for female. Status is a socioeconomic status score based on the parents' occupation. Income is in pounds per week. Verbal is a verbal score, the number of words out of 12 correctly defined, which could indicate the teenager's education level. Gamble is the expenditure on gambling in pounds per year.

Get Data

library(faraway)
attach(teengamb)
str(teengamb)
## 'data.frame':    47 obs. of  5 variables:
##  $ sex   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ status: int  51 28 37 28 65 61 28 27 43 18 ...
##  $ income: num  2 2.5 2 7 2 3.47 5.5 6.42 2 6 ...
##  $ verbal: int  8 8 6 4 8 6 7 5 6 7 ...
##  $ gamble: num  0 0 0 7.3 19.6 0.1 1.45 6.6 1.7 0.1 ...

Check for Missing Values

sum(is.na(teengamb))
## [1] 0
There are no missing values in the dataset.

Boxplot of gamble expense by verbal score and sex

[Figure: boxplot of gamble expenditure by verbal score and sex]

From the boxplot, we can see that teenagers with lower verbal scores tend to spend more money on gambling, which suggests that those with more education tend to spend less. Among teenagers with the same verbal score, males tend to spend more than females.
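The code behind this figure is not shown in the report. A minimal sketch of how a comparable boxplot could be drawn with base graphics follows; the `gamb` data frame, the colours, and the factor labels are illustrative choices, and the 0 = male, 1 = female coding is taken from the faraway documentation.

```r
library(faraway)  # provides the teengamb data

# Assumption: sex is coded 0 = male, 1 = female (per the faraway documentation)
gamb <- transform(teengamb,
                  sex = factor(sex, levels = c(0, 1), labels = c("male", "female")))

# One box per sex within each verbal score; sex varies fastest,
# so the two colours alternate male/female across the axis
boxplot(gamble ~ sex + verbal, data = gamb,
        col = c("lightblue", "pink"),
        las = 2,
        xlab = "", ylab = "Gambling expenditure (pounds per year)")
legend("topright", legend = levels(gamb$sex), fill = c("lightblue", "pink"))
```

Some verbal scores have very few observations, so individual boxes may rest on only one or two teenagers and should be read with caution.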

Line charts of gamble expense for status and income by sex

[Figures: gamble expenditure against status and against income, by sex]

From these two plots, we can see that in both cases females tend to spend much less money on gambling than males. It is interesting that teenagers with higher socioeconomic status spend less, while those with higher income spend more.
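The plotting code is also hidden here. The sketch below shows one way such per-sex line charts might be produced in base R; it connects the raw observations in order of the x variable, which may differ from how the original figures were drawn. The names `sex_label` and `cols` are hypothetical, and the 0 = male, 1 = female coding is assumed from the faraway documentation.

```r
library(faraway)  # provides the teengamb data

# Assumption: sex is coded 0 = male, 1 = female (per the faraway documentation)
sex_label <- factor(teengamb$sex, levels = c(0, 1), labels = c("male", "female"))
cols <- c(male = "blue", female = "red")

op <- par(mfrow = c(1, 2))  # two panels side by side
for (x in c("status", "income")) {
  # Empty frame sized to the full data, then one line per sex
  plot(teengamb[[x]], teengamb$gamble, type = "n",
       xlab = x, ylab = "gamble (pounds per year)")
  for (s in levels(sex_label)) {
    d <- teengamb[sex_label == s, ]
    d <- d[order(d[[x]]), ]  # sort so the line runs left to right
    lines(d[[x]], d$gamble, col = cols[[s]])
  }
  legend("topleft", legend = levels(sex_label), col = cols, lty = 1)
}
par(op)  # restore the previous graphics settings
```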

Create a Linear Model

gamblemod=lm(gamble~sex+status+income+verbal)
summary(gamblemod)
## 
## Call:
## lm(formula = gamble ~ sex + status + income + verbal)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -51.08 -11.32  -1.45   9.45  94.25 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  22.5557    17.1968    1.31     0.20    
## sex         -22.1183     8.2111   -2.69     0.01 *  
## status        0.0522     0.2811    0.19     0.85    
## income        4.9620     1.0254    4.84  1.8e-05 ***
## verbal       -2.9595     2.1722   -1.36     0.18    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 22.7 on 42 degrees of freedom
## Multiple R-squared:  0.527,  Adjusted R-squared:  0.482 
## F-statistic: 11.7 on 4 and 42 DF,  p-value: 1.81e-06

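From the model we can see that sex and verbal have negative relationships with gambling expenditure, and income has a positive relationship, which agrees with the graphs above. Because sex is coded 1 for female, the negative coefficient means that females spend about 22 pounds per year less than males with the other variables held fixed. Only sex and income are significant at the 5% level. However, status has a weak positive relationship with gambling expenditure, while the plot above shows a negative one. The estimated coefficient is very small and far from significant (p = 0.85), so its sign is not reliably determined. The discrepancy might also be due to outliers and influential points.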
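One way to probe the suspicion about influential points is to compute Cook's distance for each observation and refit the model without the most influential one. This is only a sketch of such a check, not part of the original analysis; `cooks`, `most_influential` and `gamblemod_drop` are names introduced here for illustration.

```r
library(faraway)  # provides the teengamb data

# Same model as above, with the data passed explicitly instead of via attach()
gamblemod <- lm(gamble ~ sex + status + income + verbal, data = teengamb)

# Cook's distance: how much the fitted values shift when each observation is dropped
cooks <- cooks.distance(gamblemod)
most_influential <- which.max(cooks)

# Refit without that single observation
gamblemod_drop <- lm(gamble ~ sex + status + income + verbal,
                     data = teengamb[-most_influential, ])

# Compare the coefficients side by side
cbind(full = coef(gamblemod), dropped = coef(gamblemod_drop))
```

If the status coefficient changes sign or size noticeably between the two columns, that would support the idea that a single point is driving it. Calling `plot(gamblemod)` also produces the standard residual diagnostics.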