The dataset comes from a study of teenage gambling in Britain. It has five variables: sex, status, income, verbal and gamble. Sex is coded 0 for male and 1 for female. Status is a socioeconomic status score based on the parents’ occupation, income is measured in pounds per week, and verbal is the number of words out of 12 correctly defined, which may serve as a rough indicator of the teenager’s education level. Gamble is the expenditure on gambling in pounds per year.
library(faraway)
attach(teengamb)
str(teengamb)
## 'data.frame': 47 obs. of 5 variables:
## $ sex : int 1 1 1 1 1 1 1 1 1 1 ...
## $ status: int 51 28 37 28 65 61 28 27 43 18 ...
## $ income: num 2 2.5 2 7 2 3.47 5.5 6.42 2 6 ...
## $ verbal: int 8 8 6 4 8 6 7 5 6 7 ...
## $ gamble: num 0 0 0 7.3 19.6 0.1 1.45 6.6 1.7 0.1 ...
sum(is.na(teengamb))
## [1] 0
There are no missing values in the dataset.
From the boxplot, we can see that people with lower verbal scores tend to spend more money on gambling, which suggests that teenagers with more education tend to spend less. Among people with the same verbal score, males tend to spend more than females.
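The code behind the boxplot is not shown, so the following is only a sketch of how a comparable figure could be drawn in base R. It assumes the `sex` coding documented in `faraway` (0 = male, 1 = female); the colours, labels and the name `gamb` are illustrative choices, not the original plotting code.

```r
library(faraway)  # provides the teengamb data frame

# Assumption: sex is coded 0 = male, 1 = female (faraway documentation).
gamb <- transform(teengamb,
                  sex = factor(sex, levels = c(0, 1),
                               labels = c("male", "female")))

# One box per sex within each verbal score; sex varies fastest,
# so the two colours alternate male/female across the axis.
bp <- boxplot(gamble ~ sex + verbal, data = gamb,
              col = c("lightblue", "pink"),
              las = 2, xlab = "",
              ylab = "Gambling expenditure (pounds per year)",
              main = "Gambling expenditure by sex and verbal score")
```

With only 47 observations, several sex-by-verbal groups are small or empty, so the individual boxes should be read with caution.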
From these two plots, we can see that in both cases females tend to spend much less money on gambling than males. Interestingly, people with higher socioeconomic status spend less, while people with higher income spend more.
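The two plots themselves are not reproduced here. A minimal sketch of how such a pair of scatter plots might be produced is given below, again assuming 0 = male and 1 = female for `sex`; the colour choices and the helper name `sex_col` are hypothetical.

```r
library(faraway)  # provides the teengamb data frame

# Assumption: sex is coded 0 = male, 1 = female (faraway documentation).
sex_col <- ifelse(teengamb$sex == 1, "red", "blue")

op <- par(mfrow = c(1, 2))  # two panels side by side

# Expenditure against socioeconomic status, coloured by sex
plot(teengamb$status, teengamb$gamble, col = sex_col, pch = 19,
     xlab = "Socioeconomic status score",
     ylab = "Gambling expenditure (pounds per year)")

# Expenditure against income, coloured by sex
plot(teengamb$income, teengamb$gamble, col = sex_col, pch = 19,
     xlab = "Income (pounds per week)",
     ylab = "Gambling expenditure (pounds per year)")
legend("topleft", legend = c("male", "female"),
       col = c("blue", "red"), pch = 19)

par(op)  # restore the previous graphics settings
```

These are marginal views: each panel ignores the other predictors, which is worth keeping in mind when comparing them with regression coefficients.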
# Fit a linear model of gambling expenditure on all four predictors
gamblemod <- lm(gamble ~ sex + status + income + verbal)
summary(gamblemod)
##
## Call:
## lm(formula = gamble ~ sex + status + income + verbal)
##
## Residuals:
## Min 1Q Median 3Q Max
## -51.08 -11.32 -1.45 9.45 94.25
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.5557 17.1968 1.31 0.20
## sex -22.1183 8.2111 -2.69 0.01 *
## status 0.0522 0.2811 0.19 0.85
## income 4.9620 1.0254 4.84 1.8e-05 ***
## verbal -2.9595 2.1722 -1.36 0.18
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 22.7 on 42 degrees of freedom
## Multiple R-squared: 0.527, Adjusted R-squared: 0.482
## F-statistic: 11.7 on 4 and 42 DF, p-value: 1.81e-06
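The model shows that sex and verbal have negative relationships with gambling expenditure and that income has a positive relationship, which is consistent with the plots above. Because sex is coded 1 for female in teengamb, the sex coefficient means that females spend about 22 pounds per year less than males with the same status, income and verbal score. Only sex and income are significant at the 5% level; the verbal coefficient is negative but not significant (p = 0.18). Status, however, has a weak positive coefficient while the plot above shows a negative relationship. The estimated coefficient is very small and far from significant (0.05, p = 0.85), so its sign is not well determined, and the discrepancy might be due to outliers and influential points.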
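One way to probe the explanation above is to look at the confidence interval for the status coefficient and at Cook's distances. The sketch below is an illustrative check, not part of the original analysis. It refits the same model with `data = teengamb` so that it does not depend on `attach()`, and the names `ci`, `cd`, `worst` and `refit` are introduced here for illustration.

```r
library(faraway)  # provides the teengamb data frame

# Same model as in the text, with data= so it does not rely on attach().
gamblemod <- lm(gamble ~ sex + status + income + verbal, data = teengamb)

# 95% confidence intervals; given the estimate (0.0522) and standard
# error (0.2811) reported above, the status interval covers zero.
ci <- confint(gamblemod)
ci["status", ]

# Cook's distance: how much each observation shifts the fitted model.
cd <- cooks.distance(gamblemod)
head(sort(cd, decreasing = TRUE), 3)  # three most influential rows

# Refit without the single most influential observation and compare
# the coefficients, in particular the sign and size of status.
worst <- which.max(cd)
refit <- lm(gamble ~ sex + status + income + verbal,
            data = teengamb[-worst, ])
rbind(full = coef(gamblemod), dropped = coef(refit))
```

If the status estimate changes noticeably after dropping one row, that supports the influential-point explanation; if it stays near zero, the sign difference is more plausibly noise around a negligible effect.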