Approach
The explorations here look into the time-domain aspects of the NYC vehicle collision data. They follow a question-plot-observation format: first a question is asked, then a plot is drawn, and then some observations are made from the plot. The goal is to gradually build up our understanding of the time-domain structure of the collisions.
The data was retrieved from the NYC OpenData site. I downloaded the full dataset on Feb 11, 2015, so that is the last day we have data for in these graphs.
Hourly collision rates
How many collisions occur per hour?
Here is a plot of all dates until Feb 11, 2015. The counts are grouped by hour.
Observations:
- We see too many data points on the graph; it is difficult to glean anything except …
- Some outlier points stand out
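The hourly grouping behind a plot like this can be sketched as follows. This is only a minimal illustration, not the code used for the plot: it assumes each collision has already been parsed into a Python `datetime`, and both the function name `hourly_counts` and the sample timestamps are made up.

```python
from collections import Counter
from datetime import datetime

def hourly_counts(timestamps):
    """Count collisions per clock hour by truncating each timestamp to the hour."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )

# Made-up sample timestamps, for illustration only (not real collision records).
sample = [
    datetime(2015, 1, 18, 8, 5),
    datetime(2015, 1, 18, 8, 40),
    datetime(2015, 1, 18, 9, 15),
]
counts = hourly_counts(sample)
```

Each key of `counts` is the start of an hour, so the result maps directly onto one point per hour in the plot.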
Outliers in hourly collision rate
What can we say about the hourly collision rate outliers?
Here we keep the top 10 hours and remove all other data points.
Top 10 collision hours.
time
- 2015-01-18 08:00:00
- 2014-01-21 10:00:00
- 2014-01-21 11:00:00
- 2015-01-18 07:00:00
- 2014-01-21 09:00:00
- 2014-02-03 08:00:00
- 2014-02-03 09:00:00
- 2013-03-08 08:00:00
- 2013-03-08 09:00:00
- 2015-01-09 08:00:00
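Picking out the busiest hours from a table of hourly counts could look roughly like the sketch below. The helper name `top_hours` and the counts in the sample are invented for illustration; the real counts are not reproduced here.

```python
from collections import Counter
from datetime import datetime

def top_hours(hourly, n=10):
    """Return the n (hour, count) pairs with the highest counts, largest first."""
    return Counter(hourly).most_common(n)

# Invented counts, for illustration only.
hourly = {
    datetime(2014, 1, 21, 9): 3,
    datetime(2014, 1, 21, 10): 7,
    datetime(2014, 1, 21, 11): 5,
}
top2 = top_hours(hourly, n=2)
```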
Observations:
- The top collision rates since July 2012 all fall in the winter months (January to early March).
- Spot-checking some of these dates shows that it snowed on those days. We should explore correlations with weather events.
Daily collision rates
What do the daily collision counts look like? Do we see any change over the years?
Observations:
- There doesn’t seem to be any evidence that collision rates are increasing or decreasing. We need to do a statistical significance analysis, but just from the visualization, no trend jumps out (not to me at least).
- We do see a rise and fall of the collisions over time. This needs further exploration.
Smoothed daily collision rates
Do we see any pattern in a smoothed daily collision rate?
Observations:
- There seems to be some type of pattern, although it’s not really clear from this plot.
- To see if this rise and fall has anything to do with the yearly cycle, we need to plot different years separately.
Observations:
- There seems to be a pattern like this: at the start of the year collisions are very low, and then they start to climb. The rate increases until about late June. It then drops and stays lower until the first week of September. Could this be related to schools being closed over the summer, people being on vacation, etc.?
- Although the loess smoothing shows a consistent spring-to-summer rise, the data is very noisy.
- Collision rates take a nosedive during the Christmas season. This is clear, and the statistical support for it will probably be very strong.
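To make the "smooth each year separately" idea concrete, here is a rough sketch. Note that it swaps in a simple centered moving average rather than the loess smoother used for the plots, so it will not reproduce the curves exactly. The function names, the window size, and the sample counts are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

def moving_average(values, window=7):
    """Centered moving average; the window shrinks at the edges of the series."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def smooth_by_year(daily, window=7):
    """Split {date: count} by calendar year and smooth each year on its own."""
    by_year = defaultdict(list)
    for day in sorted(daily):
        by_year[day.year].append((day, daily[day]))
    result = {}
    for year, rows in by_year.items():
        smoothed = moving_average([count for _, count in rows], window)
        result[year] = [(day, s) for (day, _), s in zip(rows, smoothed)]
    return result

# Invented daily counts, for illustration only.
daily = {
    date(2013, 12, 30): 10,
    date(2013, 12, 31): 20,
    date(2014, 1, 1): 30,
    date(2014, 1, 2): 50,
    date(2014, 1, 3): 40,
}
smoothed = smooth_by_year(daily, window=3)
```

Smoothing within each year keeps the December values of one year from leaking into the January values of the next, which matters here because the holiday dip sits right at the year boundary.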
Collisions by date of month
Does daily collision rate change over the course of the month?
(This graph turned out to be a bit crazy and difficult to explain. I still kept it because it looks interesting.)
Each dot represents a collision count for a specific date. The color is associated with the month. The blue lines are smoothed rates for different months. There are twelve of these blue lines, one per month. The idea of plotting them all is to see if there are clusters of months that show similar behavior.
Observation: It doesn’t seem that there is a strong relationship between collision rate and what day of the month it is.
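A quick numerical companion to this plot is the average daily count for each day of the month. The sketch below assumes daily counts are held in a `{date: count}` dictionary; the name `mean_by_day_of_month` and the sample values are made up.

```python
from collections import defaultdict
from datetime import date

def mean_by_day_of_month(daily):
    """Average the daily counts that share the same day-of-month (1-31)."""
    grouped = defaultdict(list)
    for day, count in daily.items():
        grouped[day.day].append(count)
    return {dom: sum(v) / len(v) for dom, v in sorted(grouped.items())}

# Invented daily counts, for illustration only.
daily = {
    date(2014, 1, 1): 500,
    date(2014, 2, 1): 600,
    date(2014, 1, 15): 550,
}
means = mean_by_day_of_month(daily)
```

If there were a date-of-month effect, these averages would drift systematically from the 1st to the 31st. Days 29 to 31 are averaged over fewer months, so they will be somewhat noisier.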
Time-of-day effect
Zoom into the hourly collision rates for a randomly chosen week.
(would be cool to use ggvis here)
Observation: We can see how the time-of-day changes the collision rates.
Consistency of the time-of-day effect
Is the time-of-day effect consistent over different quarters?
This graph shows one line for each quarter that we have data for. Each line represents the total collisions grouped by week-hour (the hour of the week; a week has 168 hourly bins).
Observations:
- The effect is very consistent.
- We have fewer days of data for the first quarter of 2015. That’s pretty clear from the plot.
- We consistently see two peaks during a weekday and one peak during the weekends.
- The Friday afternoon peak seems to be consistently worse than the other weekday peaks.
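The week-hour grouping can be sketched as below. One assumption to flag: the sketch counts hours from Monday 00:00, following Python's `weekday()` convention, whereas the plot may start its week on a different day. The function names and sample timestamps are illustrative only.

```python
from collections import Counter
from datetime import datetime

def week_hour(ts):
    """Hour of the week: 0 is Monday 00:00, 167 is Sunday 23:00 (assumed convention)."""
    return ts.weekday() * 24 + ts.hour

def week_hour_profile(timestamps):
    """Count collisions per (year, quarter, week-hour)."""
    profile = Counter()
    for ts in timestamps:
        quarter = (ts.month - 1) // 3 + 1
        profile[(ts.year, quarter, week_hour(ts))] += 1
    return profile

# Made-up timestamps, for illustration only.
sample = [
    datetime(2015, 1, 5, 0, 10),   # a Monday, just after midnight
    datetime(2015, 1, 9, 17, 30),  # a Friday, late afternoon
    datetime(2015, 1, 9, 17, 45),
    datetime(2015, 1, 11, 23, 0),  # a Sunday, last hour of the week
]
profile = week_hour_profile(sample)
```

Plotting one line per `(year, quarter)` over the week-hour axis gives a comparison like the one described above.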
Summary
Things we can see so far
- Outliers that seem to be related to weather events
- Seasonality: spring rise, summer taper, holiday period lull
- No date-of-month effect
- Strong hour-of-day effect
- Very distinct weekday vs. weekend patterns
- Friday evening peaks
Other things we could explore
- Is there a weather effect in accident data outside of snow days?
- Do we have more collisions the more it rains?
- Are different numbers of vehicles involved in collisions at different times of day?
- Is there a pattern in the types of vehicles involved at different times of day?
- Is there a difference between a typical accident during the Friday evening rush hour and one during the rush hour on another day?
- Is there a difference between morning and evening weekday peaks in vehicles involved, fatalities, etc.?
- Are there locations that are more likely to have a crash during peak hours?
- Do injury or fatality rates change over the course of the week?
- Are there months when pedestrians are more likely to be in a collision (e.g. summer months)? Is there a location affinity?