
ANALYSIS & PREDICTION OF REAL-TIME TRAFFIC SPEED DATA OF NYC


PROBLEM DEFINITION

Our aim here is to analyze New York City's real-time traffic data using a Density Estimation technique.


DATA EXPLORATION

Dimensions: 24.9 million rows and 13 columns.

About Data: The Traffic Management Center (TMC) maintains a map of traffic speed detectors throughout the City. This data feed contains 'real-time' traffic information from locations where DOT picks up sensor feeds within the five boroughs, mostly on major arterials and highways.


Reliability of Data: linkTimeStamp indicates when data was last received from the sensor. If the date is other than the current date, it indicates an error with that sensor, and that data should not be used.
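
The reliability rule above can be sketched in pandas. This is only an illustration: the sample rows, the `linkId` and `speed` column names, and the helper `drop_stale_readings` are hypothetical, and "current date" is taken to mean the date on which the feed snapshot was retrieved.

```python
import pandas as pd

# Hypothetical feed snapshot; column names and values are illustrative only.
feed = pd.DataFrame({
    "linkId": [1, 2, 3],
    "speed": [34.2, 0.0, 41.7],
    "linkTimeStamp": pd.to_datetime([
        "2019-05-05 08:14:03",
        "2019-05-03 23:59:10",  # stale: last update two days earlier
        "2019-05-05 08:13:58",
    ]),
})

def drop_stale_readings(df, retrieved_at, ts_col="linkTimeStamp"):
    """Keep only rows whose timestamp falls on the retrieval date."""
    retrieval_day = pd.Timestamp(retrieved_at).normalize()
    same_day = df[ts_col].dt.normalize() == retrieval_day
    return df[same_day]

fresh = drop_stale_readings(feed, "2019-05-05 08:15:00")
print(fresh)
```

Comparing normalized timestamps (midnight of each day) checks the date only, so a sensor that reported a few minutes ago on the same day is kept.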

[Image: Unknown_edited.png]

THE DISTRIBUTION OF THE SPEED CAN BE SEEN FROM THE HISTOGRAM.
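
As a rough sketch of how such a histogram could be computed, the snippet below bins speeds into 5-unit intervals with NumPy. The `speeds` array is synthetic stand-in data, not the NYC feed, and the bin width is an assumption chosen for illustration.

```python
import numpy as np

# Synthetic stand-in for the speed column (assumption, not the real data).
rng = np.random.default_rng(0)
speeds = np.clip(rng.normal(35, 12, 10_000), 0, 90)

# Bin edges every 5 units from 0 to 90; the last bin includes its right edge.
edges = np.arange(0, 95, 5)
counts, _ = np.histogram(speeds, bins=edges)

for left, count in zip(edges[:-1], counts):
    print(f"{left:2d}-{left + 5:2d}: {count}")
```

The same `speeds` and `edges` can be passed to Matplotlib's `plt.hist(speeds, bins=edges)` to draw the chart instead of printing the counts.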


FURTHER, EXPLORE THE TRAFFIC SPEED DATA BY CALCULATING AVERAGE SPEED PER TIME PERIOD.

The Data_as_of column records the time of each reading, accurate to the second. We aggregate the readings by minute and compute the average speed for each minute.
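
One way to compute a per-minute average in pandas is to floor each timestamp to the minute and group on the result. The sample values and the `Speed` column name below are assumptions made for illustration; only `Data_as_of` comes from the data description above.

```python
import pandas as pd

# Illustrative readings; the real feed has millions of rows.
readings = pd.DataFrame({
    "Data_as_of": pd.to_datetime([
        "2019-05-05 08:14:03", "2019-05-05 08:14:41",
        "2019-05-05 08:15:07", "2019-05-05 08:15:59",
    ]),
    "Speed": [30.0, 40.0, 20.0, 26.0],
})

# Truncate each timestamp to the start of its minute, then average per group.
minute = readings["Data_as_of"].dt.floor("min")
avg_speed_per_minute = readings.groupby(minute)["Speed"].mean()
print(avg_speed_per_minute)
```

Flooring keeps the result indexed by real timestamps, which makes it straightforward to plot the averages as a time series.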

[Image: Unknown-2.png]

LIBRARIES USED

NumPy, SciPy, pandas, GeoPandas, Matplotlib, PyKrige, scikit-learn (GridSearchCV), Shapely


ALGORITHMS & TECHNIQUES

Density Estimation: Uses statistical models to find the underlying probability distribution that gives rise to the observed variables. It is an unsupervised learning method.


Objective: estimate the underlying probability distribution p(X) over the variables X, using the examples in D, where

D = {D_1, D_2, ..., D_n}
D_i = x_i, a vector of attribute values
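
To make the objective concrete, here is a minimal sketch of one common density estimator, a Gaussian kernel density estimate, written directly in NumPy. The page does not say which estimator was used, so the kernel choice, the `bandwidth` value, and the synthetic `speeds` sample are all assumptions.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Estimate p(x) on `grid` as an average of Gaussian bumps, one per sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample.
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / bandwidth

# Synthetic two-mode sample standing in for D = {x_1, ..., x_n}.
rng = np.random.default_rng(0)
speeds = np.concatenate([rng.normal(15, 4, 300), rng.normal(45, 6, 700)])

grid = np.linspace(-20, 90, 1101)
density = gaussian_kde_1d(speeds, grid, bandwidth=2.0)

# A valid density is non-negative and integrates to roughly 1.
area = density.sum() * (grid[1] - grid[0])
print(round(area, 3))
```

The bandwidth controls how smooth the estimate is: small values follow the sample closely, while large values blur distinct modes together.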


Kriging: Kriging, or Gaussian process regression, is a method of interpolation in which the interpolated values are modeled by a Gaussian process governed by prior covariances.

Under suitable assumptions on the priors, kriging gives the best linear unbiased prediction of the intermediate values.
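
The idea can be illustrated with a small ordinary-kriging solver in NumPy that uses a spherical variogram. This is a teaching sketch, not PyKrige's implementation: the function names, the variogram parameters (`sill`, `range_`, `nugget`), and the four sample points are hypothetical.

```python
import numpy as np

def spherical_variogram(h, sill=1.0, range_=2.0, nugget=0.0):
    """Spherical model: rises from the nugget to the sill at distance `range_`."""
    h = np.asarray(h, dtype=float)
    ratio = np.minimum(h / range_, 1.0)
    gamma = nugget + (sill - nugget) * (1.5 * ratio - 0.5 * ratio**3)
    return np.where(h > 0, gamma, 0.0)  # gamma(0) is 0 by definition

def ordinary_kriging(xy, z, xy_new, **variogram_params):
    """Predict z at xy_new as a weighted sum of observed z (weights sum to 1)."""
    xy, z, xy_new = (np.asarray(a, dtype=float) for a in (xy, z, xy_new))
    n = len(xy)
    # Left-hand side: variogram between observations, plus the unbiasedness row.
    dists = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    lhs = np.zeros((n + 1, n + 1))
    lhs[:n, :n] = spherical_variogram(dists, **variogram_params)
    lhs[:n, n] = 1.0
    lhs[n, :n] = 1.0
    # Right-hand side: variogram between observations and each target point.
    dists_new = np.linalg.norm(xy[:, None, :] - xy_new[None, :, :], axis=-1)
    rhs = np.ones((n + 1, len(xy_new)))
    rhs[:n] = spherical_variogram(dists_new, **variogram_params)
    weights = np.linalg.solve(lhs, rhs)[:n]  # drop the Lagrange multiplier
    return weights.T @ z, weights

# Four hypothetical detectors on a unit square with observed speeds.
detectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
observed = [10.0, 20.0, 30.0, 40.0]
targets = [(0.5, 0.5), (0.0, 0.0)]
predicted, weights = ordinary_kriging(detectors, observed, targets)
print(predicted)
```

The centre of the square is equidistant from all four detectors, so it receives equal weights, and predicting at a detector's own location reproduces its observed value because the nugget is zero.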

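Grid Search: Grid search is a hyperparameter optimization technique. It evaluates every combination of the candidate hyperparameter values, typically with cross-validation, and keeps the best-scoring combination.

In scikit-learn, this technique is provided by the GridSearchCV class.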
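
A minimal sketch of `GridSearchCV` in use is shown below. To keep it self-contained it tunes scikit-learn's `KernelDensity` estimator rather than the kriging model from this project, so the estimator, the candidate values in `param_grid`, and the synthetic `speeds` sample are all assumptions.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

# Synthetic two-mode sample (assumption, not the NYC feed).
rng = np.random.default_rng(0)
speeds = np.concatenate([rng.normal(15, 4, 150), rng.normal(45, 6, 350)])
rng.shuffle(speeds)  # mix the modes so every fold sees both
X = speeds.reshape(-1, 1)  # scikit-learn expects a 2-D feature array

# Every combination below is fitted and scored with 5-fold cross-validation.
param_grid = {
    "bandwidth": [0.5, 1.0, 2.0, 4.0, 8.0],
    "kernel": ["gaussian", "exponential"],
}
search = GridSearchCV(KernelDensity(), param_grid, cv=5)
search.fit(X)

print(search.best_params_)
```

With 5 bandwidths and 2 kernels the search evaluates 10 combinations. `KernelDensity.score` returns the log-likelihood of held-out data, so the winner is the setting that explains unseen samples best.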



RESULTS

Here, the black grid lines mark the boundaries of the different ZIP codes, and the blue dots show the traffic density in the respective areas of NYC.

[Image: IMG-20190505-WA0004_edited.jpg]

TO CONCLUDE

The best-performing configuration achieved an R-squared value of 0.815, with the parameters {'variogram_model': 'spherical', 'method': 'ordinary'}, that is, ordinary kriging with a spherical variogram model.
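
For reference, R-squared is conventionally computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses made-up `observed` and `predicted` values purely to show the arithmetic; it does not reproduce the 0.815 figure.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # unexplained variation
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot

# Made-up values for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
score = r_squared(observed, predicted)
print(score)
```

A value of 1 means the predictions match the observations exactly, while 0 means the model does no better than always predicting the mean.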


©2019 by Analysis & Prediction of Real-Time Traffic Speed Data of NYC. Proudly created with Wix.com
