
ANALYSIS & PREDICTION OF REAL-TIME TRAFFIC SPEED DATA OF NYC
PROBLEM DEFINITION
Our aim here is to analyze New York City's real-time traffic data using a Density Estimation technique.
DATA EXPLORATION
Dimensions: 24.9 million rows and 13 columns.
About Data: The Traffic Management Center (TMC) maintains a map of traffic speed detectors throughout the City. This data feed contains 'real-time' traffic information from locations where DOT picks up sensor feeds within the five boroughs, mostly on major arterials and highways.
Reliability of Data: linkTimeStamp indicates when data was last received from the sensor. If its date is other than the current date, the sensor is in error and its data should not be used.
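The reliability rule above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: the readings, the "speed" column, the reference date, and the helper drop_stale are all hypothetical, and the timestamp column is assumed to be named linkTimeStamp as in the text.

```python
import pandas as pd

# Toy readings; all values are invented for illustration.
readings = pd.DataFrame({
    "linkTimeStamp": pd.to_datetime([
        "2019-03-01 08:15:02",
        "2019-03-01 08:16:40",
        "2019-02-27 23:59:10",  # stale: sensor last reported two days earlier
    ]),
    "speed": [34.2, 12.4, 55.0],  # hypothetical column name
})

def drop_stale(df, current_date):
    """Keep only rows whose linkTimeStamp falls on current_date."""
    # normalize() resets the time of day to midnight so only the date is compared.
    same_day = df["linkTimeStamp"].dt.normalize() == pd.Timestamp(current_date)
    return df[same_day]

fresh = drop_stale(readings, "2019-03-01")
```

For an archived extract, the "current date" would presumably be the date on which each record was retrieved rather than today's date.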


THE DISTRIBUTION OF THE SPEED CAN BE SEEN FROM THE HISTOGRAM.
FURTHER, EXPLORE THE TRAFFIC SPEED DATA BY CALCULATING AVERAGE SPEED PER TIME PERIOD.
The Data_as_of column records the time of each reading, accurate to the second. We aggregate the readings by minute.
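A minimal sketch of the per-minute averaging, assuming the data is loaded into a pandas DataFrame. Data_as_of is the column named above; the "Speed" column name and the sample values are assumptions made for illustration.

```python
import pandas as pd

# Toy sample; "Speed" is an assumed column name and the values are invented.
df = pd.DataFrame({
    "Data_as_of": pd.to_datetime([
        "2019-03-01 08:15:02",
        "2019-03-01 08:15:47",
        "2019-03-01 08:16:05",
        "2019-03-01 08:16:59",
    ]),
    "Speed": [30.0, 40.0, 20.0, 10.0],
})

# Truncate each timestamp to the start of its minute,
# then average the speeds that fall within the same minute.
df["minute"] = df["Data_as_of"].dt.floor("min")
avg_speed = df.groupby("minute")["Speed"].mean()
```

Flooring to the minute keeps the full date in the key, so readings from the same clock minute on different days are not mixed together.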

LIBRARIES USED
NumPy, SciPy, pandas, GeoPandas, Matplotlib, PyKrige, scikit-learn (GridSearchCV), Shapely
ALGORITHMS & TECHNIQUES
Density Estimation: Uses statistical models to find the underlying probability distribution that gives rise to the observed variables. It is an unsupervised learning method.
Objective: estimate the underlying probability distribution p(X) over the variables X, using the examples in D, where
D = {D_1, D_2, ..., D_n}
D_i = x_i, a vector of attribute values
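As one concrete instance of the objective above, the sketch below estimates p(X) for a single attribute (speed) with a Gaussian kernel density estimate from SciPy. The choice of kernel density estimation and the synthetic speed sample are assumptions for illustration; the text does not state which estimator the project used.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)

# Synthetic speeds (invented): a congested cluster and a free-flow cluster.
speeds = np.concatenate([
    rng.normal(15, 4, 300),
    rng.normal(50, 6, 700),
])

# Fit the estimator; the bandwidth defaults to Scott's rule.
kde = gaussian_kde(speeds)

# Evaluate the estimated density p(x) on a regular grid of speeds.
grid = np.linspace(0, 80, 161)
density = kde(grid)

# Probability mass the estimate assigns to the interval [0, 80].
mass = kde.integrate_box_1d(0.0, 80.0)
```

Because the estimate is a proper density, the mass over a range that covers nearly all of the data should be close to 1.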
Kriging: Kriging, or Gaussian process regression, is a method of interpolation in which the interpolated values are modeled by a Gaussian process governed by prior covariances.
Under suitable assumptions on the priors, kriging gives the best linear unbiased prediction of the values at unsampled locations.
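To make the mechanics concrete, here is a hand-rolled ordinary kriging sketch in NumPy with a spherical variogram. It is a simplified stand-in for a library such as PyKrige, not the project's code: the variogram parameters (sill, range_, nugget) are fixed by assumption rather than fitted to data, and the sensor coordinates and speeds are invented.

```python
import numpy as np

def spherical_variogram(h, sill=1.0, range_=1.0, nugget=0.0):
    """Spherical semivariogram: rises to nugget + sill at distance range_."""
    h = np.asarray(h, dtype=float)
    r = np.minimum(h / range_, 1.0)  # flat beyond the range
    gamma = nugget + sill * (1.5 * r - 0.5 * r**3)
    return np.where(h > 0, gamma, 0.0)  # gamma(0) = 0 by definition

def ordinary_kriging(xy, z, xy_new, **vparams):
    """Predict z at xy_new from observations (xy, z) by ordinary kriging."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    n = len(z)

    # Left-hand side: variogram between data points, bordered by ones
    # to enforce the unbiasedness constraint (weights sum to 1).
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = spherical_variogram(d, **vparams)
    A[n, n] = 0.0

    # Right-hand side: variogram between data points and each target.
    d0 = np.linalg.norm(xy[:, None, :] - xy_new[None, :, :], axis=-1)
    b = np.ones((n + 1, xy_new.shape[0]))
    b[:n, :] = spherical_variogram(d0, **vparams)

    sol = np.linalg.solve(A, b)        # kriging weights plus Lagrange multiplier
    weights = sol[:n, :]
    pred = weights.T @ z               # weighted average of the observations
    var = np.sum(sol * b, axis=0)      # kriging variance: w . gamma0 + mu
    return pred, var, weights

# Invented sensor locations (unit square) and speeds.
sensors = [[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]]
speeds = [30.0, 45.0, 25.0, 50.0, 38.0]
targets = [[0.5, 0.5], [0.25, 0.75]]

pred, var, weights = ordinary_kriging(sensors, speeds, targets, sill=1.0, range_=2.0)
```

With a zero nugget, kriging is an exact interpolator: the first target coincides with a sensor, so its prediction reproduces the observed speed and its kriging variance is zero.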
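Grid Search: Grid search is a hyperparameter optimization technique that exhaustively evaluates every combination of values in a specified parameter grid and keeps the best-scoring one.
In scikit-learn this technique is provided by the GridSearchCV class, which scores each combination by cross-validation.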
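The sketch below shows the GridSearchCV workflow on synthetic spatial data. To stay within scikit-learn it swaps in GaussianProcessRegressor as the estimator in place of a PyKrige model; the data, the kernels, and the grid values are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)

# Synthetic (x, y) locations and a smooth target with a little noise (invented).
X = rng.uniform(0, 10, size=(80, 2))
y = np.sin(X[:, 0]) + 0.5 * np.cos(X[:, 1]) + rng.normal(0, 0.05, 80)

# Every combination of these values is tried: 2 kernels x 2 alphas = 4 candidates.
param_grid = {
    "kernel": [RBF(length_scale=1.0), Matern(length_scale=1.0, nu=1.5)],
    "alpha": [1e-6, 1e-2],
}

search = GridSearchCV(
    GaussianProcessRegressor(normalize_y=True),
    param_grid,
    scoring="r2",  # rank candidates by cross-validated R^2
    cv=5,
)
search.fit(X, y)

best_params = search.best_params_  # winning combination
best_r2 = search.best_score_       # its mean cross-validated R^2
```

After fitting, search.cv_results_ holds the score of every candidate, which is useful for checking how close the runners-up were to the winner.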
RESULTS
Here, the black grid patterns denote the different ZIP codes, whereas the blue dots denote the traffic density in the respective areas of NYC.

TO CONCLUDE
The best-performing model has an R-squared value of 0.815, with best parameters {'variogram_model': 'spherical', 'method': 'ordinary'}.