Data Methodology & Limitations
Understanding the NYC cyclist incident data, its sources, limitations, and our analysis approach
Data Source
The data used in this dashboard comes from the NYPD Motor Vehicle Collisions Dataset, published by the City of New York through its Open Data platform. The dataset is maintained by the New York Police Department and is updated daily.
Dataset Description
The dataset contains information about all motor vehicle collisions in NYC reported by the police since July 2012. Each record represents a collision event and includes details such as:
- Date and time of the collision
- Location (borough, street names, latitude/longitude)
- Number of persons injured/killed (categorized by type: pedestrians, cyclists, motorists)
- Contributing factors
- Vehicle types involved
Cyclist-Specific Data
For our analysis, we focus specifically on collisions involving cyclists. The dataset provides two key fields for this purpose:
- number_of_cyclist_injured: Number of cyclists injured in the collision
- number_of_cyclist_killed: Number of cyclists killed in the collision
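As a rough illustration of how these two fields can be used, the sketch below keeps only records with at least one cyclist casualty. It assumes the counts arrive as strings in the JSON response, so they are parsed first. The helper names `toCount` and `involvesCyclistCasualty` and the sample records are hypothetical and are not part of the dashboard code.

```javascript
// Hypothetical helper: parse a count that may arrive as a string, a number,
// or be missing entirely. Missing or malformed values are treated as 0.
function toCount(value) {
  const n = Number.parseInt(value ?? "", 10)
  return Number.isNaN(n) ? 0 : n
}

// True when at least one cyclist was injured or killed in the collision.
function involvesCyclistCasualty(record) {
  return toCount(record.number_of_cyclist_injured) + toCount(record.number_of_cyclist_killed) > 0
}

// Made-up sample records, for illustration only.
const sampleRecords = [
  { collision_id: "a", number_of_cyclist_injured: "1", number_of_cyclist_killed: "0" },
  { collision_id: "b", number_of_cyclist_injured: "0", number_of_cyclist_killed: "0" },
  { collision_id: "c" }, // both fields absent
]

const cyclistCollisions = sampleRecords.filter(involvesCyclistCasualty)
```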
Additionally, we use the vehicle type fields to distinguish between traditional bicycles and e-bikes/e-scooters:
- vehicle_type_code1: Type of the first vehicle involved
- vehicle_type_code2: Type of the second vehicle involved
- vehicle_type_code3, vehicle_type_code4, vehicle_type_code5: Types of additional vehicles involved (if applicable)
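One way to turn these fields into a bike-type label is a keyword match across all five of them. The sketch below is an assumption about how such a classifier could look, not the dashboard's actual implementation: the keyword lists are limited to the terms this page mentions, the field names are the ones listed above (worth checking against the dataset's API schema), and `vehicleTypes` and `matchesAny` are hypothetical helpers.

```javascript
// Assumed keyword lists; the dashboard's real lists may be longer or different.
const E_BIKE_KEYWORDS = ["e-bike", "electric", "motorized", "scooter"]
const BIKE_KEYWORDS = ["bicycle", "bike"]

// Field names as listed on this page; verify them against the dataset schema.
const VEHICLE_TYPE_FIELDS = [
  "vehicle_type_code1",
  "vehicle_type_code2",
  "vehicle_type_code3",
  "vehicle_type_code4",
  "vehicle_type_code5",
]

// Collect the non-empty vehicle type values of one incident, lower-cased.
function vehicleTypes(incident) {
  return VEHICLE_TYPE_FIELDS
    .map((field) => incident[field])
    .filter((value) => typeof value === "string" && value.trim() !== "")
    .map((value) => value.trim().toLowerCase())
}

// True when any vehicle type contains any of the keywords as a substring.
function matchesAny(types, keywords) {
  return types.some((type) => keywords.some((keyword) => type.includes(keyword)))
}

function isEBikeIncident(incident) {
  return matchesAny(vehicleTypes(incident), E_BIKE_KEYWORDS)
}

// E-bike matches are excluded first, because "e-bike" also contains "bike".
function isRegularBikeIncident(incident) {
  return !isEBikeIncident(incident) && matchesAny(vehicleTypes(incident), BIKE_KEYWORDS)
}
```

Substring matching is deliberately loose, which has a cost: in this sketch a value such as "Motorbike" would be counted as a traditional bicycle because it contains "bike". A production classifier would likely need an exclusion list or exact-match rules for such cases.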
Data Update Frequency
The NYC Open Data platform updates this dataset daily, typically with a 1-2 day lag from when incidents are reported. Our dashboard does not automatically update with new data; instead, we fetch the latest data when you apply date filters.
// Fetch data for bike type comparison.
// Assumes subYears comes from date-fns and that dateRange, the set* state setters,
// isEBikeIncident, and isRegularBikeIncident are defined elsewhere in the component.
const fetchData = async () => {
  try {
    setRegularBikeLoading(true)
    setEBikeLoading(true)
    setRegularBikeError(null)
    setEBikeError(null)

    // Format date for API query (YYYY-MM-DD).
    // Note: toISOString() works in UTC, so the date can differ from the local date.
    const formatDateForQuery = (date) => {
      return date.toISOString().split("T")[0]
    }

    // Use the selected date range or default to the past year
    const fromDate = dateRange?.from || subYears(new Date(), 1)
    const toDate = dateRange?.to || new Date()
    const dateFilter = `crash_date between '${formatDateForQuery(fromDate)}' and '${formatDateForQuery(toDate)}'`

    // Fetch all incidents for the date range (at most 50,000 rows per request).
    // The filter is URL-encoded so spaces and quotes are transmitted safely.
    const allIncidentsResponse = await fetch(
      `https://data.cityofnewyork.us/resource/h9gi-nx95.json?$where=${encodeURIComponent(dateFilter)}&$limit=50000`
    )
    if (!allIncidentsResponse.ok) {
      throw new Error(`Failed to fetch data: ${allIncidentsResponse.status} ${allIncidentsResponse.statusText}`)
    }
    const allIncidents = await allIncidentsResponse.json()
    console.log(`Fetched ${allIncidents.length} total incidents`)

    // Process and categorize the data
    const regularBikeIncidents = []
    const eBikeIncidents = []

    // Track vehicle types for analysis
    const vehicleTypeCounts = {}

    // Process each incident to categorize and count vehicle types
    allIncidents.forEach((incident) => {
      // Count vehicle types for analysis
      const vehicleType1 = incident.vehicle_type_code1 || "Unknown"
      const vehicleType2 = incident.vehicle_type_code2 || "Unknown"
      if (vehicleType1 !== "Unknown") {
        vehicleTypeCounts[vehicleType1] = (vehicleTypeCounts[vehicleType1] || 0) + 1
      }
      if (vehicleType2 !== "Unknown") {
        vehicleTypeCounts[vehicleType2] = (vehicleTypeCounts[vehicleType2] || 0) + 1
      }

      // Categorize based on vehicle type (e-bike check takes precedence)
      if (isEBikeIncident(incident)) {
        eBikeIncidents.push(incident)
      } else if (isRegularBikeIncident(incident)) {
        regularBikeIncidents.push(incident)
      }
    })
    console.log(
      `Categorized ${regularBikeIncidents.length} regular bike incidents and ${eBikeIncidents.length} e-bike incidents`
    )

    // Filter out entries without valid coordinates for both datasets.
    // The bounds are a rough bounding box around New York City.
    const filterValidCoordinates = (data) => {
      return data.filter((item) => {
        if (!item.latitude || !item.longitude) return false
        const lat = Number.parseFloat(item.latitude)
        const lng = Number.parseFloat(item.longitude)
        return !Number.isNaN(lat) && !Number.isNaN(lng) && lat >= 40.4 && lat <= 41.0 && lng >= -74.3 && lng <= -73.6
      })
    }
    const validRegularBikeData = filterValidCoordinates(regularBikeIncidents)
    const validEBikeData = filterValidCoordinates(eBikeIncidents)
    setRegularBikeData(validRegularBikeData)
    setEBikeData(validEBikeData)
  } catch (err) {
    console.error("Error fetching data:", err)
    const errorMessage = err instanceof Error ? err.message : "An unknown error occurred"
    setRegularBikeError(errorMessage)
    setEBikeError(errorMessage)
    // Initialize with empty arrays on error
    setRegularBikeData([])
    setEBikeData([])
  } finally {
    setRegularBikeLoading(false)
    setEBikeLoading(false)
  }
}
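The request above asks for at most 50,000 rows, so a long date range could be cut off without any warning. A possible way around this is to page through the results with `$limit` and `$offset`. The sketch below is an assumption, not the dashboard's code: `fetchAllRows` and `makeSocrataPageFetcher` are hypothetical names, and ordering by `collision_id` assumes that field is a stable unique key in the dataset.

```javascript
// Hypothetical pager: requests pages of `pageSize` rows until a short page
// signals the end, or until `maxPages` is reached as a safety stop.
// `fetchPage(limit, offset)` is injected so the loop can run without a network.
async function fetchAllRows(fetchPage, pageSize = 50000, maxPages = 20) {
  const rows = []
  for (let page = 0; page < maxPages; page++) {
    const batch = await fetchPage(pageSize, page * pageSize)
    for (const row of batch) rows.push(row)
    if (batch.length < pageSize) break // a short page means no more data
  }
  return rows
}

// One possible fetchPage for the collisions endpoint.
// $order keeps paging stable between requests.
function makeSocrataPageFetcher(where) {
  const base = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"
  return async (limit, offset) => {
    const url =
      `${base}?$where=${encodeURIComponent(where)}` +
      `&$order=collision_id&$limit=${limit}&$offset=${offset}`
    const response = await fetch(url)
    if (!response.ok) {
      throw new Error(`Failed to fetch data: ${response.status} ${response.statusText}`)
    }
    return response.json()
  }
}
```

With these helpers, a call such as `fetchAllRows(makeSocrataPageFetcher(dateFilter))` could stand in for the single request, at the cost of extra round trips for large date ranges.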
Frequently Asked Questions
What are the limitations of the data?
The NYC cyclist incident data has several limitations that affect its accuracy: underreporting of minor incidents, inconsistent classification of vehicle types (especially e-bikes vs. traditional bicycles), and missing geographic coordinates for approximately 15-20% of incidents. Our methodology attempts to account for these limitations through careful data processing and transparent reporting of data constraints.
How do you distinguish e-bikes from traditional bicycles?
We use a keyword-based classification system that searches for specific terms in all vehicle type fields. For e-bikes, we look for terms like "e-bike," "electric," "motorized," and "scooter." For traditional bicycles, we identify entries containing "bicycle" or "bike" that don't match our e-bike criteria. This approach has limitations due to inconsistent terminology in the dataset, which we acknowledge in our methodology section.
How often is the data updated?
The NYC Open Data platform updates the collision dataset daily, typically with a 1-2 day lag from when incidents are reported. Our dashboard does not automatically update with new data; instead, we fetch the latest data when you apply date filters, so each analysis uses the most current data available at that moment.
Can I access the raw data myself?
Yes, you can access the raw data directly from the NYC Open Data platform. The platform offers various download formats including CSV, JSON, and API access. Our methodology section provides details on the API endpoints and query parameters we use, which you can adapt for your own analysis.
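For readers who want to adapt the API query for their own analysis, the sketch below builds a request URL that filters on the server side for collisions with a cyclist casualty. This differs from the dashboard, which fetches all incidents in the date range and classifies them in the browser. The function name `buildCyclistQueryUrl` and its default limit are hypothetical.

```javascript
// Hypothetical helper: build a query URL for collisions in a date range
// (dates as YYYY-MM-DD strings) where a cyclist was injured or killed.
function buildCyclistQueryUrl(fromDate, toDate, limit = 1000) {
  const where = [
    `crash_date between '${fromDate}' and '${toDate}'`,
    "(number_of_cyclist_injured > 0 OR number_of_cyclist_killed > 0)",
  ].join(" AND ")
  const base = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"
  // Encode the filter so spaces and comparison operators are safe in a URL.
  return `${base}?$where=${encodeURIComponent(where)}&$limit=${limit}`
}

// Example: cyclist casualties during 2024.
const exampleUrl = buildCyclistQueryUrl("2024-01-01", "2024-12-31")
```

The resulting URL can be opened in a browser or passed to `fetch`. Filtering on the casualty counts keeps the response small, but it would miss collisions where a bicycle was involved and no cyclist injury was recorded.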
Related Resources
- View Dashboard: Explore our comprehensive analysis of cyclist safety trends in New York City, including hotspots, contributing factors, and temporal patterns.
- View Comparison: Compare incident patterns between e-bikes/e-scooters and traditional bicycles, including differences in contributing factors and temporal distribution.
- Compare Periods: Compare cyclist incident data across different time periods to identify trends, seasonal patterns, and changes in incident characteristics.