Data Methodology & Limitations

Understanding the NYC cyclist incident data, its sources, limitations, and our analysis approach

Data Overview
Understanding the NYC Motor Vehicle Collision dataset and its relevance to cyclist incidents

Data Source

The data used in this dashboard comes from the NYPD Motor Vehicle Collisions dataset, published by the City of New York through its Open Data platform. The dataset is maintained by the New York Police Department and is updated daily.

Dataset Description

The dataset contains information about all motor vehicle collisions in NYC reported by the police since July 2012. Each record represents a collision event and includes details such as:

  • Date and time of the collision
  • Location (borough, street names, latitude/longitude)
  • Number of persons injured/killed (categorized by type: pedestrians, cyclists, motorists)
  • Contributing factors
  • Vehicle types involved

Cyclist-Specific Data

For our analysis, we focus specifically on collisions involving cyclists. The dataset provides two key fields for this purpose:

  • number_of_cyclist_injured: Number of cyclists injured in the collision
  • number_of_cyclist_killed: Number of cyclists killed in the collision
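
As a minimal sketch of how these two fields might be read, the helpers below total cyclist casualties across a list of records. They assume numeric values may arrive as strings or be absent from a record; the names `toCount` and `summarizeCyclistCasualties` are illustrative and are not part of the dashboard code.

```javascript
// Hypothetical helpers (not from the dashboard source).
// Assumption: counts may arrive as strings ("1") or be missing entirely.
const toCount = (value) => {
  const n = Number.parseInt(value, 10)
  return Number.isNaN(n) ? 0 : n
}

// Sum injured/killed cyclists and count collisions with any cyclist casualty
const summarizeCyclistCasualties = (incidents) => {
  return incidents.reduce(
    (totals, incident) => {
      const injured = toCount(incident.number_of_cyclist_injured)
      const killed = toCount(incident.number_of_cyclist_killed)
      totals.injured += injured
      totals.killed += killed
      if (injured + killed > 0) totals.collisions += 1
      return totals
    },
    { injured: 0, killed: 0, collisions: 0 }
  )
}
```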

Additionally, we use the vehicle type fields to distinguish between traditional bicycles and e-bikes/e-scooters:

  • vehicle_type_code1: Type of the first vehicle involved
  • vehicle_type_code2: Type of the second vehicle involved
  • vehicle_type_code3, vehicle_type_code4, vehicle_type_code5: Types of additional vehicles involved (if applicable)
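
Because a bicycle or e-bike can appear in any of these fields, it can help to gather them into one normalized list per record. The sketch below is illustrative only: `getVehicleTypes` is a hypothetical helper, and the field names are copied from the list above, so they should be checked against the dataset's published column names before use.

```javascript
// Field names as listed above (assumption: verify against the dataset's columns)
const VEHICLE_TYPE_FIELDS = [
  "vehicle_type_code1",
  "vehicle_type_code2",
  "vehicle_type_code3",
  "vehicle_type_code4",
  "vehicle_type_code5",
]

// Hypothetical helper: return the non-empty vehicle types of one record,
// trimmed and lower-cased so later comparisons are case-insensitive
const getVehicleTypes = (incident) => {
  return VEHICLE_TYPE_FIELDS
    .map((field) => incident[field])
    .filter((value) => typeof value === "string" && value.trim() !== "")
    .map((value) => value.trim().toLowerCase())
}
```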

Data Update Frequency

The NYC Open Data platform updates this dataset daily, typically with a 1-2 day lag from when incidents are reported. Our dashboard does not automatically update with new data; instead, we fetch the latest data when you apply date filters.

Implementation Code Sample
Example code showing how we fetch and process data for the bike type comparison
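// Fetch data for bike type comparison.
// Requires: import { subYears } from "date-fns"
// dateRange, the set* state setters, isEBikeIncident, and isRegularBikeIncident
// are defined elsewhere in the component and are not shown here.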
const fetchData = async () => {
  try {
    setRegularBikeLoading(true)
    setEBikeLoading(true)
    setRegularBikeError(null)
    setEBikeError(null)

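    // Format date for API query (YYYY-MM-DD) from local calendar fields;
    // toISOString() converts to UTC and can shift the date by one day
    const formatDateForQuery = (date) => {
      const pad = (n) => String(n).padStart(2, "0")
      return `${date.getFullYear()}-${pad(date.getMonth() + 1)}-${pad(date.getDate())}`
    }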

    // Use the selected date range, defaulting to the past year
    const fromDate = dateRange?.from || subYears(new Date(), 1)
    const toDate = dateRange?.to || new Date()

    const dateFilter = `crash_date between '${formatDateForQuery(fromDate)}' and '${formatDateForQuery(toDate)}'`

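    // Fetch all incidents for the date range. The filter is URL-encoded;
    // at most $limit rows are returned, so very long ranges may be truncated.
    const allIncidentsResponse = await fetch(
      `https://data.cityofnewyork.us/resource/h9gi-nx95.json?$where=${encodeURIComponent(dateFilter)}&$limit=50000`
    )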

    if (!allIncidentsResponse.ok) {
      throw new Error(`Failed to fetch data: ${allIncidentsResponse.status} ${allIncidentsResponse.statusText}`)
    }

    const allIncidents = await allIncidentsResponse.json()
    console.log(`Fetched ${allIncidents.length} total incidents`)

    // Process and categorize the data
    const regularBikeIncidents = []
    const eBikeIncidents = []

    // Track vehicle types for analysis
    const vehicleTypeCounts = {}

    // Process each incident to categorize and count vehicle types
    allIncidents.forEach((incident) => {
      // Count vehicle types for analysis
      const vehicleType1 = incident.vehicle_type_code1 || "Unknown"
      const vehicleType2 = incident.vehicle_type_code2 || "Unknown"

      if (vehicleType1 !== "Unknown") {
        vehicleTypeCounts[vehicleType1] = (vehicleTypeCounts[vehicleType1] || 0) + 1
      }

      if (vehicleType2 !== "Unknown") {
        vehicleTypeCounts[vehicleType2] = (vehicleTypeCounts[vehicleType2] || 0) + 1
      }

      // Categorize based on vehicle type
      if (isEBikeIncident(incident)) {
        eBikeIncidents.push(incident)
      } else if (isRegularBikeIncident(incident)) {
        regularBikeIncidents.push(incident)
      }
    })

    console.log(
      `Categorized ${regularBikeIncidents.length} regular bike incidents and ${eBikeIncidents.length} e-bike incidents`
    )

    // Keep only entries with numeric coordinates inside a rough NYC bounding box
    const filterValidCoordinates = (data) => {
      return data.filter((item) => {
        if (!item.latitude || !item.longitude) return false

        const lat = Number.parseFloat(item.latitude)
        const lng = Number.parseFloat(item.longitude)

        return !Number.isNaN(lat) && !Number.isNaN(lng) && lat >= 40.4 && lat <= 41.0 && lng >= -74.3 && lng <= -73.6
      })
    }

    const validRegularBikeData = filterValidCoordinates(regularBikeIncidents)
    const validEBikeData = filterValidCoordinates(eBikeIncidents)

    setRegularBikeData(validRegularBikeData)
    setEBikeData(validEBikeData)
  } catch (err) {
    console.error("Error fetching data:", err)
    const errorMessage = err instanceof Error ? err.message : "An unknown error occurred"
    setRegularBikeError(errorMessage)
    setEBikeError(errorMessage)
    
    // Initialize with empty arrays on error
    setRegularBikeData([])
    setEBikeData([])
  } finally {
    setRegularBikeLoading(false)
    setEBikeLoading(false)
  }
}
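
The request above stops at 50,000 rows, so a long date range could be cut short without any error. One possible way to handle this, sketched under assumptions, is to page through results with the API's `$limit`, `$offset`, and `$order` query parameters. `buildPageUrl` and `fetchAllPages` are hypothetical names, and ordering by `:id` is an assumption made here to keep page boundaries stable.

```javascript
const ENDPOINT = "https://data.cityofnewyork.us/resource/h9gi-nx95.json"

// Hypothetical helper: build the URL for one page (page numbers start at 0).
// fromDate and toDate are YYYY-MM-DD strings.
const buildPageUrl = (fromDate, toDate, page, pageSize = 50000) => {
  const where = `crash_date between '${fromDate}' and '${toDate}'`
  return (
    `${ENDPOINT}?$where=${encodeURIComponent(where)}` +
    `&$order=${encodeURIComponent(":id")}` +
    `&$limit=${pageSize}&$offset=${page * pageSize}`
  )
}

// Hypothetical loop: request pages until a short page signals the end
const fetchAllPages = async (fromDate, toDate, pageSize = 50000) => {
  const rows = []
  for (let page = 0; ; page += 1) {
    const response = await fetch(buildPageUrl(fromDate, toDate, page, pageSize))
    if (!response.ok) {
      throw new Error(`Failed to fetch data: ${response.status} ${response.statusText}`)
    }
    const batch = await response.json()
    for (const row of batch) rows.push(row)
    if (batch.length < pageSize) return rows
  }
}
```

A smaller page size trades more requests for lower memory use per response; the sketch leaves that choice to the caller.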

Frequently Asked Questions

How accurate is the NYC cyclist incident data?

The NYC cyclist incident data has several limitations affecting its accuracy. These include underreporting of minor incidents, inconsistent classification of vehicle types (especially e-bikes vs. traditional bicycles), and missing geographic coordinates for approximately 15-20% of incidents. Our methodology attempts to account for these limitations through careful data processing and transparent reporting of data constraints.
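
To check the missing-coordinate share for a particular date range rather than rely on the overall estimate, something like the following sketch could be used. `hasCoordinates` and `missingCoordinateShare` are hypothetical helpers, and treating a value of 0 as missing is an assumption about how placeholder coordinates may appear.

```javascript
// Hypothetical helper: true when a record has usable latitude and longitude.
// Assumption: a coordinate of 0 is a placeholder, not a real NYC location.
const hasCoordinates = (incident) => {
  const lat = Number.parseFloat(incident.latitude)
  const lng = Number.parseFloat(incident.longitude)
  return Number.isFinite(lat) && Number.isFinite(lng) && lat !== 0 && lng !== 0
}

// Fraction (0 to 1) of records without usable coordinates
const missingCoordinateShare = (incidents) => {
  if (incidents.length === 0) return 0
  const missing = incidents.filter((incident) => !hasCoordinates(incident)).length
  return missing / incidents.length
}
```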

How do you distinguish between e-bikes and traditional bicycles?

We use a keyword-based classification system that searches for specific terms in all vehicle type fields. For e-bikes, we look for terms like "e-bike," "electric," "motorized," and "scooter." For traditional bicycles, we identify entries containing "bicycle" or "bike" that don't match our e-bike criteria. This approach has limitations due to inconsistent terminology in the dataset, which we acknowledge in our methodology section.
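
The keyword approach described above can be sketched as follows. This is an illustration under assumptions, not the dashboard's actual implementation: the keyword lists contain only the terms named in this answer, the field names are taken from the dataset description on this page, and matching is a plain case-insensitive substring test.

```javascript
// Illustrative keyword lists, limited to the terms named in the text
const E_BIKE_KEYWORDS = ["e-bike", "electric", "motorized", "scooter"]
const BICYCLE_KEYWORDS = ["bicycle", "bike"]

// Field names as listed on this page (assumption: verify against the dataset)
const VEHICLE_TYPE_FIELDS = [
  "vehicle_type_code1",
  "vehicle_type_code2",
  "vehicle_type_code3",
  "vehicle_type_code4",
  "vehicle_type_code5",
]

// Lower-cased vehicle type strings for one record (empty string when absent)
const vehicleTypesOf = (incident) =>
  VEHICLE_TYPE_FIELDS.map((field) => String(incident[field] || "").toLowerCase())

// True when any vehicle type field contains any of the keywords
const matchesAny = (incident, keywords) =>
  vehicleTypesOf(incident).some((type) => keywords.some((keyword) => type.includes(keyword)))

// E-bike check runs first because "e-bike" also contains "bike"
const isEBikeIncident = (incident) => matchesAny(incident, E_BIKE_KEYWORDS)

const isRegularBikeIncident = (incident) =>
  !isEBikeIncident(incident) && matchesAny(incident, BICYCLE_KEYWORDS)
```

Substring matching of this kind is easy to audit but produces false positives; for example, a value such as "motorbike" contains "bike" and would be counted as a traditional bicycle unless it is excluded explicitly.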

How often is the data updated?

The NYC Open Data platform updates the collision dataset daily, typically with a 1-2 day lag from when incidents are reported. Our dashboard does not automatically update with new data; instead, we fetch the latest data when you apply date filters. Each analysis therefore reflects the data available on the platform at the moment the filters were applied.

Can I download the raw data for my own analysis?

Yes, you can access the raw data directly from the NYC Open Data platform. The platform offers various download formats including CSV, JSON, and API access. Our methodology section provides details on the API endpoints and query parameters we use, which you can adapt for your own analysis.
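
As a starting point for your own analysis, the sketch below builds a request URL that asks the API for cyclist-involved collisions only, in JSON or CSV form. `buildCyclistQueryUrl` is a hypothetical helper; it assumes the two cyclist count fields can be compared numerically in the `$where` filter and that the same resource path is available with a `.csv` extension.

```javascript
const RESOURCE = "https://data.cityofnewyork.us/resource/h9gi-nx95"

// Hypothetical helper: format is "json" or "csv"; dates are YYYY-MM-DD strings
const buildCyclistQueryUrl = (fromDate, toDate, format = "json") => {
  const where = [
    `crash_date between '${fromDate}' and '${toDate}'`,
    "(number_of_cyclist_injured > 0 OR number_of_cyclist_killed > 0)",
  ].join(" AND ")
  return `${RESOURCE}.${format}?$where=${encodeURIComponent(where)}`
}
```

Filtering on the server keeps responses small, but it also drops collisions where a bicycle is listed as a vehicle and no cyclist casualty was recorded.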
