Work Zone Safety Information Clearinghouse

Library of Resources to Improve Roadway Work Zone Safety for All Roadway Users

Publication

AI for Data Quality Auditing: Detecting Mislabeled Work Zone Crashes Using Large Language Models

Author/Presenter: Jaradat, Shadi; Acharya, Nirmal; Shivshankar, Smitha; Alhadidi, Taqwa I.; Elhenawy, Mohammad
Abstract:

Ensuring high data quality in traffic crash datasets is critical for effective safety analysis and policymaking. This study presents an AI-assisted framework for auditing crash data integrity by detecting potentially mislabeled records related to construction zone (czone) involvement. A GPT-3.5 model was fine-tuned using a fusion of structured crash attributes and unstructured narrative text (i.e., multimodal input) to predict work zone involvement. The model was applied to 6400 crash reports to flag discrepancies between predicted and recorded labels. Among 80 flagged mismatches, expert review confirmed four records as genuine misclassifications, demonstrating the framework’s capacity to surface high-confidence labeling errors. The model achieved strong overall accuracy (98.75%) and precision (86.67%) for the minority class, but showed low recall (14.29%), reflecting its conservative design that minimizes false positives in an imbalanced dataset. This precision-focused approach supports its use as a semi-automated auditing tool, capable of narrowing the scope for expert review and improving the reliability of large-scale traffic safety datasets. The framework is also adaptable to other misclassified crash attributes or domains where structured and unstructured data can be fused for data quality assurance.
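The reported figures can be cross-checked with a little arithmetic. The sketch below is not from the paper: it assumes the 80 flagged mismatches are exactly the records where the model's prediction disagrees with the recorded label (false positives plus false negatives) and that all three metrics were computed against the recorded labels on the same 6400 reports. The function names `audit_metrics` and `consistent_counts` are hypothetical. Under those assumptions, it searches for confusion-matrix counts that reproduce the stated precision and recall.

```python
def audit_metrics(tp, fp, fn, tn):
    """Standard metrics for the minority (work zone) class, plus the
    number of prediction/label disagreements that would be flagged."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "flagged": fp + fn,
    }


def consistent_counts(total=6400, mismatches=80,
                      precision=0.8667, recall=0.1429, tol=1e-4):
    """Brute-force search for (tp, fp, fn, tn) matching the abstract's
    rounded figures. ASSUMPTION: mismatches == fp + fn."""
    matches = []
    for fp in range(mismatches + 1):
        fn = mismatches - fp
        for tp in range(1, total - mismatches + 1):
            tn = total - mismatches - tp
            m = audit_metrics(tp, fp, fn, tn)
            if (abs(m["precision"] - precision) < tol
                    and abs(m["recall"] - recall) < tol):
                matches.append((tp, fp, fn, tn))
    return matches


if __name__ == "__main__":
    for tp, fp, fn, tn in consistent_counts():
        print((tp, fp, fn, tn), audit_metrics(tp, fp, fn, tn))
```

With these assumptions the search finds a single combination: 13 true positives, 2 false positives, 78 false negatives, and 6307 true negatives, which gives an accuracy of 6320/6400 = 98.75%. This is an inferred reconstruction rather than a table from the study, but it illustrates why accuracy is high while recall is low: the work zone class is rare, so the many true negatives dominate accuracy even when most recorded work zone crashes are not predicted as such.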

Source: Algorithms
Volume: 18
Issue: 6
Publication Date: May 2025
Publication Types: Books, Reports, Papers, and Research Articles
Topics: Artificial Intelligence; Crashes; Data Quality; Input Output Models; Work Zones

Copyright © 2025 American Road & Transportation Builders Association (ARTBA). The National Work Zone Safety Information Clearinghouse is a project of the ARTBA Transportation Development Foundation. It is operated in cooperation with the U.S. Federal Highway Administration and Texas A&M Transportation Institute.