CV Pilots find success in implementing data collection techniques that focus on edge computing to avoid the unmanageable flow and processing of data at a central TMC.

Lessons Learned from the Design/Build/Test Phase of the USDOT’s Connected Vehicle Pilot Program.


Lesson Learned

The following lessons concern the sensitivities around the type and amount of data that needs to be collected, and the need for a data governance framework that outlines how data will be collected, managed, and archived.

Assess data collection needs and requirements
    While the CV system can provide terabytes of data, it is important to understand what data is needed, for what purposes, and where. Data collection must be scalable and sustainable and should provide value during system operation. For example, in an urban environment, recording and uploading to a TMC every Basic Safety Message (BSM) that an RSU hears will typically produce over 500 BSMs from each instrumented vehicle traveling at 25-30 MPH within range of that RSU. If the goal is to compute travel times between RSUs, a single vehicle will result in the TMC receiving on the order of 1,000 BSMs under free-flow conditions (and this can easily double or triple if the vehicle is stopped). In reality, to compute travel times, the TMC needs only a single BSM from a configured zone within each intersection to begin the matching operation and measure travel times. The result is a 99.8 percent reduction in network data flow and a similar reduction in processing at the TMC.
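
    The 99.8 percent figure can be checked with a short calculation. This is a sketch that assumes the comparison is between the roughly 1,000 BSMs quoted for one vehicle traveling between two RSUs and one retained BSM per intersection zone; the two-intersection segment and the function name are illustrative assumptions, not details from the source.

```python
# Worked arithmetic for the figures quoted in the text (illustrative only).
BSMS_PER_VEHICLE_BETWEEN_RSUS = 1_000  # from the text: free-flow conditions
BSMS_KEPT_PER_INTERSECTION = 1         # from the text: one BSM per configured zone
INTERSECTIONS_PER_SEGMENT = 2          # assumption: a segment has two end points


def reduction_percent(total: int, kept: int) -> float:
    """Percentage of messages that no longer need to cross the backhaul."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - kept) / total


kept = BSMS_KEPT_PER_INTERSECTION * INTERSECTIONS_PER_SEGMENT
free_flow = reduction_percent(BSMS_PER_VEHICLE_BETWEEN_RSUS, kept)
# If the vehicle is stopped and the raw count triples, the saving is even larger.
stopped = reduction_percent(3 * BSMS_PER_VEHICLE_BETWEEN_RSUS, kept)

print(f"free flow: {free_flow:.1f}% fewer BSMs sent to the TMC")
print(f"stopped:   {stopped:.2f}% fewer BSMs sent to the TMC")
```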

    Further, consider the scalability problem with processing travel time data. If every vehicle were equipped, the TMC’s task would be unmanageable at a reasonable cost. The NYC pilot project had to address these issues because of the expected density of CV-equipped vehicles, the limited backhaul bandwidth, and the limited processing power at the TMC. Travel times are a critical element of the City’s adaptive control system, and using the RSU to determine when a vehicle is within a small zone at the intersection makes it possible to compute them. Likewise, as one looks to more sophisticated local monitoring, the combination of the RSU and the Advanced Transportation Controller (ATC) can convert the data streams into usable information, such as queue lengths, and share it with the TMC to improve the allocation of phase time, progression, and platoon management.
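
    The zone-based approach described above can be sketched roughly as follows. This is not the NYC pilot's implementation: the rectangular zone, the `Bsm` fields, and both function names are hypothetical, and the sketch assumes a vehicle identifier that stays constant between the two intersections. In practice BSM identifiers are temporary, so real matching logic would need to account for that.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Bsm:
    vehicle_id: str      # hypothetical stable ID (an assumption, see lead-in)
    timestamp_s: float   # time the RSU heard the message, in seconds
    lat: float
    lon: float


@dataclass(frozen=True)
class Zone:
    """Hypothetical rectangular zone configured at one intersection."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float

    def contains(self, bsm: Bsm) -> bool:
        return (self.min_lat <= bsm.lat <= self.max_lat
                and self.min_lon <= bsm.lon <= self.max_lon)


def first_sighting_per_vehicle(bsms: Iterable[Bsm], zone: Zone) -> Dict[str, Bsm]:
    """Edge step (RSU): keep only the earliest in-zone BSM for each vehicle."""
    first: Dict[str, Bsm] = {}
    for bsm in bsms:
        if not zone.contains(bsm):
            continue
        seen = first.get(bsm.vehicle_id)
        if seen is None or bsm.timestamp_s < seen.timestamp_s:
            first[bsm.vehicle_id] = bsm
    return first


def travel_times_s(upstream: Dict[str, Bsm],
                   downstream: Dict[str, Bsm]) -> Dict[str, float]:
    """Central step (TMC): match sightings by vehicle and subtract timestamps."""
    times: Dict[str, float] = {}
    for vehicle_id, up in upstream.items():
        down = downstream.get(vehicle_id)
        if down is not None and down.timestamp_s > up.timestamp_s:
            times[vehicle_id] = down.timestamp_s - up.timestamp_s
    return times


# Made-up sample data: one vehicle passing two intersections.
zone_a = Zone(40.7500, 40.7502, -73.9900, -73.9898)
zone_b = Zone(40.7520, 40.7522, -73.9900, -73.9898)
heard_at_a = [
    Bsm("veh-1", 0.0, 40.7499, -73.9899),   # still outside the zone
    Bsm("veh-1", 1.0, 40.7501, -73.9899),   # first BSM inside the zone
    Bsm("veh-1", 1.1, 40.75015, -73.9899),  # dropped at the edge
]
heard_at_b = [
    Bsm("veh-1", 31.0, 40.7521, -73.9899),
    Bsm("veh-1", 31.1, 40.75215, -73.9899),
]

sent_a = first_sighting_per_vehicle(heard_at_a, zone_a)
sent_b = first_sighting_per_vehicle(heard_at_b, zone_b)
times = travel_times_s(sent_a, sent_b)
print(times)
```

    In this sketch each RSU forwards one message per vehicle instead of every BSM it hears, and the TMC only performs a dictionary lookup and a subtraction per vehicle.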

Have a plan for how the data will be handled both during and post-deployment
    Connected vehicle, mobile device, and infrastructure sensor data captured during the operational phase of the Pilots was required to be shared with the independent evaluator (IE) in support of the broader evaluation. In addition, data stripped of personally identifiable information (PII) was required to be posted on the ITS Public Data Hub. However, uncertainties regarding data ownership led to concerns among the sites over subpoenas. After some back-and-forth on the issue, specific language was developed that clarified protections for the data. All CV data sent to the IE is protected from PII disclosure and from the potential exposure of privacy-related tracking information.

    Regarding the fate of the data post-Pilots, the USDOT plans to follow the standard data access and retention contract language for JPO-funded projects, which states that JPO-funded data should be retained in a research data access system for two years past the date of original data collection. If there proves to be sufficient value in retaining the data past that point, it will be done on a case-by-case basis. This could include transferring the data to a more persistent operational archive.

Implement data collection procedures and techniques that reduce the burden on the communications network and account for the limitations of backhaul bandwidth
    All municipal systems within New York City use the New York City Wireless Network (NYCWiN), which limited the bandwidth available to the NYC CV Pilot. While the Tampa and Wyoming pilots collect vehicle data continuously, the NYC Pilot performs only event-based data collection to address these limitations. Whenever a configurable event occurs (e.g., hard braking, steering turns, or hard acceleration), all BSMs within a configurable amount of time before and after the event are combined and encrypted into what becomes an "event" record.
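
    Event-based collection of this kind is commonly built around a rolling buffer. The sketch below is one possible shape and not the NYC pilot's code: the `EventRecorder` class, the dictionary-based BSM, and the -4.0 m/s^2 hard-braking threshold are all hypothetical, the window is counted in messages as a stand-in for a configurable amount of time, and the encryption step mentioned in the text is omitted.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional

Bsm = Dict[str, float]  # hypothetical minimal BSM: {"t": seconds, "accel": m/s^2}


class EventRecorder:
    """Keeps a rolling window of recent BSMs and saves a record around each event."""

    def __init__(self, pre_count: int, post_count: int,
                 is_event: Callable[[Bsm], bool]) -> None:
        self._pre: Deque[Bsm] = deque(maxlen=pre_count)  # BSMs before a trigger
        self._post_count = post_count
        self._is_event = is_event
        self._open: Optional[List[Bsm]] = None  # record currently being filled
        self._remaining = 0
        self.records: List[List[Bsm]] = []

    def on_bsm(self, bsm: Bsm) -> None:
        if self._open is None and self._is_event(bsm):
            self._open = list(self._pre)            # snapshot of preceding BSMs
            self._remaining = self._post_count + 1  # trigger plus following BSMs
        if self._open is not None:
            # Triggers that arrive while a record is open are folded into it.
            self._open.append(bsm)
            self._remaining -= 1
            if self._remaining == 0:
                self.records.append(self._open)
                self._open = None
        self._pre.append(bsm)


def hard_braking(bsm: Bsm) -> bool:
    # Hypothetical threshold; a deployment would make this configurable.
    return bsm["accel"] <= -4.0


recorder = EventRecorder(pre_count=2, post_count=2, is_event=hard_braking)
accelerations = [0.0, -0.5, -1.0, -5.0, -1.0, 0.0, 0.0, 0.0]
for i, accel in enumerate(accelerations):
    recorder.on_bsm({"t": i * 0.1, "accel": accel})

for record in recorder.records:
    print([bsm["accel"] for bsm in record])
```

    Only the BSMs surrounding the hard-braking sample are kept; everything else is discarded on the device and never uses the network.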

    CV infrastructure naturally provides the opportunity for edge processing and the aggregation of CV information to foster better mobility. NYC looked to incorporate edge computing concepts into its data management plans to address its need for more scalable data collection. Rather than having all data processing occur at the TMC, New York City designed its system architecture so that some data processing occurs at "edge" devices (RSUs, OBUs). By processing locally at the edge instead of streaming all the data to a central cloud, NYC was able to reduce the amount of bandwidth used.

Plan accordingly for data storage requirements
    Preliminary estimates of vehicle, mobile device, and infrastructure data volumes should be calculated early on to determine the data storage systems needed (including CPU and disk needs). The estimate for interactions between CVs depends heavily on how often connected vehicles travel within range of each other and interact. Fleet vehicles may also have higher daily operational hours than private passenger vehicles and produce proportionally more data.

    During the data collection period, the volume of raw and processed data should be closely monitored over time to anticipate and respond to data storage needs, such as by increasing storage at the TMC or changing the frequency at which devices upload data.
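
    A preliminary estimate of this kind can start as a simple back-of-the-envelope calculation. The sketch below assumes every BSM is stored, uses the nominal 10 messages per second BSM broadcast rate, and treats the 300-byte record size, the vehicle counts, and the operating hours as placeholder assumptions to be replaced with measured values from the deployment.

```python
SECONDS_PER_HOUR = 3600


def daily_bsm_bytes(vehicles: int, hours_per_day: float,
                    bsm_hz: float = 10.0, bytes_per_bsm: int = 300) -> float:
    """Raw bytes per day if every BSM from every vehicle were stored.

    bsm_hz defaults to the nominal BSM rate; bytes_per_bsm is a placeholder.
    """
    return vehicles * hours_per_day * SECONDS_PER_HOUR * bsm_hz * bytes_per_bsm


def days_until_full(capacity_bytes: float, used_bytes: float,
                    daily_bytes: float) -> float:
    """Days of headroom left at the current daily growth rate."""
    if daily_bytes <= 0:
        raise ValueError("daily_bytes must be positive")
    return max(capacity_bytes - used_bytes, 0.0) / daily_bytes


# Hypothetical fleets: same size, different daily operating hours.
fleet = daily_bsm_bytes(vehicles=100, hours_per_day=12.0)
private = daily_bsm_bytes(vehicles=100, hours_per_day=1.5)

print(f"fleet vehicles:   {fleet / 1e9:.2f} GB/day")
print(f"private vehicles: {private / 1e9:.2f} GB/day")
print(f"ratio:            {fleet / private:.1f}x")
```

    Re-running `days_until_full` against observed daily volumes during the collection period is one way to decide when to add storage or reduce upload frequency.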

Adopt a metadata standard that all data providers agree to and comply with
    Metadata standards that define what must be included in the metadata associated with a data set should be adopted for all data that is uploaded for evaluation or public consumption.

    Uploads of preliminary sample data to USDOT’s Secure Data Commons (SDC) Portal, a cloud-based analytic sandbox, were disorganized and lacked critical data dictionaries that the independent evaluator (IE) needed. To prevent further undocumented data in the SDC, the IE eventually introduced a "form" of contextual data that the sites were required to fill out for every new table or data type uploaded to the platform.
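
    A required form like this could be enforced with a small check before each upload. The sketch below is purely illustrative: the source does not list the contents of the IE's form, so the field names in `REQUIRED_FIELDS`, the `metadata_problems` function, and the sample table are all hypothetical.

```python
from typing import Dict, List

# Hypothetical required fields; the actual form's contents are not given in the source.
REQUIRED_FIELDS = ("table_name", "description", "providing_site", "data_dictionary")


def metadata_problems(form: Dict[str, object]) -> List[str]:
    """Return a list of problems that should block an upload (empty if none)."""
    problems = [f"missing field: {name}"
                for name in REQUIRED_FIELDS if not form.get(name)]
    dictionary = form.get("data_dictionary")
    if isinstance(dictionary, dict):
        # Every column in the data dictionary needs a non-empty definition.
        for column, definition in dictionary.items():
            if not str(definition).strip():
                problems.append(f"undocumented column: {column}")
    return problems


form = {
    "table_name": "rsu_events",
    "description": "Event records uploaded by one site",
    "providing_site": "",
    "data_dictionary": {"timestamp": "UTC time of the event", "speed": ""},
}
problems = metadata_problems(form)
print(problems)
```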

Connected Vehicle Pilot Deployment Program Driving Towards Deployment: Lessons Learned from the Design/Build/Test Phase

Author: Thompson, Kathy

Published By: USDOT Federal Highway Administration

Source Date: 12/13/2018

Other Reference Number: FHWA-JPO-18-712

URL: https://rosap.ntl.bts.gov/view/dot/37681

Lesson Contacts

Lesson Analyst:

Kathy Thompson


Lesson ID: 2019-00877