Travel Model Improvement Program - TMIP

Model Validation and Reasonableness Checking Manual

2.0 Model Inputs

There are two major types of data which are used as inputs to travel models. The first are socioeconomic data, which describe population, households, employment, and land use characteristics of the region by transportation analysis zone (TAZ). The second are transportation network data, which describe the region's transportation system.

It is critical that socioeconomic and transportation network data be checked prior to other steps in validation. If these data are accurate, the level of effort needed to perform other validation steps is greatly reduced. The most common causes of error in travel models are usually inaccuracies in socioeconomic and transportation network data.

2.1 Land Use and Socioeconomic Data

Current travel demand models are based on the concept that travel is derived from the need to participate in a number of spatially distributed daily activities, such as work, school, shopping, and entertainment. Travel models use zonal socioeconomic or land use data to reflect the underlying activity in the study area. The process by which socioeconomic data are estimated in the base year and forecast for future years has a significant impact on model results.

Regional planning agencies often provide input socioeconomic data for base-year validation of travel models. These data are nearly always based on census data, but are often revised for any year other than the decennial census year. These data and measures calculated from the data should be compared to census data from previous years to check for reasonable rates of change. Base year model input data are not actually validated against an independent data source. However, reviewing the socioeconomic inputs for reasonableness is still an important step to ensure that changes are not made to models to improve validation results when, in fact, the problems have been caused by the exogenous data used for the validation.
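The rate-of-change comparison described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the county totals, and the 3 percent per year threshold are hypothetical and are not taken from the manual; a suitable threshold should come from local experience.

```python
def annual_growth_rate(earlier_value, later_value, years):
    """Compound annual rate of change between two observations."""
    if earlier_value <= 0 or years <= 0:
        raise ValueError("earlier_value and years must be positive")
    return (later_value / earlier_value) ** (1.0 / years) - 1.0


def flag_unreasonable_growth(earlier, later, years, max_abs_rate=0.03):
    """Return the names whose annualized change exceeds max_abs_rate.

    max_abs_rate is a hypothetical threshold used only for illustration.
    """
    flagged = []
    for name in earlier:
        rate = annual_growth_rate(earlier[name], later[name], years)
        if abs(rate) > max_abs_rate:
            flagged.append(name)
    return flagged


# Hypothetical household totals: census year vs. a base year 5 years later.
census_households = {"County A": 100000, "County B": 50000}
base_year_households = {"County A": 105000, "County B": 65000}
print(flag_unreasonable_growth(census_households, base_year_households, years=5))
```

A flagged area is not necessarily wrong; it simply identifies where the revised estimates should be reviewed before the travel models themselves are adjusted.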

2.1.1 Sources of Data

In the base year, estimates of zonal population and employment should be based on the best available data. Primary data sources provide the information necessary for aggregate travel model validation. The decennial United States Census is an excellent source of socioeconomic data for input into models. Data from both Summary Tape File 3 (STF3) and the Census Transportation Planning Package (CTPP) can be used. STF3 provides univariate distributions of household and population data such as households by household size, households by income group, households by structure type, households by auto ownership, and population in households. The CTPP data provide multivariate distributions of household and population data such as households by auto ownership and household size, and households by income group and auto ownership.

Another source of socioeconomic data available for validation is the 1990 U.S. Census Public Use Microdata Sample (PUMS). This dataset contains individual records of responses to full Census questionnaires, but with unique identifiers (names, addresses, etc.) removed to protect the confidentiality of the respondent. PUMS is available for the entire United States for areas that meet a 100,000 minimum population threshold. The standard PUMS datasets include the 5% sample county level file and 1% sample metropolitan area file. Households are geocoded to a Public Use Microdata Area (PUMA), each with population in the range from 100,000 to 200,000.

The state employment/unemployment department can usually provide information on existing numbers of jobs and employed residents, by industry sector. County Business Patterns provides estimates of employment by type of industry and employer size. This information is also available through the U.S. Department of Labor's Employment, Wages, and Contributions file ES-202 (Employment Securities Manual) with employment classified by Standard Industrial Classification (SIC) codes. Employment data are often difficult to use because the reported employment location may not reflect the true work location of an employee. For example, franchises may list all of their employees at a single location for the purpose of the Labor Department file.

After regional totals of population and employment have been estimated, the next step is to allocate jobs and households to each traffic analysis zone. This process, often referred to as land use forecasting, occurs outside of the typical travel modeling process. Three techniques used to allocate socioeconomic data include negotiated estimates, scenario approaches, and formal mathematical land use models. Errors in allocation of data can affect both the quantity of trips generated and the distribution of those trips around the region.

2.1.2 Types of Checks

Typically regional and county control totals for socioeconomic data can be easily matched and verified. However, the allocation of regional totals to the subregional level is a process involving both technical and political challenges, particularly when developing forecasts for future year application of the model set. The main sources of error in estimating socioeconomic data for a validation year include:

The first aggregate checks of model input data should involve summarizing data at the city/county/regional levels and comparing with control totals (if available). If local estimates or forecasts have not been developed, the data can be compared with other regions in terms of typical household characteristics or rates of growth. Comparisons to measures from previous models of the same region, models from other regions, and information provided in the forthcoming National Cooperative Highway Research Program (NCHRP) Report 365, Travel Estimation Techniques for Urban Planning, provide insight on reasonable values for these measures. Since socioeconomic characteristics do vary by region, they are best checked against local data sources.

Table 2-1 shows the national trends from the Nationwide Personal Transportation Survey (NPTS) for key demographic characteristics. Checks of these data are very straightforward and provide a simple overview of the reasonableness of the data. Basic checks include total population, total households, total employment, average household size (persons per household), and the population/employment ratio for the region. Appendix A includes summary statistics from the Census Journey-to-Work Data showing demographic statistics for some of the largest metropolitan areas.
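The basic regional checks can be illustrated with a short sketch. The zonal record layout, field names, and values below are hypothetical; the point is only that the zonal inputs are summed to regional totals and the resulting ratios are compared with control totals or with reference values such as those in Table 2-1.

```python
def regional_summary(zones):
    """Aggregate zonal records to regional totals and basic ratios."""
    population = sum(z["population"] for z in zones)
    households = sum(z["households"] for z in zones)
    employment = sum(z["employment"] for z in zones)
    return {
        "population": population,
        "households": households,
        "employment": employment,
        "persons_per_household": population / households,
        "population_per_job": population / employment,
    }


# Hypothetical two-zone region (field names are assumptions for this sketch).
zones = [
    {"taz": 1, "population": 2600, "households": 1000, "employment": 400},
    {"taz": 2, "population": 1240, "households": 500, "employment": 1520},
]
summary = regional_summary(zones)
for name, value in summary.items():
    print(name, round(value, 2))
```

The same function can be applied to the zones of a single county or district to compare against subregional control totals.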

Items that directly affect the travel models should be reviewed. For example, if trip generation models are based on workers per household and auto ownership, regional summaries of workers per household and average autos per household should be made. If the trip generation models include socioeconomic submodels to project some of the required socioeconomic data (e.g., accessibility to transit is used along with income and household size to estimate a distribution of households by auto ownership), the interim results of socioeconomic submodels at the regional level should be checked.

Some items that are not used directly by the models do provide a basis for checking input data. For example, resident labor force information is collected by the Census and Bureau of Labor Statistics and can be compared with employment (from establishments) at the regional level. That is,

(Employed Residents + External Residents Working in Region - Residents Working Outside the Region) divided by Total Employment (Jobs) in Region should approximately equal 1.0.
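As a worked illustration of this balance check, the sketch below computes the ratio for a hypothetical region. The input figures and the 5 percent tolerance used to interpret "approximately equal" are assumptions, not values from the manual.

```python
def labor_force_balance(employed_residents, external_residents_working_in_region,
                        residents_working_outside_region, total_jobs):
    """Ratio of workers employed inside the region to jobs in the region."""
    if total_jobs <= 0:
        raise ValueError("total_jobs must be positive")
    workers_in_region = (employed_residents
                         + external_residents_working_in_region
                         - residents_working_outside_region)
    return workers_in_region / total_jobs


def is_balanced(ratio, tolerance=0.05):
    """True if the ratio is within the (hypothetical) tolerance of 1.0."""
    return abs(ratio - 1.0) <= tolerance


# Hypothetical regional figures.
ratio = labor_force_balance(480000, 40000, 20000, 505000)
print(round(ratio, 3), is_balanced(ratio))
```

A ratio well away from 1.0 suggests that the resident labor force data and the establishment-based employment data are inconsistent and that one of the two should be reviewed.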

Table 2-1
Summary of Demographic Trends from the NPTS

                                1969    1977    1983    1990
Persons per household           3.16    2.83    2.69    2.56
Vehicles per household          1.16    1.59    1.68    1.77
Workers per household           1.21    1.23    1.21    1.27
Vehicles per worker             0.96    1.29    1.39    1.40
Vehicles per licensed driver    0.70    0.94    0.98    1.01
Source: 1969, 1977, 1983, and 1990 NPTS

Percent of Households by Vehicles Available (thousands)

Number of Vehicles Available    1969     1977     1983     1990
No vehicle                      20.6%    15.3%    13.5%     9.2%
One vehicle                     48.4%    34.6%    33.7%    32.8%
Two vehicles                    26.4%    34.4%    33.5%    38.4%
Three or more vehicles           4.6%    15.7%    19.2%    19.5%
Source: 1969, 1977, 1983, and 1990 NPTS

Localized checks of socioeconomic data are used to review the allocation of regional totals to the subregional level. These levels can include districts (subregional aggregations of TAZs), individual TAZs, and TAZs or groups of TAZs which constitute major trip generators, such as CBDs, shopping malls, and suburban activity centers.

Almost any district-level or TAZ-level data can be effectively displayed using a geographic information system (GIS). Because of its graphic presentation capabilities, a GIS is an excellent tool for presenting the results of disaggregate data checks. Example zonal socioeconomic data which can be checked using a GIS include population, households, average household size, shares of households by socioeconomic stratum (e.g., income level or auto ownership), employment, and employment by category. An example plot is shown in Figure 2-1.

Figure 2-1
GIS Plot of Socioeconomic Data

Two types of checks which can be performed with a GIS include:

2.2 Transportation Network Definition

The second type of input data to check are roadway and transit networks. Most regional planning agencies assemble data from state departments of transportation, local governments, and transit operators as major inputs to transportation network development. They also carry out primary data collection activities, such as verification of link characteristics, speed/delay studies, and transit wait time studies. These verification efforts are critical to the accuracy of the networks.

2.2.1 Highway Networks

The coded highway network represents the streets, roads, thoroughfares, and freeways that make up the regional highway system. The estimation of travel demand requires an accurate representation of the network. The most likely sources of error are the coding process and errors inherent in the base maps or digital files (e.g., TIGER files, highway attribute inventory) used to develop the network.

Centroids represent the center of activity of a TAZ. They should be located in the center of existing development for model validation. Centroid connectors should represent, as closely as possible, local streets within the TAZ, and the nodes connecting them with the roadway network should represent reasonable access points. Zones should not be split by any major physical barriers. The size and density of zones should correspond to the level of detail of the coded highway network.

Regional validation checks for roadway networks should include an overall visual inspection of the network, but should focus on checking ranges of speeds and capacities by facility type and area type, such as:

Detailed network checks should be made both in terms of network connectivity and network attributes.

Connectivity Checks
Visual roadway network inspections of individual links can be made using network editing and viewing routines or plotting routines provided with travel modeling software packages. Most travel modeling software packages have interactive network editors. These provide good network checking capabilities.

Network coding conventions have a significant impact on path building. Figure 2-2 gives examples of varying levels of coding detail. A simple network intersection, shown at the top, allows for unrestricted turns. In the other coding examples, freeway ramps are coded explicitly so that only permitted movements can be made. While many modeling software packages provide capabilities for adding turn prohibitors, good traffic assignments can be performed without heavy reliance on them.

Figure 2-3 displays an example of how centroid connector coding can impact travel paths and validation results. Ideally, connectors will be attached at the points at which local streets or driveways enter the coded highway network. In a typical urban setting, zones should be connected on all four sides roughly mid-block. If centroid connections are made on only one side or at the intersection, assigned volumes can be over- or under-projected on the streets immediately adjacent to the zone.

Some network editors provide the capability to build and display shortest paths between pairs of centroids. This process is also known as skimming the network. Skim trees show the minimum path from one zone to multiple zones; skim forests show paths from multiple zones to multiple zones. An example plot of a network path tree is shown in Figure 2-4.

The construction and plotting of paths from one zone (or node) to other zones (or nodes) provides the capability to discover illogical travel paths. In network development, skim trees are used primarily to identify missing/incorrect links or test the coding of freeway interchanges. Typically, distance (miles) or freeflow times (e.g. based on posted speed limit) are used as the measure of network impedance. Zone pairs should be selected so that a majority of the network links are tested. At a minimum, paths should use all major facilities crossing network screenlines (see section 2.2.3). By using skim trees early in the process, network coding errors can be discovered before loading the vehicle trip table onto the network.
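The path-building check can be sketched with a standard shortest-path routine. The example below uses Dijkstra's algorithm on a small hypothetical network with distance in miles as the impedance; the network, node numbers, and function names are assumptions made for illustration and do not reflect any particular modeling package.

```python
import heapq


def build_path_tree(links, origin):
    """Minimum-impedance tree from origin.

    links maps each node to a list of (neighbor, impedance) pairs.
    Returns (cost, predecessor) dictionaries for all reachable nodes.
    """
    cost = {origin: 0.0}
    predecessor = {}
    heap = [(0.0, origin)]
    while heap:
        current_cost, node = heapq.heappop(heap)
        if current_cost > cost.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, impedance in links.get(node, []):
            new_cost = current_cost + impedance
            if new_cost < cost.get(neighbor, float("inf")):
                cost[neighbor] = new_cost
                predecessor[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return cost, predecessor


def trace_path(predecessor, origin, destination):
    """Node sequence from origin to destination, or None if unreachable."""
    path = [destination]
    while path[-1] != origin:
        if path[-1] not in predecessor:
            return None
        path.append(predecessor[path[-1]])
    return path[::-1]


# Hypothetical one-way links: zones 1 and 2 are centroids, 101-103 are nodes.
links = {
    1: [(101, 0.25)],
    101: [(102, 1.0), (103, 0.5)],
    103: [(102, 0.25)],
    102: [(2, 0.25)],
}
cost, predecessor = build_path_tree(links, origin=1)
print(trace_path(predecessor, 1, 2), cost[2])
```

Listing or plotting the traced node sequence makes it easy to see whether the path uses the facilities a traveler would reasonably choose.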

One of the most severe (and common) network connectivity problems occurs when a zone centroid is not connected to the highway network. An easy method for locating unconnected zones is to create a skim matrix for all zones. Unconnected zones will either cause an error detected by the software, or else the matrix will contain a row of extremely large impedances (e.g., 99999) for that zone.
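A skim matrix can be scanned for unconnected zones with a few lines of code. The sketch below assumes the skim is held as a list of rows and that unreachable zone pairs carry an impedance of 99999 or more; the matrix layout and the function name are hypothetical.

```python
UNREACHABLE = 99999  # sentinel impedance for zone pairs with no path


def find_unconnected_zones(skim, zone_ids, threshold=UNREACHABLE):
    """Return zones whose entire row (excluding the diagonal) is unreachable."""
    unconnected = []
    for i, zone in enumerate(zone_ids):
        off_diagonal = [skim[i][j] for j in range(len(zone_ids)) if j != i]
        if off_diagonal and all(value >= threshold for value in off_diagonal):
            unconnected.append(zone)
    return unconnected


# Hypothetical 3-zone skim (minutes); zone 3 has no connection to the network.
zone_ids = [1, 2, 3]
skim = [
    [0.0, 5.0, 99999],
    [5.0, 0.0, 99999],
    [99999, 99999, 0.0],
]
print(find_unconnected_zones(skim, zone_ids))
```

Checking columns in the same way would identify zones that can be left but not reached, which can occur when connectors are coded as one-way links.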

Similar path-building checks can be performed after highway assignment using paths based on congested travel times. Selected travel time paths can be compared to results from speed/delay studies.

Other network coding errors which affect path-building and assignment results include:

Figure 2-2
Network Coding Convention

Noncontrol Access Facilities Intersection
Control Access Interchange with Noncontrol Access (Freeway with 2-way arterial)
Control Access Interchange with Control Access (Freeway to Freeway)

Figure 2-3
Coding of Centroid Connectors

Figure 2-4
Shortest Path Between Two Nodes

Highway Attributes
Highway attribute data can be reviewed in one of two ways: range checking to verify valid ranges of input values, and color plotting using graphical capabilities of interactive network analysis programs or geographic information systems. Paper or screen plots of attributes are effective tools for verifying network accuracy. The plot displayed in Figure 2-5 shows this type of information graphically.
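Range checking can be sketched as a simple table lookup. In the example below the valid ranges (speeds in miles per hour, capacities in vehicles per lane per hour), the facility types, and the link record layout are all hypothetical; each agency would substitute the ranges used in its own network coding conventions.

```python
# Hypothetical valid ranges by facility type, for illustration only.
VALID_RANGES = {
    "freeway": {"speed": (45, 75), "capacity": (1600, 2400)},
    "arterial": {"speed": (20, 55), "capacity": (500, 1100)},
}


def range_check(links, valid_ranges):
    """Return (a_node, b_node, attribute, value) for each out-of-range value."""
    problems = []
    for link in links:
        limits = valid_ranges.get(link["facility_type"])
        if limits is None:
            # Unknown facility type is itself a coding problem.
            problems.append((link["a"], link["b"], "facility_type",
                             link["facility_type"]))
            continue
        for attribute, (low, high) in limits.items():
            value = link[attribute]
            if not low <= value <= high:
                problems.append((link["a"], link["b"], attribute, value))
    return problems


links = [
    {"a": 10, "b": 11, "facility_type": "freeway", "speed": 65, "capacity": 2000},
    {"a": 11, "b": 12, "facility_type": "arterial", "speed": 70, "capacity": 900},
    {"a": 12, "b": 13, "facility_type": "arterial", "speed": 35, "capacity": 90},
]
problems = range_check(links, VALID_RANGES)
for problem in problems:
    print(problem)
```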

The following attributes should be checked and plotted where appropriate:

Equation 2-1

$$\text{straight-line length} = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$$

where:

$x_a$ = x-coordinate of the a-node
$x_b$ = x-coordinate of the b-node
$y_a$ = y-coordinate of the a-node
$y_b$ = y-coordinate of the b-node

The ratio of coded length versus straight-line length can be plotted so that links falling outside of an acceptable range (e.g. 0.9 to 1.1) can be identified.
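The length-ratio check can be sketched as follows, with the straight-line length computed from the a-node and b-node coordinates. The sketch assumes that node coordinates and coded link lengths are expressed in the same units; the data structures and function names are hypothetical.

```python
import math


def straight_line_length(xa, ya, xb, yb):
    """Euclidean distance between the a-node and the b-node."""
    return math.hypot(xa - xb, ya - yb)


def flag_length_ratio(links, nodes, low=0.9, high=1.1):
    """Return (a, b, ratio) for links whose coded/straight-line ratio is out of range.

    links is a list of (a_node, b_node, coded_length);
    nodes maps node number to (x, y). Coincident nodes are reported with
    a ratio of None.
    """
    flagged = []
    for a, b, coded_length in links:
        xa, ya = nodes[a]
        xb, yb = nodes[b]
        straight = straight_line_length(xa, ya, xb, yb)
        if straight == 0:
            flagged.append((a, b, None))
            continue
        ratio = coded_length / straight
        if not low <= ratio <= high:
            flagged.append((a, b, round(ratio, 2)))
    return flagged


# Hypothetical coordinates and coded lengths, both in miles.
nodes = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (3.0, 0.0)}
links = [(1, 2, 5.2), (1, 3, 1.5), (2, 3, 4.0)]
print(flag_length_ratio(links, nodes))
```

Links that legitimately curve will exceed the upper bound, so flagged links should be reviewed rather than corrected automatically.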

Figure 2-5
Color-Coding of Network Attributes

2.2.2 Transit Networks

Public transportation system networks and data should be reviewed. Network plots color-coded by mode can be used to help verify access links, transfer points, stop locations, station connectivity, parking lots, fare coding, etc. If possible, the route itineraries should be plotted so that they can be compared with the transit operator's system map.

System level checks for transit networks include checks on minimum and maximum headways and range checks of walk or auto access times to stations/bus stops. Walk links often have associated walk percentages by zone which can be reviewed by looking at the zone structure along a transit route. Transit system characteristics can be listed by mode, type of vehicle, company, or route.

In addition to hard-coded transit speeds, most travel modeling software packages provide the means to directly relate bus speeds to highway (auto) speeds. The relationship between transit speed and highway speed is not the same for all highway links. For example, buses on a freeway operate at speeds that approximate auto speeds, while buses on downtown streets may operate much more slowly than auto traffic. Checks should be made to ensure that bus speeds do not exceed auto speeds (except for bus express lanes).
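The bus speed check lends itself to a simple automated test. The sketch below assumes each link record carries a bus speed, an auto speed, and an optional flag marking bus express lanes; these field names are hypothetical.

```python
def flag_fast_bus_links(links):
    """Return (a, b) for links where the bus speed exceeds the auto speed,
    excluding links flagged as bus express lanes."""
    return [
        (link["a"], link["b"])
        for link in links
        if link["bus_speed"] > link["auto_speed"]
        and not link.get("bus_lane", False)
    ]


# Hypothetical link records (speeds in miles per hour).
links = [
    {"a": 1, "b": 2, "auto_speed": 55, "bus_speed": 52},
    {"a": 2, "b": 3, "auto_speed": 25, "bus_speed": 30},
    {"a": 3, "b": 4, "auto_speed": 20, "bus_speed": 35, "bus_lane": True},
]
print(flag_fast_bus_links(links))
```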

In some transit modeling software, it may be possible to trace shortest transit paths and compare differences between competing routes in a corridor. For example, routes coded over the same roadway section should have the same stop nodes (unless explicitly different as between a local and express route).

One typical source of error with transit modeling occurs when bus routes traverse local streets not coded in highway networks. It may be desirable to code special transit-only links to allow for routes that deviate significantly from the coded network in order to account for additional travel time on local streets.

2.2.3 System Performance and Validation Data

In addition to the data collected as inputs to the travel models, it is important to collect and review system performance data which will be used in the validation process. The most common types of validation data include highway traffic volumes, highway speeds and travel times, and transit ridership.

Traffic Volumes
Average daily traffic (ADT) and peak hour traffic volumes are collected at a number of locations throughout the region. Two methods are commonly used: 1) using an automatic traffic counter (either in one or both directions), and 2) manually counting vehicles. ADT is typically collected using automatic counters, while counts classifying vehicles by type (e.g. automobile, light truck, heavy truck, motorcycle) are typically done manually. Manual counts can also be used to collect vehicle occupancy data.

Sufficient coverage of traffic counts may be available already at permanent count locations. Additional counts may be needed at critical links, especially along imaginary lines that are used to assess model validation. These are described below and shown in Figures 2-6 and 2-7:

Figure 2-6
Example of Screenline Locations

Figure 2-7
Example of Cutline Locations

If multiple counties are included in the modeled area, then each county boundary can form either a cordon or screenline, depending on its location within the area. Using county boundaries as cordons and screenlines allows the use of Census data for validating the home-based work trip distribution. The Census data provide summaries, by county, of place of residence versus place of employment. This county-to-county distribution of home and work place can be used as a surrogate for the observed work trips.

Each roadway that crosses a screenline must be taken into account. Roadways which carry significant traffic volumes should be coded into the roadway network and traffic count data should be included. Minor roadways which carry very low traffic volumes may be omitted from the network and from the traffic count database, but their volumes should be estimated and accounted for in the validation analysis.

The traffic counts should be collected during the same year for which the model is being validated. In order to obtain the most typical estimate of ADT, FHWA recommends that a minimum of one midweek 24-hour count be taken at least every two years. Three-day counts can be averaged to improve reliability. Factors can also be applied to the count to relate weekday to average week traffic, and to relate a given month to average monthly conditions. Peak and off-peak traffic volumes can be taken directly from automatic tube counters or from hourly classification counts.
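The averaging and factoring steps can be illustrated with a short calculation. The factor values below are hypothetical; actual day-of-week and monthly factors would come from permanent count station data for the region.

```python
def estimate_adt(daily_counts, weekday_to_week_factor=1.0, monthly_factor=1.0):
    """Average one or more 24-hour counts and apply adjustment factors.

    weekday_to_week_factor relates weekday traffic to average week traffic;
    monthly_factor relates the count month to average monthly conditions.
    """
    if not daily_counts:
        raise ValueError("at least one 24-hour count is required")
    average_count = sum(daily_counts) / len(daily_counts)
    return average_count * weekday_to_week_factor * monthly_factor


# Hypothetical three-day midweek count with illustrative factors.
three_day_counts = [20000, 21000, 22000]
adt = estimate_adt(three_day_counts, weekday_to_week_factor=0.95,
                   monthly_factor=1.05)
print(round(adt))
```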

Counts should be reviewed for reasonableness using measures such as volume per lane (e.g., a count of 4,000 vehicles per lane per hour might be unreasonable).
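A volume-per-lane screen can be sketched as follows. The 4,000 vehicles per lane per hour ceiling is taken from the example in the text; the count record layout and station names are hypothetical.

```python
def volume_per_lane(hourly_volume, lanes):
    """Hourly volume divided by the number of lanes counted."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return hourly_volume / lanes


def flag_suspect_counts(counts, max_per_lane_hour=4000):
    """Return stations whose hourly volume per lane reaches the ceiling."""
    return [
        count["station"]
        for count in counts
        if volume_per_lane(count["hourly_volume"], count["lanes"])
        >= max_per_lane_hour
    ]


# Hypothetical peak-hour counts.
counts = [
    {"station": "A", "hourly_volume": 3600, "lanes": 2},
    {"station": "B", "hourly_volume": 9000, "lanes": 2},
]
print(flag_suspect_counts(counts))
```

In practice a lower ceiling tied to facility type would catch more errors; the single value here simply mirrors the example given above.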

Speed (or Travel Time)
Speed measurements are particularly important for validating modeled speeds which are used as inputs to air quality emissions models. Observed speed data can be posted on network links with other attribute data. Speeds can be collected for peak and/or off-peak time periods using floating car runs or radar detection. Due to the cost of collecting speed data, many areas have very limited information for the highway network. Ideally, speed data should be collected for as many locations as possible for a given area type and facility type (e.g., urban freeway, suburban principal arterial).

Transit Ridership
Three sources of public transportation data include onboard origin-destination surveys, load point checks, and ride checks. Onboard surveys are typically used in the calibration process and should be used to validate total transit trips, trips by route, and trip interchanges made on public transportation. Passenger load checks are performed at locations selected for proximity to the maximum load point of a route. Typical information would include headways and schedule, passenger loads compared with seats available, and boarding/alighting activity at that particular location. Ride checks involve having an individual ride a transit vehicle and monitor the number of passengers boarding and alighting at each bus stop.

2.3 Forecasting and Monitoring Model Inputs

Checking the reasonableness of model inputs for the validation base year is only the first step in the process. In order to produce projections of future travel, the models must be applied to forecasts of future population, employment, and other socioeconomic variables. Monitoring is used to determine if development trends and transportation system characteristics are evolving as forecasted.

Socioeconomic Inputs
There is more uncertainty (and potential for error) in predicting future socioeconomic inputs for TAZs than in estimating base year values. Forecasts are typically made at the traffic analysis zone level for population and households, mean or median income, auto ownership or availability, and employment by type (retail vs. non-retail). Demographic relationships (persons/household, workers/household, employment/population ratios, etc.) and growth rates should be checked for consistency with expectations, assumptions, and policies. Significant changes in land use must also be carefully evaluated for reasonableness with respect to regional and local growth rates, in both absolute and relative terms.

Care must also be taken to maintain constant dollars with respect to income, parking and operating costs, transit fares, etc.
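Conversion to constant dollars can be sketched as a ratio of price index values. The index values and the fare below are hypothetical, and the choice of price index is left to the analyst.

```python
def to_constant_dollars(nominal_amount, index_in_nominal_year, index_in_base_year):
    """Express a nominal amount in base-year dollars using a price index."""
    if index_in_nominal_year <= 0:
        raise ValueError("index_in_nominal_year must be positive")
    return nominal_amount * index_in_base_year / index_in_nominal_year


# Hypothetical example: a $1.50 forecast-year fare, with the price index
# at 100 in the base year and 150 in the forecast year.
fare_in_base_year_dollars = to_constant_dollars(1.50, 150.0, 100.0)
print(fare_in_base_year_dollars)
```

Applying the same conversion to income, parking costs, operating costs, and fares keeps the cost variables consistent with the dollars in which the models were estimated.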

Transportation Networks
The transportation networks (infrastructure and operational characteristics) for future alternatives are typically well-defined in long-range transportation plans and other documents. They are essentially treated like base networks, although capacities and speeds may be adjusted to reflect more advanced signal coordination systems or ITS strategies.

For future year analyses which involve updates to the base network, once the existing base (or "no-build") highway network has already been checked, the simplest check of the accuracy of coding highway network changes is to overlay the build network over the no-build network to check the differences.
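Where networks are available as link tables, the overlay check can also be approximated by comparing the two link sets directly. The sketch below assumes each network is a dictionary keyed by (a-node, b-node) with a dictionary of attributes; this layout and the attribute names are hypothetical.

```python
def compare_networks(no_build, build):
    """Return (added, removed, changed) links between two networks.

    Each network maps (a_node, b_node) to a dictionary of link attributes.
    changed maps a link to {attribute: (no_build_value, build_value)}.
    """
    added = sorted(set(build) - set(no_build))
    removed = sorted(set(no_build) - set(build))
    changed = {}
    for link in sorted(set(build) & set(no_build)):
        before, after = no_build[link], build[link]
        differences = {
            key: (before.get(key), after.get(key))
            for key in set(before) | set(after)
            if before.get(key) != after.get(key)
        }
        if differences:
            changed[link] = differences
    return added, removed, changed


# Hypothetical networks: one widened link and one new link in the build case.
no_build = {
    (1, 2): {"lanes": 2, "speed": 45},
    (2, 3): {"lanes": 2, "speed": 45},
}
build = {
    (1, 2): {"lanes": 3, "speed": 45},
    (2, 3): {"lanes": 2, "speed": 45},
    (3, 4): {"lanes": 2, "speed": 55},
}
added, removed, changed = compare_networks(no_build, build)
print(added, removed, changed)
```

The resulting lists can be compared with the project descriptions in the long-range plan to confirm that only the intended changes were coded.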