Model Validation and Reasonableness Checking Manual
2.0 Model Inputs
There are two major types of data which are used as inputs to travel models. The first are socioeconomic data, which describe population, households, employment, and land use characteristics of the region by transportation analysis zone (TAZ). The second are transportation network data, which describe the region's transportation system.
It is critical that socioeconomic and transportation network data be checked before other validation steps are performed. If these data are accurate, the effort needed for the remaining validation steps is greatly reduced. Inaccuracies in socioeconomic and transportation network data are among the most common causes of error in travel models.
2.1 Land Use and Socioeconomic Data
Current travel demand models are based on the concept that travel is derived from the need to participate in spatially distributed daily activities such as work, school, shopping, and entertainment. Travel models use zonal socioeconomic or land use data to represent this underlying activity in the study area. The process by which socioeconomic data are estimated for the base year and forecast for future years has a significant impact on model results.
Regional planning agencies often provide input socioeconomic data for base-year validation of travel models. These data are nearly always based on census data, but are often revised for any year other than the decennial census year. These data and measures calculated from the data should be compared to census data from previous years to check for reasonable rates of change. Base year model input data are not actually validated against an independent data source. However, reviewing the socioeconomic inputs for reasonableness is still an important step to ensure that changes are not made to models to improve validation results when, in fact, the problems have been caused by the exogenous data used for the validation.
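The rate-of-change check can be sketched as a comparison of compound annual growth rates. This is a minimal sketch under stated assumptions: the household totals are invented for illustration, `annual_growth_rate` is a hypothetical helper, and the one-percentage-point tolerance is an arbitrary example rather than a recommended standard.

```python
def annual_growth_rate(value_start: float, value_end: float, years: int) -> float:
    """Compound annual growth rate between two observations."""
    return (value_end / value_start) ** (1.0 / years) - 1.0


# Hypothetical regional household totals (illustrative, not real data).
hh_1980 = 400_000  # census
hh_1990 = 460_000  # census
hh_1995 = 530_000  # validation-year estimate

historic = annual_growth_rate(hh_1980, hh_1990, 10)
recent = annual_growth_rate(hh_1990, hh_1995, 5)

# Hypothetical tolerance: flag if recent growth departs from the historic
# trend by more than one percentage point per year.
TOLERANCE = 0.01
needs_review = abs(recent - historic) > TOLERANCE
print(f"historic {historic:.2%}/yr, recent {recent:.2%}/yr, review: {needs_review}")
```

A flag here does not mean the estimate is wrong; it marks a change in trend that should be explained before the data are used for validation.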
2.1.1 Sources of Data
Base-year estimates of zonal population and employment should be developed from the best available data. Primary data sources provide the information necessary for aggregate travel model validation. The decennial United States Census is an excellent source of socioeconomic data for input into models. Data from both Summary Tape File 3 (STF3) and the Census Transportation Planning Package (CTPP) can be used. STF3 provides univariate distributions of household and population data such as households by household size, households by income group, households by structure type, households by auto ownership, and population in households. The CTPP data provide multivariate distributions of household and population data such as households by auto ownership and household size, and households by income group and auto ownership.
Another source of socioeconomic data available for validation is the 1990 U.S. Census Public Use Microdata Sample (PUMS). This dataset contains individual records of responses to full Census questionnaires, with unique identifiers (names, addresses, etc.) removed to protect the confidentiality of respondents. PUMS is available for the entire United States for areas that meet a 100,000 minimum population threshold. The standard PUMS datasets include the 5% sample county level file and the 1% sample metropolitan area file. Households are geocoded to a Public Use Microdata Area (PUMA), each with a population in the range of 100,000 to 200,000.
The state employment/unemployment department can usually provide information on existing numbers of jobs and employed residents, by industry sector. County Business Patterns provides estimates of employment by type of industry and employer size. This information is also available through the U.S. Department of Labor's Employment, Wages, and Contributions file ES-202 (Employment Securities Manual), with employment classified by Standard Industrial Classification (SIC) codes. Employment data are often difficult to allocate accurately because the reported employment location may not reflect the true work location of an employee. For example, franchises may list all of their employees at a single location for the purpose of the Labor Department file.
After regional totals of population and employment have been estimated, the next step is to allocate jobs and households to each traffic analysis zone. This process, often referred to as land use forecasting, occurs outside of the typical travel modeling process. Three techniques used to allocate socioeconomic data include negotiated estimates, scenario approaches, and formal mathematical land use models. Errors in allocation of data can affect both the quantity of trips generated and the distribution of those trips around the region.
2.1.2 Types of Checks
Regional and county control totals for socioeconomic data can typically be matched and verified easily. However, allocating regional totals to the subregional level involves both technical and political challenges, particularly when developing forecasts for future-year application of the model set. The main sources of error in estimating socioeconomic data for a validation year include:
- Collection (or reporting) of data, e.g., Census data collection problems, or all workers for an employer being reported at a headquarters location.
- Retrieving data: data are not at the same geographic level the models require, e.g., Census tracts instead of traffic analysis zones.
- Specification errors: the data needed are not exactly the data available, e.g., auto ownership is forecast regionally, but the model is based on income level.
The first aggregate checks of model input data should involve summarizing data at the city/county/regional levels and comparing with control totals (if available). If local estimates or forecasts have not been developed, the data can be compared with other regions in terms of typical household characteristics or rates of growth. Comparisons to measures from previous models of the same region, models from other regions, and information provided in the forthcoming National Cooperative Highway Research Program (NCHRP) Report 365, Travel Estimation Techniques for Urban Planning, provide insight on reasonable values for these measures. Since socioeconomic characteristics do vary by region, they are best checked against local data sources.
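One way to automate the first aggregate check is to sum zonal values by county and compare them with control totals. The sketch below assumes zone records stored as Python dictionaries with hypothetical field names (`county`, `households`); the 2% tolerance is illustrative only.

```python
from collections import defaultdict


def compare_to_controls(zones, controls, field, tolerance=0.02):
    """Sum a zonal field by county and compare the sums with control totals.

    zones: list of dicts holding at least 'county' and the field being checked.
    controls: dict mapping county -> control total for that field.
    Returns {county: (model_total, control_total, pct_diff, flagged)}.
    """
    totals = defaultdict(float)
    for z in zones:
        totals[z["county"]] += z[field]
    report = {}
    for county, control in controls.items():
        model = totals.get(county, 0.0)
        pct_diff = (model - control) / control
        report[county] = (model, control, pct_diff, abs(pct_diff) > tolerance)
    return report


# Hypothetical zonal data and county control totals.
zones = [
    {"taz": 1, "county": "A", "households": 1200},
    {"taz": 2, "county": "A", "households": 850},
    {"taz": 3, "county": "B", "households": 2300},
]
controls = {"A": 2000, "B": 2500}
for county, row in compare_to_controls(zones, controls, "households").items():
    print(county, row)
```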
Table 2-1 shows the national trends from the Nationwide Personal Transportation Survey (NPTS) for key demographic characteristics. Checks of these data are very straightforward and provide a simple overview of the reasonableness of the data. Basic checks include total population, total households, total employment, average household size (persons per household), and the population/employment ratio for the region. Appendix A includes summary statistics from the Census Journey-to-Work Data showing demographic statistics for some of the largest metropolitan areas.
Items that directly affect the travel models should be reviewed. For example, if trip generation models are based on workers per household and auto ownership, regional summaries of workers per household and average autos per household should be made. If the trip generation models include socioeconomic submodels to project some of the required socioeconomic data (e.g., accessibility to transit is used along with income and household size to estimate a distribution of households by auto ownership), the interim results of socioeconomic submodels at the regional level should be checked.
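The regional summaries described above can be computed from zone totals and set beside the 1990 NPTS values in Table 2-1. Field names are hypothetical, and since characteristics vary by region, the national figures are only a rough benchmark, not a target.

```python
# 1990 national values from Table 2-1 (NPTS).
NPTS_1990 = {
    "persons_per_hh": 2.56,
    "vehicles_per_hh": 1.77,
    "workers_per_hh": 1.27,
    "vehicles_per_worker": 1.40,
}


def regional_ratios(zones):
    """Regional ratios from summed zonal totals (not an average of zonal ratios)."""
    hh = sum(z["households"] for z in zones)
    persons = sum(z["hh_population"] for z in zones)
    workers = sum(z["workers"] for z in zones)
    vehicles = sum(z["vehicles"] for z in zones)
    return {
        "persons_per_hh": persons / hh,
        "vehicles_per_hh": vehicles / hh,
        "workers_per_hh": workers / hh,
        "vehicles_per_worker": vehicles / workers,
    }


# Hypothetical zonal totals.
zones = [
    {"households": 1000, "hh_population": 2600, "workers": 1300, "vehicles": 1800},
    {"households": 500, "hh_population": 1150, "workers": 600, "vehicles": 850},
]
for name, value in regional_ratios(zones).items():
    print(f"{name}: region {value:.2f} vs NPTS 1990 {NPTS_1990[name]:.2f}")
```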
Some items that are not used directly by the models do provide a basis for checking input data. For example, resident labor force information is collected by the Census and Bureau of Labor Statistics and can be compared with employment (from establishments) at the regional level. That is,
(Employed Residents + External Residents Working in Region - Residents Working Outside the Region) divided by Total Employment (Jobs) in Region should approximately equal 1.0.
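The labor force balance can be written as a one-line function. The input values below are invented for illustration.

```python
def labor_force_ratio(employed_residents, in_commuters, out_commuters, total_jobs):
    """(Employed residents + in-commuters - out-commuters) / jobs; expected near 1.0."""
    return (employed_residents + in_commuters - out_commuters) / total_jobs


# Hypothetical regional values.
ratio = labor_force_ratio(
    employed_residents=950_000,  # residents of the region who are employed
    in_commuters=60_000,         # external residents working in the region
    out_commuters=40_000,        # residents working outside the region
    total_jobs=1_000_000,        # establishment-based employment in the region
)
print(f"ratio = {ratio:.3f}")
```

Establishment data count jobs rather than workers, so a region with many multiple-job holders could show a ratio somewhat below 1.0 without any error in the inputs.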
Table 2-1: Summary of Demographic Trends from the NPTS

| Characteristic | 1969 | 1977 | 1983 | 1990 |
|---|---|---|---|---|
| Persons per household | 3.16 | 2.83 | 2.69 | 2.56 |
| Vehicles per household | 1.16 | 1.59 | 1.68 | 1.77 |
| Workers per household | 1.21 | 1.23 | 1.21 | 1.27 |
| Vehicles per worker | 0.96 | 1.29 | 1.39 | 1.40 |
| Vehicles per licensed driver | 0.70 | 0.94 | 0.98 | 1.01 |

Source: 1969, 1977, 1983, and 1990 NPTS

Percent of Households by Vehicles Available

| Number of Vehicles Available | 1969 | 1977 | 1983 | 1990 |
|---|---|---|---|---|
| No vehicle | 20.6% | 15.3% | 13.5% | 9.2% |
| One vehicle | 48.4% | 34.6% | 33.7% | 32.8% |
| Two vehicles | 26.4% | 34.4% | 33.5% | 38.4% |
| Three or more vehicles | 4.6% | 15.7% | 19.2% | 19.5% |

Source: 1969, 1977, 1983, and 1990 NPTS
Localized checks of socioeconomic data are used to review the allocation of regional totals to the subregional level. These levels can include districts (subregional aggregations of TAZs), individual TAZs, and TAZs or groups of TAZs that constitute major trip generators, such as central business districts (CBDs), shopping malls, and suburban activity centers.
Almost any district-level or TAZ-level data can be effectively displayed using a geographic information system (GIS). Because of its graphic presentation capabilities, a GIS is an excellent tool for presenting the results of disaggregate data checks. Example zonal socioeconomic data which can be checked using a GIS include population, households, average household size, shares of households by socioeconomic stratum (e.g., income level or auto ownership), employment, and employment by category. An example plot is shown in Figure 2-1.
Figure 2-1: GIS Plot of Socioeconomic Data
Two types of checks which can be performed with a GIS include:
- Calculate densities and plot them using thematic mapping. Calculate population and employment density in persons (or jobs) per acre or per square mile. Densities should be grouped into four or five categories of equal area (or with an equal number of zones in each). Color, shading, or bar symbols can be used to convey densities. Base-year densities should be compared with forecast densities.
- Compare existing to forecasted totals by zone or district and plot changes. Subtract existing totals from forecasted totals and plot so that positive and negative changes can be easily identified.
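The density calculation and the grouping into categories with an equal number of zones can be sketched as follows, assuming each zone record carries a population and an area in acres (hypothetical field names). The resulting class numbers could then drive the shading in a thematic map.

```python
def density_classes(zones, n_classes=5):
    """Population density (persons/acre) and an equal-count class (1 = lowest) per zone."""
    density = {z["taz"]: z["population"] / z["acres"] for z in zones}
    ranked = sorted(density, key=density.get)  # TAZs from lowest to highest density
    classes = {}
    for rank, taz in enumerate(ranked):
        # Integer arithmetic puts roughly the same number of zones in each class.
        classes[taz] = rank * n_classes // len(ranked) + 1
    return density, classes


# Hypothetical zones, grouped into four categories.
zones = [
    {"taz": 1, "population": 500, "acres": 250},
    {"taz": 2, "population": 3000, "acres": 300},
    {"taz": 3, "population": 1200, "acres": 400},
    {"taz": 4, "population": 8000, "acres": 400},
    {"taz": 5, "population": 100, "acres": 1000},
    {"taz": 6, "population": 2500, "acres": 500},
    {"taz": 7, "population": 6000, "acres": 400},
    {"taz": 8, "population": 700, "acres": 700},
]
density, classes = density_classes(zones, n_classes=4)
for taz in sorted(classes):
    print(taz, f"{density[taz]:.1f} persons/acre", "class", classes[taz])
```

For densities per square mile, multiply persons per acre by 640. Running the same function on base-year and forecast data makes the two maps directly comparable.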
2.2 Transportation Network Definition
The second type of input data to check is the roadway and transit networks. Most regional planning agencies assemble data from state departments of transportation, local governments, and transit operators as major inputs to transportation network development. They also carry out primary data collection activities, such as verification of link characteristics, speed/delay studies, and transit wait time studies. These verification efforts are critical to the accuracy of the networks.
2.2.1 Highway Networks
The coded highway network represents the streets, roads, thoroughfares, and freeways that make up the regional highway system. Estimating travel demand requires an accurate representation of this network. The most likely sources of error are the coding process itself and errors inherent in the base maps or digital files (e.g., TIGER files, highway attribute inventories) used to develop the network.
Centroids represent the center of activity of a TAZ. For model validation, they should be located at the center of existing development. Centroid connectors should represent, as closely as possible, the local streets within the TAZ, and the nodes connecting them with the roadway network should represent reasonable access points. Zones should not be split by any major physical barriers. The size and density of zones should correspond to the level of detail of the coded highway network.
Regional validation checks for roadway networks should include an overall visual inspection of the network but should focus on checking ranges of speeds and capacities by facility type and area type, such as:
- Summarize route miles or lane miles by functional class, capacity, or speed.
- Calculate average speed or per-lane capacity by facility type and area type.
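A sketch of the facility-type and area-type summaries, assuming each link is a dictionary with hypothetical fields for length in miles, directional lanes, speed in mph, and directional capacity. Average speed is weighted by link length so that short links do not dominate.

```python
from collections import defaultdict


def summarize_links(links):
    """Route miles, lane miles, average speed, and per-lane capacity range by (facility, area) type."""
    acc = defaultdict(lambda: {"route_miles": 0.0, "lane_miles": 0.0,
                               "speed_x_miles": 0.0, "cap_per_lane": []})
    for link in links:
        a = acc[(link["facility_type"], link["area_type"])]
        a["route_miles"] += link["miles"]
        a["lane_miles"] += link["miles"] * link["lanes"]
        a["speed_x_miles"] += link["speed"] * link["miles"]
        a["cap_per_lane"].append(link["capacity"] / link["lanes"])
    return {
        key: {
            "route_miles": a["route_miles"],
            "lane_miles": a["lane_miles"],
            "avg_speed": a["speed_x_miles"] / a["route_miles"],  # length-weighted
            "min_cap_per_lane": min(a["cap_per_lane"]),
            "max_cap_per_lane": max(a["cap_per_lane"]),
        }
        for key, a in acc.items()
    }


# Hypothetical links.
links = [
    {"facility_type": "freeway", "area_type": "urban", "miles": 1.0, "lanes": 3, "speed": 55, "capacity": 6000},
    {"facility_type": "freeway", "area_type": "urban", "miles": 2.0, "lanes": 2, "speed": 60, "capacity": 4000},
    {"facility_type": "arterial", "area_type": "suburban", "miles": 0.5, "lanes": 2, "speed": 35, "capacity": 1800},
]
for key, row in sorted(summarize_links(links).items()):
    print(key, row)
```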
Detailed network checks should be made both in terms of network connectivity and network attributes.
Connectivity Checks
Visual inspections of individual roadway network links can be made using the network editing, viewing, or plotting routines provided with travel modeling software packages. Most of these packages include interactive network editors, which provide good network-checking capabilities.
Network coding conventions have a significant impact on path building. Figure 2-2 gives examples of varying levels of coding detail. A simple network intersection, shown at the top, allows for unrestricted turns. In the other coding examples, freeway ramps are coded explicitly so that only permitted movements can be made. While many modeling software packages provide capabilities for adding turn prohibitors, good traffic assignments can be performed without heavy reliance on them.
Figure 2-3 displays an example of how centroid connector coding can affect travel paths and validation results. Ideally, connectors will be attached at the points at which local streets or driveways enter the coded highway network. In a typical urban setting, zones should be connected on all four sides, roughly mid-block. If centroid connectors are attached on only one side or at an intersection, assigned volumes can be overestimated or underestimated on the streets immediately adjacent to the zone.
Some network editors provide the capability to build and display shortest paths between pairs of centroids. This process is also known as skimming the network. Skim trees show the minimum path from one zone to multiple zones; skim forests show paths from multiple zones to multiple zones. An example plot of a network path tree is shown in Figure 2-4.
Constructing and plotting paths from one zone (or node) to other zones (or nodes) makes it possible to discover illogical travel paths. In network development, skim trees are used primarily to identify missing or incorrect links or to test the coding of freeway interchanges. Typically, distance (miles) or free-flow time (e.g., based on the posted speed limit) is used as the measure of network impedance. Zone pairs should be selected so that a majority of the network links are tested. At a minimum, paths should use all major facilities crossing network screenlines (see section 2.2.3). Using skim trees early in the process allows network coding errors to be discovered before the vehicle trip table is loaded onto the network.
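Building a skim tree amounts to running a shortest-path search from one origin. The sketch below uses Dijkstra's algorithm on a small hypothetical network stored as an adjacency dictionary (not any software package's file format), with distance in miles as the impedance, and then traces the path to one destination so it can be inspected for illogical routing.

```python
import heapq


def shortest_path_tree(links, origin):
    """Dijkstra's algorithm from one origin node.

    links: dict mapping a-node -> list of (b-node, impedance) for each directed link.
    Returns (cost, pred): minimum impedance and predecessor for every reachable node.
    """
    cost = {origin: 0.0}
    pred = {origin: None}
    heap = [(0.0, origin)]
    while heap:
        c, node = heapq.heappop(heap)
        if c > cost[node]:
            continue  # stale heap entry; a shorter route was already found
        for nxt, imp in links.get(node, []):
            new_cost = c + imp
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                pred[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    return cost, pred


def trace_path(pred, dest):
    """Walk predecessors back from dest to the origin (dest must be reachable)."""
    path = []
    while dest is not None:
        path.append(dest)
        dest = pred[dest]
    return path[::-1]


# Hypothetical network: centroids 1 and 2, highway nodes 10 to 12; impedance in miles.
links = {
    1: [(10, 0.3)],
    10: [(11, 1.2), (12, 0.8)],
    11: [(2, 0.4)],
    12: [(11, 0.3), (2, 1.5)],
}
cost, pred = shortest_path_tree(links, origin=1)
print(trace_path(pred, 2), round(cost[2], 2))
```

This sketch does not stop paths from passing through other centroids; many path builders exclude centroids as intermediate nodes.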
One of the most severe (and common) network connectivity problems occurs when a zone centroid is not connected to the highway network. An easy way to locate unconnected zones is to create a skim matrix for all zones. An unconnected zone will either cause an error detected by the software, or its row in the matrix will contain extremely large impedances (e.g., 99999).
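The unconnected-zone test can be sketched by building a full zone-to-zone distance skim and looking for zones whose row or column holds only the sentinel value. The network, zone numbers, and the use of 99999 as the sentinel are illustrative assumptions.

```python
import heapq

UNREACHABLE = 99999.0  # sentinel impedance meaning "no path found"


def skim_matrix(links, centroids):
    """Zone-to-zone shortest impedances, built with one Dijkstra search per origin."""
    matrix = {}
    for origin in centroids:
        cost = {origin: 0.0}
        heap = [(0.0, origin)]
        while heap:
            c, node = heapq.heappop(heap)
            if c > cost[node]:
                continue
            for nxt, imp in links.get(node, []):
                if c + imp < cost.get(nxt, float("inf")):
                    cost[nxt] = c + imp
                    heapq.heappush(heap, (c + imp, nxt))
        matrix[origin] = {d: cost.get(d, UNREACHABLE) for d in centroids}
    return matrix


def unconnected_zones(matrix):
    """Zones that cannot reach, or cannot be reached from, any other zone."""
    zones = list(matrix)
    bad = []
    for z in zones:
        others = [o for o in zones if o != z]
        row_dead = all(matrix[z][o] >= UNREACHABLE for o in others)
        col_dead = all(matrix[o][z] >= UNREACHABLE for o in others)
        if row_dead or col_dead:
            bad.append(z)
    return bad


# Hypothetical network: centroid 3 was never given a connector.
links = {1: [(10, 0.5)], 10: [(1, 0.5), (2, 0.7)], 2: [(10, 0.7)]}
print(unconnected_zones(skim_matrix(links, [1, 2, 3])))
```

Checking columns as well as rows also catches zones with a connector coded in only one direction.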
Similar path-building checks can be performed after highway assignment using paths based on congested travel times. Selected travel time paths can be compared to results from speed/delay studies.
Other network coding errors which affect path-building and assignment results include:
- missing nodes or links,
- one-way links going in the wrong direction, and
- trips passing through centroids instead of staying on highway links.
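Two of these errors lend themselves to simple automated checks: links that reference nodes missing from the node table, and paths whose intermediate nodes include a centroid. A minimal sketch with hypothetical data structures follows; links coded in the wrong direction generally still require visual review or comparison with a reference inventory.

```python
def missing_nodes(links, nodes):
    """Links (a-node, b-node) whose a-node or b-node is not in the node table."""
    return [(a, b) for a, b in links if a not in nodes or b not in nodes]


def passes_through_centroid(path, centroids):
    """True if any intermediate node of a path (not its ends) is a zone centroid."""
    return any(n in centroids for n in path[1:-1])


# Hypothetical node table, link list, and centroid set.
nodes = {1, 2, 10, 11}
links = [(1, 10), (10, 11), (11, 12), (2, 11)]
centroids = {1, 2, 3}
print(missing_nodes(links, nodes))
print(passes_through_centroid([1, 10, 2, 11, 3], centroids))
```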
Figure 2-2: Network Coding Conventions (panels: intersection of non-controlled-access facilities; controlled-access interchange with a non-controlled-access facility, freeway with two-way arterial; controlled-access interchange with a controlled-access facility, freeway to freeway)
Figure 2-3: Coding of Centroid Connectors
Figure 2-4: Shortest Path Between Two Nodes
Highway Attributes
Highway attribute data can be reviewed in two ways: range checking to verify that input values fall within valid ranges, and color plotting using the graphical capabilities of interactive network analysis programs or geographic information systems. Paper or screen plots of attributes are effective tools for verifying network accuracy. The plot in Figure 2-5 shows this type of information graphically.
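Range checking can be sketched as a table of valid minimum and maximum values per attribute, applied to every link. The attribute names, codes, and ranges below are hypothetical placeholders; each agency would define its own.

```python
# Hypothetical valid ranges (inclusive) for coded link attributes.
VALID_RANGES = {
    "lanes": (1, 6),
    "posted_speed": (15, 70),  # mph
    "facility_type": (1, 7),
    "area_type": (1, 5),
}


def range_check(links, ranges=VALID_RANGES):
    """Return (link id, attribute, value) for every missing or out-of-range attribute."""
    errors = []
    for link in links:
        for attr, (low, high) in ranges.items():
            value = link.get(attr)
            if value is None or not (low <= value <= high):
                errors.append((link["id"], attr, value))
    return errors


links = [
    {"id": "101-102", "lanes": 2, "posted_speed": 45, "facility_type": 3, "area_type": 2},
    {"id": "102-103", "lanes": 0, "posted_speed": 45, "facility_type": 3, "area_type": 2},
    {"id": "103-104", "lanes": 2, "posted_speed": 550, "facility_type": 3},  # typo, no area type
]
for error in range_check(links):
    print(error)
```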
The following attributes should be checked and plotted where appropriate:
- Link Distance (length): Roadway link distances should be compared to straight-line distances calculated from node locations and coordinate geometry. Minimum and maximum link distances should be checked for reasonableness. Straight-line, or air-line, distances are calculated using the formula:
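$$d_{ab} = \sqrt{(x_b - x_a)^2 + (y_b - y_a)^2}$$
where:
  - $d_{ab}$ = straight-line distance between the a-node and the b-node
  - $x_a$ = x-coordinate of the a-node
  - $x_b$ = x-coordinate of the b-node
  - $y_a$ = y-coordinate of the a-node
  - $y_b$ = y-coordinate of the b-node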
The ratio of coded length to straight-line length can be plotted so that links falling outside an acceptable range (e.g., 0.9 to 1.1) can be identified.
- Posted Speed Limit (in m.p.h.): Speed limits may be used as inputs to a trip distribution model. However, motorists typically will travel faster than posted speeds under free-flow conditions.
- Facility Class: Roadways are typically classified by type, such as freeway/expressway, principal arterial, minor arterial, collector, and local-access streets. High-occupancy vehicle (HOV) lanes may be designated as a separate facility type.
- Area Type: e.g. urban, suburban, and rural. If area type and facility type are used to determine default speeds and capacities, the combined code should be checked.
- Number of Lanes: The number of functional lanes by direction is most important, but parking and turn lanes may also be used.
- Tolls or parking costs: May be coded either in dollars or minutes.
- Intersection Type
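The link distance check can be sketched with `math.hypot`, which returns the straight-line distance between two coordinate pairs. This assumes node coordinates are in the same units as coded lengths (if coordinates are in feet and lengths in miles, divide by 5,280 first) and uses the 0.9 to 1.1 range from the text as the default.

```python
import math


def link_length_ratios(links, coords, low=0.9, high=1.1):
    """Ratio of coded length to straight-line length for each link.

    links: (a-node, b-node) -> coded length.
    coords: node -> (x, y), in the same units as the coded lengths.
    Returns {(a, b): (ratio, flagged)}; coincident nodes are always flagged.
    """
    result = {}
    for (a, b), coded in links.items():
        xa, ya = coords[a]
        xb, yb = coords[b]
        straight = math.hypot(xb - xa, yb - ya)
        ratio = coded / straight if straight > 0 else float("inf")
        result[(a, b)] = (ratio, not (low <= ratio <= high))
    return result


# Hypothetical coordinates and coded lengths, both in miles.
coords = {1: (0.0, 0.0), 2: (3.0, 4.0), 3: (3.0, 5.0)}
links = {(1, 2): 5.2, (2, 3): 2.0}
for link, (ratio, flagged) in link_length_ratios(links, coords).items():
    print(link, f"{ratio:.2f}", "CHECK" if flagged else "ok")
```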
Figure 2-5: Color-Coding of Network Attributes
2.2.2 Transit Networks
Public transportation system networks and data should be reviewed. Network plots color-coded by mode can be used to help verify access links, transfer points, stop locations, station connectivity, parking lots, fare coding, etc. If possible, route itineraries should be plotted so that they can be compared with the transit operator's system map.
System-level checks for transit networks include checks on minimum and maximum headways and range checks of walk or auto access times to stations and bus stops. Walk links often have associated walk percentages by zone, which can be reviewed by looking at the zone structure along a transit route. Transit system characteristics can be listed by mode, type of vehicle, company, or route.
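The system-level headway and access-time checks can be sketched as follows. Route and access-link records, mode names, and the 20-minute walk and 30-minute auto limits are hypothetical.

```python
from collections import defaultdict


def headway_summary(routes):
    """Minimum and maximum headway (minutes) by mode."""
    by_mode = defaultdict(list)
    for r in routes:
        by_mode[r["mode"]].append(r["headway"])
    return {mode: (min(h), max(h)) for mode, h in by_mode.items()}


def access_time_outliers(access_links, max_walk=20.0, max_drive=30.0):
    """Access links with non-positive times or times above hypothetical limits (minutes).

    Every link's 'type' must be 'walk' or 'auto'.
    """
    limits = {"walk": max_walk, "auto": max_drive}
    return [link for link in access_links
            if not (0 < link["minutes"] <= limits[link["type"]])]


routes = [
    {"route": "1", "mode": "local bus", "headway": 15},
    {"route": "2", "mode": "local bus", "headway": 60},
    {"route": "X1", "mode": "express bus", "headway": 30},
    {"route": "R", "mode": "rail", "headway": 5},
]
access_links = [
    {"zone": 12, "stop": "A", "type": "walk", "minutes": 8},
    {"zone": 13, "stop": "A", "type": "walk", "minutes": 45},
    {"zone": 14, "stop": "P1", "type": "auto", "minutes": 12},
]
print(headway_summary(routes))
print([link["zone"] for link in access_time_outliers(access_links)])
```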
In addition to hard-coded transit speeds, most travel modeling software packages provide the means to relate bus speeds directly to highway (auto) speeds. The relationship between transit speed and highway speed is not the same for all highway links. For example, buses on a freeway operate at speeds that approximate auto speeds, while buses on downtown streets may operate much more slowly than auto traffic. Checks should be made to ensure that bus speeds do not exceed auto speeds (except on bus express lanes).
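The bus-versus-auto speed check reduces to a comparison on each transit segment. In this sketch, segment records with coded bus and auto speeds and a `bus_lane` flag are hypothetical.

```python
def bus_faster_than_auto(segments):
    """Transit segments where the coded bus speed exceeds the auto speed on the same link.

    Segments flagged as bus lanes are skipped, since buses there may
    legitimately travel faster than general traffic.
    """
    return [
        s for s in segments
        if not s.get("bus_lane", False) and s["bus_mph"] > s["auto_mph"]
    ]


segments = [
    {"route": "12", "link": (101, 102), "bus_mph": 14.0, "auto_mph": 22.0},
    {"route": "12", "link": (102, 103), "bus_mph": 31.0, "auto_mph": 28.0},
    {"route": "X4", "link": (200, 201), "bus_mph": 50.0, "auto_mph": 35.0, "bus_lane": True},
]
for s in bus_faster_than_auto(segments):
    print(s["route"], s["link"])
```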
In some transit modeling software, it may be possible to trace shortest transit paths and compare differences between competing routes in a corridor. For example, routes coded over the same roadway section should have the same stop nodes (unless explicitly different as between a local and express route).
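Stop-node consistency on a shared roadway section can be checked by comparing each route's stops in that section with the stops served by any route there. A minimal sketch, assuming route itineraries are available as ordered stop-node lists (hypothetical data); express routes that legitimately skip stops should be left out of the comparison.

```python
def stop_mismatches(route_stops, section_nodes):
    """Stops within a shared section that some routes serve and others do not.

    route_stops: route -> ordered list of stop nodes on the route.
    section_nodes: set of nodes forming the shared roadway section.
    Returns route -> sorted stops in the section that the route does not serve.
    """
    in_section = {r: set(stops) & section_nodes for r, stops in route_stops.items()}
    union = set().union(*in_section.values())
    return {r: sorted(union - s) for r, s in in_section.items() if union - s}


# Hypothetical local routes sharing the section 101-104.
route_stops = {"5": [100, 101, 102, 103, 104], "7": [90, 101, 102, 104, 110]}
print(stop_mismatches(route_stops, {101, 102, 103, 104}))
```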
One typical source of error with transit modeling occurs when bus routes traverse local streets not coded in highway networks. It may be desirable to code special transit-only links to allow for routes that deviate significantly from the coded network in order to account for additional travel time on local streets.
2.2.3 System Performance and Validation Data
In addition to the data collected as inputs to the travel models, it is important to collect and review the system performance data that will be used in the validation process. The most common types of validation data are highway traffic volumes, highway speeds and travel times, and transit ridership.
Traffic Volumes
Average daily traffic (ADT) and peak-hour traffic volumes are collected at a number of locations throughout the region. Two methods are commonly used: 1) automatic traffic counters (in one or both directions), and 2) manual vehicle counts. ADT is typically collected using automatic counters, while counts classifying vehicles by type (e.g., automobile, light truck, heavy truck, motorcycle) are typically done manually. Manual counts can also be used to collect vehicle occupancy data.
Sufficient coverage of traffic counts may already be available from permanent count locations. Additional counts may be needed on critical links, especially along the imaginary lines used to assess model validation. These lines are described below, and screenlines and cutlines are shown in Figures 2-6 and 2-7:
- Screenlines typically extend completely across the modelled area, from boundary cordon to boundary cordon. For example, a river that passes completely through the area makes an excellent screenline: travel demand between the two sides of the river must cross this screenline within the study area boundary. Screenlines are often associated with physical barriers such as rivers or railroads; however, jurisdictional boundaries such as county lines that extend through the study area also make excellent screenlines.
- Cutlines extend across a corridor containing multiple facilities. They should be used to intercept travel along only one axis.
- Cordon lines completely encompass a designated area. Cordon lines are typically associated with the boundary of the area being modelled. However, for model validation purposes, it is also helpful to develop internal cordon lines or boundaries. For example, a cordon around the central business district is useful in validating the "ins and outs" of CBD-related traffic demand. Over- or underestimates of trips bound for the CBD could indicate errors in the socioeconomic data (employment data for the CBD) or errors in the trip distribution or mode choice model.
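Screenline and cordon comparisons come down to summing counts and modeled volumes over the facilities that cross the line. A sketch with hypothetical facility names and volumes; no acceptance threshold is implied.

```python
def screenline_check(crossings):
    """Compare total counted and modeled volumes across one screenline or cordon.

    crossings: list of (facility, observed count, modeled volume).
    Returns (total observed, total modeled, model/count ratio, percent difference).
    """
    observed = sum(count for _, count, _ in crossings)
    modeled = sum(model for _, _, model in crossings)
    ratio = modeled / observed
    pct_diff = 100.0 * (modeled - observed) / observed
    return observed, modeled, ratio, pct_diff


# Hypothetical river screenline with three bridge crossings (daily volumes).
crossings = [
    ("Main St bridge", 24_000, 26_500),
    ("I-99 bridge", 68_000, 64_000),
    ("Rail Ave bridge", 9_000, 11_000),
]
observed, modeled, ratio, pct_diff = screenline_check(crossings)
print(f"count {observed}, model {modeled}, ratio {ratio:.3f}, diff {pct_diff:+.1f}%")
```

A close total can still hide offsetting errors on individual crossings, so the facility-level differences are worth reviewing as well.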
Figure 2-6: Example of Screenline Locations
Figure 2-7: Example of Cutline Locations
If multiple counties are included in the modelled area, each county boundary can form either a cordon line or a screenline, depending on its location within the area. Using county boundaries as cordon lines and screenlines allows Census data to be used to validate the home-based work trip distribution. The Census data provide county-level summaries of place of residence versus place of employment, and this county-to-county distribution of home and workplace locations can be used as a surrogate for observed work trips.
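A sketch of the county-to-county comparison, assuming both the Census journey-to-work table and the modeled home-based work trip table have been aggregated to (home county, work county) pairs. Because Census workers and modeled trips are in different units, the sketch compares each pair's share of its residence county's total rather than raw values; this normalization is a design choice, not a prescribed method.

```python
def county_flow_shares(flows):
    """Share of each residence county's total going to each work county.

    flows: dict mapping (home county, work county) -> workers or trips.
    """
    totals = {}
    for (home, _), v in flows.items():
        totals[home] = totals.get(home, 0) + v
    return {(h, w): v / totals[h] for (h, w), v in flows.items()}


def compare_flows(census, model):
    """Model share minus Census share for every county pair present in either table."""
    c, m = county_flow_shares(census), county_flow_shares(model)
    return {pair: m.get(pair, 0.0) - c.get(pair, 0.0) for pair in set(c) | set(m)}


# Hypothetical two-county region.
census = {("A", "A"): 800, ("A", "B"): 200, ("B", "A"): 150, ("B", "B"): 350}
model = {("A", "A"): 1500, ("A", "B"): 500, ("B", "A"): 300, ("B", "B"): 700}
for pair, diff in sorted(compare_flows(census, model).items()):
    print(pair, f"{diff:+.3f}")
```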
Each roadway that crosses a screenline must be taken into account. Roadways which carry significant traffic volumes should be coded into the roadway network and traffic count data should be included. Minor roadways which carry very low traffic volumes may be omitted from the network and from the traffic count database, but their volumes should be estimated and accounted for in the validation analysis.
The traffic counts should be collected during the same year for which the model is being validated. In order to obtain the most typical estimate of ADT, FHWA recommends that a minimum of one midweek 24-hour count be taken at least every two years. Three-day counts can be averaged to improve reliability. Factors can also be applied to the count to relate weekday to average week traffic, and to relate a given month to average monthly conditions. Peak and off-peak traffic volumes can be taken directly from automatic tube counters or from hourly classification counts.
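The averaging and factoring steps can be sketched as follows. The day-of-week and monthly factor values are hypothetical; in practice they would come from permanent count stations.

```python
def estimate_adt(daily_counts, day_factor=1.0, month_factor=1.0):
    """Average several 24-hour counts, then apply adjustment factors.

    day_factor relates weekday traffic to average-day-of-week traffic;
    month_factor relates the count month to average monthly conditions.
    """
    average = sum(daily_counts) / len(daily_counts)
    return average * day_factor * month_factor


# Hypothetical three-day midweek count and factors.
adt = estimate_adt([18_200, 17_900, 18_500], day_factor=0.95, month_factor=1.04)
print(round(adt))
```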
Counts should be reviewed for reasonableness using measures such as volume per lane; for example, a count implying 4,000 vehicles per lane per hour would be well above typical lane capacity and should be investigated.
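The per-lane reasonableness check can be sketched as a filter on hourly counts. The 2,400 vehicles per lane per hour ceiling and the station records are assumptions for illustration.

```python
def implausible_counts(counts, max_per_lane_hour=2_400):
    """Hourly counts whose volume per lane exceeds an assumed lane-capacity ceiling.

    'lanes' is the number of lanes in the counted direction(s).
    """
    return [c for c in counts if c["vehicles_per_hour"] / c["lanes"] > max_per_lane_hour]


# Hypothetical peak-hour counts.
counts = [
    {"station": "S-14", "lanes": 3, "vehicles_per_hour": 5_700},
    {"station": "S-22", "lanes": 2, "vehicles_per_hour": 8_000},
]
print([c["station"] for c in implausible_counts(counts)])
```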
Speed (or Travel Time)
Speed measurements are particularly important for validating modeled speeds, which are used as inputs to air quality emissions models. Observed speed data can be posted on network links along with other attribute data. Speeds can be collected for peak and/or off-peak periods using floating car runs or radar detection. Because speed data are costly to collect, many areas have very limited information for the highway network. Ideally, speed data should be collected at as many locations as possible for each combination of area type and facility type (e.g., urban freeway, suburban principal arterial).
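Floating car runs can be reduced to an average speed by dividing total distance by total travel time, then compared with the modeled speed. Segment length, run times, and the modeled speed below are hypothetical.

```python
def floating_car_speed(segment_miles, run_seconds):
    """Average travel speed (mph) over several floating-car runs of one segment.

    Total distance over total time; for a fixed-length segment this equals
    the harmonic mean of the individual run speeds.
    """
    total_hours = sum(run_seconds) / 3600.0
    return segment_miles * len(run_seconds) / total_hours


def percent_difference(modeled, observed):
    """Modeled minus observed, as a percentage of observed."""
    return 100.0 * (modeled - observed) / observed


# Hypothetical 2.0-mile segment driven three times.
observed = floating_car_speed(2.0, [240, 300, 360])
print(f"observed {observed:.1f} mph, model error {percent_difference(27.0, observed):+.1f}%")
```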
Transit Ridership
Three sources of public transportation data are onboard origin-destination surveys, load point checks, and ride checks. Onboard surveys are typically used in the calibration process and should be used to validate total transit trips, trips by route, and trip interchanges made on public transportation. Passenger load checks are performed at locations selected for proximity to the maximum load point of a route. Typical information includes headways and schedules, passenger loads compared with seats available, and boarding/alighting activity at that location. Ride checks involve having an individual ride a transit vehicle and record the number of passengers boarding and alighting at each stop.
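Ride check data can be turned into a load profile by accumulating boardings minus alightings stop by stop, which also identifies the maximum load point used to site load checks. A sketch with hypothetical stop names and counts:

```python
from itertools import accumulate


def load_profile(ride_check):
    """Passenger load leaving each stop, from (stop, boardings, alightings) records."""
    changes = [ons - offs for _, ons, offs in ride_check]
    return [(stop, load) for (stop, _, _), load in zip(ride_check, accumulate(changes))]


def max_load_point(ride_check):
    """The stop after which the on-board load is highest, with that load."""
    return max(load_profile(ride_check), key=lambda pair: pair[1])


ride_check = [
    ("Terminal", 12, 0),
    ("Oak St", 9, 2),
    ("Hospital", 15, 4),
    ("Downtown", 3, 20),
    ("Depot", 0, 13),
]
print(load_profile(ride_check))
print(max_load_point(ride_check))
```

A profile that does not return to zero at the last stop indicates a counting or recording error.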
2.3 Forecasting and Monitoring Model Inputs
Checking the reasonableness of model inputs for the validation base year is only the first step in the process. In order to produce projections of future travel, the models must be applied to forecasts of future population, employment, and other socioeconomic variables. Monitoring is used to determine whether development trends and transportation system characteristics are evolving as forecast.
Socioeconomic Inputs
There is more uncertainty (and potential for error) in predicting future socioeconomic inputs for TAZs than in estimating base-year values. Forecasts are typically made at the traffic analysis zone level for population and households, mean or median income, auto ownership or availability, and employment by type (retail vs. non-retail). Demographic relationships (persons/household, workers/household, employment/population ratios, etc.) and growth rates should be checked for consistency with expectations, assumptions, and policies. Significant changes in land use must also be carefully evaluated for reasonableness with respect to regional and local growth rates, in both absolute and relative terms.
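The absolute and relative growth checks can be sketched per zone against the regional growth rate. The zone data and both thresholds (a 500-unit change, or growth more than three times the regional rate) are illustrative assumptions.

```python
def flag_zone_growth(base, forecast, abs_limit=500, rel_multiple=3.0):
    """Flag zones whose change is large in absolute terms or relative to regional growth.

    base, forecast: dicts mapping TAZ -> households (or employment); zones that
    appear only in the forecast are ignored in this sketch.
    Returns (regional growth rate, {taz: (change, zone growth rate)}).
    """
    regional_rate = sum(forecast.values()) / sum(base.values()) - 1.0
    flagged = {}
    for taz, b in base.items():
        change = forecast[taz] - b
        if b > 0:
            rate = change / b
        else:
            rate = float("inf") if change else 0.0  # growth from zero
        too_big = abs(change) > abs_limit
        too_fast = abs(rate) > rel_multiple * abs(regional_rate)
        if too_big or too_fast:
            flagged[taz] = (change, rate)
    return regional_rate, flagged


# Hypothetical base-year and forecast households by TAZ.
base = {1: 1000, 2: 400, 3: 0, 4: 2000}
forecast = {1: 1100, 2: 1000, 3: 50, 4: 2050}
regional_rate, flagged = flag_zone_growth(base, forecast)
print(f"regional growth {regional_rate:.1%}", flagged)
```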
Care must also be taken to express income, parking and operating costs, transit fares, etc. in constant dollars.
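Converting nominal values to constant dollars uses a price index such as the Consumer Price Index. The index values here are hypothetical.

```python
def to_constant_dollars(nominal, price_index_year, price_index_base):
    """Express a nominal amount in base-year dollars using a price index."""
    return nominal * price_index_base / price_index_year


# Hypothetical index values: base year = 100.0, forecast-input year = 125.0.
fare_base_dollars = to_constant_dollars(1.50, price_index_year=125.0, price_index_base=100.0)
print(f"{fare_base_dollars:.2f}")
```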
Transportation Networks
The transportation networks (infrastructure and operational characteristics) for future alternatives are typically well defined in long-range transportation plans and other documents. They are essentially treated like base networks, although capacities and speeds may be adjusted to reflect more advanced signal coordination systems or intelligent transportation systems (ITS) strategies.
For future year analyses which involve updates to the base network, once the existing base (or "no-build") highway network has already been checked, the simplest check of the accuracy of coding highway network changes is to overlay the build network over the no-build network to check the differences.
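A tabular complement to the visual overlay is a link-by-link comparison listing added links, removed links, and changed attributes. Networks here are hypothetical dictionaries keyed by (a-node, b-node).

```python
def diff_networks(no_build, build):
    """Compare two networks keyed by (a-node, b-node) -> attribute dict.

    Returns (added links, removed links, {link: {attr: (no-build, build)}}).
    """
    added = sorted(set(build) - set(no_build))
    removed = sorted(set(no_build) - set(build))
    changed = {}
    for link in set(build) & set(no_build):
        diffs = {
            attr: (no_build[link].get(attr), build[link].get(attr))
            for attr in set(no_build[link]) | set(build[link])
            if no_build[link].get(attr) != build[link].get(attr)
        }
        if diffs:
            changed[link] = diffs
    return added, removed, changed


# Hypothetical no-build and build networks.
no_build = {(1, 2): {"lanes": 2, "speed": 45}, (2, 3): {"lanes": 2, "speed": 35}}
build = {
    (1, 2): {"lanes": 3, "speed": 45},  # widening project
    (2, 3): {"lanes": 2, "speed": 35},
    (3, 4): {"lanes": 4, "speed": 65},  # new facility
}
added, removed, changed = diff_networks(no_build, build)
print("added:", added, "removed:", removed, "changed:", changed)
```

Each listed difference should correspond to a project in the alternative being tested; anything else points to a coding error.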




