Model Validation and Reasonableness Checking Manual
1.0 Introduction
A major shortcoming of many travel demand models is the lack of attention and effort placed on the validation phase of model development. Validation involves testing the model's predictive capabilities. Travel models need to be able to replicate observed conditions within reason before being used to produce future-year forecasts. As metropolitan areas continue to refine and improve the travel demand forecasting process, the credibility of the process with decision makers will depend largely on the ability of analysts to properly validate procedures and models used.
The travel modeling process has undergone many changes in the past few years in order to evaluate more complex policy actions resulting from legislation such as ISTEA and the Clean Air Act. As travel models have become more complex, so have the procedures needed to validate them. Often there is a tradeoff between increasing confidence in the level of accuracy of the models and the cost of data collection and effort required to validate models. Tests or checks used to evaluate the reliability of models can range from a simple assessment of the reasonableness of model outputs to sophisticated statistical techniques.
1.1 Purpose of Manual
This manual builds upon the 1990 Federal Highway Administration publication Calibration and Adjustment of System Planning Models (FHWA-ED-90-015). That manual provided a set of simple procedures for calibrating travel models, reflecting the fact that few regions at the time had current household travel survey data available.
Since 1990, many regions have conducted, or are planning to conduct, new household travel surveys and other data collection efforts to improve their ability to develop and validate more detailed and rigorous models. In addition, the Travel Model Improvement Program has provided technical assistance, aiding planning organizations in implementing state-of-the-art modeling practices. This validation manual provides guidance on how to perform reasonableness checks on the latest generation of models commonly included in the four-step modeling process. While it is impossible to specify exact checks for every possible model, this manual will describe families of checks and provide concrete examples of validation checks. The manual also provides tips for regions with limited resources for model validation.
The manual should serve as a set of guidelines for best practice, not as a list of required steps. The process used to validate a travel model is dependent on the purpose of the model, available data resources, model structure, and desired level of accuracy. Improving the performance of travel models depends not only on the proper calibration of parameters, but also on careful review of exogenous inputs. Typical inputs include (1) zonal socioeconomic inputs such as population, households, employment, income or auto ownership, and school enrollment; and (2) transportation system characteristics such as highway and transit network definition and attributes. Therefore, this manual prescribes a number of reasonableness tests for model inputs, model parameters, and model outputs.
One difficulty in prescribing a set of procedures for validating models is that the concepts of model validation, calibration, and estimation have taken on different meanings and sometimes overlap in their objectives. In practice, travel model development usually involves all three steps, as well as model application, as shown in Figure 1-1. In this manual, the following definitions are used:
Figure 1-1. Role of Model Validation

- Model Estimation: Statistical estimation procedures are used to find the values of the model parameters (esp. coefficients) which maximize the likelihood of fitting observed travel data, such as a household travel survey or on-board transit survey. The focus is on correctly specifying the form of the model and determining the statistical significance of the variables. For example, the initial cross-classification of a trip production model or the logit estimation of level-of-service coefficients in a mode choice model are developed in the estimation phase. If local data are not available, then this initial step is often skipped and the coefficients are borrowed from another urban area.
- Model Calibration: After the model parameters have been estimated, calibration is used to adjust parameter values until predicted travel matches observed travel demand levels in the region. For example, calibration of the mode-specific constants in a mode choice model ensures that the estimated shares match the observed shares by mode (and often by mode of access).
- Model Validation: In order to test the ability of the model to predict future behavior, validation requires comparing the model predictions with information other than that used in estimating the model. This step is typically an iterative process linked to model calibration. It involves checking the model results against observed data and adjusting parameters until model results fall within an acceptable range of error. If the only way that a model will replicate observed data is through the use of unusual parameters and procedures or localized "quick-fixes", then it is unlikely that the model can reliably forecast future conditions.
- Model Application: Although the model may replicate base year conditions, the application of the model to future year conditions and policy options requires checking the reasonableness of projections, so there is a link between application and validation as well. The sensitivity of the models in response to system or policy changes is often the main issue in model application.
The focus of this manual is on the iterative process shaded in the figure which links validation with calibration. It is not a manual on travel model development. While the estimation phase of model development does have a link to validation, this manual assumes that the final model structure, especially the inclusion of relevant variables and specification of initial parameters, has already been determined.
1.2 Target Audience
The model validation manual should prove to be a useful reference for the following persons:
- Travel Forecasters
  - responsible for model calibration and/or validation
  - responsible for model application
  - employees of metropolitan planning organizations (MPOs), states, municipalities and counties, and consultants
- Transportation Planners
  - responsible for evaluation of plans
  - responsible for designing alternatives
  - employees of metropolitan planning organizations (MPOs), states, municipalities and counties, and consultants
- Decision-makers
  - at overview level to know the questions to ask
  - employees of metropolitan planning organizations (MPOs), states, municipalities and counties
- Members of the public with an interest in travel forecasting
1.3 Overview of the Model Validation Process
Typically the calibration and validation processes focus only on the overall results of the travel model, especially highway volumes at screenline crossings. The models are run to obtain the necessary output such as mode shares, overall transit ridership, transit boardings for a specific line, or traffic volumes, without detailed checking of results from individual model components. This "all-too-common" approach to model validation might be used under the justification that traffic counts or transit boardings are the only historical data available or because time constraints preclude detailed checking of interim model steps.
The approach advocated in this Validation Manual is to apply reasonableness checks during the processes of calibrating each individual model component. After each component has been validated, the overall set of models is validated to ensure that each is properly interfaced and that modeling error is not propagated by chaining the models together. Figure 1-2 presents an overview of the validation process contrasting the desired approach with the "all-too-common" approach.
Individual model validations are used as part of calibration to show that each component reasonably reproduces observed travel characteristics. For example, trip generation models should be checked to ensure that trip productions and attractions estimated on a district and regional basis are reasonably similar to the observed number of trips; trip distribution models are checked to ensure that they reasonably reproduce the observed average trip lengths by trip purpose; etc.
Validation of the overall set of models tests the effects of compounding errors. For example, suppose that the trip production model produced too few trips from a zone that was relatively close to a large attractor of trips. If these trip generation results are input to the trip distribution model, they would have a tendency to increase trip lengths because of the error in trip production modeling. Overall measures of model performance, such as regional VMT and screenline volumes, should be reviewed with the possibility of error propagation in mind.
The following steps summarize the recommended overall model calibration and validation process:
- Estimate model parameters and test the specification of the model structure using a household travel survey data set.
- Calibrate model parameters to reproduce desired regional control totals.
- Validate each model component to ensure that reasonable results are produced, and that observed conditions are replicated. When available, use independent data sets to validate individual model components.
- Apply travel model chain using initial calibrated parameters. Check overall aggregate measures (such as VMT by facility type and speed ranges, and screenline/cutline volumes). Compare modeled volumes with observed traffic counts.
- Evaluate results from the steps above to determine whether systemwide and/or localized problems have occurred in the model application.
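The aggregate comparison in the fourth step can be sketched in a few lines. The example below is a minimal illustration, not a prescribed procedure: it sums modeled and counted volumes by screenline and reports the modeled/observed ratio. The function name, the record layout, and the link volumes are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def screenline_ratios(links):
    """Return modeled/observed volume ratios by screenline.

    `links` is an iterable of (screenline_id, modeled_volume, counted_volume)
    tuples. This record layout is an assumption made for the example.
    """
    modeled = defaultdict(float)
    counted = defaultdict(float)
    for screenline_id, modeled_volume, counted_volume in links:
        modeled[screenline_id] += modeled_volume
        counted[screenline_id] += counted_volume
    # Skip screenlines with no counted volume to avoid dividing by zero.
    return {sid: modeled[sid] / counted[sid] for sid in counted if counted[sid] > 0}


# Hypothetical link records: (screenline, modeled daily volume, counted daily volume)
links = [
    ("north", 21000, 20000),
    ("north", 9000, 10000),
    ("river", 44000, 40000),
]

ratios = screenline_ratios(links)
for sid, ratio in sorted(ratios.items()):
    print(f"{sid}: modeled/observed = {ratio:.2f}")
```

In this made-up data the two "north" links err in opposite directions and cancel at the screenline level, which is one reason individual link checks are still worthwhile after the screenline totals look acceptable.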
1.4 Validation Issues
Before presenting the validation checks in the following chapters, it is useful to consider a number of issues regarding the types of checks which are used, the level of aggregation, data sources, accuracy requirements, and sources of error.
1.4.1 Types of Validation Checks
As noted earlier in the Introduction, the approach used to validate travel models can vary a great deal depending on a variety of factors such as the types of policy options being tested and the availability of historical data. This Validation Manual provides a range of validation measures for both base year calibration and future year application of models.
Two major categories of validation checks are used in this report:
Reasonableness Checks: These include comparison of rates and parameters, total regional values, subregional values, logic tests, etc. Parameters should be checked against observed values, parameters estimated in other regions, or secondary data sources for consistency. The models should be evaluated in terms of acceptable levels of error, their ability to perform according to theoretical and logical expectations, and the consistency of model results with the assumptions used to generate them.
Sensitivity Tests: These include response to transportation system, socioeconomic, or policy changes. Sensitivity is often expressed as the elasticity of a variable. For example, one might examine the impact on travel demand if parking costs were to double or if bus headways were reduced dramatically. Sensitivity analysis should be used for all components of the modeling process, prior to application of the model for forecasting. It is important because projected policies (e.g. tolls) or conditions (e.g. high congestion levels) might not exist in the base year.
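As a rough illustration of expressing sensitivity as an elasticity, the sketch below computes an arc (midpoint) elasticity from two model runs. The arc form is only one of several elasticity definitions and is an assumption here, as are the function name and the parking cost and trip figures, which are invented for the example.

```python
def arc_elasticity(demand_before, demand_after, value_before, value_after):
    """Arc (midpoint) elasticity of demand with respect to a changed variable.

    Percentage changes are measured against the midpoint of the before and
    after values, so the result does not depend on which run is the base.
    """
    pct_change_demand = (demand_after - demand_before) / ((demand_after + demand_before) / 2.0)
    pct_change_value = (value_after - value_before) / ((value_after + value_before) / 2.0)
    return pct_change_demand / pct_change_value


# Hypothetical test: parking cost doubles from 5 to 10 and the model's
# auto trips to the affected area fall from 1,000 to 900.
elasticity = arc_elasticity(1000, 900, 5.0, 10.0)
print(f"Arc elasticity of auto trips with respect to parking cost: {elasticity:.3f}")
```

The resulting value can then be compared with elasticities reported for other regions to judge whether the model's response is plausible.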
Throughout this manual, a number of validation tests will be described which compare observed and estimated values for a given model output (e.g. trips produced, daily link volumes) over a number of observations (e.g. TAZs, links with traffic counts).
There are four common approaches to evaluating how well the model estimates match the observed data:
- Absolute difference: Calculated as the actual difference, i.e. Estimated - Observed. The sign (positive or negative) may be an important indicator of performance.
- Relative difference: Values are normalized to remove scaling effects. Can be expressed as a percentage difference (e.g. acceptable range might be ±10%) or as a ratio (e.g. 0.9 to 1.1) and are calculated as follows:
Percentage difference = ((Estimated - Observed) / Observed) * 100
Ratio = Estimated / Observed
- Correlation: In regression analysis, an equation is estimated which relates a dependent (or unknown) variable to one or more independent variables. Correlation analysis determines the degree to which the variables are related, i.e. how well the estimating equation actually describes the relationship. In the case of model validation, we determine the degree to which observed and estimated values are related. The most commonly used measure of correlation is the coefficient of determination R², which describes the amount of variation in the dependent variable which is explained by the regression equation. R² can range from 0 to 1, with a value of 0 for no correlation and 1 for perfect correlation. Acceptable values of R² can vary depending on the type of comparison being made, but it would ideally explain more than half of the variation (R² > 0.5). Note that as aggregation increases, the amount of correlation will increase.
- Variance: Statistical measures can be calculated which measure the variance between observed and estimated values. The most common measure for validation purposes is the Percent Root Mean Square Error (RMSE) which is described in section 7.1.3 Highway Assignment.
These validation tests can be easily calculated with a spreadsheet, database, or statistical package. For example, to estimate a regression line, most spreadsheet packages simply require that the observed and estimated values be placed in columns - the regression equation and R² are calculated using a simple command. For additional information, you may want to consult an introductory statistics textbook.
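The four measures can also be computed with a short script. The sketch below is illustrative only. It assumes that every observed value is nonzero, that the coefficient of determination comes from a simple one-variable regression (so it equals the squared correlation between observed and estimated values), and that percent RMSE divides the squared errors by n - 1 and normalizes by the mean observed value. The manual defers its own definition to section 7.1.3, so that divisor is an assumption. The function name and sample volumes are hypothetical.

```python
import math


def validation_measures(observed, estimated):
    """Compare estimated values with observed values for the same locations."""
    n = len(observed)
    if n != len(estimated) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    # Absolute difference keeps its sign: positive means overestimation.
    abs_diff = [e - o for o, e in zip(observed, estimated)]
    # Relative difference as a percentage and as a ratio (observed must be nonzero).
    pct_diff = [100.0 * (e - o) / o for o, e in zip(observed, estimated)]
    ratio = [e / o for o, e in zip(observed, estimated)]

    # Coefficient of determination for a one-variable linear regression,
    # computed as the squared Pearson correlation.
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    sxy = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    sxx = sum((o - mean_o) ** 2 for o in observed)
    syy = sum((e - mean_e) ** 2 for e in estimated)
    r_squared = (sxy * sxy) / (sxx * syy)

    # Percent RMSE: assumed form using an (n - 1) divisor and the mean observed value.
    rmse = math.sqrt(sum(d * d for d in abs_diff) / (n - 1))
    pct_rmse = 100.0 * rmse / mean_o

    return {
        "abs_diff": abs_diff,
        "pct_diff": pct_diff,
        "ratio": ratio,
        "r_squared": r_squared,
        "pct_rmse": pct_rmse,
    }


# Hypothetical daily link volumes (counts versus model estimates).
observed = [1000, 2000, 3000, 4000]
estimated = [1100, 1900, 3300, 3800]
measures = validation_measures(observed, estimated)
print(f"R-squared = {measures['r_squared']:.3f}, percent RMSE = {measures['pct_rmse']:.1f}")
```

Keeping the signed differences alongside the summary statistics makes it easier to see whether errors are systematic (mostly one sign) or scattered.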
1.4.2 Level of Aggregation
Some researchers differentiate between the calibration procedures used for aggregate or first-generation models, such as zone-based regression models, and the disaggregate or second-generation models, such as individual-based choice models. With the first-generation models, calibration may involve trial-and-error adjustment of parameters which improve the overall goodness of fit between the model results and the observed data. With the second-generation models, much more attention is placed on the statistical properties of the parameters and the confidence limits of the estimated values.
Similar to calibration procedures, validation checks also vary by the level of aggregation. There is a continuum of checks ranging from validation using disaggregate data at the household level to aggregate results at the regional level. In the middle would be validation checks using the models applied to zonal data. For state-of-the-art disaggregate models, the entire range of checks is needed to ensure that the models can reproduce not only the travel behavior of individual households, but also the resulting performance of the transportation system when all of the individual trips are aggregated over the entire metropolitan area. The two ends of the continuum are defined below:
Disaggregate Validation provides a means of exploring how well a candidate model fits the observed data at the household or individual level. It involves defining subgroups of observations, based, for example, on household size and income or auto ownership levels. Model predictions are compared with observed data to reveal systematic biases. Note that disaggregate validation plays more of a role in the estimation phase of model development.
Aggregate Validation provides a general overview of model performance through regional travel characteristics such as average trip rates, average trip lengths, average mode shares, and regional vehicle-miles of travel (VMT). Reasonable ranges for model parameter values have been included in the manual for comparative purposes. Travel models are applied to aggregate data at the regional, county, district, or zonal level. Traffic assignment results are validated at a regional level, using screenline volumes, and then at a local level, using cutline and individual link volumes.
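The subgroup comparison used in disaggregate validation can be sketched as follows. This is a minimal illustration under stated assumptions: households are grouped by a single attribute (household size), and bias is measured as mean predicted trips minus mean observed trips within each group. The function name, record layout, and trip figures are hypothetical.

```python
from collections import defaultdict


def subgroup_bias(households):
    """Return mean predicted minus mean observed trips for each subgroup.

    `households` is an iterable of (group, observed_trips, predicted_trips)
    tuples. A bias that is consistently nonzero for one group suggests a
    systematic error for that market segment.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # observed sum, predicted sum, count
    for group, observed_trips, predicted_trips in households:
        entry = totals[group]
        entry[0] += observed_trips
        entry[1] += predicted_trips
        entry[2] += 1
    return {
        group: (pred_sum / count) - (obs_sum / count)
        for group, (obs_sum, pred_sum, count) in totals.items()
    }


# Hypothetical survey households: (household size, observed trips, predicted trips)
households = [
    (1, 3, 4),
    (1, 5, 4),
    (4, 12, 9),
    (4, 10, 9),
]

bias = subgroup_bias(households)
for group, value in sorted(bias.items()):
    print(f"household size {group}: mean predicted - mean observed = {value:+.1f}")
```

In this invented data the model matches one-person households on average but underpredicts trips for four-person households, a pattern that a single regional trip total would hide.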
1.4.3 Validation Data Sources
In order to sufficiently prove a model has been validated, the model should match observed data from an independent data source. Each chapter of this manual will discuss necessary validation data sources in detail.
While not an independent source, the calibration data set (typically from a household travel survey) is used in validation. Other travel surveys may be available for validation such as workplace/establishment, on-board transit, roadside origin-destination, and external cordon surveys. The Census Public Use Microdata Sample (PUMS) provides socioeconomic and travel behavior data at the household level.
For disaggregate models, particularly choice models with a large enough sample, a validation sample can be created by splitting the observed data set into two random groups. One sample is used for calibration, and the calibrated models are used to predict the second group's demand. A similar approach identifies stratification biases within the population by applying the models to a segment of the calibration data set. While this process does provide an independent set of observations, it lacks temporal variation.
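A random split of the observed data set might look like the sketch below. It is illustrative only: the function name, the 50/50 default share, and the fixed seed (used so the split can be reproduced) are assumptions, and the integer records stand in for survey observations.

```python
import random


def split_sample(records, calibration_share=0.5, seed=0):
    """Randomly split records into a calibration group and a validation group."""
    shuffled = list(records)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * calibration_share))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical survey records, identified here only by household number.
survey_records = list(range(10))
calibration_set, validation_set = split_sample(survey_records)
print(f"{len(calibration_set)} calibration records, {len(validation_set)} validation records")
```

The models would be calibrated on the first group and then applied to the second, with predicted and observed demand compared for the held-out records.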
The best estimate of socioeconomic data should be available locally, although these inputs should still be reviewed for reasonableness, particularly changes over time. Transportation system data can be compiled from other public agencies, such as the local highway administration or transit operator. Typical validation data includes daily and peak hour traffic volumes at screenlines, cutlines, critical links, and transit boardings by route.
A number of national data summaries provide comparative data including:
- FHWA's Highway Performance Monitoring System
- Census Transportation Planning Package
- Nationwide Personal Transportation Survey
Comparisons can also be made with observed data from other similar metropolitan areas. NCHRP Report 187 is being updated as the forthcoming NCHRP Report 365, Travel Estimation Techniques for Urban Planning. The transferable parameters contained in that report are useful for validation purposes.
Zonal socioeconomic input data and transportation system performance data should be collected for the same base year. Since virtually all transportation models have been based on cross-sectional survey data, there has been a tendency to view validation exclusively in terms of the ability of the model to match observed traffic volumes for a single base year. However, individual model components and the overall set of models should also be tested by predicting demand for a different historical time period than was used for calibration. When the models are applied to historical data, this is often referred to as backcasting. Unfortunately, consistent historical data for more than one time period are rarely available.
1.4.4 Sources of Error
Even when models reasonably reproduce their portions of regional travel, they are not without error. Error is inherent in all models since they are abstractions of real travel behavior; simplifications of reality are unavoidable in order to make the models usable and practical. Sources of error resulting from development and calibration of travel models include:
- Measurement Errors inherent in the process of measuring data in the base year, such as survey questions, network coding and digitizing errors, etc. resulting from poor data quality control.
- Sampling Errors such as bias introduced in the process of selecting the set of observations from the population.
- Computational Errors due to arithmetic mistakes, which are typically small for computer-based calculations.
- Specification Errors due to improper structure of the model, such as omission of a relevant variable.
- Transfer Errors when a model or parameters developed for one context or region are applied in a different one.
- Aggregation Errors arising from the need to forecast for groups of individuals (or households) while modeling needs to be done at the level of the individual.
A major concern for validation of travel models is error inherent in the collection of input data or historical data used for validation. Problems with input data or validation data can lead to erroneous corrections to models that, ultimately, will damage model performance, credibility, and results. For example, if daily traffic counts collected at screenlines are low due to incorrect collection methods, the analyst may attempt to increase auto occupancy rates or lower trip rates in order to match the screenlines. This suggests that a course of action for responding to models that do not validate is to check for errors first, then consider adjustments to parameters. Throughout the planning process, it is important to periodically perform a peer review of networks, socioeconomic inputs, and modeling procedures. Involving more than one person in the review process will often improve results and force the modeler to re-examine steps taken.
Figure 1-3 shows the possible effect of compounding error in model validation. Each step in the modeling process increases the overall error. While there is a potential for the errors to offset each other, there is no guarantee that they will.
Figure 1-3. Effect of Compounding Error in Model Validation (from course materials)

1.4.5 Accuracy Requirements
There are no absolute measures or thresholds that can be achieved to declare a travel model or its components "validated." The level of accuracy expected of a model is somewhat subjective, and ultimately depends on the time and resources available, and on the intended application of the model. For example:
- Emissions estimates for air quality analysis require accurate summaries of VMT by speed range.
- Individual link volumes are not as critical in a long-range regional sketch plan as in a sub-area traffic impact study.
- Consideration of significant land-use changes introduces additional uncertainties and interactions into future year alternatives analysis.
- Transit contributions can vary considerably among metropolitan areas, as do the level of analysis and the complexity of representation of transit in various models.
Table 1-1 shows the estimated accuracy of some parameters in the travel modeling process. Accuracy tends to be greatest on higher volume links and screenlines. The confidence limits also show that, due to error propagation, assignment results tend to contain more error than earlier steps in the process such as trip distribution.
Table 1-1. Estimated Accuracy of Some Parameters in the Travel Modeling Process
| Parameter | Typical Magnitude | 95 Percent Confidence Limit |
|---|---|---|
| Zonal Generation | 2,000 person trips | ± 50% |
| Interzonal Movement | Small | Extremely Inaccurate |
| Major Trip Interchange | 40,000 person trips | ± 10% |
| Minor Trip Interchange | 15,000 person trips | ± 16% |
| Highway Link Loading: | | |
| Minor Link | 5,000 vehicles | ± 55% |
| Average Link | 20,000 vehicles | ± 27% |
| Major Link | 50,000 vehicles | ± 17% |
| Public Transit Loading: | | |
| Average Urban Link | 5,000 passengers | > ± 46% |
| Major Urban Link | 20,000 passengers | > ± 23% |

Source: J. Robbins, "Mathematical Models - the Error of Our Ways," Traffic Engineering + Control, Vol. 18, No. 1, January 1978, p.33.
The reliability of a model validation effort is always constrained by the quality and quantity of validation data available. There is some error inherent in even the best data. Traffic counts alone can vary by 10 percent or more due to daily and seasonal variation (FHWA Guide to Urban Traffic Volume Counting, 1980). Other sources of count error include improper count location, variation in the proportion of multi-axle vehicles, special events, accidents, mechanical count failure, and personnel mistakes.
Sources of significant uncertainty or potential error should be identified early in an effective validation process. Thorough knowledge of a model's design, inputs, and applications is needed to recognize when a point of diminishing returns has been reached. It is important to recognize that uncertainty is inevitable, and to avoid confusing precision with accuracy.
1.5 Organization of Manual
The remainder of the Validation Manual is divided into the following chapters:
Chapter 2 discusses reasonableness checks for input data, including zonal socioeconomic data and network inputs. While these checks are not actually model validation checks, a tremendous amount of time can be wasted testing and adjusting models when the problem is with input data. Thus, a separate chapter has been devoted to this subject.
Chapters 3 through 7 discuss validation techniques and reasonableness checks for model parameters and outputs for each of the following travel model elements:
- Trip Generation
  - Socioeconomic Disaggregation
  - Trip Production
  - Trip Attraction
  - External Travel
- Trip Distribution
  - Estimating Travel Impedances
  - Gravity Model
- Mode Choice
  - Nested Logit Model
  - Auto Occupancy
- Time-of-Day/Direction Split Factors
- Traffic Assignment
  - Highway Assignment
  - Transit Assignment
Chapters 3 through 7 focus on standard four-step models. However, concepts presented in these chapters should lead the reader to reasonable validation checks for non-traditional modeling processes. Each chapter discusses strategies for systematic troubleshooting of validation problems. The highway assignment section also includes examples of validation targets used to validate the overall modeling process after the initial calibration of each component.
Appendices are included at the end of the manual which provide specific examples of parameters and travel characteristics for a number of metropolitan areas.


