With Xiuchuan Liu and Xunan Yang, two talented Ph.D. students in statistics at the University of South Carolina.
1. Dataset and Data Preprocessing
(1) Data from SCDOT (South Carolina Department of Transportation)
- Road shapefile (including highways, interstates, etc.)
- Traffic count (Average Daily Traffic)
- Link: https://info2.scdot.org/GISMapping/Pages/GIS.aspx
(2) Data from SCDPS (South Carolina Department of Public Safety)
- Traffic accidents data
- Fatal traffic accidents data
- Link: https://scdps-gis-and-mapping-scdps.hub.arcgis.com/search?collection=Dataset
(3) DEM (Digital Elevation Model) data from USGS
- Download link: https://apps.nationalmap.gov/downloader/
(4) Slope and curvature calculation
(a) Slope: We calculate slope using ArcGIS. See the ArcGIS documentation: https://pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-analyst/how-slope-works.htm
(b) Curvature:
The curvature of a road is quantified using the radius of a circle that passes through three consecutive points along the road. This radius is also called the radius of curvature. Smaller radii indicate sharper turns; larger radii indicate straighter segments.
Assume three sequential points on the road:
- $P_1 = (x_1, y_1)$
- $P_2 = (x_2, y_2)$
- $P_3 = (x_3, y_3)$
We want to find the radius $R$ of the circumcircle passing through these three points.
- Compute the Determinant $D$
This value helps locate the circle’s center (circumcenter):
\[D = 2 \cdot (x_1(y_2 - y_3) + x_2(y_3 - y_1) + x_3(y_1 - y_2))\]
- Compute Coordinates of Circumcenter $(x_c, y_c)$
These are the coordinates of the center of the circle:
\[x_c = \frac{(x_1^2 + y_1^2)(y_2 - y_3) + (x_2^2 + y_2^2)(y_3 - y_1) + (x_3^2 + y_3^2)(y_1 - y_2)}{D}\]
\[y_c = \frac{(x_1^2 + y_1^2)(x_3 - x_2) + (x_2^2 + y_2^2)(x_1 - x_3) + (x_3^2 + y_3^2)(x_2 - x_1)}{D}\]
- Compute the Radius $R$
The radius is the Euclidean distance from the center $(x_c, y_c)$ to any of the three points (e.g., $P_1$):
\[R = \sqrt{(x_1 - x_c)^2 + (y_1 - y_c)^2}\]
- Repeat Along the Line
To compute the curvature along the entire road line:
- Slide a 3-point window along the line.
- Calculate $R$ at each step using the above method.
- Assign $R$ to the middle point.
- Repeat for all points.
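The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names, the `eps` tolerance for near-collinear points, and the choice to return infinity for straight segments are all assumptions made here.

```python
import math

def circumradius(p1, p2, p3, eps=1e-12):
    """Radius of the circle through three 2-D points.

    Returns math.inf when the points are (nearly) collinear, i.e. D is ~0.
    The eps tolerance is an assumed value, not taken from the study.
    """
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < eps:
        return math.inf  # straight segment: no finite circumcircle
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    xc = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    yc = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return math.hypot(x1 - xc, y1 - yc)

def radii_along_line(points):
    """Slide a 3-point window along the line; one radius per interior vertex."""
    return [
        circumradius(points[i - 1], points[i], points[i + 1])
        for i in range(1, len(points) - 1)
    ]
```

Note that the two endpoints of a line have no three-point window centered on them, so this sketch returns `len(points) - 2` radii. Coordinates should be in a projected (planar) system for the radius to have length units.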
2. Modeling Framework: Poisson Point Process
In this study, we model the occurrence of fatal road traffic accidents as a spatial Poisson point process. This approach treats the observed fatalities as realizations of a stochastic process occurring over a continuous spatial domain (in our case, the road network of South Carolina).
The log-likelihood of a nonhomogeneous Poisson point process is given by:
\[\log L(\lambda) = \sum_{i=1}^n \log \lambda(x_i) - \int_W \lambda(x) \, dx\]
where:
- $\lambda(x)$: The intensity function, representing the expected number of fatal events per unit area (or road length) at location $x$
- $x_i$: Coordinates of observed fatal accidents
- $W$: The spatial domain (the road network)
Since the integral $\int_W \lambda(x) dx$ is typically intractable in real-world applications, we approximate it using Monte Carlo integration over a set of background (non-fatal) points:
\[\int_W \lambda(x) \, dx \approx \sum_{j=1}^m w_j \cdot \lambda(x_j)\]
where:
- $x_j$: Sampled background locations
- $w_j$: Weights that represent the area or road segment each background point stands for
- $m$: Total number of background (non-event) points
This transforms the full likelihood into a form that is implementable via weighted Poisson regression.
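To make the approximation concrete, the following sketch evaluates the approximate log-likelihood from intensity values that have already been computed at the event and background points. The function name and argument layout are illustrative assumptions, not part of the original analysis.

```python
import math

def approx_log_likelihood(lam_events, lam_background, weights):
    """Approximate log L = sum_i log(lambda(x_i)) - sum_j w_j * lambda(x_j).

    lam_events:     intensity evaluated at each observed fatal crash x_i
    lam_background: intensity evaluated at each background point x_j
    weights:        quadrature weight w_j for each background point
    """
    if len(lam_background) != len(weights):
        raise ValueError("need one weight per background point")
    event_term = sum(math.log(lam) for lam in lam_events)
    integral_term = sum(w * lam for w, lam in zip(weights, lam_background))
    return event_term - integral_term
```

For example, two events with intensities 1 and e contribute 0 + 1 to the first term, and two background points with intensity 0.5 and weight 2 each contribute 2 to the integral term, giving a log-likelihood of -1.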
3. The Meaning of Weights
In our point process modeling, the weights serve to approximate the integral term in the likelihood using a discrete set of non-event (background) points.
- For fatalities, we assign \(w_i = 1\) (a placeholder rather than a true integration weight).
- For background points, we assign \(w_j = \frac{A}{m}\), where:
- $A$: Total length of roads (in 100-meter units)
- $m$: Number of background points sampled
This weighting ensures that the background points collectively approximate the spatial opportunity for a fatal crash to occur. It is not a sampling weight in the survey sense, but a quadrature weight used to approximate the integral numerically.
Our background points are non-fatal crash locations drawn from the same traffic accident dataset. This choice assumes that exposure to a fatal crash is proportional to exposure to any crash. Hence, the model estimates the fatality intensity conditional on crash occurrence.
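The weighting scheme can be written down directly. The sketch below is only an illustration: the function names and the boolean `is_fatal` flags are hypothetical, and the road length in the usage note is a toy value rather than the actual South Carolina total.

```python
def background_weight(total_road_length_m, n_background, unit_m=100.0):
    """w_j = A / m, with A the total road length expressed in 100-meter units."""
    if n_background <= 0:
        raise ValueError("need at least one background point")
    a = total_road_length_m / unit_m
    return a / n_background

def build_weights(is_fatal, total_road_length_m):
    """Return one weight per point: 1 for fatalities, A/m for background points."""
    m = sum(1 for flag in is_fatal if not flag)
    w_bg = background_weight(total_road_length_m, m)
    return [1.0 if flag else w_bg for flag in is_fatal]
```

With one fatality, four background points, and 1,000 m of road (so A = 10), each background point receives weight 2.5, and the background weights sum back to A.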
4. Correlation and Dependence Between Predictors
We evaluated potential multicollinearity among the continuous predictors: Slope, Curvature, and AADT (Annual Average Daily Traffic).
Pearson Correlation Matrix
Variable | Slope | Curvature | AADT |
---|---|---|---|
Slope | 1.000000 | -0.107572 | -0.166621 |
Curvature | -0.107572 | 1.000000 | 0.044780 |
AADT | -0.166621 | 0.044780 | 1.000000 |
All pairwise correlations are weak (absolute values < 0.2), indicating negligible linear dependence between variables.
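For reference, a Pearson correlation matrix of this kind can be computed as follows. This is a dependency-free sketch with hypothetical function names; in practice a library routine would normally be used, and the data in the usage note are toy values, not the study data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """columns: dict mapping variable name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Calling `correlation_matrix({"Slope": [...], "Curvature": [...], "AADT": [...]})` returns a nested dictionary that is symmetric with ones on the diagonal, matching the layout of the table above.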
Variance Inflation Factor (VIF)
Variable | VIF |
---|---|
const | 8.647327 |
Slope | 1.042054 |
Curvature | 1.013978 |
Night | 1.009383 |
Surface | 1.003594 |
Workzone | 1.004390 |
AADT | 1.037288 |
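All predictor VIFs are close to 1.0, well below the common thresholds of 5 or 10 for concern (the value for the constant term reflects the intercept and is not a collinearity diagnostic). This indicates that multicollinearity is not a concern for these predictors.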
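A VIF is easier to interpret once it is converted back to the share of a predictor's variance explained by the other predictors. By definition, VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors, so R_j² = 1 - 1 / VIF_j. The short sketch below applies this to the values reported in the table; the variable names are illustrative.

```python
# VIF values copied from the table above (constant term omitted).
vif = {
    "Slope": 1.042054,
    "Curvature": 1.013978,
    "Night": 1.009383,
    "Surface": 1.003594,
    "Workzone": 1.004390,
    "AADT": 1.037288,
}

def r_squared_from_vif(v):
    """Invert VIF = 1 / (1 - R^2) to recover R^2."""
    return 1.0 - 1.0 / v

r2 = {name: r_squared_from_vif(v) for name, v in vif.items()}
largest = max(r2, key=r2.get)  # predictor best explained by the others
```

Under this conversion, Slope has the largest value, with roughly 4% of its variance explained by the other predictors, which is consistent with the weak pairwise correlations.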
5. Covariate Diagnostics via Sliding Window Regression (All Roads)
To investigate whether the effects of continuous covariates remain stable across their ranges, we conducted a sliding window Poisson regression for:
- AADT (Annual Average Daily Traffic)
- Curvature
- Slope
Each diagnostic follows these steps:
- The dataset is sorted by the covariate of interest (e.g., AADT, slope), so that similar values are grouped together.
- A fixed-size window (e.g., 200,000 points) is slid along the ordered dataset with a fixed step size (e.g., 5,000 points), generating overlapping subsets of data.
- In each window, we fit a Poisson regression model of the form:
\(\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}\) where:
- $\lambda_i$ is the expected number of fatalities at observation $i$,
- $x_{ij}$ are predictor variables (e.g., AADT, slope, curvature),
- $\beta_j$ are the regression coefficients.
- From each model, we extract the coefficient estimate $\hat{\beta}_j$ associated with the covariate of interest in that window.
- The resulting sequence of coefficients is smoothed using LOESS, and we construct a 95% confidence ribbon to visualize the local trend and uncertainty.
This approach allows us to assess whether the effect of each covariate is stable across its range, or whether it varies with the level of the covariate.
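The windowing logic of this diagnostic can be sketched as below. The model fit itself is passed in as a hypothetical `fit` callable (for example, a function that fits the Poisson regression on a subset and returns the coefficient of interest); the function names and the choice to label each window by its middle covariate value are assumptions of this sketch. LOESS smoothing is not shown.

```python
def sliding_windows(n, window, step):
    """Yield (start, stop) index pairs for overlapping windows over n sorted rows."""
    for start in range(0, n - window + 1, step):
        yield start, start + window

def sliding_coefficients(rows, key, window, step, fit):
    """Sort rows by the covariate, then apply `fit` to each window.

    rows: list of records
    key:  function extracting the covariate of interest from a record
    fit:  hypothetical callable returning the coefficient estimate for a subset
    Returns a list of (covariate value at window middle, coefficient) pairs.
    """
    ordered = sorted(rows, key=key)
    results = []
    for start, stop in sliding_windows(len(ordered), window, step):
        chunk = ordered[start:stop]
        center = key(chunk[len(chunk) // 2])
        results.append((center, fit(chunk)))
    return results
```

With 10 rows, a window of 4, and a step of 3, the windows cover rows 0-3, 3-6, and 6-9. If the window is larger than the dataset, no windows are produced, and trailing rows are skipped whenever the step does not divide the remaining length evenly.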
AADT: Effect Weakens at Higher Traffic Volume
The AADT coefficient is consistently negative, indicating that, among crash locations, roads with higher traffic volumes are less likely to experience fatal crashes. However, the magnitude of this negative effect weakens as traffic increases.
Curvature: Consistent Mild Risk
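The curvature coefficient remains negative across all windows. Because curvature is measured here as the radius of curvature, where smaller radii mean sharper turns, a negative coefficient means that more sharply curved roads are associated with higher fatality risk. The effect is small but consistent.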
Slope: Increasingly Positive Association
The slope effect transitions from negative at low gradients to increasingly positive at higher slopes, indicating elevated fatality risk in steep terrain.
6. Spatial Distribution of Estimated Fatal Crash Intensity $\hat{\lambda}(x)$
Areas of high intensity (brighter values) correspond to major intersections, dense urban corridors, and highly sloped or curved roads with low AADT. These are the regions where the model predicts an elevated likelihood of fatal crashes.
7. GLM Results: Poisson Point Process on All Roads
Variable | Coefficient | Std. Error | z-value | p-value |
---|---|---|---|---|
Intercept | -0.7368 | 0.046 | -15.88 | <0.001 |
Slope | 0.0192 | 0.008 | 2.43 | 0.015 |
Curvature | -1.16e-05 | 8.38e-06 | -1.38 | 0.167 |
Night | 0.5429 | 0.032 | 17.12 | <0.001 |
Surface | -0.0553 | 0.043 | -1.29 | 0.198 |
Workzone | -0.1474 | 0.131 | -1.12 | 0.262 |
AADT | -1.51e-05 | 1.43e-06 | -10.54 | <0.001 |
- Log-Likelihood: −6522.0
- Deviance: 4950.0
- Pearson Chi-square: 3830.0
- Pseudo R² (Cragg & Uhler’s): 0.0012
- Degrees of Freedom: 399993
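Because the model uses a log link, each coefficient can be read as a multiplicative effect on the intensity: a change of Δ in a covariate multiplies the expected intensity by exp(β·Δ). The sketch below applies this to three coefficients from the table. It assumes that Night is a 0/1 indicator and that AADT is in vehicles per day; neither is stated explicitly above, and the function name is illustrative.

```python
import math

# Coefficients copied from the all-roads GLM table above.
coefs = {"Slope": 0.0192, "Night": 0.5429, "AADT": -1.51e-05}

def rate_ratio(beta, delta=1.0):
    """Multiplicative change in intensity for a covariate increase of `delta`."""
    return math.exp(beta * delta)

# Night vs. day, assuming Night is coded 0/1.
night_rr = rate_ratio(coefs["Night"])
# Effect of 10,000 additional vehicles per day, assuming AADT is in vehicles/day.
aadt_rr = rate_ratio(coefs["AADT"], delta=10_000)
# Effect of a one-unit increase in slope (units as produced by the slope step).
slope_rr = rate_ratio(coefs["Slope"])
```

Under these assumptions, the fitted intensity is about 1.72 times higher at night, and about 14% lower for every additional 10,000 vehicles per day, holding the other covariates fixed.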
8. Road-Type-Specific Modeling: Highways vs. Local Roads
Poisson GLM Summary: Highways and Interstates
Variable | Coef | Std. Err | z-value | p-value |
---|---|---|---|---|
Intercept | -0.5305 | 0.059 | -9.047 | <0.001 |
Slope | 0.0162 | 0.009 | 1.728 | 0.084 |
Curvature | -6.14e-06 | 1.06e-05 | -0.582 | 0.561 |
Night | 0.4358 | 0.038 | 11.610 | <0.001 |
Surface | -0.0401 | 0.051 | -0.792 | 0.428 |
Workzone | -0.1402 | 0.139 | -1.009 | 0.313 |
AADT | -9.67e-06 | 1.42e-06 | -6.810 | <0.001 |
- Log-Likelihood: −4082.4
- Deviance: 2402.7
- Pseudo R²: 0.0029
Poisson GLM Summary: Local Roads
Variable | Coef | Std. Err | z-value | p-value |
---|---|---|---|---|
Intercept | -0.9337 | 0.079 | -11.825 | <0.001 |
Slope | 0.0181 | 0.015 | 1.206 | 0.228 |
Curvature | -4.20e-05 | 1.44e-05 | -2.920 | 0.003 |
Night | 0.5859 | 0.059 | 9.930 | <0.001 |
Surface | -0.0878 | 0.081 | -1.088 | 0.277 |
Workzone | 0.2232 | 0.380 | 0.588 | 0.557 |
AADT | -4.31e-05 | 4.59e-06 | -9.394 | <0.001 |
- Log-Likelihood: −2301.6
- Deviance: 2251.2
- Pseudo R²: 0.0024
Spatial Distribution of $\hat{\lambda}(x)$: Highway vs. Local Roads
The intensity of fatal crashes $\hat{\lambda}(x)$ is highest at highway junctions and along highway corridors. In contrast, local roads exhibit spatially dispersed intensity, often centered around urbanized clusters and curved segments.
Sliding Window Analysis of AADT Coefficient
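The AADT coefficient is more strongly negative on local roads (-4.31e-05) than on highways and interstates (-9.67e-06). This contrast suggests that AADT plays a more substantial role in fatal crash suppression on local roads, where increases in traffic volume may lead to lower speeds and improved visibility.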