Collaborators: Xiuchuan Liu and Xunan Yang, Ph.D. students in Statistics at the University of South Carolina.
Paper Link: Spatial Model of Crash Fatalities by Examining Effects of Infrastructural Factors
1. Dataset and Preprocessing
(1) Infrastructure Data (SCDOT)
Data retrieved from the South Carolina Department of Transportation (SCDOT):
- Road Network: Shapefiles including highways, interstates, and local roads.
- Traffic Volume: Average Annual Daily Traffic (AADT).
- Source: SCDOT GIS Mapping
(2) Crash Data (SCDPS)
Data retrieved from the South Carolina Department of Public Safety (SCDPS):
- All traffic accidents.
- Fatal traffic accidents (subset).
- Source: SCDPS GIS Hub
(3) Terrain Data (USGS)
Digital Elevation Models (DEM) were used to derive slope information.
- Source: USGS National Map Downloader
(4) Deriving Slope and Curvature
1. Slope: Calculated using the ArcGIS Spatial Analyst toolset. (Documentation)
2. Curvature: Road curvature is quantified by calculating the radius of the circumcircle passing through three consecutive points along the road geometry. A smaller radius indicates a sharper turn, while a larger radius indicates a straighter segment.
Consider three sequential points on the road:
- $P_1 = (x_1, y_1)$
- $P_2 = (x_2, y_2)$
- $P_3 = (x_3, y_3)$
We calculate the radius $R$ of the circumcircle defined by these points.
Step 1: Compute the Determinant $D$. This value is required to locate the circumcenter:
\[D = 2 \cdot (x_1(y_2 - y_3) + x_2(y_3 - y_1) + x_3(y_1 - y_2))\]
If $D = 0$, the three points are collinear: no finite circumcircle exists and the segment is treated as straight (infinite radius).
Step 2: Compute Circumcenter Coordinates $(x_c, y_c)$
\[x_c = \frac{(x_1^2 + y_1^2)(y_2 - y_3) + (x_2^2 + y_2^2)(y_3 - y_1) + (x_3^2 + y_3^2)(y_1 - y_2)}{D}\]
\[y_c = \frac{(x_1^2 + y_1^2)(x_3 - x_2) + (x_2^2 + y_2^2)(x_1 - x_3) + (x_3^2 + y_3^2)(x_2 - x_1)}{D}\]
Step 3: Compute the Radius $R$. The radius is the Euclidean distance from the center $(x_c, y_c)$ to any of the three points (e.g., $P_1$):
\[R = \sqrt{(x_1 - x_c)^2 + (y_1 - y_c)^2}\]
Step 4: Moving Window Implementation. To compute curvature along the entire road network, we slide a 3-point window along the line vertices, calculate $R$ for each triplet, and assign the value to the middle point.
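The four steps can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names `circumradius` and `radii_along_line` are hypothetical, coordinates are assumed to be in a projected (planar) system, and collinear triplets are assumed to map to an infinite radius.

```python
import math

def circumradius(p1, p2, p3):
    """Radius of the circle through three 2-D points (math.inf if collinear)."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Step 1: determinant D
    d = 2.0 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if d == 0.0:
        return math.inf  # assumption: straight segment -> infinite radius
    # Step 2: circumcenter (x_c, y_c)
    s1, s2, s3 = x1 * x1 + y1 * y1, x2 * x2 + y2 * y2, x3 * x3 + y3 * y3
    xc = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    yc = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    # Step 3: distance from the center to P1
    return math.hypot(x1 - xc, y1 - yc)

def radii_along_line(vertices):
    """Step 4: slide a 3-point window and assign R to the middle vertex.

    The first and last vertices have no complete triplet and are left as None.
    """
    radii = [None] * len(vertices)
    for i in range(1, len(vertices) - 1):
        radii[i] = circumradius(vertices[i - 1], vertices[i], vertices[i + 1])
    return radii

# Four vertices on the unit circle: both interior vertices get R = 1.
unit_circle_radii = radii_along_line([(1, 0), (0, 1), (-1, 0), (0, -1)])
```

How the endpoints of each road line are handled is a design choice the text does not specify; leaving them undefined is just one option.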
2. Modeling Framework: Poisson Point Process
We model the occurrence of fatal traffic accidents as a spatial Poisson point process (PPP). This approach treats fatalities as realizations of a stochastic process occurring over a continuous spatial domain (the South Carolina road network).
The log-likelihood of a nonhomogeneous Poisson point process is given by:
\[\log L(\lambda) = \sum_{i=1}^n \log \lambda(x_i) - \int_W \lambda(x) \, dx\]
Where:
- $\lambda(x)$: The intensity function, representing the expected number of fatal events per unit area (or road length) at location $x$.
- $x_i$: Coordinates of observed fatal accidents.
- $W$: The spatial domain (the road network).
Since the integral $\int_W \lambda(x) dx$ is analytically intractable for complex road networks, we approximate it using Monte Carlo integration over a set of background (non-fatal) points. This technique is known as a numerical quadrature scheme:
\[\int_W \lambda(x) \, dx \approx \sum_{j=1}^m w_j \cdot \lambda(x_j)\]
Where:
- $x_j$: Sampled background locations.
- $w_j$: Quadrature weights representing the road length each point accounts for.
- $m$: Total number of background points.
This transformation allows us to implement the model using weighted Poisson regression.
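To make the approximation concrete, the sketch below evaluates the quadrature version of the log-likelihood for a log-linear intensity $\log \lambda(x) = x^\top \beta$. The function name `approx_loglik` and the toy numbers are assumptions for illustration only. With an intercept-only model, the maximizer should be $\lambda = n / A$ (events per unit of road length), which gives a quick sanity check.

```python
import numpy as np

def approx_loglik(beta, X_events, X_background, w_background):
    """Quadrature-approximated PPP log-likelihood with log lambda = X @ beta.

    sum_i log lambda(x_i)  -  sum_j w_j * lambda(x_j)
    """
    log_lam_events = X_events @ beta
    lam_background = np.exp(X_background @ beta)
    return log_lam_events.sum() - np.sum(w_background * lam_background)

# Toy, intercept-only example (hypothetical numbers, not from the study).
n, m, A = 50, 200, 1000.0
X_events = np.ones((n, 1))
X_background = np.ones((m, 1))
w = np.full(m, A / m)                 # equal weights that sum to A

beta_hat = np.array([np.log(n / A)])  # analytic maximizer for constant intensity
ll_hat = approx_loglik(beta_hat, X_events, X_background, w)
ll_lower = approx_loglik(beta_hat - 0.1, X_events, X_background, w)
ll_upper = approx_loglik(beta_hat + 0.1, X_events, X_background, w)
```

Moving the intercept away from $\log(n/A)$ in either direction lowers the approximate log-likelihood, as expected.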
3. The Role of Weights
In this framework, weights are used to approximate the integral term of the likelihood function using a discrete set of background points.
- For Fatalities (Events): \(w_i = 1\) (Note: This indicates an observed event presence).
- For Background Points (Pseudo-Absences): \(w_j = \frac{A}{m}\), where:
  - $A$: Total length of roads (in 100-meter units).
  - $m$: Number of background points sampled.
This weighting scheme ensures that the background points collectively approximate the spatial opportunity for a crash to occur: the background weights sum to $A$, the total road length. It is not a survey sampling weight, but rather a quadrature weight used for numerical integration.
Our background points are sampled from non-fatal crash locations. Because the background is drawn from crash locations rather than from the road network at large, the model estimates the fatality intensity conditional on crash occurrence.
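As a small worked example of the weighting rule, the following sketch builds the two weight vectors. The helper `quadrature_weights` and all numbers are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def quadrature_weights(n_events, n_background, total_length_100m):
    """Return (event_weights, background_weights) for the stacked data set."""
    w_event = [1.0] * n_events                                    # w_i = 1
    w_background = [total_length_100m / n_background] * n_background  # w_j = A / m
    return w_event, w_background

# Hypothetical numbers for illustration only (not taken from the study):
# A = 120,000 road units of 100 m each, m = 60,000 background points,
# and 900 fatal events.
w_event, w_background = quadrature_weights(900, 60_000, 120_000.0)
background_weight = w_background[0]   # each background point stands for 2 units (200 m)
total_background = sum(w_background)  # sums back to A
```

Doubling the number of background points halves each weight, so the approximated integral stays on the same scale while its Monte Carlo error shrinks.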
4. Correlation and Dependence
We evaluated potential multicollinearity among the continuous predictors: Slope, Curvature, and AADT (Average Annual Daily Traffic).
Pearson Correlation Matrix
| | Slope | Curvature | AADT |
|---|---|---|---|
| Slope | 1.000000 | -0.107572 | -0.166621 |
| Curvature | -0.107572 | 1.000000 | 0.044780 |
| AADT | -0.166621 | 0.044780 | 1.000000 |

All pairwise correlations are weak ($|r| < 0.2$), indicating negligible linear dependence between variables.
Variance Inflation Factor (VIF)
| Variable | VIF |
|---|---|
| const | 8.647327 |
| Slope | 1.042054 |
| Curvature | 1.013978 |
| Night | 1.009383 |
| Surface | 1.003594 |
| Workzone | 1.004390 |
| AADT | 1.037288 |
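All covariate VIF values are approximately 1.0, well below the commonly used threshold of 5. (The larger value for const belongs to the intercept and is not a collinearity diagnostic for the predictors.) This indicates that multicollinearity is not a concern.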
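The link between the two diagnostics can be checked directly: for a set of predictors, the VIFs are the diagonal entries of the inverse correlation matrix. The sketch below applies this to the Pearson matrix reported above. It covers only the three continuous predictors, so it is a cross-check under that restriction rather than a reproduction of the VIF table, which also includes Night, Surface, and Workzone.

```python
import numpy as np

names = ["Slope", "Curvature", "AADT"]

# Pearson correlation matrix as reported in the table above.
R = np.array([
    [1.000000, -0.107572, -0.166621],
    [-0.107572, 1.000000, 0.044780],
    [-0.166621, 0.044780, 1.000000],
])

# VIF_j = [R^{-1}]_{jj} when only these predictors are considered.
vif = np.diag(np.linalg.inv(R))

for name, value in zip(names, vif):
    print(f"{name}: {value:.3f}")
```

The three values come out slightly below the corresponding table entries, which is what one would expect: adding further predictors to the model can only leave a VIF unchanged or raise it.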
5. Covariate Diagnostics: Sliding Window Regression
To check whether the effect of each continuous covariate is stable across its range, rather than well described by a single global coefficient, we used sliding window Poisson regression.
Methodology:
- Sort: The dataset is sorted by the covariate of interest (e.g., increasing AADT).
- Window: A fixed-size window (e.g., 200,000 points) slides along the sorted data with a step size of 5,000.
- Fit: A Poisson regression is fitted within each window: \(\log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}\)
- Extract & Smooth: The coefficient ($\hat{\beta}$) for the variable of interest is extracted and smoothed using LOESS with a 95% confidence ribbon.
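The sort, window, and fit steps can be sketched as follows. This is a simplified illustration under several assumptions: a single covariate plus intercept, no quadrature weights, a hand-written Newton (IRLS) fit in place of a statistics library, synthetic data, and no LOESS smoothing. The names `fit_poisson` and `sliding_window_coefs` are hypothetical.

```python
import numpy as np

def fit_poisson(X, y, n_iter=25):
    """Poisson regression (log link) by Newton's method; column 0 is the intercept."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(max(y.mean(), 1e-8))       # start at the intercept-only fit
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)                   # score vector
        hess = X.T @ (X * mu[:, None])          # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta

def sliding_window_coefs(x, y, window, step):
    """Sort by x, fit log(lambda) = b0 + b1 * x per window, return (centers, b1)."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    centers, slopes = [], []
    for start in range(0, len(x) - window + 1, step):
        xs, ys = x[start:start + window], y[start:start + window]
        X = np.column_stack([np.ones(window), xs])
        centers.append(np.median(xs))           # covariate value representing the window
        slopes.append(fit_poisson(X, ys)[1])
    return np.array(centers), np.array(slopes)

# Synthetic data with a constant true slope of -0.5 (illustration only).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, 5000)
y = rng.poisson(np.exp(-1.0 - 0.5 * x))
centers, slopes = sliding_window_coefs(x, y, window=2000, step=500)
beta_all = fit_poisson(np.column_stack([np.ones(len(x)), x]), y)
```

In the actual analysis the window and step sizes are far larger (200,000 and 5,000 in the example above), the fit includes the other covariates, and the extracted coefficients are smoothed before plotting.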
AADT: Effect Weakens at Higher Volumes
The AADT coefficient is consistently negative, implying that roads with higher traffic volume have a lower risk of fatal crashes (likely due to congestion reducing speeds). However, this protective effect weakens as traffic volume increases.
Curvature: Consistent Risk Factor
The curvature coefficient remains negative. Recall that the curvature covariate is the circumradius $R$, so a smaller value means a sharper turn; a negative coefficient therefore means that sharper curves are consistently associated with higher fatality risk across the dataset.
Slope: Risk Increases with Gradient
The effect of slope transitions from negative at low gradients to increasingly positive on steeper terrain, indicating elevated fatality risk on high-gradient roads.
6. Spatial Distribution of Estimated Intensity $\hat{\lambda}(x)$
Brighter values indicate areas of high predicted fatality intensity. These correspond to major intersections, dense urban corridors, and high-risk geometries (steep/curved) with lower traffic volumes.
7. GLM Results: All Roads
| Variable | Coefficient | Std. Error | z-value | p-value |
|---|---|---|---|---|
| Intercept | -0.7368 | 0.046 | -15.88 | <0.001 |
| Slope | 0.0192 | 0.008 | 2.43 | 0.015 |
| Curvature | -1.16e-05 | 8.38e-06 | -1.38 | 0.167 |
| Night | 0.5429 | 0.032 | 17.12 | <0.001 |
| Surface | -0.0553 | 0.043 | -1.29 | 0.198 |
| Workzone | -0.1474 | 0.131 | -1.12 | 0.262 |
| AADT | -1.51e-05 | 1.43e-06 | -10.54 | <0.001 |
- Log-Likelihood: −6522.0
- Deviance: 4950.0
- Pearson Chi-square: 3830.0
- Pseudo R² (Cragg & Uhler’s): 0.0012
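Because the model uses a log link, each coefficient is easier to read after exponentiation, as a multiplicative change in intensity with the other covariates held fixed. The sketch below converts three of the reported coefficients; the choice of a 10,000-vehicle AADT increment is an arbitrary illustration, and the unit of slope is whatever unit the covariate was measured in.

```python
import math

# Coefficients copied from the "All Roads" table above.
coefs = {"Slope": 0.0192, "Night": 0.5429, "AADT": -1.51e-05}

# Night vs. day: multiplicative change in estimated fatality intensity.
night_ratio = math.exp(coefs["Night"])                 # about 1.72

# Per additional 10,000 vehicles/day of AADT (illustrative increment).
aadt_ratio_per_10k = math.exp(coefs["AADT"] * 10_000)  # about 0.86

# Per one-unit increase in slope.
slope_ratio = math.exp(coefs["Slope"])                 # about 1.02

print(f"Night: {night_ratio:.3f}")
print(f"AADT (+10k): {aadt_ratio_per_10k:.3f}")
print(f"Slope (+1): {slope_ratio:.3f}")
```

Read this way, the night-time coefficient corresponds to roughly a 72% higher estimated intensity, and an extra 10,000 vehicles per day to roughly a 14% lower one.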
8. Comparative Modeling: Highways vs. Local Roads
Poisson GLM Summary: Highways and Interstates
| Variable | Coef | Std. Err | z-value | p-value |
|---|---|---|---|---|
| Intercept | -0.5305 | 0.059 | -9.047 | <0.001 |
| Slope | 0.0162 | 0.009 | 1.728 | 0.084 |
| Curvature | -6.14e-06 | 1.06e-05 | -0.582 | 0.561 |
| Night | 0.4358 | 0.038 | 11.610 | <0.001 |
| Surface | -0.0401 | 0.051 | -0.792 | 0.428 |
| Workzone | -0.1402 | 0.139 | -1.009 | 0.313 |
| AADT | -9.67e-06 | 1.42e-06 | -6.810 | <0.001 |
Poisson GLM Summary: Local Roads
| Variable | Coef | Std. Err | z-value | p-value |
|---|---|---|---|---|
| Intercept | -0.9337 | 0.079 | -11.825 | <0.001 |
| Slope | 0.0181 | 0.015 | 1.206 | 0.228 |
| Curvature | -4.20e-05 | 1.44e-05 | -2.920 | 0.003 |
| Night | 0.5859 | 0.059 | 9.930 | <0.001 |
| Surface | -0.0878 | 0.081 | -1.088 | 0.277 |
| Workzone | 0.2232 | 0.380 | 0.588 | 0.557 |
| AADT | -4.31e-05 | 4.59e-06 | -9.394 | <0.001 |
Spatial Distribution of $\hat{\lambda}(x)$ by Road Type
The intensity of fatal crashes $\hat{\lambda}(x)$ is highest on highway junctions and corridors. In contrast, local roads exhibit spatially dispersed intensity, often centered around urbanized clusters and curved segments.
Sliding Window Analysis: AADT Coefficient Comparison
This contrast suggests that AADT plays a more substantial role in fatal crash suppression on local roads. The negative association is stronger on local roads, where increased traffic volume likely forces lower speeds and higher driver vigilance compared to highways.
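One rough way to gauge the size of this contrast from the two GLM tables alone is a Wald-type comparison of the AADT coefficients. The sketch assumes the highway and local-road subsets are disjoint, so the two estimates can be treated as independent, and it works from the rounded values reported in the tables. The helper `wald_z` is hypothetical and this check is not part of the original analysis.

```python
import math

def wald_z(coef_a, se_a, coef_b, se_b):
    """z statistic for H0: coef_a == coef_b, assuming independent estimates."""
    return (coef_a - coef_b) / math.sqrt(se_a ** 2 + se_b ** 2)

# AADT coefficient and standard error: local roads vs. highways/interstates.
z = wald_z(-4.31e-05, 4.59e-06, -9.67e-06, 1.42e-06)
print(f"z = {z:.2f}")
```

A statistic of roughly -7 is in line with the reading that the negative AADT association is stronger on local roads, subject to the independence assumption above.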