忘记概念的时候,回来多看几遍
Part 1: Fundamentals of Point Processes
1.1 What is a Point Process?
A point process is a random collection of events—or “points”—occurring in a mathematical space such as:
- A 1D timeline (e.g., times of phone calls),
- A 2D spatial domain (e.g., locations of trees, crimes, or disease cases),
- Or a 3D space-time continuum (e.g., earthquakes with locations and times).
A spatial point process models the random locations of events in space:
Let $W \subset \mathbb{R}^2$ be a bounded spatial domain. A realization of a point process $\mathbf{X}$ is a finite set of points:
\(\mathbf{X} = \{ s_1, s_2, \dots, s_n \}, \quad s_i \in W.\)
A spatio-temporal point process extends this by adding time: \(\mathbf{X} = \{ (s_1, t_1), (s_2, t_2), \dots, (s_n, t_n) \}, \quad s_i \in W, \, t_i \in T \subset \mathbb{R}.\)
Each realization is a “dot map” of where (and when) events occurred.
1.2 Why Use Point Processes?
Point processes capture not only the count of events, but also their pattern:
- Are events clustered, random, or regularly spaced?
- Are they stationary or inhomogeneous in space/time?
- Do they exhibit interaction (e.g., attraction or repulsion)?
These questions cannot be answered by simple aggregate counts.
1.3 Basic Types of Point Processes
- Homogeneous Poisson Process (HPP): Events are completely random; constant intensity $\lambda$.
- Inhomogeneous Poisson Process (IPP): Intensity $\lambda(s)$ varies with location.
- Cluster Processes (e.g., Thomas process): Points cluster around “parent” events.
- Inhibitory Processes (e.g., Strauss process): Points repel each other.
1.4 Assumptions and Properties of a Point Process
Let $W \subset \mathbb{R}^2$ denote a bounded spatial window (e.g., a study region), and let $N(B)$ denote the number of events (points) falling within a Borel-measurable subset $B \subset W$.
For a well-defined point process, the following structural properties are often assumed or examined:
Simplicity
A point process is said to be simple if it does not place multiple events at the same location.
Formally: \(\mathbb{P}(s_i = s_j) = 0 \quad \text{for all } i \neq j\)
This ensures that points are distinct and no two events occur at exactly the same coordinates.
Geospatial Interpretation:
- In most physical phenomena (e.g., tree locations, accident sites), it is unlikely for two events to be recorded at precisely the same point.
- However, in some applications (e.g., telecommunications or highly discrete datasets), non-simple processes may arise and need to be treated carefully (e.g., using marked or multi-type point processes).
Local Finiteness
A point process is locally finite if, for any bounded region $B \subset W$, the number of points in $B$ is finite almost surely:
\[\mathbb{P}(N(B) < \infty) = 1\]This condition prevents the occurrence of an infinite number of events in a finite region, which would be nonphysical.
Geospatial Interpretation:
- Ensures model realism: one cannot observe infinite car crashes, trees, or disease cases within a finite spatial extent.
- A necessary assumption for practical statistical inference and visualization.
Stationarity (Translation Invariance)
A point process is stationary if its statistical properties do not change when the entire process is translated in space.
Formally, let $\mathbf{X}$ be a point process and let $u \in \mathbb{R}^2$ be a translation vector. Then $\mathbf{X}$ is stationary if: \(\mathbf{X} \overset{d}{=} \mathbf{X} + u\) i.e., the distribution of $\mathbf{X}$ is the same as that of $\mathbf{X} + u$.
First-order stationarity implies that the intensity function $\lambda(s)$ is constant: \(\lambda(s) = \lambda > 0 \quad \text{for all } s \in W\)
Geospatial Interpretation:
- In urban planning, assuming stationarity would imply that events (e.g., accidents) are equally likely across the entire city—often unrealistic.
- Therefore, inhomogeneous models (non-stationary) are typically more appropriate for real-world data.
Isotropy (Rotational Invariance)
A point process is isotropic if its properties are invariant under rotations about any point (typically the origin).
Formally, for a rotation matrix $R \in \text{SO}(2)$, the process satisfies: \(\mathbf{X} \overset{d}{=} R \mathbf{X}\)
This implies that second-order properties such as pair correlation functions depend only on the distance between points, not the direction.
Geospatial Interpretation:
- An isotropic process implies there is no preferred direction in the spatial pattern.
- Violated in cases like:
- Linear features (e.g., road networks, rivers),
- Environmental gradients (e.g., pollution diffusion in wind direction),
- Urban corridors or coastline-aligned processes.
Summary Table
Property | Description | Practical Implication |
---|---|---|
Simplicity | No duplicate points | Ensures unique event locations |
Local Finiteness | Finite points in any bounded region | Prevents unrealistic infinite densities |
Stationarity | Invariant under translation | Events equally likely everywhere (if homogeneous) |
Isotropy | Invariant under rotation | No directional bias in spatial structure |
Common Violations
- Nonstationarity: Urban centers have higher intensity of events (accidents, crimes).
- Anisotropy: Events aligned along roads, rivers, coastlines, etc.
- Non-simplicity: Duplicated events (e.g., multiple GPS pings at same location).
- Local non-finiteness: Sensor noise or faulty data generation.
These violations guide the choice of models (e.g., Poisson vs Cox, homogeneous vs inhomogeneous, isotropic vs anisotropic kernels).
Part 2: Mathematical Framework
2.1 First-Order Properties — The Intensity Function
The intensity function $\lambda(s)$ is the foundational concept of the first-order structure in a point process. It describes the expected number of events per unit area at any spatial location $s \in W$:
\[\lambda(s) = \lim_{|ds| \to 0} \frac{\mathbb{E}[N(ds)]}{|ds|}\]where:
- $ds$ is a small area around location $s$,
- $N(ds)$ is the number of points falling within $ds$,
- $\mathbb{E}[\cdot]$ denotes expectation.
🔹 Units and Interpretation
- Units: typically points per km² or per m².
- Interpretation: high $\lambda(s)$ indicates a denser expected concentration of events near $s$.
🔹 Homogeneous Poisson Point Process (HPPP)
In the homogeneous case, the intensity is constant throughout the domain:
\[\lambda(s) = \lambda > 0 \quad \text{for all } s \in W\]Let $B \subset W$ be a region with area $ | B | $, |
Then:
- The number of points $N(B)$ in $B$ follows a Poisson distribution:
- Events are independent: the number of points in disjoint regions are independent random variables.
🔹 Inhomogeneous Poisson Point Process (IPPP)
In the inhomogeneous case, the intensity varies with location: \(\lambda(s) : W \rightarrow \mathbb{R}^+\)
Then:
- For any region $B$:
- The expected number of points in $B$ is:
This model accounts for spatial inhomogeneity (e.g., higher intensity of crime in urban cores vs suburbs).
2.2 Estimating Intensity from Observed Data
🔹 Kernel Smoothing Estimator
Given a realization of $n$ observed points ${s_1, \dots, s_n}$, the intensity function $\lambda(s)$ can be nonparametrically estimated using kernel smoothing:
\[\hat{\lambda}_h(s) = \frac{1}{n} \sum_{i=1}^n \frac{1}{h^2} \, K\left( \frac{\lVert s - s_i \rVert}{h} \right)\]where:
- $h$ is the bandwidth, controlling the smoothing level,
- $K(\cdot)$ is the kernel function (e.g., Gaussian, Epanechnikov),
- $\lVert s - s_i \rVert$ is the Euclidean distance between evaluation point $s$ and observation $s_i$.
Common kernel choices:
- Gaussian: $K(u)$ = $\frac{1}{2\pi} e^{-u^2/2}$
-
Epanechnikov: $K(u) = \frac{3}{4}(1 - u^2) for u <1$
This yields a smooth surface approximation of intensity across space — analogous to KDE (Kernel Density Estimation).
🔹 What is Kernel Smoothing?
Kernel smoothing is a nonparametric technique that estimates a continuous function (here, the intensity surface) by averaging nearby observations with weights decreasing with distance.
It can be visualized as placing a “bump” (the kernel) over each observed point and summing all bumps to form a surface.
- Small $h$: high resolution, sensitive to local variation (but may overfit).
- Large $h$: smoother surface, reduces noise (but may oversmooth).
2.3 Regression Models in Point Processes
🔹 Is Regression Involved?
Yes — particularly for inhomogeneous processes, intensity regression models are often used.
The goal is to model $\lambda(s)$ as a function of covariates $Z(s)$: \(\lambda(s) = \exp\left( \beta_0 + \beta_1 Z_1(s) + \dots + \beta_p Z_p(s) \right)\)
This is analogous to a Poisson regression in generalized linear models (GLM), where:
- $\lambda(s)$ is the mean rate (log-linked),
- $Z_k(s)$ are spatial covariates (e.g., elevation, population density, distance to roads).
🔹 When is Regression Necessary?
Regression is not required for defining a point process but is:
- Necessary for inference, explanation, or prediction,
- Useful when exploring the relationship between spatial events and environmental or contextual variables.
In contrast, kernel smoothing is exploratory, while regression modeling enables hypothesis testing and covariate interpretation.
🔹 Example (Geographic Scenario)
Suppose you’re studying wildfire occurrences in California.
- Without covariates: Estimate intensity using kernel smoothing.
- With covariates: Fit an inhomogeneous Poisson process using:
- $Z_1(s)$: distance to nearest road,
- $Z_2(s)$: NDVI (vegetation index),
- $Z_3(s)$: historical temperature.
Then: \(\lambda(s) = \exp( \beta_0 + \beta_1 \cdot \text{RoadDist}(s) + \beta_2 \cdot \text{NDVI}(s) + \beta_3 \cdot \text{Temp}(s) )\)
This allows for both estimation and interpretation of spatial drivers.
🔍 2.4 Deriving Intensity for Inhomogeneous Poisson Point Process (IPPP) with Covariates
Suppose we observe $n$ events at locations $s_1, s_2, \dots, s_n \in W$ in a spatial domain $W \subset \mathbb{R}^2$.
Let $\mathbf{Z}(s) = (Z_1(s), \dots, Z_p(s))$ be a vector of $p$ spatial covariates observed at each location $s$.
We model the intensity function $\lambda(s)$ using a log-linear regression form:
🔹 Intensity Function (Log-Linear Model)
\[\lambda(s) = \exp\left( \beta_0 + \sum_{k=1}^p \beta_k Z_k(s) \right) = \exp\left( \mathbf{Z}(s)^\top \boldsymbol{\beta} \right)\]where:
- $\boldsymbol{\beta} = (\beta_0, \beta_1, \dots, \beta_p)^\top$ is the vector of regression coefficients,
- $Z_0(s) \equiv 1$ serves as the intercept term.
🔹 Likelihood Function for IPPP
In an IPPP, the number of points in any Borel set $B$ follows:
\[N(B) \sim \text{Poisson} \left( \int_B \lambda(s) \, ds \right)\]Let $\mathcal{S} = {s_1, \dots, s_n}$ be the observed points, and assume conditional independence of locations.
Then the likelihood of observing these $n$ points under intensity $\lambda(s)$ is:
This formula includes:
- The product of intensities at the observed event locations,
- A normalizing term to ensure the total intensity over the region $W$ matches the expected number of events.
🔹 Log-Likelihood Function
Taking the logarithm of the likelihood:
\[\log L(\boldsymbol{\beta}) = \sum_{i=1}^n \log \lambda(s_i) - \int_W \lambda(s) \, ds\]Substitute the log-linear form of $\lambda(s)$:
\[\log L(\boldsymbol{\beta}) = \sum_{i=1}^n \mathbf{Z}(s_i)^\top \boldsymbol{\beta} - \int_W \exp\left( \mathbf{Z}(s)^\top \boldsymbol{\beta} \right) ds\]This is the objective function maximized in Poisson point process regression (a type of spatial GLM).
🔹 Estimating Parameters
To obtain $\hat{\boldsymbol{\beta}}$, maximize $\log L(\boldsymbol{\beta})$ numerically, usually via:
- Newton-Raphson or Fisher scoring algorithms,
- Quasi-likelihood methods,
- Variational inference in Bayesian settings.
In practice, the integral: \(\int_W \exp\left( \mathbf{Z}(s)^\top \boldsymbol{\beta} \right) ds\) is approximated using quadrature, e.g., by discretizing $W$ into grid cells.
🔹 Interpretation of Coefficients
Each $\beta_k$ describes the log change in expected event intensity per unit increase in covariate $Z_k(s)$, holding others constant.
- $\beta_k > 0$ → increasing $Z_k$ leads to higher intensity (e.g., higher population density → more crime events),
- $\beta_k < 0$ → increasing $Z_k$ suppresses intensity.
Example: Fatal Accidents on Roads
Suppose you’re modeling crash locations with three covariates:
- $Z_1(s)$: AADT (traffic volume),
- $Z_2(s)$: slope of the road,
- $Z_3(s)$: road curvature.
Then: \(\lambda(s) = \exp\left( \beta_0 + \beta_1 \cdot \text{AADT}(s) + \beta_2 \cdot \text{Slope}(s) + \beta_3 \cdot \text{Curvature}(s) \right)\)
This model can be fitted using maximum likelihood and visualized as an estimated crash risk surface.
2.5 Second-Order Properties — Pair Correlation
Second-order properties study interaction between pairs of points. Define the second-order intensity:
\[\rho^{(2)}(s, s') = \lim_{|ds||ds'| \to 0} \frac{\mathbb{E}[N(ds)N(ds')]}{|ds||ds'|}\]The pair correlation function $g(s, s’)$ is:
\[g(s, s') = \frac{\rho^{(2)}(s, s')}{\lambda(s)\lambda(s')}\]Interpretation:
- $g(s, s’) > 1$: clustering
- $g(s, s’) = 1$: independence
- $g(s, s’) < 1$: inhibition
Part 3: Spatio-Temporal Point Processes
3.1 Definition
A spatio-temporal point process describes random events in both space and time. Formally:
Let $W \subset \mathbb{R}^2$ be the spatial domain, and $T \subset \mathbb{R}$ be the time interval.
The process generates a finite set: \(\mathbf{X} = \{(s_i, t_i)\}_{i=1}^n, \quad s_i \in W, \; t_i \in T.\)
This could model:
- Disease outbreak data (case locations and times),
- Traffic accidents (locations and timestamps),
- Wildlife sightings (GPS tracks + observation times).
3.2 Spatio-Temporal Intensity Function
The spatio-temporal intensity function $\lambda(s, t)$ gives the expected rate of event occurrence per unit space and time:
\[\lambda(s, t) = \lim_{|ds| \to 0, \, dt \to 0} \frac{\mathbb{E}[N(ds \times dt)]}{|ds| \cdot dt}\]If separable: \(\lambda(s, t) = \lambda_S(s) \cdot \lambda_T(t)\)
This separability simplifies modeling assumptions.
3.3 Conditional Intensity Function
When the process is history-dependent (e.g., self-exciting), we define the conditional intensity:
\[\lambda(s, t \mid \mathcal{H}_t) = \lim_{|ds|, dt \to 0} \frac{\mathbb{E}[N(ds \times dt) \mid \mathcal{H}_t]}{|ds| \cdot dt}\]- $\mathcal{H}_t$: history of the process up to time $t$.
- Used in Hawkes processes, where past events increase the likelihood of future events nearby.
3.4 Example Use Case (Urban Crashes)
Let $\lambda(s, t)$ be the intensity of fatal traffic crashes in Charleston over 24 hours.
- High $\lambda(s, t)$ at downtown intersections between 6-9 AM and 5-7 PM.
- Low $\lambda(s, t)$ in residential zones at night.
This structure can be inferred and predicted using kernel smoothing or Poisson regression.