1. Introduction: Seeing the Past Clearly
Cities change constantly, and often the only record of our urban history survives in blurry, black-and-white aerial photographs. Precisely identifying rooftops in these historical images is crucial for analyzing the long-term evolution of urban development and human settlement patterns.
However, extracting data from these archives is difficult. Historical analog images suffer from limited spatial resolution, a complete lack of color information, and defects like overexposure.
To address this, we propose a two-stage image enhancement pipeline based on generative adversarial networks (GANs). We “colorize” the city and sharpen its details before applying modern object detection models to identify individual rooftops.
2. Study Area and Data
Our study focuses on Charleston, South Carolina, a historic port city founded in 1670. The city retains a large number of historic buildings (Colonial, Georgian, Victorian), making it an ideal case study for urban change.
We utilized eight black-and-white aerial images captured by the USDA in 1979, digitized and georeferenced by the University of South Carolina library.
3. Methodology: A GAN-Enhanced Pipeline
We constructed a workflow that combines image enhancement with deep learning object detection. The process involves three main steps, chained as in the sketch after the list:
- Colorization (DeOldify)
- Super-Resolution (Real-ESRGAN)
- Object Detection (YOLOv11, Faster R-CNN, DETR)
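Conceptually, the three stages compose into a single function. The sketch below is our illustrative assumption about the stage signatures, not the authors' released code; each stage is detailed in Sections 3.1 and 3.2.

```python
from typing import Callable, List, Tuple
import numpy as np

# Each enhancement stage maps an image array to an image array; the detector
# maps the final image to rooftop boxes (hypothetical signatures for clarity).
Stage = Callable[[np.ndarray], np.ndarray]
Box = Tuple[float, float, float, float, float]  # x1, y1, x2, y2, score

def run_pipeline(bw_image: np.ndarray,
                 colorize: Stage,        # stage 1: DeOldify
                 super_resolve: Stage,   # stage 2: Real-ESRGAN
                 detect: Callable[[np.ndarray], List[Box]]) -> List[Box]:
    """Enhance a scanned B&W frame, then detect rooftops on the result."""
    return detect(super_resolve(colorize(bw_image)))
```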
3.1 Image Enhancement (The “Magic”)
Colorization: We used DeOldify, a GAN-based method that combines a U-Net generator with a ResNet-34 backbone. It restores semantic color (e.g., green trees vs. gray roofs), which is critical for modern detection models pretrained on color imagery.
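As a rough illustration, colorizing one scanned tile with the open-source DeOldify package might look like the following; the tile filename is a placeholder, and the API can drift between DeOldify versions.

```python
# Device must be configured before importing the visualizer (per DeOldify docs).
from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)  # or DeviceId.CPU

from pathlib import Path
from deoldify.visualize import get_image_colorizer

colorizer = get_image_colorizer(artistic=False)  # "stable" weights suit photos

# render_factor trades color saturation against memory; ~35 is a common default.
color_img = colorizer.get_transformed_image(
    Path('charleston_1979_tile.png'),  # placeholder input tile
    render_factor=35,
    watermarked=False,                 # no watermark for downstream analysis
)
color_img.save('charleston_1979_tile_color.png')
```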
Super-Resolution: To fix the blurriness, we applied Real-ESRGAN. This model generates sharper edges and richer textures, making the outline of small buildings distinct enough for detection.
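A minimal 4x upscaling sketch with the reference realesrgan package follows; the weight path and image filenames are placeholders, and the tile size depends on available GPU memory.

```python
import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# Standard RealESRGAN_x4plus generator architecture.
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(scale=4,
                         model_path='weights/RealESRGAN_x4plus.pth',  # placeholder
                         model=model,
                         tile=512)  # process in tiles to bound GPU memory

img = cv2.imread('charleston_1979_tile_color.png', cv2.IMREAD_COLOR)
output, _ = upsampler.enhance(img, outscale=4)  # returns (image, image mode)
cv2.imwrite('charleston_1979_tile_sr.png', output)
```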
3.2 Detection Model Training
We trained detectors such as YOLOv11n on the enhanced images. The architecture couples a CSPNet-based backbone for feature extraction with a PAN (Path Aggregation Network) neck for multi-scale feature fusion.
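With the Ultralytics package, fine-tuning the nano variant on the enhanced tiles reduces to a few lines. Here `rooftops.yaml` is a placeholder dataset config, and the hyperparameters are illustrative rather than the paper's exact settings.

```python
from ultralytics import YOLO

model = YOLO('yolo11n.pt')  # COCO-pretrained nano weights
model.train(data='rooftops.yaml', epochs=100, imgsz=640, batch=16)

metrics = model.val()        # precision/recall/mAP on the validation split
print(metrics.box.map50)     # mAP@0.5
```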
4. Experimental Results
4.1 Visual Enhancement
The enhancement results were striking. The addition of color helps distinguish buildings from the surrounding vegetation, while super-resolution restores the structural edges of the rooftops.
4.2 The Impact of Super-Resolution
When zooming in, the difference becomes even more apparent. Real-ESRGAN effectively removes noise and sharpens the building boundaries.
4.3 Quantitative Performance
We compared four training scenarios:
- B&W Original
- Colored Original
- B&W Upscaled
- Colored Upscaled (Ours)
The Colored + Upscaled approach yielded the best performance: YOLOv11n achieved a mean Average Precision (mAP) of over 85%, a roughly 40% improvement over training on the raw B&W images.
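A sketch of how such a four-way comparison can be scored on a common test split; the checkpoint paths and data config below are placeholders, not our released artifacts.

```python
from ultralytics import YOLO

# One trained checkpoint per input variant (placeholder paths).
checkpoints = {
    'B&W Original':     'runs/bw_orig/weights/best.pt',
    'Colored Original': 'runs/color_orig/weights/best.pt',
    'B&W Upscaled':     'runs/bw_up/weights/best.pt',
    'Colored Upscaled': 'runs/color_up/weights/best.pt',
}

for name, ckpt in checkpoints.items():
    m = YOLO(ckpt).val(data='rooftops_test.yaml', split='test')
    print(f'{name}: mAP@0.5 = {m.box.map50:.3f}')
```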
5. Discussion & Comparisons
We also benchmarked different enhancement models. Real-ESRGAN outperformed competitors like SwinIR and BSRGAN in maintaining realistic textures without introducing artifacts.
Interestingly, we also tested zero-shot segmentation foundation models such as SAM (Segment Anything Model). They performed poorly on this historical data, often missing buildings or hallucinating shapes, owing to the domain gap between modern training imagery and 1979 aerial photos.
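For reference, this zero-shot baseline amounts to running SAM's automatic mask generator over an enhanced tile; the checkpoint and image filenames are placeholders, and note that SAM segments everything class-agnostically, so rooftops still have to be filtered out afterwards.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load the ViT-H SAM checkpoint (placeholder path) and build the generator.
sam = sam_model_registry['vit_h'](checkpoint='sam_vit_h_4b8939.pth')
mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects an HxWx3 uint8 RGB array.
img = cv2.cvtColor(cv2.imread('charleston_1979_tile_sr.png'), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(img)  # list of dicts: 'segmentation', 'area', ...
print(f'{len(masks)} candidate masks (no class labels)')
```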
6. Conclusion
This framework bridges the gap between archival imagery and modern AI. By restoring the visual quality of 1979 aerial photos, we unlocked the ability to automatically map historical urban structures with high precision.
Editing & Layouts: Pengyu CHEN