🚀 PUBLISHED: This paper, "A GAN-enhanced deep learning framework for rooftop detection from historical aerial imagery," was published on July 20, 2025.

1. Introduction: Seeing the Past Clearly

Cities are constantly changing, and often the only record of their earlier form survives in blurry, black-and-white aerial photographs. Accurately identifying rooftops in these historical images is crucial for analyzing the long-term evolution of urban development and human settlement patterns.

However, extracting data from these archives is difficult. Historical analog images suffer from limited spatial resolution, a complete lack of color information, and defects like overexposure.

Blurriness and Overexposure Examples
(a) Blurriness and low resolution; (b) Overexposure in historical imagery.

To address this, we propose a two-stage image enhancement pipeline based on Generative Adversarial Networks (GANs). We “colorize” the city and sharpen its details before applying modern object detection models to identify individual rooftops.

2. Study Area and Data

Our study focuses on Charleston, South Carolina, a historic port city founded in 1670. The city retains a large number of historic buildings (Colonial, Georgian, Victorian), making it an ideal case study for urban change.

We utilized eight black-and-white aerial images captured by the USDA in 1979, digitized and georeferenced by the University of South Carolina library.
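
For readers reproducing the data preparation, here is a minimal sketch of reading one georeferenced scan and cutting it into training chips. It assumes GeoTIFF files and the rasterio library; the file name and chip size are illustrative, not the paper's exact settings.

```python
import rasterio
from rasterio.windows import Window

TILE = 640  # chip size in pixels, matching the detector input size used later

# Hypothetical path to one of the georeferenced 1979 scans.
with rasterio.open("charleston_1979_sheet1.tif") as src:
    for row in range(0, src.height - TILE + 1, TILE):
        for col in range(0, src.width - TILE + 1, TILE):
            window = Window(col, row, TILE, TILE)
            chip = src.read(window=window)        # (bands, TILE, TILE) array
            geo = src.window_transform(window)    # affine transform of this chip
            # ... write the chip and its georeferencing out as a sample ...
```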

Study Area Map
The study area covering Charleston, SC, overlaid with the 1979 aerial imagery mosaic.

3. Methodology: A GAN-Enhanced Pipeline

We constructed a workflow that combines image enhancement with deep learning object detection. The process involves three main steps:

  1. Colorization (DeOldify)
  2. Super-Resolution (Real-ESRGAN)
  3. Object Detection (YOLOv11, Faster R-CNN, DETR)

Technical Workflow
The overall framework: From raw B&W input -> Colorization -> Super-Resolution -> Detection.

3.1 Image Enhancement (The “Magic”)

Colorization: We used DeOldify, a GAN-based method that combines a U-Net generator with a ResNet-34 backbone. It restores semantic color (e.g., green trees vs. gray roofs), which matters because modern detection models are typically pretrained on color imagery.
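
As an illustration, colorization with DeOldify's published inference API looks roughly like this (a sketch; the exact entry points and render_factor value depend on the DeOldify version and weights used):

```python
# Sketch of GAN-based colorization via DeOldify's public API.
# Assumes the DeOldify repo and pretrained weights are installed;
# the input path is hypothetical.
from deoldify import device
from deoldify.device_id import DeviceId
device.set(device=DeviceId.GPU0)  # use DeviceId.CPU without a GPU

from deoldify.visualize import get_image_colorizer

colorizer = get_image_colorizer(artistic=False)  # "stable" weights, fewer artifacts
colored = colorizer.get_transformed_image(
    "aerial_1979_bw.png",  # hypothetical B&W input chip
    render_factor=35,      # higher = finer color detail, more memory
)
colored.save("aerial_1979_color.png")
```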

GAN Architecture
The GAN architecture used for image enhancement (Generator and Discriminator).

Super-Resolution: To fix the blurriness, we applied Real-ESRGAN. This model generates sharper edges and richer textures, making the outline of small buildings distinct enough for detection.
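
A minimal inference sketch with the official realesrgan package (assuming the RealESRGAN_x4plus checkpoint; paths are hypothetical):

```python
import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

# RRDB backbone matching the RealESRGAN_x4plus checkpoint.
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64,
                num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(
    scale=4,
    model_path="weights/RealESRGAN_x4plus.pth",  # pretrained weights
    model=model,
    tile=512,   # tile large aerial scenes to bound GPU memory
    half=True,  # fp16 inference; set False on CPU
)

img = cv2.imread("aerial_1979_color.png")       # BGR uint8 input
output, _ = upsampler.enhance(img, outscale=4)  # 4x upscaled result
cv2.imwrite("aerial_1979_color_x4.png", output)
```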

3.2 Detection Model Training

We trained detectors such as YOLOv11n on the enhanced images. The YOLOv11 architecture uses a CSPNet backbone for feature extraction and a PAN (Path Aggregation Network) neck for multi-scale feature fusion.
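
Training on the enhanced chips follows the standard Ultralytics workflow, roughly as sketched below (the dataset YAML and hyperparameters are illustrative, not the paper's exact settings):

```python
from ultralytics import YOLO

# Start from COCO-pretrained YOLO11n weights.
model = YOLO("yolo11n.pt")

# rooftops.yaml is a hypothetical dataset file pointing at the
# enhanced image chips and their rooftop bounding-box labels.
model.train(
    data="rooftops.yaml",
    epochs=100,
    imgsz=640,
    batch=16,
)
```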

YOLO Architecture
The YOLOv11 detection pipeline: Input images + Box Labels -> Training -> Output Predictions.

4. Experimental Results

4.1 Visual Enhancement

The enhancement results were striking. The addition of color helps distinguish buildings from the surrounding vegetation, while super-resolution restores the structural edges of the rooftops.

Colorization Results
Comparison: (a) Original Black-and-white images vs (b) GAN-Colorized images.
Large Area Mosaic
A large-scale view of the colorized Charleston mosaic.

4.2 The Impact of Super-Resolution

When zooming in, the difference becomes even more apparent. Real-ESRGAN effectively removes noise and sharpens the building boundaries.

Super Resolution Comparison
Top: Original vs Upscaled B&W. Bottom: Original vs Upscaled Color. Note the sharpness of the roof edges in the upscaled versions.

4.3 Quantitative Performance

We compared four training scenarios:

  1. B&W Original
  2. Colored Original
  3. B&W Upscaled
  4. Colored Upscaled (Ours)

The Colored + Upscaled approach yielded the best performance: YOLOv11n achieved a mean Average Precision (mAP) above 85%, roughly a 40% improvement over models trained on the raw B&W images.
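
The four scenarios can be compared by validating each trained model on the same held-out set, for example via Ultralytics' val API (a sketch; checkpoint and dataset paths are hypothetical):

```python
from ultralytics import YOLO

# One trained checkpoint per enhancement scenario (hypothetical paths).
scenarios = {
    "B&W original":     "runs/bw_orig/weights/best.pt",
    "Colored original": "runs/color_orig/weights/best.pt",
    "B&W upscaled":     "runs/bw_up/weights/best.pt",
    "Colored upscaled": "runs/color_up/weights/best.pt",
}

for name, weights in scenarios.items():
    metrics = YOLO(weights).val(data="rooftops.yaml")
    print(f"{name}: mAP50={metrics.box.map50:.3f}, mAP50-95={metrics.box.map:.3f}")
```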

Training Metrics Charts
Training curves showing Loss, Precision, Recall, and mAP. The Orange line (Colored) consistently outperforms the Blue line (B&W).

5. Discussion & Comparisons

We also benchmarked different enhancement models. Real-ESRGAN outperformed competitors like SwinIR and BSRGAN in maintaining realistic textures without introducing artifacts.
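
Because the 1979 archive has no high-resolution ground truth, full-reference metrics such as PSNR/SSIM do not apply directly. One option for scoring SR outputs quantitatively is a no-reference metric such as NIQE, e.g. via the pyiqa package (an illustrative sketch under that assumption; the comparison in the paper, shown below, is visual):

```python
import pyiqa

# No-reference quality metric: lower NIQE suggests more natural texture.
niqe = pyiqa.create_metric("niqe")

# Hypothetical outputs of each SR model on the same enhanced input chip.
for name in ("bsrgan", "swinir", "real_esrgan"):
    score = niqe(f"sr_outputs/{name}.png")  # pyiqa accepts image paths
    print(f"{name}: NIQE = {float(score):.2f}")
```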

Comparison of SR Models
Visual comparison of different Super-Resolution models (BSRGAN vs SwinIR vs Real-ESRGAN).

Interestingly, we tested “Zero-Shot” large segmentation models like SAM (Segment Anything Model). They performed poorly on this historical data, often missing buildings or hallucinating shapes due to the domain gap between modern training data and 1979 aerial photos.
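
For reference, the vanilla zero-shot setup looks like this with Meta's segment-anything package (a sketch; SAM itself takes point/box prompts, so the text-prompted runs in the figure below additionally require a grounding front-end such as Grounded-SAM; paths are hypothetical):

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load the pretrained SAM ViT-H checkpoint (hypothetical local path).
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("aerial_1979_color_x4.png"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # list of dicts: 'segmentation', 'area', ...
print(f"SAM proposed {len(masks)} masks")
```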

SAM Failure Cases
Failure cases using the Segment Anything Model (SAM) with different text prompts.

6. Conclusion

This framework bridges the gap between archival imagery and modern AI. By restoring the visual quality of 1979 aerial photos, we unlocked the ability to automatically map historical urban structures with high precision.


Editing & Layouts: Pengyu CHEN