SkyScenes: A Synthetic Dataset for Aerial Scene Understanding

Georgia Institute of Technology
*Indicates Equal Contribution

SkyScenes comprises 33.6K aerial images curated from UAV perspectives, spanning different weather and daytime conditions, flying altitudes, viewpoint pitch angles, and map layouts (rural and urban), with supporting dense pixel-level semantic, instance, and depth annotations. SkyScenes not only serves as a synthetic source dataset for training models that generalize to the real world, but can also augment real data for improved real-world performance.

Abstract

Real-world aerial scene understanding is limited by a lack of datasets that contain densely annotated images curated under a diverse set of conditions. Due to inherent challenges in obtaining such images in controlled real-world settings, we present SkyScenes, a synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives.

We carefully curate SkyScenes images from CARLA to comprehensively capture diversity across layout (urban and rural maps), weather conditions, times of day, pitch angles, and altitudes, with corresponding semantic, instance, and depth annotations.
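The variation axes above can be enumerated as simple metadata keys. Below is a minimal, hypothetical sketch: the condition names, numeric values, and identifier format are illustrative assumptions, not the dataset's actual directory structure.

```python
from itertools import product

# Hypothetical condition axes mirroring the variations described above;
# the actual SkyScenes naming and on-disk layout may differ.
LAYOUTS = ["urban", "rural"]
WEATHERS = ["clear", "rain", "fog"]
TIMES = ["noon", "sunset", "night"]
ALTITUDES_M = [15, 35, 60]
PITCHES_DEG = [0, 45, 90]

def sample_id(layout, weather, time, altitude, pitch):
    """Build a readable identifier for one capture condition."""
    return f"{layout}/{weather}_{time}/h{altitude}_p{pitch}"

# Enumerate every combination of conditions.
all_conditions = [
    sample_id(*combo)
    for combo in product(LAYOUTS, WEATHERS, TIMES, ALTITUDES_M, PITCHES_DEG)
]
print(len(all_conditions))  # 2 * 3 * 3 * 3 * 3 = 162 condition buckets
```

Enumerating conditions this way makes it easy to verify that every axis combination is covered when curating or evaluating on the dataset.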

Through our experiments using SkyScenes, we show that:

  1. Models trained on SkyScenes generalize well to different real-world scenarios
  2. Augmenting training on real images with SkyScenes data can improve real-world performance
  3. Controlled variations in SkyScenes can offer insights into how models respond to changes in viewpoint conditions
  4. Incorporating additional sensor modalities (depth) can improve aerial scene understanding


SkyScenes → Real Transfer

(a) SkyScenes-trained models generalize well to real settings

Models trained on SkyScenes exhibit strong out-of-the-box generalization performance on multiple real-world datasets. We find that SkyScenes pretraining generalizes well across both CNN and transformer segmentation backbones.

(b) SkyScenes can augment real training data.

In addition to zero-shot real-world generalization, SkyScenes is also useful as additional training data when labeled real-world data is available. We find that in low-shot regimes (when little real-world data is available), incorporating SkyScenes data (either explicitly via joint training or implicitly via finetuning) improves recognition performance.
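The joint-training setup can be sketched as a fixed-ratio sampling mix of real and synthetic pools. This is a hypothetical, framework-agnostic sketch: the batch size, mixing ratio, and sampler are illustrative choices, not the paper's exact training recipe.

```python
import random

def mixed_batches(real, synthetic, batch_size=8, real_fraction=0.5, seed=0):
    """Yield batches mixing real and synthetic samples at a fixed ratio.

    `real` and `synthetic` are any indexable sample collections; in a
    low-shot regime `real` is small and synthetic data fills the rest.
    """
    rng = random.Random(seed)
    n_real = max(1, int(batch_size * real_fraction))
    n_syn = batch_size - n_real
    while True:
        # Sample with replacement from each pool, then shuffle the batch.
        batch = rng.choices(real, k=n_real) + rng.choices(synthetic, k=n_syn)
        rng.shuffle(batch)
        yield batch

# Example: 20 real samples augmented by 1000 synthetic ones.
gen = mixed_batches(list(range(20)), list(range(1000, 2000)), batch_size=8)
first = next(gen)
print(len(first))  # 8 samples, half real and half synthetic
```

Keeping the real fraction fixed per batch is one simple way to prevent a large synthetic pool from drowning out the few real samples.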

SkyScenes as a diagnostic framework

The variations available in SkyScenes across town layouts, weather and daytime conditions, and height and pitch settings enable us to assess how sensitive trained models are to each specific factor: height, pitch, time of day, weather, or town layout.
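Such a sensitivity analysis can be sketched by grouping per-image scores along one condition axis at a time. The sketch below is hypothetical: the metadata keys, metric values, and grouping helper are illustrative assumptions rather than the paper's evaluation code.

```python
from collections import defaultdict
from statistics import mean

def score_by_condition(results, key):
    """Average a per-image metric over each value of one condition axis.

    `results` is a list of (metadata, score) pairs, where metadata is a
    dict with hypothetical keys like "altitude" or "pitch".
    """
    buckets = defaultdict(list)
    for meta, score in results:
        buckets[meta[key]].append(score)
    return {value: mean(scores) for value, scores in buckets.items()}

# Toy example: per-image mIoU falling as altitude increases.
results = [
    ({"altitude": 15, "pitch": 45}, 0.62),
    ({"altitude": 15, "pitch": 90}, 0.58),
    ({"altitude": 60, "pitch": 45}, 0.41),
    ({"altitude": 60, "pitch": 90}, 0.37),
]
print(score_by_condition(results, "altitude"))
# altitude 15 averages ~0.60, altitude 60 averages ~0.39
```

Re-running the same grouping with `key="pitch"`, `"weather"`, and so on isolates each factor's effect on the trained model.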


BibTeX

@misc{khose2023skyscenes,
      title={SkyScenes: A Synthetic Dataset for Aerial Scene Understanding}, 
      author={Sahil Khose and Anisha Pal and Aayushi Agarwal and Deepanshi and Judy Hoffman and Prithvijit Chattopadhyay},
      year={2023},
      eprint={2312.06719},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}