SkyScenes: A Synthetic Dataset for Aerial Scene Understanding

Georgia Institute of Technology
*Indicates Equal Contribution

SkyScenes comprises 33.6K aerial images curated from UAV perspectives, spanning different weather and daytime conditions, flying altitudes, viewpoint pitch angles, and map layouts (rural and urban), with supporting dense pixel-level semantic, instance, and depth annotations. SkyScenes not only serves as a synthetic source dataset for training models that generalize to the real world, but can also augment real data for improved real-world performance.

Abstract

Real-world aerial scene understanding is limited by a lack of datasets that contain densely annotated images curated under a diverse set of conditions. Due to inherent challenges in obtaining such images in controlled real-world settings, we present SkyScenes, a synthetic dataset of densely annotated aerial images captured from Unmanned Aerial Vehicle (UAV) perspectives.

We carefully curate SkyScenes images from CARLA to comprehensively capture diversity across layout (urban and rural maps), weather conditions, times of day, pitch angles, and altitudes, with corresponding semantic, instance, and depth annotations.
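The variation axes above can be enumerated as simple metadata keys. Below is a minimal, hypothetical sketch: the condition names, numeric values, and identifier format are illustrative assumptions, not the dataset's actual directory structure.

```python
from itertools import product

# Hypothetical condition axes mirroring the variations described above;
# the actual SkyScenes naming and on-disk layout may differ.
LAYOUTS = ["urban", "rural"]
WEATHERS = ["clear", "rain", "fog"]
TIMES = ["noon", "sunset", "night"]
ALTITUDES_M = [15, 35, 60]
PITCHES_DEG = [0, 45, 90]

def sample_id(layout, weather, time, altitude, pitch):
    """Build a readable identifier for one capture condition."""
    return f"{layout}/{weather}_{time}/h{altitude}_p{pitch}"

# Enumerate every combination of conditions.
all_conditions = [
    sample_id(*combo)
    for combo in product(LAYOUTS, WEATHERS, TIMES, ALTITUDES_M, PITCHES_DEG)
]
print(len(all_conditions))  # 2 * 3 * 3 * 3 * 3 = 162 condition buckets
```

Enumerating conditions this way makes it easy to verify that every axis combination is covered when curating or evaluating on the dataset.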

Through our experiments using SkyScenes, we show that:

  1. Models trained on SkyScenes generalize well to different real-world scenarios
  2. Augmenting training on real images with SkyScenes data can improve real-world performance
  3. Controlled variations in SkyScenes can offer insights into how models respond to changes in viewpoint conditions
  4. Incorporating additional sensor modalities (depth) can improve aerial scene understanding


SkyScenes → Real Transfer

(a) SkyScenes-trained models generalize well to real settings

Models trained on SkyScenes exhibit strong out-of-the-box generalization performance on multiple real-world datasets. We find that SkyScenes pretraining generalizes well across both CNN and transformer segmentation backbones.

(b) SkyScenes can augment real training data.

In addition to zero-shot real-world generalization, SkyScenes is also useful as additional training data when labeled real-world data is available. We find that in low-shot regimes (when little real-world data is available), incorporating SkyScenes data (either explicitly via joint training or implicitly via finetuning) improves recognition performance.
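The joint-training setup can be sketched as a fixed-ratio sampling mix of real and synthetic pools. This is a hypothetical, framework-agnostic sketch: the batch size, mixing ratio, and sampler are illustrative choices, not the paper's exact training recipe.

```python
import random

def mixed_batches(real, synthetic, batch_size=8, real_fraction=0.5, seed=0):
    """Yield batches mixing real and synthetic samples at a fixed ratio.

    `real` and `synthetic` are any indexable sample collections; in a
    low-shot regime `real` is small and synthetic data fills the rest.
    """
    rng = random.Random(seed)
    n_real = max(1, int(batch_size * real_fraction))
    n_syn = batch_size - n_real
    while True:
        # Sample with replacement from each pool, then shuffle the batch.
        batch = rng.choices(real, k=n_real) + rng.choices(synthetic, k=n_syn)
        rng.shuffle(batch)
        yield batch

# Example: 20 real samples augmented by 1000 synthetic ones.
gen = mixed_batches(list(range(20)), list(range(1000, 2000)), batch_size=8)
first = next(gen)
print(len(first))  # 8 samples, half real and half synthetic
```

Keeping the real fraction fixed per batch is one simple way to prevent a large synthetic pool from drowning out the few real samples.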

SkyScenes as a diagnostic framework

The variations available in SkyScenes across town layouts, weather and daytime conditions, and height and pitch settings enable us to assess how sensitive trained models are to each specific factor: height, pitch, time of day, weather, or town layout.
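Such a sensitivity analysis can be sketched by grouping per-image scores along one condition axis at a time. The sketch below is hypothetical: the metadata keys, metric values, and grouping helper are illustrative assumptions rather than the paper's evaluation code.

```python
from collections import defaultdict
from statistics import mean

def score_by_condition(results, key):
    """Average a per-image metric over each value of one condition axis.

    `results` is a list of (metadata, score) pairs, where metadata is a
    dict with hypothetical keys like "altitude" or "pitch".
    """
    buckets = defaultdict(list)
    for meta, score in results:
        buckets[meta[key]].append(score)
    return {value: mean(scores) for value, scores in buckets.items()}

# Toy example: per-image mIoU falling as altitude increases.
results = [
    ({"altitude": 15, "pitch": 45}, 0.62),
    ({"altitude": 15, "pitch": 90}, 0.58),
    ({"altitude": 60, "pitch": 45}, 0.41),
    ({"altitude": 60, "pitch": 90}, 0.37),
]
print(score_by_condition(results, "altitude"))
# altitude 15 averages ~0.60, altitude 60 averages ~0.39
```

Re-running the same grouping with `key="pitch"`, `"weather"`, and so on isolates each factor's effect on the trained model.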


BibTeX

@misc{khose2023skyscenes,
      title={SkyScenes: A Synthetic Dataset for Aerial Scene Understanding}, 
      author={Sahil Khose and Anisha Pal and Aayushi Agarwal and Deepanshi and Judy Hoffman and Prithvijit Chattopadhyay},
      year={2023},
      eprint={2312.06719},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}