Awesome Aerial Spatial Intelligence
Welcome to Awesome Aerial Spatial Intelligence, a curated collection of resources, datasets, and research papers focused on advancing aerial spatial intelligence. This repository is designed to support our upcoming survey on aerial perception, detection, and navigation, covering key advancements in ground-air collaboration, remote sensing, and autonomous aerial systems. We welcome contributions via pull requests (PRs) to keep this resource up-to-date and comprehensive.
⭐ Star the Repository
If you find this collection useful, please give it a star on GitHub to help others discover it!
Table of Contents
- 🌍 Ground-Air Collaborative Perception and Visual Geolocalization
- 🛰️ Remote Sensing Detection and Environmental Understanding
- ✈️ Autonomous Aerial Navigation and Decision Making
- 🤝 Contributing
- 📜 License
Datasets
- DOTA: A Large-Scale Dataset for Object Detection in Aerial Images
- University-1652: Drone-Based Geo-Localization
- SUES-200: Multi-Height Cross-View Benchmark
1. Ground-Air Collaborative Perception and Visual Geolocalization
- Lin, T.-Y., Cui, Y., Belongie, S., & Hays, J. (2015). Learning deep representations for ground-to-aerial geolocalization. IEEE Conference on Computer Vision and Pattern Recognition, 5007–5015.
- Workman, S., & Jacobs, N. (2015). On the location dependence of convolutional neural network features. IEEE Conference on Computer Vision and Pattern Recognition Workshops, 70–74.
- Workman, S., Souvenir, R., & Jacobs, N. (2015). Wide-area image geolocalization with aerial reference imagery. IEEE International Conference on Computer Vision, 3961–3969.
- Vo, N. N., & Hays, J. (2016). Localizing and orienting street views using overhead imagery. European Conference on Computer Vision, 494–509.
- Tian, Y., Chen, C., & Shah, M. (2017). Cross-view image matching for geo-localization in urban environments. IEEE Conference on Computer Vision and Pattern Recognition, 3608–3616.
- Zhai, M., Bessinger, Z., Workman, S., & Jacobs, N. (2017). Predicting ground-level scene layout from aerial imagery. IEEE Conference on Computer Vision and Pattern Recognition, 867–875.
- Hu, S., Feng, M., Nguyen, R. M. H., & Lee, G. H. (2018). CVM-net: Cross-view matching network for image-based ground-to-aerial geo-localization. IEEE Conference on Computer Vision and Pattern Recognition, 7258–7267.
- Liu, L., & Li, H. (2019). Lending orientation to neural networks for cross-view geo-localization. CVPR.
- Liu, L., Li, H., & Dai, Y. (2019). Stochastic Attraction-Repulsion Embedding for Large Scale Image Localization. IEEE International Conference on Computer Vision, 2570–2579.
- Shi, Y., Liu, L., Yu, X., & Li, H. (2019). Spatial-Aware Feature Aggregation for Image based Cross-View Geo-Localization. NeurIPS.
- Shi, Y., Yu, X., Liu, L., Zhang, T., & Li, H. (2019). Optimal Feature Transport for Cross-View Image Geo-Localization. arXiv preprint arXiv:1907.05021.
- Shi, Y., Yu, X., Campbell, D., & Li, H. (2020). Where am I looking at? Joint location and orientation estimation by cross-view matching. CVPR.
- Shi, Y., Yu, X., Liu, L., Zhang, T., & Li, H. (2020). Optimal feature transport for cross-view image geo-localization. AAAI.
- Zheng, Z., Wei, Y., & Yang, Y. (2020). University-1652: A multi-view multi-source benchmark for drone-based geo-localization. ACM Multimedia.
- Wang, T., Zheng, Z., Yan, C., Zhang, J., Sun, Y., Zheng, B., & Yang, Y. (2021). Each part matters: Local patterns facilitate cross-view geo-localization. IEEE Transactions on Circuits and Systems for Video Technology, 32(2), 867–879.
- Yang, H., Lu, X., & Zhu, Y. (2021). Cross-view geo-localization with layer-to-layer transformer. Advances in Neural Information Processing Systems, 34, 29009–29020.
- Lin, J., Zheng, Z., Zhong, Z., Luo, Z., Li, S., Yang, Y., & Sebe, N. (2022). Joint representation learning and keypoint detection for cross-view geo-localization. IEEE Transactions on Image Processing, 31, 3780–3792.
- Shi, Y., Campbell, D. J., Yu, X., & Li, H. (2022). Geometry-guided street-view panorama synthesis from satellite imagery. TPAMI.
- Shi, Y., Yu, X., Liu, L., Campbell, D., Koniusz, P., & Li, H. (2022). Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching. TPAMI.
- Shi, Y., Yu, X., Wang, S., & Li, H. (2022). CVLNet: Cross-View Feature Correspondence Learning for Video-based Camera Localization. ACCV.
- Zhu, S., Shah, M., & Chen, C. (2022). Transgeo: Transformer is all you need for cross-view image geo-localization. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1162–1171.
- Deuser, F., Habel, K., & Oswald, N. (2023). Sample4geo: Hard negative sampling for cross-view geo-localisation. IEEE/CVF International Conference on Computer Vision, 16847–16856.
- Shi, Y., Wu, F., Perincherry, A., Vora, A., & Li, H. (2023). Boosting 3-DoF Ground-to-Satellite Camera Localization Accuracy via Geometry-Guided Cross-View Transformer. ICCV.
- Song, Z., Ze, X., Lu, J., & Shi, Y. (2023). Learning Dense Flow Field for Highly-accurate Cross-view Camera Localization. NeurIPS.
- Wang, C., Zheng, Z., Quan, R., Sun, Y., & Yang, Y. (2023). Context-Aware Pretraining for Efficient Blind Image Decomposition. CVPR.
- Zhang, X., Li, X., Sultani, W., Zhou, Y., & Wshah, S. (2023). Cross-view geo-localization via learning disentangled geometric layout correspondence. AAAI Conference on Artificial Intelligence, 37, 3480–3488.
- Zhu, R., Yin, L., Yang, M., Wu, F., Yang, Y., & Hu, W. (2023). SUES-200: A multi-height multi-scene cross-view image benchmark across drone and satellite. IEEE Transactions on Circuits and Systems for Video Technology, 33(9), 4825–4839.
- Chu, M., Zheng, Z., Ji, W., Wang, T., & Chua, T.-S. (2024). Towards natural language-guided drones: GeoText-1652 benchmark with spatial relation matching. European Conference on Computer Vision, 213–231.
- Wang, T., Zheng, Z., Sun, Y., Chua, T.-S., Yang, Y., & Yan, C. (2024). Multiple-environment Self-adaptive Network for Aerial-view Geo-localization. Pattern Recognition, 152, 110363.
- Wang, T., Zheng, Z., Zhu, Z., Sun, Y., Yang, Y., & Yan, C. (2024). Learning Cross-view Geo-localization Embeddings via Dynamic Weighted Decorrelation Regularization. TGRS.
- Albaluchi, Y., Fu, B., Damer, N., Ramachandra, R., & Raja, K. (2025). UAV-based person re-identification: A survey of UAV datasets, approaches, and challenges. Computer Vision and Image Understanding, 251, 104261.
- Gao, X., Wu, Y., Yang, F., Luo, X., Wu, K., Chen, X., … & Tu, Z. (2025). Airv2x: Unified air-ground vehicle-to-everything collaboration. arXiv preprint arXiv:2506.19283.
- Ju, H., Huang, S., Liu, S., & Zheng, Z. (2025). Video2bev: Transforming drone videos to BEVs for video-based geo-localization. ICCV.
2. Remote Sensing Detection and Environmental Understanding
- Malingreau, J.-P. (1986). Global vegetation dynamics: satellite observations over Asia. International Journal of Remote Sensing, 7(9), 1121–1146.
- Maas, S. J. (1988). Using satellite data to improve model estimates of crop yield. Agronomy Journal, 80(4), 655–662.
- Del Valle, H. F., Elissalde, N. O., Gagliardini, D. A., & Milovich, J. (1998). Status of desertification in the Patagonian region: Assessment and mapping from satellite imagery. Arid Land Research and Management, 12(2), 95–121.
- Tucker, C. J., & Townshend, J. R. G. (2000). Strategies for monitoring tropical deforestation using satellite data. International Journal of Remote Sensing, 21(6-7), 1461–1470.
- Collado, A. D., Chuvieco, E., & Camarasa, A. (2002). Satellite remote sensing analysis to monitor desertification processes in the crop-rangeland boundary of Argentina. Journal of Arid Environments, 52(1), 121–133.
- Kaufman, Y. J., Tanré, D., & Boucher, O. (2002). A satellite view of aerosols in the climate system. Nature, 419(6903), 215–223.
- Ferencz, C., Bognár, P., Lichtenberger, J., Hamar, D., Tarcsai, G., Timár, G., … & Székely, B. (2004). Crop yield estimation by satellite remote sensing. International Journal of Remote Sensing, 25(20), 4113–4149.
- Ferreira, V. G., Gong, Z., He, X., Zhang, Y., & Andam-Akorful, S. A. (2013). Estimating total discharge in the Yangtze River Basin using satellite-based observations. Remote Sensing, 5(7), 3415–3430.
- Boyle, S. A., Kennedy, C. M., Torres, J., Colman, K., Pérez-Estigarribia, P. E., & De La Sancha, N. U. (2014). High-resolution satellite imagery is an important yet underutilized resource in conservation biology. PLoS One, 9(1), e86908.
- Liu, Z., Wang, H., Weng, L., & Yang, Y. (2016). Ship rotated bounding box space for ship extraction from high-resolution optical satellite images with complex backgrounds. IEEE Geoscience and Remote Sensing Letters, 13(8), 1074–1078.
- Dobrynin, D. V., Rozhnov, V. V., Saveliev, A. A., Sukhova, O. V., & Yachmennikova, A. A. (2017). Integration of Satellite Tracking Data and Satellite Images for Detailed Characteristics of Wildlife Habitats. Izvestiya, Atmospheric and Oceanic Physics, 53(9), 1060–1071.
- Finer, M., Novoa, S., Weisse, M. J., Petersen, R., Mascaro, J., Souto, T., … & Martinez, R. G. (2018). Combating deforestation: From satellite to intervention. Science, 360(6395), 1303–1305.
- Guo, W., Yang, W., Zhang, H., & Hua, G. (2018). Geospatial object detection in high resolution satellite images based on multi-scale convolutional neural network. Remote Sensing, 10(1), 131.
- Xia, G.-S., Bai, X., Ding, J., Zhu, Z., Belongie, S., Luo, J., … & Zhang, L. (2018). DOTA: A large-scale dataset for object detection in aerial images. CVPR, 3974–3983.
- Ding, J., Xue, N., Long, Y., Xia, G.-S., & Lu, Q. (2019). Learning RoI transformer for oriented object detection in aerial images. CVPR, 2849–2858.
- Yang, X., Yang, J., Yan, J., Zhang, Y., Zhang, T., Guo, Z., … & Fu, K. (2019). Scrdet: Towards more robust detection for small, cluttered and rotated objects. ICCV, 8232–8241.
- Zamir, S. W., Arora, A., Gupta, A., Khan, S., Sun, G., Khan, F. S., … & Bai, X. (2019). iSAID: A Large-Scale Dataset for Instance Segmentation in Aerial Images. CVPR Workshops.
- Fu, K., Chang, Z., Zhang, Y., & Sun, X. (2020). Point-based estimator for arbitrary-oriented object detection in aerial images. TGRS, 59(5), 4370–4387.
- Guo, H., Yang, X., Wang, N., Song, B., & Gao, X. (2020). A rotational libra R-CNN method for ship detection. TGRS, 58(8), 5772–5781.
- Xu, Y., Fu, M., Wang, Q., Wang, Y., Chen, K., Xia, G.-S., & Bai, X. (2020). Gliding vertex on the horizontal bounding box for multi-oriented object detection. TPAMI, 43(4), 1452–1459.
- Zheng, Z., Zhong, Y., Ma, A., Han, X., Zhao, J., Liu, Y., & Zhang, L. (2020). HyNet: Hyper-scale object detection network framework for multiple spatial resolution remote sensing imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 166, 1–14.
- Ding, J., Xue, N., Xia, G.-S., Bai, X., Yang, W., Yang, M. Y., … & Zhang, L. (2021). Object detection in aerial images: A large-scale benchmark and challenges. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(11), 7778–7796.
- Han, J., Ding, J., Xue, N., & Xia, G.-S. (2021). Redet: A rotation-equivariant detector for aerial object detection. CVPR, 2786–2795.
- Hou, J.-B., Zhu, X., & Yin, X.-C. (2021). Self-adaptive aspect ratio anchor for oriented object detection in remote sensing images. Remote Sensing, 13(7), 1318.
- Ming, Q., Zhou, Z., Miao, L., Zhang, H., & Li, L. (2021). Dynamic anchor learning for arbitrary-oriented object detection. AAAI, 35, 2355–2363.
- Qian, W., Yang, X., Peng, S., Yan, J., & Guo, Y. (2021). Learning modulated loss for rotated object detection. AAAI, 35, 2458–2466.
- Shamsolmoali, P., Zareapoor, M., Chanussot, J., Zhou, H., & Yang, J. (2021). Rotation equivariant feature image pyramid network for object detection in optical remote sensing imagery. TGRS, 60, 1–14.
- Xie, X., Cheng, G., Wang, J., Yao, X., & Han, J. (2021). Oriented R-CNN for object detection. ICCV, 3520–3529.
- Yang, X., Yan, J., Feng, Z., & He, T. (2021). R3det: Refined single-stage detector with feature refinement for rotating object. AAAI, 35, 3163–3171.
- Yang, X., Yan, J., Ming, Q., Wang, W., Zhang, X., & Tian, Q. (2021). Rethinking rotated object detection with Gaussian Wasserstein distance loss. ICML, 11830–11841.
- Yang, X., Yang, J., Yan, J., Zhang, Y., Wang, W., Tian, Q., & Yan, J. (2021). Learning high-precision bounding box for rotated object detection via Kullback-Leibler divergence. NeurIPS, 34, 18381–18394.
- Zhang, W., Jiao, L., Li, Y., Huang, Z., & Wang, H. (2021). Laplacian feature pyramid network for object detection in VHR optical remote sensing images. TGRS, 60, 1–14.
- Cheng, G., Wang, J., Li, K., Xie, X., Lang, C., Yao, Y., & Han, J. (2022). Anchor-free oriented proposal generator for object detection. TGRS, 60, 1–11.
- Cheng, G., Yao, Y., Li, S., Li, K., Xie, X., Wang, J., Yao, X., & Han, J. (2022). Dual-aligned oriented detector. TGRS, 60, 1–11.
- Fang, C., Song, K., Paerl, H. W., Jacinthe, P.-A., Wen, Z., Liu, G., … & Wang, Z. (2022). Global divergent trends of algal blooms detected by satellite during 1982–2018. Global Change Biology, 28(7), 2327–2340.
- Hou, L., Lu, K., & Xue, J. (2022). Refined one-stage oriented object detection method for remote sensing images. IEEE Transactions on Image Processing, 31, 1545–1558.
- Hou, L., Lu, K., Xue, J., & Li, Y. (2022). Shape-adaptive selection and measurement for oriented object detection. AAAI, 36, 923–932.
- Jin, P., Mou, L., Xia, G.-S., & Zhu, X. X. (2022). Anomaly Detection in Aerial Videos With Transformers. TGRS.
- Shao, M., Wang, C., Zuo, W., & Meng, D. (2022). Efficient pyramidal GAN for versatile missing data reconstruction in remote sensing images. TGRS, 60, 1–14.
- Sun, X., Wang, P., Yan, Z., Xu, F., Wang, R., Diao, W., … & Li, J. (2022). FAIR1M: A benchmark dataset for fine-grained object recognition in high-resolution remote sensing imagery. ISPRS Journal of Photogrammetry and Remote Sensing, 184, 116–130.
- Wang, G., Zhuang, Y., Chen, H., Liu, X., Zhang, T., Li, L., … & Sang, Q. (2022). FSoD-Net: Full-Scale Object Detection From Optical Remote Sensing Imagery. TGRS, 60, 1–18.
- Zhang, K., Bello, I. M., Su, Y., Wang, J., & Maryam, I. (2022). Multiscale depthwise separable convolution based network for high-resolution image segmentation. International Journal of Remote Sensing, 43(18), 6624–6645.
- Li, Y., Hou, Q., Zheng, Z., Cheng, M.-M., Yang, J., & Li, X. (2023). Large Selective Kernel Network for Remote Sensing Object Detection. ICCV, 16794–16805.
- Pu, Y., Wang, Y., Xia, Z., Han, Y., Wang, Y., Gan, W., … & Huang, G. (2023). Adaptive Rotated Convolution for Rotated Object Detection. ICCV, 6589–6600.
- Yang, G., Li, W., Zhang, J., Wang, W., & Liu, J. (2023). LAI-YOLOv5s: A Lightweight Aerial Image Object Detection Algorithm. TGRS, 61, 1–12.
- Yao, Y., Chen, T., Bi, H., Cai, X., Pei, G., Yang, G., … & Zhang, H. (2023). Automated object recognition in high-resolution optical remote sensing imagery. National Science Review, 10(6), nwad122.
- Zhao, L., & Zhu, M. (2023). MS-YOLOv7: YOLOv7 Based on Multi-Scale for Object Detection on UAV Aerial Photography. Drones.
- Cai, X., Lai, Q., Wang, Y., Wang, W., Sun, Z., & Yao, Y. (2024). Poly kernel inception network for remote sensing detection. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 27706–27716.
3. Autonomous Aerial Navigation and Decision Making
- Giusti, A., Guzzi, J., Cireşan, D. C., He, F.-L., Rodríguez, J. P., Fontana, F., … & Di Caro, G. (2015). A machine learning approach to visual perception of forest trails for mobile robots. IEEE Robotics and Automation Letters, 1(2), 661–667.
- Majdik, A. L., Till, C., & Scaramuzza, D. (2017). The Zurich urban micro aerial vehicle dataset. The International Journal of Robotics Research, 36(3), 269–273.
- Smolyanskiy, N., Kamenev, A., Smith, J., & Birchfield, S. (2017). Toward low-flying autonomous MAV trail navigation using deep neural networks for environmental awareness. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 4241–4247.
- Loquercio, A., Maqueda, A. I., Del-Blanco, C. R., & Scaramuzza, D. (2018). Dronet: Learning to fly by driving. IEEE Robotics and Automation Letters, 3(2), 1088–1095.
- Kang, K., Belkhale, S., Kahn, G., Abbeel, P., & Levine, S. (2019). Generalization through simulation: Integrating simulated and real data into deep reinforcement learning for vision-based autonomous flight. IEEE International Conference on Robotics and Automation (ICRA), 6008–6014.
- Singla, A., Padakandla, S., & Bhatnagar, S. (2019). Memory-based deep reinforcement learning for obstacle avoidance in UAV with limited environment knowledge. IEEE Transactions on Intelligent Transportation Systems, 22(1), 107–118.
- Zhang, H., Wang, G., Lei, Z., & Hwang, J.-N. (2019). Eye in the sky: Drone-based object tracking and 3D localization. ACM Multimedia.
- Albanis, G., Zioulis, N., Dimou, A., Zarpalas, D., & Daras, P. (2020). Dronepose: Photorealistic UAV-assistant dataset synthesis for 3D pose estimation via a smooth silhouette loss. European Conference on Computer Vision, 663–681.
- Bozcan, I., & Kayacan, E. (2020). Au-air: A multi-modal unmanned aerial vehicle dataset for low altitude traffic surveillance. IEEE International Conference on Robotics and Automation (ICRA), 8504–8510.
- Fan, Y., Chu, S., Zhang, W., Song, R., & Li, Y. (2020). Learn by observation: Imitation learning for drone patrolling from videos of a human navigator. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5209–5216.
- Li, T., Liu, J., Zhang, W., Ni, Y., Wang, W., & Li, Z. (2021). UAV-human: A large benchmark for human behavior understanding with unmanned aerial vehicles. CVPR.
- Wen, L., Du, D., Zhu, P., Hu, Q., Wang, Q., Bo, L., & Lyu, S. (2021). Detection, tracking, and counting meets drones in crowds: A benchmark. CVPR.
- Dang, Y., Huang, C., Chen, P., Liang, R., Yang, X., & Cheng, K.-T. (2022). Path-Analysis-Based Reinforcement Learning Algorithm for Imitation Filming. TMM.
- Fan, Y., Chen, W., Jiang, T., Zhou, C., Zhang, Y., & Wang, X. E. (2022). Aerial vision-and-dialog navigation. arXiv preprint arXiv:2205.12219.
- Pitas, I., & Mademlis, I. (2022). Autonomous UAV Cinematography. ACM Multimedia.
- Zheng, O., Abdel-Aty, M., Yue, L., Abdelraouf, A., Wang, Z., & Mahmoud, N. (2022). CitySim: A Drone-Based Vehicle Trajectory Dataset for Safety Oriented Research and Digital Twins. arXiv:2208.11036.
- Dissanayaka, D., Wanasinghe, T. R., De Silva, O., Jayasiri, A., & Mann, G. K. I. (2023). Review of Navigation Methods for UAV-Based Parcel Delivery. TASE.
- Goodrich, P., Betancourt, O., Arias, A. C., & Zohdi, T. (2023). Placement and drone flight path mapping of agricultural soil sensors using machine learning. Computers and Electronics in Agriculture.
- Liu, S., Zhang, H., Qi, Y., Wang, P., Zhang, Y., & Wu, Q. (2023). Aerialvln: Vision-and-language navigation for UAVs. IEEE/CVF International Conference on Computer Vision, 15384–15394.
- Liu, Z., Shang, Y., Li, T., Chen, G., Wang, Y., Hu, Q., & Zhu, P. (2023). Robust Multi-Drone Multi-Target Tracking to Resolve Target Occlusion: A Benchmark. TMM.
- Sorbelli, F. B., Corò, F., Palazzetti, L., Pinotti, C. M., & Rigoni, G. (2023). How the Wind Can Be Leveraged for Saving Energy in a Truck-Drone Delivery System. TITS.
- Gao, Y., Wang, Z., Jing, L., Wang, D., Li, X., & Zhao, B. (2024). Aerial vision-and-language navigation via semantic-topo-metric representation guided LLM reasoning. arXiv preprint arXiv:2410.08500.
- Lee, J., Miyanishi, T., Kurita, S., Sakamoto, K., Azuma, D., Matsuo, Y., & Inoue, N. (2024). Citynav: Language-goal aerial navigation dataset with geographic information. arXiv preprint arXiv:2406.14240.
- Wang, X., Yang, D., Wang, Z., Kwan, H., Chen, J., Wu, W., … & Liu, S. (2024). Towards realistic UAV vision-language navigation: Platform, benchmark, and methodology. arXiv preprint arXiv:2410.07087.
- Gao, Y., Li, C., You, Z., Liu, J., Li, Z., Chen, P., … & Zhao, B. (2025). OpenFly: A versatile toolchain and large-scale benchmark for aerial vision-language navigation. arXiv e-prints, arXiv–2502.
- Wu, R., Zhang, Y., Chen, J., Huang, L., Zhang, S., Zhou, X., … & Liu, S. (2025). AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation. arXiv preprint arXiv:2508.15232.
- Xu, Z., Han, X., Shen, H., Jin, H., & Shimada, K. (2025). Navrl: Learning safe flight in dynamic environments. IEEE Robotics and Automation Letters.
Contributing
We welcome contributions! To contribute:
- Fork the repository.
- Add new papers/datasets under the correct module.
- Keep year-based sorting (same year → alphabetical by first author).
- Remove duplicates and merge arXiv + conference/journal versions.
- Submit a PR with a clear description.
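The sorting rule above (by year, then alphabetically by first author within a year) can be checked before opening a PR. The following is a minimal sketch, assuming each entry is a single line shaped like `- Surname, I., ... (YYYY). Title ...`. The helper names `sort_key` and `is_sorted` are hypothetical and not part of this repository, and the sample entries are shortened for illustration.

```python
import re

# Matches the first "(YYYY)" publication year in an entry line.
YEAR_RE = re.compile(r"\((\d{4})\)")


def sort_key(entry: str) -> tuple[int, str]:
    """Return (year, first-author surname) for one bibliography line.

    Assumes the line starts with "- Surname, ..." and contains "(YYYY)".
    """
    match = YEAR_RE.search(entry)
    if match is None:
        raise ValueError(f"no (YYYY) year found in: {entry!r}")
    # Text before the first comma is treated as the first author's surname.
    surname = entry.lstrip("- ").split(",", 1)[0].strip().casefold()
    return int(match.group(1)), surname


def is_sorted(entries: list[str]) -> bool:
    """True if entries already follow the year-then-first-author order."""
    keys = [sort_key(e) for e in entries]
    return keys == sorted(keys)


# Shortened sample entries (illustrative only).
entries = [
    "- Zhu, R., Yin, L. (2023). SUES-200 ...",
    "- Lin, T.-Y., Cui, Y. (2015). Learning deep representations ...",
    "- Deuser, F., Habel, K. (2023). Sample4geo ...",
]

# sorted() is stable, so entries sharing year and surname keep their relative order.
ordered = sorted(entries, key=sort_key)
```

Running `is_sorted` on each module's list before submitting would flag entries that need to move. The sketch does not detect duplicates or arXiv/conference pairs, so those still need a manual check.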
License
Last updated: October 26, 2025