UAVM 2024
ACM Multimedia
Workshop on
UAVs in Multimedia: Capturing the World from a New Perspective (UAVM 2024)
The accepted papers will be published in the ACM Multimedia Workshop proceedings (top 50%) and go through the same peer review process as the regular papers. Several authors will be invited to give an oral presentation.
[Accepted Workshop Proposal] [Submission Site] [Previous UAVM 2023] [Workshop Proceedings Papers] [Challenge Code Collection]
News
- 2/11/2024 - The video is available on [Youtube] and [Bilibili]
- 26/10/2024 - In total, we have accepted 7 papers. They are available at https://dl.acm.org/doi/proceedings/10.1145/3689095
- 15/8/2024 - The challenge open-source code is released (https://github.com/wtyhub/UAVM2024).
- 23/4/2024 - Challenge Platform is now available.
- 23/4/2024 - Paper submission site is now available.
- 22/4/2024 - CFP is released.
- 22/4/2024 - Workshop homepage is now available.
Workshop Schedule (GMT+11, 1/11/2024)
9:30~10:30am Invited Talk from Prof. Julian Kooij [Slides]
10:30~11:30am Invited Talk from Prof. Javen Qinfeng Shi [Slides]
11:30~11:45am break
11:45~12:00pm Challenge 1st-place Winner
12:00~12:15pm Challenge 2nd-place Winner
12:15~12:30pm Challenge 3rd-place Winner
Invited Speakers
Julian Kooij, TU Delft | Javen Qinfeng Shi, University of Adelaide
Talk: Vehicle localization without SLAM: Finding your camera’s pose in an aerial image (Julian Kooij) [Slides]
Abstract: Accurately locating a robotic vehicle with respect to a map is a key requirement for highly automated driving. While GNSS can provide precise positional information, it is often unreliable and inaccurate in urban environments where buildings block satellite reception (“urban canyons”). This talk discusses a vision-based alternative to determine the pose of a vehicle directly from its camera images. Ideally, such localization can improve the rough localization estimates from GNSS in a manner that accurately scales to large areas without the need for expensive Visual-SLAM. Instead, the emerging field of “fine-grained cross-view localization” develops methods that compare the vehicle’s camera images to easily obtained aerial imagery to estimate its location and viewing direction. I will discuss two deep learning techniques that we proposed for this task: Convolution Cross-View Pose Estimation (T-PAMI’23) and SliceMatch (CVPR’23). These supervised techniques rely on labelled training data, i.e. camera images with known highly accurate pose information, but this can be hard to obtain exactly because GNSS is often unreliable. Our most recent work aims to address this limitation by unsupervised adaptation of an already trained cross-view localization method to the images from a new area, only requiring that their location information is sufficient to pick the correct aerial image patch rather than defining the exact vehicle pose (ECCV’24).
Bio: Julian Kooij is an Associate Professor in the Intelligent Vehicles group, performing research on multi-sensor vehicle perception for autonomous driving. The group is part of the Cognitive Robotics (CoR) department of the 3mE Faculty. His research interests include computer vision; 3D object detection with vision, lidar, radar, and acoustic sensors; visual localization; semantic environment understanding; and trajectory forecasting for Vulnerable Road User (VRU) behavior. His team develops novel techniques using deep learning, including representation learning and self-supervised approaches, statistical machine learning, and probabilistic inference.
Talk: Causal AI: The Way of Change in the Age of AI (Javen Qinfeng Shi) [Slides]
Abstract: Our world is undergoing inevitable and tumultuous changes. Causality, operating beneath the veneer of cause and effect, is essentially the way of change. This talk will show how causal AI can identify the root causes, discover latent variables, build immunity against spurious correlations, improve generalisation to diverse domains and distribution shifts, model the consequence of interventions, and answer What-If counterfactual questions. More importantly, causal AI holds the key to answering the reverse question: What is the ideal sequence of interventions, given resources or budgets, to optimise future outcomes? Join us for an exploration of the transformative role of Causal AI in understanding and navigating the complexities of a rapidly changing world in the age of AI.
Bio: Professor Javen Qinfeng Shi is the Founding Director of the Causal AI Group at the University of Adelaide, and one of the directors of the Australian Institute for Machine Learning (AIML). His research interests include causation, AI, mind, and metaphysics. Google Scholar ranks him 7th globally in Probabilistic Graphical Models, and 4th in Causation. He served as a Panelist for the Responsible AI Think Tank from 2022 to 2024 and currently holds the position of AI Industry Forum Panelist from 2024 onward, actively contributing to the cultivation of the national and state AI ecosystem. He has transferred his research to diverse industries including material discovery, agriculture, mining, sport, manufacturing, bushfire, health, and education. Recent awards include: 1) 1st place at the Open Catalyst Challenge at NeurIPS AI for Science 2023, using AI to discover energy materials; 2) winning the AUS/NZ Bushfire Data Quest 2020 using AI to predict fire spread, which led to a Citizen Science Grant in 2021 and the release of the bushfire app NOBURN in 2023 (over 50 media reports); 3) finalist of the SA Department of Energy and Mining's Gawler Challenge 2020 (over 2k participants from 100+ countries), with his team's work considered "the most innovative modelling" by the judging panel; 4) 2nd place in the Explorer Challenge 2019 (over 1k entries from 62 countries); 5) 1st place at SAIC Volkswagen's Logistics Innovation Day for Smart Manufacturing 2019.
Important Dates
- Workshop Papers Submission: 7 July 2024 (extended from 5 July 2024)
- Workshop Papers Notification: 30 July 2024
- Student Travel Grants Application Deadline: 5 August 2024
- Camera-ready Submission: 6 August 2024
- Conference Dates: 28 October 2024 – 1 November 2024
Please note: The submission deadline is 11:59 p.m. on the stated deadline date, Anywhere on Earth (AoE).
Abstract
Unmanned Aerial Vehicles (UAVs), also known as drones, have become increasingly popular in recent years due to their ability to capture high-quality multimedia data from the sky. With the rise of multimedia applications, such as aerial photography, cinematography, and mapping, UAVs have emerged as a powerful tool for gathering rich and diverse multimedia content. This workshop aims to bring together researchers, practitioners, and enthusiasts interested in UAV multimedia to explore the latest advancements, challenges, and opportunities in this exciting field. The workshop will cover various topics related to UAV multimedia, including aerial image and video processing, machine learning for UAV data analysis, UAV swarm technology, and UAV-based multimedia applications. In the context of the ACM Multimedia conference, this workshop is highly relevant as multimedia data from UAVs is becoming an increasingly important source of content for many multimedia applications. The workshop will provide a platform for researchers to share their work and discuss potential collaborations, as well as an opportunity for practitioners to learn about the latest developments in UAV multimedia technology. Overall, this workshop will provide a unique opportunity to explore the exciting and rapidly evolving field of UAV multimedia and its potential impact on the wider multimedia community.
The list of possible topics includes, but is not limited to:
- Video-based UAV Navigation
- Satellite-guided & Ground-guided Navigation
- Path Planning and Obstacle Avoidance
- Visual SLAM (Simultaneous Localization and Mapping)
- Sensor Fusion and Reinforcement Learning for Navigation
- UAV Swarm Coordination
- Multiple Platform Collaboration
- Multi-agent Cooperation and Communication
- Decentralized Control and Optimization
- Distributed Perception and Mapping
- UAV-based Object Detection and Tracking
- Aerial-view Object Detection, Tracking and Re-identification
- Aerial-view Action Recognition
- UAV-based Sensing and Mapping
- 3D Mapping and Reconstruction
- Remote Sensing and Image Analysis
- Disaster Response and Relief
- UAV-based Delivery and Transportation
- Package Delivery and Logistics
- Safety and Regulations for UAV-based Transportation
Submission Types
Papers can be submitted via [Open Review].
The submission template can be found at ACM, or you may directly follow the Overleaf template.
We recommend single-blind submission (showing your name and affiliation) for fast processing, but double-blind submissions are also acceptable. We will ensure fairness.
In this workshop, we welcome three types of submissions, all of which should relate to the topics and themes listed above:
- (1). Position or perspective papers (up to 4 pages in length, plus 1 page for references): original ideas, perspectives, research vision, and open challenges in the area of UAVs in multimedia;
- (2). Challenge papers (up to 4 pages in length, plus 1 page for references): original solutions to the challenge data, University160k, in terms of effectiveness and efficiency;
- (3). Demonstration papers (up to 4 pages in length, plus 1 page for references): original or already published prototypes and operational evaluation approaches in the area of UAV multimedia. Page limits include diagrams and appendices. Submissions should be single-blind, written in English, and formatted according to the current ACM two-column conference format. Suitable LaTeX, Word, and Overleaf templates are available from the ACM Website (use the "sigconf" proceedings template for LaTeX and the Interim Template for Word).
Tips:
- For privacy protection, please blur faces in the published materials (such as papers, videos, posters, etc.).
- For social good, please do not include any misleading words, such as "surveillance" and "secret".
Challenge
The challenge platform is available at https://codalab.lisn.upsaclay.fr/competitions/18770.
We also provide a multi-weather cross-view geo-localization dataset, called University160k-WX, and welcome your participation in the competition. The motivation is to simulate the real-world geo-localization scenario. In particular, University160k extends the current University-1652 dataset with an extra 167,486 satellite-view gallery distractors. University160k-WX further introduces weather variants on University160k, including fog, rain, snow, and multiple weather compositions. We will release University160k-WX on our website and maintain a public leaderboard. These distractor satellite-view images have a size of $1024 \times 1024$ and are obtained by cutting orthophoto images of real urban and surrounding areas. Multiple weathers are randomly sampled to increase the difficulty of representation learning. In our preliminary evaluation, the distractors are challenging: in the Drone $\rightarrow$ Satellite task, they decrease the Recall@1 accuracy of the competitive baseline model, LPN, from $75.93\%$ to $64.85\%$ and the AP from $79.14\%$ to $67.69\%$. If we further introduce extreme weather, the Recall@1 drops further from $64.85\%$ to $7.94\%$. We hope a broad audience will get involved in solving this challenge and consider the robustness problem under extreme weather.
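For reference, the sketch below illustrates how Recall@K and AP are typically computed for this drone-to-satellite retrieval setting. It is a minimal, unofficial example, not the released evaluation code; the function and variable names are ours, and it assumes L2-normalized features and known ground-truth gallery indices per query.

```python
import numpy as np

def recall_and_ap(query_feats, gallery_feats, gt_indices, k=1):
    """Unofficial sketch of Recall@K and AP for drone -> satellite retrieval.

    query_feats:   (Q, D) L2-normalized drone-view query features
    gallery_feats: (G, D) L2-normalized satellite-view gallery features
    gt_indices:    length-Q list; gt_indices[i] is the set of gallery indices
                   that are true matches for query i (assumed non-empty)
    """
    sims = query_feats @ gallery_feats.T      # cosine similarity, shape (Q, G)
    order = np.argsort(-sims, axis=1)         # gallery indices, best match first

    hits_at_k, aps = 0, []
    for i, ranked in enumerate(order):
        relevant = np.isin(ranked, list(gt_indices[i]))
        if relevant[:k].any():
            hits_at_k += 1
        # average precision: precision evaluated at each true-match rank
        match_ranks = np.where(relevant)[0]
        precisions = (np.arange(len(match_ranks)) + 1) / (match_ranks + 1)
        aps.append(precisions.mean())

    return hits_at_k / len(order), float(np.mean(aps))
```

In the University-1652 / University160k-WX setting, each drone-view query typically corresponds to a single true satellite-view image, so gt_indices[i] is usually a one-element set.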
Check the challenge details in Section 5 of https://zdzheng.xyz/files/MM24_Workshop_Proposal_Drone.pdf
The challenge dataset contains two parts.
- The basic dataset (training set) can be downloaded by Request. Usually we will reply with the download link within 5 minutes.
- The name-masked test-160k-WX dataset (query & gallery + distractor) can be downloaded from Onedrive. Since only drones encounter weather conditions, we only simulate weather on the drone-view queries.
The submission example can be found at Baseline Submission. Please zip it as answer.zip to submit the result.
Please return the top-10 satellite names for each query. For example, the first query is Q3JI2tUwDkhcfip.jpeg. Therefore, the first line of the returned result in answer.txt should be in the following format:
e6kXgz36E8nOY2n ioqKwvSIYYhiW2v y4VmQPUYOMD8AH4 kpZ2QJlNBHMnbRA xffJQs2n9DP17fg IejrFHLQYBfce2y cH79t5WJMEMZ3VA W9u0j4N1nlFbI97 zDurtAW4FTJfNJ3 MuvIMNVdofmaRqG
Please return the results following the query order in the Query TXT. The file will contain 37,855 lines.
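A minimal sketch of packaging a submission is shown below. It assumes you already have, for every query in the Query TXT order, a ranked list of the top-10 satellite image names (formatted as in the example line above); the helper name and variables are illustrative, not part of the official toolkit.

```python
import zipfile

def write_submission(ranked_names, txt_path="answer.txt", zip_path="answer.zip"):
    """ranked_names: one entry per query, in Query TXT order; each entry is a
    list of 10 satellite image names, best match first."""
    with open(txt_path, "w") as f:
        for names in ranked_names:
            assert len(names) == 10, "each line must contain exactly 10 names"
            f.write(" ".join(names) + "\n")

    # package answer.txt as answer.zip for upload to the challenge platform
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(txt_path)
```

The resulting answer.txt should contain exactly 37,855 lines, one per query.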
Related Papers
- Wang, T., Zheng, Z., Sun, Y., Yan, C., Yang, Y., & Chua, T. S. (2024). Multiple-environment Self-adaptive Network for Aerial-view Geo-localization. Pattern Recognition, 152, 110363.
- Zheng, Z., Wei, Y., & Yang, Y. (2020, October). University-1652: A multi-view multi-source benchmark for drone-based geo-localization. In Proceedings of the 28th ACM international conference on Multimedia (pp. 1395-1403).
- Wang, C., Zheng, Z., Quan, R., Sun, Y., & Yang, Y. (2023). Context-aware pretraining for efficient blind image decomposition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 18186-18195).
- Chu, M., Zheng, Z., Ji, W., & Chua, T. S. (2024). Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching. In Proceedings of the European Conference on Computer Vision (ECCV).
Organizing Team
Zhedong Zheng, University of Macau, China | Yujiao Shi, ShanghaiTech University, China | Tingyu Wang, Hangzhou Dianzi University, China
Chen Chen, University of Central Florida, USA | Pengfei Zhu, Tianjin University, China | Richard Hartley, Australian National University, Australia
Conference and Journal Papers
All papers presented at ACMMM 2024 will be included in the ACM proceedings. All papers submitted to this workshop will go through the same review process as the regular papers submitted to the main conference to ensure that the contributions are of high quality.
Student Travel Funding
Please check https://2024.acmmm.org/ for details.
Workshop Citation
@inproceedings{zheng2024UVA,
title={The 2nd Workshop on UAVs in Multimedia: Capturing the World from a New Perspective},
author={Zheng, Zhedong and Shi, Yujiao and Wang, Tingyu and Chen, Chen and Zhu, Pengfei and Hartley, Richard},
booktitle={Proceedings of the 32nd ACM International Conference on Multimedia Workshop},
year={2024}
}