Calendar
04
June
Han Fu & Jialu Xu: Thesis presentation
Han Fu & Jialu Xu will present their thesis, "Unified Pre-Training for Multi-Modal Sensor Data in Autonomous Driving", conducted in collaboration with Zenseact.
Abstract:
Accurate, efficient and transferable scene representations are central to autonomous driving. Modelling the environment with learnable three-dimensional Gaussians offers a favourable trade-off between memory footprint and geometric fidelity: each Gaussian stores local occupancy and semantic context in a continuous, object-centric form. This thesis turns the 3D Gaussian into a unified pre-training framework for downstream perception. Building on GaussianFormer, we add a lightweight sensor-guided initialization that merges evidence from lidar and radar at the beginning of training, and a cross-attention mechanism that helps extract geometric cues from the distance signals. A backbone trained with this strategy can be frozen and reused as a feature extractor for tasks beyond its original objective.
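To make the initialization and attention steps more concrete, the sketch below illustrates the general idea in PyTorch. The module names, feature dimensions, and random point sampling are illustrative assumptions, not the implementation used in the thesis.

```python
# Illustrative sketch only: (a) initialize Gaussian centres from the fused
# lidar + radar point cloud, (b) let Gaussian queries attend to range-derived
# point features via cross-attention. All names and shapes are assumptions.
import torch
import torch.nn as nn


def init_gaussian_means(lidar_pts, radar_pts, num_gaussians):
    """Pick Gaussian centres from the merged lidar + radar point cloud.

    lidar_pts: (N, 3) xyz points, radar_pts: (M, 3) xyz points.
    Returns (num_gaussians, 3) centres sampled from the fused cloud.
    """
    fused = torch.cat([lidar_pts, radar_pts], dim=0)       # (N+M, 3)
    idx = torch.randperm(fused.shape[0])[:num_gaussians]   # random subset
    return fused[idx]


class GaussianCrossAttention(nn.Module):
    """Gaussian queries attend to per-point features (illustrative only)."""

    def __init__(self, dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.point_proj = nn.Linear(3, dim)   # embed xyz into feature space

    def forward(self, gaussian_queries, points):
        # gaussian_queries: (B, G, dim), points: (B, P, 3)
        point_feats = self.point_proj(points)               # (B, P, dim)
        out, _ = self.attn(gaussian_queries, point_feats, point_feats)
        return gaussian_queries + out                       # residual update


# Toy usage with random stand-in data
lidar = torch.rand(2048, 3) * 50.0
radar = torch.rand(256, 3) * 50.0
centres = init_gaussian_means(lidar, radar, num_gaussians=512)   # (512, 3)

queries = torch.randn(1, 512, 128)
points = torch.cat([lidar, radar], dim=0).unsqueeze(0)           # (1, 2304, 3)
updated = GaussianCrossAttention()(queries, points)              # (1, 512, 128)
```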
To test transferability, the trained backbone is frozen and paired with a simple decoder that receives only the Gaussian field and learns bird's-eye-view (BEV) vehicle segmentation. With only a quarter of the labelled data, the decoder outperforms the camera-only baseline, and the modified backbone also delivers a measurable gain on standard 3D occupancy benchmarks. These findings position the 3D Gaussian representation as a versatile foundation for multi-modal perception: a single pre-trained representation can be shared across downstream tasks such as BEV semantic segmentation and 3D occupancy prediction.
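As a rough illustration of this transfer setup (a frozen backbone plus a small trainable decoder), the sketch below uses a stand-in backbone, an assumed 200x200 BEV grid, and random data; none of these choices are taken from the thesis.

```python
# Illustrative sketch only: freeze a pre-trained backbone and train just a
# lightweight decoder that maps per-Gaussian features to a BEV vehicle mask.
# The backbone here is a placeholder for the pre-trained Gaussian network.
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 128))
for p in backbone.parameters():      # freeze: no gradients reach the backbone
    p.requires_grad = False
backbone.eval()


class BEVDecoder(nn.Module):
    """Scatter per-Gaussian features onto a BEV grid, then segment with convs."""

    def __init__(self, dim=128, grid=200):
        super().__init__()
        self.grid = grid
        self.head = nn.Sequential(
            nn.Conv2d(dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),             # 1 channel: vehicle vs. background
        )

    def forward(self, feats, xy_idx):
        # feats: (G, dim) Gaussian features; xy_idx: (G, 2) integer BEV cells
        bev = feats.new_zeros(feats.shape[1], self.grid, self.grid)
        bev[:, xy_idx[:, 0], xy_idx[:, 1]] = feats.t()   # place features on grid
        return self.head(bev.unsqueeze(0))               # (1, 1, grid, grid) logits


decoder = BEVDecoder()
optim = torch.optim.Adam(decoder.parameters(), lr=1e-4)  # only decoder trains
loss_fn = nn.BCEWithLogitsLoss()

# One toy training step with random stand-in data
gaussians = torch.randn(512, 128)
xy = torch.randint(0, 200, (512, 2))
target = torch.zeros(1, 1, 200, 200)

with torch.no_grad():
    feats = backbone(gaussians)          # frozen feature extraction
loss = loss_fn(decoder(feats, xy), target)
loss.backward()
optim.step()
```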