Surface-Aware Semantic Features

We introduce a surface-aware feature embedding space that separates instances of the same semantic class. Semantically related regions across diverse 3D shapes are implicitly aligned in a self-supervised manner, without access to any labels.

Abstract

Many 3D tasks, such as pose alignment, animation, motion transfer, and 3D reconstruction, rely on establishing correspondences between 3D shapes. This challenge has recently been approached by matching semantic features from pre-trained vision models. However, despite their power, these features struggle to differentiate instances of the same semantic class, such as "left hand" versus "right hand", which leads to substantial mapping errors. To solve this, we learn a surface-aware embedding space that is robust to these ambiguities. Importantly, our approach is self-supervised and requires only a small number of unpaired training meshes to infer features for new 3D shapes at test time. We achieve this by introducing a contrastive loss that preserves the semantic content of the features distilled from foundational models while disambiguating features located far apart on the shape's surface. We observe superior performance in correspondence matching benchmarks and enable downstream applications including in-part segmentation, pose alignment, and motion transfer in low-data regimes.

Applications

We demonstrate multiple applications that benefit from our surface-aware features. We compare our results against Diff3F (Dutt et al., CVPR 2024).

Instance-based Part Segmentation

Following Diff3F, we segment a target shape by assigning its features to the centroids obtained from K-means clustering of the source-shape features. Unlike the Diff3F features, our surface-aware features disambiguate the limbs.
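The clustering-based transfer described above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names, the farthest-point initialization, the squared Euclidean distance, and the fixed iteration count are all assumptions, and per-point features are taken to be the rows of a float array.

```python
import numpy as np


def sq_dists(a, b):
    """Pairwise squared Euclidean distances between rows of a (N, D) and b (K, D)."""
    return ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)


def kmeans_centroids(features, k, iters=50):
    """Lloyd's K-means with farthest-point initialization (an assumed choice)."""
    features = np.asarray(features, dtype=float)
    # Start from the first point, then greedily add the point that is
    # farthest from all centroids chosen so far.
    chosen = [0]
    min_d = sq_dists(features, features[[0]])[:, 0]
    for _ in range(1, k):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = np.minimum(min_d, sq_dists(features, features[[nxt]])[:, 0])
    centroids = features[chosen].copy()

    for _ in range(iters):
        labels = sq_dists(features, centroids).argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def transfer_segmentation(source_feats, target_feats, k):
    """Cluster source features, then label both shapes by nearest centroid."""
    source_feats = np.asarray(source_feats, dtype=float)
    target_feats = np.asarray(target_feats, dtype=float)
    centroids = kmeans_centroids(source_feats, k)
    source_labels = sq_dists(source_feats, centroids).argmin(axis=1)
    target_labels = sq_dists(target_feats, centroids).argmin(axis=1)
    return source_labels, target_labels
```

Because the target shape reuses the source centroids, a part label means the same thing on both shapes. The segmentation is therefore only as good as the features' ability to separate, for example, a left limb from a right one.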

Figure panels: Ours (source, target); Diff3F (source, target).

Pose Alignment

Our surface-aware features are also useful for aligning the pose of a kinematic model to another 3D shape. To this end, we establish point correspondences between shape pairs and optimize the kinematic pose parameters to minimize point-to-point distances. Our method produces poses closer to the target shape for both dense and sparse correspondences.

Static

We first optimize rotation, translation, and scale of the source shape for a rough alignment. Then, we optimize the bone rotations.
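The rough alignment step can be illustrated with a closed-form least-squares fit. Note the substitution: the text describes an optimization, whereas this sketch uses the Umeyama similarity-transform solution, which minimizes the same point-to-point distances for given correspondences. The function name and the assumption that row i of the source already corresponds to row i of the target are hypothetical, and the subsequent bone-rotation optimization is not covered.

```python
import numpy as np


def similarity_align(source_pts, target_pts):
    """Least-squares scale, rotation, and translation mapping source to target.

    Both inputs are (N, 3) arrays where row i of the source corresponds to
    row i of the target. Returns (scale, R, t) such that
    target ≈ scale * source @ R.T + t.
    """
    src = np.asarray(source_pts, dtype=float)
    tgt = np.asarray(target_pts, dtype=float)
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    src_c, tgt_c = src - mu_s, tgt - mu_t

    # Cross-covariance between the centered target and source points.
    cov = tgt_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_s = (src_c ** 2).sum() / len(src)
    scale = (D * np.diag(S)).sum() / var_s
    t = mu_t - scale * (R @ mu_s)
    return scale, R, t
```

A closed-form fit like this is sensitive to wrong correspondences, since every matched pair contributes to the covariance. This is one place where confusing left and right limbs would directly skew the initial pose.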

Figure panels: Ours (source, target); Diff3F (source, target).

Dynamic

We can apply the same procedure to a sequence of target poses and optimize a set of source shapes in parallel.

Figure panels: Ours (source, target); Diff3F (source, target).

Correspondence Matching

For fairness, we randomly sample one target shape per source shape and show the resulting examples here. We find that our features produce visually smoother correspondences and are more robust in separating left from right, as well as front from back.

Note that correspondences are point-based and no post-processing is applied.
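As a rough illustration of point-based matching without post-processing, the sketch below assigns each source point to the target point with the most similar feature. The use of cosine similarity and the function name are assumptions made for this example; the page does not specify the similarity measure.

```python
import numpy as np


def match_by_features(source_feats, target_feats):
    """For each source point, return the index of the most similar target point."""
    s = np.asarray(source_feats, dtype=float)
    t = np.asarray(target_feats, dtype=float)
    # Normalize rows so that the dot product equals cosine similarity.
    s = s / np.linalg.norm(s, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    similarity = s @ t.T  # shape: (num_source, num_target)
    return similarity.argmax(axis=1)
```

Each source point is matched independently here, so nothing enforces a smooth or one-to-one mapping. Any smoothness visible in the results then comes from the features themselves.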

SHREC'20

Figure panels: Source; Ours; Diff3F.

SHREC'19

Figure panels: Source; ground truth (GT); Ours; Diff3F.

TOSCA

Figure panels: Source; ground truth (GT); Ours; Diff3F.

BibTeX

@misc{uzolas2025surfaceawaredistilled3dsemantic,
title={Surface-Aware Distilled 3D Semantic Features}, 
author={Lukas Uzolas and Elmar Eisemann and Petr Kellnhofer},
year={2025},
eprint={2503.18254},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2503.18254}, 
}