Publications
* denotes equal contribution
Peer-reviewed
2025
- MotionDreamer: Exploring Semantic Video Diffusion Features for Zero-Shot 3D Mesh Animation. Uzolas, Lukas, Eisemann, Elmar, and Kellnhofer, Petr. 3DV, 2025.
Animation techniques bring digital 3D worlds and characters to life. However, manual animation is tedious, and automated techniques are often specialized to narrow shape classes. In our work, we propose a technique for the automatic re-animation of arbitrary 3D shapes based on a motion prior extracted from a video diffusion model. Unlike existing 4D generation methods, we focus solely on the motion, and we leverage an explicit mesh-based representation compatible with existing computer-graphics pipelines. Furthermore, our use of diffusion features enhances the accuracy of our motion fitting. We analyze the efficacy of these features for animation fitting, and we experimentally validate our approach for two different diffusion models and four animation models. Finally, we demonstrate in a user study that our time-efficient zero-shot method achieves superior performance in re-animating a diverse set of 3D shapes compared to existing techniques.
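The fitting idea can be pictured as optimizing per-frame deformation parameters so that features of the rendered mesh match features extracted from a reference video. The sketch below is a heavily simplified, hypothetical illustration of that feature-matching loop; the toy renderer, feature extractor, and per-vertex offsets are placeholders and are not taken from the paper's pipeline.

```python
# Toy feature-matching optimization: fit per-frame vertex offsets so that
# "features" of the rendered shape match features of reference frames.
# Everything here (renderer, feature extractor, parameterization) is a
# stand-in for illustration only.
import torch

def render(vertices):
    """Toy differentiable 'renderer': orthographic projection to 2D."""
    return vertices[:, :2]

def features(image_like):
    """Toy stand-in for semantic (e.g. diffusion) features."""
    return image_like.flatten()

rest_vertices = torch.rand(100, 3)
target_frames = [torch.rand(100, 2) for _ in range(8)]   # reference video frames (toy)
offsets = torch.zeros(8, 100, 3, requires_grad=True)     # per-frame deformation
opt = torch.optim.Adam([offsets], lr=1e-2)

for step in range(200):
    loss = 0.0
    for t, target in enumerate(target_frames):
        rendered = render(rest_vertices + offsets[t])
        loss = loss + torch.nn.functional.mse_loss(features(rendered), features(target))
    opt.zero_grad()
    loss.backward()
    opt.step()
```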
2023
- Template-free Articulated Neural Point Clouds for Reposable View Synthesis. Uzolas, Lukas, Eisemann, Elmar, and Kellnhofer, Petr. In Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS), 2023.
Dynamic Neural Radiance Fields (NeRFs) achieve remarkable visual quality when synthesizing novel views of time-evolving 3D scenes. However, the common reliance on backward deformation fields makes reanimation of the captured object poses challenging. Moreover, state-of-the-art dynamic models are often limited by low visual fidelity, long reconstruction times, or specificity to narrow application domains. In this paper, we present a novel method utilizing a point-based representation and Linear Blend Skinning (LBS) to jointly learn a Dynamic NeRF and an associated skeletal model from even sparse multi-view video. Our forward-warping approach achieves state-of-the-art visual fidelity when synthesizing novel views and poses while significantly reducing the necessary learning time compared to existing work. We demonstrate the versatility of our representation on a variety of articulated objects from common datasets and obtain reposable 3D reconstructions without the need for object-specific skeletal templates.
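For readers unfamiliar with Linear Blend Skinning, the sketch below shows the forward-warping step in its simplest form: each point is moved by a weighted blend of rigid bone transforms. The point positions, skinning weights, and bone transforms are toy placeholders, not values or code from the paper.

```python
# Minimal Linear Blend Skinning (LBS) forward warp of a point cloud.
# All inputs are illustrative toy data.
import numpy as np

def lbs_forward_warp(points, weights, bone_transforms):
    """Warp canonical points into a posed frame with LBS.

    points:          (N, 3) canonical point positions
    weights:         (N, B) per-point skinning weights, rows sum to 1
    bone_transforms: (B, 4, 4) rigid transform of each bone
    returns:         (N, 3) posed point positions
    """
    points_h = np.concatenate([points, np.ones((points.shape[0], 1))], axis=1)  # (N, 4)
    blended = np.einsum("nb,bij->nij", weights, bone_transforms)  # per-point blended transform
    posed_h = np.einsum("nij,nj->ni", blended, points_h)
    return posed_h[:, :3]

# Toy usage: two bones, three points.
pts = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
T = np.stack([np.eye(4), np.eye(4)])
T[1, :3, 3] = [0.0, 1.0, 0.0]  # translate the second bone upward
print(lbs_forward_warp(pts, w, T))
```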
2022
- Deep Anomaly Generation: An Image Translation Approach of Synthesizing Abnormal Banded Chromosome Images. Uzolas, Lukas*, Rico, Javier*, Coupé, Pierrick, Sanmiguel, Juan C., and Cserey, György. IEEE Access, 2022.
Advances in deep-learning-based pipelines have led to breakthroughs in a variety of microscopy image diagnostics. However, a sufficiently large training data set is usually difficult to obtain due to high annotation costs. In the case of banded chromosome images, building large enough libraries is difficult for multiple pathologies due to the rarity of certain genetic disorders. Generative Adversarial Networks (GANs) have proven effective in generating synthetic images and extending training data sets. In our work, we implement a conditional GAN (cGAN) that allows the generation of realistic single-chromosome images following user-defined banding patterns. To this end, we use an image-to-image translation approach based on automatically created 2D chromosome segmentation label maps. Our validation shows promising results when synthesizing chromosomes with both seen and unseen banding patterns. We believe that this approach can be exploited for data augmentation of chromosome data sets with structural abnormalities. The proposed method could therefore help to tackle medical image analysis problems such as data simulation, segmentation, detection, or classification in the field of cytogenetics.
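The conditioning mechanism described above, a generator driven by a label map and a discriminator that judges (label map, image) pairs, can be sketched as a pix2pix-style training step. The tiny networks, tensor sizes, and loss weighting below are illustrative assumptions, not the architecture or hyperparameters from the paper.

```python
# Pix2pix-style conditioning sketch: the generator maps a label map to an
# image; the discriminator sees the label map and image concatenated along
# the channel axis. Toy networks and toy data only.
import torch
import torch.nn as nn

generator = nn.Sequential(          # label map (1 ch) -> grayscale image (1 ch)
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1), nn.Tanh(),
)
discriminator = nn.Sequential(      # (label map, image) pair -> patch scores
    nn.Conv2d(2, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 4, stride=2, padding=1),
)
bce = nn.BCEWithLogitsLoss()
l1 = nn.L1Loss()

label_map = torch.rand(8, 1, 64, 64)   # user-defined banding pattern (toy data)
real_img = torch.rand(8, 1, 64, 64)    # corresponding real chromosome image (toy data)

fake_img = generator(label_map)
pair_fake = torch.cat([label_map, fake_img], dim=1)
pred_fake = discriminator(pair_fake)

# Generator loss: fool the discriminator + stay close to the paired real image.
g_loss = bce(pred_fake, torch.ones_like(pred_fake)) + 100.0 * l1(fake_img, real_img)
g_loss.backward()
```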
2018
- Scale & Walk: Evaluation of scaling-based interaction techniques for natural locomotion in VR (German original: Scale & Walk: Evaluation von skalierungsbasierten Interaktionstechniken zur natürlichen Fortbewegung in VR). Boysen, Yannic*, Husung, Malte*, Mantei, Timo*, Müller, Lisa-Maria*, Schimmelpfennig, Joshua*, Uzolas, Lukas*, and Langbehn, Eike. Mensch und Computer 2018 Tagungsband, 2018.
Virtual reality headsets, such as the HTC Vive, enable the user to move around in the virtual world through real movements. However, this is only applicable to a limited extent, as the walkable real space is usually significantly smaller than the virtual space. Scaling techniques make it possible to travel long distances in the virtual world by manipulating the virtual size of the user. In this paper, we present an experiment in which we compare two scaling techniques against accelerated walking on the basis of usability, sense of presence, motion sickness, and spatial understanding. Our results show that automatic scaling in its current form performs significantly worse in terms of usability and motion sickness than accelerated walking and self-determined scaling. Self-determined scaling, however, is an equivalent alternative to accelerated walking.
Theses
2021
- M.Sc. thesis: Meta-Learning for Domain Generalization with Style-based Parameter Prediction for Biomedical Image Segmentation. Uzolas, Lukas. University of Bordeaux, 2021.
Deep Learning models often suffer a degradation in performance when applied to data sets sampled from a different distribution than the training data set. For example, this shift in distributions can be induced by different imaging devices located at different hospitals, producing images of different resolution, contrast, and brightness. Several well-established solutions tackle this problem by aligning the distributions between data sets, but these approaches require the target distribution to be known a priori, which is not always realistic. Meta-Learning for Domain Generalization can train models that generalize well on unseen target data by inducing a domain shift during training. However, the importance of normalization layers in these models has been neglected, even though regulating normalization parameters has proven beneficial for tackling the domain shift problem. To this end, this thesis investigates whether a normalization parameter prediction scheme can improve generalization performance on an unseen target data set while using a Meta-Learning approach. In this context, the source data is defined as the data available during training, while the target data is unknown; the domain shift describes the shift in distribution between these two sets. We investigate two parameter prediction methods: first, the recently introduced Instance-Level Meta Normalization, which predicts the scale and shift on a local level based on the feature map moments (mean and variance); second, a global parameter prediction scheme that we propose based on embeddings from a pre-trained Inception-v3, inspired by contemporary work in Neural Style Transfer. To evaluate the methods, we generate a shape data set characterized by the same underlying content, which only differs in style between domains. Our results are twofold: we find that the global prediction scheme outperforms Instance-Level Meta Normalization on our shape data set, improving generalization marginally and convergence significantly. Additionally, we discover that the Meta-Learning approach results in worse performance than traditional supervised training. We thus show that the domain shift problem can be partially handled by predicting the normalization parameters based on the input, consistent with the findings of related works.
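The global prediction scheme can be pictured as a normalization layer whose scale and shift are regressed from an image-level style embedding rather than learned as fixed parameters. Below is a minimal, hypothetical sketch of that idea; it uses a random placeholder vector where the thesis uses Inception-v3 embeddings, and all module sizes are illustrative assumptions.

```python
# Normalization layer whose affine parameters are predicted from a global
# style embedding (placeholder here; the thesis derives it from a
# pre-trained Inception-v3). Sizes are illustrative only.
import torch
import torch.nn as nn

class PredictedNorm(nn.Module):
    def __init__(self, num_features, embed_dim):
        super().__init__()
        self.norm = nn.InstanceNorm2d(num_features, affine=False)
        self.to_scale = nn.Linear(embed_dim, num_features)
        self.to_shift = nn.Linear(embed_dim, num_features)

    def forward(self, x, style_embedding):
        # Normalize, then modulate with predicted per-channel scale and shift.
        gamma = self.to_scale(style_embedding).unsqueeze(-1).unsqueeze(-1)
        beta = self.to_shift(style_embedding).unsqueeze(-1).unsqueeze(-1)
        return self.norm(x) * (1.0 + gamma) + beta

features = torch.randn(4, 64, 32, 32)   # feature map inside a segmentation network
embedding = torch.randn(4, 512)         # stand-in for an Inception-v3 style embedding
layer = PredictedNorm(num_features=64, embed_dim=512)
print(layer(features, embedding).shape)  # torch.Size([4, 64, 32, 32])
```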
2019
- B.Sc. thesis: Evaluation of Grayscale Hand Gesture Segmentation with Fully Convolutional Neural Networks. Uzolas, Lukas. University of Hamburg, 2019.
Static hand gestures can convey information through the posture of the hand alone. However, gesture classification systems can struggle when gestures are presented in front of complex backgrounds. Segmentation is one way to counteract this problem by extracting the hand from the image. Existing methods are often based on hand-crafted algorithms using skin color, which limits their application under natural conditions. We approach the segmentation task with two Fully Convolutional Neural Networks, namely the Light-Weight RefineNet and DeepLabv3+, and evaluate their influence on grayscale gesture classification tasks. We find that the Light-Weight RefineNet performs better overall, and that a fine-tuned version can improve recognition accuracy on most of the gesture data sets. We further explore how gesture classification could be exploited to learn an intermediate segmentation in a Convolutional Neural Network, but this method fails to yield satisfying results in its current state.
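The evaluation pipeline described above, segment first and then classify the masked grayscale image, can be sketched as follows. The placeholder networks below stand in for Light-Weight RefineNet / DeepLabv3+ and the gesture classifiers used in the thesis; sizes and the class count are illustrative assumptions.

```python
# Segmentation-then-classification sketch: predict a hand mask, suppress the
# background, then classify the masked grayscale image. Toy networks only.
import torch
import torch.nn as nn

segmenter = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                          nn.Conv2d(8, 1, 3, padding=1))          # logits for "hand"
classifier = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 10))  # 10 toy gesture classes

image = torch.rand(1, 1, 64, 64)                          # grayscale gesture image (toy)
mask = (torch.sigmoid(segmenter(image)) > 0.5).float()    # binary hand mask
masked = image * mask                                     # background removed
logits = classifier(masked)
print(logits.argmax(dim=1))
```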