MIDGArD: Modular Interpretable Diffusion over Graphs for Articulated Designs

MIDGArD Generates Simulatable Articulated Assets

Abstract

Providing functionality through articulation and interaction with objects is a key objective in 3D generation. We introduce MIDGArD (Modular Interpretable Diffusion over Graphs for Articulated Designs), a novel diffusion-based framework for articulated 3D asset generation. MIDGArD improves over foundational work in the field by enhancing quality, consistency, and controllability in the generation process. It does so through a modular approach that separates the problem into two primary components: structure generation and shape generation. The structure generation module produces coherent articulation features from noisy or incomplete inputs. It acts on the object's structural and kinematic attributes, represented as features of a graph that are progressively denoised to yield coherent and interpretable articulation solutions. This denoised graph then serves as an advanced conditioning mechanism for the shape generation module, a 3D generative model that populates each link of the articulated structure with consistent 3D meshes. Experiments show that MIDGArD improves the quality, consistency, and interpretability of the generated assets. Importantly, the generated models are fully simulatable, i.e., they can be seamlessly integrated into standard physics engines such as MuJoCo, broadening MIDGArD's applicability to fields such as digital content creation, meta realities, and robotics.

Method

Articulated assets are represented as graphs.

MIDGArD employs a modular and sequential approach based on two diffusion models: the structure generator and the shape generator.

Structure Generation

The structure generator denoises an articulation graph, unconditionally or from incomplete heterogeneous inputs. This articulation graph acts as an abstract, yet interpretable object representation, encoding the structural and semantic information of every link, as well as kinematic attributes.
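As an illustration of what such an articulation graph might look like in code, here is a minimal sketch in which links are nodes and joints are directed edges from parent to child. The class names, field names, and the cabinet example are all hypothetical and chosen for readability; the source does not specify MIDGArD's actual feature layout.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Link:
    """Graph node: one rigid part (all fields are illustrative assumptions)."""
    name: str
    semantic_label: str   # e.g. "door", "drawer"
    bbox_center: Vec3     # coarse structural information
    bbox_size: Vec3


@dataclass
class Joint:
    """Graph edge: kinematic relation between a parent and a child link."""
    parent: str
    child: str
    joint_type: str       # "revolute", "prismatic", or "fixed"
    axis: Vec3
    limits: Tuple[float, float]


@dataclass
class ArticulationGraph:
    links: Dict[str, Link] = field(default_factory=dict)
    joints: List[Joint] = field(default_factory=list)

    def add_link(self, link: Link) -> None:
        self.links[link.name] = link

    def add_joint(self, joint: Joint) -> None:
        # Reject edges that point at links the graph does not contain.
        for end in (joint.parent, joint.child):
            if end not in self.links:
                raise KeyError(f"unknown link: {end}")
        self.joints.append(joint)

    def children(self, name: str) -> List[str]:
        return [j.child for j in self.joints if j.parent == name]

    def root(self) -> str:
        """Return the single link that is never a child."""
        child_names = {j.child for j in self.joints}
        roots = [n for n in self.links if n not in child_names]
        if len(roots) != 1:
            raise ValueError("expected exactly one root link")
        return roots[0]


def build_cabinet() -> ArticulationGraph:
    """Hypothetical example: a cabinet body with a hinged door and a sliding drawer."""
    g = ArticulationGraph()
    g.add_link(Link("body", "cabinet_body", (0.0, 0.0, 0.5), (0.6, 0.4, 1.0)))
    g.add_link(Link("door", "door", (0.0, -0.2, 0.7), (0.6, 0.02, 0.6)))
    g.add_link(Link("drawer", "drawer", (0.0, -0.1, 0.2), (0.5, 0.3, 0.2)))
    g.add_joint(Joint("body", "door", "revolute", (0.0, 0.0, 1.0), (0.0, 1.57)))
    g.add_joint(Joint("body", "drawer", "prismatic", (0.0, -1.0, 0.0), (0.0, 0.25)))
    return g
```

Keeping each attribute as a named field is one way to preserve the interpretability described above: a designer can read or change a joint type or limit directly.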

Given partial information about an articulated asset, MIDGArD's structure generator module can complete the missing elements, yielding a consistent, human-interpretable representation that can be adjusted by a human designer during the creative process.
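One common way to complete partially specified features with a diffusion-style sampler is to re-impose the known entries at every denoising step and let the sampler fill in the rest. The sketch below illustrates that idea on a small feature matrix. It is not MIDGArD's implementation: `toy_denoiser` is a hypothetical stand-in for a trained network, the linear noise schedule is an assumption, and observed entries are clamped to their clean values as a simplification.

```python
import random


def clamp(x, known, mask):
    """Overwrite entries where mask is True with the observed values."""
    return [
        [k if m else v for v, k, m in zip(x_row, k_row, m_row)]
        for x_row, k_row, m_row in zip(x, known, mask)
    ]


def toy_denoiser(x, t):
    """Hypothetical stand-in for a learned model (ignores t).

    Predicts, for every row, the column-wise mean of the current features.
    """
    n = len(x)
    means = [sum(col) / n for col in zip(*x)]
    return [list(means) for _ in range(n)]


def complete_features(known, mask, denoiser, num_steps=50, seed=0):
    """Fill unobserved entries by iterative denoising with clamping."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise with the same shape as the features.
    x = [[rng.gauss(0.0, 1.0) for _ in row] for row in known]
    for step in range(num_steps, 0, -1):
        t = step / num_steps
        t_next = (step - 1) / num_steps
        x = clamp(x, known, mask)       # re-impose what the user specified
        x0_hat = denoiser(x, t)         # estimate of the clean features
        # Move toward the estimate; the injected noise shrinks to zero at the end.
        x = [
            [(1.0 - t_next) * v + t_next * rng.gauss(0.0, 1.0) for v in row]
            for row in x0_hat
        ]
    return clamp(x, known, mask)


# Rows are graph nodes, columns are features; False marks a missing entry.
known = [[1.0, 0.0], [1.0, 2.0], [0.0, 2.0]]
mask = [[True, False], [True, True], [False, True]]
completed = complete_features(known, mask, toy_denoiser)
```

The observed entries come back unchanged, while the missing ones take whatever values the denoiser converges to given the observed context.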

Shape Generation

Object attributes of the articulation graph are used by the shape generator to produce consistent part geometries.

Controllable Generation

Image inputs allow the object design to be controlled at the part level.

Simulation-Ready Assets

The generated articulated assets are fully simulatable and can be directly exported to MuJoCo.
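To give a sense of what an export to MuJoCo can involve, the sketch below writes a small kinematic tree to MJCF, MuJoCo's XML model format, using only the Python standard library. The input dictionaries, the `to_mjcf` helper, and the cabinet example are hypothetical, and box geoms stand in for generated meshes; the source does not describe MIDGArD's own exporter.

```python
import xml.etree.ElementTree as ET


def fmt(values):
    """Format a sequence of numbers as a space-separated MJCF attribute."""
    return " ".join(f"{v:g}" for v in values)


def to_mjcf(model_name, links, joints):
    """Build an MJCF string from a kinematic tree.

    links:  name -> {"pos": body position relative to its parent body,
                     "half_size": box half-extents,
                     "geom_pos": optional geom offset inside the body}
    joints: list of {"parent", "child", "type" ("hinge" or "slide"),
                     "axis", "range"}
    """
    root = ET.Element("mujoco", model=model_name)
    ET.SubElement(root, "compiler", angle="radian")
    world = ET.SubElement(root, "worldbody")

    joint_of = {j["child"]: j for j in joints}
    children = {}
    for j in joints:
        children.setdefault(j["parent"], []).append(j["child"])

    def add_body(parent_elem, name):
        link = links[name]
        body = ET.SubElement(parent_elem, "body", name=name, pos=fmt(link["pos"]))
        j = joint_of.get(name)
        if j is not None:
            # A body without a joint element is rigidly attached to its parent.
            ET.SubElement(body, "joint", name=f"{name}_joint", type=j["type"],
                          axis=fmt(j["axis"]), limited="true",
                          range=fmt(j["range"]))
        ET.SubElement(body, "geom", type="box", size=fmt(link["half_size"]),
                      pos=fmt(link.get("geom_pos", (0, 0, 0))))
        for child in children.get(name, []):
            add_body(body, child)

    # Links that are never a child hang directly off the world body.
    for name in links:
        if name not in joint_of:
            add_body(world, name)
    return ET.tostring(root, encoding="unicode")


links = {
    "cabinet": {"pos": (0, 0, 0.5), "half_size": (0.3, 0.2, 0.5)},
    # Door body origin sits on the hinge line; the panel is offset from it.
    "door": {"pos": (0.3, -0.21, 0), "half_size": (0.3, 0.01, 0.5),
             "geom_pos": (-0.3, 0, 0)},
}
joints = [{"parent": "cabinet", "child": "door", "type": "hinge",
           "axis": (0, 0, 1), "range": (0, 1.57)}]
xml_string = to_mjcf("cabinet_demo", links, joints)
```

If the `mujoco` Python package is installed, the resulting string can be passed to `mujoco.MjModel.from_xml_string(xml_string)` to load it into the simulator.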

Citation

@inproceedings{leboutet2024midgard,
    author = {Leboutet, Quentin and Wiedemann, Nina and Cai, Zhipeng and Paulitsch, Michael and Yuan, Kai},
    booktitle = {Advances in Neural Information Processing Systems},
    editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
    pages = {1556--1585},
    publisher = {Curran Associates, Inc.},
    title = {MIDGArD: Modular Interpretable Diffusion over Graphs for Articulated Designs},
    url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/0318de478e18308a5f64297f618299d3-Paper-Conference.pdf},
    volume = {37},
    year = {2024}
}

NeurIPS 2024

Quentin Leboutet

Nina Wiedemann

Zhipeng Cai

Michael Paulitsch

Kai Yuan