MIDGArD: Modular Interpretable Diffusion over Graphs for Articulated Designs

NeurIPS 2024

Quentin Leboutet

Intel Labs

Nina Wiedemann

Intel Labs

Zhipeng Cai

Intel Labs

Michael Paulitsch

Intel Labs

Kai Yuan

Intel Labs

MIDGArD Generates Simulatable Articulated Assets

Abstract

Providing functionality through articulation and interaction with objects is a key objective in 3D generation. We introduce MIDGArD (Modular Interpretable Diffusion over Graphs for Articulated Designs), a novel diffusion-based framework for articulated 3D asset generation. MIDGArD improves over foundational work in the field by enhancing quality, consistency, and controllability in the generation process. This is achieved through MIDGArD's modular approach that separates the problem into two primary components: structure generation and shape generation. The structure generation module of MIDGArD aims at producing coherent articulation features from noisy or incomplete inputs. It acts on the object's structural and kinematic attributes, represented as features of a graph that are being progressively denoised to issue coherent and interpretable articulation solutions. This denoised graph then serves as an advanced conditioning mechanism for the shape generation module, a 3D generative model that populates each link of the articulated structure with consistent 3D meshes. Experiments show the superiority of MIDGArD on the quality, consistency, and interpretability of the generated assets. Importantly, the generated models are fully simulatable, i.e., can be seamlessly integrated into standard physics engines such as MuJoCo, broadening MIDGArD's applicability to fields such as digital content creation, meta realities, and robotics.

Method

Articulated assets are represented as graphs.

Graph Representation
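
As an illustration of the graph view, the sketch below stores each link as a node and attaches the joint to its parent as edge data. The class names, field names, and the cabinet example are all hypothetical, chosen for readability; the page does not specify the exact feature layout MIDGArD uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Joint:
    """Kinematic attributes of the edge between a link and its parent (assumed fields)."""
    kind: str                      # e.g. "fixed", "revolute", "prismatic"
    axis: Vec3 = (0.0, 0.0, 1.0)   # joint axis in the parent frame
    origin: Vec3 = (0.0, 0.0, 0.0)
    limits: Tuple[float, float] = (0.0, 0.0)


@dataclass
class Link:
    """Structural and semantic attributes of one node (assumed fields)."""
    name: str
    label: str                     # semantic category of the part
    center: Vec3                   # bounding-box center
    size: Vec3                     # bounding-box extents
    parent: Optional[str] = None   # None marks the root link
    joint: Optional[Joint] = None  # joint connecting this link to its parent


@dataclass
class ArticulationGraph:
    links: Dict[str, Link] = field(default_factory=dict)

    def add(self, link: Link) -> None:
        # Parents must be inserted before their children.
        if link.parent is not None and link.parent not in self.links:
            raise KeyError(f"unknown parent link: {link.parent}")
        self.links[link.name] = link

    def children(self, name: str) -> List[str]:
        return [l.name for l in self.links.values() if l.parent == name]

    def root(self) -> str:
        roots = [l.name for l in self.links.values() if l.parent is None]
        if len(roots) != 1:
            raise ValueError("expected exactly one root link")
        return roots[0]


def build_cabinet() -> ArticulationGraph:
    """Hypothetical example: a cabinet body with a hinged door and a sliding drawer."""
    g = ArticulationGraph()
    g.add(Link("body", "cabinet_body", (0.0, 0.0, 0.4), (0.4, 0.4, 0.8)))
    g.add(Link("door", "door", (0.2, 0.0, 0.5), (0.02, 0.4, 0.5), parent="body",
               joint=Joint("revolute", (0.0, 0.0, 1.0), (0.2, -0.2, 0.5), (0.0, 1.57))))
    g.add(Link("drawer", "drawer", (0.0, 0.0, 0.1), (0.38, 0.38, 0.15), parent="body",
               joint=Joint("prismatic", (1.0, 0.0, 0.0), (0.0, 0.0, 0.1), (0.0, 0.3))))
    return g
```

Keeping the joint on the child link makes the graph a tree that can be traversed from the root, which is also the layout most simulators expect.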

MIDGArD employs a modular and sequential approach based on two diffusion models: the structure generator and the shape generator.

MIDGArD Overview

Structure Generation

The structure generator denoises an articulation graph, unconditionally or from incomplete heterogeneous inputs. This articulation graph acts as an abstract, yet interpretable object representation, encoding the structural and semantic information of every link, as well as kinematic attributes.

Structure Generation (figure panels 1-6)

Given partial information about an articulated asset, MIDGArD's structure generator module can complete the missing elements, yielding a consistent, human-interpretable representation that can be adjusted by a human designer during the creative process.
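
The completion behaviour described above can be pictured as a masked denoising loop: attributes supplied by the user are held fixed while the remaining ones are iteratively refined from noise. The sketch below is a generic, heavily simplified illustration of that idea, not MIDGArD's sampler. The function names are hypothetical, the denoiser is a toy stand-in for a trained network, and known values are re-imposed directly at every step rather than being re-noised to the current noise level as a real diffusion inpainting scheme would do.

```python
import random
from typing import Callable, List, Optional, Sequence


def complete_features(
    partial: Sequence[Optional[float]],
    denoise_step: Callable[[List[float], int], List[float]],
    num_steps: int = 50,
    seed: int = 0,
) -> List[float]:
    """Fill in the None entries of `partial` while keeping the given entries fixed."""
    rng = random.Random(seed)
    # Start every attribute from Gaussian noise.
    x = [rng.gauss(0.0, 1.0) for _ in partial]
    for t in range(num_steps, 0, -1):
        # One refinement step on the full feature vector.
        x = denoise_step(x, t)
        # Re-impose the attributes that were specified by the user.
        x = [p if p is not None else xi for p, xi in zip(partial, x)]
    return x


def toy_denoiser(target: Sequence[float]) -> Callable[[List[float], int], List[float]]:
    """Toy stand-in for a learned model: pulls each value toward a fixed target."""
    def step(x: List[float], t: int) -> List[float]:
        # Move a fraction 1/t of the remaining distance; at t == 1 the target is reached.
        return [xi + (ti - xi) / t for xi, ti in zip(x, target)]
    return step
```

For example, `complete_features([1.0, None, None], toy_denoiser([0.2, 0.3, -0.5]))` keeps the first attribute at the user-specified value and lets the toy denoiser fill in the other two.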

Shape Generation

Object attributes of the articulation graph are used by the shape generator to produce consistent part geometries.

Controllable Generation

Image inputs allow the object design to be controlled at the part level.

Simulation-Ready Assets

The generated articulated assets are fully simulatable and can be exported directly to MuJoCo.
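
To give a sense of what a simulation-ready export involves, the sketch below writes a small kinematic tree to MuJoCo's MJCF XML format using only the Python standard library. It is an illustration under stated assumptions, not the project's exporter: the `to_mjcf` function, the part dictionary layout, and the cabinet example are hypothetical, and box geoms stand in for the generated part meshes.

```python
import xml.etree.ElementTree as ET


def fmt(values) -> str:
    """Format a sequence of numbers as a space-separated MJCF attribute."""
    return " ".join(f"{v:g}" for v in values)


def to_mjcf(model_name: str, parts) -> str:
    """Serialize a list of part dicts (parents listed before children) to MJCF."""
    root = ET.Element("mujoco", model=model_name)
    # Interpret joint ranges in radians rather than MuJoCo's default degrees.
    ET.SubElement(root, "compiler", angle="radian")
    world = ET.SubElement(root, "worldbody")
    bodies = {}
    for part in parts:
        parent_el = bodies[part["parent"]] if part["parent"] else world
        # Body positions in MJCF are expressed relative to the parent body.
        body = ET.SubElement(parent_el, "body", name=part["name"], pos=fmt(part["pos"]))
        joint = part.get("joint")
        if joint is not None:
            ET.SubElement(
                body, "joint",
                name=part["name"] + "_joint",
                type=joint["type"],          # MJCF joint types include "hinge" and "slide"
                axis=fmt(joint["axis"]),
                limited="true",
                range=fmt(joint["range"]),
            )
        # Box geoms take half-extents; a real export would reference mesh assets.
        ET.SubElement(body, "geom", type="box", size=fmt(part["half_size"]))
        bodies[part["name"]] = body
    return ET.tostring(root, encoding="unicode")


# Hypothetical example asset: a cabinet body with one hinged door.
CABINET_PARTS = [
    {"name": "body", "parent": None, "pos": (0, 0, 0.4),
     "half_size": (0.2, 0.2, 0.4), "joint": None},
    {"name": "door", "parent": "body", "pos": (0.21, 0, 0),
     "half_size": (0.01, 0.2, 0.4),
     "joint": {"type": "hinge", "axis": (0, 0, 1), "range": (0, 1.57)}},
]
```

Calling `to_mjcf("cabinet", CABINET_PARTS)` returns an XML string; with the official MuJoCo Python bindings installed, such a string can be loaded via `mujoco.MjModel.from_xml_string`.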

Citation

@inproceedings{leboutet2024midgard,
  title = {MIDGArD: Modular Interpretable Diffusion over Graphs for Articulated Designs},
  author = {Leboutet, Quentin and Wiedemann, Nina and Cai, Zhipeng and Paulitsch, Michael and Yuan, Kai},
  booktitle = {Advances in Neural Information Processing Systems},
  pages = {xxxx--xxxx},
  year = {2024},
}