Cosmos-Predict2 is NVIDIA's next-generation physical world foundation model, designed for high-quality visual generation and prediction tasks in physical AI scenarios. The model emphasizes physical accuracy, environmental interactivity, and detail reproduction, enabling realistic simulation of complex physical phenomena and dynamic scenes.

Cosmos-Predict2 supports several generation methods, including Text-to-Image (Text2Image) and Video-to-World (Video2World), and is widely used in industrial simulation, autonomous driving, urban planning, scientific research, and other fields.

GitHub: Cosmos-predict2
Hugging Face: Cosmos-Predict2

This guide walks you through the text-to-image workflow in ComfyUI. For video generation, please refer to the following part:

Cosmos Predict2 Video Generation

Using Cosmos-Predict2 for video generation
Make sure your ComfyUI is updated. The workflows in this guide can be found in the Workflow Templates. If you can't find them there, your ComfyUI may be outdated. (Updates to the Desktop version may lag behind.)

If nodes are missing when loading a workflow, possible reasons are:
  1. You are not using the latest ComfyUI version (Nightly version)
  2. You are using the Stable or Desktop version (the latest changes may not be included)
  3. Some nodes failed to import at startup