First Look at Stable Cascade, a New Text-to-Image AI Generator with Three-Stage Approach

Here’s a first look at Stable Cascade, a new text-to-image AI generator from Stability AI that takes a three-stage approach. It is built on the hyper-efficient Würstchen architecture, which compresses images hierarchically and produces high-quality outputs while working in a highly compressed latent space.

What sets it apart from Stable Diffusion is the Latent Generator phase, or Stage C, which transforms the user's inputs into compact 24×24 latents. These latents are passed to the Latent Decoder phase (Stages A and B), which handles image compression much as the VAE does in Stable Diffusion, but at a much higher compression rate. The project's code is available on GitHub.
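
To put the 24×24 figure in perspective, here is a back-of-the-envelope calculation. It assumes a 1024×1024 output image (an illustrative assumption) and uses the fact that Stable Diffusion's VAE downsamples each side of the image by a factor of 8. It compares only the spatial size of the latent grids and ignores how many channels each latent position carries, so it is a rough indicator rather than a full measure of compressed size.

```python
# Back-of-the-envelope comparison of latent grid sizes.
# ASSUMPTION: a 1024x1024 output image, chosen for illustration.
IMAGE_SIDE = 1024

# Stable Diffusion's VAE downsamples each side by 8.
SD_VAE_FACTOR = 8
sd_side = IMAGE_SIDE // SD_VAE_FACTOR  # 128

# Stable Cascade's Stage C works on 24x24 latents (per the article).
cascade_side = 24

# Per-side downsampling implied by a 24x24 grid for a 1024-pixel side.
cascade_factor = IMAGE_SIDE / cascade_side

# How many fewer spatial positions the Stage C grid has than SD's latent grid.
# Note: channel counts per position are ignored here.
positions_ratio = (sd_side * sd_side) / (cascade_side * cascade_side)

print(f"Stable Diffusion latent: {sd_side}x{sd_side}")
print(f"Stable Cascade latent:   {cascade_side}x{cascade_side}")
print(f"Per-side compression:    {cascade_factor:.1f}x vs {SD_VAE_FACTOR}x")
print(f"Spatial positions:       {positions_ratio:.1f}x fewer")
```

Under these assumptions, each side is reduced by roughly 42.7× instead of 8×, and the Stage C grid has about 28× fewer spatial positions than Stable Diffusion's latent grid.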

Stable Cascade Text-to-Image AI Generator

“Next to standard text-to-image generation, Stable Cascade can generate image variations and image-to-image generations. Image variations work by extracting image embeddings from a given image using CLIP and then returning this back to the model,” said Stability AI in a press release.
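
The variation idea can be sketched in a few lines: instead of conditioning generation on an embedding of a text prompt, the model is conditioned on a CLIP embedding of an input image, and different random seeds then yield different outputs that share that conditioning. Everything below is hypothetical and for illustration only: `toy_text_encoder` and `toy_image_encoder` are stand-ins for CLIP's encoders, and `toy_generate` is a stand-in for the full three-stage cascade, not Stability AI's code or API.

```python
import random
from typing import Callable, List, Optional

Embedding = List[float]


def toy_text_encoder(prompt: str) -> Embedding:
    # HYPOTHETICAL stand-in for CLIP's text encoder.
    return [float(len(prompt)), 0.0]


def toy_image_encoder(pixels: List[float]) -> Embedding:
    # HYPOTHETICAL stand-in for CLIP's image encoder (mean brightness only).
    return [0.0, sum(pixels) / len(pixels)]


def make_conditioning(
    prompt: Optional[str],
    image: Optional[List[float]],
    encode_text: Callable[[str], Embedding],
    encode_image: Callable[[List[float]], Embedding],
) -> Embedding:
    """Choose the embedding that conditions generation (hypothetical helper).

    If an input image is given, its image embedding replaces the text
    embedding, which is the core idea behind image variations.
    """
    if image is not None:
        return encode_image(image)
    if prompt is not None:
        return encode_text(prompt)
    raise ValueError("provide a prompt or an image")


def toy_generate(conditioning: Embedding, seed: int) -> Embedding:
    # HYPOTHETICAL stand-in for the full cascade: the same conditioning
    # combined with a different seed produces a different sample.
    rng = random.Random(seed)
    return [c + rng.gauss(0.0, 0.1) for c in conditioning]


source_image = [0.2, 0.4, 0.6]  # toy "image": a few pixel intensities
cond = make_conditioning(None, source_image, toy_text_encoder, toy_image_encoder)
variations = [toy_generate(cond, seed) for seed in range(3)]
print(variations)
```

Keeping the conditioning fixed and varying only the seed is what makes the outputs related variations of the source image rather than unrelated new images.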