Stable Diffusion Playground
What is Stable Diffusion Version 2?
Stable Diffusion 2.0, developed by Stability AI, is an open-source image generation model built on a latent diffusion architecture and trained with a new objective called the v-objective. It pairs an autoencoder with a diffusion model, and the diffusion model is trained within the autoencoder's latent space.
Additionally, Stable Diffusion 2.0 ships with its own openly trained text encoder, OpenCLIP, making it one of the most capable open text-to-image models available at its release.

What is the difference between Stable Diffusion 1 and Stable Diffusion 2?
Stable Diffusion 1 and 2 are both image generation models that use the diffusion process to generate images. Here's a comparison of the two models in tabular form:
| Feature | Stable Diffusion 1 | Stable Diffusion 2 |
|---|---|---|
| Text Encoder | Uses OpenAI's CLIP, whose training data is not public | Uses OpenCLIP, an open-source version of CLIP trained on a known dataset |
| Text Coherence | May not be as good as Stable Diffusion 2 | May be slightly better than Stable Diffusion 1 |
| Upscaling Model | Not available | Available, can upscale images to 4x their original side length |
| Image Size Support | Supports 512 x 512 images | Supports 768 x 768 images, which is over twice the area of the images supported by Stable Diffusion 1 |
Stable Diffusion 1 uses OpenAI's CLIP, whose training data is not public, while Stable Diffusion 2 uses OpenCLIP, an open-source version of CLIP trained on a known dataset. According to Stability AI, OpenCLIP "greatly improves the quality" of generated images and outperforms an unreleased version of CLIP on benchmark metrics.
Another difference between the two models is in text coherence. As per assemblyai.com, Stable Diffusion 2 may be slightly better at conveying text than Stable Diffusion 1.
Stable Diffusion 2 also has an upscaling model that can upscale images to 4x their original side length, which is not available in Stable Diffusion 1. Additionally, Stable Diffusion 2 supports 768 x 768 images, which is over twice the area of the images supported by Stable Diffusion 1.
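As a minimal sketch of the upscaler, assuming the Hugging Face diffusers library and the stabilityai/stable-diffusion-x4-upscaler checkpoint on the Hub (both assumptions, not part of the text above):

```python
# Hypothetical 4x upscaling sketch with diffusers (an assumption).
UPSCALER_MODEL_ID = "stabilityai/stable-diffusion-x4-upscaler"

def upscale_4x(low_res_image, prompt: str = ""):
    # Imports are deferred so the sketch can be read without the heavy deps installed.
    import torch
    from diffusers import StableDiffusionUpscalePipeline

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        UPSCALER_MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")
    # Output side lengths are 4x the input's (e.g. 128x128 -> 512x512).
    return pipe(prompt=prompt, image=low_res_image).images[0]

# Usage (requires a GPU and a one-time model download):
# big = upscale_4x(small_image, prompt="a white cat")
```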
Exploring the Exciting New Features of Stable Diffusion 2
Here are some of the new features of Stable Diffusion v2:

New models at higher resolutions
Stable Diffusion v2 introduces new models that support higher resolutions. There are two models available: Stable Diffusion 2.1-v at 768x768 resolution and Stable Diffusion 2.1-base at 512x512 resolution.
Text-to-Image
With Stable Diffusion 2, you can generate images using a latent diffusion model that is conditioned on the penultimate text embeddings of a CLIP ViT-H/14 text encoder. A reference script for sampling is also included.
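Beyond the reference script, a minimal text-to-image sketch might look like the following, assuming the Hugging Face diffusers library and the stabilityai/stable-diffusion-2-1 checkpoint (both assumptions):

```python
# Hypothetical text-to-image sketch with diffusers (an assumption; the
# official repo also ships its own reference sampling script).
TXT2IMG_MODEL_ID = "stabilityai/stable-diffusion-2-1"

def generate_image(prompt: str, out_path: str = "out.png"):
    # Imports are deferred so the sketch can be read without the heavy deps installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        TXT2IMG_MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")
    image = pipe(prompt).images[0]  # 768x768 for the 2.1-v checkpoint
    image.save(out_path)
    return image

# Usage (requires a GPU and a one-time model download):
# generate_image("a photograph of an astronaut riding a horse")
```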
Depth-Conditional Stable Diffusion
Stable Diffusion v2 provides a shape-preserving stable diffusion model that can be used for image modification, and it is particularly useful for a photorealistic style. The model is conditioned on (relative) depth maps, which can be inferred using the provided dpt_hybrid MiDaS model weights.
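A depth-guided edit can be sketched as follows, assuming the diffusers depth-to-image pipeline and the stabilityai/stable-diffusion-2-depth checkpoint (both assumptions):

```python
# Hypothetical depth-conditioned image modification sketch (an assumption).
DEPTH_MODEL_ID = "stabilityai/stable-diffusion-2-depth"

def depth_guided_edit(init_image, prompt: str, strength: float = 0.7):
    # Imports are deferred so the sketch can be read without the heavy deps installed.
    import torch
    from diffusers import StableDiffusionDepth2ImgPipeline

    pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
        DEPTH_MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")
    # If no explicit depth map is passed, the pipeline estimates relative depth
    # from init_image (the official weights use the dpt_hybrid MiDaS model),
    # which is what keeps the overall shape of the scene intact.
    return pipe(prompt=prompt, image=init_image, strength=strength).images[0]
```

Because the conditioning is on depth rather than pixels, the edit preserves scene geometry while restyling appearance.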
Image Inpainting with Stable Diffusion
Stable Diffusion v2 also provides an inpainting model that can be used to remove objects from images. The model supports both Gradio and Streamlit demos.
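The inpainting workflow can be sketched like this, assuming the diffusers inpainting pipeline and the stabilityai/stable-diffusion-2-inpainting checkpoint (both assumptions):

```python
# Hypothetical inpainting sketch with diffusers (an assumption).
INPAINT_MODEL_ID = "stabilityai/stable-diffusion-2-inpainting"

def inpaint(image, mask_image, prompt: str):
    # Imports are deferred so the sketch can be read without the heavy deps installed.
    import torch
    from diffusers import StableDiffusionInpaintPipeline

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        INPAINT_MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")
    # White pixels in mask_image are repainted from the prompt;
    # black pixels are kept from the original image.
    return pipe(prompt=prompt, image=image, mask_image=mask_image).images[0]
```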
New Stable Diffusion Finetune
The latest version of Stable Diffusion (v2) includes a new finetune called Stable unCLIP 2.1. It operates at a resolution of 768x768 and is based on SD2.1-768. With this model, you can generate image variations and mix images using Hierarchical Text-Conditional Image Generation with CLIP Latents, and it can be combined with other models such as KARLO. Two variants are available: Stable unCLIP-L and Stable unCLIP-H, conditioned on CLIP ViT-L and ViT-H image embeddings respectively.
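An image-variation call with the unCLIP finetune might be sketched as follows. The diffusers pipeline class and the stabilityai/stable-diffusion-2-1-unclip Hub id are assumptions here, not confirmed by the text above:

```python
# Hypothetical Stable unCLIP image-variation sketch (an assumption;
# the Hub id below is assumed to be the ViT-H conditioned variant).
UNCLIP_MODEL_ID = "stabilityai/stable-diffusion-2-1-unclip"

def image_variations(init_image, prompt: str = ""):
    # Imports are deferred so the sketch can be read without the heavy deps installed.
    import torch
    from diffusers import StableUnCLIPImg2ImgPipeline

    pipe = StableUnCLIPImg2ImgPipeline.from_pretrained(
        UNCLIP_MODEL_ID, torch_dtype=torch.float16
    ).to("cuda")
    # The pipeline conditions on the CLIP image embedding of init_image,
    # so the output is a 768x768 variation of the input rather than a
    # pixel-level copy; an optional prompt steers the variation.
    return pipe(image=init_image, prompt=prompt).images[0]
```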