How to Access Stable Diffusion 3.5?

Stability.ai has unveiled Stable Diffusion 3.5, featuring multiple variants: Stable Diffusion 3.5 Large, Large Turbo, and Medium. These models are customizable and can run on consumer hardware. Let’s explore these models, learn how to access them, and use them for inference to see what Stable Diffusion brings to the table this time around.

Overview

  • Availability: The models can be downloaded from Hugging Face and are accessible through various platforms such as Stability AI’s API, Replicate, and others.
  • Safety and Security: Stability AI has implemented safety protocols designed to minimize potential misuse. These measures ensure responsible use and user safety.
  • Future Enhancements: Plans include ControlNet support, enabling more advanced and precise control over the image generation process.
  • Platform Flexibility: Users can access and integrate these models into their workflows across different platforms, providing flexibility in use.

Stable Diffusion 3.5 Models

Stable Diffusion 3.5 offers a range of models:

  1. Stable Diffusion 3.5 Large: With 8.1 billion parameters, this flagship model delivers top-notch quality and prompt adherence, making it the most powerful in the Stable Diffusion lineup. It’s optimized for professional applications at 1 megapixel resolution.
  2. Stable Diffusion 3.5 Large Turbo: A streamlined version of Stable Diffusion 3.5 Large, this model produces high-quality images with excellent prompt adherence in just 4 steps, offering significantly faster performance than the standard Large model.
  3. Stable Diffusion 3.5 Medium: Featuring 2.5 billion parameters and the improved MMDiT-X architecture, this model is designed for seamless use on consumer hardware. It balances quality with customization flexibility, supporting resolution image generation from 0.25 to 2 megapixels.

All of the models can be fine-tuned to fit specific needs and are optimized for consumer hardware; the Stable Diffusion 3.5 Medium and Large Turbo models in particular offer high-quality output with minimal resource demands. The 3.5 Medium model requires 9.9 GB of VRAM (excluding text encoders), ensuring broad compatibility with most GPUs.

Comparison with Other Models

The Stable Diffusion 3.5 Large leads in prompt adherence and rivals larger models in image quality. The Large Turbo variant delivers fast inference and quality output, while the 3.5 Medium offers a high-performing, efficient option among medium-sized models.

Accessing Stable Diffusion 3.5 

On Stability.ai Platform

Go to the platform page and get your API key (you receive 25 free credits after signing up).

Run the following Python code in a Jupyter environment to generate an image, replacing the placeholder with your API key; change the prompt if you wish.

import requests

API_KEY = "YOUR_API_KEY"  # replace with your Stability AI API key

response = requests.post(
    "https://api.stability.ai/v2beta/stable-image/generate/sd3",
    headers={
        "authorization": f"Bearer {API_KEY}",
        "accept": "image/*",
    },
    files={"none": ""},
    data={
        "prompt": "A middle-aged man wearing formal clothes",
        "output_format": "jpeg",
    },
)

if response.status_code == 200:
    # Save the returned image bytes to disk
    with open("./man.jpeg", "wb") as file:
        file.write(response.content)
else:
    raise Exception(str(response.json()))
Output

I asked the model to generate an image of “A middle-aged man wearing formal clothes”, and it performs well at producing photorealistic images.
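To try a different variant, the same request can reportedly carry a model field in its form data; the variant names below are my assumption, so confirm them against Stability AI’s API documentation. A minimal sketch of assembling the payload:

```python
def build_sd3_payload(prompt, model="sd3.5-large", output_format="jpeg"):
    """Assemble the form-data payload for the sd3 endpoint.

    The variant names in `allowed` are assumptions; verify them
    against Stability AI's API reference before relying on them.
    """
    allowed = {"sd3.5-large", "sd3.5-large-turbo", "sd3.5-medium"}
    if model not in allowed:
        raise ValueError(f"unknown model {model!r}")
    return {"prompt": prompt, "model": model, "output_format": output_format}

# Build a payload for the prompt used above
payload = build_sd3_payload("A middle-aged man wearing formal clothes")
print(payload["model"])  # sd3.5-large
```

Passing this dictionary as the `data=` argument of the `requests.post` call above keeps the rest of the request unchanged.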

On Hugging Face

You can use the model on Hugging Face.

First, click on the link, and then you can run inference directly on the Stable Diffusion 3.5 Medium model.

This is the interface you’ll be greeted with:

Output

I prompted the model to generate an image of “A forest with red trees”, and it did a wonderful job generating this 1024 x 1024 image. 

Feel free to play around with the advanced settings to see how the result changes. 

Using the Inference API on Hugging Face

Step 1: Visit the model page of Stable Diffusion 3.5 Large on Hugging Face.

Note: You can choose a different model and see the options here: Hugging Face.

Step 2: Fill out the necessary details to get access to the model, as it’s a gated model, and wait for a while. Once you’ve been granted access, you’ll be able to use the model.

Step 3: Now you can run this Python code in a Jupyter environment to send prompts to the model (make sure to replace the placeholder in the header with your Hugging Face token).

import io

import requests
from PIL import Image

API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-3.5-large"
headers = {"Authorization": "Bearer hf_token"}  # replace hf_token with your Hugging Face token

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.content

image_bytes = query({
    "inputs": "A ninja sitting on top of a tall building, 8k",
})

# Open the returned bytes as an image with PIL
image = Image.open(io.BytesIO(image_bytes))
image
Output

Feel free to change the prompt and generate different sorts of images.
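One practical note: hosted inference endpoints commonly return a 503 status while a large model is still loading. A small retry wrapper (an illustrative sketch, not part of the original snippet; the send callable stands in for the requests.post call above) can smooth this over:

```python
import time

def query_with_retry(send, payload, retries=3, delay=2.0):
    """Call send(payload); retry while the endpoint reports it is loading.

    `send` returns a (status_code, content) tuple here so the sketch
    stays self-contained; swap in a wrapper around the real request.
    """
    for attempt in range(retries):
        status, content = send(payload)
        if status == 200:
            return content
        if status == 503 and attempt < retries - 1:
            time.sleep(delay)  # model still loading; wait and retry
            continue
        raise RuntimeError(f"request failed with status {status}")
    raise RuntimeError("retries exhausted")

# Simulated endpoint: returns 503 once, then succeeds.
calls = {"n": 0}
def fake_send(payload):
    calls["n"] += 1
    return (503, b"") if calls["n"] == 1 else (200, b"image-bytes")

print(query_with_retry(fake_send, {"inputs": "test"}, delay=0))  # b'image-bytes'
```

In real use you would pass a thin wrapper around the `query` function so that it returns both the status code and the body.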

Conclusion

In conclusion, Stable Diffusion 3.5 offers a robust range of image-generation models with various performance levels tailored to both professional and consumer use. The lineup, which includes the Large, Large Turbo, and Medium models, provides flexibility in quality and speed, making it a great choice for various applications. With simple access options via Stability AI’s platform, Hugging Face, and API integrations, Stable Diffusion 3.5 makes high-quality AI-driven image generation easier.

Also, if you are looking for a Generative AI course, explore the GenAI Pinnacle Program.

Frequently Asked Questions

Q1. How can I authenticate API requests to Stability AI?

Ans. API requests require an API key for authentication, which should be included in the header to access various functionalities.

Q2. What error responses might I encounter with the Stability AI API?

Ans. Common errors include unauthorized access, invalid parameters, or exceeding usage limits, each with specific response codes for troubleshooting.
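For instance, a small helper (illustrative only; the exact codes and meanings should be checked against Stability AI’s API reference) can translate the standard HTTP statuses into actionable hints:

```python
def explain_status(code):
    """Map common HTTP status codes from image-generation APIs to hints."""
    hints = {
        401: "Unauthorized: check that your API key is valid and in the header.",
        403: "Forbidden: your key may lack access to this model or endpoint.",
        422: "Invalid parameters: review the prompt and request fields.",
        429: "Rate or credit limit exceeded: slow down or top up credits.",
    }
    return hints.get(code, f"Unexpected status {code}: inspect the response body.")

print(explain_status(401))
```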

Q3. Is Stable Diffusion 3.5 Medium free to use?

Ans. The model is free under the Stability Community License for research, non-commercial use, and organizations with under $1M revenue. Larger entities need an Enterprise License.

Q4. What makes Stable Diffusion 3.5 Medium different?

Ans. It uses a Multimodal Diffusion Transformer (MMDiT-X) with improved training techniques, such as QK-normalization and dual attention, for enhanced image generation across multiple resolutions.
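As a rough illustration of the QK-normalization idea (an educational sketch, not the model’s actual implementation), the query and key vectors are L2-normalized before the attention scores are computed, which keeps the logits bounded and training more stable:

```python
import numpy as np

def qk_norm_attention(Q, K, V, scale=10.0):
    """Toy attention with QK-normalization.

    Each query/key row is L2-normalized so every dot product lies in
    [-1, 1]; `scale` then plays the role of a learned temperature
    before softmax. Shapes: Q, K, V are (seq, dim).
    """
    Qn = Q / np.linalg.norm(Q, axis=-1, keepdims=True)
    Kn = K / np.linalg.norm(K, axis=-1, keepdims=True)
    logits = scale * (Qn @ Kn.T)  # bounded attention logits
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = qk_norm_attention(Q, K, V)
print(out.shape)  # (4, 8)
```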

I’m a tech enthusiast who graduated from Vellore Institute of Technology. I’m currently working as a Data Science Trainee, and I’m very interested in Deep Learning and Generative AI.
