Advanced Settings: Explained
CatGPT uses Stable Diffusion to generate unique and highly creative cat-themed artwork.
Stable Diffusion is a diffusion-based image generation model that produces high-quality images from noisy inputs: it starts from randomly generated noise and gradually removes it, step by step, until a clear and detailed image emerges.
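The denoising idea can be sketched with a toy loop (this is only an illustration, not the real model): start from pure noise and repeatedly remove a fraction of whatever noise remains.

```python
import numpy as np

# Toy illustration (not the real model): start from pure noise and
# repeatedly remove a fraction of the remaining noise, the way a
# diffusion sampler refines a random image step by step.
rng = np.random.default_rng(0)
target = np.zeros((8, 8))            # stand-in for the "clean" image
image = rng.normal(size=(8, 8))      # start from random noise

for step in range(30):
    noise = image - target           # what still separates us from "clean"
    image = image - 0.2 * noise      # remove a fraction of it each step

residual = np.abs(image - target).max()
print(f"max residual noise after 30 steps: {residual:.6f}")
```

After 30 such steps, almost no noise remains, which is the basic intuition behind the settings described below.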
The Stable Diffusion system lets people adjust different parts, referred to as “settings” or “parameters”, to see what combination leads to the best images. We welcome you to take part in this technological exploration and play with the settings yourself to see what works best!
Schedulers
Schedulers work alongside the other components to turn noisy images into clear ones. At each step, a scheduler decides how the image should be adjusted and how much noise to remove, balancing how quickly the image becomes clear against how good the final result looks.
There are more than 10 schedulers available, and choosing the right one affects both how good the image looks and how fast it's made. We've included 7 schedulers which you can test out and decide what you like the most.
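One way schedulers differ is in how they spread the noise removal across the available steps. A toy sketch (not any specific scheduler) comparing two hypothetical noise schedules:

```python
import numpy as np

steps = 10
t = np.linspace(0, 1, steps)

# Two hypothetical noise schedules: both start at full noise (1.0)
# and end near zero, but distribute the work differently across steps.
linear = 1.0 - t
cosine_like = np.cos(t * np.pi / 2) ** 2

print("linear     :", np.round(linear, 2))
print("cosine-like:", np.round(cosine_like, 2))
```

Both reach a clean image, but they take different paths there, which is why scheduler choice changes the look and speed of your results.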
Number of Inference Steps
The term 'number of inference steps' indicates how many steps are taken to create an image. Generally speaking, more steps lead to higher-quality images but require more time for processing.
However, after a certain point, adding additional steps will only marginally improve the quality while still requiring more processing time. For this reason, we suggest using 30 steps by default. If you find the quality lacking, consider increasing the number of steps accordingly.
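The diminishing returns are easy to see with a toy model: suppose each step removes a fixed fraction of the remaining noise (the 20% rate here is illustrative, not a real model constant).

```python
# Toy model of diminishing returns: suppose each step removes 20% of
# the remaining noise. Gains shrink quickly as steps are added.
def remaining_noise(steps, rate=0.2):
    return (1 - rate) ** steps

for steps in (10, 30, 60):
    print(f"{steps:3d} steps -> remaining noise {remaining_noise(steps):.6f}")
```

Going from 10 to 30 steps removes far more noise than going from 30 to 60, even though both double-digit jumps cost the same extra processing time.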
CFG Scale
The CFG scale, or ‘classifier-free guidance scale,’ determines how closely an image aligns with a given text prompt. Higher values mean the image sticks more closely to the text input. However, setting the value to the maximum isn't always ideal because too much guidance can reduce diversity and quality.
Different CFG scale values are suited for specific types of prompts:
CFG 2-6: Creative but might deviate from the prompt. Fun and useful for short prompts.
CFG 7-10: Recommended for most prompts. Strikes a good balance between creativity and following the prompt.
CFG 10-15: Suitable when your prompt is very detailed and clear about what you want the image to look like.
CFG 16-20: Generally not recommended unless the prompt is highly detailed, as it could affect coherence and quality.
CFG >20: Almost never usable due to excessive constraint.
Choosing the right CFG scale depends on the prompt and the desired output, ensuring a balance between creativity, adherence to the prompt, and overall image quality.
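Under the hood, classifier-free guidance blends two noise predictions at every step: one made with your prompt and one made without it. A minimal numpy sketch of that blend (the arrays are illustrative stand-ins for real model predictions):

```python
import numpy as np

rng = np.random.default_rng(42)
uncond_pred = rng.normal(size=4)   # prediction with an empty prompt
cond_pred = rng.normal(size=4)     # prediction with your prompt

def guided(uncond, cond, cfg_scale):
    # Push the prediction away from "no prompt" toward "your prompt";
    # a larger cfg_scale follows the prompt more aggressively.
    return uncond + cfg_scale * (cond - uncond)

print(guided(uncond_pred, cond_pred, 1.0))   # equals the prompted prediction
print(guided(uncond_pred, cond_pred, 7.5))   # typical value: stronger pull
```

This is why very high scales can hurt quality: the prediction gets pushed far beyond anything the model saw during training.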
Strength
In the Img2Img mode of Stable Diffusion, a new image is generated based on a text prompt AND an existing image. This is how CatGPT allows users to remix and clone existing images with their own cat.
During this process, the strength parameter plays a crucial role. It determines the level of noise that is added to the initial image while generating a new one.
Setting the strength close to 0 will produce an image nearly identical to the original (the image you are remixing/cloning).
Setting the strength to 1 will produce an image that greatly differs from the original (the image you are remixing/cloning).
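In common implementations (e.g. diffusers-style img2img pipelines), strength effectively decides how many of the denoising steps actually run on top of your starting image. A sketch of that arithmetic (rounding details vary by implementation):

```python
def steps_to_run(num_inference_steps, strength):
    # strength 0 -> no denoising (output stays close to the original)
    # strength 1 -> full denoising (output largely ignores the original)
    return min(int(num_inference_steps * strength), num_inference_steps)

for s in (0.0, 0.3, 0.8, 1.0):
    print(f"strength {s}: run {steps_to_run(30, s)} of 30 steps")
```

Fewer steps means less of the original is repainted, which is why low strength preserves the source image.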
Seed
The seed is a number that determines the initial random noise. Since this random noise shapes the final image, the unique seed is the reason you get a different image each time you run the exact same prompt. Conversely, if you run the same seed with the same prompt multiple times, you will receive the same generated image each time.
Copying the seed and using it across multiple generations is a great way to experiment with different parameters while keeping other factors constant.
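The reproducibility comes straight from how seeded random number generators work, which this small sketch demonstrates:

```python
import numpy as np

def initial_noise(seed, shape=(4, 4)):
    # The seed fully determines the starting noise, and therefore
    # (with all other settings fixed) the final image.
    return np.random.default_rng(seed).normal(size=shape)

a = initial_noise(1234)
b = initial_noise(1234)   # same seed -> identical starting noise
c = initial_noise(9999)   # different seed -> different starting noise

print("same seed matches:     ", np.array_equal(a, b))
print("different seed matches:", np.array_equal(a, c))
```

Fix the seed, vary one setting at a time, and any change you see in the output is caused by the setting you changed.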
Refine Styles
Refine styles improve the image by adding finer details to the output. As with schedulers, a variety of programs provide this function. We've provided 3 options for you to test out and see what works best!
High Noise Frac
This setting only applies if you are using the refine style: expert_ensemble_refiner
It controls how the inference steps are split between the base model and the refiner model.
By default we have this set to 0.8, meaning 80% of the inference steps run on the base model and the remaining 20% run on the refiner model.
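The split itself is simple arithmetic: the fraction carves a fixed step budget into two stages (exact rounding varies by implementation).

```python
def split_steps(total_steps, high_noise_frac):
    # The fraction carves the step budget into two stages: one stage
    # handles the noisier early steps, the other finishes the image.
    first_stage = int(total_steps * high_noise_frac)
    second_stage = total_steps - first_stage
    return first_stage, second_stage

print(split_steps(30, 0.8))
```

With 30 steps and a fraction of 0.8, one stage gets 24 steps and the other gets the remaining 6.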
This setting only applies if you are using the refine style: base_image_refiner

Similar to “High Noise Frac,” this setting refers to the number of steps run with the refiner model. By default, all inference steps will be run with the refiner model.
LoRA Scale
LoRA models are what we’re using to create images that look like your cat. The LoRA scale controls how strongly that trained model shapes the output:
The larger this value, the more the result resembles the cat the model was trained on.
The smaller this value, the less the result resembles that cat.
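Conceptually, a LoRA is a small learned adjustment applied on top of the base model's weights, and this scale multiplies that adjustment. A toy sketch (the arrays are illustrative stand-ins, not real weights):

```python
import numpy as np

base_weights = np.array([1.0, 2.0, 3.0])   # stand-in for base model weights
lora_delta = np.array([0.5, -0.2, 0.1])    # stand-in for the learned LoRA update

def apply_lora(base, delta, scale):
    # scale = 0 -> pure base model; larger scale -> stronger pull
    # toward what the LoRA was trained on (your cat).
    return base + scale * delta

print(apply_lora(base_weights, lora_delta, 0.0))   # base model unchanged
print(apply_lora(base_weights, lora_delta, 1.0))   # full LoRA effect
```

Dialing the scale up or down smoothly interpolates between the generic model and the one trained on your cat.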
Use Native SDXL
Using this parameter, you can toggle whether the images will feature your pet.
If turned off: images are generated with the SDXL model trained on your cat.
If turned on: images are generated with the standard SDXL model, which was not trained on your cat.