“Stable Diffusion” is going to lead innovation in computer vision in 2023. It should be exciting!

Hi friends. Happy new year! I hope you are doing well. Last year, in September, I came across a new computer vision model called “Stable Diffusion”. Since then, many AI researchers, artists and illustrators have been crazy about it because it can create high-quality images easily. The image above was also created by “Stable Diffusion”. This is great!

1. I created many kinds of images with “Stable Diffusion”. They are amazing!

The images below were created in my experiments with “Stable Diffusion” last year. I found that it can generate many kinds of images, from oil paintings to animation. When the prompts are refined through “prompt engineering”, the results get much better. In other words, if we input appropriate words or text into the model, it generates the images we want more effectively.

2. “Prompt engineering” works very well

To generate the images we want, we need to input an appropriate “prompt” into the model. As I said before, this is called “prompt engineering”.

If you are new to generating images, you can start with a short prompt such as “an apple on the table”. When you want the image to look like an oil painting, you can just add that to the prompt, as in “oil painting of an apple on the table”.

Let us divide each prompt into three categories:

  • Style
  • Physical object
  • The way the physical object is displayed (e.g. lighting)

So all we have to do is decide what goes into each category of our prompt and input it into the model. For example, “oil painting of an apple on the table, volumetric light”. The results are the images below. Why don’t you try it yourself?
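The three categories can also be combined in a few lines of code. The following is only a minimal sketch: the function name build_prompt and its parameters are hypothetical and are not part of “Stable Diffusion” itself. It simply joins the style, the physical object and the display description into one prompt string, which you would then pass to the model.

```python
def build_prompt(style: str, subject: str, display: str = "") -> str:
    """Join the three prompt categories into one prompt string.

    style:   e.g. "oil painting" (may be empty)
    subject: the physical object, e.g. "an apple on the table"
    display: how the object is displayed, e.g. "volumetric light" (may be empty)
    """
    # Put the style in front of the subject only when a style is given.
    prompt = f"{style} of {subject}" if style else subject
    # Append the display description after a comma when it is given.
    if display:
        prompt = f"{prompt}, {display}"
    return prompt


# Start with the subject only, then add a style, then add lighting.
print(build_prompt("", "an apple on the table"))
print(build_prompt("oil painting", "an apple on the table"))
print(build_prompt("oil painting", "an apple on the table", "volumetric light"))
```

Keeping the categories as separate arguments makes it easy to change one of them, for example the style, while leaving the others as they are.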

3. More research needed

Some researchers in computer vision think that “prompt engineering” can be optimized by computers, and they have developed a model to do so. In their research paper (1), they compare hand-made prompts with AI-optimized prompts (see the images below). Which do you like better? I am not sure that optimization always works perfectly, so I think more research is needed across many use cases.

I will update this article as the technology develops. Stay tuned!

1) Optimizing Prompts for Text-to-Image Generation, Yaru Hao, Zewen Chi, Li Dong, Furu Wei, Microsoft Research, 19 Dec 2022, https://arxiv.org/abs/2212.09611

Notice: ToshiStats Co., Ltd. and I do not accept any responsibility or liability for loss or damage occasioned to any person or property through using materials, instructions, methods, algorithms or ideas contained herein, or acting or refraining from acting as a result of such use. ToshiStats Co., Ltd. and I expressly disclaim all implied warranties, including merchantability or fitness for any particular purpose. There will be no duty on ToshiStats Co., Ltd. and me to correct any errors or defects in the codes and the software.

“Stable Diffusion” is a game changer for computer vision. It is amazing!

Hi friends. I hope you are doing well. Today, I would like to introduce a new computer vision model called “Stable Diffusion”. It is open source software, which means that you can download it and use it for free, without paying for a license. That is good news for anyone who is interested in computer vision. The image above was created by “Stable Diffusion”. It looks so good! I love it because it makes creating such beautiful images very easy.

1. These images are amazing!

These images were all created from the same text. If you look at the background of each image, you can tell where the girl is. Yes, it is “a cafe”, because the text I wrote specifies that she is in a cafe. As you may know, this is a “text-to-image generative model”: we input some words or text into the model, and the model generates images based on that instruction. It is very interesting because I feel like I am communicating with the computer when I create these images.

2. It is “open source software”

If I had to pay a lot of money to use it, it would not be so impressive, because very few people could do that. Fortunately, it is open source software, so everyone can use it for free! If you want to integrate “Stable Diffusion” into your products, no problem. If you want to create an updated version of this software, you can do that too, because it is open source. I want to make my own products with it in the near future. Why don’t you try it yourself? If you are interested in “Stable Diffusion”, I recommend watching a YouTube interview with Emad Mostaque, founder of Stability AI (1), the company that created “Stable Diffusion”. The details of the release are provided here (2). Please check the terms of the license of this software, too.

3. It can change the direction of computer vision and beyond

The release blog post (2) says, “This release is the culmination of many hours of collective effort to create a single file that compresses the visual information of humanity into a few gigabytes.” I cannot predict exactly what can be achieved with this software. But I can say that many things that used to be impossible become possible with it. “Stable Diffusion” enables all of us to create products, services and art that have not been seen before. This is definitely the “democratization of AI”. I expect a tsunami of new kinds of products, services and art to appear in the near future. It should be exciting!

I will update this article as this new software develops. Stay tuned!

1) The Man behind Stable Diffusion, https://www.youtube.com/watch?v=YQ2QtKcK2dA&t=942s

2) Stable Diffusion Public Release, https://stability.ai/blog/stable-diffusion-public-release
