Have you ever wished you could have a do-over on your profile picture? Maybe you didn’t quite capture the perfect selfie, or maybe you’ve just grown tired of the same old picture. Whatever the reason, wouldn’t it be great if you could just train an AI to create a new, improved profile picture for you?

Well, that’s exactly what I did. I took a pre-trained text-to-image Stable Diffusion model and, keeping its weights frozen, used textual inversion to teach it a new token for my face and generate my new profile picture. The results were eerily accurate.

Textual inversion is a technique for teaching a text-to-image model new concepts by adding new “tokens” to its text embedding space. The model’s weights stay frozen: given a handful of example images of a concept, only the embedding vector behind the new token is optimized, until that token comes to stand for the concept. In other words, the model learns a new idea by seeing images of it.
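To make that concrete, here is a minimal sketch of the setup, assuming the Hugging Face transformers library and the CLIP text encoder that ships with Stable Diffusion. The model id, the placeholder token `<my-face>`, and the initializer word “person” are my own illustrative choices, not details from the experiment itself; the point is simply that everything stays frozen except one new embedding row.

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Stable Diffusion's tokenizer and text encoder (model id is an assumption).
model_id = "runwayml/stable-diffusion-v1-5"
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")

# Register a placeholder token for the new concept and make room for it.
placeholder = "<my-face>"  # hypothetical token name
tokenizer.add_tokens(placeholder)
text_encoder.resize_token_embeddings(len(tokenizer))

# Initialize the new embedding from a semantically related word ("person").
placeholder_id = tokenizer.convert_tokens_to_ids(placeholder)
init_id = tokenizer.encode("person", add_special_tokens=False)[0]
with torch.no_grad():
    embeds = text_encoder.get_input_embeddings().weight
    embeds[placeholder_id] = embeds[init_id].clone()

# Freeze the model. During training, only the new embedding row is optimized
# (the training loop masks out gradients for every other row), so the
# diffusion model and the rest of the text encoder never change.
text_encoder.requires_grad_(False)
text_encoder.get_input_embeddings().weight.requires_grad_(True)
```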

I was curious to see if this technique could be used to create an accurate representation of my face. So I fed the model a series of photos of myself. The results were impressive.
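Once training has produced a learned embedding, using it is straightforward. Here is a minimal sketch of how generation could look with the Hugging Face diffusers library; the output directory, the `<my-face>` token, and the prompt are illustrative placeholders rather than the exact ones behind my profile picture.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the frozen base model (model id is an assumption).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the embedding learned by textual inversion, e.g. the learned_embeds.bin
# file written by the training script (path and token are hypothetical).
pipe.load_textual_inversion("./textual_inversion_output", token="<my-face>")

# The placeholder token can now be used in prompts like any other word.
image = pipe(
    "a portrait photo of <my-face>, studio lighting, 85mm lens",
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("profile_picture.png")
```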

The generated image is a striking likeness. It captures my features with stunning accuracy, from the shape of my eyes to the contours of my face.

Of course, there are some limitations to this approach. The model is only as good as the images it is given: if the training photos are few, blurry, poorly lit, or unrepresentative, the learned concept and every image generated from it will inherit those flaws.

This technique also has potential ethical implications. In the wrong hands, it could be used to create deepfakes: fabricated images or videos that convincingly depict someone saying or doing something they never did.

I also wonder about the impact of AI-generated art on the world of art. Will AI eventually be able to create works of art that are indistinguishable from those created by humans? And if so, what will happen to the artists who make their living creating art?

So far, I have only used this technology to create static images, like my profile picture. But it’s not hard to imagine the same approach being extended to fake videos that are indistinguishable from the real thing.

This technology is still in its early stages, but it’s already very impressive. I can’t wait to see what it will be capable of in the future.

What do you think of this technology? Do you think it has potential implications for society? Let me know in the comments.

And if you’ve made it this far: this entire article’s text, including its title, was also generated by an AI, GPT-3. Could you tell?


[1]: Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer. High-Resolution Image Synthesis with Latent Diffusion Models (A.K.A. LDM & Stable Diffusion). https://ommer-lab.com/research/latent-diffusion-models/

[2]: Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit H. Bermano, Gal Chechik, Daniel Cohen-Or. An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. https://textual-inversion.github.io/

[3]: Nataniel Ruiz, Yuanzhen Li, Varun Jampani, Yael Pritch, Michael Rubinstein, Kfir Aberman. DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation. https://arxiv.org/abs/2208.12242