About a month after OpenAI announced DALL-E 2, its latest AI system to create images from text, Google has continued the AI "space race" with its own text-to-image diffusion model, Imagen.

'A blue jay standing on a large basket of rainbow macarons.' Credit: Google

Compared to NVIDIA's GauGAN2 method from last fall, Imagen is significantly improved in terms of flexibility and results.

Imagen works by taking a natural language text input, like 'A Golden Retriever dog wearing a blue checkered beret and red dotted turtleneck,' and then using a frozen T5-XXL encoder to turn that input text into embeddings. A 'conditional diffusion model' then maps the text embedding into a small 64x64 image. Imagen then uses text-conditional super-resolution diffusion models to upsample that 64x64 image into a 256x256 image and finally a 1024x1024 image.

'The Toronto skyline with Google Brain logo written in fireworks.'

Google's results are extremely, perhaps even scarily, impressive. Using a standard measure, FID, Google Imagen outpaces OpenAI's DALL-E 2 with a score of 7.27 on the COCO dataset; despite not being trained using COCO, Imagen still performed well there too. Imagen also bests DALL-E 2 and other competing text-to-image methods among human raters. You can read about the full testing results in Google's research paper.

Seemingly all of what we've seen so far from Imagen is cute: funny outfits on furry animals, cactuses with sunglasses, swimming teddy bears, royal raccoons, and so on. Consider the image below, generated from 'a cute corgi lives in a house made out of sushi.' It looks believable, like someone really built a dog house from sushi that the corgi, perhaps unsurprisingly, loves.

'A cute corgi lives in a house made out of sushi.'

The text strings can become quite complicated. 'A marble statue of a koala DJ in front of a marble statue of a turntable. The koala is wearing large marble headphones.'

Whether innocent or ill-intentioned, we know that some users would immediately start typing in all sorts of phrases about people as soon as they had access to Imagen. I'm sure there'd be a lot of text inputs about adorable animals in humorous situations, but there'd also be input text about chefs, athletes, doctors, men, women, children, and much more.

What would these people look like? Would doctors mostly be men, would flight attendants mostly be women, and would most people have light skin? We don't know how Imagen handles these text strings because Google has elected not to show any people.

'A robot couple fine dining with Eiffel Tower in the background.'

What would this couple look like if the text didn't include the word 'robot'?

There are ethical challenges with text-to-image research. AI models like Imagen are largely trained using datasets scraped from the web, and content on the internet is skewed and biased in ways that we are still trying to understand fully. If a model can conceivably create just about any image from text, how good is it at presenting unbiased results? These biases have negative societal impacts worth considering and, ideally, rectifying.

Not just that, but Google used the LAION-400M dataset for Imagen, which is known to 'contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.' A subset of the training data was filtered to remove noise and 'undesirable' content, but there remains a 'risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.'

So no, you can't access Imagen for yourself. On its website, Google lets you click on specific words from a selected group to see results, like 'a photo of a fuzzy panda wearing a cowboy hat and a black leather jacket playing a guitar on top of a mountain,' but you can't search for anything to do with people or potentially problematic actions or items. If you could, you'd find that the model tends to generate images of people with lighter skin tones and to reinforce traditional gender roles.
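Google has not released Imagen's code, but the cascaded generation flow described in the article (frozen T5-XXL text embedding, a 64x64 base diffusion model, then two text-conditional super-resolution stages up to 1024x1024) can be sketched with toy stand-in functions. This is a minimal illustration of the data flow only; the function names (`encode_text`, `base_diffusion`, `super_resolve`) and their internals are assumptions, not Imagen's actual implementation.

```python
# Toy sketch of Imagen's cascaded pipeline shape. All stages are stand-ins:
# real diffusion models are replaced with deterministic dummy generators.
import random


def encode_text(prompt: str, dim: int = 8) -> list:
    """Stand-in for the frozen T5-XXL encoder: prompt -> fixed-length embedding."""
    rng = random.Random(prompt)  # deterministic per prompt for illustration
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def base_diffusion(embedding: list, size: int = 64) -> list:
    """Stand-in for the text-conditional base model: embedding -> size x size image."""
    rng = random.Random(sum(embedding))
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def super_resolve(image: list, embedding: list, factor: int) -> list:
    """Stand-in for a text-conditional super-resolution diffusion stage.

    Here it just repeats pixels (nearest-neighbour); the real models run
    another diffusion process conditioned on the text embedding.
    """
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def generate(prompt: str) -> list:
    """Text -> embedding -> 64x64 -> 256x256 -> 1024x1024, as described above."""
    emb = encode_text(prompt)
    img64 = base_diffusion(emb, size=64)            # base image
    img256 = super_resolve(img64, emb, factor=4)    # 64 -> 256
    img1024 = super_resolve(img256, emb, factor=4)  # 256 -> 1024
    return img1024


img = generate("A Golden Retriever dog wearing a blue checkered beret")
print(len(img), len(img[0]))  # prints: 1024 1024
```

The point of the sketch is the cascade itself: generating a small image first and then upsampling in stages is cheaper than running one diffusion model directly at 1024x1024, and each upsampling stage can stay conditioned on the same text embedding.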