Google’s new AI tool uses image prompts instead of text

Google has introduced a new AI tool called “Whisk,” which lets users upload photos and receive a unique, AI-generated image without providing any text description. Users can upload images that show subjects, settings, and styles, and Whisk will blend them into a single, cohesive image. According to Google, Whisk is designed as a “creative tool” for quick inspiration rather than a traditional image editor. It is meant for fun and exploration, not for producing polished, professional-level results.

Big Tech companies, including Google and OpenAI, are competing to release consumer products that highlight the potential of new AI technology, even as critics warn about the risks posed by the lack of regulations in AI development.

Since OpenAI launched its text-to-image tool, Dall-E, in 2021, AI-generated art has become hugely popular on social media and a key focus for consumer products. Google’s Whisk extends the text-to-image concept by accepting images, rather than text, as the prompt.

Whisk allows users to “remix” images by adjusting their inputs and mixing various categories to create items such as plush toys, enamel pins, or stickers. Users can add text to influence specific details, but text is not required to create an image.

Thomas Iljic, a product management director at Google Labs, explained that Whisk is designed to help users explore new and creative visual combinations quickly, focusing on rapid exploration rather than creating precise edits.

Whisk is built on generative AI technology from DeepMind, the AI lab Google acquired in 2014. It combines Gemini, Google’s AI platform launched in December 2023, with Imagen 3, DeepMind’s latest text-to-image generator, released in December 2024.

When users upload images to Whisk, Gemini writes a caption for each one, and that caption is then fed into Imagen 3. This process captures the “essence” of the subject rather than producing a perfect replica, which allows for creative remixing of the final image. Google notes, however, that the final image can therefore differ from the uploaded one in aspects such as height, hairstyle, or skin tone.
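The caption-then-generate flow described above can be sketched roughly as follows. This is a toy illustration, not Google’s implementation: `caption_image` and `generate_image` are hypothetical stand-ins rather than real Gemini or Imagen 3 calls, and the way captions are joined into one prompt is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ImageInput:
    role: str  # e.g. "subject", "setting", or "style"
    path: str  # location of the uploaded image


def caption_image(image: ImageInput) -> str:
    """Hypothetical stand-in for the captioning step (Gemini, in Whisk)."""
    return f"a description of {image.path}"


def build_prompt(inputs: list[ImageInput], extra_text: str = "") -> str:
    """Join one caption per input image into a single text prompt."""
    parts = [f"{img.role}: {caption_image(img)}" for img in inputs]
    if extra_text:  # optional user text that steers specific details
        parts.append(f"details: {extra_text}")
    return "; ".join(parts)


def generate_image(prompt: str) -> dict:
    """Hypothetical stand-in for the generator (Imagen 3, in Whisk)."""
    return {"prompt": prompt, "pixels": None}


def whisk_like_pipeline(inputs: list[ImageInput], extra_text: str = "") -> dict:
    # Only the text prompt reaches the generator, never the original pixels.
    return generate_image(build_prompt(inputs, extra_text))
```

Because only captions are passed along in this sketch, any detail a caption omits is lost before generation, which is consistent with Google’s note that the output may not match the uploaded image exactly.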

When Gemini’s text-to-image tool was first introduced in February 2024, it drew criticism for generating historically inaccurate images. Whisk is currently available as a website on Google Labs for users in the US and is still in early development, according to the company.

OpenAI also recently introduced a text-to-video generator called Sora, intensifying competition in the consumer AI product market.

Dan Ives, managing director and senior equity analyst at Wedbush Securities, said Whisk represents another major step for Google in the AI and tech race. He emphasized that DeepMind is a valuable asset for Google and noted that AI products are among the company’s planned offerings for 2025. These include a new Android operating system developed in collaboration with Samsung and Qualcomm.

Sachin Mane
