A screenshot showing Google’s Project Astra. Google plans to start adding Astra’s capabilities to its Gemini app and across its products this year © Google

Google owner Alphabet has unveiled an artificial intelligence agent that can answer real-time queries across video, audio and text, as part of a number of initiatives designed to showcase its prowess in AI and quell criticism that it has fallen behind rivals.

Chief executive Sundar Pichai demonstrated the Silicon Valley giant’s new “multimodal” AI assistant called Project Astra, powered by an upgraded version of its Gemini model, during an annual developer conference on Tuesday.

Astra was part of a series of announcements to showcase a new AI-centric vision for Google. It follows product launches and upgraded AI models from Big Tech rivals including Meta, Microsoft and its partner OpenAI.

In a video demonstration, Google’s prototype AI assistant responded to voice commands based on an analysis of what it sees through a phone camera or when using a pair of smart glasses.

It successfully identified sequences of code, suggested improvements to electrical circuit diagrams, recognised the King’s Cross area of London through the camera lens, and reminded the user where they had left their glasses.

Google plans to start adding Astra’s capabilities to its Gemini app and across its products this year, Pichai said. However, he said that while the ultimate “goal is to make Astra seamlessly available” across the company’s software, it would be rolled out cautiously and “the path to productisation will be quality driven”.

“Getting response time down to something conversational is a difficult engineering challenge,” said Sir Demis Hassabis, head of its AI research arm DeepMind. “It is amazing to see how far AI has come, especially when it comes to spatial understanding, video processing and memory.”

At the conference, Google also set out big changes to its core search engine. From this week, all US users will see an “AI Overview” — a brief AI-generated summary answer to the query — at the top of many common search results, followed by clickable links interspersed with advertisements lower down.

The company said the search system would be able to answer complex questions with multi-step reasoning — meaning the AI agent can make several independent decisions in order to complete a task — and help customers generate search queries using voice and video.
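Google did not disclose how the feature is built, but multi-step reasoning of this kind is commonly implemented as a plan-then-execute loop around a language model. The Python sketch below is illustrative only: call_model is a hypothetical stand-in for any LLM API, not Google’s system.

def call_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to an LLM endpoint."""
    raise NotImplementedError("wire up a real model API here")

def answer_complex_query(query: str) -> str:
    # Plan: ask the model to break the query into numbered sub-steps.
    plan = call_model(f"Break this question into numbered sub-steps: {query}")
    steps = [line for line in plan.splitlines() if line.strip()]

    # Execute: resolve each sub-step, carrying earlier findings forward.
    findings: list[str] = []
    for step in steps:
        context = "\n".join(findings)
        findings.append(call_model(f"Context:\n{context}\n\nResolve: {step}"))

    # Synthesise one answer from the intermediate results.
    return call_model(f"Question: {query}\nFindings:\n" + "\n".join(findings))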

Liz Reid, head of Google search, said the aim was to “remove some of the legwork in search” and that AI Overviews would be expanded to users in other parts of the world later this year.

The changes come as OpenAI threatens Google’s search business.

The San Francisco-based start-up’s ChatGPT chatbot provides quick and complete answers to many questions, threatening to render obsolete search results that provide a traditional list of links alongside advertising. OpenAI has also signed deals with media organisations to include up-to-date information to improve its responses.

On Monday — in a move seen as an attempt to upstage Google’s announcements — OpenAI demonstrated a faster and cheaper version of the model that powers ChatGPT, which can similarly interpret voice, video, images and code in a single interface.

Google also revealed new or improved AI products including Veo, which generates video from text prompts; Imagen 3, which creates pictures; and Lyria, a model for AI music generation. Subscribers to Gemini Advanced will be able to create personalised chatbots called “Gems” to help with specific tasks.

The company’s flagship Gemini 1.5 Pro model has also been upgraded. It now has a much larger context window of 2mn tokens — the amount of data, such as code or images, that the model can draw on when generating a response — making it better at following nuanced instructions and referring back to earlier conversations.
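In practice, a larger context window means far more material can be passed to the model in a single request. The sketch below, which assumes Google’s google-generativeai Python package and uses an illustrative model name, shows how a developer might check that a large document fits before sending it; SDK details and account quotas may differ.

import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")  # assumed model name

with open("large_codebase_dump.txt") as f:
    source = f.read()  # e.g. an entire repository concatenated to text

# count_tokens reports how much of the context window the request uses.
print(model.count_tokens(source).total_tokens)

response = model.generate_content(
    [source, "Summarise the main modules and how they interact."]
)
print(response.text)

At 2mn tokens, roughly equivalent to 1.5mn English words, such a window can hold a large codebase or hours of transcribed audio in a single request.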

Copyright The Financial Times Limited 2024. All rights reserved.