Discussion about this post

Emerald Fleur

Random thought for fellow AGI Friday readers: Those cheaply generated AI summaries at the top of Google Search results can't be good for the reputation of AI as a whole, no?

I find that Gemini 2.5 Pro and the Gemini thinking/chain-of-thought models so far very rarely hallucinate, while Gemini Flash and similar models hallucinate so often that I never trust Flash with any factual lookup, even though both models have access to Google Search.

I read in a Wired article about the history of Gemini that, while internal factions complained, surveys of end users found that the AI-generated summaries were overwhelmingly preferred.

"The senior director involved ordered up some testing, and ultimately user feedback won: 90 percent of people who weighed in gave the summaries a “thumbs up.”"

https://www.wired.com/story/google-openai-gemini-chatgpt-artificial-intelligence/

However, as a person who uses AI daily (never for creative work), I DO NOT TRUST the AI summaries at the top of the search results, even on the most common queries, which must be run millions of times a day. They have been so consistently wrong that I permanently distrust them and mentally skip over them every time I see them. If I want actual results, I tap the AI Mode button.

I can imagine the internal reasoning going something like this, Socratic-dialogue style:

A) Using the Flash model is cheaper, since we handle a huge volume of unique Google searches, and it lets us customize the summary for every single user.

B) Fair, but shouldn't we use a more computationally expensive model?

A) No, because users are clearly already content with our worse model.

B) The one that generates really inaccurate results? We have a model that doesn't hallucinate right here. Why don't we use that?

A) I can show you a business cost-benefit analysis right here showing that the improved results barely matter to the end user, for reasons x, y, and z.

Here's me chiming into the hypothetical:

C) Why don't we just generate a computationally expensive summary for the 1 million most common queries, or whatever threshold reaches the most users possible?

A) (My guess of what they'd say) We already do that with the Flash model, but watching the summary generate makes users feel more engaged with the AI than if we were to instantly display cached results. Additionally, using the Flash model lets us generate individualized results that better reflect the sources presented to each user, and lets us react to real-time events, like if the Pope were to suddenly pass away.

C) Why don't we just cache an expensive query once an hour for the most common search queries? Users aren't expecting personalization there (although that'd be really cool), because users aren't expecting to opt into AI personalization when they use Google Search (yet), and doing so might prompt a bigger backlash than we already have. We'd treat it like the data we already extract from Wikipedia articles.

A) Cost, benefit, analysis.

C) This small, expensive change could do a lot to convert users to Gemini. Who in their right mind would pay us for Gemini if Flash keeps hallucinating on every second result?

A) You're exaggerating.

C) I am frustrated! This is our chance to get AI in front of as many people as possible, and we could convert a lot of skeptics into believers by providing good-quality results. That's what they come to Google for: good-quality results. Frankly, even if the summaries are good enough for some people, long-term growth might be hampered if people start mentally skipping over them, even if your cost-benefit analysis and statistics show this is the wisest short-term decision!

I have very strong feelings about AI summaries. 😅

Markos

"the thing they’re testing is… just the smartphone app"

They are testing both the FSD part and the remote operations (monitoring the car, even remotely stopping/starting it).

"When you summon a car, it’s just a normal Tesla with a human driver using normal supervised FSD."

That's both a safety and a legal feature. They want to avoid reporting any issues to the NHTSA while testing. You can argue that it's bending the rules or whatever, but it says nothing about the technical feasibility of them starting an actual service in June.

As for what will happen in June: in the Q1 2025 earnings call, they spoke of "10-20 cars," and Musk specifically mentioned "June or July." So it will be a tiny start, and they can still claim "victory."
