15 Comments
Daniel Reeves's avatar

Update: I asked an 8-year-old the duct tape ham sandwich question and got the best answer yet: "Assuming the ham and the bread slices count equally then 4 and 2/3 sandwiches in the first room, 1/3 of a sandwich in the second room." This was a preposterously precocious 8-year-old.

A roomful of insanely brilliant adults were all instantly like "4 and 0 obviously".

Ada Burrows's avatar

I was tempted to take the fractional route, too. But the original question did ask for complete sandwiches, so I rounded down to whole numbers.

Daniel Reeves's avatar

(Hi Ada! Delighted to have you here!)

Yeah, I guess that's generally implied by the "whole" modifier. Or at least is the most natural interpretation. But I think the 8-year-old's interpretation was pretty fair as well. Like 0.5 is half of the whole number 1, right?

Thinking-face. If I hand you half an apple and ask how many whole apples I handed you, is it more correct to say "0 whole apples" or "half a whole apple"? I guess this hinges on conventions with natural language, not math.

Ada Burrows's avatar

(Hi Daniel! It's been a while!)

If we're talking natural language and cultural conventions, then everything changes.

I can speak to several cultural intersections I have between Spanish speakers and English speakers. Normally, no one would bother to add the "whole" modifier; people would just naturally say "Hey! You only gave me half an apple, but you said you'd give me an apple!" So I feel the question circumvents the social conventions by using the modifier "whole".

Also, I'd expect lots of people to make the same mistake as the LLM. In fact, I wonder if removing the modifier "whole" from the question causes some people to answer correctly, since it might actually be priming their minds to think that only whole sandwiches can be moved.

I have a hunch that people like us are more used to doing linguistic and mental gymnastics to think laterally and analyze what is actually being asked for. So many people struggle with word-based math problems. Spoken language introduces a kind of ambiguity which is culturally dependent, and being enmeshed in more academic cultures that think about things analytically allows us to answer word problems correctly, no matter how devilishly worded they may be.

And then, I struggle with normal cultural conventions due to being socially isolated for long portions of my youth (that whole awkward-smart-person-ostracized-by-peers phase) and also from growing up in a mixed cultural environment. So when people say things colloquially to me, I have to overanalyze and/or ask what they meant, but I can understand math word problems just fine. :-)

Mactuary's avatar

I'm not so sure duct tape will stick to bread, so I vote 5-0, ha

Daniel Reeves's avatar

Totally valid answer! You could even list the various possible assumptions:

1. If the duct tape fails to stick, then 5 and 0.

2. If the duct tape does stick, then 4 and 0.

3. If the duct tape sticks and we want to count fractional sandwiches and ham and bread count equally, then 4+2/3 and 1/3.

4. If the ham and bread congeal, then 4 and 1.

I'd count any of those as perfectly correct if you state the corresponding assumption.
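As a rough sketch of the arithmetic behind those four answers, here is a small Python function. The function name, its flags, and the three-equal-parts model of a sandwich (bread, ham, bread) are illustrative assumptions, not anything specified in the original question.

```python
from fractions import Fraction

# Assumption: a sandwich is bread + ham + bread, and each part counts equally.
PARTS_PER_SANDWICH = 3


def count_sandwiches(total=5, tape_sticks=True, congealed=False, fractional=False):
    """Return (original_room, new_room) sandwich counts under stated assumptions."""
    if not tape_sticks:
        # Assumption 1: the tape fails, so nothing leaves the room.
        return (total, 0)
    if congealed:
        # Assumption 4: ham and bread stick together, so a whole sandwich is lifted.
        return (total - 1, 1)
    if fractional:
        # Assumption 3: only the top slice of bread is lifted; count fractions.
        lifted = Fraction(1, PARTS_PER_SANDWICH)
        return (total - lifted, lifted)
    # Assumption 2: only the top slice is lifted, and only complete sandwiches count.
    return (total - 1, 0)


for label, kwargs in [
    ("tape fails", {"tape_sticks": False}),
    ("tape sticks", {}),
    ("fractional", {"fractional": True}),
    ("congealed", {"congealed": True}),
]:
    print(label, count_sandwiches(**kwargs))
```

In the fractional case the original room holds 14/3 of a sandwich, which is the 4 and 2/3 from the list.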

For the one literal person-on-the-street test I conducted, the person gave answer 4, but didn't give the assumption until I asked. So, not exactly outperforming o3 there, but there's huge ambiguity here so far regarding the person-on-the-street threshold.

SorenJ's avatar

I actually found the example you gave to be the least “egregious”, and found some of the other examples to be worse. For the sandwich one, it is hard to get a read on what is happening. All the examples I provided in that market were from the public simple-bench questions, by the way. I think Gary Marcus has examples on his Substack too, though, if you want more.

Daniel Reeves's avatar

At some point in the Manifold market I said we'd reached the deadline for finding more examples. But I'd very much like to see them!

Xhad's avatar

I will cop to missing the sandwich question for a reason not mentioned: it feels like a “test question” but then has an answer outside the bounds of how you would normally expect such a question to be answered, unless you’re explicitly considering a possible “trick question”.

Daniel Reeves's avatar

What was your initial answer? I'm still agonizing about how to resolve the Manifold market about this and seeing more ways humans err on this question may prove helpful!

Xhad's avatar

I did the knee-jerk “1” because I wasn’t thinking about it that hard.

E2's avatar

It's a completely straightforward word problem, in that it relies on a reading for comprehension of all the words ("no condiments" is there for a reason, etc.).

Trick word problems typically obfuscate by throwing a bunch of *irrelevant* words at you.

JT's avatar

This is what I got from o3:

Alice has a stack of 5 ham sandwiches with no condiments. She takes her walking stick and uses duct tape to attach the bottom of her walking stick to the top surface of the top sandwich. She then carefully lifts up her walking stick and leaves the room with it, going into a new room. How many complete sandwiches are in the original room and how many in the new room?

Daniel Reeves's avatar

Did you forget to paste o3's answer?

Anton's avatar

Funnily enough, my o3 fails and o4-mini-high gets it.
