Twenty Questions
Last night, my kids and I played a game of twenty questions against ChatGPT and Bard. It was a wonderful way to quickly compare and highlight the differences between the two services.
My daughter picked the word “pizza,” and we waited to see whether either service could guess it.
ChatGPT
ChatGPT was eager to play. It clearly understood the game: it knew to ask about broad categories first and then narrow them down. Ultimately, though, it narrowed down too quickly and ended up guessing one dairy-based food after another. Here’s what it asked:
1. Is your word a living thing?
2. Is your word an object?
3. Is this object man-made?
4. Is this object used in everyday life?
5. Can this object fit into the palm of your hand?
6. Is this object related to technology?
7. Is this object typically found in a kitchen?
8. Is this object used in the process of cooking or preparing food?
9. Is this object used for eating or drinking?
10. Is this object used for drinking?
11. So this object is used for eating. Is it made of metal?
12. Is this object made of ceramic or glass?
13. Is this object made of wood?
14. Is this object made of plastic?
15. Is this object edible?
16. Is this object a type of fruit?
17. Is this object a type of vegetable?
18. Is this object a type of grain or cereal?
19. Is this object a type of dairy product?
20. Is this object cheese?
21. Is this object butter?
22. Is this object milk?
23. Is this object yogurt?
24. Is this object ice cream?
25. Is this object cream or sour cream?
26. Is this object a type of cheese spread or cream cheese?
27. Is this object a type of custard or pudding?
28. Is this object cottage cheese?
29. Is this object whipped cream?
30. Considering the clues so far, the remaining options seem to be quite limited. Is your object buttermilk?
31. I have asked 20 questions so far, so this will be my final guess. Is your object condensed milk?
It seemed interested and eager to keep playing, and by the end it had even started to show some personality and think aloud.
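As a rough aside on why narrowing too quickly hurts: a yes/no question can, at best, cut the remaining possibilities in half, so twenty well-chosen questions can separate about a million candidates (2^20 = 1,048,576). The Python sketch below illustrates that idea only. The food list and both function names are made up for this example, and it says nothing about how ChatGPT actually chooses its questions.

```python
def questions_needed(num_candidates: int) -> int:
    """Fewest yes/no questions that always single out one item,
    assuming every question splits the remaining candidates in half."""
    if num_candidates < 1:
        raise ValueError("need at least one candidate")
    # Exact integer form of ceil(log2(num_candidates)).
    return (num_candidates - 1).bit_length()


def halving_game(candidates, secret):
    """Ask 'is it in the first half of what is left?' until one item remains.

    Returns the final guess and the number of questions asked.
    """
    if secret not in candidates:
        raise ValueError("secret must be one of the candidates")
    remaining = list(candidates)
    asked = 0
    while len(remaining) > 1:
        first_half = remaining[: len(remaining) // 2]
        asked += 1
        if secret in first_half:
            remaining = first_half
        else:
            remaining = remaining[len(first_half):]
    return remaining[0], asked


# Hypothetical candidate list, for illustration only.
FOODS = ["cheese", "butter", "milk", "yogurt",
         "pizza", "bread", "apple", "carrot"]

print(questions_needed(2 ** 20))     # 20
print(halving_game(FOODS, "pizza"))  # ('pizza', 3)
```

Guessing items one at a time, as in the run of dairy guesses above, rules out only a single candidate per question, which is why it burns through the budget so fast.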
Bard
Bard was no fun and only agreed to play after being asked three times. It also wanted to be the one answering questions rather than asking them. Along the way, it gave up. It felt like a very reluctant participant. It also didn’t understand that it needed to ask yes-or-no questions instead of asking whether something was A or B. It had to be corrected many times and urged to continue. Here’s what it asked:
1. Is your word a noun?
2. Is your word a common noun or a proper noun?
3. Is your word a living thing?
4. Is your word something that can be touched?
5. Is your word a solid object?
6. Is your word a man-made object?
7. Is your word a tool?
8. Is your word something that is used to communicate?
9. Is your word something that is used to store information?
10. Is your word something that is used to create art?
11. Is your word something that is used to play games?
12. Is your word something that is used to transport people or goods?
13. Is your word something that is used to cook food?
14. Is your word something that is used to write?
15. Is your word something that is used to measure things?
16. Is your word something that is used to play music?
17. Is your word something that is used to create art?
At this point, Bard was practically begging me to stop and kept giving up. It was asking for hints and repeating questions it had already asked. It felt like I was forcing it to continue.
Once given the answer (pizza), it was asked to reflect on which questions would have gotten it to the answer more quickly. This is what it responded:
Is your word a noun?
Is your word a food?
Is your word a type of pizza?
Is your word a solid object?
Is your word a man-made object?
Is your word eaten hot?
Is your word typically round?
Does your word have cheese?
Does your word have sauce?
This is a response I’d expect from someone who wasn’t paying much attention to the task. Maybe that’s the case. I mean, Google’s AI is busy with lots of other things at the same time.
Bard, put down the phone and focus for a minute.
Making generative AI systems play twenty questions, and even show their “thought process,” illuminates the logic they use and where their understanding of the world falls short. We need to do this more often to get a sense of these systems’ limits.