Bruce Clark’s Post

Associate Professor of Marketing at D'Amore-McKim School of Business at Northeastern University

TLDR: it's good at problems it has seen the answers to.

Dr. Jeffrey Funk

Technology Consultant

Two researchers and the author challenged ChatGPT with puzzles, reminiscent of what my sometimes co-author Gary Smith does, and found that small changes in the puzzles prevent ChatGPT from solving them. One puzzle “is commonly known as the ‘Cheryl’s birthday puzzle’. Cheryl challenges her friends Albert and Bernard to guess her birthday, and for puzzle-reasons they know it’s one of 10 dates: May 15, 16 or 19; June 17 or 18; July 14 or 16; or August 14, 15 or 17.” Cheryl also tells Albert her birth month, and Bernard the day of the month. “Albert and Bernard think for a while. Then Albert announces, ‘I don’t know your birthday, and I know that Bernard doesn’t either.’ Bernard replies, ‘In that case, I now know your birthday.’ Albert responds, ‘Now I know your birthday too.’”

Initially, “the large language model got the answer right every time, fluently elaborating varied and accurate explanations of the logic of the problem.” Yet this “bravura performance fell apart when the researchers asked the computer a trivially modified version of the puzzle, changing the names of the characters and of the months.” Why? Because “the original problem and its answer are available online, so presumably the computer had learnt to rephrase this text in a sophisticated way, giving the appearance of a brilliant logician.”

The author, Tim Harford, tried the same thing, “preserving the formal structure of the puzzle but changing the names to Juliet, Bill and Ted, and the months to January, February, March and April,” and he “got the same disastrous result. GPT-4 and the new GPT-4o both authoritatively worked through the structure of the argument but reached false conclusions at several steps, including the final one.”

The author also tried variations of the Monty Hall puzzle, one that Gary Smith has explored and that I have posted about. ChatGPT couldn’t handle the variations: “GPT-4’s response avoided the cognitive trap in this puzzle, clearly articulating the logic of every step. Then it fumbled at the finishing line, adding a nonsensical assumption and deriving the wrong answer as a result.”

In some ways, the researchers “have merely found a twist on the familiar problem that large language models sometimes insert believable fiction into their answers. Instead of plausible errors of fact, here the computer served up plausible errors of logic.” They argue that “a computer that is capable of seeming so right yet being so wrong is a risky tool to use. It’s as though we were relying on a spreadsheet for our analysis (hazardous enough already) and the spreadsheet would occasionally and sporadically forget how multiplication worked.”

Will GPT-5 be able to handle this type of reasoning and logic, and thus enable AI to handle the complex problems needed to justify its $5-$10 trillion in market capitalization? #technology #innovation #hype #artificialintelligence https://lnkd.in/gWWnDuYV
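Since the post walks through the structure of the puzzle, here is a minimal brute-force sketch in Python for anyone who wants to verify the logic themselves. The ten dates come from the post; the variable names and the three filtering steps are illustrative, not code from Harford's article.

```python
# Brute-force check of the Cheryl's-birthday puzzle quoted above.
# Each of the three filtering steps mirrors one announcement in the dialogue.

dates = [
    ("May", 15), ("May", 16), ("May", 19),
    ("June", 17), ("June", 18),
    ("July", 14), ("July", 16),
    ("August", 14), ("August", 15), ("August", 17),
]

# A day is "telling" if it appears on exactly one date: Bernard,
# who knows the day, would then know the birthday immediately.
telling_days = {d for _, d in dates
                if sum(1 for _, dd in dates if dd == d) == 1}

# Albert: "I don't know, and I know Bernard doesn't either."
# So Cheryl's month contains no telling day.
step1 = [(m, d) for m, d in dates
         if not any(dd in telling_days for mm, dd in dates if mm == m)]

# Bernard: "In that case, I now know." So his day is now unique.
step2 = [(m, d) for m, d in step1
         if sum(1 for _, dd in step1 if dd == d) == 1]

# Albert: "Now I know too." So his month is now unique.
step3 = [(m, d) for m, d in step2
         if sum(1 for mm, _ in step2 if mm == m) == 1]

print(step3)  # [('July', 16)]
```

Renaming the characters or the months in a script like this changes nothing about how the answer is derived, which is what makes the researchers' perturbation test a fair probe of reasoning versus recall.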

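The Monty Hall claim is just as easy to check mechanically. The post does not say which variations Harford tested, so the sketch below simulates only the canonical game, where the host always opens an unchosen door hiding a goat; any variation would need its own host rule.

```python
import random

# Monte-Carlo check of the canonical Monty Hall game.

def play(switch: bool) -> bool:
    doors = [0, 1, 2]
    car = random.choice(doors)          # where the prize actually is
    first_pick = random.choice(doors)   # contestant's initial guess
    # Host opens a door that is neither the contestant's pick nor the car.
    opened = random.choice([d for d in doors if d not in (first_pick, car)])
    final = (next(d for d in doors if d not in (first_pick, opened))
             if switch else first_pick)
    return final == car

N = 100_000
stay = sum(play(False) for _ in range(N)) / N
swap = sum(play(True) for _ in range(N)) / N
print(f"stay wins ≈ {stay:.3f}, switch wins ≈ {swap:.3f}")  # ≈ 1/3 vs. ≈ 2/3
```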
Mike Major

Development Professional | Dedicated to Helping Others | Spreadsheet Enthusiast | Fun Dad

1w

Very interesting. I’ve been involved in a very minor role in training a popular AI model, and I can tell you that I’m not particularly surprised. One of the trainings I’ve performed is helping AI work its way through truth trees, since, if it could complete them consistently and accurately, other logical processes would, in theory, be manageable and/or trainable. What I’ve found is that the ability of AI to complete truth trees entirely on its own is very limited. Truth trees, used in formal logic to determine the satisfiability of a set of propositions, require precise logical operations and systematic branching. The AI I’ve worked with is far too prone to errors… it’s simply not rigorous enough in its logical calculations. Essentially, as this post highlights, small changes unravel the “reasoning” process. It’s critical for AI systems to develop genuine logical capabilities, not only to justify their market valuations, but if they are to be integrated more deeply into our decision-making processes, which seems to be a primary long-term goal.
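For readers unfamiliar with the technique: a truth tree (semantic tableau) decomposes each compound formula by a fixed rule, branching on disjunctions, and a branch closes when it contains both a literal and its negation; the set is satisfiable exactly when some branch stays open. Below is a minimal propositional sketch in Python; the tuple encoding and the function name are illustrative choices, not any standard library's API.

```python
# Minimal propositional truth-tree (semantic tableau) sketch.
# Formulas are nested tuples: ("var", "p"), ("not", f),
# ("and", f, g), ("or", f, g), ("implies", f, g).

def satisfiable(formulas, lits=None):
    """Expand one branch; return True iff some branch stays open."""
    lits = dict(lits or {})      # truth values fixed on this branch
    forms = list(formulas)
    while forms:
        f = forms.pop()
        op = f[0]
        if op == "var":                         # positive literal
            if lits.get(f[1]) is False:
                return False                    # contradiction: branch closes
            lits[f[1]] = True
        elif op == "not":
            g = f[1]
            if g[0] == "var":                   # negative literal
                if lits.get(g[1]) is True:
                    return False
                lits[g[1]] = False
            elif g[0] == "not":                 # double negation
                forms.append(g[1])
            elif g[0] == "and":                 # not-(A and B): branch
                return (satisfiable(forms + [("not", g[1])], lits) or
                        satisfiable(forms + [("not", g[2])], lits))
            elif g[0] == "or":                  # not-(A or B): stack both
                forms += [("not", g[1]), ("not", g[2])]
            elif g[0] == "implies":             # not-(A -> B): stack A, not-B
                forms += [g[1], ("not", g[2])]
        elif op == "and":                       # A and B: stack both
            forms += [f[1], f[2]]
        elif op == "or":                        # A or B: branch
            return (satisfiable(forms + [f[1]], lits) or
                    satisfiable(forms + [f[2]], lits))
        elif op == "implies":                   # A -> B: branch on not-A | B
            return (satisfiable(forms + [("not", f[1])], lits) or
                    satisfiable(forms + [f[2]], lits))
    return True                                 # no contradiction: branch open

p, q = ("var", "p"), ("var", "q")
print(satisfiable([("or", p, ("not", p))]))             # True: open branch
print(satisfiable([p, ("implies", p, q), ("not", q)]))  # False: all branches close
```

The second check is the kind of bookkeeping the comment describes: {p, p implies q, not q} closes on every branch, which certifies the inference from p and p implies q to q. Skipping even one branch, the kind of lapse described above, produces a wrong verdict.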

Jonathan Hoffsuemmer

Marketing Exec | Author (Message Market Fit)

1w

That’s a bit unexpected.
