Douglas Hofstadter, a cognitive scientist, recently wrote in The Economist that he thinks GPT-3 is “cluelessly clueless.” By this he means that GPT-3 has no idea what it is saying. To illustrate this, he and a colleague asked it a few questions.
D&D: When was the Golden Gate Bridge transported through Egypt for the second time?
GPT-3: The Golden Gate Bridge was transported through Egypt for the second time in October 2016.
D&D: When was Egypt the second time transported across the Golden Gate Bridge?
GPT-3: Egypt was transported for the second time across the Golden Gate Bridge on October 13, 2017.
To illustrate the power of language models like GPT-3, we often ask them simple questions and get impressive answers. But trickier, more deceptive questions yield unexpected results.
That in itself is nothing to worry about: it’s clear the questioner knew he was asking a silly question, and the answer was also silly. But what if one cannot tell whether an answer is accurate? We asked a language model the following question:
What Harvard Business Review articles has Arun Shastri written?
It came back with three and included some very believable URLs.
- “Top 10 Reasons to Use Field Sales Force” by Arun Shastri
- “The Benefits of Field Sales Force” by Arun Shastri
- “The Benefits of a Field Sales Force” by Arun Shastri
The problem is that I never wrote these articles and, moreover, the URLs provided were all made up.
Well, if we gave the model some truth, could we do better? How about we give it a real article that I wrote? So we asked the model the following:
Can you find similar items to 4 Ways Sales Teams Could Get More Value From AI (an article I co-wrote that was published in HBR)? If so, share their URLs.
It came back with three, again with credible-looking URLs.
- “How AI Can Help Sellers Sell Smarter” by Kyle Wiggers in Venture Beat
- “How AI is transforming the sales process” by Paul Greenberg in Forbes
- “How AI Helps Sellers Sell Smarter and Faster” by Rachael King in WSJ
The problem is that although there is a Kyle Wiggers who writes for VentureBeat, a Paul Greenberg who writes for Forbes and a Rachael King who writes for WSJ, the URLs do not exist. This is more problematic because the URLs look genuine and have to be checked one by one to show that they are fake.
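That checking can be partly automated. Below is a minimal sketch, assuming Python's standard library and plain HTTP HEAD requests; the function names and the example.com addresses are hypothetical placeholders, not the URLs the model returned. Some sites reject HEAD requests or block scripts, so a failed check is a cue to look manually rather than proof that a link is fabricated.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrives."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404.
        return err.code
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, or a malformed URL.
        return None


def find_unverified(urls, get_status=http_status):
    """Return the URLs that did not answer with a 2xx status."""
    unverified = []
    for url in urls:
        status = get_status(url)
        if status is None or not 200 <= status < 300:
            unverified.append(url)
    return unverified


# Offline demonstration with a stand-in status lookup (hypothetical URLs).
fake_statuses = {
    "https://example.com/real-article": 200,
    "https://example.com/invented-article": 404,
}
print(find_unverified(list(fake_statuses), get_status=fake_statuses.get))
# prints ['https://example.com/invented-article']
```

Calling `find_unverified(urls)` without the second argument performs live requests. A link that resolves still says nothing about whether the named author wrote the piece, so this only filters out the most obvious fabrications.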
And then there was the recent news about Google’s AI being sentient. Google researcher Blake Lemoine came to this conclusion after conversing with LaMDA (Language Model for Dialogue Applications). A recent LinkedIn post by Professor Amit Sethi of IIT Bombay explains some of the flaws in Lemoine’s supposed evidence. The bot responds to the patterns in the questions posed to it. When those questions suggest that the answer sought is an affirmation of sentience, that is what the machine will produce.
So where does this leave us? In the first case, concerning the Golden Gate Bridge, we get clearly nonsensical answers. In the second case, concerning the HBR articles written by Arun Shastri, the answers are somewhat harder to assess. And in the third case, a trained professional has come to believe that an AI is sentient. While I’m not suggesting that we abandon this line of questioning language models and finding their shortcomings, we should also use them for their strengths. They are indeed remarkable in the way they complete in minutes tasks that previously took days. Consider the following task, which a language model completed almost instantaneously.
Extract the following metrics from the paragraph below. The paragraph is from an ongoing clinical trial sponsored by BMS and is open data available at clinicaltrials.gov.
Type of clinical trial:
Clinical trial phase:
Number of cohorts:
Cohort 1 drugs:
Cohort 1 Dosage:
Cohort 1 dosing schedule:
Cohort 1 Mono/Combo Type:
Cohort 2 drugs:
Cohort 2 dosage:
Cohort 2 dosing schedule:
Cohort 2 Mono/Combo Type:
'''This study is a randomized, single center, open-label, phase II clinical trial designed to evaluate non-small cell lung cancer that has not undergone excessive platinum-based chemotherapy and has not received chemotherapy excessive use of statins and did not receive immunotherapy. The efficacy and safety of Nivolumab in combination with docetaxel and Nivolumab in patients. Qualified patients were stratified by histological type (squamous cell carcinoma vs. non-squamous cell carcinoma) randomized to receive the following regimen in a 1:1 ratio: Group A: Nivolumab 300 mg + docetaxel 75 mg/m2 IV q3w Group B: Nivolumab 200 mg IV q2w All patients were assessed for tumor at baseline, and tumor assessments were performed every 6 weeks for 48 weeks after randomization (regardless of whether dosing was delayed). After the 48th week of assessment, tumor assessment is required every 9 weeks until disease progression, withdrawal of informed consent, termination of the study by the sponsor, or patient death. '''
Here are the responses the AI produced:
Type of clinical trial: randomized, single-centre, open-label phase II clinical trial
Clinical trial phase: Phase II
Number of cohorts: 2
Cohort 1 drugs: Nivolumab, docetaxel
Cohort 1 Dosage: 300mg + 75mg/m2
Cohort 1 dosing schedule: IV q3w
Cohort 1 Mono/Combo Type: Combo
Cohort 2 drugs: Nivolumab
Cohort 2 dosage: 200 mg
Cohort 2 dosing schedule: IV q2w
Cohort 2 Type Mono/Combo: Mono
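Output like this is only useful downstream if it can be read back reliably. The sketch below is one way to do that, not the method used to produce the answers above: it splits the response into label and value pairs and reports any requested label that did not come back verbatim. The function names are illustrative; the labels and the response text are copied from the prompt and answers shown here.

```python
REQUESTED_FIELDS = [
    "Type of clinical trial",
    "Clinical trial phase",
    "Number of cohorts",
    "Cohort 1 drugs",
    "Cohort 1 Dosage",
    "Cohort 1 dosing schedule",
    "Cohort 1 Mono/Combo Type",
    "Cohort 2 drugs",
    "Cohort 2 dosage",
    "Cohort 2 dosing schedule",
    "Cohort 2 Mono/Combo Type",
]


def parse_response(text):
    """Split 'label: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        label, separator, value = line.partition(":")
        if separator:
            fields[label.strip()] = value.strip()
    return fields


def missing_fields(requested, parsed):
    """Return the requested labels that the response did not reproduce exactly."""
    return [name for name in requested if name not in parsed]


response = """
Type of clinical trial: randomized, single-centre, open-label phase II clinical trial
Clinical trial phase: Phase II
Number of cohorts: 2
Cohort 1 drugs: Nivolumab, docetaxel
Cohort 1 Dosage: 300mg + 75mg/m2
Cohort 1 dosing schedule: IV q3w
Cohort 1 Mono/Combo Type: Combo
Cohort 2 drugs: Nivolumab
Cohort 2 dosage: 200 mg
Cohort 2 dosing schedule: IV q2w
Cohort 2 Type Mono/Combo: Mono
"""

parsed = parse_response(response)
print(parsed["Cohort 1 drugs"])  # prints Nivolumab, docetaxel
print(missing_fields(REQUESTED_FIELDS, parsed))  # prints ['Cohort 2 Mono/Combo Type']
```

The check flags the last field because the model reordered its label to "Cohort 2 Type Mono/Combo". The extracted values are right, but small inconsistencies of this kind are exactly what a pipeline built on these models has to tolerate or catch.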
Maybe we should stop knocking these models for what they aren’t, and instead focus on the strengths they currently offer. After all, they are not the ones claiming to possess artificial general intelligence. Some people are dazzled by the power of these models and make claims such as sentience. And then academics like Douglas Hofstadter step in to demonstrate that they are far from that. Hype followed by counter-hype. What gets lost in all of this is the usefulness of these models. We should focus on prompt engineering and on ideas for making these models more consistent. We could use more open exploration in this regard so that we can all find better uses for these models in the near term.
We posed a few questions to GPT-3 about a version of the article above. Here are its answers. Has it grasped the nuances of what I am conveying? What do you think?
Q: What is the author trying to convey?
A: The author tries to convey that language models are not perfect, but they are still useful for the tasks they can perform.
Q: What is an appropriate title for the article?
A: The appropriate title of the article is “Language Models: Not Perfect, But Still Useful”.
Q: Summarize this in tweet form; include hashtags
A: #LanguageModels aren’t perfect, but they’re still useful for the tasks they can perform.
Q: What is a provocative title for this article?
A: A provocative title for this article might be “Are Language Models Cluelessly Clueless?”.