Is Google's Chatbot BARD Failing in Public Testing?

  • Thread starter kyphysics
  • Tags
    Google
In summary, Google's chatbot BARD, a rival to OpenAI's ChatGPT (which Microsoft backs), has been struggling in public testing. It gave an incorrect answer to an astronomy-related question at its debut and, in Fortune's recent tests, got 50% to 75% of practice SAT math questions wrong, even when multiple-choice answers were provided. It also struggled with written language tests and often needed questions to be asked twice before it understood them, though it performed better on reading tests and improved the more language-based questions it was asked. Posters in the thread add that chatbots give similarly nonsensical answers about niche subjects, and that even though developers deliberately scrub training datasets of authorship and personal information to some extent, such failures show why these models are unreliable and untrustworthy in their current state.
  • #1
kyphysics
After an inauspicious debut by Google last month (Feb. 8th), where BARD (Alphabet's rival to OpenAI's ChatGPT, which Microsoft backs) gave an incorrect answer to an "astronomy-related" question, BARD again seems to have flopped in public testing over the past week:
https://fortune.com/2023/03/28/google-chatbot-bard-would-fail-sats-exam/

Fortune sourced practice SAT math questions from online learning resources and found that Bard got anywhere from 50% to 75% of them wrong—even when multiple-choice answers were provided.

Often Bard gave answers which were not even a multiple-choice option, though it sometimes got them correct when asked the same question again. . .

Bard’s first written language test with Fortune came back with around 30% correct answers, often needing to be asked the questions twice for the A.I. to understand.

Even when it was wrong, Bard's tone was confident, frequently framing responses as "The correct answer is," which is a common feature of large language models.

The more Bard was asked language-based questions by Fortune—around 45 in total—the less frequently it struggled to understand or needed the question to be repeated.

On reading tests, Bard similarly performed better than it did in math—getting around half the answers correct on average.
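
For anyone curious how a test like Fortune's could be scripted rather than run by hand, below is a minimal sketch of a multiple-choice scoring harness in Python. The ask_chatbot function, the question format, and the letter-extraction rule are assumptions made for illustration, not anything Fortune or Google has described.

```python
# Minimal sketch of a multiple-choice evaluation harness for a chatbot.
# Assumption (not from the article): ask_chatbot(prompt) -> str is some
# wrapper around whichever chatbot is being tested. Questions are assumed
# to be dicts with "question", "choices" (letter -> text), and "answer".
import re

def format_prompt(item):
    """Build a prompt that lists the choices and asks for a single letter."""
    lines = [item["question"], ""]
    for letter, text in sorted(item["choices"].items()):
        lines.append(f"{letter}) {text}")
    lines.append("")
    lines.append("Answer with a single letter (A, B, C, or D).")
    return "\n".join(lines)

def extract_letter(reply):
    """Pull the first standalone A-D letter out of the reply, if there is one."""
    match = re.search(r"\b([A-D])\b", reply.upper())
    return match.group(1) if match else None

def score(questions, ask_chatbot):
    """Return (correct, off_menu, total); off_menu counts replies that were
    not one of the offered choices, which Fortune reported sometimes happened."""
    correct = off_menu = 0
    for item in questions:
        reply = ask_chatbot(format_prompt(item))
        letter = extract_letter(reply)
        if letter is None or letter not in item["choices"]:
            off_menu += 1
        elif letter == item["answer"]:
            correct += 1
    return correct, off_menu, len(questions)
```

Even a crude harness like this makes both failure modes from the article visible: answers that are simply wrong, and answers that are not among the offered choices at all.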
 
  • #2
No idea about math problems, but in my experience the best way to make an AI hallucinate is to ask a question about some niche subject that is discussed only in a few obscure sources. As you probably don't know, I am the author of the first commercial Polish video game, Puszka Pandory (Pandora's Box), for the ZX Spectrum. That was in 1986, so the sources are scarce, but they do exist. We were playing with ChatGPT last week and for fun asked about the game. Before we got bored, ChatGPT listed at least four different authors, each time starting with "I am sorry, you are right, I was wrong, the correct answer is XXXX". It never named me as the author :biggrin:

That was in Polish; I suppose if you ask about the details of something like FidoNet technology or BBS software, it will give similarly nonsensical answers.
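
Borek's experiment can be turned into a rough self-consistency probe: ask the same niche factual question several times and count how many distinct answers come back. This is only a sketch, with ask_chatbot again assumed as a stand-in for whatever model is being tested; disagreement across runs is a hint of hallucination, not proof.

```python
# Rough self-consistency probe: ask the same niche question repeatedly and
# tally the distinct answers. A model that keeps changing its answer to a
# simple factual question is very likely guessing.
# Assumption: ask_chatbot(prompt) -> str wraps the model under test.
from collections import Counter

def consistency_probe(question, ask_chatbot, runs=5):
    """Ask `question` `runs` times and return a tally of normalised answers."""
    answers = Counter()
    for _ in range(runs):
        reply = ask_chatbot(question)
        answers[reply.strip().lower()] += 1
    return answers

# Example, using Borek's question (hypothetical usage):
# consistency_probe("Who wrote the 1986 ZX Spectrum game Puszka Pandory?", ask_chatbot)
# A well-grounded model should converge on one answer; a hallucinating one
# will cycle through several different "authors".
```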
 
  • #3
Borek said:
No idea about math problems, but in my experience the best way to make an AI hallucinate is to ask a question about some niche subject that is discussed only in a few obscure sources. As you probably don't know, I am the author of the first commercial Polish video game, Puszka Pandory (Pandora's Box), for the ZX Spectrum. That was in 1986, so the sources are scarce, but they do exist. We were playing with ChatGPT last week and for fun asked about the game. Before we got bored, ChatGPT listed at least four different authors, each time starting with "I am sorry, you are right, I was wrong, the correct answer is XXXX". It never named me as the author :biggrin:

That was in Polish; I suppose if you ask about the details of something like FidoNet technology or BBS software, it will give similarly nonsensical answers.

It may not be a good test for authorship errors, because the developers deliberately scrub the training datasets of authorship and personal information, to an extent. They want to avoid legal issues pertaining to privacy, copyright/attribution, defamation, or whatever else.
 
  • #4
Jarvis323 said:
It may not be a good test for authorship errors, because the developers deliberately scrub the training datasets of authorship and personal information, to an extent. They want to avoid legal issues pertaining to privacy, copyright/attribution, defamation, or whatever else.
It is a perfectly good test to show why GPT, in its current state, is unreliable and can't be trusted.
 

1. What is BARD?

BARD is a chatbot developed by Google, built on its LaMDA (Language Model for Dialogue Applications) large language model. It uses artificial intelligence and natural language processing to interact with users in a conversational manner.

2. What is the purpose of BARD?

The purpose of BARD is to provide a more human-like and efficient way for users to interact with Google's products and services. It is designed to understand and respond to natural language queries and tasks.

3. What is the current status of BARD's public testing?

As of now, BARD is in the early stages of public testing and is not yet broadly available: access is limited to users who join a waitlist in a small number of countries, so Google can gather feedback and improve its performance.

4. What are some potential challenges that BARD may face during public testing?

Some potential challenges that BARD may face during public testing include understanding complex or ambiguous queries, providing accurate and relevant responses, and handling sensitive or offensive language from users.

5. How is Google addressing any issues or concerns with BARD's public testing?

Google is constantly monitoring and analyzing the feedback from BARD's public testing to identify and address any issues or concerns. They are also continuously improving the chatbot's algorithms and training it with more data to enhance its performance.
