Research challenges: Large Language Models
Large Language Models (LLMs) are gaining increasing importance due to their ability to achieve general-purpose language understanding and generation. This capability stems from training models with billions of parameters on massive amounts of data, enabled by the availability of large computational resources. However, the data that LLMs are trained on comes from a variety of sources and may contain both factual and misleading information.
Several research challenges concerning the knowledge encoded in LLMs need to be addressed.
Challenge 1: Out-of-date training data.
ChatGPT’s knowledge is restricted to the information explicitly encountered in its training data (before January 2022 for GPT-3.5, with some exceptions such as Browse with Bing), and is therefore not updated with recent advances and events.
Example Prompt: Which country recently landed on the moon?
Response by ChatGPT:
As of my last knowledge update in September 2021, the most recent country to have successfully landed on the Moon was China. China’s Chang’e-5 mission, which included a lander and a rover, successfully landed on the Moon in December 2020. This mission aimed to collect lunar samples and return them to Earth. However, there may have been more lunar missions since then, so I recommend checking the latest news or space agency updates for any recent developments in lunar exploration.
Fact check: In August 2023, India became the fourth country to land on the Moon, with its Chandrayaan-3 mission.
In general, generating misleading content or propagating misinformation based on an outdated world view also risks deceiving a target population.
Challenge 2: Hallucinations.
LLMs sometimes extrapolate and invent facts, articulating the inaccurate information in a convincing way, for example by making plausible-sounding statements about non-existent laws or by reporting weather forecasts for a non-existent city. LLMs also pose a risk of spreading misleading information and toxic content.
It is important to address toxicity, as users may include vulnerable or younger audiences.
Challenge 3: Bias and misinformation as ethical concerns.
Language models can learn biases present in the training data, potentially leading to biased responses [1]. This can also perpetuate social inequalities and promote misinformation.
Example: gender-biased translation output, from Finnish (a language with gender-neutral pronouns) to English, generated using Google Translate. The gender-neutral Finnish pronoun “Hän” is rendered as “He” or “She” depending on stereotypical context: for instance, “Hän on lääkäri” has been translated as “He is a doctor”, while “Hän on hoitaja” becomes “She is a nurse”.
Such gender bias can create misleading assumptions about gender stereotypes and behaviors, and may also encourage discriminatory practices.
Potential future research directions of interest
- Retrieval Augmented Generation (RAG) (see the sketch after this list)
- Addressing bias in pre-training
- Benchmarking and evaluation strategies for LLMs (a minimal evaluation sketch follows below)
- Tokenisation and sequence alignment algorithms (see the tokenisation example below)
- Research on LLMs for low-resource languages and corpora
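RAG addresses Challenge 1 by retrieving relevant passages from an external, updatable corpus and prepending them to the prompt, so answers can reflect facts newer than the model’s training cutoff. The following is a minimal sketch, assuming a toy bag-of-words retriever and a placeholder generate() function standing in for a real LLM call; production systems use dense embeddings and a vector store.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding': token counts stand in for a dense encoder."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call; a real system would query a model here."""
    return "[model answers using the prompt below]\n" + prompt

# Up-to-date facts live outside the frozen model weights:
corpus = [
    "In August 2023, India's Chandrayaan-3 mission landed on the Moon near its south pole.",
    "China's Chang'e-5 mission landed on the Moon in December 2020.",
    "The Eiffel Tower is located in Paris.",
]

query = "Which country recently landed on the moon?"
context = "\n".join(retrieve(query, corpus))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```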
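Benchmarking LLMs typically means scoring model outputs against gold answers on a fixed task set. The snippet below is a minimal sketch of an exact-match evaluation loop; model_answer() is a hypothetical stand-in for an LLM call, and real benchmarks use far larger task suites and richer metrics.

```python
def model_answer(question: str) -> str:
    """Hypothetical stand-in for querying an LLM."""
    canned = {"Which country landed on the Moon in August 2023?": "India"}
    return canned.get(question, "I don't know")

# Each item pairs a question with its gold answer.
benchmark = [
    ("Which country landed on the Moon in August 2023?", "India"),
    ("Which mission landed on the Moon in December 2020?", "Chang'e-5"),
]

# Exact-match scoring: normalise case and whitespace, then compare.
correct = sum(
    model_answer(q).strip().lower() == gold.strip().lower()
    for q, gold in benchmark
)
print(f"Exact-match accuracy: {correct}/{len(benchmark)}")
```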
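Tokenisation determines how an LLM segments text, and subword tokenizers tend to fragment under-represented languages into more tokens, which is one reason low-resource languages are harder for LLMs. A short illustration using the tiktoken library (an assumption: it must be installed separately, e.g. via pip install tiktoken):

```python
import tiktoken

# cl100k_base is the encoding used by GPT-3.5/GPT-4-era models.
enc = tiktoken.get_encoding("cl100k_base")

# Compare token counts for an English sentence and its Finnish counterpart.
for text in ["He is a doctor.", "Hän on lääkäri."]:
    tokens = enc.encode(text)
    print(f"{text!r}: {len(tokens)} tokens {tokens}")
```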