Unpacking the bias of large language models

By Tarun Khanna | June 19, 2025 | Machine Learning | Reading Time: 5 mins read

Photo Credit: https://news.mit.edu/


In new research, MIT researchers identify the root cause of a type of bias in LLMs, paving the way for more accurate and reliable AI systems.

Research has shown that large language models (LLMs) tend to overemphasize information at the beginning and end of a document or conversation, while neglecting the middle.

This “position bias” means that, if a lawyer is using an LLM-powered virtual assistant to retrieve a certain phrase in a 30-page affidavit, the LLM is much more likely to find the right text if it is on the initial or final pages.


MIT researchers have discovered the mechanism behind this phenomenon.

They created a theoretical framework to study how information flows through the machine-learning architecture that forms the backbone of LLMs. They found that certain design choices which control how the model processes input data can cause position bias.

Their experiments revealed that model architectures, particularly those affecting how information is spread across the input words within the model, can give rise to or intensify position bias, and that training data also contribute to the problem.

In addition to identifying the origins of position bias, their framework can be used to diagnose and correct it in future model designs.

This could lead to more reliable chatbots that stay on topic during long conversations, medical AI systems that reason more fairly when handling a trove of patient data, and code assistants that pay closer attention to all parts of a program.

“These models are black boxes, so as an LLM user, you probably don’t know that position bias can cause your model to be inconsistent. You just feed it your documents in whatever order you want and expect it to work. But by understanding the underlying mechanism of these black-box models better, we can improve them by addressing these limitations,” says Xinyi Wu, a graduate student in the MIT Institute for Data, Systems, and Society (IDSS) and the Laboratory for Information and Decision Systems (LIDS), and first author of a paper on this research.

Her co-authors include Yifei Wang, an MIT postdoc; and senior authors Stefanie Jegelka, an associate professor of electrical engineering and computer science (EECS) and a member of IDSS and the Computer Science and Artificial Intelligence Laboratory (CSAIL); and Ali Jadbabaie, professor and head of the Department of Civil and Environmental Engineering, a core faculty member of IDSS, and a principal investigator in LIDS. The research will be presented at the International Conference on Machine Learning.

Analyzing attention

LLMs like Claude, Llama, and GPT-4 are powered by a type of neural network architecture known as a transformer. Transformers are designed to process sequential data, encoding a sentence into chunks called tokens and then learning the relationships between tokens to predict what word comes next.

These models have gotten very good at this because of the attention mechanism, which uses interconnected layers of data-processing nodes to make sense of context by allowing tokens to selectively focus on, or attend to, related tokens.

But if every token can attend to every other token in a 30-page document, that quickly becomes computationally intractable.
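
To make the idea concrete, here is a minimal sketch of the scaled dot-product attention at the core of a transformer layer, written in plain NumPy as an illustration rather than the researchers’ code. The n-by-n score matrix is what makes attending over every pair of tokens quadratic in sequence length.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Plain full attention: every token attends to every other token.

    Q, K, V have shape (n_tokens, d); the score matrix is (n_tokens, n_tokens),
    which is why the cost grows quadratically with document length.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights                     # weighted mix of token values

# Toy example: 6 tokens with 4-dimensional embeddings.
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))
out, attn = scaled_dot_product_attention(x, x, x)
print(attn.shape)  # (6, 6): one attention weight per token pair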

So, when engineers build transformer models, they often employ attention-masking techniques that limit the words a token can attend to.

For instance, a causal mask only allows each word to attend to the words that came before it.
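
As a rough illustration, and as an assumption about the standard way this is done rather than a description of the paper’s code, a causal mask can be applied by setting the scores for all “future” positions to negative infinity before the softmax, so each token only mixes information from itself and earlier tokens.

import numpy as np

def causal_attention_weights(scores):
    """Apply a causal mask: token i may only attend to tokens j <= i."""
    n = scores.shape[0]
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal
    masked = np.where(mask, -np.inf, scores)           # block attention to future tokens
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return weights / weights.sum(axis=-1, keepdims=True)

print(causal_attention_weights(np.zeros((4, 4))))
# Row i spreads its weight only over positions 0..i, so the earliest tokens
# appear in every row: one intuition for why causal masking can tilt
# attention toward the beginning of the input.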

Engineers also use positional encodings to help the model understand the location of each word in a sentence, improving performance.
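
For example, the original transformer used sinusoidal positional encodings, sketched below as one common assumption about what “positional encodings” means here; they are simply added to the token embeddings so the model can tell positions apart.

import numpy as np

def sinusoidal_positional_encoding(n_positions, d_model):
    """Classic sinusoidal encoding from "Attention Is All You Need"."""
    positions = np.arange(n_positions)[:, None]    # (n, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # (1, d/2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

token_embeddings = np.random.default_rng(1).normal(size=(6, 8))
inputs = token_embeddings + sinusoidal_positional_encoding(6, 8)  # position-aware inputs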

The MIT researchers built a graph-based theoretical framework to explore how these modeling choices, attention masks and positional encodings, can affect position bias.

“Everything is coupled and tangled within the attention mechanism, so it is very difficult to study. Graphs are a flexible language to describe the dependent relationship among words within the attention mechanism and trace them across multiple layers,” Wu says.
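
A hedged sketch of the general idea, not the authors’ actual framework: under a causal mask, token-to-token dependencies can be drawn as a directed graph, and stacking layers corresponds to following paths in that graph, so one can count how many dependency paths lead back to each input position.

import numpy as np

# Hypothetical illustration: represent "token j can influence token i in one
# layer" as a reachability matrix, then raise it to a power to count the
# dependency paths that accumulate over several attention layers.
n_tokens, n_layers = 6, 3
one_layer = np.tril(np.ones((n_tokens, n_tokens)))   # causal mask: j <= i
paths = np.linalg.matrix_power(one_layer, n_layers)  # paths through n_layers layers
influence = paths.sum(axis=0)                        # total paths originating at each position
print(influence)  # the earliest positions accumulate the most paths in this toy count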

Their theoretical analysis suggested that causal masking gives the model an inherent bias toward the beginning of an input, even when that bias doesn’t exist in the data.

If the earlier words are relatively unimportant for a sentence’s meaning, causal masking can cause the transformer to pay more attention to its beginning anyway.

“While it is often true that earlier words and later words in a sentence are more important, if an LLM is used on a task that is not natural language generation, like ranking or information retrieval, these biases can be extremely harmful,” Wu says.

As a model grows, with additional layers of the attention mechanism, this bias is amplified because earlier parts of the input are used more often in the model’s reasoning process.

They also found that using positional encodings to link words more strongly to nearby words can mitigate position bias. The technique refocuses the model’s attention in the right place, but its effect can be diluted in models with more attention layers.
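
One common way to link words more strongly to nearby words is a relative, distance-based penalty on attention scores. The sketch below uses an ALiBi-style linear penalty as an assumed example of such an encoding, not necessarily the variant the paper analyzed.

import numpy as np

def distance_penalized_scores(scores, slope=0.5):
    """Subtract a penalty proportional to token distance (ALiBi-style),
    so each token leans toward its neighbors rather than the far beginning."""
    n = scores.shape[0]
    i, j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return scores - slope * np.abs(i - j)   # nearby tokens are penalized least

print(distance_penalized_scores(np.zeros((5, 5))))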

And these design choices are only one cause of position bias; some can come from the training data the model uses to learn how to prioritize words in a sequence.

“If you know your data are biased in a certain way, then you should also finetune your model on top of adjusting your modeling choices,” Wu says.

Lost in the middle

After they had established the theoretical framework, the researchers carried out experiments in which they systematically varied the position of the correct answer in text sequences for an information-retrieval task.

The experiments showed a “lost-in-the-middle” phenomenon, in which retrieval accuracy followed a U-shaped pattern. Models performed best if the right answer was located at the beginning of the sequence. Performance declined the closer the answer got to the middle before rebounding slightly if it was near the end.
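
A minimal sketch of how such an experiment can be set up, assuming a generic query_model function standing in for whichever LLM is being tested: the same fact is inserted at different depths in a long context, and retrieval accuracy is recorded per position.

# Hypothetical evaluation harness: query_model is a placeholder for any LLM
# call; only the position of the correct answer changes between trials.
def build_context(filler_sentences, fact, position):
    """Insert the key fact at a chosen index inside otherwise irrelevant text."""
    docs = list(filler_sentences)
    docs.insert(position, fact)
    return " ".join(docs)

def positionwise_accuracy(query_model, filler_sentences, fact, question, answer):
    accuracy = {}
    for position in range(len(filler_sentences) + 1):
        context = build_context(filler_sentences, fact, position)
        response = query_model(context + "\n\nQuestion: " + question)
        accuracy[position] = float(answer.lower() in response.lower())
    return accuracy   # a U-shaped curve here reflects the lost-in-the-middle effect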

Ultimately, their work suggests that using a different masking technique, removing extra layers from the attention mechanism, or strategically employing positional encodings could reduce position bias and improve a model’s accuracy.

“By doing a combination of theory and experiments, we were able to look at the consequences of model design choices that weren’t clear at the time. If you want to use a model in high-stakes applications, you must know when it will work, when it won’t, and why,” Jadbabaie says.

In the future, the researchers want to further explore the effects of positional encodings and study how position bias could be strategically exploited in certain applications.

“These researchers offer a rare theoretical lens into the attention mechanism at the heart of the transformer model.

They provide a compelling analysis that clarifies longstanding quirks in transformer behavior, showing that attention mechanisms, especially with causal masks, inherently bias models toward the beginning of sequences. The paper achieves the best of both worlds: mathematical clarity paired with insights that reach into the guts of real-world systems,” says Amin Saberi, professor and director of the Stanford University Center for Computational Market Design, who was not involved with this work.

This research is supported, in part, by the U.S. Office of Naval Research, the National Science Foundation, and an Alexander von Humboldt Professorship.
