Like human brains, large language models reason about diverse data in a standard way

by Tarun Khanna
March 21, 2025
in Machine Learning

Photo Credit: https://news.mit.edu/


A new study shows LLMs represent diverse data types based on their underlying meanings, and reason about data in their dominant language.

While early language models could process only text, contemporary large language models now perform highly diverse tasks on many types of data. For instance, LLMs can understand many languages, generate computer code, solve math problems, or answer questions about images and audio.

MIT researchers probed the inner workings of LLMs to better understand how they process such assorted data, and they found evidence that the models share some similarities with the human brain.

Neuroscientists believe the human brain has a "semantic hub" in the anterior temporal lobe that integrates semantic information from various modalities, such as visual data and tactile inputs. This semantic hub is connected to modality-specific "spokes" that route information to the hub. The MIT researchers found that LLMs use a similar mechanism by abstractly processing data from diverse modalities in a central, generalized way. For instance, a model that has English as its dominant language would rely on English as a central medium to process inputs in Japanese or to reason about arithmetic, computer code, and so on.

Moreover, the researchers demonstrate that they can intervene in a model's semantic hub by using text in the model's dominant language to change its outputs, even when the model is processing data in other languages.

These findings could help scientists train future LLMs that are better able to handle diverse data.

"LLMs are big black boxes. They have achieved very impressive performance, but we have very little knowledge about their internal working mechanisms. I hope this can be an early step toward better understanding how they work, so we can improve upon them and better control them when required," says Zhaofeng Wu, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on this research.

His co-authors include Xinyan Velocity Yu, a graduate student at the University of Southern California (USC); Dani Yogatama, an associate professor at USC; Jiasen Lu, a research scientist at Apple; and senior author Yoon Kim, an assistant professor of EECS at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the International Conference on Learning Representations.

Integrating diverse data

The researchers based the new study on prior work which hinted that English-centric LLMs use English to perform reasoning processes on various languages.

Wu and his collaborators expanded on this concept, launching an in-depth study of the mechanisms LLMs use to process diverse data.

An LLM, which consists of many interconnected layers, splits input text into words or sub-words called tokens. The model assigns a representation to each token, which enables it to explore the relationships among tokens and generate the next word in a sequence. In the case of images or audio, those tokens correspond to particular regions of an image or sections of an audio clip.
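
As a concrete illustration of this step, here is a minimal tokenization sketch using the Hugging Face transformers library. GPT-2 and its byte-level tokenizer are stand-ins chosen for illustration, not the model the study actually examined.

```python
# A minimal tokenization sketch using Hugging Face transformers.
# GPT-2's byte-level BPE tokenizer is an illustrative stand-in.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Large language models reason about diverse data."
tokens = tokenizer.tokenize(text)   # words and sub-words ("tokens")
ids = tokenizer.encode(text)        # the integer IDs the model embeds

print(tokens)  # e.g. ['Large', 'Ġlanguage', 'Ġmodels', 'Ġreason', ...]
print(ids)
```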

The researchers found that the model's initial layers process data in its specific language or modality, like the modality-specific spokes in the human brain. Then, the LLM converts tokens into modality-agnostic representations as it reasons about them throughout its internal layers, akin to how the brain's semantic hub integrates diverse information.

The model assigns similar representations to inputs with similar meanings, regardless of their data type, including images, audio, computer code, and arithmetic problems. Even though an image and its text caption are distinct data types, because they share the same meaning, the LLM would assign them similar representations.

For instance, an English-dominant LLM "thinks" about a Chinese-text input in English before generating an output in Chinese. The model has a similar reasoning tendency for non-text inputs like computer code, math problems, or even multimodal data.

To test this hypothesis, the researchers passed a pair of sentences with the same meaning but written in two different languages through the model. They measured how similar the model's representations were for each sentence.

Then they conducted a second set of experiments in which they fed an English-dominant model text in a different language, like Chinese, and measured how similar its internal representation was to English versus Chinese. The researchers conducted similar experiments for other data types.
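
To make the probe concrete, here is a minimal sketch of this kind of layer-wise similarity measurement, assuming a Hugging Face model. The model name (gpt2, standing in for an English-dominant LLM), the mean pooling, and the cosine-similarity metric are illustrative assumptions, not the paper's exact protocol.

```python
# A hedged sketch of a layer-wise similarity probe: compare the hidden
# states a model assigns to two sentences with the same meaning but
# different languages. Model choice and pooling are assumptions.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # illustrative stand-in for an English-dominant LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def layer_embeddings(text):
    """Return one mean-pooled vector per layer for the given text."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: tuple of (num_layers + 1) tensors [1, seq_len, dim]
    return [h.mean(dim=1).squeeze(0) for h in outputs.hidden_states]

def cosine(a, b):
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

english = "The cat sat on the mat."
chinese = "猫坐在垫子上。"  # same meaning, different language

for layer, (e, c) in enumerate(zip(layer_embeddings(english),
                                   layer_embeddings(chinese))):
    print(f"layer {layer:2d}: similarity = {cosine(e, c):.3f}")
```

Under the semantic-hub picture described above, similarity for such translation pairs should be highest in the middle layers, where representations are most language-agnostic.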

They consistently found that the model's representations were similar for sentences with similar meanings. In addition, across many data types, the tokens the model processed in its internal layers were more like English-centric tokens than those of the input data type.

"A lot of these input data types seem extremely different from language, so we were very surprised that we can probe out English tokens when the model processes, for example, mathematical or coding expressions," Wu says.
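
One common way to "probe out" tokens at intermediate layers is the logit-lens trick: decode each layer's hidden state through the model's final layer norm and unembedding matrix. The sketch below applies it to an arithmetic prompt; this is an assumed stand-in for the paper's probing method, which may differ in detail.

```python
# A hedged logit-lens sketch: decode what each intermediate layer
# "thinks" the next token is while processing an arithmetic prompt.
# The probing method here is a common interpretability trick and an
# assumption; the paper's exact procedure may differ.
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

inputs = tokenizer("2 + 2 =", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

for layer, hidden in enumerate(outputs.hidden_states):
    state = model.transformer.ln_f(hidden[:, -1, :])   # final layer norm
    logits = model.lm_head(state)                      # unembedding
    token = tokenizer.decode(logits.argmax(dim=-1).tolist())
    print(f"layer {layer:2d} -> {token!r}")
```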

Leveraging the semantic hub

The researchers think LLMs may learn this semantic hub strategy during training because it is an economical way to process varied data.

"There are thousands of languages out there, but a lot of the knowledge is shared, like commonsense knowledge or factual knowledge. The model doesn't need to duplicate that knowledge across languages," Wu says.

The researchers also tried intervening in the model's inner layers using English text when it was processing other languages. They found that they could predictably change the model's outputs, even though those outputs were in other languages.
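
A rough sketch of that kind of intervention is shown below: a forward hook shifts one intermediate layer's activations toward the representation of some English text while the model processes a non-English prompt. The hooked layer, the steering strength, and the prompts are all hypothetical choices for illustration; the paper's actual procedure may differ.

```python
# A hedged intervention sketch: steer an intermediate layer toward the
# representation of English text via a forward hook. LAYER, ALPHA, and
# the prompts are hypothetical choices, not the paper's setup.
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def mean_hidden(text, layer):
    """Mean-pooled hidden state of `text` at the given layer: [1, dim]."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc)
    return out.hidden_states[layer].mean(dim=1)

LAYER, ALPHA = 6, 4.0                    # hypothetical layer and strength
steer = mean_hidden("happy", LAYER)      # English concept to inject

def hook(module, inputs, output):
    # Shift this block's output activations toward the English concept.
    hidden = output[0] + ALPHA * steer
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(hook)
prompt = tokenizer("Je me sens", return_tensors="pt")  # French input
generated = model.generate(**prompt, max_new_tokens=10)
handle.remove()
print(tokenizer.decode(generated[0]))
```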

Scientists could leverage this phenomenon to encourage the model to share as much information as possible across diverse data types, potentially boosting efficiency.

But on the other hand, there could be concepts or knowledge that are not translatable across languages or data types, like culturally specific knowledge. Scientists might want LLMs to have some language-specific processing mechanisms in those cases.

"How do you maximally share whenever possible but also allow languages to have some language-specific processing mechanisms? That could be explored in future work on model architectures," Wu says.

In addition, researchers could use these insights to improve multilingual models. Often, an English-dominant model that learns another language will lose some of its accuracy in English. A better understanding of an LLM's semantic hub could help researchers prevent this language interference, he says.

"Understanding how language models process inputs across languages and modalities is a key question in artificial intelligence. This paper makes an interesting connection to neuroscience and shows that the proposed 'semantic hub hypothesis' holds in modern language models, where semantically similar representations of different data types are created in the model's intermediate layers," says Mor Geva Pipek, an assistant professor in the School of Computer Science at Tel Aviv University, who was not involved with this work. "The hypothesis and experiments nicely tie and extend findings from previous works and could be influential for future research on creating better multimodal models and studying the links between them and brain function and cognition in humans."

This research is funded, in part, by the MIT-IBM Watson AI Lab.


Tarun Khanna
Founder, DeepTech Bytes - Data Scientist | Author | IT Consultant
Tarun Khanna is a versatile and accomplished data scientist, with expertise in IT consultancy and a specialization in software development and digital marketing solutions.

