Why it’s important to move beyond excessively aggregated machine-learning metrics

by Tarun Khanna
January 27, 2026
in Machine Learning
Photo Credit: https://news.mit.edu/

New research detects hidden evidence of mistaken correlations and provides a method to improve accuracy.

MIT researchers have identified significant examples of machine-learning model failure when those models are applied to data other than what they were trained on, raising questions about the need to test whenever a model is deployed in a new setting.

“We show that even when you train models on large amounts of data, and select the best average model, in a new setting this ‘best model’ will be the worst model for 6-75% of the new data,” says Marzyeh Ghassemi, an associate professor in MIT’s Department of Electrical Engineering and Computer Science (EECS), a member of the Institute for Medical Engineering and Science, and principal investigator at the Laboratory for Information and Decision Systems.

In a paper presented at the Neural Information Processing Systems (NeurIPS 2025) conference in December, the researchers point out that models trained to effectively diagnose illness in chest X-rays at one hospital, for example, may be considered effective in a different hospital, on average. The researchers’ performance assessment, however, revealed that some of the best-performing models at the first hospital were the worst-performing on up to 75% of patients at the second hospital, even though, when all patients at the second hospital are aggregated, high average performance hides this failure.
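
To make the aggregation effect concrete, here is a minimal Python sketch. The data are entirely hypothetical (they are not from the paper), and the function name `accuracy_report` and the group labels are assumptions made for illustration. It shows how a high average can coexist with a failing subgroup.

```python
from collections import defaultdict


def accuracy_report(y_true, y_pred, groups):
    """Return (overall accuracy, {group: accuracy}) for paired predictions."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        hits[group] += int(truth == pred)
    overall = sum(hits.values()) / sum(totals.values())
    per_group = {g: hits[g] / totals[g] for g in totals}
    return overall, per_group


# Hypothetical second-hospital data: 100 patients, all truly positive.
# The model is right on all 80 "majority" patients but on only 5 of the
# 20 "minority" patients.
y_true = [1] * 100
y_pred = [1] * 80 + [1] * 5 + [0] * 15
groups = ["majority"] * 80 + ["minority"] * 20

overall, per_group = accuracy_report(y_true, y_pred, groups)
print(f"overall accuracy: {overall:.2f}")
for group, acc in per_group.items():
    print(f"{group}: {acc:.2f}")
```

In this toy setup the overall accuracy is 0.85 while the minority group sits at 0.25, which is the kind of gap a single aggregate number cannot reveal.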

Their findings show that although spurious correlations are thought to be mitigated simply by improving model performance on observed data, they still occur and remain a risk to a model’s trustworthiness in new settings. A simple example of a spurious correlation is when a machine-learning system, not having “seen” many cows pictured on the beach, classifies a photo of a beach-going cow as an orca purely because of its background. In many cases, including the areas examined by the researchers (chest X-rays, cancer histopathology images, and hate speech detection), such spurious correlations are much harder to detect.

In the case of a medical diagnosis model trained on chest X-rays, for example, the model may have learned to correlate a specific and irrelevant marking on one hospital’s X-rays with a certain pathology. At another hospital where the marking is not used, that pathology could be missed.

Previous research by Ghassemi’s group has shown that models can spuriously correlate such factors as age, gender, and race with medical findings. If, for example, a model has been trained on more older people’s chest X-rays that have pneumonia and hasn’t “seen” as many X-rays belonging to younger people, it might predict that only older patients have pneumonia.

“We want models to learn how to look at the anatomical features of the patient and then make a decision based on that,” says Olawale Salaudeen, an MIT postdoc and the lead author of the paper, “but really anything that’s in the data that’s correlated with a decision can be used by the model. And those correlations might not actually be robust with changes in the environment, making the model predictions unreliable sources of decision-making.”

Spurious correlations contribute to the risks of biased decision-making. In the NeurIPS conference paper, the researchers showed that, for example, chest X-ray models that improved overall diagnosis performance actually performed worse on patients with pleural conditions or enlarged cardiomediastinum, meaning enlargement of the heart or central chest cavity.

Other authors of the paper include PhD students Haoran Zhang and Kumail Alhamoud, EECS Assistant Professor Sara Beery, and Ghassemi.

While previous work has generally assumed that models ordered best-to-worst by performance will keep that order when applied in new settings, a pattern known as accuracy-on-the-line, the researchers were able to demonstrate examples in which the best-performing models in one setting were the worst-performing in another.
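
One simple way to check whether a set of models follows accuracy-on-the-line is to correlate their accuracies in the first setting with their accuracies in the second. The Python sketch below does this with a plain Pearson correlation. All accuracy values are made up for illustration, and using raw accuracies (rather than any transformation the paper may apply) is a simplifying assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical accuracies for five models (not results from the paper).
id_acc = [0.70, 0.75, 0.80, 0.85, 0.90]          # first setting
ood_acc_all = [0.62, 0.66, 0.71, 0.74, 0.80]     # second setting, all data
ood_acc_subset = [0.55, 0.50, 0.42, 0.38, 0.30]  # second setting, one subset

r_all = pearson(id_acc, ood_acc_all)
r_subset = pearson(id_acc, ood_acc_subset)
print(f"all second-setting data: r = {r_all:.3f}")
print(f"problem subset:          r = {r_subset:.3f}")
```

A strongly positive correlation on the full second-setting data is consistent with accuracy-on-the-line, while a strongly negative one on a subset means the ranking is reversed there: the models that look best in the first setting do worst on that subset.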

Salaudeen devised an algorithm called OODSelect to find examples where accuracy-on-the-line was broken. Basically, he trained many models using in-distribution data, meaning the data were from the first setting, and calculated their accuracy. Then he applied the models to the data from the second setting. When the models with the highest accuracy on the first-setting data were wrong on a large percentage of examples in the second setting, this identified the problem subsets, or sub-populations. Salaudeen also emphasizes the dangers of aggregate statistics for evaluation, which can obscure more granular and consequential information about model performance.

In the course of their work, the researchers separated out the “most miscalculated examples” so as not to conflate spurious correlations within a dataset with situations that are simply difficult to classify.

The NeurIPS paper releases the researchers’ code and some identified subsets for future work.

Once a hospital, or any organization employing machine learning, identifies subsets on which a model is performing poorly, that information can be used to improve the model for its particular task and setting. The researchers recommend that future work adopt OODSelect to highlight targets for evaluation and to design approaches that improve performance more consistently.

“We hope the released code and OODSelect subsets become a steppingstone,” the researchers write, “toward benchmarks and models that confront the adverse effects of spurious correlations.”
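
The procedure described above can be sketched in Python as follows. This is not the authors' OODSelect implementation, whose details the article does not give. It is a simple heuristic stand-in that scores each second-setting example by how its correctness co-varies with the models' first-setting accuracy, then keeps the most negatively scored examples. The function names, the toy correctness matrix, and the subset size `k` are all assumptions made for illustration.

```python
def covariance(xs, ys):
    """Population covariance between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def select_problem_subset(id_acc, correct, k):
    """Pick the k second-setting examples whose correctness falls as
    first-setting accuracy rises.

    id_acc[m]     : accuracy of model m in the first setting
    correct[m][i] : 1 if model m is right on second-setting example i, else 0
    """
    n_examples = len(correct[0])
    scored = []
    for i in range(n_examples):
        column = [row[i] for row in correct]
        scored.append((covariance(id_acc, column), i))
    scored.sort()  # most negative covariance first
    return sorted(i for _, i in scored[:k])


def accuracy_on(correct_row, indices):
    """Accuracy of one model restricted to the given example indices."""
    return sum(correct_row[i] for i in indices) / len(indices)


# Hypothetical setup: four models, twelve second-setting examples.
id_acc = [0.70, 0.80, 0.90, 0.95]
correct = [
    [1] * 3 + [0] * 6 + [1, 1, 1],
    [1] * 5 + [0] * 4 + [1, 1, 0],
    [1] * 7 + [0] * 2 + [1, 0, 0],
    [1] * 9 + [0, 0, 0],
]

subset = select_problem_subset(id_acc, correct, k=3)
overall_acc = [accuracy_on(row, range(len(row))) for row in correct]
subset_acc = [accuracy_on(row, subset) for row in correct]
print("selected examples:", subset)
print("overall accuracy per model:", [round(a, 2) for a in overall_acc])
print("subset accuracy per model: ", [round(a, 2) for a in subset_acc])
```

In this toy matrix the overall second-setting accuracy still rises with first-setting accuracy, yet on the selected subset the ordering is reversed, so the aggregate ranking hides the failure that the subset exposes.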


