Beyond Predictions: Causal Intelligence in Machine Learning


Executive Summary

Prediction is the primary end goal of present-day machine learning. However, the rapid development of machine learning and the rush to take advantage of it have seduced many organizations into a fundamental error: confusing correlation with causation. Causal intelligence is necessary to take machine learning beyond simple curve-fitting prediction and toward dynamic, explainable, and actionable prescriptions.


The availability of huge datasets, combined with dramatic advances in computing power, has created myriad opportunities for understanding and applying data in new ways. A driving force in this process is machine learning, a subset of artificial intelligence (AI), which deploys complex algorithms that can find undiscovered patterns in massive and diverse datasets while automatically learning and improving in the process.

Prediction is the primary end goal of present-day machine learning. In almost every industry, including finance, medicine, and defense, initiatives are moving full-speed-ahead to harness machine learning that draws on historical data to estimate outcomes and guide mission-critical decision-making. 

Although AI can unlock predictive insights from untapped data, the rapid development of machine learning and the rush to take advantage of it have seduced many organizations into a fundamental error: confusing correlation with causation.

Understanding and processing cause-and-effect is one of the human brain’s most valuable traits, but to date, this is largely missing from machine learning solutions. AI’s ability to recognize patterns is astounding, but for all its capabilities, it cannot determine whether the rooster’s crow causes the sun to rise or vice versa. In other words, it’s an expert at finding correlation but inept at explaining causation. 

As long as machine learning is premised on “curve fitting” — predicting outcomes based on past associations — without accounting for causality, its prescriptive value will be inherently limited. Correlation-based models, even with refined algorithms, remain subject to perils that can make them misleading and even harmful. 

In contrast, incorporating causal intelligence into machine learning can move beyond simple curve fitting, providing deeper insight into the processes being analyzed and a more meaningful approach to assessing actions and improving outcomes. This capability is vital in the context of increasingly complex political, social, and business challenges that are subject to rapid change and involve a multitude of variables and potential causal relationships. 

Causal intelligence is necessary to take machine learning beyond simple curve-fitting prediction and toward dynamic, explainable, and actionable prescriptions.  


Perils of Correlation-Based Models

One of the most impressive abilities of machine learning is revealing connections in data that would otherwise go unnoticed. With state-of-the-art pattern-matching algorithms, “finding a needle in a haystack” goes from an impossible task to child’s play. 

Unfortunately, the novelty of finding these associations and the desire to quickly capitalize on data-driven models can make it easy to gloss over harder questions about what the associations really mean. Predictions based on correlations, no matter how advanced, are subject to various forms of bias. 

The maxim that correlation doesn’t equal causation is well-known, and for examples like the sun and the rooster, the point is obvious. But in more complicated contexts, especially those imbued with mountains of data, the distinction may be harder to recognize. 


Types of Data in Correlation-Based Models

To search for associations, a correlation-based model needs established datasets to work with, but not all data is created equal. Of course, unverified or poorly collected information is a hindrance to any rigorous analysis, but even among well-vetted datasets, it’s important to know the type of data involved and to distinguish between observational and experimental data. 

Observational data comes from situations in which information is recorded without a specific intervention being applied to the group or subject being studied. In contrast, data that comes from measuring an outcome after an intervention, usually with a control group, is known as experimental data. 

To highlight the difference, consider an example of a regional bank analyzing data about how to reduce late payments on home mortgages. 

  • Observational data could include customer transaction histories, credit scores, real estate trends, home appraisals, financial profiles, tax records, macroeconomic indicators, and other similar resources. 
  • Experimental data can be harder to come by but may be available if the bank had tried to affect customer behavior in the past, for example, by developing an online budgeting tool for a subset of its clients. In that case, customers who were offered the budgeting tool would be the study group, those who weren’t offered it would be the control group, and comparing outcomes between them would provide a source of experimental data. 


This example demonstrates one of the main benefits of observational data: compared with experimental data, it is much more readily available. Though experimental data is frequently more informative and robust, obtaining it may not be practical for several reasons:

  1. Time: Carrying out an experiment, especially one intended to gauge outcomes over the medium- to long-term, may be impossible given an organization’s time constraints.
  2. Cost: Designing and conducting experimental studies can be expensive, a problem that can be particularly acute for complicated research questions and in the medical field.
  3. Ethics: For certain types of studies, such as those testing the toxicity of a chemical compound, it is ethically unacceptable to experiment on people by exposing some of them to a potentially harmful agent.


Even when an experimental study is practical, it may not be desirable. In the mortgage payments example, although the bank may derive experimental information by only offering the budgeting tool to a certain set of clients, the bank’s leaders may be uncomfortable excluding customers from the program and limiting the reach of the tool from the get-go.  

Not surprisingly, observational data is typically more available and broader in scope. For this reason, the majority of machine learning solutions depend on datasets composed predominantly of observational data, but this dependence can be a key contributor to the shortcomings and flawed predictions of correlation-based models. 
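To make the budgeting-tool experiment described above concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the hidden "discipline" trait, the late-payment probabilities, and the size of the tool's effect are invented numbers, not figures from any real bank. The point it shows is only that when assignment is random, a plain difference in late-payment rates between the study group and the control group recovers the effect that was built into the simulation.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Assumed for illustration: the tool lowers late-payment probability by 10 points.
TRUE_TOOL_EFFECT = -0.10


def pays_late(offered_tool: bool) -> bool:
    """Simulate one hypothetical customer and report whether they pay late."""
    discipline = random.random()           # hidden trait in [0, 1) the bank never observes
    late_prob = 0.40 - 0.25 * discipline   # more disciplined customers pay late less often
    if offered_tool:
        late_prob += TRUE_TOOL_EFFECT
    return random.random() < late_prob


study_group, control_group = [], []
for _ in range(20_000):
    offered = random.random() < 0.5        # coin flip, independent of any customer trait
    (study_group if offered else control_group).append(pays_late(offered))

late_rate_study = sum(study_group) / len(study_group)
late_rate_control = sum(control_group) / len(control_group)
estimated_effect = late_rate_study - late_rate_control

print(f"late-payment rate, offered tool: {late_rate_study:.3f}")
print(f"late-payment rate, control:      {late_rate_control:.3f}")
print(f"estimated effect of the tool:    {estimated_effect:+.3f}")
```

Because the coin flip ignores the hidden trait, both groups contain a similar mix of disciplined and undisciplined customers, which is what lets the simple comparison stand in for a causal effect.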


What is a Correlation-Based Model? 

A correlation-based model draws on existing sources of observational data to make predictions about future outcomes. It identifies historical patterns and models them into forward-looking projections. 

Image: Zach Weinersmith

Various types of statistical analysis and modeling, such as regression analysis, can be used in correlation-based approaches, but they don’t have to involve advanced computing or AI. Diehard sports fans who notice that their team wins whenever they wear a certain pair of socks are employing a correlation-based model, albeit an exceedingly crude one, when they keep wearing those socks all season long without ever washing them. 

Image: Keith Allison

While the fans’ information comes only from their limited personal observations, big data platforms can take advantage of enormous databases. By linking diverse sources of information (financial and health records, census data, and public documents just scratch the surface of what could be included), machine learning can find associations that otherwise likely never would have been identified.

A correlation-based model takes a curve-fitting approach, applying patterns in prior data to predict future conditions. So if an association is found in which the local sports team’s success has historically been preceded by a dip in laundromat revenue, a correlation-based model might predict an impending victory whenever laundromats report declining earnings.  

Of course, causally speaking, laundromat revenue has nothing to do with producing a victory, but this is the sort of association that standard, correlation-based models identify. Using these types of models to inform actions meant to improve the chances of winning more games would be misguided and dangerous.


What Problems Affect Correlation-Based Models? 

Uncovering correlations can be meaningful, especially if it inspires additional research to explain why the correlation exists. Where correlation-based models encounter problems, though, is in assigning causation when only an association has been found. 

Statisticians and epidemiologists have worked meticulously to identify the specific ways in which correlations, especially those derived from observational data, can foster inaccuracies. Dozens of types of bias have been identified, and two of the most pervasive are confounding and selection bias.  


Confounding



Public health officials looking to reduce cases of accidental death may be intrigued by an unexpected association: during the summer months, a surge in ice cream sales is accompanied by a similar rise in rates of drowning. While some might propose banning ice cream in the name of public safety, a bit of reflection reveals that a third factor — rising temperatures — likely explains the increase in both ice cream consumption and drowning. 

In this example, the summer heat is a confounder. It exerts an influence on ice cream consumption (people wanting a cool treat) and drownings (more people spending time at the beach or around swimming pools). Because the confounder affects both of the variables in question, it makes it problematic to assert a cause-and-effect relationship between ice cream and the incidence of drowning. 
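The ice cream example can be sketched as a small simulation. The numbers below (temperature range, sales and drowning figures, noise levels) are invented for illustration. In this toy world, temperature drives both variables and neither affects the other, yet they are strongly correlated overall. Restricting attention to days of nearly identical temperature, which holds the confounder roughly fixed, makes the association largely disappear.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the sketch is reproducible


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


days = []
for _ in range(20_000):
    temperature = random.uniform(10, 35)                       # degrees Celsius
    ice_cream = 20 * temperature + random.gauss(0, 40)         # sales depend only on heat
    drowning_rate = 5 + 0.3 * temperature + random.gauss(0, 1.5)  # also depends only on heat
    days.append((temperature, ice_cream, drowning_rate))

raw_corr = pearson([d[1] for d in days], [d[2] for d in days])

# Hold the confounder (nearly) fixed: keep only days between 24 and 26 degrees.
similar_days = [d for d in days if 24 <= d[0] <= 26]
corr_at_fixed_temp = pearson(
    [d[1] for d in similar_days], [d[2] for d in similar_days]
)

print(f"correlation across all days:          {raw_corr:.2f}")
print(f"correlation on days of similar heat:  {corr_at_fixed_temp:.2f}")
```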

While some correlations subject to confounding, like the link between ice cream and drowning, induce immediate skepticism, others may be less obvious. In scrutinizing its customer data, a bank may find that business owners who visit their local branch at least once a month are more likely to make loan payments on time. A tempting conclusion is that maintaining a face-to-face relationship with clients promotes timely payments, but potential confounders mean that the data alone cannot prove this to be true.

In this case, the owners’ engagement and attentiveness could be a confounder. Their attention to detail may mean that they are more likely both to be current on payments and to routinely visit the bank just to ensure that everything is in order. Because these personal traits can independently affect both variables, the data cannot establish a cause-and-effect relationship between them. It’s possible that these business owners were more likely to pay on time because of their in-person bank visits, but there’s no way to know that for sure from the observational data.

Though confounding emerges more frequently in analyses of observational studies, it is important to note that it can occur in experimental studies as well. Careful study design to eliminate differences between the experimental and control groups reduces the likelihood of confounding, but unknown or unaccounted-for variables can still influence outcomes in a controlled trial. 


Selection Bias

Selection bias occurs when the data being evaluated comes from a population that is not representative of a broader group to whom the study results will be applied. 

One of the most famous illustrations of managing selection bias comes from research by Abraham Wald, who advised the military on how to improve aircraft survivability during World War II. Reinforcing an entire plane would have been too costly and heavy, so Wald set out to determine the most strategic areas in which to strengthen a plane’s defenses.

Wald examined aircraft that had returned from missions to see where they were taking the most incoming fire. His recommendation? Reinforce all of the areas that showed the least amount of damage. If this suggestion seems counter-intuitive, it’s because of selection bias, which Wald was wise to avoid. He realized that he was studying planes that survived, but his real concern was with planes that didn’t make it back. If planes could sustain the damage he saw and still return to base, it meant that damage to other areas was a greater threat to the planes and their crew. 

Image: McGeddon / CC BY-SA
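Wald's reasoning can be reproduced in a toy simulation. The plane sections and survival probabilities below are hypothetical values chosen for illustration, not historical data. Every section is hit equally often, but because planes hit in the most vulnerable section rarely return, that section shows the least damage among the planes available for inspection.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical chance that a plane returns to base after a hit in each section.
SURVIVAL_PROB = {"engine": 0.3, "cockpit": 0.5, "fuselage": 0.9, "wings": 0.9}
sections = list(SURVIVAL_PROB)

hits_on_all_planes = Counter()
hits_on_returned_planes = Counter()
for _ in range(20_000):
    section = random.choice(sections)           # fire lands uniformly at random
    hits_on_all_planes[section] += 1
    if random.random() < SURVIVAL_PROB[section]:
        hits_on_returned_planes[section] += 1   # only these planes can be inspected

# The section with the fewest holes among survivors is the one to reinforce.
least_damaged = min(sections, key=lambda s: hits_on_returned_planes[s])

for s in sections:
    print(f"{s:9s} hits overall: {hits_on_all_planes[s]:5d}   "
          f"hits seen on returned planes: {hits_on_returned_planes[s]:5d}")
print("least damage observed on:", least_damaged)
```

An analysis that looked only at the returned planes would conclude that engines are rarely hit, when in this simulation they are hit exactly as often as everything else.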

In a business setting, selection bias can occur when trying to extrapolate successes from one setting to another. For instance, consider a fast-food chain looking for a strategy to boost revenue. After analyzing the data, they find that the performance of their top-selling store is driven by sales of family combo meals. As the marketing team prepares a campaign for family combo meals in all their locations, a skilled data analyst points out the selection bias: the top-selling store is in a residential area with a high concentration of households with two or more children. Without considering the differences in each store’s surrounding community, the chain would be at risk of investing in a marketing strategy founded on an unrepresentative analysis. 

Image: Steak ‘n Shake

Selection bias can also go hand-in-hand with confounding. If researchers want to find out if taking a certain supplement is associated with weight loss, they might do a regression analysis on observational data from health surveys and medical records. But the buyers of supplements may not be a representative sample: they may be more health-conscious and inclined to exercise and eat healthy, which can confound any weight loss patterns detected in the data. 

Like confounding, selection bias is not limited to observational studies, but in experimental research there is normally a heightened emphasis on carefully defining the eligible study participants in order to root out selection bias from the outset. 


Reducing Bias in Correlation-Based Models

An array of statistical methods have been employed to try to counteract bias in observational studies and correlation-based models. Fine-tuned calculations attempt to filter out confounding or account for selection bias, and error bars can represent the potential for variation within correlation-based projections. While these contributions to data analysis should be acknowledged, it’s necessary to recognize their limitations. Bias can frequently arise even in well-curated approaches, and in practice many organizations deploy regression analyses without careful review of the methods that produced them or clear protocols for verifying their predictions. 


Misconceptions of “Predictive” Power in Machine Learning

The limitations of a correlation-based method may seem obvious from the examples above, and those shortcomings are certainly not breaking news in fields like biostatistics and epidemiology. Why, then, is it necessary to sound the alarm about them when it comes to AI and machine learning? 

Because correlation-based models powered by machine learning are prone to flawed recommendations about optimal next steps. There is no doubt that machine learning enables data analysis at tremendous scale and speed. But off-the-shelf AI solutions, especially those marketed to enterprise, can, under the banner of prediction, provide the veneer of causality even when the data only supports a correlation. 

Numerous applications of AI and machine learning are high-powered tools to dig deeper than ever into observational data. However, this doesn’t eliminate serious challenges like confounding and selection bias. In fact, the allure of advanced technology can obscure that most AI can’t distinguish between experimental and observational data and has no capacity for assigning causality. As a consequence, using standard AI and ML to guide decision-making can be dangerous. 

Source: Medium

One outcome of association-driven AI is increased identification of spurious correlations, in which variables are shown to be associated but have no true cause-and-effect relationship. Instead, the correlation may be totally coincidental or the result of confounding.
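A short sketch shows why searching more data tends to surface more coincidental correlations. It assumes a hypothetical target metric with 20 observations and 1,000 candidate variables, all generated as pure, independent noise. Even though nothing is related to anything by construction, the best-matching candidate correlates noticeably with the target.

```python
import random
from statistics import mean

random.seed(3)  # fixed seed so the sketch is reproducible


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


N_OBSERVATIONS = 20    # e.g. 20 quarters of a hypothetical business metric
N_CANDIDATES = 1_000   # unrelated variables scanned for a "pattern"

target = [random.gauss(0, 1) for _ in range(N_OBSERVATIONS)]
candidates = [
    [random.gauss(0, 1) for _ in range(N_OBSERVATIONS)]
    for _ in range(N_CANDIDATES)
]

correlations = [pearson(series, target) for series in candidates]
strongest = max(correlations, key=abs)

print(f"strongest correlation found among pure noise: {strongest:+.2f}")
```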

In business, medicine, and public policy, failure to scrutinize spurious correlations because they derive from machine learning algorithms can result in all types of strategic errors and flawed decision-making with serious economic and health consequences. 

Even when associations are not spurious, correlation-based models suffer from the inability to comprehend the underlying relationships. A correlation-based model can identify a strong association between sold-out sporting events and high ticket prices. Based on that association, the model could predict that what a losing team with low attendance really needs to boost its ticket sales is to raise prices. Because it doesn’t account for causation, the correlation-based model can’t recognize that ticket demand stimulates higher prices and not the other way around. 
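The ticket-price example can be sketched as a comparison between observing prices and setting them. The structural equation and all coefficients below are assumptions invented for illustration. In the observed data, prices follow demand, so expensive games are also the well-attended ones. When the same simulated world is forced to a high price regardless of demand, attendance falls.

```python
import random
from statistics import mean

random.seed(4)  # fixed seed so the sketch is reproducible


def attendance(demand: float, price: float) -> float:
    """Assumed structural equation: fraction of seats sold, clamped to [0, 1]."""
    raw = 0.30 + 0.90 * demand - 0.004 * price + random.gauss(0, 0.05)
    return min(1.0, max(0.0, raw))


# Observation: teams set prices in response to demand.
observed = []
for _ in range(20_000):
    demand = random.random()                        # fan interest in [0, 1)
    price = 20 + 80 * demand + random.gauss(0, 5)   # popular teams charge more
    observed.append((price, attendance(demand, price)))

seen_high = mean(a for p, a in observed if p >= 60)
seen_low = mean(a for p, a in observed if p < 60)


# Intervention: force one price on every game, whatever the demand.
def mean_attendance_if_price_set_to(price: float) -> float:
    return mean(attendance(random.random(), price) for _ in range(20_000))


do_high = mean_attendance_if_price_set_to(90)
do_low = mean_attendance_if_price_set_to(30)

print(f"observed attendance, high-priced games: {seen_high:.2f}")
print(f"observed attendance, low-priced games:  {seen_low:.2f}")
print(f"attendance if price is set to 90:       {do_high:.2f}")
print(f"attendance if price is set to 30:       {do_low:.2f}")
```

A model trained only on the observed pairs would learn the first comparison and recommend raising prices; the second comparison is the one that answers the team's actual question.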

Another danger of correlation-based models is that they can reinforce societal inequalities, an issue highlighted in discussions of algorithmic fairness. To cite one example, Amazon attempted to deploy machine learning for recruiting to identify top candidates, but, based on the hiring disparities between men and women in the technology industry that were present in the historical datasets, the algorithm started predicting that male candidates would be better hires. It discriminated against applications that mentioned all-women’s colleges or women’s organizations until Amazon edited the algorithm and then ultimately disbanded the project altogether.

Image: Machine Learning Techub


Causal Inference: Why Causality Matters

Causal reasoning is one of the most powerful components of human intelligence. In everyday life, decisions big and small are predicated on a fundamental awareness of cause and effect. 

The ability to see how an action influences an outcome serves as a building block for how we understand and act in the world, but at the same time, this form of thinking is so ingrained that it’s common to overlook basic questions about what causality means and why it matters. 

The Foundation of Causal Inference 

Cause and effect seems like a straightforward concept, but in the past 15 years, researchers have attempted to apply standards of mathematics and scientific reasoning to drill down to what causality entails.

A leading thinker in this field is Judea Pearl, a professor of computer science at UCLA. While Pearl has won renown for a system of formulas and graphs to demonstrate causality, an advanced degree isn’t necessary to understand his central ideas about causal inference, which is the process by which we establish notions of causality.

For Pearl, causal inference can be understood as moving progressively along a hierarchy or “Ladder of Causation.” 

  1. At the lowest level is association, which is represented by observation and correlation. 
  2. The second layer is intervention or doing. An experimental study with a designed action and measured outcome is a form of intervention. A/B testing in computer science or randomized controlled trials (RCTs) in medicine are considered the gold standard of intervention. In this research, both the control and experimental groups are determined at random, and specific outcomes to be measured are established in advance. Blinding, in which study organizers, participants, or both, do not know who is in which group, can further reduce the potential for bias in controlled intervention studies.  
  3. The third and top layer is counterfactual reasoning, which means thinking retroactively and imaginatively, such as reflecting on how things could have been different. 

3. Counterfactuals (imagining, reflecting)
  • Relevant questions: Was it X that caused Y? What if X had not happened?
  • Examples: Was it exercise that caused me to lose weight? What if we hadn’t raised prices?

2. Intervention
  • Relevant questions: What if I do X? How can I cause something to happen?
  • Examples: Will exercising lead to weight loss? What happens if we raise prices?

1. Association (seeing, observing)
  • Relevant questions: What do we observe? How are two or more variables related?
  • Examples: Every time a bell is rung, food is provided (Pavlov’s dogs). What does health survey data show about exercise and weight loss?

*Source: Adapted from The Seven Tools of Causal Inference with Reflections on Machine Learning, AI Can’t Reason Why, and The Book of Why 

In this hierarchy, associations and correlations aren’t stripped of importance; instead, their meaning is contextualized. Correlations are understood as statistical observations capable of providing useful information but incapable of demonstrating causality without more advanced reasoning based on intervention or counterfactuals. Understanding the distinction between the questions asked at each level provides a framework for causal inference and a check against conflating correlation with causation. 
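The counterfactual question "Was it exercise that caused me to lose weight?" can be worked through in a very small sketch. It relies on a strong assumption, labeled in the code: that weight change follows a known linear structural equation with a fixed effect per hour of exercise. The equation, the coefficient, and the observed values are all hypothetical. Given that assumption, the answer follows in three steps: infer the person's unobserved factors from what actually happened, change the action, and recompute the outcome.

```python
# Assumed structural equation (hypothetical): each weekly hour of exercise
# changes weight by -0.5 kg, and everything else is lumped into other_factors.
EXERCISE_EFFECT = -0.5


def weight_change(exercise_hours: float, other_factors: float) -> float:
    """Weight change in kg under the assumed structural equation."""
    return EXERCISE_EFFECT * exercise_hours + other_factors


# What actually happened to one hypothetical person.
observed_exercise = 4.0   # hours per week
observed_change = -3.0    # kg

# Step 1: infer this person's unobserved factors from the observed outcome.
other_factors = observed_change - EXERCISE_EFFECT * observed_exercise

# Steps 2 and 3: imagine no exercise, keep everything else, recompute.
counterfactual_change = weight_change(0.0, other_factors)
due_to_exercise = observed_change - counterfactual_change

print(f"inferred other factors:          {other_factors:+.1f} kg")
print(f"weight change without exercise:  {counterfactual_change:+.1f} kg")
print(f"portion attributable to exercise: {due_to_exercise:+.1f} kg")
```

The answer is only as good as the assumed equation, which is why this rung of the ladder depends on domain knowledge rather than on data alone.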


Applying Causal Inference

The ladder of causation provides a framework for causal inference. In the case of A/B testing or other controlled interventions, deliberate study design facilitates straightforward causal determinations by directly comparing outcomes in the experimental and control groups. When these trials aren’t possible, causality must be inferred using more complex and nuanced analysis. 

In the 1950s, debates raged over whether smoking cigarettes causes cancer. Medical ethics wouldn’t permit a randomized trial deliberately exposing people to tobacco smoke, so causal inference was critical. In a preponderance of observational data, the relationship between smoking and cancer showed characteristics, such as being strong, consistent, and biologically plausible, that combined to resoundingly demonstrate causality.

The tools of causal inference have become increasingly sophisticated since tobacco smoke was identified as a carcinogen. Researchers have refined and standardized the understanding of causal relationships and the processes for studying them. 

Building on that work, contemporary causal inference draws on causal diagrams that outline the variables affecting an outcome and the relationships among them. Those relationships can be formulated mathematically and tested with powerful statistical techniques. This process can incorporate layers of domain knowledge to identify mechanisms that explain causation, account for confounding and selection bias, cross-validate conclusions, and ensure more robust causal findings that can inform effective interventions.


The Power of Causal Intelligence

In all types of decision-making, causal intelligence pays enormous dividends. It allows for better awareness of the consequences of any action and, when it answers the “why?” question, empowers the complex use of reason to optimize actions in virtually any aspect of personal and professional life. 

In machine learning, significant advances can be unlocked by incorporating causal intelligence to enable a “mini-revolution in AI.” Such a step would allow machine learning to move beyond its present-day focus on finding correlations. Rather than simply using old data to project future outcomes, machine learning that uses causal inference through innovative techniques and encoded domain knowledge produces trustworthy, accurate, and actionable information that can facilitate effective decision-making.  


Actionable Prescription 

Causal intelligence generates actionable, forward-looking information. A simple correlation-based machine learning model may be able to identify patients who are at a high risk of cardiovascular problems like heart failure or stroke, but it has no capability to help determine the therapy that could best address that risk. Because the model focuses solely on finding patterns and correlations, it fails to provide useful insight about the next steps for intervention and its possible outcomes. 

Causal inference resolves this shortcoming by offering a method for understanding the consequences of an action even if experimental data is not available. In epidemiology, a “target trial” attempts to use observational data to emulate a randomized study and provide resources for doctors to make informed recommendations to their patients even in the absence of a past clinical trial. 

The same target trial concept can be applied in business as well. For online marketing teams, it’s normal to deploy multiple advertisements simultaneously, and a common challenge is knowing which of their ads actually drive sales. If different customers see a distinct sequence of ads, it is hard to determine which ad was the most effective. 

Methods of causal inference allow observational data about customer ad exposure to be studied in ways that simulate an experimental and control group. By diligently conducting this comparison, companies can use causal reasoning to identify ads that are most likely to spur customer conversions and provide usable information for their marketing strategies. 
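One simple way to simulate an experimental and a control group from observational ad data is stratification, sketched below. The setup is hypothetical: customers with prior interest in the product are both more likely to be shown the ad and more likely to buy, and the ad's true effect is fixed at 5 percentage points. Comparing exposed and unexposed customers directly overstates the effect; comparing them within each interest level and then averaging by group size comes close to the built-in value. This works here only because the confounder is recorded in the data, which is an assumption that real datasets may not satisfy.

```python
import random

random.seed(5)  # fixed seed so the sketch is reproducible

TRUE_AD_EFFECT = 0.05  # assumed lift in conversion probability from seeing the ad

# Each record: (prior_interest, saw_ad, converted)
records = []
for _ in range(100_000):
    interest = random.random() < 0.3
    saw_ad = random.random() < (0.8 if interest else 0.2)   # interested users are targeted
    p_convert = 0.05 + 0.20 * interest + TRUE_AD_EFFECT * saw_ad
    records.append((interest, saw_ad, random.random() < p_convert))


def conversion_rate(rows):
    return sum(converted for _, _, converted in rows) / len(rows)


def ad_effect(rows):
    """Difference in conversion rate between exposed and unexposed customers."""
    exposed = [r for r in rows if r[1]]
    unexposed = [r for r in rows if not r[1]]
    return conversion_rate(exposed) - conversion_rate(unexposed)


naive_effect = ad_effect(records)

# Compare like with like inside each interest level, then weight by stratum size.
adjusted_effect = 0.0
for level in (True, False):
    stratum = [r for r in records if r[0] == level]
    adjusted_effect += ad_effect(stratum) * len(stratum) / len(records)

print(f"naive estimate of ad effect:    {naive_effect:+.3f}")
print(f"stratified estimate:            {adjusted_effect:+.3f}")
print(f"effect built into simulation:   {TRUE_AD_EFFECT:+.3f}")
```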

Uniting causal inference with machine learning opens even greater possibilities in deploying data science for actionable prescriptions. In a proof-of-concept study, researchers set out to offer data-driven advice to people in developing countries looking to receive micro-loans through an online platform. Machine learning technology processed written project descriptions posted online to discover key factors driving funding and then applied causal intelligence to determine whether funding arrived more quickly for individual or group applicants. Their analysis showed a significant benefit for group projects, providing direct guidance to people whose livelihoods can depend on successful micro-loan applications. 

Image: Kiva

These examples simply scratch the surface of the potential for adding causal inference to machine learning. Causation-based machine learning models offer meaningful information that can drive more intelligent and effective decision-making.


Understanding Causality and Improving Predictive Models

While embracing causal intelligence will primarily help guide decision-making, a secondary benefit is that an improved understanding of how the underlying systems work can be used to build better predictive machine learning models.

An instructive case study is the epidemiological modeling for COVID-19. As outbreaks began in the U.S., a model developed by the Institute for Health Metrics and Evaluation (IHME) was widely used to project the evolution of the pandemic. IHME used existing data from other countries to build a statistical model to project hospital utilization and mortality in the U.S.

Unfortunately, the original IHME projections proved to badly miss the mark. Built primarily on curve-fitting projections, the model did not account for the full scope of underlying causal factors, such as levels of close interpersonal contact, that drive viral transmission. Alternate models, including some that incorporate machine learning, consider the effects of interventions, such as social distancing and mask-wearing, and have proven to be more effective at projecting the trajectory of COVID-19. 
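The gap between curve fitting and a model that represents the transmission mechanism can be illustrated with a toy example. This is not a reconstruction of the IHME model or of any published alternative; it is a textbook-style discrete SIR simulation with invented parameters. The curve-fitting projection extends the early growth rate forward, while the mechanistic model can represent a drop in interpersonal contact starting on day 30.

```python
def simulate_infections(days, contact_rate_on_day, recovery_rate=0.1,
                        population=1_000_000, initial_infected=10):
    """Discrete-time SIR model; returns currently infected people for each day."""
    susceptible = float(population - initial_infected)
    infected = float(initial_infected)
    history = [infected]
    for day in range(1, days + 1):
        new_infections = contact_rate_on_day(day) * susceptible * infected / population
        new_recoveries = recovery_rate * infected
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        history.append(infected)
    return history


# Data available early in the outbreak (first 20 days, no intervention yet).
early = simulate_infections(20, lambda day: 0.3)

# Curve fitting: assume the growth seen between day 10 and day 20 simply continues.
daily_growth = (early[20] / early[10]) ** (1 / 10)
curve_fit_day_60 = early[20] * daily_growth ** 40

# Mechanistic view: the same model, with distancing cutting contact from day 30.
with_distancing = simulate_infections(60, lambda day: 0.3 if day < 30 else 0.12)
mechanistic_day_60 = with_distancing[60]

print(f"early daily growth factor:            {daily_growth:.3f}")
print(f"curve-fit projection for day 60:      {curve_fit_day_60:,.0f}")
print(f"model with distancing, day 60:        {mechanistic_day_60:,.0f}")
```

The extrapolation has no way to express that behavior changed, so in this toy setting it overshoots by orders of magnitude once the intervention takes hold.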

Similarly, the historic success of Nate Silver’s FiveThirtyEight election predictions comes from his organization’s deep understanding of the underlying political and electoral processes that lead to certain election outcomes. Election forecasts that combine expert domain knowledge with modeling of poll data outperform naive approaches that simply process poll numbers to generate probabilities. This sort of expert knowledge comes from understanding the causal processes of a system.

Standard predictive modeling is strengthened by understanding causal processes; it is more responsive to changing circumstances and dynamic variables, allowing it to generate more dependable predictions. This improves the system’s testability and legitimacy and ultimately bolsters confidence in the value of predictions. This sort of understanding of how underlying processes work can be built out by employing causal inference and thinking causally. By leveraging knowledge of causation alongside its computing power, machine learning can analyze complex data and events, including those that have an array of causal influences, to shed light on the most significant challenges facing governments, society, science, and business.


Conclusion



To date, the hallmark of machine learning has been developing algorithms that identify novel correlations from disparate and sizable datasets. Although this capability is technologically impressive and has useful practical applications, it faces an underlying limitation in its inability to understand causal relationships. 

Unfortunately, organizations deploying machine learning for outcome-focused predictions are at risk of losing sight of this limitation and assigning causality in situations when only correlation is supported by the data. Because of significant potential sources of bias, including confounding and selection biases, correlation-driven models are subject to crippling errors and diminished utility.

A dedicated initiative to understand causal inference and incorporate components of causality into machine learning algorithms can set the stage for significant developments in AI. This step forward can advance machine learning beyond predictions and toward prescriptions – actionable insights – derived from sound and logical causal reasoning. Granted, building causal intelligence into machine learning is not a simple task, but it is imperative in order for AI to achieve new and more productive functions beyond those of existing models that are restricted to correlation-driven prediction.