1. Introduction

Cause and effect cannot be proven based on purely observational data – that is a fact. Does this mean we have to stop right here? Can all the data we collect in the era of Big Data really not help us with one of the most important tasks – namely, understanding “why” something happens and how we can influence a system to achieve a goal?

Cause and effect cannot be proven – but does that mean we cannot get any hints on potential cause-and-effect relationships, in some cases even close to a proof?

The primary challenge is the existence of so-called confounding factors – factors that are correlated with the target we want to understand as well as with other factors whose effect we want to assess. In healthcare, for example, the patient’s age is typically an important confounding factor. If we want to understand the effect of a treatment and compare treated vs. non-treated patients, we need to adjust for the different age distributions in those two groups.
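To make the adjustment concrete, here is a minimal, self-contained sketch in plain Python/NumPy – not XOE code, and with made-up numbers – showing how a naive treated-vs.-non-treated comparison is distorted by age, and how comparing within age strata recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical observational data: older patients are both more likely
# to be treated and more likely to suffer the outcome (e.g. a stroke).
age_group = rng.choice(["young", "old"], size=n)
p_treated = np.where(age_group == "old", 0.7, 0.2)
treated = rng.random(n) < p_treated

# Assumed ground truth: the treatment reduces outcome risk by 5 points.
base_risk = np.where(age_group == "old", 0.30, 0.10)
outcome = rng.random(n) < (base_risk - 0.05 * treated)

# Naive comparison ignores the confounder "age" and even flips the sign.
naive = outcome[treated].mean() - outcome[~treated].mean()

# Adjusted comparison: compare within each age stratum, then average.
diffs = []
for g in ["young", "old"]:
    m = age_group == g
    diffs.append(outcome[m & treated].mean() - outcome[m & ~treated].mean())
adjusted = np.mean(diffs)

print(f"naive difference:    {naive:+.3f}")     # ~ +0.05, treatment looks harmful
print(f"adjusted difference: {adjusted:+.3f}")  # ~ -0.05, the true effect
```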

If we lack information about important confounding factors, then we indeed have little chance of understanding cause and effect. Put the other way round, the conclusion is: understanding cause and effect based on observational data requires a comprehensive view of the object in the focus of analysis, with all important information attached to it – otherwise we are prone to overlook the real causes of a target and to misinterpret others.

Comprehensive, rich information means complex data. We therefore also need a technology that can deal with such complex data and search it for potential confounding factors. The more comprehensive the data and the stronger the algorithm that searches it, the stronger the generated hypotheses on potential causal factors driving a target.

With all-embracing data we might get close to a proof – usually, however, results need to be assessed by an expert. An important element of our approach is therefore results that are intuitive to understand, together with means for the expert to validate them – for example, to reject some factors or to ask for alternatives. Algorithms are important; equally important is the interaction with the expert.

The Causal Discovery algorithms are embedded in the XOE and demonstrated in a short video – see the corresponding section. For more detailed information and project support, please contact the Xplain Data team.

Causal Discovery Examples

Examples for target events

Here are a few examples of what “target events” can be and how you can set them.

Healthcare

  • You want to understand why some patients suffer a stroke

  • …or a COVID-19 infection

Manufacturing

  • You want to understand why a specific failure of a technical device occurs

CRM

  • You want to understand why a customer terminates a contract

To set target events

  • Select the corresponding events in the XOE UI (yellow selections).

  • Click the “bulb” icon in the monitor window that primarily defines the target.

Based on your selection, a model configuration will be proposed (the model scope, the target events, the events to be ignored, …). Review this proposal, change the settings where needed, and hit the start button.
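As a purely hypothetical illustration – this is not XOE’s actual configuration format, and all field names are made up – the proposal covers settings of roughly this kind:

```python
# Hypothetical sketch only; field names are invented for illustration.
proposed_config = {
    "model_scope":    "patients with at least one recorded event",  # objects entering the analysis
    "target_events":  ["stroke"],            # the events whose causes we want to understand
    "ignored_events": ["stroke_followup"],   # events excluded as candidate explanatory factors
    # ... further proposed settings, all of which can be reviewed and changed
}
```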

Examples for start events

Start events define the start of an episode within which you want to understand why the selected target events may or may not appear. Here are a few examples of what “start events” can be and how you can set them.

Healthcare

  • Pregnancy: Starting from the point in time when a pregnancy is diagnosed, you want to understand why in some cases the pregnancy ends in toxemia.

  • Within cases of COVID-19, you want to understand why the disease sometimes escalates and ends in hospitalization.

  • Within a therapy episode starting with product A, why do some patients switch to product B? Can this switch be predicted?

Manufacturing

  • At a certain production step, why do some of the manufactured parts fail to pass the test at the end of that step?

CRM

  • Within an inbound call of a customer asking to terminate a contract, how can the service agent manage to retain the customer?
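Conceptually, a start event opens an episode, and the question becomes whether the target event occurs within it. A minimal sketch of this episode logic in plain Python – not XOE code, with made-up event names:

```python
from datetime import date

# Hypothetical event log: (object_id, event_name, date).
events = [
    ("p1", "covid_diagnosis", date(2021, 1, 5)),
    ("p1", "hospitalization", date(2021, 1, 20)),
    ("p2", "covid_diagnosis", date(2021, 2, 1)),
    ("p2", "recovery",        date(2021, 2, 18)),
]

START, TARGET = "covid_diagnosis", "hospitalization"

def episode_label(object_events):
    """Open an episode at the first start event and report whether
    the target event occurs afterwards within that episode."""
    object_events = sorted(object_events, key=lambda e: e[2])
    start = next((e[2] for e in object_events if e[1] == START), None)
    if start is None:
        return None  # no start event, hence no episode for this object
    return any(e[1] == TARGET and e[2] >= start for e in object_events)

by_object = {}
for oid, name, day in events:
    by_object.setdefault(oid, []).append((oid, name, day))

for oid, evts in by_object.items():
    print(oid, episode_label(evts))  # p1 True, p2 False
```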

To set start events

  • Check the corresponding checkbox to enable start events. Based on your current selections (yellow), start events will be proposed. If those are not what you want, click the red selection icon and edit the start events.

  • An alternative and often better way to define the start events is to use the red selection set. If there is a red selection set, and if it is non-empty, the start events will be proposed from there.

Examples for the event space

The event space defines the class of events within which you want to understand why the target events happen. Here are a few examples of what an “event space” can be and how you can set it.

Healthcare

  • Within a class of products, you want to understand why some patients start a therapy with product A.

Manufacturing

  • Within a class of failure events A, B, C, …, why do we first see event A and not one of the others?

CRM

  • Within a class of products, you want to understand why some customers initially choose product A and not one of the others.
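Conceptually, the event space restricts attention to one class of events and asks why a particular member of that class occurs first. A minimal sketch in plain Python – not XOE code, with made-up event names:

```python
# Hypothetical activity log per customer: (event_name, ISO date).
EVENT_SPACE = {"product_A", "product_B", "product_C"}  # the class of events
TARGET = "product_A"                                   # the member we ask about

history = [("newsletter_signup", "2021-01-02"),
           ("product_B",         "2021-02-10"),
           ("product_A",         "2021-03-01")]

# Only events inside the event space compete; everything else is context.
in_space = [e for e in history if e[0] in EVENT_SPACE]
first = min(in_space, key=lambda e: e[1])[0] if in_space else None
print(first == TARGET)  # False: this customer chose product_B first
```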

To set the event space

  • Check the corresponding checkbox to enable the event space. Based on your current (yellow) selections, an event space will be proposed. If it is not what you want, click the purple selection icon and edit the event space.

  • An alternative and often better way to define the event space is to use the purple selection set. If there is a purple selection set, and if it is non-empty, the event space will be proposed from there.

Causal Discovery Whitepaper

Causal Discovery is an active field of research, and our algorithms are continuously being developed.

Guarantees for results cannot be given. More details on the algorithmic approach, however, are available in our whitepaper:

Abstract

In the era of Big Data, we collect tons of observational data. “Observational” means that the data is not collected through controlled or randomized experiments – we simply collect whatever we are able to observe.

Beyond Big Data, the buzz has now moved towards Artificial Intelligence. Interestingly, however, there is hardly any talk about “causality”. How can a system act intelligently to achieve a goal without a notion of cause and effect? Isn’t causality at the heart of Artificial Intelligence?

The reason for this discrepancy is that cause and effect cannot be proven based on purely observational data. It cannot be proven – but does that mean observational data cannot give us any hints on potential cause-and-effect relationships? With all our Big Data, can’t we at least arrive at interesting hypotheses, or get close to a proof?

In this paper, we first discuss the term “causality”. In the context of purely observational data, this term is not well defined. We therefore coin the term “Observational Causality” and thereby largely follow a definition given by Kenny (1979). In short, this definition says: if, for an observed dependency between two variables, we cannot find any indirect explanation in terms of other available information, then we have to assume a direct, potentially causal relationship.
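Expressed in the standard notation of conditional independence – our own paraphrase for illustration, not a formula taken from the whitepaper – the definition reads:

```latex
% X, Y: the two observed variables; Z: all other available information.
% \perp denotes statistical independence.
\[
  X \not\perp Y \;\text{ and }\; X \perp Y \mid Z
  \;\Longrightarrow\; \text{the dependency is (indirectly) explained via } Z
\]
\[
  X \not\perp Y \mid Z \;\text{ for all available } Z
  \;\Longrightarrow\; \text{assume a direct, potentially causal relationship}
\]
```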

According to this definition, we factorize a relationship into “independent explanation components” – those parts of the relationship between two variables which cannot be explained via other variables. Even if some or all of those components turn out to be non-causal, they are worth looking at: with them, myriads of meaningless correlations are reduced to just a few independently contributing factors. Because the relationship is disentangled into independent contributions, those factors can be assessed independently and intuitively by the domain expert, in particular with respect to their potential causal effect.

The concept “cannot be explained via other available information” quickly raises the question: what counts as “other available information” in a specific case? The more we know about the object in the focus of analysis, the stronger the statement, and the closer we can get to genuinely causal relationships. Ideally, we have Big Data in the sense of a great diversity of information. In order to manage diverse information for the object in the focus of analysis, we no longer store data in “flat tables” – we use an object-oriented approach for storage and holistic analysis. This object-oriented analytics turns out to be essential for any approach to Causal Discovery.
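As a purely illustrative contrast – not XOE’s actual storage format, with made-up field names – this is the kind of object-centric record meant here: all information about one patient attached to a single object instead of being spread over several flat tables:

```python
# Hypothetical, simplified "patient" object; field names are invented.
patient = {
    "patient_id": "p1",
    "birth_year": 1954,
    "diagnoses":      [{"code": "I63.9", "date": "2021-01-05"}],  # e.g. stroke
    "prescriptions":  [{"product": "A",  "date": "2020-11-02"}],
    "hospital_stays": [],  # nested event lists instead of separate flat tables
}
```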

Click to get the full whitepaper