Understanding Causality in Complex Systems via Network Guided Causal Inference

Statistical Methodology; data science; artificial intelligence

This project will develop a novel network-guided causal inference framework to uncover causal relations among entities in complex systems, with applications to a range of physical and human-made systems.

Research Interests
  • data science
  • artificial intelligence
  • causal inference
  • complex systems
  1. Jundong Li
    School of Engineering and Applied Science
  2. Tianxi Li
    College of Arts and Sciences
  3. Haifeng Xu
    School of Engineering and Applied Science

Many physical and human-made complex systems can be modeled as graphs, where nodes represent entities and edges represent their interactions. Examples include biological systems, critical infrastructure systems, and collaboration networks. Many machine learning and statistical modeling approaches have been developed to discover insights about the underlying interaction mechanisms in these systems. These efforts are either predictive (e.g., predicting the functionalities of a protein) or descriptive (e.g., measuring the risk of system failure) in nature. A critical weakness of such approaches is that they often exploit spurious correlations embedded in the data, leading to untrustworthy findings. Understanding the causal relations among interacting entities in complex systems is crucial to mitigating the adverse impact of spurious correlations and delivering more accountable decisions. One timely and important example is an epidemic system, where different types of entities and relations are at play. To make effective policy, government agencies and healthcare providers need to quantify how different interventions (e.g., self-quarantine, school closure) affect infections among individuals.

Despite its importance, discovering causal relations in complex systems is notoriously challenging for the following reasons: (1) The traditional way to learn causal relations is to conduct randomized controlled trials (RCTs), which rule out the influence of other factors. However, RCTs are often expensive and sometimes unethical to perform in practice. (2) An alternative to RCTs is to use observational data. Nonetheless, existing studies often assume that observational data are independent and identically distributed (i.i.d.), whereas entities in complex systems are often connected in one way or another, and this entanglement makes causal learning difficult. (3) Many complex systems are naturally dynamic; typical examples include interactions among individuals during the spread of an epidemic and connections among users in a social network. How to capture the dynamics of such systems for causal understanding remains another challenge.

Given the above, we aim to develop a novel network-guided causal inference framework to facilitate the learning of causal relationships among entities in various physical and human-made complex systems. Our team is well-positioned for this task, with expertise spanning graph mining, causal inference, statistical analysis, and optimization. Our proposed research is organized into the following three thrusts: (1) we will develop novel causal inference models to quantify the causal relations for a pair or a collection of interacting entities; (2) we will develop novel causal discovery models to identify potential causal relations between different entities and understand how they influence each other; (3) we will develop new approaches to shed light on how causal relations evolve in a dynamic system. The proposed research aims to bridge the knowledge gap between what we have (i.e., a large amount of observational data on complex systems across various high-impact domains) and what we need (i.e., more accountable and trustworthy insights into complex systems to support rational decision-making).

Desired outcomes

1. We will collect observational data from different complex systems (e.g., biological systems, critical infrastructure systems, social media networks) or create semi-synthetic datasets by simulating the counterfactual outcomes.

2. We will develop novel network-guided frameworks to eliminate the influence of hidden confounders and provide unbiased estimates of causal effects among interacting entities in complex systems.

3. We will develop novel network-guided frameworks to identify potential causal relations between different entities and understand how they influence each other.

4. We will investigate how the causal relations among entities evolve over time in a dynamic system and develop novel frameworks based on time series modeling.

5. We will develop open-source causal inference packages and integrate the frameworks we develop into them.
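
As an illustration of the semi-synthetic datasets mentioned in outcome 1, the sketch below simulates both potential outcomes for every node in a small network, so the true causal effect is known by construction. It is a minimal example under stated assumptions, not the project's method: the ring network, the linear outcome model, the constant effect of 1.0, and the function name `simulate_semi_synthetic` are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def simulate_semi_synthetic(n=2000, effect=1.0, seed=0):
    """Simulate treatments and both potential outcomes on a toy network.

    Everything here is an illustrative assumption: a ring network,
    one hidden confounder per node, and a constant treatment effect.
    """
    rng = random.Random(seed)
    # Hypothetical ring network: node i is linked to nodes i-1 and i+1.
    neighbors = [((i - 1) % n, (i + 1) % n) for i in range(n)]
    # Hidden confounder that drives both treatment and outcome.
    u = [rng.gauss(0, 1) for _ in range(n)]
    # Nodes with a high confounder value are more likely to be treated.
    t = [1 if u[i] + rng.gauss(0, 1) > 0 else 0 for i in range(n)]
    # Untreated outcome depends on the node's own confounder and,
    # through the network, on its neighbors' confounders.
    y0 = [
        2.0 * u[i] + 0.5 * mean(u[j] for j in neighbors[i]) + rng.gauss(0, 0.1)
        for i in range(n)
    ]
    # Counterfactual outcome under treatment: a constant shift.
    y1 = [y + effect for y in y0]
    # Only one potential outcome per node is observed in practice.
    observed = [y1[i] if t[i] else y0[i] for i in range(n)]
    return t, y0, y1, observed


t, y0, y1, observed = simulate_semi_synthetic()
# Ground-truth average treatment effect, available only because we simulated.
true_ate = mean(a - b for a, b in zip(y1, y0))
# Naive comparison of treated and untreated nodes, ignoring the confounder.
naive = mean(y for y, ti in zip(observed, t) if ti) - mean(
    y for y, ti in zip(observed, t) if not ti
)
print(f"true ATE: {true_ate:.2f}, naive estimate: {naive:.2f}")
```

Because treated nodes tend to have higher confounder values, the naive difference in means overstates the simulated effect. This is the kind of spurious correlation that a causal estimator evaluated on such data would need to remove.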