2013 Conference of the International Medical Geology Association (25–29 August 2013)

Paper No. 6
Presentation Time: 5:10 PM


KLEINBERG, Samantha, Computer Science, Stevens Institute of Technology, 531 Hudson Street, Hoboken, NJ 07030, samantha.kleinberg@stevens.edu

One of the key problems we face with the accumulation of massive datasets (such as electronic health records and stock market data) is transforming data into actionable knowledge. To use the information gained from analyzing these data to intervene, for example by treating patients or creating new fiscal policies, we need to know that the relationships we have inferred are causal. Further, we need to know the time over which each relationship takes place in order to know when to intervene.

A key barrier to applying causal inference methods to large-scale data from continuously monitored systems is that, given the dense sampling, many important events will seem rare. For example, data from intensive care units (ICUs) or body-worn sensors may be collected multiple times per minute over a period of days to weeks. Within these data there may be periods of time where a system functions according to a consistent set of rules before an event occurs that triggers a change in its underlying behavior. We develop new methods to infer how a system normally functions and determine whether rare events explain deviations from usual behavior.
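As a rough illustration of the general idea only, and not the method presented in this talk, the sketch below estimates "usual behavior" as the mean and standard deviation of a time series away from a rare event, then scores how far the values observed shortly after each event deviate from that baseline. The function name `rare_event_deviation`, the fixed `lag`, and the toy series are all hypothetical choices made for this example, and the sketch assumes the baseline values are not all identical.

```python
from statistics import mean, stdev

def rare_event_deviation(series, event_times, lag=1):
    """Average standardized deviation from baseline, `lag` steps after each event.

    Hypothetical helper for illustration; assumes the baseline has
    nonzero spread and at least two observations.
    """
    # Time points assumed to be influenced by the rare event.
    affected = {t + lag for t in event_times if 0 <= t + lag < len(series)}
    # Everything else is treated as the system's usual behavior.
    baseline = [x for i, x in enumerate(series) if i not in affected]
    mu, sigma = mean(baseline), stdev(baseline)
    # Standardized deviation of each post-event observation from baseline.
    scores = [(series[i] - mu) / sigma for i in sorted(affected)]
    return mean(scores) if scores else 0.0

# Toy series: values near 10, with jumps one step after events at t=3 and t=8.
series = [10, 11, 9, 10, 15, 10, 9, 11, 10, 16, 10, 9]
score = rare_event_deviation(series, event_times=[3, 8], lag=1)
control = rare_event_deviation(series, event_times=[0], lag=1)
```

Here the event occurs only twice, yet `score` is large because both post-event values sit far outside the baseline spread, while `control` (an arbitrary time point) stays near zero. A real analysis would also need to handle confounding, uncertain time lags, and multiple-testing control, none of which this sketch attempts.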

We show that it is feasible to infer complex causal relationships involving rare events from large biomedical datasets with minimal background knowledge. Through application to simulated data, we demonstrate that it is possible to infer both weak and strong rare causes, including those occurring as few as twice, while maintaining low false discovery rates.
