Monday, November 23, 2015


Last week, the Young Academy, a group of young scientists affiliated with the Royal Netherlands Academy of Arts and Sciences, published a small booklet about multidisciplinary and interdisciplinary science (only in Dutch, link to pdf).

Multidisciplinary science works by approaching a problem from multiple disciplines, each of which offers a different yet complementary perspective and method. Together, these perspectives offer a deeper or more complete insight. Interdisciplinary science, on the other hand, revolves around questions that cannot, even in principle, be approached from a single discipline: they exist only on the edge between two or more disciplines. So it's not just the approach that is new; the scientific problem itself is also 'invented'.

Interdisciplinary problems can be tantalizing. I personally am motivated by interdisciplinary questions like "Can we build a little system that functions like the human body? And if we can, does that mean that this engineered little system is alive? If not, why not? How much integration between technology and biology is needed before the integrated whole can be considered to be a living thing?"

The answers to such questions can only be given in a language that is not fully familiar to a biologist, a philosopher or an engineer. They will be written in a new, interdisciplinary language.

Learning to speak this language is both a risk and an opportunity. Yes, it will allow you to describe new phenomena, but the risk is that your message will resonate with only very few people. Being an interdisciplinary scientist means you'll have to abandon at least a part of your 'home discipline', along with the comfort, proven methods and scientific depth that all stem from its tradition and its community. You run the risk of alienating yourself from your colleagues and of trading the well-accepted challenges in your field for questions that appear interesting, but may very well turn out to be shallow or impossible to solve.

Still, interdisciplinary science is definitely worth it. After all, who would not be excited to learn answers to questions that they didn't even know existed in the first place?

Saturday, June 6, 2015

Procedural Generation

The video above is not an animation or a film. It's a 'demo', which means that the entire visual and musical experience is not drawn and recorded in advance by artists, but instead it is generated by a computer in real-time. In fact, if you have a Windows desktop or laptop, you can download the demo here (download link on the right) and then have it generate the exact same experience just for you, on your own computer, in real-time.

If you have downloaded the demo, you'll have noticed that the file size is ridiculously small: the whole demo is only 58 kilobytes. As a reference, the Youtube movie above - which is actually a video recording of the demo - is 14.8 megabytes. And that's just the non-HD version. This means that the demo is more than 250 times smaller than the recorded movie.
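The size comparison is easy to verify with a quick back-of-the-envelope calculation (treating 1 megabyte as 1024 kilobytes):

```python
demo_size_kb = 58
video_size_mb = 14.8

# how many times larger is the video recording than the demo itself?
ratio = (video_size_mb * 1024) / demo_size_kb
print(round(ratio))
```

With these figures the ratio comes out to roughly 261, so 'more than 250 times smaller' checks out.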

Does this mean that we can just turn any movie into a demo and then watch hours and hours of movies on our smartphones without ever exceeding the limit of our data plan? Obviously, no.

The reason that a demo is so much smaller than a video is because it doesn't actually contain any images or sounds. It just contains lines of code that instruct your computer to generate specific content on the fly. This may sound foreign, but everyone who has ever read a book is actually very familiar with the concept: the book contains no images or sounds, just strings of words. Yet when we read those words, they evoke very particular images in our minds. With minimal information, we manage to create our own experience. Just look at the 108 bytes of words (almost 500 times less than the 58 kilobytes of the demo!) that make up the first line of Hemingway's The Short Happy Life of Francis Macomber.
"It was now lunch time and they were all sitting under the double green fly of the dining tent pretending that nothing had happened."
Definitely the start of a real experience; and all of it only exists in our minds.

This way of producing content or user experiences is called 'procedural generation'. By giving minimal instructions to a complex content generator, the most amazing experiences can be produced - regardless of whether the content generator is a computer with a graphics card and a monitor, or a person with a very lively imagination.
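As a toy illustration of the idea - a hypothetical sketch, not how real demos are built (those typically rely on GPU shaders and hand-tuned synthesizers) - a few lines of Python can deterministically 'unfold' a seed into a whole terrain profile using midpoint displacement:

```python
import random

def generate_terrain(seed, iterations=8, roughness=0.5):
    """Procedurally generate a 1D height profile from a seed.

    The 'instructions' are tiny (a seed and two parameters), yet the
    output doubles in size with every iteration - the same idea, in
    miniature, that lets a 58-kilobyte demo produce minutes of video.
    """
    rng = random.Random(seed)      # deterministic: same seed, same terrain
    heights = [0.0, 0.0]           # start with a flat line
    displacement = 1.0
    for _ in range(iterations):
        new_heights = []
        for left, right in zip(heights, heights[1:]):
            mid = (left + right) / 2 + rng.uniform(-displacement, displacement)
            new_heights += [left, mid]
        new_heights.append(heights[-1])
        heights = new_heights
        displacement *= roughness  # finer detail at each scale
    return heights

terrain = generate_terrain(seed=42)
print(len(terrain))   # 257 points, generated from a handful of bytes of input
```

Because the generator is seeded, anyone running it with the same seed gets the exact same 'landscape' - just like everyone downloading the demo gets the exact same experience.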

Another example is Terragen, a tool that takes user instructions to generate realistic-looking landscapes and sceneries. None of it is real, none of it is drawn by artists. It is all generated procedurally. Yet the results are truly stunning.

Landscape, procedurally generated in Terragen. Image by Robert Choronzuk.

Procedural generation is used for creating art, but can also be an important method in other fields, such as biology. Simple instructions in living beings can lead to complex results - just look at embryonic development!

For procedural generation to be applied in biology, we need to develop tools that help us generate biologically instructive signals. Luckily, we are already making great progress in doing so. From the design of instructive DNA, RNA and proteins in synthetic biology, to the engineering of brain-stimulating probes in cognitive science: all these methods can be used to generate biological systems with a desired form and function. It's an exciting time to be working in bioengineering!

Monday, May 4, 2015

Can Google Fit predict a heart attack?

I just installed Google Fit. It was free, and it took less than a minute to get it up and running. Now I know that I took 2,562 steps while thinking about this post. (I also know that it took me 26 minutes, which translates to 100 steps per minute, or a measly 4.8 km/h.) This data was extremely cheap - it didn't cost me any money, and gathering it only took a minute of my time.

There's more health-related data that is very cheap. Think about my current weight, or my current blood pressure and heart rate - the equipment to measure these things is easy to come by, and gathering the information is fast and cheap.

Blood pressure, heart rate, exercise. How is this data related to health, exactly? If I keep tracking my steps with Google Fit, will I eventually be able to accurately predict whether or not I'll get a heart attack this week?

Of course not. I'll be able to make very vague statements at best, because the measured data is so remote from the endpoint of interest. A man my age has a 0.0012% chance of getting a heart attack this week. I'll adjust that for the extremely healthy 2,562 steps I took just now. That makes my chance of getting a heart attack this week 0.0011%. Since I have no more information to adjust this prediction, there's nothing I can do but roll a 90,909-sided die and see if this is my unlucky week!
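The arithmetic behind that die roll is straightforward; here is the calculation with the (illustrative) numbers from the text:

```python
import random

baseline_weekly_risk = 0.0012 / 100      # 0.0012% chance, as a fraction
exercise_multiplier = 0.0011 / 0.0012    # the tiny adjustment for today's steps
adjusted_risk = baseline_weekly_risk * exercise_multiplier

# the number of sides the 'die' would need
sides = round(1 / adjusted_risk)
print(sides)   # ~90,909 sides

# one roll of the die for this week
unlucky = random.random() < adjusted_risk
```

The point being: no amount of step counting changes the fact that the prediction is just a slightly re-weighted coin flip with very long odds.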

I can analyze all the cheap health-related data I want - weight, blood pressure, broccoli intake, local air pollution index, beers consumed - but I'll never be able to make that prediction accurately. Parameters about weight, exercise and diet may all have some predictive value for the health status of my body, but in the end, that predictive value is very modest. In that respect, the low cost of cheap health-related data comes at a price.

So what about more expensive data? What if I would undergo all sorts of blood tests and advanced body scans? This data is a lot more valuable, because it is 'high level': it makes a very clear, direct statement about the health status of my body. If a scan finds major unstable plaques in my coronary arteries, then this state is very intimately linked to the health problem that I'm interested in. There's only one step from the unstable plaque to a heart attack; all it takes is one rupture. Because of this high-level information, I would be able to make a far more accurate prediction of the risk of a heart attack.

So is this trade-off between price and value of health-related data a given? Is cheap data always of little value, and does valuable data always have to cost a lot?

Not necessarily. There are ways to produce cheap, yet valuable, health-related data.

First of all, biotechnological progress is continuously lowering the cost of health-related data. Whole-genome sequencing, microbiome analysis, wearable biosensors, data storage and analysis: all of this was formerly considered to be extremely expensive, but is becoming progressively cheaper.

Second of all, we should work hard to make our cheap data more valuable. This can be done by integrating all the cheap, lower-level data in ways that do justice to the structure of the human body - for example, exercise may in some cases be important to reduce the risk of a heart attack, but it may have no effect in people with specific genetic disorders and resulting cardiovascular malformations. A lot of value can be added to cheap data by understanding the systems biology, the causal relationships - the chains of effect, the feedback loops.

Finally, it seems that in-depth analysis of expensive, higher-level data can actually reveal lower-level surrogate parameters that are both valuable and cheap. For example, I was one of the researchers in a study in which we analyzed thrombosis with a very sophisticated model of a stenotic coronary artery - realistic flow patterns, real human blood, actual human tissue from vascular walls, pumps, fluorescence microscopy. In short: a very expensive way to generate health-related data. But what we found was that most of the thrombotic effect - the major pathophysiological end-point - was simply due to a single protein. This protein may be the key parameter of the process, and measuring its levels in human blood may be enough to predict thrombotic effects. Whether this is indeed the case remains to be seen, but it would translate the very expensive set of high-level data into a cheap lower-level parameter without sacrificing too much predictive value.

Developing health-related data that is both cheap and valuable is hard work. It requires you to thoroughly understand the system, to uncover causal effects and identify key parameters. My current Google Fit data is pretty much useless. However, we would be able to squeeze every drop of value from the data once we understand how my 2,562 steps fit into the complex web of relations and feedback loops that determines the dynamic health status of my body. Ultimately, we should strive for a level of understanding of the system that allows us to know whether the 2,562 steps will have nudged the occurrence of a heart attack from this week to the next. But even if we never get to that level, there's a lot of valuable information to be discovered along the way.

Wednesday, April 29, 2015

Big Data, Human Disease Models and Clinical Practice

We live in the age of big data: analysis of large databases of patient information, genome-wide association studies, large double-blind clinical trials to test for drug effects, and so on. All this big data generates real and useful information on health and disease. But how does it translate back to the individual patient?

Basically, the information that big data gives us can be translated into 'multipliers' for individual patients. Let's say that there's a baseline risk of you getting a heart attack in the next year. This baseline can be calculated by taking the average number of heart attacks per year, divided by the total number of people. For the Netherlands, this is approximately 0.25%. But it's immediately clear that this number is useless for the individual patient: it makes a huge difference whether you're 45 years old or 75 years old. In the former case you have only a 0.03% chance, and in the latter approximately a 1% chance, of dying from a heart attack in the coming year. The multipliers would be roughly 0.12 and 4. These multipliers are easily deduced from big data.
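With the rounded figures quoted above, deriving an age multiplier is a one-line division (group risk over baseline risk):

```python
baseline = 0.25   # % yearly heart-attack mortality, whole population (NL)
group_risk = {"45-year-old": 0.03, "75-year-old": 1.0}   # % per year

# multiplier = how much this group's risk deviates from the baseline
multipliers = {group: risk / baseline for group, risk in group_risk.items()}
for group, m in multipliers.items():
    print(f"{group}: multiplier {m:.2f}")
```

With these inputs the multipliers come out to roughly 0.12 and 4.0; different (more precise) input figures would of course shift them slightly.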

Some multipliers - like age - are hard-coded. Others can be modified. Both are important for a clinician when talking to an individual patient. The modifiable risk factors should be kept as low as possible, while the non-modifiable risk factors can be used to decide on potential treatments. Operating on a frail 75-year-old is probably going to do more harm than good; operating on an otherwise healthy 45-year-old will probably be worth it.

There are lots of multipliers that can be deduced from big data. What is the multiplier for having high cholesterol? For smoking? For eating lots of meat? For being overweight? All of these 'risk factors' will re-calibrate the baseline risk to a risk that is more tailored to the situation of the individual patient.

And big data from these epidemiological studies is not the only thing. What about genetics? Genome wide association studies may give specific gene variants that give an additional multiplier, thereby further tailoring the risk of dying from a heart attack to your situation.

There is data from drug studies that can be used to inform decisions about whether drugs will help modify the multiplier. (For preventing heart disease in people with no history, the answer currently is 'no'. But maybe we'll identify subgroups for whom a preventive treatment would work!)

So basically, if we pool all multipliers from big data together, we'll get an equation that looks something like this: 0.25% (baseline) * 1.4 (age) * 0.82 (cholesterol) * 1.1 (lifestyle) * 1.08 (BMI) * 0.987 (SNPs) * ... etc. etc. = 0.354% chance that you'll die of a heart attack in the coming year.
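Pooling the multipliers is just taking a running product. With the five factors actually named in the equation (the trailing 'etc. etc.' factors account for the rest of the 0.354% total), the calculation looks like this:

```python
baseline = 0.25   # % baseline yearly risk
multipliers = {
    "age": 1.4,
    "cholesterol": 0.82,
    "lifestyle": 1.1,
    "BMI": 1.08,
    "SNPs": 0.987,
}

# aggregate risk = baseline times the product of all multipliers
risk = baseline
for factor, m in multipliers.items():
    risk *= m

print(f"{risk:.3f}%")   # ~0.337% from these five factors alone
```

This is exactly the 'sum of multipliers' model the rest of the post pushes back against: a bare product, with no interactions between factors.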

But what have we learned? We have learned that we should try and lower all multipliers as much as possible, and potentially develop new treatments that yield risk-reducing multipliers. It's great for reducing prevalence of symptomatic heart disease in the general population. But what about the individual patient?

In clinical practice, every patient is evaluated as an independent being. Let's say that two patients are brought in with a heart attack. One of them has an aggregate multiplier of 0.84, and one has an aggregate multiplier of 1.64. The doctor will look at them both and tell them that one of them was 'unlucky' and the heart attack of the other one was 'to be expected'. And that's basically all they can do, based on this epidemiological information. But of course, the most important thing in treating patients would be to be able to deduce what treatment will be effective, by using information from big data as a guide.

But that's where it gets tricky. Big data can definitely provide this information when the multipliers are large. Someone with high cholesterol is in a high-risk group and should definitely be treated to lower cholesterol levels. Someone with 80% stenosis in their carotid artery and previous symptoms of stroke should definitely be operated on to get rid of the stenosis. But it becomes complicated quickly. What should we do with a 78-year-old, BMI 20, cholesterol 222, 40% stenosis, with suspect neurological complications? Should we operate, or not? How will big data help us? We would need to filter out the group of 75-80-year-olds with BMI 18-22, cholesterol 220-230 and stenosis 35-45%, and see if the risk of dying is reduced or increased if we operate. The problem is that by that point the subgroup has become so small that there is not enough data left: the variation due to unknown factors is too big relative to the group size to draw any statistically sound conclusion. Big data really only has 'strength in numbers'.
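To see how quickly the numbers collapse, here is a sketch with an assumed national-scale cohort and made-up fractions for how many patients pass each filter (the fractions are illustrative, not epidemiological data):

```python
cohort = 1_000_000    # hypothetical starting cohort
filters = {           # assumed fraction of patients passing each filter
    "age 75-80": 0.03,
    "BMI 18-22": 0.15,
    "cholesterol 220-230": 0.10,
    "stenosis 35-45%": 0.02,
}

for name, fraction in filters.items():
    cohort = round(cohort * fraction)
    print(f"after '{name}': {cohort:,} patients left")
```

Four perfectly reasonable filters, and a million patients shrink to a handful - far too few to compare operated versus non-operated outcomes with any statistical power.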

We need the 'strength in numbers' because we are being agnostic about causal effects. With big data, we're mainly interested in correlation: finding a gene 'for' heart disease, finding a diet 'against' heart disease. We're not interested in how it works. And we make up for that lack of knowledge - all the variation due to unknown factors from different length and time scales (behavior, environment, upbringing, general stress, hormone levels, allergies) - by 'averaging them out'. It's a great way to get information. But at some point - when we try to use it to inform real treatment of single individuals - it will begin to fall apart.

An individual is not the sum of their multipliers. We shouldn't think that big data is going to solve everything, because we can only keep defining sub-groups and sub-sub-sub-groups for so long. At some point, you're going to end up with a sub-sub-sub-sub-group that consists of only one patient, and then there is no longer any strength in numbers. You want to predict the outcome for an individual patient, yet, because of their uniqueness, you don't have any big data to inform your decisions.

The only way out of this situation is to supplement big data with system-level understanding of individual patients. How do all multipliers and risk factors interact to give a specific risk profile? If we can go beyond simple addition of multipliers, and instead understand the causal relationships between those multipliers, we can make predictions about treating a single patient. If we understand how old age, hypertension and high cholesterol interact, if we understand how common gene variants fit into the pathophysiological picture of the patient, we can use mathematical models to inform treatment of that specific patient.

In his book The Formula, Luke Dormehl argues that the only way to avoid the type of black box algorithm that 'big data' uses is to keep asking 'why'. Why is this product or treatment recommended for me? Why am I at increased risk for heart disease? Only by asking 'why', can we finally learn and find underlying patterns and causal relationships and avoid blindly following the statistical algorithms.

The amount of data that we can ever collect on health and lifestyle is undeniably finite, because the number of people on earth is finite. This means that after predicting broad trends, after identifying broad categories, and after categorizing and further sub-categorizing those trends and categories, algorithms will no longer have the 'big data' to make further predictions about unique individuals or small groups of people. That's where we'll have to rely on research that uncovers causal relationships again.

The nice thing is that we'll probably be able to learn a lot of new mechanisms of disease from the correlating relationships that the current algorithms - and probably the algorithms for a few decades to come - will manage to uncover. We'll just have to realize that at some point, the current, algorithm-based tools of personalized medicine, or precision medicine, will have their limits. There's no ultimate truth in big data.

Big data should be complemented by principles and tools from experimental medicine and systems medicine to finally understand the individual patient. We should design experimental tools that allow us to use patient-specific cells and tissues, reproduce patient-specific conditions and then test hypotheses about how all the multipliers interact in cause-effect relationships to finally give a multiplier that is either zero or infinity. Something happens, or it doesn't. In that respect, Claude Bernard was right when he dismissed the use of statistics in experimental medicine: "If based on statistics, medicine can never be anything but a conjectural science."

The Stream of Big Data in Medicine

We are entering the age of big biometric data. Like I mentioned in my previous post, 'precision medicine' relies on generating huge amounts of information about individuals in order to determine the optimal strategy for disease treatment and prevention.

Advances in sensor technology and data management allow companies and hospitals to build comprehensive patient profiles, based on gene sequencing, blood pressure and heart rate monitoring and all sorts of other biometric information.

But how does this huge stream of big data trickle back to the individual patient? The concept is simple: by analyzing the data of many individuals, broad 'profiles' can be defined. If I would like to know what specific treatment regime to seek for my migraines, all I have to do is go through the data and find how other Caucasian, middle-aged men with similar relevant biometric parameters have responded to various drugs. The treatment that worked best for people like me will likely work for me, too. This is the great promise of precision medicine, and it is only just beginning to be unleashed.

Of course, it's clear that there's a fundamental problem with precision medicine. Let's say I have a relatively rare disorder, which means that the pool of big data for me to fish in is relatively small: there are just not that many people that have this disease. How many people of that disease pool are 'like me'? How many fit the profile of middle-aged Caucasian male, with a set of specific risk factors based on genetics and lifestyle? The stream of big data inevitably turns into a tiny pool of data with not a lot of statistical power.

This is not just a problem for people with rare disorders. In the end, we're all part of a tiny sub-sub-subgroup with a size of one individual. That means that after we've used big data to see the broad patterns of specific risk factors and strong correlations in strategies for treatment and prevention, we can't zoom in any further without things getting fuzzy and uncertain. It's simple statistics.

Just to be clear: I think the potential of precision medicine is enormous; we're only just beginning to look at the broad picture and what we'll find in there will most likely have a very significant impact on medicine. Still, it's clear that the approach has its limits.

So how will we deal with this inherent problem of big data analysis? The answer lies in realizing that it's not the nature of the data itself that is the problem. It's what we do with it. The current approach to using biometric data is to analyze it with statistics: if a patient has this set of parameters, he's 4.2 times more likely to benefit from treatment X instead of treatment Y.

But there are different ways to analyze the data. The data doesn't have to be passive and bulky. It can also be used to serve as specific inspiration in experimental medicine and to find causal links between all risk factors, genetic information and other biometric parameters. If we understand why something is a risk factor, we can begin to move from vague statements about patient groups to more concrete ideas about individuals. In order to do this, we need to understand how all factors fit together. All the inputs, outputs and internal wiring of the system that we call a 'human'.

This way of analyzing big data is exactly what the field of Systems Medicine is all about. Remember the term: you'll be hearing a lot more about it in the future.

Monday, February 16, 2015

Precision Medicine versus Personalized Medicine

In the 2015 State of the Union, President Obama announced that he would financially support a Precision Medicine Initiative: an initiative that aims to give clinicians more tools to tailor their treatment to each individual patient, instead of having to rely on the concept of 'the average patient'.

The White House has posted some details about the initiative here. The biggest chunk of money (130 million dollars) goes to the creation of a 'voluntary national research cohort' of at least one million volunteers. All sorts of parameters and data related to health and lifestyle of these volunteers will be collected: genetic background, diet, microbiome, blood pressure, level and intensity of exercise, type and stage of disease, response to treatment. Based on this huge set of 'big data', one can analyze which specific subsets of this cohort are at high risk for certain diseases and which subsets would be most responsive to particular treatments and interventions.

The fact that Obama refers to this initiative as 'precision medicine' is a clear indication that this term is becoming increasingly popular to denote the concept that is more widely known as 'personalized medicine'. I think this is a good thing: 'precision medicine' is a better and more accurate term than 'personalized medicine'. There are two reasons for this.

First of all, as Robert Plenge also points out in a recent blog entry, clinicians rightfully balk at the 'breakthrough' of medicine finally becoming 'personalized'. Every patient in clinical practice is already treated as a person. Every diagnostic test, every treatment and preventive strategy is 'personalized' and prescribed to individual patients. Clinicians work hard every day to value their patients as the individuals they are, to listen to them, to examine them and to finally explain to them what is wrong and what can be done about it. Perhaps medicine is not perfect and precise, but it sure is personalized.

Related to this concern of clinicians - that the term 'personalized' is being hijacked to describe something better defined as 'precise' - is the fact that the term is also a bit misleading. It seems to suggest that if we can collect and quantify enough parameters related to a person's health and lifestyle - well enough to select the best medical prevention strategy and treatment for that person - then we have somehow managed to fully define that person in the clinical context. What more would there be to know, once we have quantified not only every genetic and physiological parameter, but also every relevant behavioral, psychological and cognitive parameter? No further information could change the way we define this patient and their treatment. At this point, it becomes tempting to think that the measured parameters don't just apply to the patient: the patient is defined by the parameters. The patient is nothing more than a very comprehensive set of quantifiable parameters.

The result of defining a person as a set of parameters is what Gilles Deleuze describes as a 'dividual'. The person is nothing more than a collection of quantified values, strung together by a formula. How various parties in society deal with this 'dividual' depends on the outcome of automated algorithms. Of course, this is exactly how Google and Facebook manage to design their 'personalized ads'. By quantifying and measuring every aspect of our online presence, we are defined as a set of parameters and treated accordingly: lured to specific websites, offered 'personalized' pricing for products.

One can imagine similar trends in health care, where - based on the formula - a person is given 'personalized' rates for insurance, 'personalized' prices for fatty foods and gym memberships. In order to prevent heart disease or respiratory disease in children at risk, 'personalized' incentives would be given to some to be active and play outside while their friends would receive 'personalized' incentives to stay at home with 'personalized' air-purifying systems.

Of course, precision medicine is not - or at least should not be - about this re-defining of patients as sets of parameters and about 'personalizing' health care through subtle control mechanisms. It's about making medicine more precise. It's about improving medical science. Its focus is on the craft of medicine, not on how to define patients. In order to communicate this clearly and unequivocally to society, it's a good idea to steer clear of the loaded term 'personalized'.

Precision medicine is a term that more accurately defines what we as biomedical scientists have in mind when we think about using detailed patient data to discover and apply new ways to prevent and treat disease. Using the term 'personalized medicine' to describe this scientific discipline is inaccurate and may lead to confusion about the long-term ambition of the scientists involved.

Saturday, February 14, 2015


As scientists, we're trained to be independent thinkers. First and foremost, this means that we're skeptical - we won't accept stories and theories without proper proof. We use skepticism to polish the ideas and scientific stories produced by ourselves and others. We take a fluffy little nugget of knowledge, and brush it abrasively until only the undeniable fact or testable theory remains.

Skepticism is relatively easy to apply. After all, we are the experts, and we can immediately question every new idea or theory: point out errors in reasoning, poke holes in the argument, or attack the methods and the interpretation of results. This is especially true if the new ideas are fresh, ambitious, interdisciplinary, or 'soft' - as in the humanities and sometimes in the social sciences.

Because skepticism is so easy - especially when applied to the ideas of others - we tend to ignore the fact that being skeptical is only part of what constitutes independent thinking in science. If all we ever did was be skeptical, the progress of science would quickly grind to a halt.

The thing is, science is also a creative endeavour. Independent thinking also means that we can come up with interdisciplinary ideas, new theories and fresh interpretations of existing knowledge. It means combining, mixing and building on other work. Being an independent thinker means you should not just be skeptical of new ideas, it also means that you should be able to come up with new ideas in the first place!

Being creative and constructive is a lot more difficult than being skeptical. To be creative, you need to open your mind a little and give room to imagination and dreams. This comes naturally to some, but needs to be encouraged in others. Either way, it's important to make it a part of our education.

This is exactly what Prof. Jeroen Geurts, president of the Netherlands Young Academy of Arts and Sciences stresses in a recent interview (in Dutch) with Vrij Nederland magazine:

I'm a bit disappointed that in The Netherlands, we're particularly good at criticizing things, at identifying potential problems. I'm put off by being critical just for the sake of being critical. That's also what I tell my students when they ask each other questions: is this really something you'd like to know? Or are you just trying to act all critical? More often than not it turns out that they were objecting just for the sake of arguing. "That's science, isn't it?" is what they'll say. "Being critical?" My answer is: "You're only allowed to be critical after you've allowed yourself to dream." Because that is how it all starts: to dream, to set a goal on the horizon. To have great ideals: to cure MS, to rid the world of Alzheimer's. Hypotheses should be nurtured first. You'll suffocate everything by immediately asking: "Is that even possible?"