An earlier version was published at 3 Quarks Daily.
______
Since 2015, as part of research that has involved many students, I’ve worked with a dataset called the Medical Expenditure Panel Survey (MEPS). The survey is conducted annually by the U.S. Agency for Healthcare Research and Quality (AHRQ). Thousands of households across the United States are sampled each year to represent the national population, and members of each household are compensated for the time they spend completing lengthy questionnaires. It is tedious, painstaking work that involves many hours of interaction between the staff who conduct the survey and members of each household. Once a survey round is completed, it takes years to carefully anonymize the data, organize it into different file types, and release it to the public. That is why, as of today, the data files only go up to 2023; presumably, the 2024 files will be released next year. But with all the cuts to programs at Health and Human Services (HHS), I wonder what the future of MEPS and other similarly important national surveys, like the CDC’s National Health and Nutrition Examination Survey (NHANES), will be.

As the title suggests, MEPS focuses on healthcare expenses – whether out of pocket or paid by insurance. To track expenses, the survey records the medical events a person experiences in a year. The figure above, for instance, visualizes the timeline of medical events for a 69-year-old woman in 2011. The timeline is interspersed with home health and office-based events, with one hospitalization in May 2011 and an emergency room (ER) visit in November. Home health events refer to care received at home; office-based events refer to typical doctor’s appointments: seeing a family physician, a cardiologist, an endocrinologist, a neurologist, a dermatologist, and so on. Each medical event also has diagnosis codes, so the underlying disease(s) (a viral infection, diabetes, osteoarthritis, breast cancer) someone had in the year of the survey can be inferred. And by using the prescribed medicines file – by far the largest file download on MEPS – you can even know the precise set of medications someone was taking that year.
This is what I meant when I said the survey is tedious and painstaking work: imagine how many questions must be asked and answered to gather all these medical (and very personal) details. The upside is that anyone with the patience to go through the documentation can use MEPS to construct a detailed portrait of the types of health conditions, medical events, and expenses in the country. A quick Google Scholar search reveals that over 500,000 studies have used MEPS since the survey started in 1996, on topics ranging from “national trends in aspirin use” to the relationship between “precarious employment and mental health”.
Multimorbidity and Its Causes
I’ve also used MEPS for different research questions, and data extracts from the survey are often part of computational assignments in my probability courses. In recent years, however, I’ve focused on patterns in multimorbidity, a clinical (and vaguely intimidating) term that refers to the presence of two or more chronic diseases in an individual. Multimorbidity rates have been rising globally. In 2017, researchers at the RAND Corporation estimated (using MEPS data from 2014) that 58% of the US population has either no chronic condition or just one. The remaining 42%, however, have two or more chronic conditions. That’s more than 130 million Americans – an astonishingly high number!
But given the advances in medicine, maybe we shouldn’t be so surprised. Multimorbidity can be attributed to the increased life expectancy we’ve enjoyed since the 1950s: the longer we live, the more diseases we can expect to have. And with the widespread availability of diagnostics – blood tests that evaluate dozens of biomarkers, MRIs, CT scans, X-rays, and, more recently, wearable devices that constantly monitor our vital signs – there’s a good chance that a “hidden” condition will be detected and treated early. Some would argue, perhaps correctly, that we are surrounded by a massive Medical-Pharmaceutical complex, whose financial interests drive the early detection of diseases. All it takes is for two biomarkers to be out of range, and you are on your way to multimorbidity. I’ve found this to be true for myself: my cholesterol levels are often above or below the recommended cutoffs, and my thyroid-stimulating hormone is always more than it should be.
In the last few decades, we’ve also witnessed some profound societal changes: how the food we eat is grown, transported, and processed to ensure a longer shelf-life and easy access to more calories than we need; how much more glued we are to our devices and therefore more sedentary; and how we are now exposed to a range of environmental pollutants, from pesticides in the water to microplastics just about everywhere.
Some of these changes have noticeably impacted our health. The uptick in anxiety and depression among adolescents since the 2010s – the subject of Jonathan Haidt’s book The Anxious Generation – is one example. The increased prevalence of Type 2 Diabetes is possibly the clearest marker of these broader global trends. Although known and diagnosed in many ancient cultures (Egypt, Greece, and India) and long considered a disease of the wealthy, in recent decades its incidence has gone up around the world. And because diabetes impacts a host of other organs — the heart, the kidneys, the eyes — it is very much a gateway to multimorbidity.
The Combinatorics of Multimorbidity
In MEPS, multimorbidity can be inferred by analyzing the diagnosis codes that an individual collects in the year they are surveyed. This takes us back to the timeline of medical events that we saw in the figure earlier. By putting together diagnosis codes related to medical events, we can create a list of conditions for that year — what I will call an individual’s disease combination. (One caveat is that we do not know anything about how long these conditions were present, nor do we know anything about whether they were resolved in the following years.)
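To make the idea concrete, here is a minimal Python sketch of how a year of medical events could be collapsed into one disease combination per person. The record layout, the person identifiers, and the function name `disease_combinations` are all hypothetical; actual MEPS files use their own identifiers and coded condition categories, and this is not the code used in our research.

```python
from collections import defaultdict

# Hypothetical event records: (person_id, event_type, condition).
# These rows are made up for illustration and are not MEPS data.
events = [
    ("P1", "office", "Diabetes"),
    ("P1", "hospital", "High Blood Pressure"),
    ("P1", "office", "Diabetes"),  # repeat visits add no new condition
    ("P2", "er", "Asthma"),
    ("P2", "office", "Depression"),
    ("P3", "home_health", "Joint Disorders"),
]


def disease_combinations(records):
    """Collapse event-level records into one sorted condition tuple per person."""
    conditions = defaultdict(set)
    for person_id, _event_type, condition in records:
        conditions[person_id].add(condition)
    # Sorting gives each combination a canonical form, so two people with
    # the same conditions end up with identical tuples.
    return {pid: tuple(sorted(conds)) for pid, conds in conditions.items()}


combos = disease_combinations(events)
for person_id, combo in combos.items():
    print(person_id, "; ".join(combo))
```

Storing each combination as a sorted tuple is a design choice for this sketch: it makes combinations directly comparable and usable as dictionary keys when counting them later.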
The diversity of disease combinations among the surveyed individuals is startling. Here, for example, are the disease combinations for three individuals from the survey. Each disease name is separated by a semicolon. Some disease names in the list may sound familiar, while others may sound strange, but for our purposes here, the specifics are not essential. Lipid Metabolism Disorders, for instance, is the fancy medical term for abnormal cholesterol levels.

The broader point is that while the three individuals share subsets of diseases – in fact, the last two have four diseases in common – no two of them share the same combination. This feature, where someone’s list of diseases becomes their unique signature, is simply a matter of combinatorics. Since there are hundreds of diseases, the specific list of five that one person has in a year is unlikely to match exactly with the five that someone else has. In fact, among those surveyed in MEPS from 2016-19 who had 2-7 chronic conditions, we found a remarkable 34,880 unique disease combinations!
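A quick calculation shows why exact matches are so unlikely. The sketch below counts how many distinct lists can be drawn from a catalog of conditions. The catalog size of 200 is an assumption chosen only to stand in for "hundreds of diseases"; it is not the number of condition categories in MEPS.

```python
from math import comb


def count_combinations(n_conditions, k_min, k_max):
    """Number of distinct condition sets with between k_min and k_max members."""
    return sum(comb(n_conditions, k) for k in range(k_min, k_max + 1))


# Assumption for illustration only: a catalog of 200 chronic conditions.
N_CONDITIONS = 200

exactly_five = comb(N_CONDITIONS, 5)
two_to_seven = count_combinations(N_CONDITIONS, 2, 7)

print(f"{exactly_five:,} possible lists of exactly five conditions")
print(f"{two_to_seven:,} possible lists of two to seven conditions")
```

Under that assumption there are already about 2.5 billion possible lists of five conditions, far more than the number of people surveyed. Real diseases do not co-occur uniformly at random, so the observed variety is much smaller than this upper bound, but the space of possibilities is still large enough for most combinations to be rare.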
There’s a fascinating paradox here: each disease combination is quite rare (observed in only a few or, more commonly, in just one person), but there are so very many of them that their combined prevalence in the population is high (~60 million individuals in the US have four or more conditions). A 2013 paper, among the best on the topic, aptly called this paradox “the high prevalence of low prevalence chronic disease combinations”.
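The paradox can be seen in miniature by tallying how many people share each combination. The following Python sketch uses a handful of made-up combinations (not MEPS data), and the helper name `rarity_summary` is hypothetical.

```python
from collections import Counter

# Hypothetical disease combinations, one sorted tuple per person.
person_combos = [
    ("Diabetes", "High Blood Pressure"),
    ("Diabetes", "High Blood Pressure"),
    ("Asthma", "Depression"),
    ("Anxiety", "Joint Disorders"),
    ("Asthma", "Depression", "Diabetes"),
]


def rarity_summary(combos):
    """Count people, distinct combinations, and combinations held by one person."""
    counts = Counter(combos)
    seen_once = sum(1 for n in counts.values() if n == 1)
    return {
        "people": len(combos),
        "unique_combinations": len(counts),
        "seen_once": seen_once,
    }


summary = rarity_summary(person_combos)
print(summary)
```

In this toy example, three of the four distinct combinations belong to a single person each, yet everyone in the list has multimorbidity.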
§
How can we identify meaningful patterns in what appears to be a chaotic jumble of disease combinations? This question has captivated me for many years, and I’ve turned to a variety of machine learning techniques. One approach I’ve considered is simply counting the most frequently occurring subsets of diseases, often called frequent pattern discovery. For example, anxiety and depression often occur together, making them a frequent dyad (or pair), while the most common triad is high cholesterol, high blood pressure, and diabetes. Such subsets provide clues to epidemiologists, since there might be as-yet-undiscovered physiological mechanisms that explain why the diseases co-occur.
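The counting idea behind frequent pattern discovery can be sketched in a few lines of Python. This is a brute-force illustration on made-up records, with a hypothetical helper named `frequent_subsets`; it is not the method or code from our studies. On real data, algorithms such as Apriori or FP-Growth are typically used to avoid enumerating every subset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical disease combinations, one set per person (not MEPS data).
people = [
    {"Anxiety", "Depression"},
    {"Anxiety", "Depression", "Asthma"},
    {"Diabetes", "High Blood Pressure", "High Cholesterol"},
    {"Diabetes", "High Blood Pressure", "High Cholesterol", "Joint Disorders"},
    {"Depression", "High Blood Pressure"},
]


def frequent_subsets(records, size, min_count):
    """Count every subset of the given size and keep those seen at least min_count times."""
    counts = Counter()
    for conditions in records:
        # Sorting first ensures the same subset always yields the same tuple.
        for subset in combinations(sorted(conditions), size):
            counts[subset] += 1
    return {subset: n for subset, n in counts.items() if n >= min_count}


dyads = frequent_subsets(people, size=2, min_count=2)
triads = frequent_subsets(people, size=3, min_count=2)

print("Frequent dyads:", dyads)
print("Frequent triads:", triads)
```

The threshold `min_count` plays the role of what the data mining literature calls minimum support: raising it keeps only the most common subsets, while lowering it surfaces rarer ones at the cost of a much longer list.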
It gets difficult to list quartets and quintets (subsets of four or five), since there are literally hundreds of them. But here’s one visualization, based on MEPS 2019 data. In the middle column, we see selected quartets and quintets of diseases that co-occur frequently. Each disease is indicated with a number. It would be impossible to list the names of diseases; the figure would get too unwieldy and perhaps unreadable. The right column contains the frequency — that is, the number of individuals in the MEPS 2019 dataset who shared the subset.

Take the very first quartet listed: [49,53,98,204]. This refers to [Diabetes, High Cholesterol, High Blood Pressure, Joint Disorders], and the column to the right indicates that 174 of the 28,512 individuals surveyed in MEPS 2019 share this precise subset of four diseases. The column to the left gives us a sense of the body and organ systems to which the diseases belong. Each body system is indicated with a different shape: a circle, a square, a half-circle, and so on. Diabetes, High Cholesterol, and High Blood Pressure are metabolic diseases in this classification and are indicated using squares, while Joint Disorders is a musculoskeletal disease and is indicated with a circle.
Generally speaking, the most prevalent quartets and quintets in the United States are mixtures of diseases from five different body systems: musculoskeletal, metabolic, respiratory, mental health, and cardiac. All the subsets shown contain diseases that belong to at least two different systems. Some, such as [204, 53, 657, 98] or [Joint Disorders, High Cholesterol, Depression, High Blood Pressure], which was present in 73 of the surveyed individuals, contain diseases from three different systems: musculoskeletal, metabolic, and mental health.
This feature of multimorbidity, where different body systems are often involved, poses challenges for both patients and physicians. A patient often ends up seeing a constellation of specialists: a cardiologist, an ophthalmologist, a neurologist, a psychiatrist, a gastroenterologist, a family physician, and so on and so forth. I noticed in MEPS that it is not uncommon to have dozens of doctors’ office visits in a single year, spread across 6-10 different specialties.
Imagine scheduling all the appointments, arranging transportation, keeping track of the medications, filling out countless forms, and going through the same tests again and again. I suppose this is the price we pay for the explosion of medical knowledge and our relentless search for longevity. For a physician, perhaps the most disconcerting aspect is advising a patient on a disease combination they’ve never encountered before, and not knowing how a medication or treatment typically prescribed for one disease might impact others.
Integrating the Silos of Medicine
All this paints a gloomy picture, but there is an upside. If diseases from seemingly disparate body systems are related, then that requires medicine to take a more holistic view: there’s a need to stitch together knowledge scattered across dozens of sub-specialties.
Some of this is already happening. We now take it for granted that diabetes, a disorder of the endocrine system, is closely linked to high blood pressure and heart disease, disorders of the circulatory and cardiac systems. We also take it for granted that physical activity and diet play an important role in the epidemiology of these conditions — who among us hasn’t been exhorted to cut down on desserts, or exercise more, or monitor our sugar or cholesterol levels?
Yet, the link between diabetes, high blood pressure, and heart disease is surprisingly recent and was not definitively established until the mid to late 20th century. It took studies that followed large cohorts of individuals, along with their diets and lifestyles, across decades or even generations – the Framingham Heart Study, the Seven Countries Study, and the Nurses’ Health Study – to slowly piece it all together (with much more still to come).
Other recent discoveries have revealed unexpected relationships: mental health conditions like depression appear to be linked with the health of the gut microbiome [link]; infection with the Epstein-Barr virus causes the later development of multiple sclerosis [link]; and a shingles vaccine may help prevent dementia [link].
All this makes me wonder: Could similar clues be hiding in the thousands of disease combinations I’ve been analyzing for so long in MEPS? Or, if we look again at long-term cohort studies that span decades, could we find patterns that medicine has not detected yet, but that are clinically valid? And are there algorithms that can parse a large and complicated medical dataset, not to predict an outcome (the most common use of machine learning), but to create a database of correlations and patterns, especially unusual ones that show up only in a small number of individuals?
There are data mining algorithms well suited for this purpose, and in fact, searching for the most frequent subsets of diseases is a special case – the simplest application – of these approaches. The algorithms need to carefully adjust for a host of factors such as age, gender, genetics, income, geography, nutrition, physical activity, prior medications, and treatments. This slices the data into smaller and smaller segments. While some signals in the data might be spurious, because of small sample sizes, others could lead to new discoveries.
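The slicing problem is easy to see in a small example. The Python sketch below stratifies made-up records by age band and sex, then reports how often a target subset of diseases appears in each stratum and flags strata that are too small to trust. The field layout, the helper name `stratified_prevalence`, and the size threshold are all assumptions for illustration; real adjustment would involve many more factors and proper statistical modeling.

```python
from collections import defaultdict

# Hypothetical records: (age_band, sex, set of conditions). Not MEPS data.
records = [
    ("65+", "F", {"Diabetes", "High Blood Pressure"}),
    ("65+", "F", {"Diabetes", "High Blood Pressure", "Depression"}),
    ("65+", "M", {"Diabetes"}),
    ("40-64", "F", {"Diabetes", "High Blood Pressure"}),
    ("40-64", "M", {"Asthma"}),
]


def stratified_prevalence(rows, target, min_stratum_size):
    """Share of people in each (age_band, sex) stratum whose conditions include target."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for age_band, sex, conditions in rows:
        stratum = (age_band, sex)
        totals[stratum] += 1
        if target <= conditions:  # subset test: all target diseases present
            hits[stratum] += 1
    return {
        stratum: {
            "n": n,
            "share": hits[stratum] / n,
            # Flag strata with too few people for the share to be reliable.
            "too_small": n < min_stratum_size,
        }
        for stratum, n in totals.items()
    }


target = frozenset({"Diabetes", "High Blood Pressure"})
table = stratified_prevalence(records, target, min_stratum_size=2)

for stratum, row in table.items():
    print(stratum, row)
```

Even with two stratifying variables, most strata in this toy dataset hold a single person, which is exactly how spurious signals arise as more factors are added.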
At least, that is the promise. Unlike individual diseases — even rare genetic ones — which get a lot of attention, multimorbidity is a lesser-known and more diffuse phenomenon; as we saw earlier, individuals often end up with their own signature disease combinations. Still, the co-occurring diseases in these individuals may help create a holistic body of knowledge that can benefit us all.
References and Acknowledgements
1. The first figure — visualization of the timeline of events for a 69-year-old woman — is from a 2018 paper in the journal Medical Decision Making: Policy and Practice. The second figure is from a tutorial we published last year in the INFORMS Tutorials in Operations Research series.
2. Many students have been involved in this research over the years. I would like to acknowledge Gabriel Schmitt, Joshua Gladstone, Arjun Mohan, Pracheta Amaranath, Ali Jafari, and Sindhoora Prakash for their contributions.