Were you aware that there is an official French decision tree to orient patients towards relevant medical services, depending on their Covid-19 related symptoms? I transformed it into a point-based scoring system (100% logically equivalent), so that a patient can compute his/her score by adding simple weights. I then use these two Covid-19 orientation algorithms to carry out a more general comparison of decision trees and scoring systems.
If you are in a hurry, go and see the take-away messages!
Context and objectives
The Covid-19 outbreak in France challenges the French healthcare system's ability to care for and treat patients. In this context, efficiently orienting potential patients is a prerequisite for improving reactivity and making good use of scarce resources: medical emergency telephone lines and medical emergency vehicles.
In early April 2020, with the population under lockdown, the French strategy has been to promote real-time tele-consultation [1] with general practitioners and other physicians when possible, so that emergency services are less overwhelmed. As presented in the next section, an orientation algorithm has been built in France to give personal recommendations to potential patients, depending on their symptoms, medical profile and history.
I will present additional context on the official orientation algorithm and then a slightly simplified version of the official decision tree. The simplified decision tree is then transformed into a point-based scoring system, the other type of decision-aid algorithm widely used in medicine. Finally, I will illustrate the pros and cons of both algorithm types, with examples drawn from the Covid-19 orientation problem.
The official patient questionnaire and decision tree
A taskforce named COVID-TELE [2], supported by the French Ministry of Health and Solidarity and co-sponsored by both AP-HP [3] and Institut Pasteur [4], released guidelines to help orient patients towards relevant medical services:
- a 22-item questionnaire for patients
- a decision tree algorithm to process questionnaire answers into recommendations
- an exact wording for the patient recommendations.
Several implementations can be found, developed either by the French government or by private foundations, where French-speaking Internet users can get their personal recommendations. All documentation is in French on a dedicated website (more details on this GitHub repo for the most curious among you).
In the latest version available at the time of writing (30 March 2020), there are three types of questions: symptoms (fever, cough, diarrhea, anosmia, sore throat or muscle aches), risk factors (age, body mass index, pregnancy status, chronic conditions and current medication) and severity factors (major or minor).
Answering these questions leads to one of eight possible official patient recommendations. However, some of these recommendations are very similar and can be grouped. In the next section, I present a possible simplification of the orientation decision tree.
A simplified patient orientation decision tree
Starting from the official decision tree, two simplifications are made:
- Focusing only on people over 15 years of age: the official decision tree only applies to people over 15 years of age and handles everyone else with a dedicated branch (denoted END1 in the official tree). I have removed this case instead of keeping an explicit branch. No information is lost, as a note can state that the simplified tree applies only to those over 15 years of age.
- Regrouping similar outcomes: some recommendations (originally denoted by END2, END3, …, END8) are very alike. In the simplified decision tree version, they are grouped into 3 outcomes:
  - No action required (END2, END8)
  - Schedule a tele-consultation (END3, END4, END6, END7)
  - Call emergency services (END5)
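The grouping above can be written down as a simple lookup table. The sketch below is only an illustration: the END codes and outcome labels come from the list above, while the names `SIMPLIFIED_OUTCOME` and `simplify_outcome` are hypothetical.

```python
# Mapping from official endpoints to the three simplified outcomes.
# END1 (people outside the "over 15" scope) is deliberately absent.
SIMPLIFIED_OUTCOME = {
    "END2": "No action required",
    "END8": "No action required",
    "END3": "Schedule a tele-consultation",
    "END4": "Schedule a tele-consultation",
    "END6": "Schedule a tele-consultation",
    "END7": "Schedule a tele-consultation",
    "END5": "Call emergency services",
}


def simplify_outcome(official_code: str) -> str:
    """Return the simplified outcome for an official endpoint code."""
    if official_code == "END1":
        raise ValueError("END1 is out of scope: the simplified tree covers over-15s only")
    return SIMPLIFIED_OUTCOME[official_code]


print(simplify_outcome("END6"))  # Schedule a tele-consultation
```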
With these two changes, the simplified patient orientation decision tree becomes more compact and readable:
Note that both official and simplified decision trees rely on a list of risk factors and severity factors:
Hereafter, I take the simplified decision tree as a reference. I will present an equivalent point-based scoring system which gives the same recommendation as the simplified decision tree for every possible case. The motivation for a scoring system is its ability to quickly capture the most important factors of an algorithm (see final section for more details).
An equivalent point-based scoring system
What is a scoring system? In a scoring system, a decision variable (the score) is built by adding, subtracting or multiplying a few meaningful numbers [5]. "Point-based" emphasizes that (i) the score involves integer numbers and (ii) thresholds are used to make decisions from the score.
Point-based scoring systems are widely used in medical applications. For example, CHA2DS2VASc [6] is used daily in cardiology departments as a decision-aid tool to assess the risk of stroke (CVA) in patients with atrial fibrillation and to guide the decision whether or not to prescribe anticoagulants (see demo here). It is a sum of 8 contributions (0pt, 1pt or 2pts) with indicative thresholds to separate low, medium and high risk patients.
As a potential patient, you sum your points and compare the total with the decision thresholds to get a personal recommendation. In the right column of the figure above, age (53y), risk factors (0) and minor severity factors (0) result in +6pts. The presence of cough adds another +2pts and anosmia increases the final score by +1pt. With a score of 9pts, the patient can then read off the recommended action (schedule a tele-consultation).
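To make the idea of "a sum of contributions" concrete, here is a minimal sketch of how such a score could be computed. The item weights are those usually quoted for CHA2DS2VASc (the paper by Lip et al. remains the authoritative definition); the function and argument names are hypothetical, and the risk thresholds are left out on purpose.

```python
def cha2ds2vasc_score(
    age: int,
    female: bool,
    heart_failure: bool,
    hypertension: bool,
    diabetes: bool,
    prior_stroke_or_tia: bool,
    vascular_disease: bool,
) -> int:
    """Sum the point contributions of each item (0, 1 or 2 points)."""
    score = 0
    score += 1 if heart_failure else 0        # C: congestive heart failure
    score += 1 if hypertension else 0         # H: hypertension
    score += 2 if age >= 75 else 0            # A2: age 75 or more
    score += 1 if diabetes else 0             # D: diabetes
    score += 2 if prior_stroke_or_tia else 0  # S2: prior stroke / TIA
    score += 1 if vascular_disease else 0     # V: vascular disease
    score += 1 if 65 <= age <= 74 else 0      # A: age 65 to 74
    score += 1 if female else 0               # Sc: sex category (female)
    return score


# Hypothetical patient: 70-year-old woman with hypertension only.
print(cha2ds2vasc_score(70, True, False, True, False, False, False))  # 3
```

The two age items are mutually exclusive, which is why a single `age` argument feeds both of them.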
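The arithmetic of this example can be sketched in a few lines. Only numbers quoted in this post are used: the +6pts baseline of the example patient, +2pts for cough, +1pt for anosmia, and a tele-consultation threshold of 9pts (the example patient reaches it with exactly 9pts). The other weights of the scoring system and the emergency threshold are not reproduced here, and all names are hypothetical.

```python
# Partial weights, limited to the values quoted in the text.
SYMPTOM_POINTS = {"cough": 2, "anosmia": 1}
TELECONSULTATION_THRESHOLD = 9  # the 9pt example patient gets this recommendation


def orientation_score(baseline: int, symptoms: list[str]) -> int:
    """Add symptom points to the baseline (age, risk and minor severity factors)."""
    return baseline + sum(SYMPTOM_POINTS[s] for s in symptoms)


def reaches_teleconsultation_threshold(score: int) -> bool:
    """True when the score is at or above the tele-consultation threshold."""
    return score >= TELECONSULTATION_THRESHOLD


score = orientation_score(baseline=6, symptoms=["cough", "anosmia"])
print(score, reaches_teleconsultation_threshold(score))  # 9 True
```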
Decision tree vs Scoring system: a comparative analysis
Both algorithms are widely used. For example, Lip et al. presented both a scoring system and a clinical flowchart (an action-oriented decision tree) in their 2010 paper [6] (Table 2 and Figure 1) introducing CHA2DS2VASc. In machine learning, trees and scoring systems are the building blocks of almost every interpretable predictive model. In his 2019 book on interpretable machine learning, Molnar presents common interpretable models, almost all of which fall into the scoring system family, the tree family (similar to rule-based models) or both [7].
The table below presents a comparative analysis of decision trees and scoring systems, with arguments concerning their strengths and weaknesses. The comparative analysis addresses the different phases of an algorithm's life: Build, Implementation and the User perspective during the “Run”. The latter refers to the patient’s ability to understand his/her situation (Local importance understanding) and to understand the main decision drivers of the whole algorithm (Global importance understanding).
Most of the arguments regarding local and global importance are illustrated by 4 discussions in the next section. These examples are based on both algorithms presented above: the Covid-19 simplified decision tree and the orientation scoring system.
|Criteria|Phases|Arguments in favour of Decision Trees|Arguments in favour of Scoring Systems|
|---|---|---|---|
|Choice: the typical problem addressed|Build|+ DTree are well suited to capture factor interactions.|+ SSys are well suited to capture gradual and cumulative effects of factors.|
|Ease of design by experts|Build|+ DTree process cases one by one, successively and exhaustively.| |
|Technical implementation|Implement|Both can be easily programmed.|Same (applies to both).|
|Functional design|Implement|Both allow some decisions to be made before all factors have been collected.|Same (applies to both).|
|Local importance understanding: ability for a patient to understand his/her situation|User perspective during Run|+ Easier to follow a DTree path than to compute a potentially large SSys sum.|+ SSys big sums may be challenging, but DTree can have non-trivial AND/OR conditions. + SSys allow one to quickly identify which small changes would make the outcome tip over (see Discussion A below).|
|Global importance understanding: ability for anyone to understand the main decision drivers of the whole algorithm|User perspective during Run|+ When well built, DTree offer a good understanding of the core principles that drive the outcomes (see Discussion B below).|+ SSys offer good insights into the importance of the main factors, by comparing weights (see Discussion C below). + DTree suffer from an interpretation bias: a factor near the tree leaves is not necessarily a factor with low importance (see Discussion D below).|
The comparative analysis, illustrated by Covid-19 orientation examples
Four general arguments of the comparative analysis are illustrated with the Covid-19 orientation algorithms.
Discussion A: a patient story
Let’s assume the patient is a 55-year-old woman who has lost her sense of taste and smell (anosmia [8]). She has no risk factor, no severity factor and no other symptom. Her Covid-19 orientation score is 7pts, 2 points away from crossing the tele-consultation threshold. She can rapidly conclude that if she develops another listed symptom other than sore throat/muscle aches (which would add only +1pt), she should schedule a tele-consultation. This insight could be derived from the simplified decision tree, but in a less straightforward manner. Scoring system wins here.
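This "what would tip the outcome" reasoning can be sketched as below. The weights are the ones quoted in this post (cough +2pts, sore throat/muscle aches +1pt); symptoms whose weight is not quoted are left out, and the names are hypothetical.

```python
TELECONSULTATION_THRESHOLD = 9  # 7pts is described as 2 points below it
# Points for one additional symptom (only weights quoted in the text).
EXTRA_SYMPTOM_POINTS = {"cough": 2, "sore throat / muscle aches": 1}


def tipping_symptoms(current_score: int) -> list[str]:
    """List the single extra symptoms that would push the score to the threshold."""
    margin = TELECONSULTATION_THRESHOLD - current_score
    return [name for name, pts in EXTRA_SYMPTOM_POINTS.items() if pts >= margin]


print(tipping_symptoms(7))  # ['cough']
```

With a decision tree, the same question requires walking back up from the current leaf and exploring the neighbouring branches one by one.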
Discussion B: the fever and cough interaction
Patients with both fever and cough should have a tele-consultation! This fever/cough interaction is one core principle of the algorithm. Identifying it from the simplified decision tree is straightforward. With the scoring system, the interaction is explicitly written (+4pts if fever and cough together), but concluding that it implies a tele-consultation requires a few more steps. It is only by summing 3 terms (fever, cough and the interaction term) that one can reach this conclusion. Decision tree wins here.
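Written out, the three-term sum looks as follows. The +4pts interaction term is stated above and cough is worth +2pts in the earlier worked example; the +3pts for fever is not quoted directly and is inferred here from the +9pts total mentioned in Discussion C, so treat it as an assumption.

```latex
\underbrace{3}_{\text{fever (inferred)}}
+ \underbrace{2}_{\text{cough}}
+ \underbrace{4}_{\text{fever and cough together}}
= 9 \;\geq\; 9 \quad (\text{tele-consultation threshold})
```

These three terms alone already reach the 9pt threshold used in the examples of this post, but the reader has to perform the addition to see it.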
Discussion C: weights and importance
Looking at the scoring system weights gives a quick way to assess the importance of factors. The highest weights are +23, +8, +6 and +4. The major severity factor (+23pts) immediately stands out as implying the emergency outcome. Then the risk (+8pts) and minor severity (+6pts) factors should be considered very important but treated together (as their weights cannot occur at the same time and add up to +14). Finally, the fever and cough interaction term (+4pts) highlights that having both fever and cough leads to a total of +9pts. In conclusion, looking at the highest weights of a scoring system is a quick way to reliably identify the important factors at the core of the algorithm. This method has no simple equivalent for decision trees, as detailed in the next discussion.
Discussion D: top-vs-bottom node importance
One could think that there is a straightforward relationship between the “relative importance” of two factors and their “relative altitude” [9] in a decision tree. But neither the absolute altitude of a factor (does it appear near the top, or near the bottom?) nor the “relative altitude” of two factors can be interpreted as factor importance. First, altitude does not allow factors to be compared properly (fever is both above and below cough at the same time). Moreover, there is no simple way to account for factors, like anosmia and sore throat, that are “diluted” in “OR” logical conditions. Finally, factors appearing near the top (major severity factor) or near the bottom (risk factor) can both be important. As a consequence, decision trees lack a measure of factor importance, whereas scoring systems intrinsically bring their own (weights).
I hope that this post has:
- Enlightened your understanding of the French Covid-19 orientation algorithm.
- Convinced you that decision trees have their strengths, notably being easier to build by experts.
- Made you enthusiastic about point-based scoring systems and the measure of importance that their weights bring, both for local exploration around a particular user and for a global understanding of the algorithm.
- Increased your curiosity about how we could compute a rigorous importance measure (see my post on variable importances and Shapley values).
- Made you wonder how I obtained the weights of the scoring system (see my future post on ordinal regression).
- Made you, algorithm builders [10] and machine learning practitioners, think about how you could better choose between rule-based and additive explanation methods (more on this topic later).
Thank you Paul Narchi and Chieh-An Lin for your proofreading and thoughtful comments. Thank you Dr. Nelson Kanyep for the details regarding the widespread use of some scoring systems in cardiology hospital units. Thank you Bastien Guerry for your explanations regarding COVID-TELE members and initiatives (and the quality of the documentation of the Covid-19 orientation algorithm).
[1] Translated from French: “The Department of Health encourages doctors and nurses to equip themselves with teleconsultation and telemonitoring solutions for the management of patients with Covid-19.”
[2] The mission of the COVID-TELE taskforce is to collect all pseudonymized questionnaire answers in order to carry out real-time epidemiological surveillance, identify the most relevant symptoms and potential risk factors for treatment failure and improve patient care pathways.
[3] AP-HP is a world-renowned university hospital trust, composed of 39 hospitals.
[4] Institut Pasteur is a private non-profit research foundation on diseases.
[5] Ustun, B. and Rudin, C., 2016. Supersparse linear integer models for optimized medical scoring systems. Machine Learning, 102(3), pp.349-391.
[6] Lip, G.Y., Nieuwlaat, R., Pisters, R., Lane, D.A. and Crijns, H.J., 2010. Refining clinical risk stratification for predicting stroke and thromboembolism in atrial fibrillation using a novel risk factor-based approach: the euro heart survey on atrial fibrillation. Chest, 137(2), pp.263-272.
[7] All interpretable models mentioned by Molnar fall into the scoring system family (Linear Regression, Logistic Regression, Generalized Linear Models and Generalized Additive Models), the tree family (Decision Tree, Decision Rules) or both (RuleFit). Only two notable exceptions are mentioned. On the one hand, the Naive Bayes Classifier, which can still be seen as a GLM under additional assumptions. On the other hand, k-Nearest-Neighbors, which is not comparable because of its different nature: it is an instance-based learning model.
[8] Technically, anosmia is the loss of the ability to detect one or more smells whereas dysgeusia is a distortion of the sense of taste.
[9] By relative altitude, I mean that a factor can appear above (parent to child, grand-child), below (the reverse), or there can be no clear relationship.
[10] This is not a criticism of the official French algorithm, whose documentation is, I think, well made and pretty clear.