Research Article

Building more accurate decision trees with the additive tree

José Marcio Luna (a), Efstathios D. Gennatas (b), Lyle H. Ungar (c), Eric Eaton (c), Eric S. Diffenderfer (a), Shane T. Jensen (d), Charles B. Simone II (e), Jerome H. Friedman (f), Timothy D. Solberg (b), and Gilmer Valdes (b)
PNAS October 1, 2019 116 (40) 19887-19893; first published September 16, 2019; https://doi.org/10.1073/pnas.1816748116
a) Department of Radiation Oncology, University of Pennsylvania, Philadelphia, PA 19104; b) Department of Radiation Oncology, University of California, San Francisco, CA 94115; c) Department of Computing and Information Science, University of Pennsylvania, Philadelphia, PA 19104; d) Department of Statistics, University of Pennsylvania, Philadelphia, PA 19104; e) Department of Radiation Oncology, New York Proton Center, New York, NY 10035; f) Department of Statistics, Stanford University, Stanford, CA 94305

For correspondence: gilmer.valdes@ucsf.edu, jose.luna@pennmedicine.upenn.edu, jhf@stanford.edu

Contributed by Jerome H. Friedman, August 8, 2019 (sent for review October 10, 2018; reviewed by Adele Cutler and Giles Hooker)

This article has a Letter. Please see:

  • The phylogenetic tree of boosting has a bushy carriage but a single trunk - April 07, 2020

See related content:

  • Reply to Nock and Nielsen: On the work of Nock and Nielsen and its relationship to the additive tree - April 7, 2020

Significance

As machine learning applications expand to high-stakes areas such as criminal justice, finance, and medicine, legitimate concerns emerge about high-impact effects of individual mispredictions on people’s lives. As a result, there has been increasing interest in understanding general machine learning models to overcome possible serious risks. Current decision trees, such as Classification and Regression Trees (CART), have played a predominant role in fields such as medicine, due to their simplicity and intuitive interpretation. However, such trees suffer from intrinsic limitations in predictive power. We developed the additive tree, a theoretical approach to generate a more accurate and interpretable decision tree, which reveals connections between CART and gradient boosting. The additive tree exhibits superior predictive performance to CART, as validated on 83 classification tasks.

Abstract

The expansion of machine learning to high-stakes application domains such as medicine, finance, and criminal justice, where making informed decisions requires clear understanding of the model, has increased the interest in interpretable machine learning. The widely used Classification and Regression Trees (CART) have played a major role in health sciences, due to their simple and intuitive explanation of predictions. Ensemble methods like gradient boosting can improve the accuracy of decision trees, but at the expense of the interpretability of the generated model. Additive models, such as those produced by gradient boosting, and full interaction models, such as CART, have been investigated largely in isolation. We show that these models exist along a spectrum, revealing previously unseen connections between these approaches. This paper introduces a rigorous formalization for the additive tree, an empirically validated learning technique for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although the additive tree is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.

  • additive tree
  • decision tree
  • interpretable machine learning
  • CART
  • gradient boosting

The increasing application of machine learning to high-stakes domains such as criminal justice (1, 2) and medicine (3–5) has led to a surge of interest in understanding the generated models. Mispredictions in these domains can incur serious risks in scenarios such as technical debt (6, 7), nondiscrimination (8), medical outcomes (9, 10), and, recently, the right to explanation (11), thus motivating the need for users to be able to examine why the model made a particular prediction. Despite recent efforts to formalize the concept of interpretability in machine learning, there is considerable disagreement on what such a concept means and how to measure it (12, 13). Lately, 2 broad categories of approaches for interpretability have been proposed (14), namely, post hoc simple explanations for potentially complex models (e.g., visual and textual explanations) (15, 16) and intuitively simple models (e.g., additive models, decision trees). This paper focuses on intuitively simple models, specifically decision trees, which are used widely in fields such as healthcare, yet have lower predictive power than more sophisticated models.

Classification and Regression Tree (CART) analysis (17) is a well-established statistical learning technique that has been adopted by numerous fields for its model interpretability, scalability to large datasets, and connection to rule-based decision-making (18). Specifically, in fields like medicine (19–21), the aforementioned traits are considered a requirement for clinical decision support systems. CART builds a model by recursively partitioning the instance space, and labeling each partition with either a predicted category (in the case of classification) or real value (in the case of regression). Despite widespread use, CART models are often less accurate than other statistical learning models, such as kernel methods and ensemble techniques (22). Among the latter, boosting methods were developed as a means to train an ensemble that iteratively combines multiple weak learners (often CART models) into a high-performance predictive model, albeit with an increase in the number of nodes, thereby harming model interpretability. In particular, gradient boosting methods (23) iteratively optimize an ensemble's prediction to increasingly match the labeled training data. In addition, some ad hoc approaches (24, 25) have been successful at improving the accuracy of decision trees, but at the expense of altering their topology, thereby reducing their explanatory capacity.

Decision tree learning and gradient boosting have been connected primarily through CART models used as the weak learners in boosting. However, a rigorous analysis in ref. 26 proves that decision tree algorithms, specifically CART and C4.5 (27), are, in fact, boosting algorithms. Based on this approach, AdaTree, a tree-growing method based on AdaBoost (28), was proposed in ref. 29. A sequence of weak classifiers on each branch of the decision tree is trained recursively using AdaBoost, rendering a decision tree in which each branch constitutes a strong classifier. The weak classifiers at each internal node were implemented either as linear classifiers composed with a sigmoid or as Haar-type filters with threshold functions at their output. Another approach is the Probabilistic Boosting-Tree (PBT) algorithm introduced in ref. 30, which also uses AdaBoost to build decision trees. PBT trains ensembles of strong classifiers on each branch for image classification. The strong classifiers are based on up to third-order moment histogram features extracted from Gabor and intensity filter responses. PBT also carries out a divide-and-conquer strategy in order to perform data augmentation to estimate the posterior probability of each class. Recently, MediBoost, a tree-growing method based on the more general gradient boosting approach, was proposed in ref. 31. In contrast to AdaTree and PBT, MediBoost emphasizes interpretability, since it was conceived to support medical applications while exploiting the accuracy of boosting. It takes advantage of the shrinkage factor inherent to gradient boosting, and introduces an acceleration parameter that controls the observation weights during training to improve predictive performance. Although the reported experimental results showed that MediBoost exhibits better predictive performance than CART, no theoretical analysis was provided.

In this paper, we propose a rigorous mathematical approach for building single decision trees that are more accurate than traditional CART trees. In this context, we use the total number of decision nodes (NDN) as a quantification of interpretability, due to its direct association with the number of decision rules of the model and the depth of the tree. This paper addresses these shortcomings as follows: 1) We introduce the additive tree (AddTree) as a mechanism for creating decision trees. Each path from the root node to a leaf represents the outcome of a gradient boosted ensemble for a partition of the instance space. 2) We theoretically prove that AddTree generates a continuum of single-tree models between CART and gradient boosted stumps (GBS), controlled via a single tunable parameter of the algorithm. In effect, AddTree bridges CART and gradient boosting, identifying previously unknown connections between additive and full interaction models. 3) We provide empirical results showing that this hybrid combination of CART and GBS outperforms CART in terms of accuracy and interpretability. Our experiments also provide further insight into the continuum of models revealed by AddTree.

Background on CART and Boosting

Assume we are given a training dataset $(\mathbf{X}, \mathbf{y}) = \{(\mathbf{x}_i, y_i)\}_{i=1}^{N}$, where each $d$-dimensional data instance $\mathbf{x}_i \in \mathbf{X} \subseteq \mathcal{X}$ has a corresponding label $y_i \in \mathcal{Y}$, drawn independently from an identical and unknown distribution $\mathcal{D}$. In a binary classification setting, $\mathcal{Y} = \{\pm 1\}$; in regression, $\mathcal{Y} = \mathbb{R}$. The goal is to learn a function $F : \mathcal{X} \mapsto \mathcal{Y}$ that will perform well in predicting the label on new examples drawn from $\mathcal{D}$. CART analysis recursively partitions $\mathcal{X}$, with $F$ assigning a single label in $\mathcal{Y}$ to each terminal node; in this manner, there is full interaction. Different branches of the tree are trained with disjoint subsets of the data, as shown in Fig. 1.
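For concreteness, the following is a minimal sketch of CART-style recursive partitioning on synthetic data, written with scikit-learn as an illustrative stand-in (an assumption; the experiments reported later in this paper used R's rpart):

```python
# Minimal sketch (not the paper's code): CART-style recursive partitioning
# illustrated with scikit-learn on synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for a (X, y) training set.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each split sends disjoint subsets of the data down different branches,
# so every terminal node is fit with full interactions on its own subset.
cart = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(export_text(cart))                 # the learned axis-aligned partition
print("test accuracy:", cart.score(X_test, y_test))
```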

Fig. 1. A depiction of the continuum relating CART, GBS, and our AddTree. Each algorithm has been given the same 4 training instances (blue and red symbols); the symbol's size depicts its weight when used to train the adjacent node.

In contrast, boosting iteratively trains an ensemble of $T$ weak learners $\{h_t : \mathcal{X} \mapsto \mathcal{Y}\}_{t=1}^{T}$, such that the resulting model $F(\mathbf{x})$ is a weighted sum of the weak learners' predictions, $F(\mathbf{x}) = \sum_{t=1}^{T} \rho_t\, h_t(\mathbf{x})$ with $\boldsymbol{\rho} \in \mathbb{R}^T$.* Each boosted weak learner is trained with a different weighting of the entire dataset, unlike CART, repeatedly emphasizing mispredicted instances at every round (Fig. 1). GBS or simple regression creates a purely additive model in which each new ensemble member reduces the residual of previous members (32). Interaction terms can be included in the ensemble by using more complex learners, such as deeper trees.
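The residual-fitting loop that GBS performs under squared-error loss can be sketched in a few lines; the synthetic additive target and the use of scikit-learn depth-1 regression trees as stumps are illustrative assumptions, not the authors' implementation:

```python
# Minimal sketch of gradient boosted stumps (GBS) for squared-error loss:
# each round fits a depth-1 regression tree to the current residuals and
# adds a shrunken copy to the running model F(x). Illustrative only.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 2))
y = np.sign(X[:, 0] + 0.5 * X[:, 1])     # additive target with labels in {±1}

F = np.full(len(y), y.mean())            # F_0: constant initial estimate
stumps, nu = [], 0.1                     # nu = learning rate (shrinkage)
for t in range(200):
    residuals = y - F                    # negative gradient of squared error
    h = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    stumps.append(h)
    F += nu * h.predict(X)               # F_t = F_{t-1} + nu * h_t

print("training accuracy:", np.mean(np.sign(F) == y))
```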

Classifier ensembles with decision stumps as the weak learners, $h_t(\mathbf{x})$, can be trivially rewritten (31) as a complete binary tree of depth $T$, where the decision made at each internal node at depth $t - 1$ is given by $h_t(\mathbf{x})$, and the prediction at each leaf is given by $F(\mathbf{x})$. Intuitively, each path through the tree represents the same ensemble, but one that tracks the unique combination of predictions made by each member.
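The following toy sketch makes this equivalence concrete: a hypothetical list of stumps, each given as (feature, threshold, left value, right value), is unrolled into a complete binary tree whose leaves store the ensemble sums, and both representations yield the same prediction:

```python
# Minimal sketch of the observation in the text: a boosted ensemble of T
# decision stumps can be unrolled into a complete binary tree of depth T,
# where level t applies stump h_t and each leaf stores the ensemble sum.
# Hypothetical stump format: (feature, threshold, value_left, value_right).
from itertools import product

stumps = [(0, 0.5, -1.0, +1.0),          # h_1
          (1, 0.0, -0.5, +0.5),          # h_2
          (0, 1.5, -0.2, +0.2)]          # h_3

def stump_output(stump, x):
    f, thr, left, right = stump
    return left if x[f] <= thr else right

def ensemble_predict(x):                 # F(x) = sum_t h_t(x), rho_t folded in
    return sum(stump_output(s, x) for s in stumps)

# Unrolled tree: one leaf per combination of left/right outcomes; each path
# through the depth-T tree re-applies the same stumps in the same order.
leaves = {bits: sum(s[2] if b == 0 else s[3] for s, b in zip(stumps, bits))
          for bits in product((0, 1), repeat=len(stumps))}

x = [0.7, -0.3]
bits = tuple(0 if x[s[0]] <= s[1] else 1 for s in stumps)
assert leaves[bits] == ensemble_predict(x)   # same prediction either way
print(bits, leaves[bits])
```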

The Additive Tree

This interpretation of boosting lends itself to the creation of a tree-structured ensemble learner that bridges CART and GBS: the AddTree. At each node, a weak learner is found using gradient boosting on the entire dataset. Rather than using only the data in the current node to decide the next split, the algorithm allows the remaining data to also influence this split, albeit with a potentially differing weight. Critically, this process results in an interpretable model that is simply a decision tree of the weak learner’s predictions, but with higher predictive power due to its growth via boosting and the resulting effect of regularizing the tree splits. This process is carried out recursively until the depth limit is reached. By varying the weighting scheme, this approach interpolates between CART trees at one extreme and boosted regression stumps at the other. Notably, the resulting tuning of the weighting scheme allows AddTree to improve in predictive performance over CART and GBS, while allowing direct interpretation.

AddTree maintains a perfect binary tree of depth $n$, with $2^n - 1$ internal nodes, each of which corresponds to a weak learner. The $k$th weak learner, denoted $h_{k,l}$, along the path from the root node to a leaf prediction node $l$ induces 2 disjoint partitions of the feature space $\mathcal{X}$, namely $P_{k,l}$ and its complement $P_{k,l}^{c}$. Let $\{R_{1,l}, \ldots, R_{n,l}\}$ be the corresponding set of partitions along that path to $l$, where the $k$th term in the tuple, namely $R_{k,l}$, can be either $P_{k,l}$ or $P_{k,l}^{c}$. We can then define the partition of $\mathcal{X}$ associated with the leaf node $l$ as the intersection of all of the partitions along the path from the root node to the leaf node $l$; that is, $R_{n,l} = \bigcap_{k=1}^{n} R_{k,l}$. AddTree predicts a label for each $\mathbf{x} \in R_{n,l}$ via the ensemble consisting of all weak learners along the path to $l$, so that the resulting model is given by $F(\mathbf{x} \in R_{n,l}) = \sum_{k=1}^{n} \rho_{k,l}\, h_{k,l}(\mathbf{x})$. To focus each branch of the tree on corresponding instances, thereby constructing diverse ensembles, AddTree maintains a set of weights, denoted $\mathbf{w}_{n,l} \in \mathbb{R}^N$, over all training data. Such weights are associated with training a weak learner $h_{n,l}$ at the leaf node $l$ at depth $n$.

We train the tree as follows. At each boosting step, we have a current estimate of the function $F_{n-1,l}(\mathbf{x})$ at the leaf $l$ of a perfect binary tree of depth $n-1$. We seek to improve this estimate by replacing each of the $2^{n-1}$ leaf prediction nodes with additional weak learners $\{h_{n,l}\}_{l=1}^{2^{n-1}}$ with corresponding scaling factors $\{\rho_{n,l}\}_{l=1}^{2^{n-1}}$, growing the tree by one level. This yields a revised estimate of the function at each terminal node $l$ equivalent to $2^{n-1}$ separate functions,

$$F_{n,l}(\mathbf{x}) = F_{n-1,l}(\mathbf{x}) + \rho_{n,l}\, h_{n,l}(\mathbf{x}) \quad \forall l \in \{1, \ldots, 2^{n-1}\},$$

one for each leaf's corresponding ensemble. The goal is to minimize the loss over the data,

$$L_n(\mathbf{X}, \mathbf{y}) = \sum_{l=1}^{2^{n-1}} \sum_{i=1}^{N} w_{n,l,i}\; \ell\big(y_i,\, F_{n-1,l}(\mathbf{x}_i) + \rho_{n,l}\, h_{n,l}(\mathbf{x}_i)\big), \qquad [1]$$

by choosing the appropriate values of $\rho_{n,l}$ and $h_{n,l}$ at each leaf. Our proposed approach aims at breaking the connection between the global loss in Eq. 1 and the loss used for each weak learner by independently minimizing the inner summation for each $l \in \{1, \ldots, 2^{n-1}\}$; that is,

$$L_{n,l}(\mathbf{X}, \mathbf{y}) = \sum_{i=1}^{N} w_{n,l,i}\; \ell\big(y_i,\, F_{n-1,l}(\mathbf{x}_i) + \rho_{n,l}\, h_{n,l}(\mathbf{x}_i)\big). \qquad [2]$$

Eq. 2 can be solved efficiently via gradient boosting of each $L_{n,l}(\cdot)$ in a level-wise manner through the tree.

Next, we focus on deriving AddTree where the weak learners are binary regression trees with least squares as the loss function $\ell(\cdot)$. Following ref. 23, we first estimate the negative unconstrained gradients at each data instance, $\tilde{y}_i = -\,\partial \ell\big(y_i, F_{n-1,l}(\mathbf{x}_i)\big) / \partial F_{n-1,l}(\mathbf{x}_i)$ for $i = 1, \ldots, N$, which are equivalent to the residuals [i.e., $\tilde{y}_i = y_i - F_{n-1,l}(\mathbf{x}_i)$]. Then, we can determine the optimal parameters for $L_{n,l}(\cdot)$ by solving

$$\operatorname*{arg\,min}_{\rho_{n,l},\, h_{n,l}} \sum_{i=1}^{N} w_{n,l,i} \big(\tilde{y}_i - \rho_{n,l}\, h_{n,l}(\mathbf{x}_i)\big)^2. \qquad [3]$$

Gradient boosting solves Eq. 3 by first fitting decision stumps $h_{n,l}$ to the residuals $(\mathbf{X}, \tilde{\mathbf{y}})$,

$$h_{n,l} = \operatorname*{arg\,min}_{h} \sum_{i=1}^{N} w_{n,l,i} \big(\tilde{y}_i - h(\mathbf{x}_i)\big)^2,$$

then solving for the optimal scaling factor $\rho_{n,l}$, whose main function is to scale the weak learners obtained from pseudoresiduals to fit the original data,

$$\rho_{n,l} = \operatorname*{arg\,min}_{\rho} \sum_{i=1}^{N} w_{n,l,i} \big(y_i - F_{n-1,l}(\mathbf{x}_i) - \rho\, h_{n,l}(\mathbf{x}_i)\big)^2,$$

or, equivalently, solving for the simple regressor $\gamma_{n,l}(\mathbf{x}_i) = \rho_{n,l}\, h_{n,l}(\mathbf{x}_i)$ as follows:

$$\gamma_{n,l}(\mathbf{x}_i) = \operatorname*{arg\,min}_{\gamma} \sum_{i=1}^{N} w_{n,l,i} \big(y_i - F_{n-1,l}(\mathbf{x}_i) - \gamma\big)^2.$$

If all instance weights $\mathbf{w}_{n,l}$ remained constant, this approach would build a perfect binary tree of depth $T$, where each path from the root to a leaf represents the same ensemble, and so would be exactly equivalent to gradient boosting of $(\mathbf{X}, \mathbf{y})$. Instead, AddTree constructs diverse ensembles by focusing each branch of the tree on the instances that belong to its partition. To accomplish this, the weights are updated separately for each of $h_{n,l}$'s 2 children: Instances in the corresponding partition have their weights multiplied by a factor of $1 + \lambda$, and instances outside the partition have their weights multiplied by a factor $\lambda \in [0, \infty)$, denoted the "interpretability factor." The update rule for the weight $w_{n,l}(\mathbf{x}_i)$ of $\mathbf{x}_i$ for $R_{n,l} \in \{P_{n,l}, P_{n,l}^{c}\}$ is given by

$$w_{n,l}(\mathbf{x}_i) = w_{n-1,l}(\mathbf{x}_i)\,\big(\lambda + \mathbb{1}[\mathbf{x}_i \in R_{n,l}]\big), \qquad [4]$$

where $\mathbb{1}[p]$ is a binary indicator function that is 1 if predicate $p$ is true, and 0 otherwise, and the initial weights $\mathbf{w}_0$ are typically uniformly distributed. Therefore, we can also think of AddTree as growing a collection of interrelated ensembles, where each is tuned to yield optimal predictions for one partition of the instance space. The complete AddTree approach is detailed as Algorithm 1. Notice that a learning rate or shrinkage parameter $\nu$ is in place in order to apply regularization by shrinkage in step 8 of the algorithm. The learning rate balances the contributions of each node to the running function estimate of its ensemble. Larger learning rates allow each weak learner to contribute more to the function estimate, resulting in shorter (and consequently more interpretable) trees; smaller learning rates slow the running function estimate to yield larger but potentially more accurate trees, as shown in SI Appendix, Figs. S5 and S6.
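To make the recursion concrete, here is a minimal, illustrative sketch of the AddTree procedure for squared-error loss as described above (a weighted stump fit to the residuals of all data at every node, the Eq. 4 weight update, and shrinkage $\nu$). It is a sketch under stated assumptions, not the authors' rtemis implementation; the Node container, helper names, and synthetic data are illustrative.

```python
# Minimal AddTree sketch for squared-error loss, following the description
# in the text (not the authors' rtemis code). Every node fits a weighted
# stump to the residuals of ALL training points; child weights are scaled
# by (lambda + 1) inside the chosen partition and by lambda outside it
# (Eq. 4), and the branch estimate is updated with shrinkage nu.
import numpy as np
from dataclasses import dataclass
from sklearn.tree import DecisionTreeRegressor


@dataclass
class Node:
    feature: int = -1
    threshold: float = 0.0
    value: float = 0.0          # running estimate F_{n,l} stored at the node
    left: "Node" = None
    right: "Node" = None


def grow(X, y, w, F, lam, nu, depth, max_depth):
    # This sketch assumes lam > 0 so that branch weights stay positive.
    node = Node(value=float(np.average(F, weights=w)))
    if depth >= max_depth:
        return node
    residuals = y - F
    stump = DecisionTreeRegressor(max_depth=1)
    stump.fit(X, residuals, sample_weight=w)
    if stump.tree_.node_count < 3:             # no useful split found
        return node
    node.feature = int(stump.tree_.feature[0])
    node.threshold = float(stump.tree_.threshold[0])
    in_R = X[:, node.feature] <= node.threshold
    for go_left in (True, False):
        mask = in_R if go_left else ~in_R
        # Weighted mean residual over the child's partition = simple regressor
        gamma = np.average(residuals[mask], weights=w[mask])
        F_child = F + nu * gamma               # constant update along the branch
        w_child = w * (lam + mask)             # Eq. 4: (lam+1) inside, lam outside
        child = grow(X, y, w_child, F_child, lam, nu, depth + 1, max_depth)
        if go_left:
            node.left = child
        else:
            node.right = child
    return node


def predict_one(node, x):
    while node.left is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Tiny usage example on synthetic data; lambda interpolates CART <-> GBS.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.sign(X[:, 0]) * np.sign(X[:, 1])
root = grow(X, y, np.ones(len(y)), np.full(len(y), y.mean()),
            lam=1.0, nu=0.5, depth=0, max_depth=4)
pred = np.array([predict_one(root, x) for x in X])
print("training accuracy:", np.mean(np.sign(pred) == y))
```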

Results

We provide the following results: 1) a theoretical analysis establishing the connections between CART, GBS, and AddTree; 2) empirical results demonstrating that AddTree yields a decision tree that outperforms CART and performs equivalently to GBS; and 3) an empirical analysis of the behavior of AddTree under various parameter settings.

Algorithm 1: AddTree$(\mathbf{X}, \mathbf{y}, \mathbf{w}_{n,l}, \lambda, n, T, R_{n,l}, F_{n-1,l}, \nu)$

Theoretical Analysis.

This section proves that AddTree is equivalent to CART when λ=0 and converges to GBS as λ→∞, thus establishing a continuum between CART and GBS. We include proof sketches for the 3 lemmas used to prove our main result in Theorem 1; full proofs are included in SI Appendix.

Lemma 1. The weight of $\mathbf{x}_i$ at leaf $l \in \{1, \ldots, 2^n\}$ at depth $n$ of the tree ($\forall n = 1, 2, \ldots$) is given by

$$w_{n,l}(\mathbf{x}_i) = w_0(\mathbf{x}_i)\,(\lambda + 1)^{\sum_{k=1}^{n} \mathbb{1}[\mathbf{x}_i \in R_{k,l}]}\; \lambda^{\sum_{k=1}^{n} \mathbb{1}[\mathbf{x}_i \in R_{k,l}^{c}]}, \qquad [5]$$

where $\{R_{1,l}, \ldots, R_{n,l}\}$ is the sequence of partitions along the path from the root to $l$.

Proof Sketch: Shown by induction based on Eq. 4. □

Lemma 2. The optimal simple regressor $\gamma_{n,l}^{*}(\mathbf{x})$ that minimizes the squared error loss function (Eq. 2) at leaf $l \in \{1, \ldots, 2^n\}$ at depth $n$ of the tree is given by

$$\gamma_{n,l}^{*}(\mathbf{x}) = \begin{cases} \dfrac{\sum_{i:\,\mathbf{x}_i \in R_{n,l}} w_{n,l}(\mathbf{x}_i)\,\big(y_i - F_{n-1,l}(\mathbf{x}_i)\big)}{\sum_{i:\,\mathbf{x}_i \in R_{n,l}} w_{n,l}(\mathbf{x}_i)} & \text{if } \mathbf{x} \in R_{n,l}, \\[2ex] \dfrac{\sum_{i:\,\mathbf{x}_i \in R_{n,l}^{c}} w_{n,l}(\mathbf{x}_i)\,\big(y_i - F_{n-1,l}(\mathbf{x}_i)\big)}{\sum_{i:\,\mathbf{x}_i \in R_{n,l}^{c}} w_{n,l}(\mathbf{x}_i)} & \text{otherwise.} \end{cases} \qquad [6]$$

Proof Sketch: For a given region $R_{n,l} \subset \mathcal{X}$ at depth $n$ of the tree, the simple regressor has the form

$$\gamma_{n,l}(\mathbf{x}) = \begin{cases} \gamma_{n_1,l} & \text{if } \mathbf{x} \in R_{n,l}, \\ \gamma_{n_2,l} & \text{otherwise,} \end{cases} \qquad [7]$$

with constants $\gamma_{n_1,l}, \gamma_{n_2,l} \in \mathbb{R}$. We take the derivative of the loss function (Eq. 2) in each of the 2 regions $R_{n,l}$ and $R_{n,l}^{c}$, and solve for where the derivative is equal to zero, obtaining Eq. 6. □

Lemma 3. The AddTree update rule is given by $F_{n,l}(\mathbf{x}) = F_{n-1,l}(\mathbf{x}) + \gamma_{n,l}(\mathbf{x})$. If $\gamma_{n,l}(\mathbf{x})$ is defined as

$$\gamma_{n,l}(\mathbf{x}) = \frac{\sum_{i:\,\mathbf{x}_i \in R_{n,l}} w_{n,l}(\mathbf{x}_i)\,\big(y_i - F_{n-1,l}(\mathbf{x}_i)\big)}{\sum_{i:\,\mathbf{x}_i \in R_{n,l}} w_{n,l}(\mathbf{x}_i)},$$

with constant $F_0(\mathbf{x}) = \bar{y}_0$, then $F_{n,l}(\mathbf{x}) = \bar{y}_{n,l}$ is constant, with

$$\bar{y}_{n,l} = \frac{\sum_{i:\,\mathbf{x}_i \in R_{n,l}} w_{n,l}(\mathbf{x}_i)\, y_i}{\sum_{i:\,\mathbf{x}_i \in R_{n,l}} w_{n,l}(\mathbf{x}_i)}, \quad \forall n = 1, 2, \ldots. \qquad [8]$$

Proof Sketch: The proof is by induction on $n$, building upon Eq. 7. Each $\gamma_{n,l}(\mathbf{x}_i)$ is constant and so $\bar{y}_{n,l}$ is constant; therefore, the lemma holds under the given update rule. □

Building upon these 3 lemmas, our main theoretical result is presented in the following theorem, and explained in the subsequent 2 remarks:

Theorem 1. Given the AddTree optimal simple regressor (Eq. 6) that minimizes the loss (Eq. 2), the following limits hold for the parameter $\lambda$ of the weight update rule (Eq. 4):

$$\lim_{\lambda \to 0} \gamma_{n,l}^{*}(\mathbf{x}) = \frac{\sum_{i:\,\mathbf{x}_i \in R_{n,l}} w_0(\mathbf{x}_i)\, y_i}{\sum_{i:\,\mathbf{x}_i \in R_{n,l}} w_0(\mathbf{x}_i)} - \bar{y}_{n-1,l}, \qquad [9]$$

$$\lim_{\lambda \to \infty} \gamma_{n,l}^{*}(\mathbf{x}) = \begin{cases} \dfrac{\sum_{\mathbf{x}_i \in R_{n,l}} w_0(\mathbf{x}_i)\,\big(y_i - F_{n-1,l}(\mathbf{x}_i)\big)}{\sum_{\mathbf{x}_i \in R_{n,l}} w_0(\mathbf{x}_i)} & \text{if } \mathbf{x} \in R_{n,l}, \\[2ex] \dfrac{\sum_{\mathbf{x}_i \in R_{n,l}^{c}} w_0(\mathbf{x}_i)\,\big(y_i - F_{n-1,l}(\mathbf{x}_i)\big)}{\sum_{\mathbf{x}_i \in R_{n,l}^{c}} w_0(\mathbf{x}_i)} & \text{otherwise,} \end{cases} \qquad [10]$$

where $w_0(\mathbf{x}_i)$ is the initial weight for the $i$th training sample.

Proof: Eq. 9 follows from applying Eq. 5 in Lemma 1 to Eq. 6 in Lemma 2 and taking the limit as $\lambda \to 0$, using the result $F_{n,l}(\mathbf{x}) = \bar{y}_{n,l}$, with constant $\bar{y}_{n,l}$ defined by Eq. 8. Similarly, Eq. 10 follows from applying Eq. 5 in Lemma 1 to Eq. 6 in Lemma 2 and taking the limit as $\lambda \to \infty$. □

Remark 1: The simple regressor given by Eq. 9 calculates a weighted average of the difference between the random output variables $y_i$ and the previous estimate $\bar{y}_{n-1,l}$ of the function $F^{*}(\mathbf{x})$ in the disjoint regions defined by $R_{n,l}$. This formally defines the behavior of the CART algorithm.

Remark 2: The simple regressor given by Eq. 10 calculates a weighted average of the difference between the random output variables $y_i$ and the previous estimate of $F^{*}(\mathbf{x})$ given by the piecewise constant function $F_{n-1,l}(\mathbf{x}_i)$. $F_{n-1,l}(\mathbf{x}_i)$ is defined in the overlapping region determined by the latest stump, namely $R_{n,l}$. This defines the behavior of the GBS algorithm.

Based on Remarks 1 and 2, we can conclude that AddTree is equivalent to CART when λ→0 and GBS as λ→∞. In addition to identifying connections between these 2 algorithms, AddTree provides the flexibility to train a model that lies between CART and GBS, with potentially improved performance over either, as we show empirically in the next section. Finally, notice that, although the introduction of the λ parameter might be reminiscent of decision trees trained with fuzzy logic (33), in the fuzzy logic approach, the continuity is between a simple constant model and CART, while AddTree produces a continuous mapping between CART and GBS.
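A small numerical illustration (ours, not from the paper) of the weight formula in Lemma 1 makes the two limits tangible: as λ→0, only instances that fall inside every partition along a path retain weight (CART-like local fitting), while as λ→∞ the relative weights approach 1, so all data contribute equally (GBS-like behavior):

```python
# Numerical illustration of w_{n,l}(x_i) = w_0(x_i) (lambda+1)^{#in} * lambda^{#out}
# (Eq. 5) at the two extremes of Theorem 1. "n_in" counts how many partitions
# along a path contain x_i; "depth - n_in" is the complement count.
import numpy as np

def path_weight(lam, n_in, depth, w0=1.0):
    return w0 * (lam + 1.0) ** n_in * lam ** (depth - n_in)

depth = 4
for lam in (1e-6, 1.0, 1e6):
    w = np.array([path_weight(lam, n_in, depth) for n_in in range(depth + 1)])
    print(f"lambda={lam:>8g}  relative weights:", np.round(w / w.max(), 4))
# lambda -> 0: only points inside every partition keep weight (CART-like).
# lambda -> inf: ratios approach 1, i.e., all points weighted equally (GBS-like).
```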

Empirical Evaluation of Predictive Performance.

In this section, the predictive performance based on balanced accuracy (BACC) (34) of a set of optimized machine learning models, namely, AddTree, CART, GBS, Random Forest (RF), and Gradient Boosting Machines (GBM), was calculated on 83 classification tasks selected from the Penn Machine Learning Benchmark (PMLB) (35). The frequency of the classification tasks for which one of the aforementioned models outperformed the BACC of all of the others is illustrated in Fig. 2, where ties are not included for the sake of clarity. AddTree outperformed CART in 55 (66.3%) classification tasks with high statistical significance ($P = 3.08 \times 10^{-7}$), while AddTree was outperformed by GBS in 46 (55.4%) tasks, but with no statistical significance ($P = 0.06$). In fact, AddTree exhibits the highest mean BACC (improvement rate 1.4%) and F1 score (improvement rate 2.9%) across the PMLB tasks, as shown in Table 1. Moreover, RF and GBM displayed superior performance to GBS and to the single trees, that is, CART and AddTree. These results are also consistent with the F1-score comparison shown in SI Appendix, Fig. S1.

Fig. 2. Frequency with which an algorithm (rows) has higher average BACC than each of the remaining algorithms (columns) under study across 83 binary classification tasks. Each of the learning algorithms has been tuned to maximize BACC. Ties have been excluded for the sake of clarity.

Table 1. Performance indexes for all of the learning algorithms under comparison across 83 PMLB classification tasks.

In Table 1, the average BACC and F1 score of AddTree exceed those of GBS, with smaller SDs, even though GBS beats AddTree in the count of classification tasks with better BACC, as illustrated in Fig. 2. Furthermore, the comparison of the BACCs of AddTree and CART is illustrated in the bar chart in Fig. 3, where the 55 tasks in which AddTree outperforms CART are indicated by the bars with $\Delta\mathrm{BACC} > 0$. A similar result is shown in the scatter plot in SI Appendix, Fig. S2. Also, notice the meaningful reduction in NDN of AddTree with respect to GBS. As expected, the ensemble methods RF and GBM are more accurate than the single-tree approaches, but at the expense of having larger NDNs.

Fig. 3. Bar chart showing the difference $\Delta\mathrm{BACC} = \mathrm{BACC}_{\mathrm{AddTree}} - \mathrm{BACC}_{\mathrm{CART}}$. AddTree exhibits significantly better BACC than CART ($P = 3.08 \times 10^{-7}$) in 55 tasks ($\Delta\mathrm{BACC} > 0$) out of the 83 PMLB classification tasks. The one outlier task in favor of CART is a synthetic parity problem, which is ill-suited to any soft regression-like method such as AddTree and is better solved by the hard binary logic structure of CART (more details in SI Appendix).

Number of Decision Nodes.

As shown in Table 1, the average NDN used to carry out the classification tasks was 55.4 (SD = 0.18) for CART trees and 50.7 (SD = 0.17) for AddTree trees, and this reduction was statistically significant (P = 0.001). Contrasting with the previous NDN values for AddTree and CART, the average number of stumps assembled by GBS to carry out the classification tasks was 4,692.0 (SD = 1,614.1), surpassed only by the large tree ensembles RF with 412,458.4 (SD = 752,654.8) and GBM with 316,728.8 (SD = 700,981.6). In contrast to GBS, AddTree exploits model interactions, which may explain the equivalence in performance with a smaller number of nodes.
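As an aside, NDN is straightforward to compute for any binary decision tree; the following minimal sketch (an illustrative stand-in using scikit-learn, not the paper's code) counts the non-leaf nodes of a fitted tree:

```python
# Minimal sketch: counting decision nodes (NDN), i.e., internal (non-leaf)
# nodes of a fitted tree -- the quantity used here as a proxy for
# interpretability. scikit-learn is an assumed stand-in.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
is_internal = tree.tree_.children_left != -1   # leaves have child index -1
print("NDN:", int(np.sum(is_internal)))
```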

Empirical Variance of AddTree.

By using training data from PMLB, the variance of AddTree as a function of the interpretability parameter λ is estimated and shown in Fig. 4. As observed in the plot, as λ increases, AddTree reduces variance with respect to CART (λ=0) without compromising bias since, as λ→∞, it converges to GBS, which is a consistent estimator. Therefore, the reduction of variance without compromising bias explains the superior performance of AddTree over CART.

Fig. 4.
  • Download figure
  • Open in new tab
  • Download powerpoint
Fig. 4.

Estimate of the variance of AddTree as a function of the interpretability parameter λ. Notice that AddTree is able to improve the variance with respect to CART. The error bars represent 1 SE.

Discussion

The previous section provided a formal proof that AddTree establishes a previously unknown connection between CART and GBS. This connection motivates the hypothesis that the predictive performance of AddTree may surpass the performances of CART and GBS, while improving the limited interpretability that GBS provides. Preliminary results regarding performance of AddTree were previously presented in ref. 36, but using small datasets. The superior statistical power of PMLB allows a more rigorous assessment of the performance and interpretability of AddTree. Therefore, a set of experiments was carried out to validate this hypothesis.

The results presented in Fig. 2 validate the ability of AddTree to surpass the predictive performance of CART over a wide variety of classification tasks. This is further corroborated by SI Appendix, Fig. S3, which shows the distribution of the difference $\Delta\mathrm{BACC} = \mathrm{BACC}_{\mathrm{AddTree}} - \mathrm{BACC}_{\mathrm{CART}}$, whose mean reflects a bias toward positive values of $\Delta\mathrm{BACC}$; therefore, $\mathbb{E}[\mathrm{BACC}_{\mathrm{AddTree}}] > \mathbb{E}[\mathrm{BACC}_{\mathrm{CART}}]$.† The superior performance of AddTree over CART has been shown to be statistically significant as well. Moreover, AddTree showed a significant reduction in the average NDN with respect to CART. The difference between the NDNs of AddTree and CART, that is, $\Delta\mathrm{NDN} = \mathrm{NDN}_{\mathrm{AddTree}} - \mathrm{NDN}_{\mathrm{CART}}$, has the distribution shown in SI Appendix, Fig. S4, whose mean value is −4.66 (SD = 48.50). Therefore, we conclude that AddTree can generate better predictions than CART while building smaller trees.

Despite the apparent performance advantage of GBS over AddTree, the results are not statistically significant. Furthermore, GBS had an average NDN of 4,692.0 (SD = 1,614.1) over the classification tasks, whereas AddTree had an average NDN of 50.7 (SD = 0.17); however, GBS can be indirectly interpreted through partial dependence plots (23). Notice that limiting the GBS ensemble size to match the maximum depth of CART and AddTree would have provided a better comparison between GBS and AddTree in terms of interpretability based on NDNs. However, the number of stumps in GBS was added adaptively, and the hyperparameters were optimally tuned, to provide the best performance possible, and yet the predictive performance of GBS does not improve significantly over AddTree. From these results, we conclude that AddTree provides an alternative paradigm for building decision trees with a predictive performance that surpasses that of CART and is as accurate as optimized GBS. AddTree's improvement in predictive performance over CART is a result not only of its connection to boosting, but also of the effect of regularizing the tree splits through the passing of all of the training data to the descendant nodes and the variation of the weighting scheme. In fact, as illustrated in Fig. 4, AddTree reduces variance as λ increases without compromising bias since, as λ→∞, it converges to GBS, a boosting-based algorithm that primarily reduces bias.

The tree ensembles RF and GBM outperformed CART, AddTree, and GBS with statistical significance, at the expense of producing less-interpretable models, using the total NDN as a proxy for interpretability. However, even in scenarios where understanding the model is not a concern, the superior predictive performance of AddTree provides possibilities for improving over the predictive performance of RF and GBM, namely, 1) the replacement of the stumps ($\gamma_{n,l}(\mathbf{x})$, Eq. 6) by decision trees such as CART trees, 2) the implementation of bagged AddTree, and 3) the application of gradient boosting to AddTree. Furthermore, 4) formal extensions of AddTree to other losses and 5) the analysis of connections between AddTree and the recently introduced Generalized Random Forest (37) are in our future research agenda.

Materials and Methods

All experiments were carried out using the rtemis machine learning library (38) written in the R language (39). The statistical significance of all of the predictive performance measurements was evaluated using the Wilcoxon rank sum test (40).

Predictive Performance Evaluation and NDN.

The PMLB repository (35) contains a selection of curated datasets, accessible through GitHub, for evaluating supervised classification. PMLB includes datasets from commonly used machine learning benchmarks such as the University of California, Irvine machine learning repository (41), the metalearning benchmark (42), and the Knowledge Extraction based on Evolutionary Learning tool (43). It includes real and synthetic datasets in order to assess and compare the performance of different learning algorithms. Out of the 95 available PMLB datasets, 11 datasets with fewer than 100 instances were ruled out to avoid overfitting in the analysis, and 1 dataset with 48,842 instances was discarded because of the extended execution times that the performance analyses demanded for this particular dataset. The remaining 83 datasets have a mean of 1,713.0 instances (SD = 2,900.3) with 36.3 features (SD = 111.4). The distributions of performance comparisons, including the tests for statistically significant differential performances, were based on viewing the 83 datasets as a random sample of potential classification tasks.

In this study, we evaluated the performance of AddTree against CART, GBS, RF, and GBM using BACC, a measurement commonly used to calculate performance in 2-class imbalanced domains (34). All experiments were performed using nested resampling. For each dataset, the hyperparameters of each algorithm were tuned using grid search to maximize BACC across 5 stratified subsamples (internal resampling). External resampling was performed using 25 stratified subsamples, fixed for each dataset to ensure a direct comparison across algorithms. In both internal and external resampling, the training set size was set to 75% of the available number of cases. The following hyperparameters were tuned: 1) The interpretability factor λ and the learning rate were tuned for AddTree, 2) the complexity parameter used in cost complexity pruning (44) was tuned for CART, 3) the learning rate and the maximum number of stumps in the ensemble were tuned for GBS, 4) the number of predictors randomly sampled as candidates at each split was tuned for RF, and 5) the maximum tree depth as well as the learning rate were tuned for GBM. Detailed definitions of the hyperparameters are provided in SI Appendix, Table S1, and the hyperparameter space is shown in SI Appendix, Table S2.
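A condensed sketch of this nested resampling scheme, written with scikit-learn as an assumed stand-in for the rtemis pipeline (with CART and its cost-complexity pruning parameter shown as the tuned model), looks as follows:

```python
# Minimal sketch (assumed scikit-learn analog; the study used rtemis in R)
# of the nested resampling described above: an outer loop of 25 stratified
# 75% training subsamples and an inner grid search over 5 stratified
# subsamples that maximizes balanced accuracy (BACC).
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, random_state=0)

outer = StratifiedShuffleSplit(n_splits=25, train_size=0.75, random_state=0)
inner = StratifiedShuffleSplit(n_splits=5, train_size=0.75, random_state=0)
param_grid = {"ccp_alpha": [0.0, 1e-4, 1e-3, 1e-2]}  # CART complexity pruning

scores = []
for train_idx, test_idx in outer.split(X, y):
    search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                          param_grid, scoring="balanced_accuracy", cv=inner)
    search.fit(X[train_idx], y[train_idx])
    pred = search.predict(X[test_idx])                # refit on best params
    scores.append(balanced_accuracy_score(y[test_idx], pred))

print("mean outer BACC:", sum(scores) / len(scores))
```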

The AddTree approach grows trees where a minimum of one observation per child node is required to make a node internal; otherwise, such a node becomes a leaf node (min.membership parameter). A maximum tree depth of 30 is also in place (maxdepth parameter); however, this tree depth was not reached in any of the experiments, mostly due to the stopping effect of min.membership. Furthermore, the number of nodes in AddTree is reduced by eliminating impossible and redundant paths (i.e., paths where subsequent child branches give the same class in both outputs); however, no other statistical pruning method (e.g., minimum-error or small-tree based pruning) was carried out. In contrast, CART trees were grown and pruned using the rpart package (44). CART tree growth is limited by a minimum number of 2 observations for a split to be attempted (minsplit parameter) and, as in AddTree, a maxdepth of 30 for fair comparison. In addition, any branch whose complexity measure is less than a value denoted prune.cp is trimmed; prune.cp is tuned using grid search. CART also relies on a variant of the complexity parameter whereby any split that does not decrease the overall lack of fit, based on the cross-validated classification error, by a factor of 0.0001 is not attempted. The GBS and GBM ensembles are grown and pruned using the gbm package (45). Individual trees' growth is limited by the minimum number of 1 observation in the terminal nodes (n.minobsinnode parameter), as well as the maxdepth parameter, which was also tuned using grid search. The number of GBS and GBM trees was tuned by early stopping based on cross-validated classification error. RF tree ensembles were grown and pruned using the ranger package (46). The trees were grown until the minimum number of 1 observation at the terminal nodes was achieved (nodesize parameter). Moreover, a default of 1,000 ensembled trees is specified (ntree parameter) to help ensure that every input row is predicted at least a few times.

Empirical Variance of AddTree.

The variance of AddTree was calculated as indicated in ref. 47 using the twonorm dataset of PMLB. We calculated the percent of times a test prediction was different from the mode of all test set predictions across the bootstraps where the case was left out (not part of training set); 500 bootstraps were drawn from 1,000 randomly subsampled cases. AddTree was trained on each bootstrap for a set of λ values, namely, {0.00,0.11,0.25,0.43,0.67,1.00,1.50,2.33,4.00,9.00}.
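The bootstrap disagreement estimate described above can be sketched as follows; the classifier, synthetic dataset, and number of bootstraps are illustrative stand-ins for AddTree, the twonorm data, and the 500 bootstraps used in the paper:

```python
# Minimal sketch (an illustration, not the authors' code) of the bootstrap
# variance estimate described above: for each case, compute the fraction of
# out-of-bag predictions that disagree with the mode of its out-of-bag
# predictions, then average over cases.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, :2].sum(axis=1) > 0).astype(int)     # stand-in for the twonorm task

n_boot, N = 100, len(y)
preds = np.full((n_boot, N), -1)               # -1 marks "case was in-bag"
for b in range(n_boot):
    idx = rng.integers(0, N, size=N)           # bootstrap sample with replacement
    oob = np.setdiff1d(np.arange(N), idx)      # cases left out of this bootstrap
    model = DecisionTreeClassifier(random_state=b).fit(X[idx], y[idx])
    preds[b, oob] = model.predict(X[oob])

disagreement = []
for i in range(N):
    p = preds[preds[:, i] >= 0, i]             # out-of-bag predictions for case i
    if len(p) > 1:
        mode = np.bincount(p).argmax()
        disagreement.append(np.mean(p != mode))
print("estimated variance:", float(np.mean(disagreement)))
```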

Acknowledgments

This work was partially supported by an award granted by the Emerson Collective. Additionally, the research reported in this publication was supported by the National Institute of Biomedical Imaging and Bioengineering of the National Institutes of Health under Award K08EB026500. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Footnotes

  • 1. J.M.L. and E.D.G. contributed equally to this work.

  • 2. To whom correspondence may be addressed. Email: gilmer.valdes@ucsf.edu, jose.luna@pennmedicine.upenn.edu, or jhf@stanford.edu.
  • Author contributions: J.M.L., E.D.G., L.H.U., E.E., E.S.D., C.B.S., J.H.F., T.D.S., and G.V. designed research; J.M.L., E.D.G., and G.V. performed research; J.M.L., E.D.G., L.H.U., E.E., S.T.J., J.H.F., T.D.S., and G.V. contributed new reagents/analytic tools; J.M.L., E.D.G., L.H.U., E.E., E.S.D., S.T.J., C.B.S., T.D.S., and G.V. analyzed data; and J.M.L., E.D.G., L.H.U., E.E., E.S.D., S.T.J., C.B.S., J.H.F., T.D.S., and G.V. wrote the paper.

  • Reviewers: A.C., Utah State University; and G.H., Cornell University.

  • Conflict of interest statement: J.M.L., E.E., L.H.U., C.B.S., T.D.S., and G.V. have a patent titled “Systems and methods for generating improved decision trees,” pending status.

  • *In classification, $F$ gives the sign of the prediction. CART models are often used as the $h_t$s.

  • †The symbol $\mathbb{E}[\cdot]$ denotes the expectation operator.

  • This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1816748116/-/DCSupplemental.

  • Copyright © 2019 the Author(s). Published by PNAS.

This open access article is distributed under Creative Commons Attribution-NonCommercial-NoDerivatives License 4.0 (CC BY-NC-ND).


References

  1. S. Corbett-Davies, E. Pierson, A. Feller, S. Goel, A. Huq, "Algorithmic decision making and the cost of fairness" in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, S. Matwin, S. Yu, F. Farooq, Eds. (SIGKDD, 2017), pp. 797–806.
  2. A. Caliskan, J. J. Bryson, A. Narayanan, Semantics derived automatically from language corpora contain human-like biases. Science 356, 183–186 (2017).
  3. F. Cabitza, G. Banfi, Machine learning in laboratory medicine: Waiting for the flood? Clin. Chem. Lab. Med. 56, 516–524 (2018).
  4. M. A.-M. Salem, A. Atef, A. Salah, M. Shams, "Recent survey on medical image segmentation" in Computer Vision: Concepts, Methodologies, Tools, and Applications, M. Khosrow-Pour, S. Clarke, M. E. Jennex, A. Becker, A.-V. Anttiroiko, Eds. (IGI Global, 2018), pp. 129–169.
  5. P. Yadav, M. Steinbach, V. Kumar, G. Simon, Mining electronic health records (EHRs): A survey. ACM Comput. Surv. 50, 1–40 (2018).
  6. D. Sculley et al., "Hidden technical debt in machine learning systems" in Proceedings of the International Conference on Neural Information Processing Systems, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, R. Garnett, Eds. (NeurIPS, 2015), pp. 2503–2511.
  7. F. Louzada, A. Ara, G. B. Fernandes, Classification methods applied to credit scoring: Systematic review and overall comparison. Surv. Oper. Res. Manage. Sci. 21, 117–134 (2016).
  8. J. Angwin, J. Larson, S. Mattu, L. Kirchner, Machine bias: There's software used across the country to predict future criminals. And it's biased against blacks. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing. Accessed 1 October 2018.
  9. Z. C. Lipton, "The doctor just won't accept that!" in Proceedings of the Symposium of Interpretable Machine Learning at the International Conference on Neural Information Processing Systems, A. G. Wilson, J. Yosinski, P. Simard, R. Caruana, W. Herlands, Eds. (NeurIPS, 2017), pp. 1–3.
  10. R. Caruana et al., "Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission" in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, L. Cao et al., Eds. (SIGKDD, 2015), pp. 1721–1730.
  11. B. Goodman, S. Flaxman, "European Union regulations on algorithmic decision-making and a 'right to explanation'" in Proceedings of the Workshop on Human Interpretability in Machine Learning at the International Conference on Machine Learning, B. Kim, D. M. Malioutov, K. R. Varshney, Eds. (International Conference on Machine Learning, 2016), pp. 1–9.
  12. Z. C. Lipton, The mythos of model interpretability. arXiv:1606.03490. Deposited 6 March 2017.
  13. F. Doshi-Velez, B. Kim, Towards a rigorous science of interpretable machine learning. arXiv:1702.08608. Deposited 2 March 2017.
  14. F. Poursabzi-Sangdeh, D. G. Goldstein, J. M. Hofman, J. Wortman Vaughan, H. Wallach, "Manipulating and measuring model interpretability" in Proceedings of the Symposium of Interpretable Machine Learning at the International Conference on Neural Information Processing Systems, A. G. Wilson, J. Yosinski, P. Simard, R. Caruana, W. Herlands, Eds. (NeurIPS, 2017), pp. 1–14.
  15. M. T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?: Explaining the predictions of any classifier" in Proceedings of the ACM International Conference on Knowledge Discovery and Data Mining, B. Krishnapuram et al., Eds. (SIGKDD, 2016), pp. 1135–1144.
  16. L. A. Hendricks et al., "Generating visual explanations" in European Conference on Computer Vision, B. Leibe, J. Matas, N. Sebe, M. Welling, Eds. (Springer, 2016), pp. 3–19.
  17. L. Breiman, J. H. Friedman, R. A. Olshen, C. J. Stone, Classification and Regression Trees (Wadsworth & Brooks, Monterey, CA, 1984).
  18. W.-Y. Loh, Fifty years of classification and regression trees. Int. Stat. Rev. 82, 329–348 (2014).
  19. S. R. Safavian, D. Landgrebe, A survey of decision tree classifier methodology. IEEE Trans. Sys. Man Cybernetics 21, 660–674 (1991).
  20. B. Kim, J. A. Shah, F. Doshi-Velez, "Mind the gap: A generative approach to interpretable feature selection and extraction" in Proceedings of the International Conference on Neural Information Processing Systems, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, R. Garnett, Eds. (NeurIPS, 2015), pp. 2260–2268.
  21. J. M. Luna et al., Predicting radiation pneumonitis in locally advanced stage II–III non-small cell lung cancer using machine learning. Radiother. Oncol. 133, 106–112 (2019).
  22. R. Caruana, A. Niculescu-Mizil, "An empirical comparison of supervised learning algorithms" in Proceedings of the International Conference on Machine Learning, W. Cohen, A. Moore, Eds. (Association for Computing Machinery, 2006), pp. 161–168.
  23. J. H. Friedman, Greedy function approximation: A gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001).
  24. W.-Y. Loh, Regression trees with unbiased variable selection and interaction detection. Stat. Sin. 12, 361–386 (2002).
  25. W.-Y. Loh, Improving the precision of classification trees. Ann. Appl. Stat. 3, 1710–1737 (2009).
  26. M. Kearns, Y. Mansour, On the boosting ability of top-down decision tree learning algorithms. J. Comput. Syst. Sci. 58, 109–128 (1999).
  27. J. R. Quinlan, C4.5: Programs for Machine Learning (Elsevier, 2014).
  28. Y. Freund, R. E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55, 119–139 (1997).
  29. E. Grossmann, "AdaTree: Boosting a weak classifier into a decision tree" in Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), Z. Zhu, R. Kumar, Y.-P. Hung, R. Haralick, A. Hanson, Eds. (IEEE, 2004), pp. 105–105.
  30. Z. Tu, "Probabilistic boosting-tree: Learning discriminative models for classification, recognition, and clustering" in Proceedings of the IEEE International Conference on Computer Vision (ICCV), S. Ma, H.-Y. Shum, W. T. Freeman, L. Van Gool, S. Chaudhuri, Eds. (IEEE, 2005), vol. 2, pp. 1589–1596.
  31. G. Valdes et al., MediBoost: A patient stratification tool for interpretable decision making in the era of precision medicine. Sci. Rep. 6, 37854 (2016).
  32. J. Friedman, T. Hastie, R. Tibshirani, Additive logistic regression: A statistical view of boosting. Ann. Stat. 28, 337–407 (2000).
  33. Y. Yuan, M. J. Shaw, Induction of fuzzy decision trees. Fuzzy Sets Syst. 69, 125–139 (1995).
  34. K. H. Brodersen, C. S. Ong, K. E. Stephan, J. M. Buhmann, "The balanced accuracy and its posterior distribution" in Proceedings of the International Conference on Pattern Recognition (ICPR), A. Ercil, K. Boyer, M. Cetin, S.-W. Lee, Eds. (IEEE, 2010), pp. 3121–3124.
  35. R. S. Olson, W. La Cava, P. Orzechowski, R. J. Urbanowicz, J. H. Moore, PMLB: A large benchmark suite for machine learning evaluation and comparison. BioData Min. 10, 36 (2017).
  36. J. M. Luna et al., "Tree-structured boosting: Connections between gradient boosted stumps and full decision trees" in Conference on Neural Information Processing Systems (NIPS 2017), A. G. Wilson, J. Yosinski, P. Simard, R. Caruana, W. Herlands, Eds. (NeurIPS, 2017), pp. 1–8.
  37. S. Athey, J. Tibshirani, S. Wager, Generalized random forests. Ann. Stat. 47, 1148–1178 (2019).
  38. E. D. Gennatas, "Towards precision psychiatry: Gray matter development and cognition in adolescence," PhD thesis, University of Pennsylvania, Philadelphia, PA (2017).
  39. R Core Team, R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, Vienna, Austria, 2018).
  40. M. Pagano, K. Gauvreau, Principles of Biostatistics (Chapman and Hall/CRC, 2018).
  41. D. Dheeru, E. K. Taniskidou, UCI machine learning repository. http://archive.ics.uci.edu/ml. Accessed 15 June 2017.
  42. M. Reif, "A comprehensive dataset for evaluating approaches of various meta-learning tasks" in Proceedings of the International Conference on Pattern Recognition and Methods (ICPRAM), J. S. Sánchez, P. L. Carmona, Eds. (IEEE, 2012), pp. 273–276.
  43. J. Alcalá-Fdez et al., KEEL data-mining software tool: Data set repository, integration of algorithms and experimental analysis framework. J. Mult. Valued Logic Soft Comput. 17, 255–287 (2011).
  44. T. Therneau, B. Atkinson, B. Ripley, Package 'rpart.' R package version 4.1-13. https://cran.r-project.org/web/packages/rpart/rpart.pdf. Accessed 1 July 2018.
  45. G. Ridgeway et al., Package 'gbm.' R package version 2.1.3. https://cran.r-project.org/web/packages/gbm/gbm.pdf. Accessed 7 July 2018.
  46. M. N. Wright, A. Ziegler, ranger: A fast implementation of random forests for high dimensional data in C++ and R. J. Stat. Softw. 77, 1–17 (2017).
  47. G. Valentini, T. G. Dietterich, "Low bias bagged support vector machines" in Proceedings of the 20th International Conference on Machine Learning (ICML-03), T. Fawcett, N. Mishra, Eds. (AAAI Press, 2003), pp. 752–759.