How do leaf nodes behave in regression decision trees?


Discussion Overview

The discussion centers on the behavior of leaf nodes in regression decision trees, exploring how these nodes function in predicting numerical outputs. Participants examine the structure of decision trees, the nature of leaf nodes, and the implications of continuous outputs in regression contexts.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant notes that decision trees can be used for both regression and classification, questioning whether internal nodes always have two edges or if more can exist.
  • Another participant expresses surprise at the use of decision trees for regression and requests a source for this information.
  • A source is provided that discusses decision tree regression, indicating that leaf nodes can assume continuous outputs and suggesting that one leaf is needed per regression parameter.
  • A participant raises a concern about the situation where the predicted value does not match any of the values in the leaf nodes, questioning the finite nature of leaf nodes.
  • One participant admits to limited knowledge on the topic and emphasizes the need for more technical sources to understand the continuous output concept.
  • A later reply suggests that the numerical outputs in regression decision trees are averages of the values from training instances that reach a specific leaf node.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and knowledge about regression decision trees, with some uncertainty about the behavior of leaf nodes and the implications of continuous outputs. No consensus is reached on the specifics of how leaf nodes function in regression trees.

Contextual Notes

There are limitations regarding the assumptions about the number of leaf nodes and the nature of outputs, as well as the dependency on definitions of continuous outputs. The discussion does not resolve these uncertainties.

fog37
TL;DR Summary: understand how decision trees and leaf nodes behave in the case of regression...
Hello.

Decision trees are really cool. They can be used for either regression or classification. They are built from nodes, and each internal node represents an if-then statement that evaluates to either true or false. Does that mean there are always exactly two edges/branches coming out of an internal node (leaf nodes have no outgoing edges)? Or are there situations in which there can be more than two edges?
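To make the structure concrete, here is a minimal sketch (not from any library; the node layout and names are made up for illustration) of a tree where each internal node is exactly that kind of if-then test with two branches, and leaves hold the outputs:

```python
# Each internal node is a dict holding one if-then test with exactly two
# branches ("left" if the test is true, "right" otherwise); a leaf is a
# plain value. This mirrors CART-style binary splits.
def predict(node, x):
    if not isinstance(node, dict):   # reached a leaf: return its stored output
        return node
    branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
    return predict(node[branch], x)

# Hypothetical tiny tree: test feature 0 against 5.0, then (on the left
# branch only) test it again against 2.0.
tree = {
    "feature": 0, "threshold": 5.0,
    "left":  {"feature": 0, "threshold": 2.0, "left": "A", "right": "B"},
    "right": "C",
}

print(predict(tree, [1.0]))  # "A"
print(predict(tree, [7.0]))  # "C"
```

Nothing forces a node to have only two children in general, but the common CART formulation (and its usual implementations) uses binary splits like the one above.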

In the case of classification trees, the leaf nodes are the output nodes, each producing a single class label (there can be more leaf nodes than there are classes). In the case of regression trees, how do the leaf nodes behave? The goal is to predict a numerical output (e.g., the price of a house). How many leaf nodes are there? One for each possible numerical value? That would be impossible. I know the tree gets trained with a finite number of examples/instances, from which the tree structure and decision statements are formed...

Thank you for any clarification.
 
I have never heard of using a decision tree for regression. Do you have a source for this?
 
fog37 said:
TL;DR Summary: understand how decision trees and leaf nodes behave in the case of regression...

In the case of regression trees, how do the leaf nodes behave? The goal is to predict a numerical output (ex: the price of a house). How many leaf nodes are there? One for each possible numerical value?
It looks like the leaves themselves can assume continuous outputs. So you would only need one leaf per regression parameter.
 
Dale said:
It looks like the leaves themselves can assume continuous outputs. So you would only need one leaf per regression parameter.
Thank you. Let me see if I understand correctly. In the example figure below, I notice that each leaf node carries a specific value (the last line in the node). What if the inputs are such that the true value matches none of the values shown in the leaf nodes? That is my dilemma: there is only a finite number of leaf nodes, each with its own fixed value...

[Attached figure: an example regression decision tree; each leaf node lists a numeric "value"]
 
Sorry, I cannot help you. Literally all I know about it is that one page that you cited where it says "Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values".

If you need more technical information then you need to find a more technical source. If you have a more technical source that has the information you need then I can help you understand it, but there simply is not any more information available there than the quote.
 
Dale said:
Sorry, I cannot help you. Literally all I know about it is that one page that you cited where it says "Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values".

If you need more technical information then you need to find a more technical source. If you have a more technical source that has the information you need then I can help you understand it, but there simply is not any more information available there than the quote.
No worries.

After some research, I learned that, in a regression decision tree, the possible numerical outputs are the averages of the target values of the training instances that reach a particular leaf node by following the sequence of if-then statements down the tree...
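This averaging behavior can be checked directly. A short sketch using scikit-learn's DecisionTreeRegressor (a library choice of mine, not something cited in the thread; the data here is synthetic and purely illustrative):

```python
# Check that a regression tree's prediction for a new sample equals the
# mean of the training targets that were routed to the same leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))           # one feature, e.g. house size
y = 3.0 * X[:, 0] + rng.normal(0, 1, size=200)  # noisy numeric target

tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

x_new = np.array([[4.2]])
leaf = tree.apply(x_new)[0]            # index of the leaf x_new lands in
in_same_leaf = tree.apply(X) == leaf   # training samples routed to that leaf
mean_of_leaf = y[in_same_leaf].mean()

# The tree's prediction is exactly the mean of the leaf's training targets.
print(tree.predict(x_new)[0], mean_of_leaf)
```

So the set of possible outputs really is finite (one value per leaf), but each value is a data-driven average rather than a class label, which is why the output is described as "continuous": splitting more finely yields leaves whose averages can take any real value.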
 
