How do leaf nodes behave in regression decision trees?


Discussion Overview

The discussion centers on the behavior of leaf nodes in regression decision trees, exploring how these nodes function in predicting numerical outputs. Participants examine the structure of decision trees, the nature of leaf nodes, and the implications of continuous outputs in regression contexts.

Discussion Character

  • Exploratory
  • Technical explanation
  • Debate/contested

Main Points Raised

  • One participant notes that decision trees can be used for both regression and classification, questioning whether internal nodes always have two edges or if more can exist.
  • Another participant expresses surprise at the use of decision trees for regression and requests a source for this information.
  • A source is provided that discusses decision tree regression, indicating that leaf nodes can assume continuous outputs and suggesting that one leaf is needed per regression parameter.
  • A participant raises a concern about the situation where the predicted value does not match any of the values in the leaf nodes, questioning the finite nature of leaf nodes.
  • One participant admits to limited knowledge on the topic and emphasizes the need for more technical sources to understand the continuous output concept.
  • A later reply suggests that the numerical outputs in regression decision trees are averages of the values from training instances that reach a specific leaf node.

Areas of Agreement / Disagreement

Participants express varying levels of understanding and knowledge about regression decision trees, with some uncertainty about the behavior of leaf nodes and the implications of continuous outputs. No consensus is reached on the specifics of how leaf nodes function in regression trees.

Contextual Notes

There are limitations regarding the assumptions about the number of leaf nodes and the nature of outputs, as well as the dependency on definitions of continuous outputs. The discussion does not resolve these uncertainties.

fog37
TL;DR Summary: understand how decision trees and leaf nodes behave in the case of regression...
Hello.

Decision trees are really cool. They can be used for either regression or classification. They are built from nodes, and each internal node represents an if-then statement that evaluates to either true or false. Does that mean there are always exactly two edges/branches coming out of an internal node (leaf nodes have no outgoing edges)? Or are there situations in which there can be more than two edges?
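To make the structure concrete, here is a minimal sketch (not from any library; the node layout and names are made up for illustration) of a tree where each internal node is exactly that kind of if-then test with two branches, and leaves hold the outputs:

```python
# Each internal node is a dict holding one if-then test with exactly two
# branches ("left" if the test is true, "right" otherwise); a leaf is a
# plain value. This mirrors CART-style binary splits.
def predict(node, x):
    if not isinstance(node, dict):   # reached a leaf: return its stored output
        return node
    branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
    return predict(node[branch], x)

# Hypothetical tiny tree: test feature 0 against 5.0, then (on the left
# branch only) test it again against 2.0.
tree = {
    "feature": 0, "threshold": 5.0,
    "left":  {"feature": 0, "threshold": 2.0, "left": "A", "right": "B"},
    "right": "C",
}

print(predict(tree, [1.0]))  # "A"
print(predict(tree, [7.0]))  # "C"
```

Nothing forces a node to have only two children in general, but the common CART formulation (and its usual implementations) uses binary splits like the one above.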

In the case of classification trees, the leaf nodes are the output nodes, each producing a single class label (there can be more leaf nodes than there are classes). In the case of regression trees, how do the leaf nodes behave? The goal is to predict a numerical output (e.g., the price of a house). How many leaf nodes are there? One for each possible numerical value? That would be impossible. I know the tree gets trained with a finite number of examples/instances, from which the tree structure and decision statements are formed...

Thank you for any clarification.
 
I have never heard of using a decision tree for regression. Do you have a source for this?
 
fog37 said:
TL;DR Summary: understand how decision trees and leaf nodes behave in the case of regression...

In the case of regression trees, how do the leaf nodes behave? The goal is to predict a numerical output (ex: the price of a house). How many leaf nodes are there? One for each possible numerical value?
It looks like the leaves themselves can assume continuous outputs. So you would only need one leaf per regression parameter.
 
Dale said:
It looks like the leaves themselves can assume continuous outputs. So you would only need one leaf per regression parameter.
Thank you. Let me see if I understand correctly. In the example figure below, I notice that each leaf node carries a specific value (the last line in the node). What if the inputs are such that the true value matches none of the values shown in the leaf nodes? That is my dilemma: there is only a finite number of leaf nodes, each with its own fixed value...

[Attached figure: an example regression decision tree; each leaf node lists a numeric "value"]
 
Sorry, I cannot help you. Literally all I know about it is that one page that you cited where it says "Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values".

If you need more technical information then you need to find a more technical source. If you have a more technical source that has the information you need then I can help you understand it, but there simply is not any more information available there than the quote.
 
Dale said:
Sorry, I cannot help you. Literally all I know about it is that one page that you cited where it says "Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values".

If you need more technical information then you need to find a more technical source. If you have a more technical source that has the information you need then I can help you understand it, but there simply is not any more information available there than the quote.
No worries.

After some research, I learned that, in a regression decision tree, the possible numerical outputs are the averages of the target values of the training instances that reach a particular leaf node by following the sequence of if-then statements down the tree...
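This averaging behavior can be checked directly. A short sketch using scikit-learn's DecisionTreeRegressor (a library choice of mine, not something cited in the thread; the data here is synthetic and purely illustrative):

```python
# Check that a regression tree's prediction for a new sample equals the
# mean of the training targets that were routed to the same leaf.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))           # one feature, e.g. house size
y = 3.0 * X[:, 0] + rng.normal(0, 1, size=200)  # noisy numeric target

tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X, y)

x_new = np.array([[4.2]])
leaf = tree.apply(x_new)[0]            # index of the leaf x_new lands in
in_same_leaf = tree.apply(X) == leaf   # training samples routed to that leaf
mean_of_leaf = y[in_same_leaf].mean()

# The tree's prediction is exactly the mean of the leaf's training targets.
print(tree.predict(x_new)[0], mean_of_leaf)
```

So the set of possible outputs really is finite (one value per leaf), but each value is a data-driven average rather than a class label, which is why the output is described as "continuous": splitting more finely yields leaves whose averages can take any real value.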
 
