How do leaf nodes behave in regression decision trees?

  • #1
fog37
TL;DR Summary
understand how decision trees and leaf nodes behave in the case of regression...
Hello.

Decision trees are really cool. They can be used for either regression or classification. They are built from nodes, and each internal node represents an if-then statement that evaluates to either true or false. Does that mean there are always exactly two edges/branches coming out of an internal node (leaf nodes have no outgoing edges)? Or are there situations in which there can be more than two edges?

In the case of classification trees, the leaf nodes are the output nodes, each with a single class as its output (there can be more leaf nodes than there are classes). In the case of regression trees, how do the leaf nodes behave? The goal is to predict a numerical output (e.g., the price of a house). How many leaf nodes are there? One for each possible numerical value? That would be impossible. I know the tree is trained with a finite number of examples/instances, from which the tree structure and decision statements are formed...

Thank you for any clarification.
 
  • #2
I have never heard of using a decision tree for regression. Do you have a source for this?
 
  • #4
fog37 said:
TL;DR Summary: understand how decision trees and leaf nodes behave in the case of regression...

In the case of regression trees, how do the leaf nodes behave? The goal is to predict a numerical output (ex: the price of a house). How many leaf nodes are there? One for each possible numerical value?
It looks like the leaves themselves can assume continuous outputs. So you would only need one leaf per regression parameter.
 
  • #5
Dale said:
It looks like the leaves themselves can assume continuous outputs. So you would only need one leaf per regression parameter.
Thank you. Let me see if I understand correctly. In the example figure below, I notice that each leaf node carries a specific number, i.e. the value on its last line. What if the inputs are such that the predicted value is none of the values shown in the leaf nodes? That is my dilemma. It seems there is a finite number of leaf nodes, each with its own fixed value...

[Attached image: an example regression tree with a numerical value shown at each leaf node]
 
  • #6
Sorry, I cannot help you. Literally all I know about it is that one page that you cited where it says "Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values".

If you need more technical information then you need to find a more technical source. If you have a more technical source that has the information you need then I can help you understand it, but there simply is not any more information available there than the quote.
 
  • #7
Dale said:
Sorry, I cannot help you. Literally all I know about it is that one page that you cited where it says "Continuous output means that the output/result is not discrete, i.e., it is not represented just by a discrete, known set of numbers or values".

If you need more technical information then you need to find a more technical source. If you have a more technical source that has the information you need then I can help you understand it, but there simply is not any more information available there than the quote.
No worries.

After some research, I learned that in a regression decision tree, the possible numerical outputs are the averages of the target values of the training instances that reached a particular leaf node via the sequence of if-then statements along the tree...
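That behavior can be illustrated with a minimal sketch (pure Python, hypothetical house-price data): a one-split regression "stump" where each leaf predicts the mean of the training targets that landed in it.

```python
# Hypothetical training data: house size (m^2) -> price ($1000s)
sizes = [50, 60, 80, 120, 150, 200]
prices = [100, 110, 130, 220, 260, 300]

threshold = 100  # the if-then test at the single internal node

# Partition the training targets by the split condition
left = [p for s, p in zip(sizes, prices) if s <= threshold]
right = [p for s, p in zip(sizes, prices) if s > threshold]

# Each leaf stores the mean of the targets that reached it
leaf_left = sum(left) / len(left)    # mean of {100, 110, 130}
leaf_right = sum(right) / len(right)  # mean of {220, 260, 300}

def predict(size):
    """Route the input down the tree and return the leaf's stored mean."""
    return leaf_left if size <= threshold else leaf_right

print(predict(70))   # left leaf: about 113.33
print(predict(180))  # right leaf: 260.0
```

Any input falling in the same region gets the same prediction, which is why the tree can only output finitely many distinct values: one per leaf.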
 

1. What is a leaf node in a regression decision tree?

A leaf node in a regression decision tree represents the final output or decision made by the tree after considering all relevant input features. It is the endpoint of a decision path where no further splitting occurs. In regression trees, each leaf node provides a predicted numerical value based on the input features that led to that particular leaf.

2. How is the value at a leaf node in a regression tree determined?

The value at a leaf node in a regression tree is typically determined by the average of the target values of the training samples that fall into that leaf. This average provides the prediction for new data points that reach this leaf, assuming they follow similar patterns as the training data.
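The mean is not an arbitrary choice: among all constant predictions for the samples in a leaf, the mean minimizes the sum of squared errors. A small sketch with hypothetical leaf targets makes this concrete.

```python
# Hypothetical target values (prices) of the training samples in one leaf
targets = [200, 220, 260, 300]

def sse(c):
    """Sum of squared errors if the leaf predicted the constant c."""
    return sum((t - c) ** 2 for t in targets)

mean = sum(targets) / len(targets)  # 245.0

# Compare the mean against some nearby candidate constants
candidates = [230, 240, mean, 250, 260]
best = min(candidates, key=sse)
print(best)  # the mean has the lowest SSE
```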

3. What criteria are used to decide when to stop adding leaf nodes in a regression decision tree?

Several criteria can be used to decide when to stop adding leaf nodes in a regression decision tree:

  • setting a maximum depth for the tree,
  • requiring a minimum number of samples in a node before it can be split further,
  • setting a minimum reduction in variance as a condition for splitting, or
  • stopping when additional splits no longer provide meaningful differentiation in the output values.
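These pre-pruning checks can be sketched as a single gate function. The thresholds below are hypothetical, and the node is assumed to be described by its depth, its target values, and a candidate left/right partition of those values.

```python
# Hypothetical pre-pruning thresholds
MAX_DEPTH = 5
MIN_SAMPLES_SPLIT = 4
MIN_VARIANCE_REDUCTION = 1.0

def variance(ys):
    """Population variance of a list of target values."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def may_split(depth, ys, left, right):
    """Return True if a candidate split of node `ys` passes all checks."""
    if depth >= MAX_DEPTH or len(ys) < MIN_SAMPLES_SPLIT:
        return False
    # Weighted variance reduction achieved by the candidate split
    n = len(ys)
    reduction = (variance(ys)
                 - (len(left) / n) * variance(left)
                 - (len(right) / n) * variance(right))
    return reduction >= MIN_VARIANCE_REDUCTION

# A split separating two clear clusters passes; the same split at max depth fails
print(may_split(0, [1, 2, 10, 11], [1, 2], [10, 11]))  # True
print(may_split(5, [1, 2, 10, 11], [1, 2], [10, 11]))  # False
```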

4. How do leaf nodes affect the accuracy of a regression decision tree?

The configuration and number of leaf nodes can significantly affect the accuracy of a regression decision tree. Too few leaf nodes can lead to underfitting where the model is too simple to capture important patterns in the data. Conversely, too many leaf nodes can lead to overfitting where the model captures noise in the training data rather than genuine trends, negatively impacting its performance on new, unseen data.
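The overfitting extreme can be sketched directly: with one leaf per training sample, the tree reproduces its training targets exactly but predicts in rigid steps. In one dimension such a tree behaves like a nearest-neighbor lookup (the split thresholds fall between training points), which the hypothetical snippet below uses as a stand-in.

```python
# Hypothetical training data: size -> price, one leaf per sample
train = {50: 100, 60: 110, 80: 130}

def overfit_predict(size):
    """Each training point effectively owns its own leaf region."""
    nearest = min(train, key=lambda s: abs(s - size))
    return train[nearest]

print(overfit_predict(60))  # 110: perfect recall of the training target
print(overfit_predict(61))  # 110 as well: nearby inputs snap to the same leaf
```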

5. Can the values of leaf nodes in a regression decision tree change over time?

Once a regression decision tree model is trained and deployed, the values of its leaf nodes remain static and do not change over time unless the model is retrained with new data. However, if the underlying data patterns change significantly (concept drift), it may become necessary to update or retrain the model, which could result in different values at the leaf nodes.
