What is the point of these rows?

SlurrerOfSpeech · Jan 16, 2016

I have a book SQL Antipatterns that shows how to use tables to represent a relationship like

Code:

       1
      /   \
     2     3
    /  \      \
   4   5     6
              /
             7

with a table like

Code:

      CommentPaths
============================
ancestor_id | descendant_id
============================
     1            1
     1            2
     1            3
     1            4
     1            5
     1            6
     1            7
     2            2
     2            4
     2            5
     4            4
     5            5
     3            3
     3            6
     3            7
     6            6
     7            7

My question is, What is the point of the rows where ancestor_id=descendant_id? What useful information do they add? Makes no sense.

Svein · Jan 17, 2016

SlurrerOfSpeech said:

My question is, What is the point of the rows where ancestor_id=descendant_id? What useful information do they add? Makes no sense.

I will hazard a guess: Those elements may contain some information in addition to links to descendants. Therefore, when looking for information (keys), you must inspect the element itself in addition to its descendants.

Samy_A · Jan 17, 2016

What @Svein said.

Notice also that for all records of the Comments table, the pair (commentId,commentId) has been added to the CommentPaths table.
That makes it easy to build queries about a comment and all of its descendants.

Borg · Jan 17, 2016

The CommentPaths table is a terrible way to store relationships. While it might make it easy to determine the set of records that contain a record and all of its children, the table is a maintenance nightmare. Suppose you add a subtree to one of the child nodes? You have to find all of the parent nodes associated with that child node and add all of the subtree nodes to it. Then, if you want to move the subtree to another child node, you have to do the same thing for the new position plus remove the old relationships.

SlurrerOfSpeech · Jan 17, 2016

Borg said:

The CommentPaths table is a terrible way to store relationships. While it might make it easy to determine the set of records that contain a record and all of its children, the table is a maintenance nightmare. Suppose you add a subtree to one of the child nodes? You have to find all of the parent nodes associated with that child node and add all of the subtree nodes to it. Then, if you want to move the subtree to another child node, you have to do the same thing for the new position plus remove the old relationships.

If we want to do something like

Code:

parents_id | child_id
================
  NULL             1
    1                   2
    2                   4 
    2                   5
    1                   3
    3                   6
    6                   7

then it's still a maintenance nightmare, isn't it? The advantage of the table in my original post (called a Closure Table) is that if you delete a comment it's easy to find all it's descendants to delete.

It's also easy to add a comment. Suppose you want to change the table to

Code:

       1
      /     \
     2       3
    /  \        \
   4   5       6
                 / \
               7   8

meaning add 8 as a child of 6. All you have to really do is enter as ancestors of 8 all ancestors of 6.

Borg · Jan 17, 2016

SlurrerOfSpeech said:
If we want to do something like
Code:
parents_id | child_id
================
  NULL             1
    1                   2
    2                   4
    2                   5
    1                   3
    3                   6
    6                   7
then it's still a maintenance nightmare, isn't it?

Why would you consider updating one record to be more of a maintenance headache than updating multiple records?
For example, let's say that you're the DBA for Ancestry.com. You just got a record that says that someone named John Doe is a relative of Thomas Jefferson. You have the entire set of relatives for John Doe all the way to the present day (presumably thousands of relatives). You now have to create new parent/child relationships for all of John Doe's relatives in order to relate them to Thomas Jefferson. Two days later, you find out that he should have been tied to someone named Tomas Jefferson. You now have to remove thousands of records tied to Thomas Jefferson and again create thousands for Tomas Jefferson. The table above only requires updating a single record.

SlurrerOfSpeech · Jan 17, 2016

Borg said:

The table above only requires updating a single record.

Yes, but your example doesn't relate to my sistuation because the numbers are auto-incremented ints representing ids of comments. Those ids have no reason to ever change. In your example, what if you find out that Thomas Jefferson was not a real person and you've imposed a rule that your ancestory database only contains members whose entire lineage is known? You're going to have to write a Hell of a query to delete all of Thomas Jefferson's descendants from the tree.

Silicon Waffle · Jan 17, 2016

SlurrerOfSpeech said:

...The advantage of the table in my original post (called a Closure Table) is that if you delete a comment it's easy to find all it's descendants to delete.

Is that table good for just explaining how the data are being stored in the B-tree ? But in implementation repeated rows as shown in that table are not required. A tree grows from the basic concept of a linked list with more pointers in within to point to the left and right of the tree itself.
So something like this I think is enough

Code:

parents_id | child_id
================
  NULL             1
    1                   2
    2                   4
    2                   5
    1                   3
    3                   6
    6                   7

I don't know how a comment 3 is false but its descendants 6 and 7 are true. So deleting 3, 6,7 all at once is logical, isn't it ?

Borg · Jan 18, 2016

SlurrerOfSpeech said:

you've imposed a rule that your ancestory database only contains members whose entire lineage is known

I gave that as an example, not as a database rule.

SlurrerOfSpeech said:

You're going to have to write a Hell of a query to delete all of Thomas Jefferson's descendants from the tree.

It only requires modifying a single child record that ties John Doe to Thomas Jefferson. Once that record is changed, the decendants are no longer linked to Jefferson. How is that a "Hell of a query"?

SlurrerOfSpeech · Jan 18, 2016

Borg said:

I gave that as an example, not as a database rule.
It only requires modifying a single child record that ties John Doe to Thomas Jefferson. Once that record is changed, the decendants are no longer linked to Jefferson. How is that a "Hell of a query"?

It would be a Hell of a query if you need to delete all Jefferson's descendants. I guess that wouldn't make sense in the family tree example since you'd keep the descendants so that you can attach them to the correct ancestor later. In my example, where the tree represents comments in a thread, the descendant comments are junk data as soon as the ancestor comment is deleted, hence it's nice to have the tree be a structure in which it's easy to query descendants.

Borg · Jan 18, 2016

I guess that I wasn't following your example very well. I see that the table is named CommentPaths but there wasn't much to describe its purpose until now. If you want to delete descendant comments, there are other options for deleting all of the children automatically. Cascade delete is perfect for this situation.

SlurrerOfSpeech · Jan 18, 2016

Borg said:

I guess that I wasn't following your example very well. I see that the table is named CommentPaths but there wasn't much to describe its purpose until now. If you want to delete descendant comments, there are other options for deleting all of the children automatically. Cascade delete is perfect for this situation.

Can you show me how you would write a cascade delete that does all this?

Borg · Jan 18, 2016

I would break up the comments into two tables like Comments and Responses. Comments would be the starting comment and responses would be any children for that particular comment. Something like this:

Code:

CREATE TABLE COMMENTS(
    COMMENT_ID long(14),
    COMMENT varchar(250)
);
CREATE TABLE RESPONSES (
    RESPONSE_ID long(14),
    COMMENT_ID long(14),
    PARENT_ID long(14),
    RESPONSE varchar(250)
);

You then just need to modify the RESPONSES table like this example:

Code:

CREATE TABLE child_table
(
  column1 datatype [ NULL | NOT NULL ],
  column2 datatype [ NULL | NOT NULL ],
  ...

  CONSTRAINT fk_name
    FOREIGN KEY (child_col1, child_col2, ... child_col_n)
    REFERENCES parent_table (parent_col1, parent_col2, ... parent_col_n)
    ON DELETE CASCADE
    [ ON UPDATE { NO ACTION | CASCADE | SET NULL | SET DEFAULT } ]
);

There is only one entry for any response or comment. All Response entries have to be tied to a comment. If a comment is deleted, all of the responses tied to that comment will be deleted.
The Response Parent ID would signify its relationship to its parent response. If its parent is a comment, you could null it or have it reference itself to signify that your code should treat it differently. The two tables are similar enough that you could change a response into a comment with a few commands but order would be important to avoid cascading a delete. You could also probably put everything into a single table but that would take some additional effort.

What is the point of these rows?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Use of AI (ML/DL) in Science

Sweetspot of data compression

Other than just FizzBuzz to test programmer candidates

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

HTML/CSS Problems with DNS records

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect