What is the point of these rows?

  • Thread starter SlurrerOfSpeech
  • Start date
  • Tags
    Point
In summary: If we want to do something likeparents_id | child_id================ NULL 1 1 2 2 4 2 5 3
  • #1
SlurrerOfSpeech
141
11
I have a book SQL Antipatterns that shows how to use tables to represent a relationship like

Code:
       1
      /   \
     2     3
    /  \      \
   4   5     6
              /
             7

with a table like

Code:
      CommentPaths
============================
ancestor_id | descendant_id
============================
     1            1
     1            2
     1            3
     1            4
     1            5
     1            6
     1            7
     2            2
     2            4
     2            5
     4            4
     5            5
     3            3
     3            6
     3            7
     6            6
     7            7

My question is, What is the point of the rows where ancestor_id=descendant_id? What useful information do they add? Makes no sense.
 
Technology news on Phys.org
  • #2
SlurrerOfSpeech said:
My question is, What is the point of the rows where ancestor_id=descendant_id? What useful information do they add? Makes no sense.
I will hazard a guess: Those elements may contain some information in addition to links to descendants. Therefore, when looking for information (keys), you must inspect the element itself in addition to its descendants.
 
  • #3
What @Svein said.

Notice also that for all records of the Comments table, the pair (commentId,commentId) has been added to the CommentPaths table.
That makes it easy to build queries about a comment and all of its descendants.
 
  • #4
The CommentPaths table is a terrible way to store relationships. While it might make it easy to determine the set of records that contain a record and all of its children, the table is a maintenance nightmare. Suppose you add a subtree to one of the child nodes? You have to find all of the parent nodes associated with that child node and add all of the subtree nodes to it. Then, if you want to move the subtree to another child node, you have to do the same thing for the new position plus remove the old relationships.
 
  • Like
Likes jim mcnamara and phinds
  • #5
Borg said:
The CommentPaths table is a terrible way to store relationships. While it might make it easy to determine the set of records that contain a record and all of its children, the table is a maintenance nightmare. Suppose you add a subtree to one of the child nodes? You have to find all of the parent nodes associated with that child node and add all of the subtree nodes to it. Then, if you want to move the subtree to another child node, you have to do the same thing for the new position plus remove the old relationships.

If we want to do something like

Code:
parents_id | child_id
================
  NULL             1
    1                   2
    2                   4 
    2                   5
    1                   3
    3                   6
    6                   7

then it's still a maintenance nightmare, isn't it? The advantage of the table in my original post (called a Closure Table) is that if you delete a comment it's easy to find all it's descendants to delete.

It's also easy to add a comment. Suppose you want to change the table to

Code:
       1
      /     \
     2       3
    /  \        \
   4   5       6
                 / \
               7   8

meaning add 8 as a child of 6. All you have to really do is enter as ancestors of 8 all ancestors of 6.
 
  • #6
SlurrerOfSpeech said:
If we want to do something like
Code:
parents_id | child_id
================
  NULL             1
    1                   2
    2                   4
    2                   5
    1                   3
    3                   6
    6                   7
then it's still a maintenance nightmare, isn't it?
Why would you consider updating one record to be more of a maintenance headache than updating multiple records?
For example, let's say that you're the DBA for Ancestry.com. You just got a record that says that someone named John Doe is a relative of Thomas Jefferson. You have the entire set of relatives for John Doe all the way to the present day (presumably thousands of relatives). You now have to create new parent/child relationships for all of John Doe's relatives in order to relate them to Thomas Jefferson. Two days later, you find out that he should have been tied to someone named Tomas Jefferson. You now have to remove thousands of records tied to Thomas Jefferson and again create thousands for Tomas Jefferson. The table above only requires updating a single record.
 
  • #7
Borg said:
The table above only requires updating a single record.

Yes, but your example doesn't relate to my sistuation because the numbers are auto-incremented ints representing ids of comments. Those ids have no reason to ever change. In your example, what if you find out that Thomas Jefferson was not a real person and you've imposed a rule that your ancestory database only contains members whose entire lineage is known? You're going to have to write a Hell of a query to delete all of Thomas Jefferson's descendants from the tree.
 
  • #8
SlurrerOfSpeech said:
...The advantage of the table in my original post (called a Closure Table) is that if you delete a comment it's easy to find all it's descendants to delete.
Is that table good for just explaining how the data are being stored in the B-tree ? But in implementation repeated rows as shown in that table are not required. A tree grows from the basic concept of a linked list with more pointers in within to point to the left and right of the tree itself.
So something like this I think is enough

Code:
parents_id | child_id
================
  NULL             1
    1                   2
    2                   4
    2                   5
    1                   3
    3                   6
    6                   7
I don't know how a comment 3 is false but its descendants 6 and 7 are true. So deleting 3, 6,7 all at once is logical, isn't it ?
 
  • #9
SlurrerOfSpeech said:
you've imposed a rule that your ancestory database only contains members whose entire lineage is known
I gave that as an example, not as a database rule.
SlurrerOfSpeech said:
You're going to have to write a Hell of a query to delete all of Thomas Jefferson's descendants from the tree.
It only requires modifying a single child record that ties John Doe to Thomas Jefferson. Once that record is changed, the decendants are no longer linked to Jefferson. How is that a "Hell of a query"?
 
  • #10
Borg said:
I gave that as an example, not as a database rule.
It only requires modifying a single child record that ties John Doe to Thomas Jefferson. Once that record is changed, the decendants are no longer linked to Jefferson. How is that a "Hell of a query"?

It would be a Hell of a query if you need to delete all Jefferson's descendants. I guess that wouldn't make sense in the family tree example since you'd keep the descendants so that you can attach them to the correct ancestor later. In my example, where the tree represents comments in a thread, the descendant comments are junk data as soon as the ancestor comment is deleted, hence it's nice to have the tree be a structure in which it's easy to query descendants.
 
  • #11
I guess that I wasn't following your example very well. I see that the table is named CommentPaths but there wasn't much to describe its purpose until now. If you want to delete descendant comments, there are other options for deleting all of the children automatically. Cascade delete is perfect for this situation.
 
  • #12
Borg said:
I guess that I wasn't following your example very well. I see that the table is named CommentPaths but there wasn't much to describe its purpose until now. If you want to delete descendant comments, there are other options for deleting all of the children automatically. Cascade delete is perfect for this situation.

Can you show me how you would write a cascade delete that does all this?
 
  • #13
I would break up the comments into two tables like Comments and Responses. Comments would be the starting comment and responses would be any children for that particular comment. Something like this:
Code:
CREATE TABLE COMMENTS(
    COMMENT_ID long(14),
    COMMENT varchar(250)
);
CREATE TABLE RESPONSES (
    RESPONSE_ID long(14),
    COMMENT_ID long(14),
    PARENT_ID long(14),
    RESPONSE varchar(250)
);
You then just need to modify the RESPONSES table like this example:
Code:
CREATE TABLE child_table
(
  column1 datatype [ NULL | NOT NULL ],
  column2 datatype [ NULL | NOT NULL ],
  ...

  CONSTRAINT fk_name
    FOREIGN KEY (child_col1, child_col2, ... child_col_n)
    REFERENCES parent_table (parent_col1, parent_col2, ... parent_col_n)
    ON DELETE CASCADE
    [ ON UPDATE { NO ACTION | CASCADE | SET NULL | SET DEFAULT } ]
);

There is only one entry for any response or comment. All Response entries have to be tied to a comment. If a comment is deleted, all of the responses tied to that comment will be deleted.
The Response Parent ID would signify its relationship to its parent response. If its parent is a comment, you could null it or have it reference itself to signify that your code should treat it differently. The two tables are similar enough that you could change a response into a comment with a few commands but order would be important to avoid cascading a delete. You could also probably put everything into a single table but that would take some additional effort.
 

1. What is the purpose of having rows in a scientific experiment?

The rows in a scientific experiment serve as a way to organize and collect data. Each row typically represents a different variable or condition that is being tested.

2. How do rows contribute to the validity of an experiment?

By having multiple rows, scientists can compare and analyze data from different variables or conditions, increasing the reliability and accuracy of the results.

3. Are rows necessary in all scientific experiments?

It depends on the type of experiment and the data being collected. In some cases, rows may not be necessary if there is only one variable being tested. However, in most experiments, rows are essential for organizing and analyzing data.

4. Can the number of rows affect the outcome of an experiment?

Yes, the number of rows can impact the outcome of an experiment. Having an insufficient number of rows may not provide enough data to draw accurate conclusions, while having too many rows can make it difficult to analyze the data efficiently.

5. How do scientists decide on the number of rows to use in an experiment?

The number of rows used in an experiment is typically determined by the complexity of the experiment and the amount of data needed to draw statistically significant conclusions. Scientists may also consider factors such as time and resources available when deciding on the number of rows to use.

Similar threads

  • Programming and Computer Science
Replies
34
Views
2K
  • Programming and Computer Science
Replies
4
Views
996
  • Programming and Computer Science
Replies
27
Views
2K
  • Programming and Computer Science
Replies
3
Views
679
  • Programming and Computer Science
Replies
1
Views
1K
Replies
0
Views
303
  • Programming and Computer Science
Replies
4
Views
866
  • Programming and Computer Science
Replies
1
Views
909
  • Programming and Computer Science
Replies
9
Views
1K
  • Programming and Computer Science
Replies
23
Views
2K
Back
Top