Dismiss Notice
Join Physics Forums Today!
The friendliest, high quality science and math community on the planet! Everyone who loves science is here!

What is the point of these rows?

  1. Jan 16, 2016 #1
    I have a book SQL Antipatterns that shows how to use tables to represent a relationship like

    Code (Text):

           1
          /   \
         2     3
        /  \      \
       4   5     6
                  /
                 7
     
    with a table like

    Code (Text):

          CommentPaths
    ============================
    ancestor_id | descendant_id
    ============================
         1            1
         1            2
         1            3
         1            4
         1            5
         1            6
         1            7
         2            2
         2            4
         2            5
         4            4
         5            5
         3            3
         3            6
         3            7
         6            6
         7            7
     
    My question is, What is the point of the rows where ancestor_id=descendant_id? What useful information do they add? Makes no sense.
     
  2. jcsd
  3. Jan 17, 2016 #2

    Svein

    User Avatar
    Science Advisor

    I will hazard a guess: Those elements may contain some information in addition to links to descendants. Therefore, when looking for information (keys), you must inspect the element itself in addition to its descendants.
     
  4. Jan 17, 2016 #3

    Samy_A

    User Avatar
    Science Advisor
    Homework Helper

    What @Svein said.

    Notice also that for all records of the Comments table, the pair (commentId,commentId) has been added to the CommentPaths table.
    That makes it easy to build queries about a comment and all of its descendants.
     
  5. Jan 17, 2016 #4

    Borg

    User Avatar
    Science Advisor
    Gold Member

    The CommentPaths table is a terrible way to store relationships. While it might make it easy to determine the set of records that contain a record and all of its children, the table is a maintenance nightmare. Suppose you add a subtree to one of the child nodes? You have to find all of the parent nodes associated with that child node and add all of the subtree nodes to it. Then, if you want to move the subtree to another child node, you have to do the same thing for the new position plus remove the old relationships.
     
  6. Jan 17, 2016 #5
    If we want to do something like

    Code (Text):

    parents_id | child_id
    ================
      NULL             1
        1                   2
        2                   4
        2                   5
        1                   3
        3                   6
        6                   7
     
    then it's still a maintenance nightmare, isn't it? The advantage of the table in my original post (called a Closure Table) is that if you delete a comment it's easy to find all it's descendants to delete.

    It's also easy to add a comment. Suppose you want to change the table to

    Code (Text):

           1
          /     \
         2       3
        /  \        \
       4   5       6
                     / \
                   7   8
     
    meaning add 8 as a child of 6. All you have to really do is enter as ancestors of 8 all ancestors of 6.
     
  7. Jan 17, 2016 #6

    Borg

    User Avatar
    Science Advisor
    Gold Member

    Why would you consider updating one record to be more of a maintenance headache than updating multiple records?
    For example, lets say that you're the DBA for Ancestry.com. You just got a record that says that someone named John Doe is a relative of Thomas Jefferson. You have the entire set of relatives for John Doe all the way to the present day (presumably thousands of relatives). You now have to create new parent/child relationships for all of John Doe's relatives in order to relate them to Thomas Jefferson. Two days later, you find out that he should have been tied to someone named Tomas Jefferson. You now have to remove thousands of records tied to Thomas Jefferson and again create thousands for Tomas Jefferson. The table above only requires updating a single record.
     
  8. Jan 17, 2016 #7
    Yes, but your example doesn't relate to my sistuation because the numbers are auto-incremented ints representing ids of comments. Those ids have no reason to ever change. In your example, what if you find out that Thomas Jefferson was not a real person and you've imposed a rule that your ancestory database only contains members whose entire lineage is known? You're going to have to write a Hell of a query to delete all of Thomas Jefferson's descendants from the tree.
     
  9. Jan 17, 2016 #8
    Is that table good for just explaining how the data are being stored in the B-tree ? But in implementation repeated rows as shown in that table are not required. A tree grows from the basic concept of a linked list with more pointers in within to point to the left and right of the tree itself.
    So something like this I think is enough

    Code (Text):

    parents_id | child_id
    ================
      NULL             1
        1                   2
        2                   4
        2                   5
        1                   3
        3                   6
        6                   7
     
    I don't know how a comment 3 is false but its descendants 6 and 7 are true. So deleting 3, 6,7 all at once is logical, isn't it ?
     
  10. Jan 18, 2016 #9

    Borg

    User Avatar
    Science Advisor
    Gold Member

    I gave that as an example, not as a database rule.
    It only requires modifying a single child record that ties John Doe to Thomas Jefferson. Once that record is changed, the decendants are no longer linked to Jefferson. How is that a "Hell of a query"?
     
  11. Jan 18, 2016 #10
    It would be a Hell of a query if you need to delete all Jefferson's descendants. I guess that wouldn't make sense in the family tree example since you'd keep the descendants so that you can attach them to the correct ancestor later. In my example, where the tree represents comments in a thread, the descendant comments are junk data as soon as the ancestor comment is deleted, hence it's nice to have the tree be a structure in which it's easy to query descendants.
     
  12. Jan 18, 2016 #11

    Borg

    User Avatar
    Science Advisor
    Gold Member

    I guess that I wasn't following your example very well. I see that the table is named CommentPaths but there wasn't much to describe its purpose until now. If you want to delete descendant comments, there are other options for deleting all of the children automatically. Cascade delete is perfect for this situation.
     
  13. Jan 18, 2016 #12
    Can you show me how you would write a cascade delete that does all this?
     
  14. Jan 18, 2016 #13

    Borg

    User Avatar
    Science Advisor
    Gold Member

    I would break up the comments into two tables like Comments and Responses. Comments would be the starting comment and responses would be any children for that particular comment. Something like this:
    Code (Text):

    CREATE TABLE COMMENTS(
        COMMENT_ID long(14),
        COMMENT varchar(250)
    );
    CREATE TABLE RESPONSES (
        RESPONSE_ID long(14),
        COMMENT_ID long(14),
        PARENT_ID long(14),
        RESPONSE varchar(250)
    );
     
    You then just need to modify the RESPONSES table like this example:
    Code (Text):
    CREATE TABLE child_table
    (
      column1 datatype [ NULL | NOT NULL ],
      column2 datatype [ NULL | NOT NULL ],
      ...

      CONSTRAINT fk_name
        FOREIGN KEY (child_col1, child_col2, ... child_col_n)
        REFERENCES parent_table (parent_col1, parent_col2, ... parent_col_n)
        ON DELETE CASCADE
        [ ON UPDATE { NO ACTION | CASCADE | SET NULL | SET DEFAULT } ]
    );
    There is only one entry for any response or comment. All Response entries have to be tied to a comment. If a comment is deleted, all of the responses tied to that comment will be deleted.
    The Response Parent ID would signify its relationship to its parent response. If its parent is a comment, you could null it or have it reference itself to signify that your code should treat it differently. The two tables are similar enough that you could change a response into a comment with a few commands but order would be important to avoid cascading a delete. You could also probably put everything into a single table but that would take some additional effort.
     
Know someone interested in this topic? Share this thread via Reddit, Google+, Twitter, or Facebook