Formal Definition of Inner Join?

  • Thread starter Thread starter WWGD
  • Start date Start date
  • Tags Tags
    Definition Sql
Click For Summary

Discussion Overview

The discussion revolves around the formal definition of an inner join in SQL, particularly focusing on its characteristics and implications when joining two tables along a common field. Participants explore the theoretical underpinnings, practical examples, and nuances of inner joins, as well as comparisons with outer joins.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant proposes a definition of an inner join that describes it as a function producing a new table without the common field and containing all other columns from the two tables.
  • Another participant challenges this definition, suggesting that it does not account for cases where the joining field has repeated values across records.
  • A participant discusses the internal optimization processes involved in joins, mentioning factors like indexing and the choice of tables for buffering during the join operation.
  • Some participants express uncertainty about the implications of NULL values in joins, particularly in the context of outer joins versus inner joins.
  • There is a suggestion that the discussion could be expanded to include other types of joins, such as left and right outer joins, and whether the term "join" typically implies an inner join.
  • A later reply clarifies the mathematical representation of inner and outer joins, emphasizing the role of matching values and the inclusion of NULLs in outer joins.
  • Participants note the need for a precise definition of the resulting table from an inner join, indicating that it should be definable in terms of set theory.

Areas of Agreement / Disagreement

Participants express differing views on the definition and implications of inner joins, particularly regarding the treatment of repeated values and NULLs. There is no consensus on a single definition or understanding of the inner join, and discussions about outer joins introduce further complexity.

Contextual Notes

Some limitations in the discussion include the ambiguity around the handling of NULL values in joins, the specifics of implementation in different database systems, and the varying interpretations of outer joins compared to inner joins.

WWGD
Science Advisor
Homework Helper
Messages
7,804
Reaction score
13,107
Hi All,
Hope this is not too simple/dumb. I think I have a good idea of what an inner join of two SQL tables T1, T2 along a common field F is, but , for an exercise, I am looking for a precise definition. I am looking for a definition of this sort: (please correct if necessary or let me know if this is right)
Inner Join function of relational tables T1, T2 along a common column F:
It is a function that takes as inputs a pair T1, T2 of tables and spits out a table T3 so that:

1) F does not appear in T3.

2) T3 contains as columns : all columns other than F that are either in T1 or in T2 (or both). Each of these appears exactly once in T3.

Is this right?
Thanks.
 
  • Like
Likes   Reactions: Silicon Waffle
Technology news on Phys.org
Hi WWGD:

I think you are mistaken. See:
http://www.programmerinterview.com/index.php/database-sql/inner-vs-outer-joins/ .​

Regards,
Buzz
 
  • Like
Likes   Reactions: WWGD
Thanks, Buzz, that helps, but the example does not seem to explain what happens when we join along a column that has repetitions; I don't mean non-atomic values, but I mean different records have the same value along the field using for joining (which may happen if the field along which we join is not a primary or candidate key in at least one of the tables). Do you know what happens in that case?
 
Your question is just difficult at the internal level of how such these clauses have been implemented and optimized overtime instead. I think it is Yes, there should be a third table for your display but that is the final result. The optimization I think is performed in the join process itself i.e via choices made by the engine optimizer of keys (index, foreign, clustered, etc), of column types (either same or different), and of picking one table over the other to be the inner storing buffer, usually the smaller table is preferred, as a cache container for comparison of each columns to be done between the two, etc.
 
Silicon Waffle said:
Your question is just difficult at the internal level of how such these clauses have been implemented and optimized overtime instead. I think it is Yes, there should be a third table for your display but that is the final result. The optimization I think is performed in the join process itself i.e via choices made by the engine optimizer of keys (index, foreign, clustered, etc), of column types (either same or different), and of picking one table over the other to be the inner storing buffer, usually the smaller table is preferred, as a cache container for comparison of each columns to be done between the two, etc.
Thank you, but isn't it possible to give a generalization of the table one gets by joining two tables along a common field? This is all ultimately set theory, so there should be a specific answer in terms of the rows and columns of T1, T2, shouldn't it? I understand the table created is a virtual table, but still, it should be definable.
 
  • Like
Likes   Reactions: Silicon Waffle
If you see T1 and T2 as sets of records, and you have a field ##f_1## in T1 and a field ##f_2## in T2 you want to use for the inner join, then I think
##T3=\{(r_1,r_2) | r_1 \in T1, r_2 \in T1, r_1(f_1)=r_2(f_2)\}##
T3 is just a subset of T1xT2.
 
  • Like
Likes   Reactions: WWGD
Hi WWGD:

What Samy said is the key:
Samy_A said:
T3 is just a subset of T1xT2.
This is true for both inner and outer joins. The difference is in how the subsets are selected. The inner join does not use NULL as a matchable value, while the outer join does. In both cases, the subset consists of those records in the cross product that have a matching value for a specified field from each source table.

You question, "Do you know what happens in that case?" is ambiguous. If you mean, "What is the result in that case?" then the above is the answer. If you mean "What goes on internally in that case?" then Silicon's answer is a good summary.

Regards,
Buzz
 
  • Like
Likes   Reactions: WWGD
Correct, I should have added the condition that the field values are not NULL (been hating NULL's for some 25 years now).

Nitpick: I'm not sure that an outer join is a subset of T1xT2. If we take the outer join from T1 to T2, then we get as result the union of the records in T1 and T2 that match on the chosen field(s), and those records in T1 that have no matching record in T2 (the "T2-fields" in the outer join are then filled with NULL's: I don't know if that is by definition or just standard DB-implementation).

I guess the outer join is a subset of T1x(T2∪{dummy NULL record}).
 
Last edited:
  • Like
Likes   Reactions: WWGD, Silicon Waffle and Greg Bernhardt
Samy_A said:
Nitpick: I'm not sure that an outer join is a subset of T1xT2.
Hi Samy:

I have never used an outer join, so my memory of its details is rather vague. From the somewhat limited description in the article that I linked to in my post #2, I get the impression that the plain vanilla outer join matches a NULL to a NULL, but there are also a left-outer and a right-outer that matches a non-NULL from one of the tables to a NULL in the other table. However, I may be misinterpreting this.

Regards,
Buzz
 
  • #10
Since we have been doing this post on inner joins, we might as well extended to other types of joins. Full, outer, etc. Also, does the word 'join' alone usually imply an inner join?
 
  • #11
Buzz Bloom said:
Hi Samy:

I have never used an outer join, so my memory of its details is rather vague. From the somewhat limited description in the article that I linked to in my post #2, I get the impression that the plain vanilla outer join matches a NULL to a NULL, but there are also a left-outer and a right-outer that matches a non-NULL from one of the tables to a NULL in the other table. However, I may be misinterpreting this.

Regards,
Buzz
I have used outer joins, but as left or right outer joins. The article you linked gives examples for left outer join, right outer join and inner join, but not for what you call the plain vanilla outer join. It may well be as you describe it, I don't know.
 
  • #12
WWGD said:
Since we have been doing this post on inner joins, we might as well extended to other types of joins. Full, outer, etc. Also, does the word 'join' alone usually imply an inner join?
Well, for a left outer join it would be something like this (using the same terminology as before):
##T3=\{(r_1,r_2) | r_1 \in T1, r_2 \in T2, r_1(f_1)=r_2(f_2)\} \cup \{(r_1,NULL) | r_1 \in T1, \{r_2 \in T2, r_1(f_1)=r_2(f_2)\}=\phi\} ##

Right outer join would be the same, with the role of T1 and T2 switched. As I wrote above I don't know what a vanilla outer join should be.

In MySQL, join and inner join are the same.

I noticed a typo in my representation for an inner join. It should have been:
##T3=\{(r_1,r_2) | r_1 \in T1, r_2 \in T2, r_1(f_1)=r_2(f_2)\}## (and not ##r_2 \in T1##).
 
Last edited:
  • #13
Thanks, Samy. I think I finally understood, in a more informal way. This is the way to join T1, T2 along the field F :

We create a table T3 in which:
1) We include all columns in F, both from T1, T2, having common records,
i.e., let F_11, F_12,..,F_1n ; F_21, F_22,..., F_2m be the columns in T1, T2 respectively containing entries from F
Then F_1j , F_2k are included , and listed in separate columns if they are in the intersection ## F_1. \cap F_2. ##

2) We include all rows in which F_ij, F_2k are columns of T3, as in 1) above.
 
  • #14
Buzz Bloom said:
Hi WWGD:

What Samy said is the key:

This is true for both inner and outer joins. The difference is in how the subsets are selected. The inner join does not use NULL as a matchable value, while the outer join does. In both cases, the subset consists of those records in the cross product that have a matching value for a specified field from each source table.

You question, "Do you know what happens in that case?" is ambiguous. If you mean, "What is the result in that case?" then the above is the answer. If you mean "What goes on internally in that case?" then Silicon's answer is a good summary.

Regards,
Buzz
Yes, thanks, I got it, I realized my question did not make much sense: basically, we include all rows in which we have pairs ##(f_{1i}, f_{2j}): f_{1i}=f_{2j} ##, together with the pair itself. Of course, ##f_{1i}, f_{2j} ## are entries from the common field in tables T1, T2 respectively.

In a more direct way: table contains columns common to both fields and all rows associated with those common fields. Still kind of a mouthful; will try to simplify it. But still, internally, we have some set theory and model theory, where queries get a result if there is a model for them. A query is a wff in 1st order Predicate Calculus, and the outputs are (possible) models for the wffs.
 

Similar threads

  • · Replies 16 ·
Replies
16
Views
4K
  • · Replies 51 ·
2
Replies
51
Views
6K
  • · Replies 3 ·
Replies
3
Views
2K
  • · Replies 5 ·
Replies
5
Views
3K
  • · Replies 1 ·
Replies
1
Views
2K
  • · Replies 18 ·
Replies
18
Views
5K
  • · Replies 10 ·
Replies
10
Views
3K
Replies
7
Views
2K
Replies
1
Views
4K
  • · Replies 15 ·
Replies
15
Views
3K