# Formal Definition of Inner Join?

Tags:
1. Dec 15, 2015

### WWGD

Hi All,
Hope this is not too simple/dumb. I think I have a good idea of what an inner join of two SQL tables T1, T2 along a common field F is, but , for an exercise, I am looking for a precise definition. I am looking for a definition of this sort: (please correct if necessary or let me know if this is right)
Inner Join function of relational tables T1, T2 along a common column F:
It is a function that takes as inputs a pair T1, T2 of tables and spits out a table T3 so that:

1) F does not appear in T3.

2) T3 contains as columns : all columns other than F that are either in T1 or in T2 (or both). Each of these appears exactly once in T3.

Is this right?
Thanks.

2. Dec 15, 2015

### Buzz Bloom

Hi WWGD:

I think you are mistaken. See:

Regards,
Buzz

3. Dec 15, 2015

### WWGD

Thanks, Buzz, that helps, but the example does not seem to explain what happens when we join along a column that has repetitions; I don't mean non-atomic values, but I mean different records have the same value along the field using for joining (which may happen if the field along which we join is not a primary or candidate key in at least one of the tables). Do you know what happens in that case?

4. Dec 15, 2015

### Silicon Waffle

Your question is just difficult at the internal level of how such these clauses have been implemented and optimized overtime instead. I think it is Yes, there should be a third table for your display but that is the final result. The optimization I think is performed in the join process itself i.e via choices made by the engine optimizer of keys (index, foreign, clustered, etc), of column types (either same or different), and of picking one table over the other to be the inner storing buffer, usually the smaller table is preferred, as a cache container for comparison of each columns to be done between the two, etc.

5. Dec 15, 2015

### WWGD

Thank you, but isn't it possible to give a generalization of the table one gets by joining two tables along a common field? This is all ultimately set theory, so there should be a specific answer in terms of the rows and columns of T1, T2, shouldn't it? I understand the table created is a virtual table, but still, it should be definable.

6. Dec 16, 2015

### Samy_A

If you see T1 and T2 as sets of records, and you have a field $f_1$ in T1 and a field $f_2$ in T2 you want to use for the inner join, then I think
$T3=\{(r_1,r_2) | r_1 \in T1, r_2 \in T1, r_1(f_1)=r_2(f_2)\}$
T3 is just a subset of T1xT2.

7. Dec 16, 2015

### Buzz Bloom

Hi WWGD:

What Samy said is the key:
This is true for both inner and outer joins. The difference is in how the subsets are selected. The inner join does not use NULL as a matchable value, while the outer join does. In both cases, the subset consists of those records in the cross product that have a matching value for a specified field from each source table.

You question, "Do you know what happens in that case?" is ambiguous. If you mean, "What is the result in that case?" then the above is the answer. If you mean "What goes on internally in that case?" then Silicon's answer is a good summary.

Regards,
Buzz

8. Dec 16, 2015

### Samy_A

Correct, I should have added the condition that the field values are not NULL (been hating NULL's for some 25 years now).

Nitpick: I'm not sure that an outer join is a subset of T1xT2. If we take the outer join from T1 to T2, then we get as result the union of the records in T1 and T2 that match on the chosen field(s), and those records in T1 that have no matching record in T2 (the "T2-fields" in the outer join are then filled with NULL's: I don't know if that is by definition or just standard DB-implementation).

I guess the outer join is a subset of T1x(T2∪{dummy NULL record}).

Last edited: Dec 16, 2015
9. Dec 16, 2015

### Buzz Bloom

Hi Samy:

I have never used an outer join, so my memory of its details is rather vague. From the somewhat limited description in the article that I linked to in my post #2, I get the impression that the plain vanilla outer join matches a NULL to a NULL, but there are also a left-outer and a right-outer that matches a non-NULL from one of the tables to a NULL in the other table. However, I may be misinterpreting this.

Regards,
Buzz

10. Dec 16, 2015

### WWGD

Since we have been doing this post on inner joins, we might as well extended to other types of joins. Full, outer, etc. Also, does the word 'join' alone usually imply an inner join?

11. Dec 16, 2015

### Samy_A

I have used outer joins, but as left or right outer joins. The article you linked gives examples for left outer join, right outer join and inner join, but not for what you call the plain vanilla outer join. It may well be as you describe it, I don't know.

12. Dec 16, 2015

### Samy_A

Well, for a left outer join it would be something like this (using the same terminology as before):
$T3=\{(r_1,r_2) | r_1 \in T1, r_2 \in T2, r_1(f_1)=r_2(f_2)\} \cup \{(r_1,NULL) | r_1 \in T1, \{r_2 \in T2, r_1(f_1)=r_2(f_2)\}=\phi\}$

Right outer join would be the same, with the role of T1 and T2 switched. As I wrote above I don't know what a vanilla outer join should be.

In MySQL, join and inner join are the same.

I noticed a typo in my representation for an inner join. It should have been:
$T3=\{(r_1,r_2) | r_1 \in T1, r_2 \in T2, r_1(f_1)=r_2(f_2)\}$ (and not $r_2 \in T1$).

Last edited: Dec 16, 2015
13. Dec 16, 2015

### WWGD

Thanks, Samy. I think I finally understood, in a more informal way. This is the way to join T1, T2 along the field F :

We create a table T3 in which:
1) We include all columns in F, both from T1, T2, having common records,
i.e., let F_11, F_12,..,F_1n ; F_21, F_22,...., F_2m be the columns in T1, T2 respectively containing entries from F
Then F_1j , F_2k are included , and listed in separate columns if they are in the intersection $F_1. \cap F_2.$

2) We include all rows in which F_ij, F_2k are columns of T3, as in 1) above.

14. Dec 16, 2015

### WWGD

Yes, thanks, I got it, I realized my question did not make much sense: basically, we include all rows in which we have pairs $(f_{1i}, f_{2j}): f_{1i}=f_{2j}$, together with the pair itself. Of course, $f_{1i}, f_{2j}$ are entries from the common field in tables T1, T2 respectively.

In a more direct way: table contains columns common to both fields and all rows associated with those common fields. Still kind of a mouthful; will try to simplify it. But still, internally, we have some set theory and model theory, where queries get a result if there is a model for them. A query is a wff in 1st order Predicate Calculus, and the outputs are (possible) models for the wffs.