Need help with DBMS query processing? Check out Q1 and Q2 for assistance!

  • Thread starter Thread starter momentum
  • Start date Start date
  • Tags Tags
    Processing
Click For Summary

Discussion Overview

The discussion revolves around query processing in Database Management Systems (DBMS), specifically focusing on the concepts of attributes, relations, and the mechanics of performing an equi-join operation. Participants explore the implementation details of DBMS, including file structures and indexing methods, as well as the reading and processing of data blocks during query execution.

Discussion Character

  • Technical explanation
  • Conceptual clarification
  • Debate/contested

Main Points Raised

  • Some participants describe attributes A and B as columns in a spreadsheet or fields in a file, relating them to relations R and S.
  • There is a description of how a DBMS is implemented as files with records that may be sorted or unsorted.
  • Participants explain the process of performing an equi-join between relations R and S, detailing the steps of indexing and sorting.
  • One participant questions the term "pair of file blocks," seeking clarification on its meaning in the context of reading data from files.
  • Another participant elaborates that blocks are the smallest units read from the file system, and describes how pairs of blocks from R and S are read during the join process.
  • There are inquiries about the phrase "blocks are copied into memory in order," with some participants suggesting it refers to sequential reading of blocks from the files.
  • Participants discuss whether only matching records from the join are copied into memory, confirming that the equi-join aims to copy only those records that match.
  • One participant reflects on the outdated nature of the merge-join description, suggesting modern practices involve reading more data into memory and possibly creating temporary indexes instead of sorting entire files.

Areas of Agreement / Disagreement

Participants express varying levels of understanding regarding the concepts discussed, with some points clarified while others remain uncertain or contested, particularly around the mechanics of block reading and the implications of the equi-join process.

Contextual Notes

There are limitations in the discussion regarding the assumptions made about file structures and the specifics of the equi-join process, as well as the implications of modern versus historical practices in DBMS query processing.

momentum
Messages
111
Reaction score
0
This is written in my book.

Can you please reply to these queries in Q1 & Q2.

4Gl3cAB.jpg


I'm stuck on those
 

Attachments

  • 4Gl3cAB.jpg
    4Gl3cAB.jpg
    61.2 KB · Views: 745
Technology news on Phys.org
1) A and B are attributes of relations R and S respectively. If you are thinking in terms of a spread sheet, then you could call them columns. If you are thinking of the relation as files and records, then they would be fields.
2) A DBMS is implemented as file(s). The records in those files is in some order - sorted or not.

For example:
R may be a relation that lists your customer and S may be a relation that is a phone directory.
R is implemented as a file named RF with records ordered by the "customer number" field.
S is implemented as a file names SF with records ordered by the "last name" and "first name" fields.

And say we want to do a "EQUIJOIN" between relation R, attribute "business phone" and relation S, attribute "number".
So this is what we do:
1a) If we already have an index into file RF that is sorted by "business phone", then we skip to step 2.
1b) We build an index into the records in RF that includes field "business phone".
1c) We sort that index by "business phone".

2a) If we already have an index into file SF that is sorted by "number", then we skip to step 3.
2b) We build an index into the records in SF that includes field "number".
2c) We sort that index by "number".

3) We step through those two sorted indexes looking for matches.

Relational terms: relation, tuple, attribute
File terms: file, record, fields.
Spreadsheet terms: sheet, row, column
 
Last edited:
momentum said:
This is written in my book.

What book?
 
.Scott said:
1) A and B are attributes of relations R and S respectively. If you are thinking in terms of a spread sheet, then you could call them columns. If you are thinking of the relation as files and records, then they would be fields.
2) A DBMS is implemented as file(s). The records in those files is in some order - sorted or not.

For example:
R may be a relation that lists your customer and S may be a relation that is a phone directory.
R is implemented as a file named RF with records ordered by the "customer number" field.
S is implemented as a file names SF with records ordered by the "last name" and "first name" fields.

And say we want to do a "EQUIJOIN" between relation R, attribute "business phone" and relation S, attribute "number".
So this is what we do:
1a) If we already have an index into file RF that is sorted by "business phone", then we skip to step 2.
1b) We build an index into the records in RF that includes field "business phone".
1c) We sort that index by "business phone".

2a) If we already have an index into file SF that is sorted by "number", then we skip to step 3.
2b) We build an index into the records in SF that includes field "number".
2c) We sort that index by "number".

3) We step through those two sorted indexes looking for matches.

Relational terms: relation, tuple, attribute
File terms: file, record, fields.
Spreadsheet terms: sheet, row, column

okay. But it says "Pair of file blocks" (marked in red) . What is this ? Could you please elaborate this part a bit
T6yz7ID.jpg
 

Attachments

  • T6yz7ID.jpg
    T6yz7ID.jpg
    75 KB · Views: 391
momentum said:
okay. But it says "Pair of file blocks" (marked in red) . What is this ? Could you please elaborate this part a bit
What I described was an index sort. In their description, they are creating two files, one from relation R the other from S. Both are sorted.
They are then reading from each of these files, a block at a time. Blocks are the smallest unit that can be read from the file system - typically 512, 1K, 2K, or 4K bytes. So it will read one block from R and one block from S - thus a pair.
 
.Scott said:
What I described was an index sort. In their description, they are creating two files, one from relation R the other from S. Both are sorted.
They are then reading from each of these files, a block at a time. Blocks are the smallest unit that can be read from the file system - typically 512, 1K, 2K, or 4K bytes. So it will read one block from R and one block from S - thus a pair.

Okay. this is better.

Some more doubts on this.

They also said

"blocks are copied into memory in order " ---------> what does it mean by in order here ? Is not they are copied separately

"
records are scanned ... each for matching with the other file" -------> does it mean only the matching records (i.e output from join) are copied ? Not the entire R & S is copied ?
 
momentum said:
"blocks are copied into memory in order " ---------> what does it mean by in order here ? Is not they are copied separately

First the first block of R is read into memory. Once it's clear that there are no more mathches with S possible for this block, the next block is read from the file.

"records are scanned ... each for matching with the other file" -------> does it mean only the matching records (i.e output from join) are copied ? Not the entire R & S is copied ?

Only the matching records are copied. Thats is the idea with an equi-join.

The whole description of a merge-join looks like something of 50 years ago, with tape drives and very small memories. The only thing that seems to be relevant for the runtime, is the time to read or write a block. Nowadays you try to read as much as possible of the files into memory while sorting them. A temporary index will likely be created instead of sorting the files themselves, because only the index column would have to be loaded into memory.
 
momentum said:
"blocks are copied into memory in order " ---------> what does it mean by in order here ?
It probably means "in sequence". That is, blocks are read from each file in the same (linear) sequence in which they are stored: record #1, record #2, record #3, etc.
 
Thanks . That was very much helpful.

Please see this below.

KAqzdDC.jpg
 

Attachments

  • KAqzdDC.jpg
    KAqzdDC.jpg
    60.4 KB · Views: 302

Similar threads

  • · Replies 29 ·
Replies
29
Views
7K
  • · Replies 2 ·
Replies
2
Views
2K
Replies
9
Views
3K
  • · Replies 2 ·
Replies
2
Views
2K
  • · Replies 2 ·
Replies
2
Views
611
  • · Replies 3 ·
Replies
3
Views
4K
  • · Replies 4 ·
Replies
4
Views
2K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 2 ·
Replies
2
Views
4K
Replies
6
Views
4K