Searching through many tables in MySQL

Wrichik Basu · May 9, 2020

I have a database of bus routes in MySQL. Each table has three columns: location1, location2 and fare for going from location1 to location2. Consider one table of this database, route_a which goes via region1 → region2→ region3 → region4:

and another table, route_b which goes via region5 → region2 → region6 → region4 → region7:

So, if I want to go from region2 to region4, I have two bus routes in my database.

Consider the case where my database contains a large number of similar tables. I want to filter out programmatically which routes go through region2 and region4. How can I do this?

I am completely new to MySQL and am open to all suggestions. If there is a better way to format my tables so as to aid this type of search, please suggest so. My ultimate aim is to implement this in an application where the user will enter the starting point and destination, and the program will search through the database for all bus routes covering these two places and display them. Then the user will choose a route and the fare will be displayed.

Ibix · May 9, 2020

Combine your tables into one table with one more column holding the route ID? Then you can search that easily.

jedishrfu · May 9, 2020

Checkout how to do a select statement with table joins.

https://www.geeksforgeeks.org/sql-join-set-1-inner-left-right-and-full-joins/

Each sql statement is pretty much a program in itself.

jack action · May 9, 2020

You shouldn't have a table for each route. You have 3 elements (route, location & fare) that can all have their own table. Then you can make other tables (in the case presented, just one) that will link all of these elements together:

SQL:

CREATE TABLE IF NOT EXISTS `route`
(
    `route_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `name` VARCHAR(50) NOT NULL
);

CREATE TABLE IF NOT EXISTS `location`
(
    `location_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `name` VARCHAR(50) NOT NULL
);

CREATE TABLE IF NOT EXISTS `fare`
(
    `fare_id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    `price` DECIMAL(4,2) NOT NULL,
    `currency` ENUM('USD', 'EUR', 'CAD') NOT NULL
);

CREATE TABLE IF NOT EXISTS `route_fare`
(
    `route_id` INT NOT NULL,
    `location1_id` INT NOT NULL,
    `location2_id` INT NOT NULL,
    `fare_id` INT NOT NULL,
    PRIMARY KEY (`route_id`, `location1_id`, `location2_id`),
    CONSTRAINT `route_id_fk` FOREIGN KEY (`route_id`) REFERENCES `route`(`route_id`) ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT `location1_fk` FOREIGN KEY (`location1_id`) REFERENCES `location`(`location_id`) ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT `location2_fk` FOREIGN KEY (`location2_id`) REFERENCES `location`(`location_id`) ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT `fare_id_fk` FOREIGN KEY (`fare_id`) REFERENCES `fare`(`fare_id`) ON DELETE CASCADE ON UPDATE CASCADE
);

I haven't tried the code, so there may be some errors, but the idea is there. So in your `route` table you can add a column for any property of each route (I just added a `name` column which could contained 'route_a' or 'route_b'). Same thing for the `location` table, where you could add columns for latitude and longitude (or better a POINT spatial data type). For the `fare` table, I added a price value and a currency, as an example of what type of property a fare could have. Note that all ids of the PRIMARY KEYs, for any table, are integers.

Then, you create your table `route_fare` that links all of these data together. The table contains only the ids of the different values. The PRIMARY KEY makes a set of `route_id` , `location1_id` &`location2_id` unique (i.e. it can only have one fare). Furthermore, each FOREIGN KEY links the value from a column to the corresponding value in the appropriate table. The ON DELETE CASCADE means that if the route with route_id = 1 was deleted, then all rows in the `route_fare` table with route_id = 1 would also be deleted automatically. The ON UPDATE CASCADE means a similar thing if, for example, the route_id = 1 would be changed to route_id = 12: all corresponding route_id with value = 1 would also be updated automatically. Note how the `location1_id` &`location2_id` columns are both linked to the same `location_id` column of the `location` table.

With such an arrangement of tables, it will be a lot easier to expand your data set (adding properties to your route, location or fare) and you will also have a lot more fun with table joins as stated by @jedishrfu .

For your particular request the SELECT statement could look like this:

SQL:

SELECT
    r.`name`
FROM `route_fare` AS rf
JOIN `route` AS r USING (`route_id`)
JOIN `location` AS l1 ON rf.`location1` = l1.`location_id`
JOIN `location` AS l2 ON rf.`location2` = l2.`location_id`
WHERE
    l1.`name` = 'region2'
    AND l2.`name` = 'region4';

QuantumQuest · May 9, 2020

I have a database of bus routes in MySQL. Each table has three columns: location1, location2 and fare for going from location1 to location2. Consider one table of this database, route_a which goes via region1 → region2→ region3 → region4:

and another table, route_b which goes via region5 → region2 → region6 → region4 → region7:

So, if I want to go from region2 to region4, I have two bus routes in my database.

Consider the case where my database contains a large number of similar tables. I want to filter out programmatically which routes go through region2 and region4. How can I do this?

I am completely new to MySQL and am open to all suggestions. If there is a better way to format my tables so as to aid this type of search, please suggest so. My ultimate aim is to implement this in an application where the user will enter the starting point and destination, and the program will search through the database for all bus routes covering these two places and display them. Then the user will choose a route and the fare will be displayed.

I would recommend downloading MySQL Workbench, if you have not already done so. There, you can first design an E-R Model - you can do it in small steps as you are a beginner, by examining what entities you need and the relationships between them. This way, you can create a model with no redundancies, functional dependencies etc., which you'll learn along the way if you're taking a course on databases / RDBMS or even doing your own readings from some good tutorial / book. The point is that there are rules that you have to take care of, in order to construct an E-R Model for your database which you'll implement using tables.

You can take a look at this older post by me here, in order to get just some general idea about normal forms of tables - if it is of any help right now. In any case, in order to learn relational database concepts correctly don't hasten; take things one step at a time.

Svein · May 10, 2020

This kind of problem is studied in detail in data packet routing. I suggest you look up https://en.wikipedia.org/wiki/Routing_protocol.

Wrichik Basu · May 10, 2020

@jack action When you are creating the table route_fare and defining the foreign keys, are the entries copied from the respective tables to the new table, or do the entries of this new table just point to the cells of the other tables?

Wrichik Basu · May 10, 2020

jedishrfu said:

Checkout how to do a select statement with table joins.

https://www.geeksforgeeks.org/sql-join-set-1-inner-left-right-and-full-joins/

I can join the tables in the select statement, but consider the case where I have, say, around 100 tables. I would not consider it efficient if I have to join 100 tables for finding something at runtime. I would consider merging them beforehand.

Wrichik Basu · May 10, 2020

Ibix said:

Combine your tables into one table with one more column holding the route ID? Then you can search that easily.

The simplest idea perhaps, but I am thinking about the size of that merged table. I would certainly not want to pack a 50 MB table with my app.

Ibix · May 10, 2020

Wrichik Basu said:

The simplest idea perhaps, but I am thinking about the size of that merged table. I would certainly not want to pack a 50 MB table with my app.

Will it be any smaller stored as separate tables? If the route and location names are long, store them in a separate table with short IDs and just store the IDs in the route table.

jack action · May 10, 2020

Wrichik Basu said:

@jack action When you are creating the table route_fare and defining the foreign keys, are the entries copied from the respective tables to the new table, or do the entries of this new table just point to the cells of the other tables?

I'm not sure I fully understand your question, because I think you don't fully understand the concept of foreign key (It is hard to understand at first, when you don't know such a tool exists). You still have to fill the route_fare table; the foreign keys will just prevent you from entering wrong data. Read this tutorial to familiarize yourself with the concept.

Also, understand that foreign key is not an essential part of the concept I presented (i.e. having tables to define each of your object and tables to define the relations between your objects). Everything would work just the same without it. It's a nice add-on (you can even add it later, even if your tables are already filled). But it will help maintaining your tables, reducing [human] input errors.

Wrichik Basu said:

Ibix said:

Combine your tables into one table with one more column holding the route ID? Then you can search that easily.

The simplest idea perhaps, but I am thinking about the size of that merged table. I would certainly not want to pack a 50 MB table with my app.

What @Ibix is saying is basically the same thing as what I presented. A table filled with integers is the most efficient thing you can have, size-wise and speed-wise.

Speaking of integers, if you have a set of pre-defined values (like 'region2' and 'region4') consider using the ENUM data type (like I did with currency in table fare). In such a case, an integer is assigned to each value and that is what is stored instead of a string. It's a lot easier to read a table like that (strings as opposed to a bunch of meaningless integers) and you may even save a table join.

For example, your location & route_fare tables could be:

SQL:

CREATE TABLE IF NOT EXISTS `location`
(
    `location_id` ENUM('region1', 'region2', 'region3', 'region4', 'region5', 'region6', 'region7', 'region8') NOT NULL PRIMARY KEY
);

CREATE TABLE IF NOT EXISTS `route_fare`
(
    `route_id` INT NOT NULL,
    `location1_id` ENUM('region1', 'region2', 'region3', 'region4', 'region5', 'region6', 'region7', 'region8') NOT NULL,
    `location2_id` ENUM('region1', 'region2', 'region3', 'region4', 'region5', 'region6', 'region7', 'region8') NOT NULL,
    `fare_id` INT NOT NULL,
    PRIMARY KEY (`route_id`, `location1_id`, `location2_id`),
    CONSTRAINT `route_id_fk` FOREIGN KEY (`route_id`) REFERENCES `route`(`route_id`) ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT `location1_fk` FOREIGN KEY (`location1_id`) REFERENCES `location`(`location_id`) ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT `location2_fk` FOREIGN KEY (`location2_id`) REFERENCES `location`(`location_id`) ON DELETE CASCADE ON UPDATE CASCADE,
    CONSTRAINT `fare_id_fk` FOREIGN KEY (`fare_id`) REFERENCES `fare`(`fare_id`) ON DELETE CASCADE ON UPDATE CASCADE
);

And your query becomes simply:

SQL:

SELECT
    r.`name`
FROM `route_fare` AS rf
JOIN `route` AS r USING (`route_id`)
WHERE
   rf.`location1` = 'region2'
    AND rf.`location2` = 'region4';

It might look stupid to have the table location with a single column, but you will be glad you have it when you will decide in the future to add properties to your locations (like latitude & longitude, population, touristic rating, ...).

Wrichik Basu · May 10, 2020

jack action said:

I'm not sure I fully understand your question, because I think you don't fully understand the concept of foreign key (It is hard to understand at first, when you don't know such a tool exists). You still have to fill the route_fare table; the foreign keys will just prevent you from entering wrong data.

Yeah, it is a bit difficult to understand the first time. I had a wrong concept regarding foreign keys; thanks for straightening it.

pbuk · May 11, 2020

Wrichik Basu said:

I can join the tables in the select statement, but consider the case where I have, say, around 100 tables. I would not consider it efficient if I have to join 100 tables for finding something at runtime. I would consider merging them beforehand.

I think you are missing the point of database normalisation. If you have 100 tables then you would be dealing with many different things like routes, locations, fares, drivers, passengers, times etc., there wouldn't be a way to merge them.

And the only reason you would have 100 joins in a query would be if you wanted to find all the passengers that passed through location A on a number 37 bus driven by driver B on a Tuesday when the bus was either less than 1 year old or over 3 years old and was last serviced by engineer C...(keep going until you hit 100 ).

Wrichik Basu · May 12, 2020

pbuk said:

I think you are missing the point of database normalisation. If you have 100 tables then you would be dealing with many different things like routes, locations, fares, drivers, passengers, times etc., there wouldn't be a way to merge them.

And the only reason you would have 100 joins in a query would be if you wanted to find all the passengers that passed through location A on a number 37 bus driven by driver B on a Tuesday when the bus was either less than 1 year old or over 3 years old and was last serviced by engineer C...(keep going until you hit 100 ).

Initially I was trying to assign one table to each bus route. This table contained the two locations (starting point and destination), and the fare. Later I switched to what @jack action suggested above.

Wrichik Basu · May 12, 2020

I have one question. Consider the case of duplicates: region3 is also known by the name region4. So, someone can search for routes from region1 to region3 or region4; both should return the same results. This is not very difficult to implement; I would just add the two duplicate locations separately in the route_fare table and display the same fare for them. Say route_c joins region1 to region3 (aka region4).

Now, the user wants to view all the stops in route_c. Normally, this can be easily selected out of the route_fare table. But while displaying the list of all stops, I would like to indicate which two places are same; so the list would be something like:
region1 → region3/region4 → region6 → region2 → region7/region9 → ....

How can I indicate in the location table that two places are same?

pbuk · May 12, 2020

Wrichik Basu said:

How can I indicate in the location table that two places are same?

[Edited - wrote 1 to many but coded many to many!]
Two ways, either:

recognise that physical locations and location names are two different entities so need two different tables - this is the best solution if you are starting from scratch; or
add another table of location aliases recognising a that a location may have 0, 1 or many aliases - this is a common compromise as it means less change to the existing codebase in general, and in particular none at all to the existing locations schema.

Schema 1

locations	location_names
id	id
lat_long	fk_locations
...	name

Schema 2

locations	location_alias
id	id
name	fk_locations
lat_long	alias
...

jack action · May 12, 2020

This is exactly why I told it is always good to set up a table for an object, even if it has only one column that represents an arbitrary ID for the object. Let's go back to my example in post #11:

SQL:

CREATE TABLE IF NOT EXISTS `location`
(
    `location_id` ENUM('region1', 'region2', 'region3', 'region4', 'region5', 'region6', 'region7', 'region8') NOT NULL PRIMARY KEY
);

Now, you just discovered that you need to store new information about your locations: they may have different names. You actually have a list of names for every object (lists that may be of different length). You should recognized that this is an array object in any other programming language.

You could here simply add a string column with a list of names and search through those. That will require that you create a function to be used in your WHERE clause (MySQL have JSON data type that could be helpful here). But that type of solution is not very 'SQL'.

You should notice that a table in MySQL is an array. It simply a matter of having a name-value pair. You look for a name, it returns a value. So you add another table to 'extend' your location table. You may look at this table as an extra column name for the table location that can store an array:

SQL:

CREATE TABLE IF NOT EXISTS `location_name`
(
    `location_id` ENUM('region1', 'region2', 'region3', 'region4', 'region5', 'region6', 'region7', 'region8') NOT NULL,
    `name` VARCHAR(50) NOT NULL,
    PRIMARY KEY (`location_id`, `name`),
    CONSTRAINT `location_id_fk` FOREIGN KEY (`location_id`) REFERENCES `location`(`location_id`) ON DELETE CASCADE ON UPDATE CASCADE
);

The PRIMARY KEY makes sure you cannot enter the same name twice for the same location. But you could have the same name for two different locations. If you don't want that, you can add a UNIQUE KEY on the name column or simply set your name column as your PRIMARY KEY.

Now you can modify your SELECT request from post #11 to represent your new reality:

SQL:

SELECT
    r.`name`
FROM `route_fare` AS rf
JOIN `route` AS r USING (`route_id`)
JOIN `location` AS l1 ON rf.`location1` = l1.`location_id`
JOIN `location` AS l2 ON rf.`location2` = l2.`location_id`
JOIN `location_name` AS ln1 ON l1.`location_id` = ln1.`location_id`
JOIN `location_name` AS ln2 ON l2.`location_id` = ln2.`location_id`
WHERE
   ln1.`name` = 'region2'
    AND ln2.`name` = 'region4';

If you build your tables correctly in the beginning, you will never need to modify them. You just add to them as you add new information. Furthermore, making the correct KEYs (PRIMARY, UNIQUE, FOREIGN) along the way will mean that your requests will probably be optimized from the get-go.

Wrichik Basu · May 12, 2020

jack action said:

You should notice that a table in MySQL is an array. It simply a matter of having a name-value pair. You look for a name, it returns a value. So you add another table to 'extend' your location table. You may look at this table as an extra column name for the table location that can store an array:

In this new table, I will add only the duplicate names for the entries that I have already added in the location table, right? So, for example, region3 would be in the location table, while region4 would be in location_name table, both sharing the same location_id. If a place has multiple names, I can simply add them in the second table.

pbuk · May 12, 2020

jack action said:
SQL:
CREATE TABLE IF NOT EXISTS `location_name`
(
    `location_id` ENUM('region1', 'region2', 'region3', 'region4', 'region5', 'region6', 'region7', 'region8') NOT NULL,
    `name` VARCHAR(50) NOT NULL,
    PRIMARY KEY (`location_id`, `name`),
    CONSTRAINT `location_id_fk` FOREIGN KEY (`location_id`) REFERENCES `location`(`location_id`) ON DELETE CASCADE ON UPDATE CASCADE
);
The PRIMARY KEY makes sure you cannot enter the same name twice for the same location. But you could have the same name for two different locations. If you don't want that, you can add a UNIQUE KEY on the name column or simply set your name column as your PRIMARY KEY.

I can see the advantages in that approach but I have a personal aversion to mutable primary keys (for example what happens when you discover the name is misspelt?), admittedly based partly on the fact that I have been doing this stuff since before such things were possible.

I mention this just to highlight the point that in database design there is never one 'right' answer you can pull off the shelf, although there certainly are a few wrong ones. In the real world, normalisation is a journey not a destination.

(Edited to insert an example of when the proposed PK may not be immutable)

jack action · May 12, 2020

Wrichik Basu said:

In this new table, I will add only the duplicate names for the entries that I have already added in the location table, right? So, for example, region3 would be in the location table, while region4 would be in location_name table, both sharing the same location_id. If a place has multiple names, I can simply add them in the second table.

There are several approaches you could use. The one I used in my example assumed all names are of equal value, so you should put all of them in the location_name table. I kept the ENUM version has it changes nothing. It is still an integer to the eyes of MySQL. (It just shows you how easily expandable the table is.)

But with this new method, you would add the 'original' name and all of the duplicates. Say you had originally a location with location_id = 'region3'. Now that you created your location_name table, you want to have the duplicates 'region30' and 'region_300'. Your table entries would look like:

location_id	name
region3	region3
region3	region30
region3	region300

If you had set location_id to INT (with a name column) instead of ENUM in the location table, the location_name table would look like (assuming 'region3' has location_id = 3):

location_id	name
3	region3
3	region30
3	region300

In that case the name column of the location table becomes unnecessary.

But if you want to keep an 'official' version of the name then you can keep it in the location table and just add the duplicates in the location_name table. Then, it is only a matter of creating a different WHERE clause where you check both the name in the location table and the duplicates in the location_name table, like so:

SQL:

SELECT
    r.`name`
FROM `route_fare` AS rf
JOIN `route` AS r USING (`route_id`)
JOIN `location` AS l1 ON rf.`location1` = l1.`location_id`
JOIN `location` AS l2 ON rf.`location2` = l2.`location_id`
JOIN `location_name` AS ln1 ON l1.`location_id` = ln1.`location_id`
JOIN `location_name` AS ln2 ON l2.`location_id` = ln2.`location_id`
WHERE
  ( ln1.`name` = 'region2' || l1.`location_id` = 'region2')
    AND (ln2.`name` = 'region4' || l2.`location_id` = 'region4');

It really depends on what your property represents and what future use you may make of it.

jack action · May 12, 2020

pbuk said:

I can see the advantages in that approach but I have a personal aversion to mutable primary keys (for example what happens when you discover the name is misspelt?), admittedly based partly on the fact that I have been doing this stuff since before such things were possible.

I can agree with that. I struggle a long time with the question "Is ENUM evil?" that you can find everywhere on the web. At first, I was tempted to not used it. But when I was trying to optimize some tables, I often had MySQL proposing ENUM instead of string as a data type. I finally came to the conclusion to use them where these simple rules apply:

It must be a set with a more or less defined length (like countries, colors, sizes, etc.);
I don't use the string as a direct output to present to the client.

The second point is rather important, as sometimes you may have to present it in different languages or someone in the future wants to use the word 'big' instead of 'large'. So you should have a 'translator' somehow somewhere.

The big advantage is that it is a lot easier to study tables with strings rather than a bunch of meaningless integers, without always having to make complicated requests with a lot of JOIN statements. You can also have a code that is more readable. It is a lot easier to understand SELECT * FROM `shirt` WHERE `size` = 'big' rather than SELECT * FROM `shirt` WHERE `size` = 2, even if once the data is retrieved, 'big' is used to retrieved the string 'large' (anyway, nobody will use '2' either).

But I agree that the use for a primary key is not the best, especially if you have lot of foreign keys linking to that column on other tables.

Wrichik Basu · May 13, 2020

jack action said:
But if you want to keep an 'official' version of the name then you can keep it in the location table and just add the duplicates in the location_name table. Then, it is only a matter of creating a different WHERE clause where you check both the name in the location table and the duplicates in the location_name table, like so:
SQL:
SELECT
    r.`name`
FROM `route_fare` AS rf
JOIN `route` AS r USING (`route_id`)
JOIN `location` AS l1 ON rf.`location1` = l1.`location_id`
JOIN `location` AS l2 ON rf.`location2` = l2.`location_id`
JOIN `location_name` AS ln1 ON l1.`location_id` = ln1.`location_id`
JOIN `location_name` AS ln2 ON l2.`location_id` = ln2.`location_id`
WHERE
  ( ln1.`name` = 'region2' || l1.`location_id` = 'region2')
    AND (ln2.`name` = 'region4' || l2.`location_id` = 'region4');

I am using this approach. There are two things that need to be added to this code:

1. Use LEFT JOIN for joining the table location_name. Otherwise, the query would default to INNER JOIN and only those places would be taken into for which duplicates exist.

2. It is possible, as you have yourself shown, that a place has more than one duplicate names. In that case, the query would return the same route multiple times. So, the selection should be DISTINCT.

The overall code looks like this:

SQL:

SELECT DISTINCT
    r.`name`
FROM `route_fare` AS rf
JOIN `route` AS r USING (`route_id`)
JOIN `location` AS l1 ON rf.`location1` = l1.`location_id`
JOIN `location` AS l2 ON rf.`location2` = l2.`location_id`
LEFT JOIN `location_name` AS ln1 ON l1.`location_id` = ln1.`location_id`
LEFT JOIN `location_name` AS ln2 ON l2.`location_id` = ln2.`location_id`
WHERE
  ( ln1.`name` = 'region2' || l1.`location_id` = 'region2')
    AND (ln2.`name` = 'region4' || l2.`location_id` = 'region4');

Searching through many tables in MySQL

AI vs. Humans as Processors in an Environment

Sweetspot of data compression

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

HTML/CSS Problems with DNS records

PHP My website presents the visitor with the choice of opting out of using cookies....

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect

Searching through many tables in MySQL

Similar threads