How are Most Databases Created?

WWGD · Feb 18, 2018

Hi just curious. Also considering the data , not just the structure. I am thinking of the larger ones
Are they scripted? Done with GUI ( And then scripted, or viceversa?) . I mean, I am trying to install Adventure works, with thousands of rows . What of those with thousands of fields? Not likely to be scripted, unless there ( by restoring it )are a few templates that are often copied?
Thanks.

jedishrfu · Feb 18, 2018

I’m not sure what you mean by scripted. Most database backed apps have used scripts to create the schema tables and to populate some tables with data leaving other tables empty by design. As an example, an ecommerce website would have tables with product information loaded but tables of customer address information and transactions are built as people use the website to buy stuff.

A common schema is the star schema composed of small dimension tables with predefined rows (a dimension table of US states and their abbreviations as an example) and a large fact table binding things together (each row might be a customer with a customer-id, name, address and an index to the state table)

WWGD · Feb 18, 2018

jedishrfu said:

I’m not sure what you mean by scripted. Most database backed apps have used scripts to create the schema tables and to populate some tables with data leaving other tables empty by design. As an example, an ecommerce website would have tables with product information loaded but tables of customer address information and transactions are built as people use the website to buy stuff.

A common schema is the star schema composed of small dimension tables with predefined rows (a dimension table of US states and their abbreviations as an example) and a large fact table binding things together (each row might be a customer with a customer-id, name, address and an index to the state table)

Thanks, I meant whether a template script is copied and then run/executed ( as a query), rather than written from scratch, especially the larger ones.

jedishrfu · Feb 18, 2018

Sometimes the data comes from another database and so they dump it to a file and then load into the new database with the same schema. Or they might load it and then apply some sql to transform it and load into new tables.

Look for ETL Processing to get an idea of how data is extracted, transformed and loaded into a database in a commercial environment.

https://en.m.wikipedia.org/wiki/Extract,_transform,_load

WWGD · Feb 18, 2018

Thanks; I have done the baby stuff like Excel's and SQL Server's Import/Export, but not the larger-scale data stuff; will check it out. I had worked only with databases with at most 10 fields, each containing at most 150 rows. And a bit of scraping, but I am not sure how I made it work.

jedishrfu · Feb 18, 2018

Okay, my experience is with ecommerce databases with hundreds of tables and millions of rows some of which were used in targeted ad campaigns.

You could also check out Codds rules for relational databases too. They suggest never to repeat data in your tables but instead reference it from other tables.

https://en.m.wikipedia.org/wiki/Codd's_12_rules

WWGD · Feb 18, 2018

jedishrfu said:

Okay, my experience is with ecommerce databases with hundreds of tables and millions of rows some of which were used in targeted ad campaigns.

You could also check out Codds rules for relational databases too. They suggest never to repeat data in your tables but instead reference it from other tables.

https://en.m.wikipedia.org/wiki/Codd's_12_rules

Thanks again; sorry to prolong this, but, what program/tchniques do you use for massive data import/export, using millions of rows?

jedishrfu · Feb 18, 2018

Your database software should have a load save option. For ETL, folks either roll their own or buy a third party product.

Here’s one such product

https://www.informationbuilders.com/etl-tools

Michael27 · Feb 19, 2018

Products like Oracle have an import and export function that allows you to copy a database including the data and the datastructures, the pl/sql and triggers etc.
However maintenance on a database which has lots of records can be very tricky.

Updating tables (fields, triggers, keys/indices) can cause data integrity failures which you need to address while you make the change. Using third party products can reduce the implementation and or migration of the database datastructures changes. We use our own product that parses the datastructures and then applies those changes in such a way that there will be a script following the changes which transforms datastructure and the data in such a way this will not lead to conflicts. This is all done in scripts and can takes hours (even days depending on the speed of the database/data volume)
Updating business rules (pl/sql) can also cause data to become invalid however these changes are less obvious and can be rather hard to detect. Even very well maintained relational models will have a lot of problems sorting out what needs to change in the database if those rules change. It usually lies with the developer to address this.

products that will allow you to maintain and design database without having to script are usually expensive and big
Examples are https://erwin.com/products/data-modeler/ for modelling and https://www-03.ibm.com/software/products/nl/enterprise for creating the database from the model. These are just examples to give you a feeling of what is out there. There are so many ways to skin a cat that it is pointless to list them all.

newjerseyrunner · Feb 20, 2018

For creating fresh databases, bash scripts tend to be used. It garentees sameness and also allows you to see version history because you can check your script into a subversion. Liquibase is also something commonly used because it actually allows for versioned databases that can be moved forwards and backwards in schema.

As for copying preexisting databases, I exclusively use mysqldump, which should come with any MySQL server package.

jedishrfu · Feb 20, 2018

We used SQL scripts to create our database schemas, tables, indexes and then used either a CSV formatted file or sql to load tables with specific values. The only times a bash script was used was to glue everything together and allow for some custom values to be inserted into key tables or to select the CSV file that we used as data input.

https://en.wikipedia.org/wiki/SQL_syntax

newjerseyrunner · Feb 20, 2018

Let me clarify that I simply used bash to run MySQL commands. The reason I do that rather than directly using a .sql file is because setting up the database is usually only one step of a much larger process and I like to be able to go from a generic fresh Linux install to a full slave with everything installed and set up by running one script (and by letting a tool like Jenkins do it for me.)

WWGD · Feb 20, 2018

Thanks all,
A followup. Do we then use bulk import to populate the database? I know we can use Import/Export Wizard, but, do we also use Bulk import?

Michael27 · Mar 14, 2018

I will assume you mean with bulk import features that will allow you to add a lot of rows of data at once using the database features like for instance MSSQL (BULK INSERT ...)
First of all this works with linear data very well. However most databases are as you know relational. When you start adding in bulk, the constraints of the database usually imply the order in which data needs to be added.
If you have foreign key relations you need to have said foreign keys added into the database before you add records that require them. This limits the way you can add data in bulk.
Another thing to consider is that inserting records (even in bulk) will set off triggers which sometimes require additional information to verify if a record is correct. This can hinder your approach seriously when adding data in bulk. Disabling the triggers may be your only option besides analysing the data and reordering the way you import data.
The approach we use is to add data using a standardized interface. It allows for adding data from system to system which have completely different origins/data/purposes. By using such an interface data can be added to the database from another host. It is slower but has a few good advantages.

The process can easily be repeated (making it possible to use different databases with different starting configurations)
Every data entry traced back to an interface file and error handling can be done very specifically.
Business Rules are always working
The interface explicitly states the data requirements (order, existence, multiplicity, absence, relationships etc.) making errors to occur before the data is inserted. Rolling back a failed transaction is much cheaper and faster.

rootone · Mar 18, 2018

Databases existed before computers. (Say library of Alexandria?)
Computers just are faster and not so much subject to being claimed as spoils of war.

WWGD · Mar 18, 2018

rootone said:

Databases existed before computers. (Say library of Alexandria?)
Computers just are faster and not so much subject to being claimed as spoils of war.

Yes, thanks, I should have been more precise. I was referring to modern RDBSs.

How are Most Databases Created?

Discussion Overview

Discussion Character

Main Points Raised

Areas of Agreement / Disagreement

Contextual Notes

Similar threads

Use of AI (ML/DL) in Science

Other than just FizzBuzz to test programmer candidates

Sweetspot of data compression

How to show RS(U+TRS)* is equivalent to (R+SUT)SU?

HTML/CSS Problems with DNS records

Insights Revisiting the Velocity-Time Function

Insights Remote Operated Gate Control System

Insights AI Enriched Problem Solving

Insights Thinking Outside The Box Versus Knowing What’s In The Box

Insights Why Entangled Photon-Polarization Qubits Violate Bell’s Inequality

Insights Quantum Entanglement is a Kinematic Fact, not a Dynamical Effect