How are Most Databases Created?

AI Thread Summary
Most databases are created using scripts to establish schema tables and populate them with data, often utilizing templates for larger databases. Data is typically imported through ETL processes, where it is extracted, transformed, and loaded, sometimes using tools like mysqldump or bulk import features. The order of data insertion is crucial due to relational constraints, particularly with foreign keys, which can complicate bulk operations. Maintenance of large databases requires careful management of updates to avoid data integrity issues, often necessitating the use of specialized tools or scripts for version control. Overall, the creation and management of databases involve a combination of scripting, templating, and strategic data handling to ensure efficiency and accuracy.
WWGD
Hi, just curious, and considering the data, not just the structure. I am thinking of the larger databases.
Are they scripted? Done with a GUI (and then scripted, or vice versa)? I mean, I am trying to install AdventureWorks (by restoring it), with its thousands of rows. What about those with thousands of fields? They are not likely to be scripted, unless there are a few templates that are often copied?
Thanks.
 
I’m not sure what you mean by scripted. Most database-backed apps use scripts to create the schema tables and to populate some tables with data, leaving other tables empty by design. As an example, an e-commerce website would have its product-information tables loaded, but the tables of customer addresses and transactions are built up as people use the website to buy stuff.

A common schema is the star schema, composed of small dimension tables with predefined rows (a dimension table of US states and their abbreviations, for example) and a large fact table binding things together (each row might be a customer with a customer-id, name, address, and an index into the state table).
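A minimal sketch of that layout in SQL, with made-up table and column names (not taken from any particular product):

```sql
-- Small dimension table with predefined rows, loaded by the creation script.
CREATE TABLE dim_state (
    state_id   INT PRIMARY KEY,
    state_code CHAR(2)     NOT NULL,   -- e.g. 'NY'
    state_name VARCHAR(50) NOT NULL    -- e.g. 'New York'
);

-- Large "fact" table that references the dimension instead of repeating its data.
CREATE TABLE customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(100) NOT NULL,
    address     VARCHAR(200),
    state_id    INT NOT NULL REFERENCES dim_state (state_id)
);

-- Predefined dimension rows inserted by the same script.
INSERT INTO dim_state (state_id, state_code, state_name) VALUES
    (1, 'NY', 'New York'),
    (2, 'CA', 'California');
```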
 
jedishrfu said:
I’m not sure what you mean by scripted. Most database-backed apps use scripts to create the schema tables and to populate some tables with data, leaving other tables empty by design. As an example, an e-commerce website would have its product-information tables loaded, but the tables of customer addresses and transactions are built up as people use the website to buy stuff.

A common schema is the star schema, composed of small dimension tables with predefined rows (a dimension table of US states and their abbreviations, for example) and a large fact table binding things together (each row might be a customer with a customer-id, name, address, and an index into the state table).

Thanks. I meant whether a template script is copied and then run/executed (as a query), rather than written from scratch, especially for the larger ones.
 
Sometimes the data comes from another database, so they dump it to a file and then load it into the new database with the same schema. Or they might load it as-is and then apply some SQL to transform it and load it into new tables.

Look up ETL processing to get an idea of how data is extracted, transformed, and loaded into a database in a commercial environment.

https://en.m.wikipedia.org/wiki/Extract,_transform,_load
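As a rough illustration of the "load it, then transform with SQL" step, assuming a staging table that mirrors the dumped file and reusing the illustrative customer/dim_state tables from the earlier sketch:

```sql
-- Raw rows land here exactly as they came out of the old system's dump.
CREATE TABLE staging_customer (
    cust_name  VARCHAR(100),
    cust_state VARCHAR(50)
);

-- Transform while loading into the new schema: clean up the text and swap
-- the repeated state name for a key into the dimension table.
INSERT INTO customer (customer_id, name, address, state_id)
SELECT  ROW_NUMBER() OVER (ORDER BY s.cust_name),  -- surrogate key for the new table
        TRIM(s.cust_name),
        NULL,                                      -- address not present in the old dump
        d.state_id
FROM    staging_customer AS s
JOIN    dim_state        AS d
        ON UPPER(s.cust_state) = UPPER(d.state_name);
```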
 
Thanks; I have done the baby stuff like Excel's and SQL Server's Import/Export, but not the larger-scale data work; I will check it out. I have only worked with databases of at most 10 fields, each with at most 150 rows. And a bit of scraping, but I am not sure how I made that work.
 
Okay, my experience is with e-commerce databases with hundreds of tables and millions of rows, some of which were used in targeted ad campaigns.

You could also check out Codd's rules for relational databases. They suggest never repeating data in your tables, but instead referencing it from other tables.

https://en.m.wikipedia.org/wiki/Codd's_12_rules
 
jedishrfu said:
Okay, my experience is with e-commerce databases with hundreds of tables and millions of rows, some of which were used in targeted ad campaigns.

You could also check out Codd's rules for relational databases. They suggest never repeating data in your tables, but instead referencing it from other tables.

https://en.m.wikipedia.org/wiki/Codd's_12_rules
Thanks again; sorry to prolong this, but what programs/techniques do you use for massive data import/export, involving millions of rows?
 
Products like Oracle have import and export functions that allow you to copy a database, including the data and the data structures, the PL/SQL, triggers, etc.
However, maintenance on a database which has lots of records can be very tricky.
  • Updating tables (fields, triggers, keys/indices) can cause data integrity failures which you need to address while you make the change. Using third-party products can reduce the work of implementing and/or migrating the data-structure changes. We use our own product, which parses the data structures and then applies the changes via a generated script that transforms both the data structures and the data in such a way that this will not lead to conflicts. This is all done in scripts and can take hours (even days, depending on the speed of the database and the data volume); a rough sketch of such a change script follows this post.
  • Updating business rules (PL/SQL) can also cause data to become invalid; however, these changes are less obvious and can be rather hard to detect. Even very well maintained relational models will have a lot of problems sorting out what needs to change in the database if those rules change. It usually falls to the developer to address this.
Products that allow you to design and maintain a database without having to script are usually big and expensive.
Examples are https://erwin.com/products/data-modeler/ for modelling and https://www-03.ibm.com/software/products/nl/enterprise for creating the database from the model. These are just examples to give you a feeling for what is out there. There are so many ways to skin a cat that it is pointless to list them all.
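Not the output of any particular tool, but a hand-written sketch of the kind of change script meant above (made-up names, the illustrative customer table from earlier, each step run on its own; the ALTER COLUMN line uses SQL Server syntax, other engines differ):

```sql
-- 1. Add the new column as nullable so existing rows remain valid.
ALTER TABLE customer ADD email VARCHAR(200) NULL;

-- 2. Backfill existing rows; on millions of rows this is the slow part.
UPDATE customer
SET    email = 'unknown@example.invalid'
WHERE  email IS NULL;

-- 3. Only now enforce the stricter rule, so the change never conflicts
--    with data that was already in the table.
ALTER TABLE customer ALTER COLUMN email VARCHAR(200) NOT NULL;
```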
 
  • #10
For creating fresh databases, bash scripts tend to be used. That guarantees repeatability and also gives you version history, because you can check your script into version control such as Subversion. Liquibase is also commonly used, because it actually allows for versioned databases whose schema can be migrated forwards and backwards.
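If I remember the format right, Liquibase can read a changelog written as plain SQL with special comments; a minimal sketch with made-up author/id and table names, where each --changeset moves the schema forward and the --rollback lines let it move back:

```sql
--liquibase formatted sql

--changeset alice:1
CREATE TABLE app_user (
    user_id INT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL
);
--rollback DROP TABLE app_user;

--changeset alice:2
ALTER TABLE app_user ADD email VARCHAR(200);
--rollback ALTER TABLE app_user DROP COLUMN email;
```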

As for copying preexisting databases, I exclusively use mysqldump, which should come with any MySQL server package.
 
  • #11
We used SQL scripts to create our database schemas, tables, and indexes, and then used either a CSV-formatted file or SQL to load tables with specific values. The only times a bash script was used were to glue everything together, to allow some custom values to be inserted into key tables, or to select the CSV file that we used as data input.

https://en.wikipedia.org/wiki/SQL_syntax
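For the CSV route, each engine has its own bulk loader; a rough MySQL-flavoured sketch, with a made-up file path and the illustrative customer table from earlier (SQL Server's BULK INSERT and Postgres's COPY play the same role):

```sql
-- Load a table straight from a CSV file (MySQL syntax).
LOAD DATA INFILE '/tmp/customers.csv'
INTO TABLE customer
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES                                   -- skip the header row
(customer_id, name, address, state_id);
```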
 
  • #12
Let me clarify that I simply used bash to run MySQL commands. The reason I do that rather than directly using a .sql file is that setting up the database is usually only one step of a much larger process, and I like to be able to go from a generic fresh Linux install to a full slave with everything installed and set up by running one script (and by letting a tool like Jenkins do it for me).
 
  • #13
Thanks, all.
A follow-up: do we then use bulk import to populate the database? I know we can use the Import/Export Wizard, but do we also use bulk import?
 
  • #14
I will assume that by bulk import you mean features that allow you to add a lot of rows of data at once using the database's own facilities, for instance MSSQL's BULK INSERT.
First of all, this works very well with linear data. However, as you know, most databases are relational. When you start adding in bulk, the constraints of the database usually dictate the order in which data needs to be added.
If you have foreign key relations, the referenced rows need to be in the database before you add the records that require them. This limits the way you can add data in bulk.
Another thing to consider is that inserting records (even in bulk) will fire triggers, which sometimes require additional information to verify that a record is correct. This can seriously hinder your approach when adding data in bulk. Disabling the triggers may be your only option, besides analysing the data and reordering the way you import it.
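A rough T-SQL illustration of both points, with made-up file paths, table and trigger names (note that BULK INSERT itself only checks constraints and fires triggers when the CHECK_CONSTRAINTS and FIRE_TRIGGERS options are given):

```sql
-- Parents first: rows referenced by a foreign key must already exist,
-- so the customer file is loaded before the orders that point at it.
BULK INSERT dbo.customer
FROM 'C:\import\customer.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2,
      CHECK_CONSTRAINTS);                -- verify FKs during the load

-- If a trigger needs context the bulk file cannot supply, one workaround is
-- to disable it for the load and re-enable (and re-validate) afterwards.
ALTER TABLE dbo.customer_order DISABLE TRIGGER trg_order_audit;

BULK INSERT dbo.customer_order
FROM 'C:\import\order.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2,
      CHECK_CONSTRAINTS);

ALTER TABLE dbo.customer_order ENABLE TRIGGER trg_order_audit;
```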
The approach we use is to add data through a standardized interface. It allows moving data from system to system even when they have completely different origins/data/purposes. By using such an interface, data can be added to the database from another host. It is slower, but it has a few good advantages:
  1. The process can easily be repeated (making it possible to use different databases with different starting configurations)
  2. Every data entry can be traced back to an interface file, and error handling can be done very specifically.
  3. Business rules are always enforced.
  4. The interface explicitly states the data requirements (order, existence, multiplicity, absence, relationships, etc.), so errors occur before the data is inserted. Rolling back a failed transaction is much cheaper and faster.
 
  • #15
Databases existed before computers (the Library of Alexandria, say?).
Computers are just faster and not so much subject to being claimed as spoils of war.
 
  • #16
rootone said:
Databases existed before computers (the Library of Alexandria, say?).
Computers are just faster and not so much subject to being claimed as spoils of war.
Yes, thanks, I should have been more precise. I was referring to modern RDBMSs.
 
