How are Most Databases Created?

  • Thread starter WWGD
  • Start date
In summary: Products like Oracle have an import and export function that allows you to copy a database including the data and the datastructures, the pl/sql and triggers etc. However maintenance on a database which has lots of records can be very tricky. Updating tables (fields, triggers, keys/indices) can cause data integrity failures which you need to address while you make the change. Using third party products can reduce the implementation and or migration of the database datastructures changes. We use our own product that parses the datastructures and then applies those changes in such a way that there will be a script following the changes which transforms datastructure and the data in such a way this will not lead to conflicts
  • #1
WWGD
Science Advisor
Gold Member
7,003
10,423
Hi just curious. Also considering the data , not just the structure. I am thinking of the larger ones
Are they scripted? Done with GUI ( And then scripted, or viceversa?) . I mean, I am trying to install Adventure works, with thousands of rows . What of those with thousands of fields? Not likely to be scripted, unless there ( by restoring it )are a few templates that are often copied?
Thanks.
 
Last edited:
Technology news on Phys.org
  • #2
I’m not sure what you mean by scripted. Most database backed apps have used scripts to create the schema tables and to populate some tables with data leaving other tables empty by design. As an example, an ecommerce website would have tables with product information loaded but tables of customer address information and transactions are built as people use the website to buy stuff.

A common schema is the star schema composed of small dimension tables with predefined rows (a dimension table of US states and their abbreviations as an example) and a large fact table binding things together (each row might be a customer with a customer-id, name, address and an index to the state table)
 
  • Like
Likes WWGD
  • #3
jedishrfu said:
I’m not sure what you mean by scripted. Most database backed apps have used scripts to create the schema tables and to populate some tables with data leaving other tables empty by design. As an example, an ecommerce website would have tables with product information loaded but tables of customer address information and transactions are built as people use the website to buy stuff.

A common schema is the star schema composed of small dimension tables with predefined rows (a dimension table of US states and their abbreviations as an example) and a large fact table binding things together (each row might be a customer with a customer-id, name, address and an index to the state table)

Thanks, I meant whether a template script is copied and then run/executed ( as a query), rather than written from scratch, especially the larger ones.
 
  • #4
Sometimes the data comes from another database and so they dump it to a file and then load into the new database with the same schema. Or they might load it and then apply some sql to transform it and load into new tables.

Look for ETL Processing to get an idea of how data is extracted, transformed and loaded into a database in a commercial environment.

https://en.m.wikipedia.org/wiki/Extract,_transform,_load
 
  • Like
Likes FactChecker and WWGD
  • #5
Thanks; I have done the baby stuff like Excel's and SQL Server's Import/Export, but not the larger-scale data stuff; will check it out. I had worked only with databases with at most 10 fields, each containing at most 150 rows. And a bit of scraping, but I am not sure how I made it work.
 
  • #6
Okay, my experience is with ecommerce databases with hundreds of tables and millions of rows some of which were used in targeted ad campaigns.

You could also check out Codds rules for relational databases too. They suggest never to repeat data in your tables but instead reference it from other tables.

https://en.m.wikipedia.org/wiki/Codd's_12_rules
 
  • #7
jedishrfu said:
Okay, my experience is with ecommerce databases with hundreds of tables and millions of rows some of which were used in targeted ad campaigns.

You could also check out Codds rules for relational databases too. They suggest never to repeat data in your tables but instead reference it from other tables.

https://en.m.wikipedia.org/wiki/Codd's_12_rules
Thanks again; sorry to prolong this, but, what program/tchniques do you use for massive data import/export, using millions of rows?
 
  • #9
Products like Oracle have an import and export function that allows you to copy a database including the data and the datastructures, the pl/sql and triggers etc.
However maintenance on a database which has lots of records can be very tricky.
  • Updating tables (fields, triggers, keys/indices) can cause data integrity failures which you need to address while you make the change. Using third party products can reduce the implementation and or migration of the database datastructures changes. We use our own product that parses the datastructures and then applies those changes in such a way that there will be a script following the changes which transforms datastructure and the data in such a way this will not lead to conflicts. This is all done in scripts and can takes hours (even days depending on the speed of the database/data volume)
  • Updating business rules (pl/sql) can also cause data to become invalid however these changes are less obvious and can be rather hard to detect. Even very well maintained relational models will have a lot of problems sorting out what needs to change in the database if those rules change. It usually lies with the developer to address this.
products that will allow you to maintain and design database without having to script are usually expensive and big
Examples are https://erwin.com/products/data-modeler/ for modelling and https://www-03.ibm.com/software/products/nl/enterprise for creating the database from the model. These are just examples to give you a feeling of what is out there. There are so many ways to skin a cat that it is pointless to list them all.
 
  • Like
Likes WWGD
  • #10
For creating fresh databases, bash scripts tend to be used. It garentees sameness and also allows you to see version history because you can check your script into a subversion. Liquibase is also something commonly used because it actually allows for versioned databases that can be moved forwards and backwards in schema.

As for copying preexisting databases, I exclusively use mysqldump, which should come with any MySQL server package.
 
  • Like
Likes WWGD
  • #11
We used SQL scripts to create our database schemas, tables, indexes and then used either a CSV formatted file or sql to load tables with specific values. The only times a bash script was used was to glue everything together and allow for some custom values to be inserted into key tables or to select the CSV file that we used as data input.

https://en.wikipedia.org/wiki/SQL_syntax
 
  • Like
Likes WWGD
  • #12
Let me clarify that I simply used bash to run MySQL commands. The reason I do that rather than directly using a .sql file is because setting up the database is usually only one step of a much larger process and I like to be able to go from a generic fresh Linux install to a full slave with everything installed and set up by running one script (and by letting a tool like Jenkins do it for me.)
 
  • Like
Likes WWGD and jedishrfu
  • #13
Thanks all,
A followup. Do we then use bulk import to populate the database? I know we can use Import/Export Wizard, but, do we also use Bulk import?
 
  • #14
I will assume you mean with bulk import features that will allow you to add a lot of rows of data at once using the database features like for instance MSSQL (BULK INSERT ...)
First of all this works with linear data very well. However most databases are as you know relational. When you start adding in bulk, the constraints of the database usually imply the order in which data needs to be added.
If you have foreign key relations you need to have said foreign keys added into the database before you add records that require them. This limits the way you can add data in bulk.
Another thing to consider is that inserting records (even in bulk) will set off triggers which sometimes require additional information to verify if a record is correct. This can hinder your approach seriously when adding data in bulk. Disabling the triggers may be your only option besides analysing the data and reordering the way you import data.
The approach we use is to add data using a standardized interface. It allows for adding data from system to system which have completely different origins/data/purposes. By using such an interface data can be added to the database from another host. It is slower but has a few good advantages.
  1. The process can easily be repeated (making it possible to use different databases with different starting configurations)
  2. Every data entry traced back to an interface file and error handling can be done very specifically.
  3. Business Rules are always working
  4. The interface explicitly states the data requirements (order, existence, multiplicity, absence, relationships etc.) making errors to occur before the data is inserted. Rolling back a failed transaction is much cheaper and faster.
 
  • Like
Likes WWGD
  • #15
Databases existed before computers. (Say library of Alexandria?)
Computers just are faster and not so much subject to being claimed as spoils of war.
 
  • #16
rootone said:
Databases existed before computers. (Say library of Alexandria?)
Computers just are faster and not so much subject to being claimed as spoils of war.
Yes, thanks, I should have been more precise. I was referring to modern RDBSs.
 

1. How are most databases created?

Most databases are created using a specific database management system (DBMS) such as MySQL, Oracle, or Microsoft SQL Server. These systems allow users to create, store, and manage large amounts of data efficiently. Databases are typically created by defining a schema, which outlines the structure and organization of the data, and then adding data to the database through a user interface or programming language.

2. What is the purpose of a database?

The purpose of a database is to store and manage large amounts of data in an organized and efficient manner. Databases are used in various industries and applications, such as e-commerce, banking, healthcare, and more. They allow for quick and easy retrieval of information, data analysis, and data manipulation.

3. How is data organized in a database?

Data in a database is organized into tables, which consist of rows and columns. Each row represents a single data record, and each column represents a specific type of data or attribute. The tables are then linked together through relationships, which allow for efficient data retrieval and manipulation.

4. Can databases be created without a DBMS?

Technically, databases can be created without a DBMS by manually organizing and storing data in files or spreadsheets. However, this method is not recommended as it is inefficient and prone to errors. DBMSs provide a more secure and efficient way to store and manage data.

5. How are databases secured?

Databases can be secured through various methods, such as user authentication, encryption, and access control. User authentication ensures that only authorized users have access to the data. Encryption protects the data from being accessed by unauthorized parties. Access control allows the database administrator to set permissions for different users or user groups, limiting their access to certain data or functionalities within the database.

Similar threads

  • Programming and Computer Science
Replies
9
Views
1K
  • Programming and Computer Science
Replies
5
Views
5K
  • Programming and Computer Science
Replies
11
Views
987
  • Computing and Technology
Replies
8
Views
2K
  • Programming and Computer Science
Replies
8
Views
2K
  • Programming and Computer Science
Replies
5
Views
2K
  • Programming and Computer Science
Replies
12
Views
3K
  • Special and General Relativity
Replies
1
Views
2K
  • Sci-Fi Writing and World Building
Replies
19
Views
2K
  • Programming and Computer Science
Replies
4
Views
735
Back
Top