Request for Large Databases (100,000+ rows) to Practice with

  • Thread starter WWGD
In summary: Different types of indexes suit different types of queries. For example, an index on a name would be great for a search, but would not be as good for a join. A hashing algorithm can speed up a query by doing the work once instead of many times. If you want to work with larger-scale databases, you may want to consider Oracle or IBM.
  • #1
WWGD
Hi All,
I am trying to up my game in SQL Server in general. Specifically, my knowledge so far comes from working with very small-scale databases (fewer than 200 rows). Does anyone know of free larger-scale databases that are available?
Thanks.
 
  • #2
Hmm. Before I retired, this was considered a small database. 100 million and up was getting large, but not very.
 
  • #3
I don't know about 100,000 rows but there's a test database on the MySQL website here that is described as 'large'. See under the 'Example databases' heading.
 
  • #4
PAllen said:
Hmm. Before I retired, this was considered a small database. 100 million and up was getting large, but not very.
I guess the terms small and large depend on the machine available. For 100,000 rows I would think an Oracle database handles it best, and I wouldn't want to handle 100,000,000 on anything other than DB2. However, I think that even 100,000 rows is a big deal for MySQL. How about taking some CSV data from anywhere, or a continuous feed (RSS, I think) from a weather page, and uploading it piece by piece? Loading the entire amount at once might be a disappointing experience.
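The "piece by piece" upload suggested above could look roughly like the sketch below. It is only an illustration under stated assumptions: Python's built-in sqlite3 module stands in for SQL Server, MySQL, or Oracle, and the `readings` table, its columns, the `load_csv_in_batches` helper, and the sample data are all made up for the example.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_in_batches(conn, csv_file, batch_size=1000):
    """Insert rows from csv_file into the readings table, batch_size rows at a time."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    total = 0
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        conn.executemany(
            "INSERT INTO readings (station, temp_c) VALUES (?, ?)", batch
        )
        conn.commit()  # one commit per batch keeps each transaction small
        total += len(batch)
    return total


# Hypothetical stand-in for a downloaded weather CSV: 2,500 made-up rows.
sample = io.StringIO(
    "station,temp_c\n" + "".join(f"S{i % 10},{i % 35}\n" for i in range(2500))
)

conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of a server RDBMS
conn.execute("CREATE TABLE readings (station TEXT, temp_c REAL)")
loaded = load_csv_in_batches(conn, sample, batch_size=1000)
row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(loaded, row_count)
```

With a real file, `open("some_file.csv", newline="")` would replace the in-memory `sample`; the batch size is a knob to experiment with rather than a recommended value.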
 
  • #5
andrewkirk said:
I don't know about 100,000 rows but there's a test database on the MySQL website here that is described as 'large'. See under the 'Example databases' heading.
If you look into the documentation for that database described as large, it in fact has 4 million rows spread across 6 tables.
 
  • #6
fresh_42 said:
I guess the terms small and large depend on the machine available. For 100,000 rows I would think an Oracle database handles it best, and I wouldn't want to handle 100,000,000 on anything other than DB2. However, I think that even 100,000 rows is a big deal for MySQL. How about taking some CSV data from anywhere, or a continuous feed (RSS, I think) from a weather page, and uploading it piece by piece? Loading the entire amount at once might be a disappointing experience.
I see. So I may not even need a scraper; even Excel's import/export wizard would do, it seems.
 
  • #7
Oracle is the database to know if you are set on making SQL your career path. Microsoft SQL is "Oracle-Lite" at best. IBM's DB2 is also a decent RDBMS, just not as popular as Oracle or MS-SQL. All three RDBMSs allow a pretty much unlimited number of records (MS-SQL limits you to 16 terabytes). Oracle allows up to 1,000 columns per record. MS-SQL limits you to 8,060 bytes per record. Oracle allows for recursive scripts; MS-SQL does not. Microsoft also has its own version of SQL: it no longer uses the ANSI standard for SQL (particularly with regard to joins), but Oracle does.

You can get the Oracle Database 11g Express Edition for free for training and development purposes. Microsoft offers the same thing for MS-SQL Server Express. If you are really interested, you can also get a free copy of IBM's Db2 Express-C.

Also, check out Oracle's, Microsoft's, and IBM's developer websites. They will often include free tools for developers.
 
  • #9
You could also just install your own Linux VM, install MySQL server with apt-get or yum, and insert whatever you want. Delete the VM when you are done.
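If "insert whatever you want" means generating your own practice data, a minimal sketch might look like this. The assumptions are mine, not the poster's: sqlite3 is used so the example runs without a server, and the `customers` table, its columns, and the name and city lists are invented for illustration.

```python
import random
import sqlite3

random.seed(42)  # fixed seed so reruns produce the same data

FIRST_NAMES = ["Ana", "Ben", "Chen", "Dee", "Eli", "Fay"]  # made-up values
CITIES = ["Lima", "Oslo", "Pune", "Reno"]  # made-up values


def make_rows(n):
    """Yield n synthetic (name, city, balance) tuples."""
    for _ in range(n):
        yield (
            random.choice(FIRST_NAMES),
            random.choice(CITIES),
            round(random.uniform(0, 10_000), 2),
        )


conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of MySQL
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)"
)
with conn:  # commits once when the block exits without an error
    conn.executemany(
        "INSERT INTO customers (name, city, balance) VALUES (?, ?, ?)",
        make_rows(100_000),
    )

n_customers = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(n_customers)
```

Raising the argument to `make_rows` is the easy way to scale up; against MySQL or SQL Server the same generator could feed that driver's own bulk-insert call.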

Btw, 100,000 rows is a medium-size database at best. I have over a billion entries in some of mine. The most important thing to understand when dealing with things that big is indexing and hashing. Different types of indexes are optimized for different types of data and queries.
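To see what an index changes on a table of this size, one option is to ask the database for its query plan before and after creating the index. The sketch below covers only the indexing point, not hashing, and rests on assumptions: it uses SQLite's `EXPLAIN QUERY PLAN` (other systems have their own equivalents with different output), and the `people` table, the `idx_people_name` index, and the `plan` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in RDBMS
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people (name, age) VALUES (?, ?)",
    ((f"name{i}", i % 90) for i in range(100_000)),
)


def plan(sql):
    """Return the query planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text


query = "SELECT id FROM people WHERE name = 'name54321'"
plan_before = plan(query)  # no index on name yet: expect a full table scan
conn.execute("CREATE INDEX idx_people_name ON people (name)")
plan_after = plan(query)  # the planner can now search via the index
print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so it is more useful to look for a scan turning into an index search than to rely on a specific string.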
 

Related to Request for Large Databases (100,000+ rows) to Practice with

1. What is the purpose of requesting large databases to practice with?

The purpose of requesting large databases to practice with is to gain experience and proficiency in handling and analyzing large amounts of data. It also allows for the development and refinement of data management and analysis skills, which are essential for scientists in various fields.

2. Where can I find large databases to practice with?

There are several online resources where you can find large databases to practice with, such as Kaggle, Google Dataset Search, and Data.gov. You can also request access to large databases from research institutions or collaborate with other scientists who may have access to such databases.

3. What types of data are typically included in large databases?

Large databases can include various types of data, such as numerical, categorical, text, and image data. They can also include metadata, such as timestamps and location data, and may contain data from different sources and formats.

4. How should I prepare for working with large databases?

To prepare for working with large databases, it is essential to have a good understanding of data management and analysis techniques, as well as knowledge of programming languages and tools commonly used for data analysis, such as SQL, Python, and R. It is also important to have a clear research question or hypothesis to guide your analysis.

5. What are some challenges associated with working with large databases?

Working with large databases can present several challenges, such as data cleaning and preprocessing, which can be time-consuming and require specialized skills. It can also be challenging to manage and analyze large datasets without appropriate tools and computing resources. Additionally, working with sensitive or confidential data may require additional ethical considerations and precautions.
