[SQL] Rule of Thumb for Queries: Length of String vs Speed?

In summary, the length of a query in characters has little to no correlation with its execution time. Factors such as query structure and database structure play a much larger role in determining the speed of a query. Tools like the query performance advisor in Microsoft SQL Server can help optimize a query by suggesting relationships and indices to add to the database.
  • #1
WWGD
Hi all,
Just curious whether it is a good rule of thumb (though not an actual rule) that, of two effective queries, i.e., two queries q1, q2 that return the desired result, the one that is shorter as a string, i.e., has fewer total words, will run faster. I understand the speed will depend on many factors, such as the clauses used (each of which has a big-O cost of its own), the size of the database, etc. Still, is "the shorter query is faster" reliable as a rule of thumb? And is there a nice way of approximating the running time of a query?
 
  • #2
You are going to have to be more specific, and even then I'm not sure there is a good answer.
 
  • #3
But this is a "soft question". Maybe I could tighten it up: given two queries q1, q2 that are successful and generate the same output on the same fixed database, and assuming q1 is longer than q2 as a string of words, is the probability that the running time of q1 exceeds the running time of q2 larger than, say, 0.6? That is what I think a rigorous formulation of the rule of thumb would look like.
 
  • #4
WWGD said:
But this is a "soft question". Maybe I could tighten it up: given two queries q1, q2 that are successful and generate the same output on the same fixed database, and assuming q1 is longer than q2 as a string of words, is the probability that the running time of q1 exceeds the running time of q2 larger than, say, 0.6? That is what I think a rigorous formulation of the rule of thumb would look like.

The problem here is that there is no good prior distribution of queries, let alone the databases they work on.
If you select a string at random from all the N-character strings that happen to be valid SQL statements, you're almost certain to get a query that no one would use in practice, and then you would have to run it on a random database.

I would expect the time for transmitting and compiling a query to be unimportant compared with processing the data the query touches, unless the query accesses very little data. So I don't think it makes sense to try to shrink a query with shorter field names, fewer blank spaces, and the like, unless you fire off thousands of queries per second. (Unfortunately, some back-end databases behind web servers do seem to have this usage pattern.)
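To make that concrete, here is a minimal sketch (the orders table and its columns are hypothetical) of two result-equivalent queries with very different character counts; most engines should compile both to the same execution plan, so the extra text costs only a trivially longer parse:

-- Hypothetical table: orders(customer_id, total).
-- Verbose form: long alias, explicit qualification, generous whitespace.
SELECT    customer_orders.customer_id,
          customer_orders.total
FROM      orders AS customer_orders
WHERE     customer_orders.total > 100;

-- Terse form: the same logical query in far fewer characters.
SELECT o.customer_id, o.total FROM orders o WHERE o.total > 100;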
 
  • #5
I think the following is a far more useful view of this.

Two biggies for Performance with a capital 'P':
1. Whether a query uses a full table scan or an index, as a function of table size, matters far more than the query's length. (I am defining table size mostly as the number of rows in the table; most databases limit the number of columns per table.)
2. Database caching algorithms and cache sizes (you can tune these) are equally important in situations where the same data has to be fetched over and over, or where queries get reused extensively.

Really tiny tables do not need indexes; really big ones do. And you need to write the query so that the optimizer can use them; see your EXPLAIN PLAN output.
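As a minimal sketch of that workflow (the events table is hypothetical; the syntax shown is Oracle-style since EXPLAIN PLAN is named above, while MySQL and PostgreSQL use a bare EXPLAIN and SQL Server uses SHOWPLAN):

-- Hypothetical table for illustration.
CREATE TABLE events (
    id         NUMBER PRIMARY KEY,
    user_id    NUMBER,
    created_at DATE
);

-- Without an index on user_id the predicate below forces a full table scan;
-- with it, the optimizer can use an index range scan instead.
CREATE INDEX idx_events_user_id ON events (user_id);

-- Ask the optimizer which access path it will actually use:
EXPLAIN PLAN FOR
    SELECT * FROM events WHERE user_id = 42;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);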

That's it. I see the motivation for your question, but it is analogous to asking why 0.9999... is not the same as 1. You answer math questions, so you can spot the wrong assumptions baked into that "math question"; your question has the same kind of problems, you just do not see them yet. You are talking about parsing cost, which is usually minuscule compared with fetch times.

Once a SQL command has been parsed, the database saves the intermediate "compiled" form for reuse, usually in an LRU buffer, and unused queries are flushed out after a while. This is why, when you restart a massive database, you see a performance penalty until the data caches and the LRU caches are populated again. Data caches have an orders-of-magnitude greater effect on performance. Databases also provide a command that flushes those caches, so you can repeatably measure what is going on in testing.
Something like ALTER DATABASE FLUSH CACHE; (the exact syntax varies by engine). FWIW:
We have a massive 2000+ line query with multiple unions that runs in about 30-60 ms against 10 tables, several of which have more than 1 billion rows.
With no indexes used, it does not complete even after running for 18 hours.
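For reference, a hedged sketch of what that flush command actually looks like on two common engines (run these only on a test box, never in production):

-- Oracle: flush the data (buffer) cache and the parsed-statement cache.
ALTER SYSTEM FLUSH BUFFER_CACHE;
ALTER SYSTEM FLUSH SHARED_POOL;

-- SQL Server equivalents:
CHECKPOINT;              -- write dirty pages so the next command can drop them
DBCC DROPCLEANBUFFERS;   -- empty the data cache
DBCC FREEPROCCACHE;      -- empty the plan (compiled query) cache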
 
  • #6
Thank you all for your help and patience. My background is mostly in math, but I am trying to learn the CS and practical database aspects, and I may ask some silly questions while I learn.
 
  • #7
The length of your query makes no real difference, microseconds at most. Your query structure and your database structure are where bottlenecks usually show up.
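As one hedged illustration of "query structure" (the customers and orders tables are hypothetical): the two statements below return the same customers, but many optimizers handle the join-plus-aggregate form much better than the row-by-row correlated subquery, even though the two are similar in length:

-- Hypothetical tables: customers(id, name), orders(id, customer_id).

-- Structure A: correlated subquery, re-evaluated for every customer row.
SELECT c.name
FROM customers c
WHERE (SELECT COUNT(*)
       FROM orders o
       WHERE o.customer_id = c.id) > 10;

-- Structure B: join + aggregate, typically planned as one hash or merge pass.
SELECT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
HAVING COUNT(*) > 10;

(A modern optimizer may rewrite A into B automatically, which is exactly why measuring beats guessing from the query text.)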
 
  • #8
What database are you using?

The number of characters in the query has little, if any, relationship to the efficiency of the query.

In Microsoft SQL Server's free database management tool (SQL Server Management Studio), there is a query performance advisor that is extremely useful and powerful. It's smart enough to add relationships and indices to your database to speed up an existing query, and using it, you can easily compare the performance of two result-equivalent queries.
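Even without the advisor, a quick way to compare two result-equivalent queries in SQL Server is to turn on the per-query timing and I/O statistics; a hedged sketch follows (the dbo.Orders table is hypothetical, with OrderId assumed to be its primary key):

SET STATISTICS TIME ON;   -- print parse/compile and elapsed times per query
SET STATISTICS IO ON;     -- print logical/physical reads per table

-- Variant 1: direct filter.
SELECT OrderId, CustomerId, Total
FROM dbo.Orders
WHERE CustomerId = 42;

-- Variant 2: same rows via EXISTS (relies on OrderId being unique).
SELECT o.OrderId, o.CustomerId, o.Total
FROM dbo.Orders o
WHERE EXISTS (SELECT 1
              FROM dbo.Orders i
              WHERE i.OrderId = o.OrderId
                AND i.CustomerId = 42);

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;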
 
  • #9
harborsparrow said:
What database are you using?

The number of characters in the query has little, if any, relationship to the efficiency of the query.

In Microsoft SQL Server's free database management tool (SQL Server Management Studio), there is a query performance advisor that is extremely useful and powerful. It's smart enough to add relationships and indices to your database to speed up an existing query, and using it, you can easily compare the performance of two result-equivalent queries.
Thanks, it is Microsoft SQL Server 2014.
 

Related to [SQL] Rule of Thumb for Queries: Length of String vs Speed?

What is the rule of thumb for the length of strings in SQL queries?

The character count of the query text itself barely matters. The rule of thumb applies to string data instead: keep declared string lengths no larger than the data requires, because oversized string values cost extra memory and processing time when rows are read, compared, and sorted.

How does the length of a string affect the speed of an SQL query?

Longer string values need more memory and processing power to move, compare, and sort. That can noticeably slow a query that touches a large number of records, even though the length of the query text itself is essentially irrelevant.

Are there any exceptions to the rule of thumb for string length in SQL queries?

Yes. In some cases, such as when dealing with large amounts of data or performing complex string operations, longer strings are simply necessary. There the goal is to find a sensible balance between string length and query speed.

How can I optimize my SQL queries for string length?

One way is to use data types appropriate for the length of the data being stored. For example, using VARCHAR with a realistic maximum instead of TEXT can improve query performance for shorter strings.
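A minimal sketch of that sizing advice (the users table and its columns are hypothetical):

-- Hypothetical table: size each string column to what the data needs.
CREATE TABLE users (
    id           INT PRIMARY KEY,
    country_code CHAR(2)      NOT NULL,  -- fixed two-letter code, e.g. 'US'
    full_name    VARCHAR(100) NOT NULL,  -- short, bounded string
    biography    TEXT                    -- genuinely unbounded content only
);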

What steps can I take to improve the speed of my SQL queries?

Besides choosing sensible string types, there are several other steps you can take: use indexes, write efficient query logic, and minimize the number of records being scanned and returned.
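A short hedged sketch combining those steps (the logs table is hypothetical):

-- 1) Index the column the WHERE clause filters on.
CREATE INDEX idx_logs_created_at ON logs (created_at);

-- 2) Select only the columns you need instead of SELECT *, and
-- 3) restrict the range so fewer rows are scanned and returned.
SELECT id, message
FROM logs
WHERE created_at >= '2024-01-01'
  AND created_at <  '2024-02-01';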
