
[SQL] Rule of Thumb for Queries: Length of String vs Speed?

  1. Oct 27, 2015 #1

    WWGD

    Science Advisor
    Gold Member

    Hi all,
Just curious whether it is a good rule of thumb (though not an actual rule) that, of two effective queries (i.e., two queries q1, q2 that return the desired result), the one that is shorter as a string, i.e., has fewer total words, will run faster. I understand that speed depends on many factors, such as the clauses used (each of which has its own big-O cost), the size of the database, etc. Still, is "the shorter query runs faster" a good rule of thumb? And is there a nice way of approximating the running time?
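For what it's worth, here is a minimal sketch of why the string length never reaches the executor, using Python's built-in sqlite3 as a stand-in engine (the table and column names are made up; MS SQL Server behaves analogously but words its plans differently). Two textually very different but equivalent queries compile to the identical execution plan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
con.executemany("INSERT INTO emp (name, dept) VALUES (?, ?)",
                [(f"e{i}", "sales" if i % 2 else "eng") for i in range(1000)])

# q1 is terse; q2 is the same query written far more verbosely.
q1 = "SELECT emp.name FROM emp WHERE emp.dept = 'sales'"
q2 = """SELECT emp.name
        FROM emp
        WHERE ( emp.dept = 'sales' )  /* padding that only the parser ever sees */
        """

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

print(plan(q1), plan(q2))  # same plan despite very different query strings
assert plan(q1) == plan(q2)
assert con.execute(q1).fetchall() == con.execute(q2).fetchall()
```

Since the planner works on the parsed form, not the text, any correlation between string length and speed would have to come from what the query *does*, not how many words it took to say it.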
     
  3. Oct 31, 2015 #2

    meBigGuy

    Gold Member

    You are going to have to be more specific, and even then I'm not sure there is a good answer.
     
  4. Oct 31, 2015 #3

    WWGD

    Science Advisor
    Gold Member

But this is a "soft question". Maybe I could tighten it up: given two queries q_1, q_2 that succeed and generate the same output on the same fixed database, and assuming q_1 is longer than q_2 as a string of words, is the probability that q_1's running time exceeds q_2's larger than, say, 0.6? That is what I think of as a rigorous formulation of a rule of thumb.
     
  5. Oct 31, 2015 #4
The problem here is that there is no good prior distribution over queries, let alone over the databases they run on.
If you select a string at random from all the N-character strings that happen to be valid SQL statements, you are almost certain to get a query no one would use in practice, and then you would have to run it on a random database.

I would expect the time for transmitting and compiling a query to be insignificant compared to processing the data it touches, unless the query accesses very little data. So I don't think it makes sense to try to shrink a query with shorter field names, fewer blank spaces and the like, unless you fire off thousands of queries per second. (Unfortunately, some back-end databases for web servers do have this usage pattern.)
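To put a rough, unscientific number on that: in the sketch below (again using Python's sqlite3 as a stand-in engine, with a made-up table), the same aggregate is run once as written and once padded with ~50 KB of comment text. The padded version pays only a parsing cost, and both return the same answer; the timings are illustrative only.

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(200_000)))

short = "SELECT COUNT(*) FROM t WHERE x % 7 = 0"
# The same query padded with ~50 KB of comment: only the parser sees the padding.
padded = "/* " + ("x" * 50_000) + " */ " + short

for label, q in (("short", short), ("padded", padded)):
    t0 = time.perf_counter()
    (n,) = con.execute(q).fetchone()
    print(label, n, round((time.perf_counter() - t0) * 1000, 2), "ms")

# Both queries return the same count; any wall-time gap is parse overhead,
# which is dwarfed by the 200,000-row scan.
assert con.execute(short).fetchone() == con.execute(padded).fetchone()
```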
     
  6. Oct 31, 2015 #5

    jim mcnamara


    Staff: Mentor

    I think the following is a far more useful view of this.

    Two biggies for Performance with a capital 'P':
The difference between a query using a full table scan and one that uses an index, as a function of table size, is far more important.
(I am defining table size as mostly the number of rows in the table; most databases limit the number of columns per table.)
Database caching algorithms and cache sizes (you can tune these) are equally important in situations where the same data has to be fetched over and over, or where queries get reused extensively.

    Really tiny tables do not need indexes, really big ones do need them. And you need to write the query to let the optimizer use them, see your EXPLAIN PLAN output.
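The scan-versus-index distinction shows up directly in EXPLAIN output. A small sketch using Python's sqlite3 (the plan wording below is SQLite's; Oracle or SQL Server EXPLAIN PLAN output reads differently, but the distinction is the same, and the table and index names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, total REAL)")
con.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                [(i % 1000, float(i)) for i in range(50_000)])

q = "SELECT * FROM orders WHERE customer = 42"

def plan(sql):
    # EXPLAIN QUERY PLAN: keep only the human-readable "detail" column.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

print(plan(q))  # without an index, the optimizer can only scan every row
assert "SCAN" in plan(q)

con.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
print(plan(q))  # now the plan uses the index to jump straight to matching rows
assert "idx_orders_customer" in plan(q)
```

The query text is identical in both runs; only the available access path changed, which is exactly why the plan, not the string, governs the running time.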

That's it. I see the motivation for your question, but it is analogous to asking why .9999... is not the same as 1. You answer math questions, so you can see the wrong assumptions baked into that "math question" I gave as an example. Your question has the same problems; you just do not see it yet. You are talking about parsing cost, which is usually minuscule compared with fetch times. Once a SQL command has been parsed, DBs save the intermediate "compiled" form for reuse, usually in an LRU buffer; unused queries are flushed out after a while. This is why, when you restart a massive database, you see a performance penalty until the data caches and the LRU caches are populated again. Data caches have an orders-of-magnitude greater effect on performance. DBs also have a command that flushes those caches, so you can repeatably measure what is going on in testing.
Something like ALTER DATABASE FLUSH CACHE;


    FWIW:
We have a massive 2000+ line query with multiple unions; it runs in about 30-60 ms against 10 tables, several of which have more than 1 billion rows.
With no indexes used, it does not complete even after running for 18 hours.
     
  7. Oct 31, 2015 #6

    WWGD

    Science Advisor
    Gold Member

Thank you all for your help/patience. My background is mostly in math, but I am trying to learn the CS and practical database aspects, and I may ask some silly questions while I learn.
     
  8. Nov 3, 2015 #7
The length of your query makes no real difference, microseconds at the most. Your query structure and your database structure are where bottlenecks usually show up.
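A common concrete case of "structure, not length": hundreds of tiny queries in a loop versus one longer set-based query. A sketch with Python's sqlite3 and made-up tables; the single JOIN does the same work as 500 short queries in one pass, despite being far more text:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
""")
con.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"c{i}") for i in range(500)])
con.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                [(i % 500, i) for i in range(10_000)])

# Structure A: one tiny query per customer. Each query is short, but there are
# 500 of them, and each full-scans the orders table (no index here).
t0 = time.perf_counter()
per_row = {cid: con.execute("SELECT SUM(total) FROM orders WHERE customer_id = ?",
                            (cid,)).fetchone()[0]
           for (cid,) in con.execute("SELECT id FROM customers").fetchall()}
t1 = time.perf_counter()

# Structure B: one longer, set-based query doing the same work in a single pass.
joined = dict(con.execute("""SELECT c.id, SUM(o.total)
                             FROM customers c JOIN orders o ON o.customer_id = c.id
                             GROUP BY c.id"""))
t2 = time.perf_counter()

print(f"500 small queries: {t1 - t0:.3f}s, one join: {t2 - t1:.3f}s")
assert per_row == joined  # identical results; the structure is the speed difference
```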
     
  9. Nov 21, 2015 #8

    harborsparrow

    Gold Member

    What database are you using?

    The number of characters in the query has little, if any, relationship to the efficiency of the query.

    In Microsoft SQL Server's free database management tool (SQL Server Management Studio), there is a query performance advisor that is extremely useful and powerful. It's smart enough to add relationships and indices to your database to speed up an existing query, and using it, you can easily compare the performance of two result-equivalent queries.
     
  10. Nov 21, 2015 #9

    WWGD

    Science Advisor
    Gold Member

Thanks, it is MS SQL Server 2014.
     