
[SQL] Rule of Thumb for Queries: Length of String vs Speed?

  1. Oct 27, 2015 #1

    WWGD

    Science Advisor
    Gold Member

    Hi all,
Just curious whether it is a good rule of thumb (though not an actual rule) that, of two effective queries (i.e., two queries q1, q2 that return the desired result), the one that is shorter as a string, i.e., has fewer total words, will run faster. I understand that speed depends on many factors, such as the clauses used (each of which has its own big-O cost), the size of the database, etc. Still, is "the shorter query runs faster" a good rule of thumb? And is there a nice way of approximating the running time?
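For what it's worth, here is a minimal sketch of why the string length never reaches the executor, using Python's built-in sqlite3 as a stand-in engine (the table and column names are made up; MS SQL Server behaves analogously but words its plans differently). Two textually very different but equivalent queries compile to the identical execution plan:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
con.executemany("INSERT INTO emp (name, dept) VALUES (?, ?)",
                [(f"e{i}", "sales" if i % 2 else "eng") for i in range(1000)])

# q1 is terse; q2 is the same query written far more verbosely.
q1 = "SELECT emp.name FROM emp WHERE emp.dept = 'sales'"
q2 = """SELECT emp.name
        FROM emp
        WHERE ( emp.dept = 'sales' )  /* padding that only the parser ever sees */
        """

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail); keep the detail text.
    return [row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql)]

print(plan(q1), plan(q2))  # same plan despite very different query strings
assert plan(q1) == plan(q2)
assert con.execute(q1).fetchall() == con.execute(q2).fetchall()
```

Since the planner works on the parsed form, not the text, any correlation between string length and speed would have to come from what the query *does*, not how many words it took to say it.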
     
  3. Oct 31, 2015 #2

    meBigGuy

    Gold Member

    You are going to have to be more specific, and even then I'm not sure there is a good answer.
     
  4. Oct 31, 2015 #3

    WWGD

    Science Advisor
    Gold Member

But this is a "soft question". Maybe I could tighten it up: given two queries q_1, q_2 that succeed and generate the same output on the same fixed database, and assuming q_1 is longer than q_2 as a string of words, is the probability that q_1's running time exceeds q_2's larger than, say, 0.6? That is what I think of as a rigorous formulation of a rule of thumb.
     
  5. Oct 31, 2015 #4
The problem here is that there is no good prior distribution over queries, let alone over the databases they run on.
If you select a string at random from all the N-character strings that happen to be valid SQL statements, you are almost certain to get a query no one would use in practice, and then you would have to run it on a random database.

I would expect the time for transmitting and compiling a query to be insignificant compared to processing the data it touches, unless the query accesses very little data. So I don't think it makes sense to try to shrink a query with shorter field names, fewer blank spaces and the like, unless you fire off thousands of queries per second. (Unfortunately, some back-end databases for web servers do have this usage pattern.)
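To put a rough, unscientific number on that: in the sketch below (again using Python's sqlite3 as a stand-in engine, with a made-up table), the same aggregate is run once as written and once padded with ~50 KB of comment text. The padded version pays only a parsing cost, and both return the same answer; the timings are illustrative only.

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", ((i,) for i in range(200_000)))

short = "SELECT COUNT(*) FROM t WHERE x % 7 = 0"
# The same query padded with ~50 KB of comment: only the parser sees the padding.
padded = "/* " + ("x" * 50_000) + " */ " + short

for label, q in (("short", short), ("padded", padded)):
    t0 = time.perf_counter()
    (n,) = con.execute(q).fetchone()
    print(label, n, round((time.perf_counter() - t0) * 1000, 2), "ms")

# Both queries return the same count; any wall-time gap is parse overhead,
# which is dwarfed by the 200,000-row scan.
assert con.execute(short).fetchone() == con.execute(padded).fetchone()
```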
     
  6. Oct 31, 2015 #5

    jim mcnamara


    Staff: Mentor

    I think the following is a far more useful view of this.

    Two biggies for Performance with a capital 'P':
The difference between a query using a full table scan and one that uses an index, as a function of table size, is far more important.
(I am defining table size as mostly the number of rows in the table; most databases limit the number of columns per table.)
Database caching algorithms and cache sizes (you can tune these) are equally important in situations where the same data has to be fetched over and over, or where queries get reused extensively.

    Really tiny tables do not need indexes, really big ones do need them. And you need to write the query to let the optimizer use them, see your EXPLAIN PLAN output.
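The scan-versus-index distinction shows up directly in EXPLAIN output. A small sketch using Python's sqlite3 (the plan wording below is SQLite's; Oracle or SQL Server EXPLAIN PLAN output reads differently, but the distinction is the same, and the table and index names are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer INTEGER, total REAL)")
con.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                [(i % 1000, float(i)) for i in range(50_000)])

q = "SELECT * FROM orders WHERE customer = 42"

def plan(sql):
    # EXPLAIN QUERY PLAN: keep only the human-readable "detail" column.
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

print(plan(q))  # without an index, the optimizer can only scan every row
assert "SCAN" in plan(q)

con.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
print(plan(q))  # now the plan uses the index to jump straight to matching rows
assert "idx_orders_customer" in plan(q)
```

The query text is identical in both runs; only the available access path changed, which is exactly why the plan, not the string, governs the running time.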

That's it. I see the motivation for your question, but it is analogous to asking why .9999... is not the same as 1. You answer math questions, so you can see the wrong assumptions baked into that "math question" I gave as an example. Your question has the same problems; you just do not see it yet. You are talking about parsing cost, which is usually minuscule compared with fetch times. Once a SQL command has been parsed, DBs save the intermediate "compiled" form for reuse, usually in an LRU buffer; unused queries are flushed out after a while. This is why, when you restart a massive database, you see a performance penalty until the data caches and the LRU caches are populated again. Data caches have an orders-of-magnitude greater effect on performance. DBs also have a command that flushes those caches, so you can repeatably measure what is going on in testing.
Something like ALTER DATABASE FLUSH CACHE;


    FWIW:
We have a massive 2000+ line query with multiple unions; it runs in about 30-60 ms against 10 tables, several of which have more than 1 billion rows.
With no indexes used, it does not complete even after running for 18 hours.
     
  7. Oct 31, 2015 #6

    WWGD

    Science Advisor
    Gold Member

Thank you all for your help/patience. My background is mostly in math, but I am trying to learn the CS and practical database aspects, and I may ask some silly questions while I learn.
     
  8. Nov 3, 2015 #7
The length of your query makes no real difference, microseconds at the most. Your query structure and your database structure are where bottlenecks usually show up.
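A common concrete case of "structure, not length": hundreds of tiny queries in a loop versus one longer set-based query. A sketch with Python's sqlite3 and made-up tables; the single JOIN does the same work as 500 short queries in one pass, despite being far more text:

```python
import sqlite3
import time

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
""")
con.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"c{i}") for i in range(500)])
con.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                [(i % 500, i) for i in range(10_000)])

# Structure A: one tiny query per customer. Each query is short, but there are
# 500 of them, and each full-scans the orders table (no index here).
t0 = time.perf_counter()
per_row = {cid: con.execute("SELECT SUM(total) FROM orders WHERE customer_id = ?",
                            (cid,)).fetchone()[0]
           for (cid,) in con.execute("SELECT id FROM customers").fetchall()}
t1 = time.perf_counter()

# Structure B: one longer, set-based query doing the same work in a single pass.
joined = dict(con.execute("""SELECT c.id, SUM(o.total)
                             FROM customers c JOIN orders o ON o.customer_id = c.id
                             GROUP BY c.id"""))
t2 = time.perf_counter()

print(f"500 small queries: {t1 - t0:.3f}s, one join: {t2 - t1:.3f}s")
assert per_row == joined  # identical results; the structure is the speed difference
```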
     
  9. Nov 21, 2015 #8

    harborsparrow

    Gold Member

    What database are you using?

    The number of characters in the query has little, if any, relationship to the efficiency of the query.

    In Microsoft SQL Server's free database management tool (SQL Server Management Studio), there is a query performance advisor that is extremely useful and powerful. It's smart enough to add relationships and indices to your database to speed up an existing query, and using it, you can easily compare the performance of two result-equivalent queries.
     
  10. Nov 21, 2015 #9

    WWGD

    Science Advisor
    Gold Member

Thanks, it is MS SQL Server 2014.
     