Note - I updated s0.cpp to use the C++ string's swap() member function to swap strings (a quick sketch of the idea follows the links below). This reduced the overall time, but the nearly linear curve remained about the same.
http://rcgldr.net/misc/sortb.zip
I left the old version in
http://rcgldr.net/misc/sortb0.zip if you still want that one.
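For reference, a minimal sketch of what the change amounts to (this is not the actual s0.cpp code, just the idea): swap() exchanges the strings' internal buffers in constant time instead of copying the character data through a temporary.

Code:
#include <string>

// sketch only - swap two elements by exchanging the strings' internal
// buffers; no allocation or character copying takes place
void swap_elements(std::string &a, std::string &b)
{
    a.swap(b);      // constant time, unlike tmp = a; a = b; b = tmp;
}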
Monte_Carlo said:
I just looked at s0.cpp and it seems like you've made the size of each string non-linear with the input element count. I mean, if the input element count is n and the total size is c, then you calculate the length of each string as l = c / n, so it's like a hyperbola - a non-linear relationship.
True, but the point of doing that is so the total size stays the same: c ~= n x l ~= constant. If instead you used l = 20000 - 10 n, then the total amount of data sorted would be 199,000 bytes for both n = 10 and n = 1990, while it would be 10,000,000 bytes for n = 1000. I could set the size of each element to be some obscure function of the number of elements to confuse the results no matter what sort algorithm was used, but that doesn't seem very useful, and it doesn't match the title of the graph, number of clock ticks versus input size.
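To make that concrete, here is a sketch of the sizing scheme I'm describing (the names and the example total are mine, not taken from s0.cpp): each element gets length l = c / n, so n elements of c / n bytes each is roughly c bytes total, regardless of n.

Code:
#include <cstddef>

// sketch only - per-element string length so the total data size stays ~constant
// c is an arbitrary example total, not the value used in the test program
std::size_t element_length(std::size_t n, std::size_t c = 10000000)
{
    return c / n;   // n elements x (c / n) bytes each ~= c bytes total
}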
To summarize my analysis of the graph in your first post: the algorithm can't be determined, and it appears that the total amount of data varies with the number of elements, since the graph doesn't follow what would be expected when the total amount of data is fixed or grows linearly with the number of elements to be sorted. If the amount of data varies as some unknown function of the number of elements, how can the algorithm be determined?
Monte_Carlo said:
Since you've got knowledge of performance profiling, do you know of a way to look at this potential quirk with the cache? How would you approach this problem if you were asked to investigate the way this program interacts with the cache levels? I don't know of any API which would do that, maybe you can help. I'll be wrapping my head around s1.cpp and s2.cpp in the meantime.
There are tools to do this, but I haven't used profiling tools in over a decade. I don't know if the cache can be disabled to see how it is affecting the results. There's also the issue of CAS latency effects with RAM:
http://en.wikipedia.org/wiki/CAS_latency
To get around these issues, rather than benchmark by CPU time, you could add counters for the number of comparisons, number of swaps, number of bytes compared, and so on. This gives an alternate way to compare algorithms theoretically, while the CPU time gives a more real-world comparison.
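A minimal sketch of what I mean by counters (the functor name and fields are mine, not part of the test programs): wrap the comparison so each call tallies the number of comparisons and a rough byte count, independent of CPU time.

Code:
#include <cstddef>
#include <string>

// sketch only - counting comparator to measure work done instead of CPU time
struct CountingLess
{
    std::size_t compares = 0;   // number of comparisons performed
    std::size_t bytes = 0;      // rough number of bytes examined

    bool operator()(const std::string &a, const std::string &b)
    {
        compares++;
        bytes += (a.size() < b.size()) ? a.size() : b.size();
        return a < b;
    }
};

Usage sketch: pass an instance by reference, e.g. std::sort(v.begin(), v.end(), std::ref(cmp)), so the counts survive the sort, then report cmp.compares and cmp.bytes.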
The profilers I used were mostly for embedded software (software in devices). A fairly fast sample rate would store an array of program counter addresses into memory on a PC while the embedded software ran in the device. The link map was then used to generate a histogram of the number of samples per function, sorted to see which functions were consuming the most CPU time. Profilers for a PC work in a similar fashion. An alternative approach, which isn't real time and is very slow, is to simply trace (breakpoint) on every instruction and store a pseudo time stamp in a huge array of data.
Monte_Carlo said:
The high performance timer I've never heard of!
The various timers aren't well documented in Visual Studio or on MSDN unless you already have some idea of the name of the timer you want. There are waitable timers and multimedia timers, as well as the high resolution counter.
Use QueryPerformanceFrequency() to get the frequency for QueryPerformanceCounter(). This is query only; you can't trigger events or functions based on this counter.
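A minimal usage sketch, assuming you just want elapsed seconds around the code being timed (sort_data() is a placeholder, not one of the test program's functions):

Code:
#include <windows.h>
#include <stdio.h>

void time_it(void)
{
    LARGE_INTEGER freq, start, stop;
    QueryPerformanceFrequency(&freq);   // counts per second
    QueryPerformanceCounter(&start);
    // sort_data();                     // code being measured goes here
    QueryPerformanceCounter(&stop);
    double seconds = (double)(stop.QuadPart - start.QuadPart) / (double)freq.QuadPart;
    printf("elapsed: %f seconds\n", seconds);
}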
Waitable and high resolution timers:
msdn timers.htm
Multimedia timers:
msdn multimedia timers.htm
timeBeginPeriod():
msdn timeBeginPeriod.htm
Note that timeBeginPeriod() affects both waitable and multimedia timer resolution. On my 3 systems, timeBeginPeriod(1) results in a timer frequency of 1024 Hz, which the multimedia timers fake as 1000 Hz by adding an extra delay on the 42nd, 84th, and 125th ticks, so that (125+3) ticks x (1000/1024) ms = 125 ms, repeating this pattern every 125 ms. The sequence timeBeginPeriod(1), SetWaitableTimer(,,1,,,); results in a 512 Hz timer on those same 3 systems, but without faking 500 Hz with extra delays. It should result in a 1024 Hz timer, but it ends up at 512 Hz.
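For reference, a sketch of the sequence I'm describing, assuming a plain periodic waitable timer with a 1 ms period and no error checking (link with winmm.lib for timeBeginPeriod/timeEndPeriod):

Code:
#include <windows.h>
#include <mmsystem.h>                   // timeBeginPeriod / timeEndPeriod
#pragma comment(lib, "winmm.lib")

void waitable_timer_demo(void)
{
    timeBeginPeriod(1);                                     // request 1 ms timer resolution
    HANDLE hTimer = CreateWaitableTimer(NULL, FALSE, NULL);
    LARGE_INTEGER due;
    due.QuadPart = -10000LL;                                // first fire in 1 ms (100 ns units, negative = relative)
    SetWaitableTimer(hTimer, &due, 1, NULL, NULL, FALSE);   // then every 1 ms
    for (int i = 0; i < 1000; i++)
        WaitForSingleObject(hTimer, INFINITE);              // wait for each tick
    CloseHandle(hTimer);
    timeEndPeriod(1);                                       // restore previous resolution
}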