How to do a running median in n*log2(n)?

Click For Summary
The discussion revolves around optimizing a solution for calculating the median of a list of integers, initially implemented with a binary insertion method that results in a time complexity of O(n^2). The user seeks to improve performance to avoid timeouts in a HackerRank problem. Suggestions include using data structures that allow for O(log n) insertions, such as indexable skiplists or binary heaps. The concept of maintaining two balanced heaps—one min heap for larger elements and one max heap for smaller elements—is proposed as an efficient way to calculate the median in O(1) time while ensuring that insertions and rebalancing occur in O(log n) time. The discussion emphasizes the need for a more efficient algorithm while exploring various data structures that could facilitate this improvement.
SlurrerOfSpeech
Messages
141
Reaction score
11
So I came up with the following solution which is n*log2(n) since the input is n elements and I use a binary insertion and calculating the median is constant time. This is passes all the test cases in the tests that don't time out. I need to figure out how to make it faster so I beat the timeouts. I'm pretty much out of ideas.

Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Diagnostics.Contracts;

namespace Solution
{
    static class Extensions
    {
        public static void BinaryInsertion(this List<int> list, int val)
        {
            int index = list.BinarySearch(val);
            if(index < 0) index = ~index;
            list.Insert(index, val);   
        }
       
        public static double Median(this List<int> list)
        {
            Contract.Requires(list.Count > 0, "Method only valid for non-empty lists");
            int i = list.Count / 2;
            double d = list.Count % 2 == 1 ? (double)list[i] : (double)(list[i-1] + list[i]) / 2.0;
            return Math.Round(d, 1);
        }
    }
   
    class Solution
    {
        static void Main(string[] args)
        {
            int n = Int32.Parse(Console.ReadLine());
            List<int> list = new List<int>();
            for(int i = 0; i < n; ++i)
            {
                int val = Int32.Parse(Console.ReadLine());
                list.InsertInOrder(val);
                Console.WriteLine(list.Median());
            }
        }
    }
}
 
Technology news on Phys.org
How early does it time out? How much do you have to speed up the program?

Where is this InsertInOrder method and where does your BinaryInsertion get used?
 
It's a HackerRank problem, so I don't know exactly how much I need to speed it up. I did figure out that the Insert method is O(n), so in fact my program is O(n2)
 
A log(n) insertion method should work then.
Note that the base of the logarithm doesn't matter for the complexity class, as it just changes the prefactor.
 
mfb said:
A log(n) insertion method should work then.
Note that the base of the logarithm doesn't matter for the complexity class, as it just changes the prefactor.

What type of data structure allows for log(n) insertion?
 
Indexable skiplist seems to be an efficient way forward [1]. Note that some people interpret "running" as "rolling" in which case the median on a long series is calculated as the median over a sliding window of fixed size.

[1] https://rhettinger.wordpress.com/tag/running-median/
 
A binary heap can do insertion in O(log(n)) in the worst case.

Heapsort is O(n log(n)), and similar to your problem.
 
Ok, I think I've figured out a way to keep track of the median(s) while building a binary tree. I'll have to handroll a class though. I'll let you know if it works.
 
  • #10
The simplest answer to this problem would seem to be have two different heaps, each with k items (or if you have an odd number of total items then one heap has k items, and the other has k + 1). I call this having two balanced heaps.

Have one heap be a min heap, with all the 'big items', and the other be a max heap, with all the 'small items'. So long as your heaps are balanced you can get the median in O(1) at any given point in time. As a items comes in, you can insert and re-balance the heaps in O(log(n)) time.

Explicit Binary trees can work as well, though heaps are simpler and use less memory (i.e. they are actually an array that we think of as a binary tree).
 
  • Like
Likes mfb

Similar threads

  • · Replies 2 ·
Replies
2
Views
1K
  • · Replies 0 ·
Replies
0
Views
640
  • · Replies 9 ·
Replies
9
Views
3K
  • · Replies 10 ·
Replies
10
Views
2K
Replies
3
Views
1K
  • · Replies 35 ·
2
Replies
35
Views
4K
  • · Replies 5 ·
Replies
5
Views
1K
  • · Replies 13 ·
Replies
13
Views
5K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 21 ·
Replies
21
Views
3K