How to do a running median in n*log2(n)?

Click For Summary

Discussion Overview

The discussion revolves around finding an efficient method to compute a running median for a sequence of numbers, specifically targeting a time complexity of O(n log n). Participants explore various data structures and algorithms to optimize the insertion and median calculation processes.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • One participant proposes a solution using binary insertion for maintaining a sorted list, claiming it achieves O(n log n) complexity, but later realizes the insertion method is O(n), leading to an overall O(n²) complexity.
  • Another participant suggests that a log(n) insertion method is necessary and notes that the base of the logarithm does not affect the complexity class.
  • There is a mention of an indexable skiplist as a potential data structure for achieving log(n) insertion.
  • A binary heap is proposed as an alternative, which allows for O(log n) insertion and is compared to heapsort's O(n log n) complexity.
  • One participant discusses a method of tracking the median while building a binary tree, indicating a need for a custom implementation.
  • A suggestion is made to use two balanced heaps (a min heap and a max heap) to maintain the median efficiently, allowing for O(1) median retrieval and O(log n) insertion time.

Areas of Agreement / Disagreement

Participants express multiple competing views on the best data structure and method for achieving the desired time complexity, with no consensus reached on a single solution.

Contextual Notes

Some participants interpret "running" median as "rolling," suggesting that the median may need to be calculated over a sliding window of fixed size, which introduces additional complexity considerations.

SlurrerOfSpeech
Messages
141
Reaction score
11
So I came up with the following solution which is n*log2(n) since the input is n elements and I use a binary insertion and calculating the median is constant time. This is passes all the test cases in the tests that don't time out. I need to figure out how to make it faster so I beat the timeouts. I'm pretty much out of ideas.

Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Diagnostics.Contracts;

namespace Solution
{
    static class Extensions
    {
        public static void BinaryInsertion(this List<int> list, int val)
        {
            int index = list.BinarySearch(val);
            if(index < 0) index = ~index;
            list.Insert(index, val);   
        }
       
        public static double Median(this List<int> list)
        {
            Contract.Requires(list.Count > 0, "Method only valid for non-empty lists");
            int i = list.Count / 2;
            double d = list.Count % 2 == 1 ? (double)list[i] : (double)(list[i-1] + list[i]) / 2.0;
            return Math.Round(d, 1);
        }
    }
   
    class Solution
    {
        static void Main(string[] args)
        {
            int n = Int32.Parse(Console.ReadLine());
            List<int> list = new List<int>();
            for(int i = 0; i < n; ++i)
            {
                int val = Int32.Parse(Console.ReadLine());
                list.InsertInOrder(val);
                Console.WriteLine(list.Median());
            }
        }
    }
}
 
Technology news on Phys.org
How early does it time out? How much do you have to speed up the program?

Where is this InsertInOrder method and where does your BinaryInsertion get used?
 
It's a HackerRank problem, so I don't know exactly how much I need to speed it up. I did figure out that the Insert method is O(n), so in fact my program is O(n2)
 
A log(n) insertion method should work then.
Note that the base of the logarithm doesn't matter for the complexity class, as it just changes the prefactor.
 
mfb said:
A log(n) insertion method should work then.
Note that the base of the logarithm doesn't matter for the complexity class, as it just changes the prefactor.

What type of data structure allows for log(n) insertion?
 
Indexable skiplist seems to be an efficient way forward [1]. Note that some people interpret "running" as "rolling" in which case the median on a long series is calculated as the median over a sliding window of fixed size.

[1] https://rhettinger.wordpress.com/tag/running-median/
 
A binary heap can do insertion in O(log(n)) in the worst case.

Heapsort is O(n log(n)), and similar to your problem.
 
Ok, I think I've figured out a way to keep track of the median(s) while building a binary tree. I'll have to handroll a class though. I'll let you know if it works.
 
  • #10
The simplest answer to this problem would seem to be have two different heaps, each with k items (or if you have an odd number of total items then one heap has k items, and the other has k + 1). I call this having two balanced heaps.

Have one heap be a min heap, with all the 'big items', and the other be a max heap, with all the 'small items'. So long as your heaps are balanced you can get the median in O(1) at any given point in time. As a items comes in, you can insert and re-balance the heaps in O(log(n)) time.

Explicit Binary trees can work as well, though heaps are simpler and use less memory (i.e. they are actually an array that we think of as a binary tree).
 
  • Like
Likes   Reactions: mfb

Similar threads

  • · Replies 2 ·
Replies
2
Views
1K
  • · Replies 1 ·
Replies
1
Views
1K
  • · Replies 9 ·
Replies
9
Views
3K
  • · Replies 10 ·
Replies
10
Views
2K
Replies
3
Views
1K
  • · Replies 35 ·
2
Replies
35
Views
4K
  • · Replies 5 ·
Replies
5
Views
2K
  • · Replies 13 ·
Replies
13
Views
5K
  • · Replies 3 ·
Replies
3
Views
3K
  • · Replies 21 ·
Replies
21
Views
3K