How to do a running median in n*log2(n)?

In summary, the author came up with a solution that is n*log2(n) and it passes all the test cases in the tests that don't time out. They need to figure out how to make it faster so they can beat the timeouts. They are pretty much out of ideas.
  • #1
SlurrerOfSpeech
141
11
So I came up with the following solution which is n*log2(n) since the input is n elements and I use a binary insertion and calculating the median is constant time. This is passes all the test cases in the tests that don't time out. I need to figure out how to make it faster so I beat the timeouts. I'm pretty much out of ideas.

Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Diagnostics.Contracts;

namespace Solution
{
    static class Extensions
    {
        public static void BinaryInsertion(this List<int> list, int val)
        {
            int index = list.BinarySearch(val);
            if(index < 0) index = ~index;
            list.Insert(index, val);   
        }
       
        public static double Median(this List<int> list)
        {
            Contract.Requires(list.Count > 0, "Method only valid for non-empty lists");
            int i = list.Count / 2;
            double d = list.Count % 2 == 1 ? (double)list[i] : (double)(list[i-1] + list[i]) / 2.0;
            return Math.Round(d, 1);
        }
    }
   
    class Solution
    {
        static void Main(string[] args)
        {
            int n = Int32.Parse(Console.ReadLine());
            List<int> list = new List<int>();
            for(int i = 0; i < n; ++i)
            {
                int val = Int32.Parse(Console.ReadLine());
                list.InsertInOrder(val);
                Console.WriteLine(list.Median());
            }
        }
    }
}
 
Technology news on Phys.org
  • #2
How early does it time out? How much do you have to speed up the program?

Where is this InsertInOrder method and where does your BinaryInsertion get used?
 
  • #3
It's a HackerRank problem, so I don't know exactly how much I need to speed it up. I did figure out that the Insert method is O(n), so in fact my program is O(n2)
 
  • #4
A log(n) insertion method should work then.
Note that the base of the logarithm doesn't matter for the complexity class, as it just changes the prefactor.
 
  • #5
mfb said:
A log(n) insertion method should work then.
Note that the base of the logarithm doesn't matter for the complexity class, as it just changes the prefactor.

What type of data structure allows for log(n) insertion?
 
  • #6
Indexable skiplist seems to be an efficient way forward [1]. Note that some people interpret "running" as "rolling" in which case the median on a long series is calculated as the median over a sliding window of fixed size.

[1] https://rhettinger.wordpress.com/tag/running-median/
 
  • #7
A binary heap can do insertion in O(log(n)) in the worst case.

Heapsort is O(n log(n)), and similar to your problem.
 
  • #8
Ok, I think I've figured out a way to keep track of the median(s) while building a binary tree. I'll have to handroll a class though. I'll let you know if it works.
 
  • #10
The simplest answer to this problem would seem to be have two different heaps, each with k items (or if you have an odd number of total items then one heap has k items, and the other has k + 1). I call this having two balanced heaps.

Have one heap be a min heap, with all the 'big items', and the other be a max heap, with all the 'small items'. So long as your heaps are balanced you can get the median in O(1) at any given point in time. As a items comes in, you can insert and re-balance the heaps in O(log(n)) time.

Explicit Binary trees can work as well, though heaps are simpler and use less memory (i.e. they are actually an array that we think of as a binary tree).
 
  • Like
Likes mfb

FAQ: How to do a running median in n*log2(n)?

1. What is a running median?

A running median is a statistical measure that represents the middle value in a continuously updating dataset. It is calculated by arranging the data in ascending or descending order and finding the middle value. In case of an even number of data points, the average of the two middle values is taken as the median.

2. Why is n*log2(n) used for calculating the running median?

The time complexity of an algorithm is determined by the number of operations required to solve a problem of a particular size. The n*log2(n) time complexity is used for calculating the running median because it is the most efficient approach for large datasets and ensures faster execution compared to other methods.

3. How is the running median calculated using n*log2(n)?

The running median can be calculated using the following steps:

1. Sort the dataset in ascending or descending order using a sorting algorithm with n*log2(n) time complexity, such as merge sort or quicksort.

2. If the number of data points is odd, the middle value is taken as the median. If the number of data points is even, the average of the two middle values is taken as the median.

3. Repeat the above steps as new data points are added or removed from the dataset.

4. Are there any limitations of using n*log2(n) for the running median?

The n*log2(n) approach for calculating the running median is efficient for large datasets but may not be suitable for real-time applications with continuously updating data. In such cases, a more efficient algorithm with constant time complexity, such as the sliding window approach, may be used instead.

5. Can the running median be calculated in linear time?

Yes, the running median can also be calculated in linear time, i.e. O(n), using the sliding window approach. This method involves maintaining a sorted list of the data points within a fixed window and updating the list as new data points are added or removed. However, this method may not be suitable for large datasets as it requires more space and is less efficient than the n*log2(n) approach.

Similar threads

Replies
10
Views
1K
Replies
35
Views
3K
Replies
5
Views
1K
Replies
13
Views
4K
Replies
3
Views
2K
Back
Top