# Find the median of a set of sorted lists, Computer Science

• chy1013m1
In summary, the task is to develop a divide-and-conquer algorithm that efficiently finds the median of the union of a set of sorted lists. The algorithm should not have a runtime of n log(n) or n^2 and should not rely on probabilistic methods. Additionally, the lists are pairwise disjoint: no element appears in more than one list. A possible solution could involve sorting the lists by their medians, dividing them into two halves, and removing elements to balance the sizes of the halves.

## Homework Statement

Given a set of sorted lists $\{X_1, X_2, \dots, X_n\}$, where each $X_i \in \mathbb{R}^n$, devise a divide-and-conquer algorithm that efficiently finds the median of $X_1 \cup X_2 \cup \dots \cup X_n$.
($n \log n$, $n^2$ is not acceptable; also no probabilistic or expected-runtime algorithms.)
Also, $X_i \cap X_j = \emptyset$ for all $i, j$ such that $i \neq j$.

## Homework Equations

I already know a procedure that finds the median of two equal-sized sorted lists in $O(\log n)$ time.
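The procedure itself is not spelled out in the thread. A minimal sketch of one standard way to do it, assuming "median" means the lower median (the $n$-th smallest of the $2n$ elements): binary-search how many of the $n$ smallest elements come from the first list. The function name `lower_median_of_two` is hypothetical.

```python
def lower_median_of_two(a, b):
    """Return the n-th smallest (lower median) of the 2n elements of a and b.

    Assumes len(a) == len(b) == n >= 1 and both lists are sorted ascending.
    Runs in O(log n): each iteration halves the search range for i.
    """
    n = len(a)
    if n == 0 or len(b) != n:
        raise ValueError("need two non-empty lists of equal length")
    # Take i elements from a and j = n - i from b; the split is correct when
    # everything on the left is <= everything on the right.
    lo, hi = 0, n
    while lo <= hi:
        i = (lo + hi) // 2
        j = n - i
        a_left = a[i - 1] if i > 0 else float("-inf")
        a_right = a[i] if i < n else float("inf")
        b_left = b[j - 1] if j > 0 else float("-inf")
        b_right = b[j] if j < n else float("inf")
        if a_left <= b_right and b_left <= a_right:
            return max(a_left, b_left)  # largest of the n smallest elements
        if a_left > b_right:
            hi = i - 1  # took too many elements from a
        else:
            lo = i + 1  # took too few elements from a
    raise AssertionError("unreachable for sorted input")


print(lower_median_of_two([1, 3, 5], [2, 4, 6]))  # 3
```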

## The Attempt at a Solution

- Sort the lists by their median values and relabel them $X_1, X_2, \dots, X_n$.
- Divide the lists into $L = \{X_1, \dots, X_{\lfloor n/2 \rfloor}\}$ and $U = \{X_{\lfloor n/2 \rfloor + 1}, \dots, X_n\}$.
- Find the median of $L$, call it $m_l$, and the median of $U$, call it $m_u$.
- Remove the elements $< m_l$ from $L$ and the elements $> m_u$ from $U$, giving new sets $L'$ and $U'$.
  At this point, however, the lists may no longer be equal in size, and merging them could blow up the runtime, since a merge is $\Theta(n)$.
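The first two steps of this attempt can be sketched as follows, assuming each list's median is its lower median (middle element, rounding down) and that $L$ gets $\lfloor n/2 \rfloor$ lists so that no list lands in both halves. The name `split_by_median` is hypothetical; later steps are left out because the thread does not settle how to rebalance $L'$ and $U'$.

```python
def split_by_median(lists):
    """Sort the lists by their medians, then split them into halves L and U.

    Assumes every list is non-empty and sorted ascending. Reading a median is
    O(1), so the sort costs O(n log n) for n lists.
    """
    # Use the lower median (middle element, rounding down) as the sort key.
    ordered = sorted(lists, key=lambda xs: xs[(len(xs) - 1) // 2])
    half = len(ordered) // 2  # floor(n/2) lists go to L, the rest to U
    return ordered[:half], ordered[half:]


L, U = split_by_median([[5, 6, 7], [1, 2, 3], [9, 10, 11], [4, 8, 12]])
print(L)  # [[1, 2, 3], [5, 6, 7]]
print(U)  # [[4, 8, 12], [9, 10, 11]]
```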

X is a list; M is its median; r is the number of elements (across all the lists) less than M.
X1 : (M1,r1)
X2 : (M2,r2)
X3 : (M3,r3)
..
Xn : (Mn,rn)

[Sketch: one point per list, with its median M on the horizontal axis (larger medians to the right) and r on the vertical axis (larger r's upward).]

can you notice anything about that data?
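One way to study the data in that sketch is to compute the $(M_i, r_i)$ pairs directly. A minimal sketch, assuming $M_i$ is the lower median of $X_i$ and $r_i$ counts elements of all the lists (not just $X_i$) that are smaller than $M_i$; the function name is hypothetical.

```python
from bisect import bisect_left


def medians_and_ranks(lists):
    """Return (M_i, r_i) for each list, sorted by median.

    M_i is the lower median of list X_i; r_i counts the elements of all
    lists that are smaller than M_i (an assumption about what r means).
    Assumes non-empty, sorted, pairwise disjoint lists.
    """
    pairs = []
    for xs in lists:
        m = xs[(len(xs) - 1) // 2]
        # bisect_left(ys, m) is the number of elements of ys that are < m.
        r = sum(bisect_left(ys, m) for ys in lists)
        pairs.append((m, r))
    return sorted(pairs)


print(medians_and_ranks([[1, 5, 9], [2, 3, 4], [6, 7, 8]]))
# [(3, 2), (5, 4), (7, 6)]
```

Because the lists are disjoint, $M_i < M_j$ means every element below $M_i$ is also below $M_j$, and $M_i$ itself is below $M_j$, so $r_i < r_j$: once the lists are sorted by median, the ranks only increase. Each $r_i$ here costs $O(n \log n)$ in binary searches, so this is for inspecting the data, not a final algorithm.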

And we would have to do this $n$ times.

To avoid this, we can use the following algorithm:

1. Initialize $L'$ and $U'$ as empty lists.
2. For each list $X_i$ in $L$, add its median (its middle element) to $L'$.
3. For each list $X_i$ in $U$, add its median (its middle element) to $U'$.
4. Find the median of $L'$ and $U'$ using the known algorithm for the median of two equal-sized sorted lists.
5. If the median of $L'$ is less than the median of $U'$, the median of the original set of lists lies in the upper half of $L$ and the lower half of $U$.
6. If the median of $L'$ is greater than the median of $U'$, the median of the original set of lists lies in the upper half of $U$ and the lower half of $L$.
7. Repeat steps 2 to 6 until the median of the original set of lists is found.
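Any implementation of the steps above can be checked against a brute-force reference. A minimal sketch of such a reference (not the algorithm above): merge every list with the standard library's `heapq.merge` and take the lower median of the result. The name `median_by_merge` is hypothetical.

```python
import heapq


def median_by_merge(lists):
    """Brute-force reference: lower median of the union of sorted lists.

    heapq.merge performs a k-way merge, so this costs O(N log k) time for
    N total elements in k lists (O(n^2 log n) for n lists of length n).
    """
    merged = list(heapq.merge(*lists))
    if not merged:
        raise ValueError("no elements to take the median of")
    return merged[(len(merged) - 1) // 2]


print(median_by_merge([[1, 5, 9], [2, 3, 4], [6, 7, 8]]))  # 5
```

Its cost is well above what the problem allows, which is exactly why it is only useful as a correctness check on small inputs.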

This algorithm has a runtime of $O(n \log n)$, since each step finds the median of two equal-sized sorted lists in $O(\log n)$ time and there are $n$ lists in total. It is also deterministic, so it involves no probabilistic or expected-runtime analysis.

## What is the definition of median?

The median of a set of data is the middle value when the data is arranged in ascending or descending order. It divides the data into two equal parts, with half of the data points being less than the median and half being greater than the median.

## How do I find the median of a set of sorted lists?

To find the median, first arrange all the data in ascending or descending order (for a set of sorted lists, this means merging them into one sorted sequence). If the number of data points is odd, the median is the middle value. If the number of data points is even, the median is the average of the two middle values.
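A short sketch of the odd/even rule, compared with Python's standard `statistics.median`, which applies the same rule. The hand-written `median` function is purely illustrative.

```python
import statistics


def median(values):
    """Median by the odd/even rule: middle value if the count is odd,
    average of the two middle values if it is even."""
    s = sorted(values)
    if not s:
        raise ValueError("median of empty data")
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


print(median([7, 1, 3]))                # 3
print(median([8, 2, 4, 6]))             # 5.0
print(statistics.median([8, 2, 4, 6]))  # 5.0, same rule in the standard library
```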

## Why is finding the median important in computer science?

Finding the median is important in computer science because it helps summarize and analyze large sets of data. It is also used in algorithms, for example as the pivot in selection and quicksort variants, and in statistical calculations.

## What is the time complexity of finding the median of a set of sorted lists?

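For a single sorted array or list, the median can be accessed directly in O(1) time. For an unsorted array or list, sorting the data first gives O(n log n), although deterministic selection (median of medians) finds the median in O(n). For a set of sorted lists, the union is not sorted as a whole, so the median cannot simply be read off; finding it efficiently is the problem posed in the homework above.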
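For the unsorted case, a sketch of the classic median-of-medians selection algorithm, a standard deterministic technique rather than anything proposed in this thread. It finds the k-th smallest element in worst-case O(n) time without randomness. The names `select` and `lower_median` are hypothetical, and "median" is taken to mean the lower median.

```python
def select(values, k):
    """Return the k-th smallest element (0-indexed) in worst-case O(n) time.

    Median of medians: the pivot is the median of the medians of groups of
    five, which guarantees each round discards a constant fraction of values.
    """
    values = list(values)
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    while True:
        if len(values) <= 5:
            return sorted(values)[k]
        # Median of each group of (at most) five elements.
        groups = [sorted(values[i:i + 5]) for i in range(0, len(values), 5)]
        medians = [g[(len(g) - 1) // 2] for g in groups]
        pivot = select(medians, (len(medians) - 1) // 2)
        # Three-way partition so duplicates of the pivot cannot stall progress.
        lows = [v for v in values if v < pivot]
        pivots = [v for v in values if v == pivot]
        highs = [v for v in values if v > pivot]
        if k < len(lows):
            values = lows
        elif k < len(lows) + len(pivots):
            return pivot
        else:
            k -= len(lows) + len(pivots)
            values = highs


def lower_median(values):
    """Lower median of unsorted data via deterministic selection."""
    values = list(values)
    return select(values, (len(values) - 1) // 2)


print(lower_median([9, 2, 7, 4, 5, 1, 8]))  # 5
```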

## Can the median be calculated for any type of data?

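Yes, the median can be calculated for any type of data as long as the values can be sorted, i.e., compared with one another. This includes numerical data, alphabetical data, and even data with a mix of both, provided a single ordering is defined over all the values. For non-numeric data the two middle values cannot be averaged, so the lower or upper middle value is used instead. The median is also robust: heavily skewed data or extreme outliers affect it far less than they affect the mean, which is why it is often preferred as a measure of central tendency for such data.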
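A short sketch with Python's standard `statistics` module showing two things: for non-numeric data only the ordering matters (`median_low` and `median_high` pick a middle value instead of averaging), and the median and the mean respond differently to a single extreme outlier. The sample data is made up for illustration.

```python
import statistics

words = ["pear", "apple", "fig", "banana"]  # sorted: apple, banana, fig, pear
print(statistics.median_low(words))   # banana
print(statistics.median_high(words))  # fig
# statistics.median(words) raises TypeError here: it tries to average two strings.

incomes = [30, 32, 35, 38, 1000]
print(statistics.median(incomes))  # 35, unaffected by the outlier
print(statistics.mean(incomes))    # dragged up to 227 by the single outlier
```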