Increasing the efficiency of Python code


Discussion Overview

The discussion revolves around improving the efficiency of a Python function designed to check for duplicate values in a list or string. Participants explore various methods and considerations related to algorithmic efficiency, particularly in the context of larger datasets.

Discussion Character

  • Technical explanation
  • Debate/contested
  • Mathematical reasoning

Main Points Raised

  • Some participants express concern that the initial implementation of checking for duplicates using a list will be slow for large datasets, suggesting a need for a faster alternative.
  • One participant proposes using a set to check for duplicates, noting that this approach is more efficient because it reduces the time complexity from O(n^2) to O(n).
  • Another participant mentions that using a dictionary could also be an effective method, as the "in" operation is O(1) for dictionaries.
  • There is a suggestion to sort the list first before checking for duplicates, which would have a time complexity of O(n log n).
  • Participants discuss the importance of considering the overall application needs rather than solely optimizing a single function.
  • One participant analyzes the worst-case cost of the original approach, counting the comparisons performed by the repeated membership tests and estimating its complexity as O(n^2).
  • Another participant shares their experience of learning about sets and finding them useful for eliminating duplicate values.

Areas of Agreement / Disagreement

Participants generally agree that the original method is inefficient and that using sets or dictionaries can improve performance. However, there are multiple competing views on the best approach, and the discussion remains open regarding the optimal solution.

Contextual Notes

Limitations include the assumption that participants are familiar with the implications of time complexity and the specific characteristics of Python data structures. There is also a lack of consensus on the best method to use, as different approaches may be suitable depending on the context.

Who May Find This Useful

This discussion may be useful for Python programmers looking to optimize their code for checking duplicates, particularly those working with large datasets or interested in algorithmic efficiency.

adjacent
Here is a python function to check whether a given list or string contains duplicate values:
Code:
def has_duplicates(x):
    t = []
    for i in x:
        if i not in t:
            t.append(i)
    if t != list(x):
        return True
    else:
        return False

But I am sure this approach will be slow for large lists. Is there any other, faster way to do this?
 
adjacent said:
Here is a python function to check whether a given list or string contains duplicate values:
Code:
def has_duplicates(x):
    t = []
    for i in x:
        if i not in t:
            t.append(i)
    if t != list(x):
        return True
    else:
        return False

But I am sure this approach will be slow for large lists. Is there any other, faster way to do this?

You could use something like this:

Code:
def has_duplicates(seq):
    return len(seq) != len(set(seq))
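For example, a quick check of this one-liner (it works for any iterable of hashable values):

```python
def has_duplicates(seq):
    return len(seq) != len(set(seq))

print(has_duplicates("hello"))    # True: 'l' appears twice
print(has_duplicates([1, 2, 3]))  # False: all values distinct
```

Note that building the set always scans the whole input, even when a duplicate occurs early in the sequence.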
 
If Python has a simple-minded implementation of lists, that will take n/2 comparisons on average for each "i not in t" test, if the list has n elements. So the total time will be proportional to n^2. You could check that out experimentally.

It might be faster to use a dictionary instead of a list. There is an OrderedDict that remembers the order you added the elements, if that ordering is important for some reason. That should give you a time roughly proportional to n.
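A sketch of the OrderedDict idea for stripping duplicates while keeping first-seen order (the helper name here is my own, not from the thread):

```python
from collections import OrderedDict

def dedupe_keep_order(x):
    # dict keys are unique, and OrderedDict remembers insertion order,
    # so building one from the elements drops duplicates in a single O(n) pass
    return list(OrderedDict.fromkeys(x))

print(dedupe_keep_order([3, 1, 3, 2, 1]))  # [3, 1, 2]
```

In modern Python (3.7+) a plain dict preserves insertion order as well, so `dict.fromkeys(x)` would do the same job.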

Another idea would be to sort the lists first and then check for duplicates. The time to do that should be proportional to n log n.
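The sort-first approach could look like this (a sketch; it needs comparable elements, but not hashable ones):

```python
def has_duplicates_sorted(x):
    s = sorted(x)  # O(n log n); equal values end up adjacent
    # one linear pass comparing each element with its successor
    return any(a == b for a, b in zip(s, s[1:]))

print(has_duplicates_sorted([3, 1, 2, 3]))  # True
```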

The "best" option to choose depends on what the whole application needs to do, not necessarily on optimizing just one function.
 
There are a zillion answers for this on stackoverflow. Integrand has a good solution if you are only interested in whether there are duplicates.

For the efficiency of the built-ins, see https://wiki.python.org/moin/TimeComplexity.
See also: https://wiki.python.org/moin/PythonSpeed/PerformanceTips

To estimate the efficiency of your routine, think about the worst case scenario: two duplicates at the end of the list. "i not in t" looks simple, but each check requires a traversal of t (it's an O(n) operation). If n=len(x), there are about n append calls, and the k-th membership test scans up to k-1 elements, so there are 1 + 2 + ... + (n-1) comparisons in total. That's n*(n-1)/2 comparisons. Our complexity is O(n**2). Ouch.

If we use a set for t instead of a list, and t.add(i) instead of append, then "i not in t" is O(1) instead of O(n), so the algorithm is O(n). For that matter, you don't need the if statement, just add() the value to the set, and duplicates are squashed for you. Then you can see that

Code:
t = set()
for i in x:
    t.add(i)
is the same as t = set(x), though set(x) is about twice as fast, and set(x) is what Integrand's solution uses.
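A rough way to check that speed claim yourself (timings vary by machine and input, so treat the ratio as approximate):

```python
import timeit

data = list(range(100_000))

def loop_build():
    # build the set one element at a time, as in the loop above
    t = set()
    for i in data:
        t.add(i)
    return t

# both should produce the same set; the constructor is usually much faster
print("loop:       ", timeit.timeit(loop_build, number=20))
print("set(data):  ", timeit.timeit(lambda: set(data), number=20))
```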

So...TIP: when using the "in" operator on a longish collection of items in an inner loop, convert the collection to a set first.
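The tip in practice, with a toy filtering loop (the names and data here are made up for illustration):

```python
banned = ["spam", "ads", "bot"] * 1000  # a longish list
banned_set = set(banned)                # one-time O(n) conversion

words = ["hello", "spam", "world", "bot"]
# each 'in' test against the set is O(1) on average,
# instead of scanning the whole list every time
flagged = [w for w in words if w in banned_set]
print(flagged)  # ['spam', 'bot']
```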
 
Thanks guys. I haven't studied sets yet. I will try that method when I study it. :smile:
 
adjacent said:
Thanks guys. I haven't studied sets yet. I will try that method when I study it. :smile:

Try it with a dictionary, which is how one would have done it before Python had sets. The "in" operator is O(1) for dictionaries.
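A dictionary version of the duplicate check might look like this (a sketch; any throwaway value works for the dict entries, since only the keys matter):

```python
def has_duplicates(x):
    seen = {}
    for i in x:
        if i in seen:   # 'in' on a dict is O(1) on average
            return True
        seen[i] = True  # the value is irrelevant; only the key matters
    return False

print(has_duplicates("hello"))  # True
```

Unlike the set(seq) one-liner, this returns as soon as the first duplicate is found, without scanning the rest of the input.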
 
I have learned sets. It's exactly what I needed. No duplicate values :smile:
 
