Python: Removing lower versions from a 2D list

You’re using a second loop to check if the current “key” that you’re checking exists in the list. This is slowing down your code. Why? Because, as your code demonstrates, checking for membership in lists is slow. Really slow, because you need to iterate over the entire list, which means it’s an O(N) operation, so the time depends linearly on the size of the list.

Instead, you could simply change the list to a dictionary. Lookup in a dictionary is an O(1) operation, so the lookup happens in constant (or nearly constant) time regardless of the size of the dictionary. When you do this, there’s no longer a need for two loops. Here’s an idea:

def remove_duplicates_new(duplicate): 
    final_dict = {}
    case_sensitive_keys = {}
    for k, v in duplicate: 
        klower = k.lower()
        vint = int(v)
        old_val = final_dict.get(klower, 0) # Get the key k, with a default of zero if the key doesn't exist
        if vint > old_val:
            # Replace if current value is greater than old value
            final_dict[klower] = vint
            case_sensitive_keys[klower] = k
    # Now we're done looping, so create the list
    final_list = [[case_sensitive_keys[k], str(v)] for k, v in final_dict.items()]
    return final_list

To compare, let’s make a test list with 10000 elements. The “keys” are random numbers between 1 and 100, so we’re bound to get a whole bunch of duplicates.:

import random
import timeit

testList = [[str(random.randint(1, 100)), str(random.randint(1, 10))] for i in range(10000)]

timeit.timeit('remove_duplicates(testList)', setup='from __main__ import testList, remove_duplicates', number=10)
# Output: 1.1064800999999989

timeit.timeit('remove_duplicates_new(testList)', setup='from __main__ import testList, remove_duplicates_new', number=10)
# Output:  0.03743689999998878

Hot damn! That’s a ~30x speedup!

CLICK HERE to find out more related problems solutions.

Leave a Comment

Your email address will not be published.

Scroll to Top