Forums

deleting repeatedness in lists and of lists

Hi everyone :)

superlist=[[0,8],[1],[2],[3,6],[4],[5],[6,3,6],[7],[8,0,8],[9]]

I am trying to create a function on a list of lists that will do the following:

1) Deletes a repeat list( i.e. deletes a list that contains a number that has already been used in another list) from left to right so in the superlist it will identify that the first list contains a '0' and an '8' so any further list containing any of those two elements will be deleted. Therefore the penultimate list will be deleted.

So for the example above, if the function were called reduce() then, reduce(superlist) = [[0,8],[1],[2],[3,6],[4],[5],[7],[9]]

Thank you

[Edited mistake thanks to Cart]

I think your example contains a mistake, because the list [7] should also be in the reduced version, if I've understood your description correctly. Also, you don't state whether a list should be removed if it shares a number only with a previous list that was itself removed - I'm assuming that only lists which survive the removal process should be considered for duplicates.

I think this is actually quite a good example for introducing list comprehensions and generators as a gradual sequence of increasingly concise examples. Starting with the straightforward implementation:

def deduplicate(superlist):
    seen = set()
    result = []
    for sublist in superlist:
        for item in sublist:
            if item in seen:
                break
        else:
            result.append(sublist)
            seen.update(sublist)
    return result

You can compress it down a bit using a single comprehension:

def deduplicate(superlist):
    seen = set()
    result = []
    for sublist in superlist:
        if not any(item in seen for item in sublist):
            result.append(sublist)
            seen.update(sublist)
    return result

You can make that cleaner by converting it to a generator:

def deduplicate(superlist):
    seen = set()
    for sublist in superlist:
        if not any(item in seen for item in sublist):
            yield sublist
            seen.update(sublist)

You can nest comprehensions to make things even more concise:

def deduplicate(superlist):
    seen = set()
    for sublist in (sublist for sublist in superlist if not any(item in seen for item in sublist)):
        yield sublist
        seen.update(sublist)

And you can use the sort of obfuscatory obscurantism beloved of C programmers to get it down even further:

def deduplicate(superlist):
    seen = set()
    return (seen.update(sublist) or sublist for sublist in superlist if not any(item in seen for item in sublist))

I leave it as an exercise to the reader to decide where on that scale sanity ends and self-indulgence begins!

Wow. Thanks a lot Cartroo, and thank you for helping me on the other thread as well.

You're welcome. They sound suspiciously like they might be homework questions to me, but I was in a magnanimous mood. (^_^)

Also, I thought this progression might be educational to other people who're a bit hazy on generators and list comprehensions, and how they might be used, and it's an excuse to use the seldom spotted for...else construction.

You might not want to use the final version in production code, but it's worth taking a look to try and understand what's going on. You might also consider what the difference would be if the outermost brackets of the return value were changed to square brackets.

Its not homework, the problem of reducing is an impasse of sorting out something bigger haha :)

Well, OK, I'll let you off then. (^_^)