Understanding 5.5 deleting duplicates in sorted array (python)?


#1

I don’t quite get the general idea behind the code, why the array gets one short element in the end just from updating. How does the extra space get removed?

My understanding is that overall we have 2 “pointers” moving in the array, i and write_index. So write_index progresses together with i, and when there’s a duplicate found, write_index will stay behind and “hold” the duplicate. The index i will move forward and find a non duplicate, in which case the if statement will activate once more and update the duplicated element where write_index is, into to the non-duplicate. Is this correct?

But at the end of it, where does the deleting of the element come from? The solution works very well however, so I’m confused.

def delete_duplicates(A):
    if not A:
        return 0

    write_index = 1
    for i in range(1, len(A)):
        if A[write_index - 1] != A[i]:
            A[write_index] = A[i]
            write_index += 1
    return write_index

#2

What do you mean by “the array gets one short element in the end just from updating”, and by “How does the extra space get removed?”?

Is the code in your post yours, or from the book?

Please give an example using an array of numbers, showing both correct and incorrect behavior.


#3

Hey, @smalldata99!

It took me a while to understand this solution too and in case you’re still trying to understand what’s going on, here’s how I see it:

think of write_index as an “available spot” and also keep in mind that both i and write_index are positions (indexes) and not the actual value from the array.

So, you are correct: i and write_index are pointers moving in the array and they have the same value until you encounter a duplicate item. When the first duplicate item is found, the if block will not be executed and the value of write_index will not increase. write_index doesn’t hold the duplicate value but the index of the position of the duplicate value in the array, what I’m calling “available spot”. This will result in the “available spot” being reserved for the next unique value found in the array, in other words, for the next value that enters the if block.

One thing that helped me a lot was to put lots of print statements showing what value each variable had and the status of the array at each iteration.

I hope this helps clarify.


#4

:smile: Thanks! I actually asked those questions so I could help you. Glad you figured it out.