Better time complexity for problem 13.1

#1

Why waste time sorting all of the strings in problem 13.1? Wouldn’t it be better to use the number of occurrences of each letter in the alphabet as a unique identifier for each anagram group? i.e, associate each string with a 26-element array of integers where A[0] is the number of occurrences of the letter ‘a’, etc.

Thanks,
Kevin

0 Likes

#2

Hey Kevin,

You are right, sorting is not necessary and sorting is just a simple way to get an unique identifier. Because it is really hard to create a hash function for a 26-element array of integers because eventually you still need to use this 26-element array as the key of the hash table of anagrams.

Please let me know if you have any problem.

0 Likes

#3

I feel that this is essentially a counting sort. A quicker way of sorting under this situation.

0 Likes

#4

Hi Tsung-Hsien,

Thanks for your reply. I wonder why it is harder to create a hash function for an integer array than for a string since a string is represented in memory as an array of integers. In this case each element of the array is fairly small, let’s say 4 bits (I don’t think there are any english words where the same letter appears more than 16 times). So, the entire array could be represented with 13 bytes. Couldn’t we just represent that as a 13-letter word?

Thanks,
Kevin

0 Likes

#5

Hey Kevin,

It is totally possible to create such hash function, and the reason we use that is because we need to explicitly create a hash function for that array in C++. Back to your proposal, I think it works and the only problem is you need to explicitly create the hash function and there is definitely nothing wrong with that.

0 Likes

#6

I think the difference between sorting and the integer hash function is negligible, because in practice there are no words which are very long (e.g., >100 characters).

0 Likes