Clarification on 13.15

#1

Hi
Regarding the problem 13.15 “Pair users by attributes”. The solution hints that when the set of attributes has high cardinality, a possible solution to represent a subset of attributes for an user is to : sort the attributes, concatenate the sorted subset, hash the resulting string.
However, doesn’t that pose a problem for attributes with overlapping prefix/suffix?

For example
user id: {attributes lexicographically sorted} : concatenated string
User 0 : {pl, sql} : plsql
User 1: {pls, ql} : plsql

Both concatenated strings will hash to the same value, and on check will be equal although their original attributes differ…

Did I misunderstand the proposed method ?
It may be the case that the implied missing piece, is that on hash collision we directly compare the initial sets and not the concatenated strings. If that were to be the case, it may make sense to say so in the solution :wink:

Thanks.

0 Likes

#2

I think this problem can be solved by putting some unique delimiters to concatenate those attributes strings. For example, you can use comma (,) in your example to get a non-ambiguous sting representation.

BTW, we remove this problem because it does not very well-defined. However, it is good to know how to use hash table and sorting to get a canonical representation from this problem still.

0 Likes