Hi
Regarding the problem 13.15 “Pair users by attributes”. The solution hints that when the set of attributes has high cardinality, a possible solution to represent a subset of attributes for an user is to : sort the attributes, concatenate the sorted subset, hash the resulting string.
However, doesn’t that pose a problem for attributes with overlapping prefix/suffix?
For example
user id: {attributes lexicographically sorted} : concatenated string
User 0 : {pl, sql} : plsql
User 1: {pls, ql} : plsql
Both concatenated strings will hash to the same value, and on check will be equal although their original attributes differ…
Did I misunderstand the proposed method ?
It may be the case that the implied missing piece, is that on hash collision we directly compare the initial sets and not the concatenated strings. If that were to be the case, it may make sense to say so in the solution
Thanks.