pasteur.hierarchy.make_grouping

pasteur.hierarchy.make_grouping(counts, head, common=0)

Converts the provided hierarchical attribute level tree into a node-to-group mapping, where group[i][j] = z: i is the height of the mapping, j is the node, and z is the group that node j is assigned to at that height.
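To make the indexing concrete, here is a minimal sketch of how such a mapping can be read. The array below is written by hand for illustration and is not actual output of make_grouping; the number of nodes, the number of heights, and the helper members are all assumptions.

```python
import numpy as np

# Hypothetical mapping for 4 nodes and 3 heights (hand-written, NOT
# produced by make_grouping). Row i is the height, column j is the node,
# and the stored value z is the group of node j at height i.
group = np.array([
    [0, 1, 2, 3],  # every node in its own group
    [0, 0, 1, 1],  # nodes merged in pairs
    [0, 0, 0, 0],  # all nodes in a single group
])


def members(group: np.ndarray, height: int, z: int) -> np.ndarray:
    """Return the nodes assigned to group z at the given height."""
    return np.flatnonzero(group[height] == z)


# Group of node 2 at height 1, and the nodes sharing that group.
z = group[1][2]
same_group = members(group, 1, z)
```

Whether the real function orders heights from finest to coarsest, as assumed here, should be checked against its source.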

counts provides the class densities. It doesn’t need to be normalized and some of its values may be negative.

Reason: if counts is differentially private, some of its values will become negative once noise is added. Clipping them to 0 would make the mean of the added noise positive, so large groups of small classes would have their probability increase by m*n (where m is the noise scale and n is the number of groups).
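The clipping argument can be checked numerically. The sketch below assumes Laplace noise and a group of classes whose true counts are all zero; the function name clipping_bias and all parameter values are illustrative choices, not part of pasteur.

```python
import numpy as np


def clipping_bias(true_counts, noise_scale, trials=2000, seed=0):
    """Compare the mean group total with and without clipping to 0.

    Laplace noise is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    true_counts = np.asarray(true_counts, dtype=float)
    noise = rng.laplace(0.0, noise_scale, size=(trials, true_counts.size))
    noisy = true_counts + noise
    # Zero-mean noise cancels out when the raw values are summed.
    raw_total = noisy.sum(axis=1).mean()
    # Clipping discards the negative half, so the sum is biased upward.
    clipped_total = np.clip(noisy, 0, None).sum(axis=1).mean()
    return raw_total, clipped_total


# A group of 50 classes that are all empty in the real data.
raw, clipped = clipping_bias(np.zeros(50), noise_scale=1.0)
```

With the raw values the group total stays close to its true value of 0, while the clipped total grows with both the noise scale and the number of classes in the group.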

common is the number of values at the beginning of the domain that are shared with other columns in the attribute. Those values will never be merged. This also means that the minimum domain size of this column will be common + 1.
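As a small illustration of the common + 1 lower bound, consider a hand-written coarsest level for a 5-value domain with common = 2. The array is an assumption made for this sketch, not output of make_grouping.

```python
import numpy as np

common = 2  # hypothetical: the first two values are shared with other columns

# Hand-written coarsest grouping for a 5-value domain. The two shared
# values keep their own groups; the remaining three collapse into one.
coarsest = np.array([0, 1, 2, 2, 2])

# Number of distinct groups left after maximal merging.
domain_size = np.unique(coarsest).size
```

Here domain_size equals common + 1: one group per shared value, plus a single group holding everything else.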

Return type:

ndarray