Data anonymization irreversibly transforms data in a privacy-preserving way. The outcome is still clear data that remain useful to the CSPs and external users, but with lower accuracy (and, thus, lower disclosure risk) than the original data. Data anonymization is performed once, at the storage stage; after that, any queries on the data (search, retrieval, calculations) are transparent to CLARUS and the CSP, although they may return approximate results.

Two types of anonymization mechanisms have been designed:

  • Data coarsening: it systematically generalizes input records (independently, one at a time) according to a user-defined coarsening level. Since coarsened data are less detailed than the original data, disclosure risk is reduced.
  • Data microaggregation: it clusters similar records together in groups of a fixed size k and replaces each record with the average values of its cluster; thus, it transforms the whole dataset in a monolithic, global way (it cannot be applied independently to each record). Since the k microaggregated records within each cluster are indistinguishable from one another, the re-identification probability is lowered to 1/k.
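
The two mechanisms above can be illustrated with a minimal Python sketch. It is not the CLARUS implementation: the function names `coarsen` and `microaggregate` are hypothetical, the data are assumed to be a single numeric attribute, coarsening is assumed to mean rounding down to a bin of user-defined width, and microaggregation is simplified to sorting the values and averaging consecutive groups of k (a leftover tail of fewer than k records is merged into the last group).

```python
from statistics import mean


def coarsen(value, level):
    """Generalize ONE value to the lower bound of its bin of width `level`.

    Assumption: the "coarsening level" is a bin width. Each record is
    processed independently of all the others.
    """
    return (value // level) * level


def microaggregate(values, k):
    """Simplified univariate microaggregation (illustrative only).

    Sorts the records, splits them into consecutive groups of k, and
    replaces every record with the mean of its group. The whole dataset
    is needed, so this cannot be applied record by record.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Indices of the records, ordered by value, so similar records are adjacent.
    order = sorted(range(len(values)), key=lambda i: values[i])

    # Consecutive groups of k indices in sorted order.
    groups = [order[i:i + k] for i in range(0, len(order), k)]

    # Merge a short tail into the previous group so no group has fewer than k.
    if len(groups) > 1 and len(groups[-1]) < k:
        groups[-2].extend(groups.pop())

    result = [0.0] * len(values)
    for group in groups:
        avg = mean(values[i] for i in group)
        for i in group:
            result[i] = avg  # every record in the group gets the same value
    return result
```

For example, with the made-up ages `[23, 27, 31, 38, 45, 52, 67]`, coarsening each value with `level=10` gives `[20, 20, 30, 30, 40, 50, 60]`, while `microaggregate(ages, 3)` gives `[27, 27, 27, 50.5, 50.5, 50.5, 50.5]`: the first three records become indistinguishable from one another, as do the last four.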

(Efficiently*) Supported operations:
Performance impact on local premises (per data size):
  • Linear for data coarsening during storage
  • Quadratic for microaggregation during storage
  • Zero in all other operations
  • Quasi-linear for microaggregation during storage
Data accuracy preservation:
  • Partial. Depends on the operation requested on the data
  • Same accuracy for CLARUS users and the CSP
Access of non-CLARUS users:
  • Medium for data coarsening
  • High for microaggregation
  • Transparent for both the users and the CSP