Bayesian Estimation of Attribute Disclosure Risk of PrivBayes
Authors
Trudslev, Frederik Marinus ; Bachmann, Silas Oliver Torup
Term
4th term
Education
Publication year
2024
Submitted on
2024-06-13
Pages
42
Abstract
Synthetic data can help share data while protecting privacy. However, ensuring that no person can be identified or have hidden information inferred remains difficult. Differential privacy (DP) is a common framework: it limits how much the overall distribution can change when one person’s record is added or removed. In settings where all attributes must stay secret, DP can be a concern because attackers may attempt attribute inference—guessing a missing value about a specific person. To assess this threat, privacy metrics should reflect what an attacker might already know. Many metrics focus only on what is in the released synthetic data, even though an attacker could also know how the data were generated or have outside information about people. Bayesian statistics naturally represent and update such degrees of knowledge. Hornby & Hu proposed a Bayesian method to estimate the risk of attribute inference that incorporates auxiliary information and knowledge of the synthesis method. In this thesis, we connect that risk model with PrivBayes, a differentially private, Bayesian synthetic data generator. We investigate how well the Hornby & Hu method estimates the risk of disclosing continuous (numerical) attributes in PrivBayes‑generated datasets under two scenarios using different datasets: (1) varying the DP parameter ε (which controls the privacy–utility trade‑off), and (2) injecting outliers (extreme values) into the real data. Even when we allowed the attacker extra knowledge, our experiments showed low risk of disclosing continuous attributes across all ε values. Moreover, the risk did not vary directly with the amount of DP noise added by PrivBayes. These findings suggest that Bayesian modeling is useful because it lets us tune assumptions about an attacker’s knowledge and produce more grounded privacy risk estimates for protecting sensitive attributes.
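The abstract notes that ε controls the privacy–utility trade-off: PrivBayes satisfies differential privacy by adding Laplace-distributed noise whose scale grows as ε shrinks. The following is a minimal, illustrative sketch of that relationship using the standard Laplace mechanism on a single count query; it is not PrivBayes itself, and the function names are our own.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw one sample from a zero-mean Laplace distribution via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))

def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Laplace mechanism for an epsilon-DP count release.

    The noise scale is sensitivity / epsilon, so a smaller epsilon
    (stronger privacy) yields larger expected noise and lower utility.
    """
    return true_count + laplace_noise(sensitivity / epsilon)
```

For example, releasing the same count with ε = 0.1 produces noise roughly 100 times larger in magnitude, on average, than with ε = 10, which is the trade-off the first experimental scenario varies.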
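The abstract also states that Bayesian statistics can represent and update an attacker's degrees of knowledge. A toy sketch of that idea, with entirely hypothetical numbers (not the Hornby & Hu estimator), is a discrete posterior update over a hidden attribute: the attacker's prior belief is combined with the likelihood of the observed synthetic release under each candidate value.

```python
def posterior(prior: dict, likelihood: dict) -> dict:
    """Bayes' theorem over a discrete attribute: P(v | obs) ∝ P(obs | v) * P(v)."""
    unnorm = {v: prior[v] * likelihood[v] for v in prior}
    z = sum(unnorm.values())  # normalizing constant
    return {v: p / z for v, p in unnorm.items()}

# Hypothetical attacker prior over a target's hidden income bracket.
prior = {"low": 0.5, "mid": 0.3, "high": 0.2}
# Hypothetical likelihood of the observed synthetic data under each value.
likelihood = {"low": 0.1, "mid": 0.4, "high": 0.5}
post = posterior(prior, likelihood)
```

Here the observation shifts belief from "low" toward "mid"; an attribute-disclosure risk metric in this spirit asks how concentrated such a posterior becomes on the true value.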
[This abstract has been rewritten with the help of AI based on the project's original abstract]
Keywords
EHR ; Synthetic data ; SDG ; Differential Privacy ; DP ; Bayesian Statistics ; Bayes Theorem
