A master's thesis from Aalborg University

Bayesian Estimation of Attribute Disclosure Risk of PrivBayes

Author(s):

Term: 4th term

Education:

Publication year: 2024

Submitted on: 2024-06-14

Pages: 42

Abstract

Synthetic data generation is becoming ever more important as a means of providing anonymous data. However, guaranteeing the anonymity of individuals in synthetic data is still an open problem, as reflected by the number of synthetic data generators (SDGs) and privacy metrics proposed in recent years. One mathematical framework often used to ensure the privacy of SDGs is differential privacy (DP), which guarantees that adding or removing any individual from a real dataset does not significantly change the output distribution. However, the guarantees provided by DP cause concern in fields where all attribute values must remain secret, as attacks that attempt to guess an unknown attribute value of a given individual are prevalent. This reinforces the need for privacy metrics that model this type of attack. Despite this, many privacy metrics consider only an adversary's knowledge of the synthetic dataset, whereas in reality an adversary might have knowledge beyond the synthetic data, such as how the synthetic data was generated or knowledge of an individual not included in the real dataset. One reason for this might be the use of frequentist rather than Bayesian statistics; the latter provides the ability to continuously update one's beliefs with new information, making it a more natural fit for modelling different degrees of adversary knowledge. Hornby & Hu have implemented a Bayesian method for calculating the risk of an attribute inference attack, which accounts for an adversary's auxiliary knowledge as well as knowledge of the synthesisation method used. Building on this, we propose an implementation that couples PrivBayes, a differentially private Bayesian SDG, with Hornby & Hu's method in order to investigate its ability to assess the risk of disclosing continuous attribute values for a synthetic dataset generated by PrivBayes, under two scenarios with different datasets: one where we vary the $\varepsilon$-value used for DP, and another where we inject outliers into the real dataset. Despite the extra information, our results showed low risk of disclosing continuous attributes for all $\varepsilon$-values. Furthermore, the results demonstrated that the attribute disclosure risk for continuous attributes was not directly influenced by the amount of noise injected by PrivBayes. Nevertheless, we believe that Bayesian modelling provides an estimation in which the knowledge available to the adversary can be adjusted, which can yield more accurate results in the effort to protect individuals' sensitive attributes.
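
For reference, the DP guarantee the abstract alludes to is the standard $\varepsilon$-differential privacy definition from the DP literature (not quoted from the thesis itself): a randomised mechanism $\mathcal{M}$ is $\varepsilon$-differentially private if, for all pairs of datasets $D, D'$ differing in one individual and all sets of outputs $S$,

$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S].$$

A smaller $\varepsilon$ forces more noise and a stronger guarantee, which is why the experiments sweep over $\varepsilon$-values.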
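
To illustrate the Bayesian style of attribute-disclosure estimation the abstract describes, the following is a minimal hypothetical sketch in Python, assuming a conjugate normal-normal model; the data, names, and modelling choices are illustrative and are not taken from the thesis or from Hornby & Hu's implementation:

    # Hypothetical sketch (not the thesis code): an adversary holds a
    # Gaussian prior over a target's sensitive continuous attribute and
    # updates it with values from a synthetic release.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Stand-in for a sensitive column of a synthetic dataset, e.g. one
    # produced by a DP generator such as PrivBayes.
    synthetic_values = rng.normal(loc=52.0, scale=8.0, size=500)

    # Adversary's prior belief about the target (auxiliary knowledge).
    prior_mean, prior_var = 45.0, 100.0

    # Conjugate normal-normal update: treat the synthetic values as noisy
    # observations of the target's attribute with known observation variance.
    obs_var = synthetic_values.var(ddof=1)
    n = synthetic_values.size
    post_var = 1.0 / (1.0 / prior_var + n / obs_var)
    post_mean = post_var * (prior_mean / prior_var + synthetic_values.sum() / obs_var)

    # Disclosure-risk proxy: posterior mass within +/- delta of the true
    # (secret) attribute value.
    true_value, delta = 50.0, 2.0
    posterior = stats.norm(loc=post_mean, scale=np.sqrt(post_var))
    risk = posterior.cdf(true_value + delta) - posterior.cdf(true_value - delta)
    print(f"posterior mean {post_mean:.2f}, risk within +/-{delta}: {risk:.3f}")

The more the posterior mass concentrates around the true value, the higher the disclosure risk; in the thesis's experiments, this risk remained low across all $\varepsilon$-values.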

Colophon: This page is part of the AAU Student Projects portal, which is run by Aalborg University. Here, you can find and download publicly available bachelor's theses and master's projects from across the university dating from 2008 onwards. Student projects from before 2008 are available in printed form at Aalborg University Library.

If you have any questions about AAU Student Projects or the research registration, dissemination and analysis at Aalborg University, please feel free to contact the VBN team. You can also find more information in the AAU Student Projects FAQs.