Inference-risk

Understanding Inference Risk

Inference risk is a privacy metric that measures how likely an attacker is to deduce sensitive information about individuals in the original dataset by using the synthetic dataset. A high inference risk means that the synthetic data, even though it does not contain the original records themselves, preserves patterns or correlations that let an attacker accurately infer the protected attribute.

Pseudo-code Implementation

Identify Target Records: Select records from the original dataset for the attacker to target.
Find Nearest Neighbors: For each target record, locate its nearest neighbor in the synthetic dataset based on non-sensitive attributes.
Infer Sensitive Information: Take the value of the sensitive attribute recorded for the nearest neighbor in the synthetic dataset and use it as the prediction for the target record.
Calculate Risk: Determine the proportion of correct inferences made by the attacker. This proportion represents the inference risk.
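
The four steps above can be sketched in plain Python. This is a minimal illustration, not the anonymeter implementation: the function names `distance` and `inference_risk`, the record-as-dictionary layout, and the simple mixed distance (absolute difference for numbers, 0/1 mismatch for everything else) are all assumptions made for this example.

```python
from typing import Dict, Sequence

Record = Dict[str, object]


def distance(a: Record, b: Record, aux_cols: Sequence[str]) -> float:
    """Hypothetical mixed distance over the non-sensitive columns."""
    total = 0.0
    for col in aux_cols:
        x, y = a[col], b[col]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)              # numeric column: absolute difference
        else:
            total += 0.0 if x == y else 1.0  # categorical column: mismatch penalty
    return total


def inference_risk(
    targets: Sequence[Record],
    synthetic: Sequence[Record],
    aux_cols: Sequence[str],
    secret: str,
) -> float:
    """Proportion of targets whose secret is guessed correctly."""
    if not targets or not synthetic:
        raise ValueError("targets and synthetic must both be non-empty")
    successes = 0
    for target in targets:
        # Step 2: nearest synthetic record, compared on aux_cols only.
        neighbor = min(synthetic, key=lambda row: distance(target, row, aux_cols))
        # Step 3: the neighbor's secret value is the attacker's guess.
        guess = neighbor[secret]
        # Step 4: count the guess as a success if it matches the true value.
        if guess == target[secret]:
            successes += 1
    return successes / len(targets)
```

Calling `inference_risk(targets, synthetic, ["age", "gender"], "income")` returns a value between 0.0 and 1.0. Note that the synthetic records must include the secret column, since that is where the guess is read from.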

Overview of the Process

Example with Mocked Results

Original Dataset

age	gender	income
25	Male	50K
30	Female	60K
28	Male	55K

Synthetic Dataset

age	gender
24	Male
31	Female
29	Male

Target Records

(25, Male)
(30, Female)

Nearest Neighbors in Synthetic Data

(25, Male) → (24, Male)
(30, Female) → (31, Female)

Inferred and Actual Income

age	gender	inferred_income	actual_income
25	Male	50K	50K
30	Female	60K	60K

Result Interpretation

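The attacker correctly inferred the income of both target individuals. For instance, the nearest neighbor of (25, Male) in the synthetic dataset is (24, Male). The synthetic table above lists only the attributes used for matching; each synthetic record also carries an income value, and the attacker takes the neighbor's value as the guess, on the assumption that similar individuals have similar incomes. Here that guess is 50K, which is correct. The inference risk is therefore 2/2 = 1.0 (100%), indicating a high privacy risk.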
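
The arithmetic behind that figure can be reproduced directly from the "Inferred and Actual Income" table. The snippet below is only a sketch of the counting step; the variable names are chosen for this example.

```python
# Rows copied from the "Inferred and Actual Income" table.
results = [
    {"age": 25, "gender": "Male", "inferred_income": "50K", "actual_income": "50K"},
    {"age": 30, "gender": "Female", "inferred_income": "60K", "actual_income": "60K"},
]

# A guess counts as a success when it equals the true value.
n_success = sum(row["inferred_income"] == row["actual_income"] for row in results)
risk = n_success / len(results)

print(f"{n_success}/{len(results)} = {risk}")  # prints "2/2 = 1.0"
```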

Deeper Walkthrough with Code Reference

The InferenceEvaluator class in the anonymeter library assesses inference risk. It is initialized with the original and synthetic datasets, aux_cols (the non-sensitive attributes used for inference), and secret (the sensitive attribute to be inferred). The evaluate() method performs the risk evaluation.

class InferenceEvaluator:
    # ... (rest of the code) ...

    def evaluate(self, n_jobs: int = -2) -> "InferenceEvaluator":
        # ... (rest of the code) ...

        self._n_success = _run_attack(
            target=self._ori,  # Original dataset
            syn=self._syn,  # Synthetic dataset
            n_attacks=self._n_attacks,  # Number of attack attempts
            aux_cols=self._aux_cols,  # Non-sensitive attributes used for inference
            secret=self._secret,  # Sensitive attribute to be inferred
            n_jobs=n_jobs,  # Number of parallel jobs
            naive=False,  # Flag for naive attack (random guessing)
            regression=self._regression,  # Flag for regression-type inference
        )

        # ... (rest of the code) ...

Inside the evaluate() method, the _run_attack function performs the core inference analysis. For each attacked record it finds the nearest neighbor in the synthetic dataset based on aux_cols, then uses that neighbor's value of the secret attribute as the guess for the target record. The number of successful guesses it returns is the basis for calculating the inference risk.
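
To make the roles of the naive and regression flags more concrete, here is a rough, self-contained sketch of what an attack function with those parameters could look like. It is not anonymeter's code: `run_attack_sketch`, the helper functions, the distance measure, the `seed` argument, and the 5% relative `tolerance` for numeric secrets are all assumptions made for illustration.

```python
import random
from typing import Dict, Sequence

Record = Dict[str, object]


def _distance(a: Record, b: Record, aux_cols: Sequence[str]) -> float:
    """Absolute difference for numeric columns, 0/1 mismatch otherwise."""
    total = 0.0
    for col in aux_cols:
        x, y = a[col], b[col]
        if isinstance(x, (int, float)) and isinstance(y, (int, float)):
            total += abs(x - y)
        else:
            total += 0.0 if x == y else 1.0
    return total


def _is_correct(guess, truth, regression: bool, tolerance: float) -> bool:
    """Exact match for categorical secrets, relative tolerance for numeric ones."""
    if regression:
        return abs(guess - truth) <= tolerance * abs(truth)
    return guess == truth


def run_attack_sketch(
    target: Sequence[Record],
    syn: Sequence[Record],
    n_attacks: int,
    aux_cols: Sequence[str],
    secret: str,
    naive: bool,
    regression: bool,
    tolerance: float = 0.05,  # assumed value, for illustration only
    seed: int = 0,
) -> int:
    """Return the number of successful guesses (hypothetical sketch)."""
    rng = random.Random(seed)
    syn_rows = list(syn)
    # Attack a random subset of the target records.
    attacked = rng.sample(list(target), min(n_attacks, len(target)))
    n_success = 0
    for record in attacked:
        if naive:
            # Naive attack: ignore aux_cols and guess from a random synthetic row.
            guess = rng.choice(syn_rows)[secret]
        else:
            # Informed attack: copy the secret of the nearest synthetic neighbor.
            neighbor = min(syn_rows, key=lambda row: _distance(record, row, aux_cols))
            guess = neighbor[secret]
        if _is_correct(guess, record[secret], regression, tolerance):
            n_success += 1
    return n_success
```

Running the sketch once with naive=False and once with naive=True gives two success counts; comparing them shows how much the attacker gains from the synthetic data beyond random guessing.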
