Example Attacks On Data

Introduction & Motivations

Organizations often share data to improve:

  • Fraud detection: Identifying suspicious patterns.
  • Customer service: Tailoring products or services to individual preferences.
  • Market insights: Analyzing spending habits to inform strategic decisions.
  • Collaborative research: Working with third parties or academic institutions.

However, bad actors—ranging from disgruntled insiders to opportunistic cybercriminals—can exploit data sharing workflows to execute attacks that compromise privacy and security.


Types of Bad Actors

Bad actors may come from inside the organization or outside of it. They may be motivated by financial gain, blackmail, competitive advantage, or mere curiosity.

Insider Threats

  1. Malicious Employees

    • Motivation: Personal gain, sabotage, or selling sensitive data.
    • Example: An employee with privileged access re-identifies transactions, then sells that data to a competitor or to identity thieves.
  2. Curious Employees

    • Motivation: Curiosity or lack of awareness.
    • Example: An employee snoops on transaction data (out of personal curiosity) and inadvertently leaks private information.

External Hackers

  1. Cybercriminals

    • Motivation: Identity theft, fraud, ransom, black market resale of data.
    • Example: A hacker breaches a dataset, re-identifies users, and sells PII on the dark web.
  2. Competitors

    • Motivation: Competitive intelligence, targeted marketing.
    • Example: A rival bank re-identifies certain high-value customers to offer them targeted deals.

Data Brokers

  1. Information Aggregators
    • Motivation: Create and sell detailed profiles to advertisers or third parties.
    • Example: A broker collects “anonymized” data from multiple sources, merges them to build comprehensive user dossiers, and resells the enriched info.

Re-identification (Linkage) Attacks

Re-identification or linkage attacks occur when seemingly anonymized data is cross-referenced with auxiliary information, making it possible to pinpoint individuals. Despite removing names or direct identifiers, persistent “quasi-identifiers”—such as dates, locations, or unique behavior patterns—can unravel anonymity.

What Is a Linkage Attack?

A linkage attack happens when an attacker leverages external information (auxiliary data) to connect anonymized data in one dataset to personally identifiable information in another. This can reveal an individual’s spending habits, financial products, or other private details—even if the organization has taken steps to obscure direct identifiers.


Simple Example of a Linkage Attack

Scenario

A high-profile individual (e.g., a local politician) is a person of interest. An attacker wants to identify the politician’s transactions in an anonymized dataset of credit card purchases.

Anonymized Credit Card Transactions:

| Transaction ID | Date       | Location     | Amount |
|----------------|------------|--------------|--------|
| 1              | 2023-09-23 | Bakery A     | $10    |
| 2              | 2023-09-23 | Restaurant B | $50    |
| 3              | 2023-09-24 | Grocery C    | $30    |
| 4              | 2023-09-25 | Clothing D   | $100   |

Auxiliary Data (from Public Social Media):

| Post ID | Date       | Mentioned Location | Mentioned Activity       |
|---------|------------|--------------------|--------------------------|
| A       | 2023-09-23 | Bakery A           | "Enjoyed a pastry today" |
| B       | 2023-09-23 | Restaurant B       | "Dinner at B's Bistro"   |

Linkage Attack Process

  1. Gather External Clues:

    • The attacker notices the politician’s social media posts about visiting Bakery A and Restaurant B on September 23.
  2. Cross-reference with Anonymized Data:

    • The attacker lines up the dates and locations from the anonymized dataset with public posts.
  3. Identify Transactions:

    • The transactions at Bakery A and Restaurant B on September 23 (Transaction IDs #1 and #2) are likely the politician’s.

| Transaction ID | Date       | Location     | Amount | Likely Identity         |
|----------------|------------|--------------|--------|-------------------------|
| 1              | 2023-09-23 | Bakery A     | $10    | Possibly the Politician |
| 2              | 2023-09-23 | Restaurant B | $50    | Possibly the Politician |
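The cross-referencing step above is, at its core, a join on the shared (date, location) quasi-identifiers. A minimal sketch, using the made-up transactions and posts from the tables above:

```python
# Hypothetical linkage attack: join anonymized transactions with public
# social-media posts on the shared (date, location) quasi-identifiers.
# All records are fabricated for illustration.

transactions = [
    {"id": 1, "date": "2023-09-23", "location": "Bakery A", "amount": 10},
    {"id": 2, "date": "2023-09-23", "location": "Restaurant B", "amount": 50},
    {"id": 3, "date": "2023-09-24", "location": "Grocery C", "amount": 30},
    {"id": 4, "date": "2023-09-25", "location": "Clothing D", "amount": 100},
]

posts = [
    {"post": "A", "date": "2023-09-23", "location": "Bakery A"},
    {"post": "B", "date": "2023-09-23", "location": "Restaurant B"},
]

# (date, location) pairs the politician publicly disclosed.
disclosed = {(p["date"], p["location"]) for p in posts}

# Any transaction matching a disclosed pair is likely the politician's.
linked = [t["id"] for t in transactions
          if (t["date"], t["location"]) in disclosed]
print(linked)  # [1, 2]
```

Note that the attack needs no direct identifiers at all: two innocuous columns shared across datasets are enough to single out the target's rows.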

Inference & Motivation

  • The attacker re-identifies the politician’s spending habits.
  • Potential uses: blackmail, sale to tabloids, or leverage by political adversaries.

Conclusion on Linkage Attacks

Even when data appears anonymized, publicly available information (e.g., social media posts) can unravel anonymity. This underscores the importance of robust de-identification techniques and continuous risk assessment—particularly around what external sources might be used to link datasets.


Membership Inference Attacks

A membership inference attack occurs when an adversary seeks to determine if a specific individual is included in a dataset (or used to train a machine learning model). In financial contexts, mere membership in a dataset (e.g., a list of loan applicants) can reveal highly sensitive information.

Scenario: Membership Inference in a Credit Risk Model

  • Context: A financial institution trains a machine learning model on historical loan applicants’ financial behavior.
  • Attacker’s Goal: Figure out if a high-profile individual—say, a well-known CEO—applied for a loan.

Attack Method

  1. External Information:

    • The attacker knows some of the CEO’s financial characteristics from public records or rumors.
  2. Querying the Model:

    • The attacker crafts synthetic inputs that closely match the CEO’s profile (credit score, income range, typical loan amount).
    • The attacker observes the model’s output probabilities.
  3. Analysis of Outputs:

    • If the model’s responses significantly shift for inputs that match the CEO’s profile, the attacker infers that the CEO was included in the training set.

Why This Matters

  • Breaching Privacy: Simply confirming that a person applied for a loan can reveal private financial needs or struggles.
  • Exploitable Insight: Competitors or malicious actors could use this information for blackmail, targeted marketing, or negotiation leverage.

Example Data Tables

Training Dataset (Simplified):

| Applicant ID | Credit Score | Income   | Loan Amount | Approved | Defaulted |
|--------------|--------------|----------|-------------|----------|-----------|
| 1            | 750          | $100,000 | $20,000     | Yes      | No        |
| 2            | 640          | $50,000  | $10,000     | Yes      | Yes       |
| 3            | 800          | $120,000 | $25,000     | Yes      | No        |
| 4            | 680          | $60,000  | $15,000     | No       | N/A       |

Attacker’s Synthetic Queries:

| Credit Score | Income   | Loan Amount | Model Output Probability |
|--------------|----------|-------------|--------------------------|
| 750          | $100,000 | $20,000     | 0.85                     |
| 755          | $105,000 | $20,000     | 0.83                     |
| 800          | $120,000 | $25,000     | 0.92                     |
| 790          | $115,000 | $22,000     | 0.90                     |
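The query-and-analyze loop above can be sketched as follows. The toy `credit_model`, the synthetic profiles, and the 0.8 decision threshold are all invented for illustration; a real attack would probe an actual API and calibrate the threshold empirically:

```python
# Toy membership-inference sketch: the attacker probes the model with
# profiles near the target's and flags membership when the model is
# suspiciously confident on them.

def credit_model(score, income, loan):
    # Stand-in for the black-box scoring API: an overfit model that is
    # more confident on inputs resembling its training records.
    # (The loan amount is ignored in this toy.)
    training = [(750, 100_000, 20_000), (800, 120_000, 25_000)]
    base = 0.6
    for s, i, _ in training:
        if abs(score - s) <= 10 and abs(income - i) <= 5_000:
            base = max(base, 0.9)
    return base

# Synthetic queries matching the target's profile, plus one control.
queries = [(750, 100_000, 20_000), (755, 105_000, 20_000), (600, 40_000, 5_000)]
confidences = [credit_model(*q) for q in queries]

THRESHOLD = 0.8  # attacker's assumed decision boundary
likely_member = max(confidences) > THRESHOLD
print(confidences, likely_member)  # [0.9, 0.9, 0.6] True
```

The control query far from the target's profile anchors the comparison: it is the *gap* between on-profile and off-profile confidence, not the absolute value, that signals membership.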

Conclusion on Membership Inference

Membership inference attacks highlight the need for privacy-preserving machine learning techniques (e.g., differential privacy, secure multi-party computation) to ensure that an individual's presence in, or absence from, a dataset remains protected.


Attribute Inference Attacks

An attribute inference attack aims to deduce sensitive attributes (e.g., income level, health status) from data or models—even if those attributes are not explicitly in the shared dataset.

Scenario: Inferring Income Level from Transaction Data

  • Context: A financial institution uses transaction data to detect fraud or offer personalized recommendations. Income level is not directly included in the dataset, but an insider or third party still attempts to infer it.

Attack Method

  1. Access to Transaction Data:

    • The attacker sees transaction amounts, merchant types, and purchase frequencies but no explicit income column.
  2. Building a Hypothesis:

    • The attacker presumes certain patterns (luxury retail, high monthly spending, frequent travel) correlate with higher income.
  3. Analyzing Patterns:

    • By focusing on average transaction sizes, frequency of high-end purchases, and travel/dining habits, the attacker can approximate an individual’s income bracket.
  4. Infer the Attribute:

    • A customer who makes weekly $500 purchases at luxury retailers might be labeled “High Income.”

Example Data Tables

Anonymized Transaction Data:

| Customer ID | Transaction Amount | Merchant Type        | Frequency of Transactions |
|-------------|--------------------|----------------------|---------------------------|
| 1           | $500               | Luxury Retailer      | Weekly                    |
| 2           | $50                | Grocery Store        | Daily                     |
| 3           | $2,000             | International Travel | Monthly                   |
| 4           | $150               | Restaurant           | Weekly                    |

Attacker’s Inference:

| Customer ID | Inferred Income Level |
|-------------|-----------------------|
| 1           | High                  |
| 2           | Low                   |
| 3           | High                  |
| 4           | Middle                |
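The attacker's hypothesis can be as crude as a handful of rules and still land on the inferences in the table above. A minimal sketch; the merchant categories, dollar thresholds, and income labels are invented assumptions, not a real model:

```python
# Rule-based attribute inference: approximate an income bracket from
# transaction patterns alone. Thresholds and labels are assumptions.

customers = [
    {"id": 1, "amount": 500, "merchant": "Luxury Retailer", "freq": "Weekly"},
    {"id": 2, "amount": 50, "merchant": "Grocery Store", "freq": "Daily"},
    {"id": 3, "amount": 2000, "merchant": "International Travel", "freq": "Monthly"},
    {"id": 4, "amount": 150, "merchant": "Restaurant", "freq": "Weekly"},
]

HIGH_SPEND_MERCHANTS = {"Luxury Retailer", "International Travel"}

def infer_income(c):
    # Heuristic: luxury/travel spending or very large amounts => "High";
    # small, frequent essentials => "Low"; otherwise "Middle".
    if c["merchant"] in HIGH_SPEND_MERCHANTS or c["amount"] >= 1000:
        return "High"
    if c["amount"] < 100:
        return "Low"
    return "Middle"

inferred = {c["id"]: infer_income(c) for c in customers}
print(inferred)  # {1: 'High', 2: 'Low', 3: 'High', 4: 'Middle'}
```

Even such blunt heuristics are hard to defend against, because the signal lives in the legitimate, useful columns rather than in anything an anonymizer would obviously strip.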

Conclusion on Attribute Inference

This attack demonstrates that indirect information (transaction patterns) can reveal highly sensitive attributes. Limiting granular access to transaction data, employing robust anonymization or noise injection, and monitoring for unusual access can help mitigate such risks.


Other Common Data Sharing Attacks

Model Extraction Attack

  • Mechanism: Repeatedly querying a machine learning model to replicate its functionality or approximate its parameters.
  • Impact: Theft of intellectual property and possible exposure of underlying training data.
  • Example: A competitor queries a credit-scoring API at scale, reconstructing a surrogate model.

Data Poisoning Attack

  • Mechanism: Injecting malicious or bogus entries into a dataset used for training or analysis.
  • Impact: Corrupted models or analytics, leading to erroneous decisions and potential sabotage.
  • Example: Fake transactions inserted into a fraud detection dataset, thereby reducing the system’s accuracy.
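A toy sketch of the fraud-detection example: the `fit_threshold` rule and the dollar amounts are fabricated, but they show how a few injected "legitimate" records widen what the system will tolerate:

```python
# Data-poisoning sketch: injected fake "legit" high-value transactions
# shift a naive threshold-based fraud rule, letting larger frauds pass.

def fit_threshold(records):
    # Naive rule: flag anything above the largest amount labelled legitimate.
    return max(amount for amount, label in records if label == "legit")

clean = [(20, "legit"), (80, "legit"), (120, "legit"), (5000, "fraud")]
poison = [(4000, "legit"), (4500, "legit")]  # bogus injected entries

print(fit_threshold(clean))           # 120  -> a $4,000 fraud gets flagged
print(fit_threshold(clean + poison))  # 4500 -> the same fraud slips through
```

Production models are less brittle than a single max, but the failure mode scales: any learner that trusts its training labels can be steered by whoever controls a slice of the data.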

Data Correlation or Linking Attack

  • Mechanism: Matching quasi-identifiers (e.g., hashed emails, partial phone numbers) across multiple datasets to build a more complete picture of an individual.
  • Impact: Amplified privacy breaches, as partial data from multiple sources merges to produce a complete dossier.
  • Example: Correlating location data from a banking app with an online retailer’s purchase logs.
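A sketch of the matching step: two "anonymized" datasets keyed by the same hashed email can be merged trivially, because identical inputs hash identically. The email address and record fields are invented for illustration:

```python
# Correlation attack: two datasets pseudonymized with the same hash of
# the email address join back together on that shared quasi-identifier.

import hashlib

def pseudonym(email):
    # Deterministic hashing: the same email always yields the same key.
    return hashlib.sha256(email.lower().encode()).hexdigest()

# Dataset A: banking-app location data, keyed by hashed email.
bank = {pseudonym("alice@example.com"): {"city": "Springfield"}}

# Dataset B: a retailer's purchase logs, keyed the same way.
retail = {pseudonym("alice@example.com"): {"last_purchase": "laptop"}}

# The attacker merges records that share a pseudonym into one dossier.
dossier = {k: {**bank[k], **retail[k]} for k in bank.keys() & retail.keys()}
print(dossier)
```

This is why unsalted hashes are not anonymization: they hide the value but preserve exactly the linkability an aggregator needs.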

Summary & Mitigation

Data sharing is essential for leveraging insights, but each time data leaves its vault—whether for internal use, third-party collaboration, or public release—it may be vulnerable to:

  1. Re-identification / Linkage Attacks
  2. Membership Inference
  3. Attribute Inference
  4. Model Extraction
  5. Data Poisoning
  6. Data Correlation / Linking

Bottom Line
Addressing data sharing attacks requires a multi-layered defense strategy, combining technological solutions (encryption, anonymization) with organizational and legal measures. By understanding the anatomy of these attacks and proactively mitigating them, organizations can harness data’s potential without compromising customer trust and privacy.