Example Attacks On Data

Introduction & Motivations

Organizations often share data to improve:

  • Fraud detection: Identifying suspicious patterns.
  • Customer service: Tailoring products or services to individual preferences.
  • Market insights: Analyzing spending habits to inform strategic decisions.
  • Collaborative research: Working with third parties or academic institutions.

However, bad actors—ranging from disgruntled insiders to opportunistic cybercriminals—can exploit data sharing workflows to execute attacks that compromise privacy and security.


Types of Bad Actors

Bad actors may come from inside the organization or outside of it. They may be motivated by financial gain, blackmail, competitive advantage, or mere curiosity.

Insider Threats

  1. Malicious Employees

    • Motivation: Personal gain, sabotage, or selling sensitive data.
    • Example: An employee with privileged access re-identifies transactions, then sells that data to a competitor or to identity thieves.
  2. Curious Employees

    • Motivation: Curiosity or lack of awareness.
    • Example: An employee snoops on transaction data (out of personal curiosity) and inadvertently leaks private information.

External Hackers

  1. Cybercriminals

    • Motivation: Identity theft, fraud, ransom, black market resale of data.
    • Example: A hacker breaches a dataset, re-identifies users, and sells PII on the dark web.
  2. Competitors

    • Motivation: Competitive intelligence, targeted marketing.
    • Example: A rival bank re-identifies certain high-value customers to offer them targeted deals.

Data Brokers

  1. Information Aggregators
    • Motivation: Create and sell detailed profiles to advertisers or third parties.
    • Example: A broker collects “anonymized” data from multiple sources, merges them to build comprehensive user dossiers, and resells the enriched info.

Re-identification (Linkage) Attacks

Re-identification or linkage attacks occur when seemingly anonymized data is cross-referenced with auxiliary information, making it possible to pinpoint individuals. Despite removing names or direct identifiers, persistent “quasi-identifiers”—such as dates, locations, or unique behavior patterns—can unravel anonymity.

What Is a Linkage Attack?

A linkage attack happens when an attacker leverages external information (auxiliary data) to connect anonymized data in one dataset to personally identifiable information in another. This can reveal an individual’s spending habits, financial products, or other private details—even if the organization has taken steps to obscure direct identifiers.


Simple Example of a Linkage Attack

Scenario

A high-profile individual (e.g., a local politician) is a person of interest. An attacker wants to identify the politician’s transactions in an anonymized dataset of credit card purchases.

Anonymized Credit Card Transactions:

| Transaction ID | Date       | Location     | Amount |
|----------------|------------|--------------|--------|
| 1              | 2023-09-23 | Bakery A     | $10    |
| 2              | 2023-09-23 | Restaurant B | $50    |
| 3              | 2023-09-24 | Grocery C    | $30    |
| 4              | 2023-09-25 | Clothing D   | $100   |

Auxiliary Data (from Public Social Media):

| Post ID | Date       | Mentioned Location | Mentioned Activity       |
|---------|------------|--------------------|--------------------------|
| A       | 2023-09-23 | Bakery A           | "Enjoyed a pastry today" |
| B       | 2023-09-23 | Restaurant B       | "Dinner at B's Bistro"   |

Linkage Attack Process

  1. Gather External Clues:

    • The attacker notices the politician’s social media posts about visiting Bakery A and Restaurant B on September 23.
  2. Cross-reference with Anonymized Data:

    • The attacker lines up the dates and locations from the anonymized dataset with public posts.
  3. Identify Transactions:

    • The transactions at Bakery A and Restaurant B on September 23 (Transaction IDs #1 and #2) are likely the politician’s.

| Transaction ID | Date       | Location     | Amount | Likely Identity         |
|----------------|------------|--------------|--------|-------------------------|
| 1              | 2023-09-23 | Bakery A     | $10    | Possibly the Politician |
| 2              | 2023-09-23 | Restaurant B | $50    | Possibly the Politician |
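The cross-referencing step above is, at its core, a join on the shared (date, location) quasi-identifiers. A minimal sketch, using the made-up transactions and posts from the tables above:

```python
# Hypothetical linkage attack: join anonymized transactions with public
# social-media posts on the shared (date, location) quasi-identifiers.
# All records are fabricated for illustration.

transactions = [
    {"id": 1, "date": "2023-09-23", "location": "Bakery A", "amount": 10},
    {"id": 2, "date": "2023-09-23", "location": "Restaurant B", "amount": 50},
    {"id": 3, "date": "2023-09-24", "location": "Grocery C", "amount": 30},
    {"id": 4, "date": "2023-09-25", "location": "Clothing D", "amount": 100},
]

posts = [
    {"post": "A", "date": "2023-09-23", "location": "Bakery A"},
    {"post": "B", "date": "2023-09-23", "location": "Restaurant B"},
]

# (date, location) pairs the politician publicly disclosed.
disclosed = {(p["date"], p["location"]) for p in posts}

# Any transaction matching a disclosed pair is likely the politician's.
linked = [t["id"] for t in transactions
          if (t["date"], t["location"]) in disclosed]
print(linked)  # [1, 2]
```

Note that the attack needs no direct identifiers at all: two innocuous columns shared across datasets are enough to single out the target's rows.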

Inference & Motivation

  • The attacker re-identifies the politician’s spending habits.
  • Potential uses: blackmail, sale to tabloids, or leverage by political adversaries.

Conclusion on Linkage Attacks

Even when data appears anonymized, publicly available information (e.g., social media posts) can unravel anonymity. This underscores the importance of robust de-identification techniques and continuous risk assessment—particularly around what external sources might be used to link datasets.


Membership Inference Attacks

A membership inference attack occurs when an adversary seeks to determine if a specific individual is included in a dataset (or used to train a machine learning model). In financial contexts, mere membership in a dataset (e.g., a list of loan applicants) can reveal highly sensitive information.

Scenario: Membership Inference in a Credit Risk Model

  • Context: A financial institution trains a machine learning model on historical loan applicants’ financial behavior.
  • Attacker’s Goal: Figure out if a high-profile individual—say, a well-known CEO—applied for a loan.

Attack Method

  1. External Information:

    • The attacker knows some of the CEO’s financial characteristics from public records or rumors.
  2. Querying the Model:

    • The attacker crafts synthetic inputs that closely match the CEO’s profile (credit score, income range, typical loan amount).
    • The attacker observes the model’s output probabilities.
  3. Analysis of Outputs:

    • If the model’s responses significantly shift for inputs that match the CEO’s profile, the attacker infers that the CEO was included in the training set.

Why This Matters

  • Breaching Privacy: Simply confirming that a person applied for a loan can reveal private financial needs or struggles.
  • Exploitable Insight: Competitors or malicious actors could use this information for blackmail, targeted marketing, or negotiation leverage.

Example Data Tables

Training Dataset (Simplified):

| Applicant ID | Credit Score | Income   | Loan Amount | Approved | Defaulted |
|--------------|--------------|----------|-------------|----------|-----------|
| 1            | 750          | $100,000 | $20,000     | Yes      | No        |
| 2            | 640          | $50,000  | $10,000     | Yes      | Yes       |
| 3            | 800          | $120,000 | $25,000     | Yes      | No        |
| 4            | 680          | $60,000  | $15,000     | No       | N/A       |

Attacker’s Synthetic Queries:

| Credit Score | Income   | Loan Amount | Model Output Probability |
|--------------|----------|-------------|--------------------------|
| 750          | $100,000 | $20,000     | 0.85                     |
| 755          | $105,000 | $20,000     | 0.83                     |
| 800          | $120,000 | $25,000     | 0.92                     |
| 790          | $115,000 | $22,000     | 0.90                     |
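The query-and-analyze loop above can be sketched as follows. The toy `credit_model`, the synthetic profiles, and the 0.8 decision threshold are all invented for illustration; a real attack would probe an actual API and calibrate the threshold empirically:

```python
# Toy membership-inference sketch: the attacker probes the model with
# profiles near the target's and flags membership when the model is
# suspiciously confident on them.

def credit_model(score, income, loan):
    # Stand-in for the black-box scoring API: an overfit model that is
    # more confident on inputs resembling its training records.
    # (The loan amount is ignored in this toy.)
    training = [(750, 100_000, 20_000), (800, 120_000, 25_000)]
    base = 0.6
    for s, i, _ in training:
        if abs(score - s) <= 10 and abs(income - i) <= 5_000:
            base = max(base, 0.9)
    return base

# Synthetic queries matching the target's profile, plus one control.
queries = [(750, 100_000, 20_000), (755, 105_000, 20_000), (600, 40_000, 5_000)]
confidences = [credit_model(*q) for q in queries]

THRESHOLD = 0.8  # attacker's assumed decision boundary
likely_member = max(confidences) > THRESHOLD
print(confidences, likely_member)  # [0.9, 0.9, 0.6] True
```

The control query far from the target's profile anchors the comparison: it is the *gap* between on-profile and off-profile confidence, not the absolute value, that signals membership.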

Conclusion on Membership Inference

Membership inference attacks highlight the need for privacy-preserving machine learning techniques (e.g., differential privacy, secure multi-party computation) to ensure that an individual's presence in, or absence from, a dataset remains protected.


Attribute Inference Attacks

An attribute inference attack aims to deduce sensitive attributes (e.g., income level, health status) from data or models—even if those attributes are not explicitly in the shared dataset.

Scenario: Inferring Income Level from Transaction Data

  • Context: A financial institution uses transaction data to detect fraud or offer personalized recommendations. Income level is not directly included in the dataset, but an insider or third party still attempts to infer it.

Attack Method

  1. Access to Transaction Data:

    • The attacker sees transaction amounts, merchant types, and purchase frequencies but no explicit income column.
  2. Building a Hypothesis:

    • The attacker presumes certain patterns (luxury retail, high monthly spending, frequent travel) correlate with higher income.
  3. Analyzing Patterns:

    • By focusing on average transaction sizes, frequency of high-end purchases, and travel/dining habits, the attacker can approximate an individual’s income bracket.
  4. Infer the Attribute:

    • A customer who makes weekly $500 purchases at luxury retailers might be labeled “High Income.”

Example Data Tables

Anonymized Transaction Data:

| Customer ID | Transaction Amount | Merchant Type        | Frequency of Transactions |
|-------------|--------------------|----------------------|---------------------------|
| 1           | $500               | Luxury Retailer      | Weekly                    |
| 2           | $50                | Grocery Store        | Daily                     |
| 3           | $2,000             | International Travel | Monthly                   |
| 4           | $150               | Restaurant           | Weekly                    |

Attacker’s Inference:

| Customer ID | Inferred Income Level |
|-------------|-----------------------|
| 1           | High                  |
| 2           | Low                   |
| 3           | High                  |
| 4           | Middle                |
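The attacker's hypothesis can be as crude as a handful of rules and still land on the inferences in the table above. A minimal sketch; the merchant categories, dollar thresholds, and income labels are invented assumptions, not a real model:

```python
# Rule-based attribute inference: approximate an income bracket from
# transaction patterns alone. Thresholds and labels are assumptions.

customers = [
    {"id": 1, "amount": 500, "merchant": "Luxury Retailer", "freq": "Weekly"},
    {"id": 2, "amount": 50, "merchant": "Grocery Store", "freq": "Daily"},
    {"id": 3, "amount": 2000, "merchant": "International Travel", "freq": "Monthly"},
    {"id": 4, "amount": 150, "merchant": "Restaurant", "freq": "Weekly"},
]

HIGH_SPEND_MERCHANTS = {"Luxury Retailer", "International Travel"}

def infer_income(c):
    # Heuristic: luxury/travel spending or very large amounts => "High";
    # small, frequent essentials => "Low"; otherwise "Middle".
    if c["merchant"] in HIGH_SPEND_MERCHANTS or c["amount"] >= 1000:
        return "High"
    if c["amount"] < 100:
        return "Low"
    return "Middle"

inferred = {c["id"]: infer_income(c) for c in customers}
print(inferred)  # {1: 'High', 2: 'Low', 3: 'High', 4: 'Middle'}
```

Even such blunt heuristics are hard to defend against, because the signal lives in the legitimate, useful columns rather than in anything an anonymizer would obviously strip.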

Conclusion on Attribute Inference

This attack demonstrates that indirect information (transaction patterns) can reveal highly sensitive attributes. Limiting granular access to transaction data, employing robust anonymization or noise injection, and monitoring for unusual access can help mitigate such risks.


Other Common Data Sharing Attacks

Model Extraction Attack

  • Mechanism: Repeatedly querying a machine learning model to replicate its functionality or approximate its parameters.
  • Impact: Theft of intellectual property and possible exposure of underlying training data.
  • Example: A competitor queries a credit-scoring API at scale, reconstructing a surrogate model.

Data Poisoning Attack

  • Mechanism: Injecting malicious or bogus entries into a dataset used for training or analysis.
  • Impact: Corrupted models or analytics, leading to erroneous decisions and potential sabotage.
  • Example: Fake transactions inserted into a fraud detection dataset, thereby reducing the system’s accuracy.
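A toy sketch of the fraud-detection example: the `fit_threshold` rule and the dollar amounts are fabricated, but they show how a few injected "legitimate" records widen what the system will tolerate:

```python
# Data-poisoning sketch: injected fake "legit" high-value transactions
# shift a naive threshold-based fraud rule, letting larger frauds pass.

def fit_threshold(records):
    # Naive rule: flag anything above the largest amount labelled legitimate.
    return max(amount for amount, label in records if label == "legit")

clean = [(20, "legit"), (80, "legit"), (120, "legit"), (5000, "fraud")]
poison = [(4000, "legit"), (4500, "legit")]  # bogus injected entries

print(fit_threshold(clean))           # 120  -> a $4,000 fraud gets flagged
print(fit_threshold(clean + poison))  # 4500 -> the same fraud slips through
```

Production models are less brittle than a single max, but the failure mode scales: any learner that trusts its training labels can be steered by whoever controls a slice of the data.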

Data Correlation or Linking Attack

  • Mechanism: Matching quasi-identifiers (e.g., hashed emails, partial phone numbers) across multiple datasets to build a more complete picture of an individual.
  • Impact: Amplified privacy breaches, as partial data from multiple sources merges to produce a complete dossier.
  • Example: Correlating location data from a banking app with an online retailer’s purchase logs.
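A sketch of the matching step: two "anonymized" datasets keyed by the same hashed email can be merged trivially, because identical inputs hash identically. The email address and record fields are invented for illustration:

```python
# Correlation attack: two datasets pseudonymized with the same hash of
# the email address join back together on that shared quasi-identifier.

import hashlib

def pseudonym(email):
    # Deterministic hashing: the same email always yields the same key.
    return hashlib.sha256(email.lower().encode()).hexdigest()

# Dataset A: banking-app location data, keyed by hashed email.
bank = {pseudonym("alice@example.com"): {"city": "Springfield"}}

# Dataset B: a retailer's purchase logs, keyed the same way.
retail = {pseudonym("alice@example.com"): {"last_purchase": "laptop"}}

# The attacker merges records that share a pseudonym into one dossier.
dossier = {k: {**bank[k], **retail[k]} for k in bank.keys() & retail.keys()}
print(dossier)
```

This is why unsalted hashes are not anonymization: they hide the value but preserve exactly the linkability an aggregator needs.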

Summary & Mitigation

Data sharing is essential for leveraging insights, but each time data leaves its vault—whether for internal use, third-party collaboration, or public release—it may be vulnerable to:

  1. Re-identification / Linkage Attacks
  2. Membership Inference
  3. Attribute Inference
  4. Model Extraction
  5. Data Poisoning
  6. Data Correlation / Linking

Bottom Line
Addressing data sharing attacks requires a multi-layered defense strategy, combining technological solutions (encryption, anonymization) with organizational and legal measures. By understanding the anatomy of these attacks and proactively mitigating them, organizations can harness data’s potential without compromising customer trust and privacy.