Can Technology Save Privacy?
Latanya Sweeney
dataprivacylab.org
[email protected]
George Gasparis
Rachel Lally
Disclaimer
The views and opinions in this presentation represent my own and are not necessarily those of HHS or the Obama Administration. These views are for the benefit of public discourse and public education, and are not necessarily an opinion regarding any position I may take on related issues decided by the HIT Policy Committee.
Big Data
Need Data
Privacy and Utility
Old and New
Big Data
Clayton, P., et al. For The Record. National Academy Press, 1997.
L. Sweeney, 2011.
Big Data: $$$
Big Data: Pain
Need Data
Government Collects Data!
1995 to 1999
Problem: Expensive healthcare. Solution: Hospital discharge data.
Problem: Deadbeat dads. Solution: Directory of new hires.
Problem: Expensive at-risk children. Solution: Electronic birth certificate.
Problem: Measles epidemic. Solution: Childhood immunization registry.
L. Sweeney. Information Explosion. In: Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, L. Zayatz, P. Doyle, J. Theeuwes and J. Lane (eds), Urban Institute, Washington, DC, 2001.
Hospital Discharge Data, fields 1-12
# Field description Size
1 HOSPITAL ID NUMBER 12
2 PATIENT DATE OF BIRTH (MMDDYYYY) 8
3 SEX 1
4 ADMIT DATE (MMDDYYYY) 8
5 DISCHARGE DATE (MMDDYYYY) 8
6 ADMIT SOURCE 1
7 ADMIT TYPE 1
8 LENGTH OF STAY (DAYS) 4
9 PATIENT STATUS 2
10 PRINCIPAL DIAGNOSIS CODE 6
11 SECONDARY DIAGNOSIS CODE - 1 6
12 SECONDARY DIAGNOSIS CODE - 2 6
Copyright © 2002 Sweeney
Hospital Discharge Data, fields 13-25
# Field description Size
13 SECONDARY DIAGNOSIS CODE - 3 6
14 SECONDARY DIAGNOSIS CODE - 4 6
15 SECONDARY DIAGNOSIS CODE - 5 6
16 SECONDARY DIAGNOSIS CODE - 6 6
17 SECONDARY DIAGNOSIS CODE - 7 6
18 SECONDARY DIAGNOSIS CODE - 8 6
19 PRINCIPAL PROCEDURE CODE 7
20 SECONDARY PROCEDURE CODE - 1 7
21 SECONDARY PROCEDURE CODE - 2 7
22 SECONDARY PROCEDURE CODE - 3 7
23 SECONDARY PROCEDURE CODE - 4 7
24 SECONDARY PROCEDURE CODE - 5 7
25 DRG CODE 3
Hospital Discharge Data, fields 26-37
# Field description Size
26 MDC CODE 2
27 TOTAL CHARGES 9
28 ROOM AND BOARD CHARGES 9
29 ANCILLARY CHARGES 9
30 ANESTHESIOLOGY CHARGES 9
31 PHARMACY CHARGES 9
32 RADIOLOGY CHARGES 9
33 CLINICAL LAB CHARGES 9
34 LABOR-DELIVERY CHARGES 9
35 OPERATING ROOM CHARGES 9
36 ONCOLOGY CHARGES 9
37 OTHER CHARGES 9
Hospital Discharge Data, fields 38-50
# Field description Size
38 NEWBORN INDICATOR 1
39 PAYER ID 1 9
40 TYPE CODE 1 1
41 PAYER ID 2 9
42 TYPE CODE 2 1
43 PAYER ID 3 9
44 TYPE CODE 3 1
45 PATIENT ZIP CODE 5
46 Patient Origin COUNTY 3
47 Patient Origin PLANNING AREA 3
48 Patient Origin HSA 2
49 PATIENT CONTROL NUMBER
50 HOSPITAL HSA 2
Can we understand our customers?
Loyalty card programs
Privacy and Utility
Privacy!
Utility!
Traditional Belief System!
Technical Solutions!
Parties: Subject, Provider, Recipient
“No privacy, get over it”: the Provider passes the data to the Recipient unchanged.
Ann 10/2/61 02139 cardiac
Abe 7/14/61 02139 cancer
Al 3/8/61 02138 liver
“Can’t release data”: the Provider releases nothing.
“Share data with privacy”: the Provider releases a generalized version.
A* 1961 0213* cardiac
A* 1961 0213* cancer
A* 1961 0213* liver
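The “Share data with privacy” rows can be produced by generalizing each field. The following is a minimal Python sketch under stated assumptions: the generalization rules (keep the first initial, keep only the birth year, truncate ZIP to four digits) are hypothetical choices that reproduce the rows on this slide, and two-digit years are assumed to be in the 1900s. It also reports k, the size of the smallest group of records sharing the same generalized values.

```python
from collections import Counter

# Example records from the slide: (name, birth date as M/D/YY, ZIP, diagnosis)
records = [
    ("Ann", "10/2/61", "02139", "cardiac"),
    ("Abe", "7/14/61", "02139", "cancer"),
    ("Al", "3/8/61", "02138", "liver"),
]


def generalize(name, dob, zip_code, dx):
    """Hypothetical rules chosen to reproduce the slide's generalized rows."""
    initial = name[0] + "*"                    # Ann -> A*
    year = str(1900 + int(dob.split("/")[2]))  # 10/2/61 -> 1961 (assumes 1900s)
    zip_prefix = zip_code[:4] + "*"            # 02139 -> 0213*
    return (initial, year, zip_prefix, dx)     # diagnosis kept as is


released = [generalize(*r) for r in records]
for row in released:
    print(*row)  # e.g. A* 1961 0213* cardiac

# k = size of the smallest group sharing the same (initial, year, ZIP prefix).
groups = Counter(row[:3] for row in released)
k = min(groups.values())
print("k =", k)  # k = 3
```

All three released rows fall into one group, so each of them is indistinguishable from the other two on the generalized fields.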
Old and New
Old way: De-identification
Dataset: Visit date, Diagnoses, Procedures
Shared by both: ZIP, Birth date, Sex
Voter List: Name, Address, Date registered, Party affiliation, Date last voted
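The linking attack in this diagram amounts to a join on the fields both sources share. A minimal sketch, assuming hypothetical toy records and field names (`zip`, `birth`, `sex`); a dataset row counts as re-identified only when exactly one voter matches it.

```python
from collections import defaultdict

# Hypothetical toy records. The de-identified dataset has no names,
# but it keeps ZIP, birth date and sex.
dataset = [
    {"zip": "02139", "birth": "1961-10-02", "sex": "F", "diagnoses": "cardiac"},
    {"zip": "02138", "birth": "1961-03-08", "sex": "M", "diagnoses": "liver"},
]
voter_list = [
    {"name": "Ann", "zip": "02139", "birth": "1961-10-02", "sex": "F"},
    {"name": "Al", "zip": "02138", "birth": "1961-03-08", "sex": "M"},
    {"name": "Bob", "zip": "02138", "birth": "1961-03-08", "sex": "M"},
]

QUASI = ("zip", "birth", "sex")  # fields present in both sources


def link(dataset, voter_list):
    """Re-identify dataset rows whose shared fields match exactly one voter."""
    index = defaultdict(list)
    for voter in voter_list:
        index[tuple(voter[f] for f in QUASI)].append(voter["name"])
    matches = {}
    for row in dataset:
        names = index.get(tuple(row[f] for f in QUASI), [])
        if len(names) == 1:  # unique match: the row is re-identified
            matches[names[0]] = row["diagnoses"]
    return matches


print(link(dataset, voter_list))  # {'Ann': 'cardiac'}
```

The second row stays ambiguous only because two hypothetical voters share its ZIP, birth date and sex.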
Sweeney. Simple Demographics Often Identify People Uniquely. 2000. dataprivacylab.org/projects/identifiability/index.html
Percentage of people uniquely identified by Gender plus:
                County     Town/Place   ZIP (5-digit)
Date of Birth   18.1%      58.4%        87%
Mon/Yr Birth    0.04%      3.6%         3.7%
Year of Birth   0.00004%   0.04%        0.04%
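The table reports how often a combination of fields is unique. A minimal sketch of that measurement on a hypothetical four-person sample, assuming dates stored as ISO strings; coarsening the birth date lowers the unique fraction, in the same direction as the table's rows.

```python
from collections import Counter


def percent_unique(rows, fields):
    """Percentage of rows whose values on `fields` occur exactly once."""
    counts = Counter(tuple(r[f] for f in fields) for r in rows)
    unique = sum(1 for r in rows if counts[tuple(r[f] for f in fields)] == 1)
    return 100.0 * unique / len(rows)


# Hypothetical population sample
people = [
    {"zip": "02139", "dob": "1961-10-02", "sex": "F"},
    {"zip": "02139", "dob": "1961-07-14", "sex": "M"},
    {"zip": "02138", "dob": "1961-03-08", "sex": "M"},
    {"zip": "02138", "dob": "1961-03-20", "sex": "M"},
]
for r in people:
    r["mon_yr"] = r["dob"][:7]  # coarsen date of birth to year-month
    r["year"] = r["dob"][:4]    # coarsen further to year only

print(percent_unique(people, ["zip", "dob", "sex"]))     # 100.0
print(percent_unique(people, ["zip", "mon_yr", "sex"]))  # 50.0
print(percent_unique(people, ["zip", "year", "sex"]))    # 50.0
```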
Ad Hoc Schemes that Don’t Work
Mouth Only
  Grayscale
  Black & White
Single Bar Mask
T-Mask
Ordinal
Threshold
Ad Hoc Schemes that Don’t Work
Random
  Grayscale
  Black & White
Negative
  Grayscale
  Black & White
Pixelation
Mr. Potato Head
Pixelation: recognition improved!
De-identification: k-Anonymity
k-Same-Pixel and k-Same-Eigen, shown for k = 2, 3, 5, 10, 50, 100
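The k-Same idea behind these images is that every released face is an average of at least k real faces, so no released face can be matched to fewer than k people. A simplified sketch, assuming faces are equal-length lists of pixel intensities; the clustering and the leftover-handling rule here are simplifications, not the published k-Same-Pixel algorithm.

```python
import math


def k_same_pixel(faces, k):
    """Simplified k-Same-Pixel: replace each face by the pixel-wise average of a
    cluster of at least k faces, so each output face stands for >= k originals."""
    if len(faces) < k:
        raise ValueError("need at least k faces")
    n_pixels = len(faces[0])
    remaining = list(range(len(faces)))
    out = [None] * len(faces)
    while remaining:
        if len(remaining) < 2 * k:
            # Simplification: fold the last k to 2k-1 faces into one cluster.
            cluster = remaining
        else:
            seed = remaining[0]
            # Seed plus its k-1 nearest remaining faces (Euclidean pixel distance).
            cluster = sorted(remaining, key=lambda j: math.dist(faces[seed], faces[j]))[:k]
        average = [sum(faces[j][p] for j in cluster) / len(cluster) for p in range(n_pixels)]
        for j in cluster:
            out[j] = average
        remaining = [j for j in remaining if j not in cluster]
    return out


# Four tiny hypothetical "images" of two pixels each.
faces = [[0, 0], [0, 1], [10, 10], [10, 11]]
print(k_same_pixel(faces, k=2))  # [[0.0, 0.5], [0.0, 0.5], [10.0, 10.5], [10.0, 10.5]]
```

Larger k blends more faces into each output, which is why recognizability drops as k grows from 2 to 100.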
De-identification: Specific Use
Image, Fitted Mesh, Overlaid, Reconstruction
De-identification: Risk Assessment (Privacert.com)
Description
* Date of visit (month, day and year)
Transaction#
Unique patient identifier
* Patient 5-digit ZIP code
* Month, day and Year of Birth
* Gender
Unique Provider ID
Provider 5-digit ZIP code
* ICD9 diagnosis codes 1 to 6
Generalizations shown: Year, 3-digit ZIP, Syndromic groups
Description: Name; Date of visit (month, day and year); Patient 5-digit ZIP code; ICD9 diagnosis codes 1 to 6; Month, day and Year of Birth; Gender
Columns: Date, ZIP, Dx1 to Dx6, DOB, Sex (generalized: Date to Year, ZIP to 3-digit ZIP, DOB to Year)
[Chart: Cumulative Percentage of Patients (0% to 100%) vs. Binsize (1 to 26); series: Unaltered, Safe]
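Our reading of the chart is that a patient's bin is the set of patients sharing the same values on the chosen fields, and each point gives the cumulative percentage of patients whose bin is no larger than that size. A minimal sketch of that computation, assuming hypothetical patient records and field names:

```python
from collections import Counter


def cumulative_bin_percentages(rows, fields, max_bin=26):
    """For each bin size b in 1..max_bin: percentage of patients whose values on
    `fields` are shared by at most b patients (bin = identical value combination)."""
    keys = [tuple(r[f] for f in fields) for r in rows]
    counts = Counter(keys)
    sizes = [counts[key] for key in keys]
    return {b: 100.0 * sum(s <= b for s in sizes) / len(rows) for b in range(1, max_bin + 1)}


# Hypothetical patients
patients = [
    {"zip": "02139", "dob": "1961-10-02", "sex": "F"},
    {"zip": "02139", "dob": "1961-07-14", "sex": "M"},
    {"zip": "02138", "dob": "1961-03-08", "sex": "M"},
    {"zip": "02138", "dob": "1961-03-08", "sex": "M"},
]
print(cumulative_bin_percentages(patients, ["zip", "dob", "sex"], max_bin=3))
# {1: 50.0, 2: 100.0, 3: 100.0}
```

Lower values at small bin sizes mean fewer patients sit in small, easily re-identified bins.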
Description: Date of visit (month, day and year); Patient 5-digit ZIP code; ICD9 diagnosis codes 1 to 6; Month, day and Year of Birth; Gender; Name
Columns: Date, ZIP, Dx1 to Dx6, DOB, Sex
DOB to month and year of birth?
[Chart: Cumulative Percentage of Patients (0% to 100%) vs. Binsize (1 to 26); series: MonDayYr, MonYr, Safe]
Description: Date of visit (month, day and year); Patient 5-digit ZIP code; ICD9 diagnosis codes 1 to 6; Month, day and Year of Birth; Gender; Name
Columns: Date, ZIP, Dx1 to Dx6, DOB, Sex
Generalize more?
[Chart: Cumulative Percentage of Patients (0% to 100%) vs. Binsize (1 to 26); series: MonDayYr, MonYr, Year, Age, Safe. Annotation: No change!]
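The question “Generalize more?” can be tested by recomputing bin sizes at each level of a date-of-birth hierarchy. A sketch, assuming a hypothetical hierarchy (MonDayYr, MonYr, Year, Age), a hypothetical reference date `REFERENCE_DATE` for computing age, and k = 2. In this toy data, moving from Year to Age leaves the small-bin percentage unchanged because the one patient who is unique stays unique.

```python
from collections import Counter
from datetime import date

REFERENCE_DATE = date(2011, 1, 1)  # assumption: date used to compute ages


def generalize_dob(dob, level):
    """Hypothetical DOB hierarchy: MonDayYr -> MonYr -> Year, plus Age in years."""
    if level == "MonDayYr":
        return dob.isoformat()
    if level == "MonYr":
        return f"{dob.year}-{dob.month:02d}"
    if level == "Year":
        return str(dob.year)
    if level == "Age":
        before_birthday = (REFERENCE_DATE.month, REFERENCE_DATE.day) < (dob.month, dob.day)
        return str(REFERENCE_DATE.year - dob.year - before_birthday)
    raise ValueError(f"unknown level: {level}")


def percent_in_small_bins(rows, level, k=2):
    """Percentage of patients whose (ZIP, generalized DOB, sex) bin has < k patients."""
    keys = [(r["zip"], generalize_dob(r["dob"], level), r["sex"]) for r in rows]
    counts = Counter(keys)
    return 100.0 * sum(counts[key] < k for key in keys) / len(rows)


# Hypothetical patients
rows = [
    {"zip": "02139", "dob": date(1961, 10, 2), "sex": "F"},
    {"zip": "02139", "dob": date(1961, 10, 20), "sex": "F"},
    {"zip": "02139", "dob": date(1961, 3, 5), "sex": "F"},
    {"zip": "02138", "dob": date(1975, 6, 1), "sex": "M"},
]
for level in ("MonDayYr", "MonYr", "Year", "Age"):
    print(level, percent_in_small_bins(rows, level))
# MonDayYr 100.0
# MonYr 50.0
# Year 25.0
# Age 25.0
```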
Bio-Surveillance Data: Visit date, Diagnoses
Shared by both: ZIP, Month & Year of Birth, Sex
Voter List: Name, Address, Date registered, Party affiliation, Date last voted
Bio-Surveillance: ZIP, 5-year age range, Visit date, Diagnoses, Sex
Hospital Data: ZIP, Month & Year of Birth, Sex
Voter List: Name, Address, Date registered, Party affiliation, Date last voted
Description
Name
Date of visit (month, day and year)
Patient 5-digit ZIP code
ICD9 diagnosis codes 1 to 6, generalized to Syndrome subclass for dx1 to dx6
Month, day and Year of Birth, generalized to Year of birth
Gender
Columns: Date, ZIP, Dx1 to Dx6, DOB, Sex
[Chart: Cumulative Percentage of Patients vs. Binsize (0 to 35); series: Unaltered, Altered, Safe. Annotation: HIPAA CERTIFIED!]
Old way: Informed Consent
Informed Consent
Open Consent
Informed Consent
Personal Access Control
Patient Control
MyDataCan.org
MyDataCan (diagram): PERMISSIONS; INDICES; DIRECTORY; Applications (Apps) API; DB API; EHR; Rx; PHR (private)
Apps: Rx Refill; ER Peace of Mind
Records for jsmith: New Dx, Rx; New Rx; Current Rx’s, Allergies, Dx’s, Emergency contact; Current Rx’s
Old way: Aggregation
Aggregation: Differential Privacy
Aggregation: Multi-party Computation
PrivaMix: multiple sources produce an anonymous linked dataset (privacy.cs.cmu.edu)
Simplified Multiplication Example
Shelter1 (key 13): Mult(3, 13) = 39; Mult(7, 13) = 91
Shelter2 (key 23): Mult(3, 23) = 69; Mult(11, 23) = 253
Via the Planning Office, each shelter multiplies the other shelter's values by its own key:
Shelter1: Mult(69, 13) = 897; Mult(253, 13) = 3289
Shelter2: Mult(39, 23) = 897; Mult(91, 23) = 2093
Planning Office Learns (Completely Mixed De-identifiers)
Shelter1: 39 → 897, 91 → 2093
Shelter2: 69 → 897, 253 → 3289
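Simplified Multiplication Example
Shelter1 (key 13): Mult(3, 13) = 39
Shelter2 (key 23): Mult(3, 23) = 69
Planning Office: (3 * 13) * 23 = 897 and (3 * 23) * 13 = 897
Because multiplication is commutative, the same value yields the same mixed identifier whichever shelter applies its key first.
Shelter1: 39 → 897; Shelter2: 69 → 897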
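The two shelter examples can be replayed in code. A minimal sketch: `mult` and `mix` are hypothetical helpers that apply both keys in either order, reproducing the values on these slides. The slide's multiplication is only a teaching device, since the Planning Office could recover a key by division (897 / 39 = 23), so the sketch also swaps in exponentiation modulo a prime, one textbook commutative cipher. This is not necessarily what PrivaMix uses, and the prime and keys are illustrative only.

```python
def mult(value, key):
    """The slide's simplified step: 'encrypt' a value by multiplying by a key."""
    return value * key


def mix(shelter1_values, shelter2_values, key1, key2, step=mult):
    """Apply both keys to every value; the order differs per shelter."""
    once1 = [step(v, key1) for v in shelter1_values]  # Shelter1 applies key1
    once2 = [step(v, key2) for v in shelter2_values]  # Shelter2 applies key2
    twice1 = [step(v, key2) for v in once1]           # Shelter2 applies key2 to Shelter1's values
    twice2 = [step(v, key1) for v in once2]           # Shelter1 applies key1 to Shelter2's values
    return twice1, twice2


s1, s2 = mix([3, 7], [3, 11], key1=13, key2=23)
print(s1, s2)                     # [897, 2093] [897, 3289]
print(sorted(set(s1) & set(s2)))  # [897]: the value 3 is at both shelters

# Commutative alternative: (x**a)**b == (x**b)**a (mod p).
P = 2**31 - 1  # a Mersenne prime; far too small for real use


def mod_exp(value, key):
    return pow(value, key, P)


e1, e2 = mix([3, 7], [3, 11], key1=65537, key2=101, step=mod_exp)
print(sorted(set(e1) & set(e2)) == [pow(3, 65537 * 101, P)])  # True
```

Only the shared value collides after mixing; the others stay distinct because both keys are coprime to P - 1, which makes exponentiation a one-to-one mapping here.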
How To Think About Privacy in What You Do
Likelihood of Impact (rows) by Likelihood of Occurrence (columns):
                 small        medium       high
high             Priority 3   Priority 2   Priority 1
medium           Priority 4   Priority 3   Priority 2
small            Priority 5   Priority 4   Priority 3
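The priority grid on this slide follows a simple pattern: each step down in impact or in occurrence lowers the priority by one. A sketch of that reading, where the rank mapping `LEVEL_RANK` is our assumption inferred from the grid rather than a stated rule:

```python
LEVEL_RANK = {"high": 1, "medium": 2, "small": 3}


def priority(impact, occurrence):
    """Our reading of the grid: rank(impact) + rank(occurrence) - 1,
    giving Priority 1 (high, high) through Priority 5 (small, small)."""
    return LEVEL_RANK[impact] + LEVEL_RANK[occurrence] - 1


for impact in ("high", "medium", "small"):
    print(impact, [priority(impact, occ) for occ in ("small", "medium", "high")])
# high [3, 2, 1]
# medium [4, 3, 2]
# small [5, 4, 3]
```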
PrivacyScholar™
[email protected]
Social Network Datasets
On Facebook?
Facebook: Political view
Facebook: Interested In
Gender
Race, Ethnicity
Facebook: number of friends (school)
Facebook: total number of friends
Primary academic major*
Facebook: number of Picture Friends
Dorm*
Facebook: Favorite Movies*
Facebook: Favorite Music*
Network linkages of roommates
Facebook: Favorite Books*
Facebook: linkages of friends
Student Directory
(like a public phone book)
Impute gender
from common names
Use distributions
to identify dorms
Use roommates
to identify people
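The first step, imputing gender from common names, can be sketched as a lookup against name tallies. A minimal sketch, assuming a hypothetical `NAME_COUNTS` table and a 90% agreement threshold, both illustrative; rare or ambiguous names are left unimputed.

```python
# Hypothetical first-name tallies, e.g. counted from a public list of names.
NAME_COUNTS = {
    "ann": {"F": 980, "M": 20},
    "al": {"F": 10, "M": 990},
    "jordan": {"F": 450, "M": 550},
}


def impute_gender(first_name, threshold=0.9):
    """Return the majority gender for a name if it is at least `threshold` of
    the tallies; otherwise None (the name is too ambiguous to impute)."""
    counts = NAME_COUNTS.get(first_name.lower())
    if not counts:
        return None
    gender, n = max(counts.items(), key=lambda item: item[1])
    return gender if n / sum(counts.values()) >= threshold else None


for name in ("Ann", "Al", "Jordan", "Zed"):
    print(name, impute_gender(name))
# Ann F
# Al M
# Jordan None
# Zed None
```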
Big Data
Need Data
Privacy and Utility
Old and New
Can Technology Save Privacy?
Latanya Sweeney
dataprivacylab.org
[email protected]