Can Technology Save Privacy?
Latanya Sweeney
dataprivacylab.org
[email protected]
George Gasparis
Rachel Lally

Disclaimer
The views and opinions in this presentation represent my own and are not necessarily those of HHS or the Obama Administration. These views are for the benefit of public discourse and public education, and are not necessarily an opinion regarding any position I may take on related issues decided by the HIT Policy Committee.

Big Data
Need Data
Privacy and Utility
Old and New

Big Data
Clayton, P., et al. For The Record. National Academy Press, 1997.
L. Sweeney, 2011.

Big Data: $ $ $

Big Data: Pain

Need Data

Government Collects Data!
1995, 1999
Problem                        Solution
Expensive healthcare           Hospital discharge data
Deadbeat dads                  Directory of new hires
Expensive at-risk children     Electronic birth certificate
Measles epidemic               Childhood immunization registry
L. Sweeney. Information Explosion. Confidentiality, Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies, L. Zayatz, P. Doyle, J. Theeuwes and J. Lane (eds), Urban Institute, Washington, DC, 2001.

Hospital Discharge Data, fields 1-12
#    Field description                   Size
1    HOSPITAL ID NUMBER                  12
2    PATIENT DATE OF BIRTH (MMDDYYYY)    8
3    SEX                                 1
4    ADMIT DATE (MMDDYYYY)               8
5    DISCHARGE DATE (MMDDYYYY)           8
6    ADMIT SOURCE                        1
7    ADMIT TYPE                          1
8    LENGTH OF STAY (DAYS)               4
9    PATIENT STATUS                      2
10   PRINCIPAL DIAGNOSIS CODE            6
11   SECONDARY DIAGNOSIS CODE - 1        6
12   SECONDARY DIAGNOSIS CODE - 2        6
Copyright © 2002 Sweeney

Hospital Discharge Data, fields 13-25
#    Field description                   Size
13   SECONDARY DIAGNOSIS CODE - 3        6
14   SECONDARY DIAGNOSIS CODE - 4        6
15   SECONDARY DIAGNOSIS CODE - 5        6
16   SECONDARY DIAGNOSIS CODE - 6        6
17   SECONDARY DIAGNOSIS CODE - 7        6
18   SECONDARY DIAGNOSIS CODE - 8        6
19   PRINCIPAL PROCEDURE CODE            7
20   SECONDARY PROCEDURE CODE - 1        7
21   SECONDARY PROCEDURE CODE - 2        7
22   SECONDARY PROCEDURE CODE - 3        7
23   SECONDARY PROCEDURE CODE - 4        7
24   SECONDARY PROCEDURE CODE - 5        7
25   DRG CODE                            3

Hospital Discharge Data, fields 26-37
#    Field description                   Size
26   MDC CODE                            2
27   TOTAL CHARGES                       9
28   ROOM AND BOARD CHARGES              9
29   ANCILLARY CHARGES                   9
30   ANESTHESIOLOGY CHARGES              9
31   PHARMACY CHARGES                    9
32   RADIOLOGY CHARGES                   9
33   CLINICAL LAB CHARGES                9
34   LABOR-DELIVERY CHARGES              9
35   OPERATING ROOM CHARGES              9
36   ONCOLOGY CHARGES                    9
37   OTHER CHARGES                       9

Hospital Discharge Data, fields 38-50
#    Field description                   Size
38   NEWBORN INDICATOR                   1
39   PAYER ID 1                          9
40   TYPE CODE 1                         1
41   PAYER ID 2                          9
42   TYPE CODE 2                         1
43   PAYER ID 3                          9
44   TYPE CODE 3                         1
45   PATIENT ZIP CODE                    5
46   Patient Origin COUNTY               3
47   Patient Origin PLANNING AREA        3
48   Patient Origin HSA                  2
49   PATIENT CONTROL NUMBER
50   HOSPITAL HSA                        2

Can we understand our customers?
Loyalty card programs

Privacy and Utility

Privacy! Utility!
Traditional Belief System!
Technical Solutions!

[Diagram labels: Subject, Provider, Recipient]

Provider to Recipient: “No privacy, get over it”
Ann   10/2/61   02139   cardiac
Abe   7/14/61   02139   cancer
Al    3/8/61    02138   liver

Provider to Recipient: “Can’t release data”

Provider to Recipient: “Share data with privacy”
A*    1961      0213*   cardiac
A*    1961      0213*   cancer
A*    1961      0213*   liver

Old and New

Old way: De-identification
[Diagram: two overlapping field lists]
Dataset: Visit date, Diagnoses, Procedures
Shared fields: ZIP, Birth date, Sex
Voter List: Name, Address, Date registered, Party affiliation, Date last voted
L. Sweeney. Simple Demographics Often Identify People Uniquely. 2000.
dataprivacylab.org/projects/identifiability/index.html
Gender +         Date of Birth   Mon/Yr Birth   Year of Birth
ZIP 5-digit      87%             3.7%           0.04%
Town/Place       58.4%           3.6%           0.04%
County           18.1%           0.04%          0.00004%

Ad Hoc Schemes that Don’t Work
[Image labels: Mouth Only, Grayscale, Black & White, Single Bar Mask, T-Mask, Ordinal, Threshold]

Ad Hoc Schemes that Don’t Work
[Image labels: Random, Grayscale, Black & White, Negative Grayscale, Black & White, Pixelation, Mr. Potato Head]
Pixelation – recognition improved!

De-identification
k-Anonymity: -Pixel, -Eigen
k = 2, 3, 5, 10, 50, 100

De-identification
Specific Use
[Image labels: Image! Fitted Mesh! Overlayed! Reconstruction!]

De-identification
Risk Assessment
Privacert.com

Description (fields marked * are flagged on the slide)
* Date of visit (month, day and year)
  Transaction#
  Unique patient identifier
* Patient 5-digit ZIP code
* Month, day and Year of Birth
* Gender
  Unique Provider ID
  Provider 5-digit ZIP code
* ICD9 diagnosis code 1
* ICD9 diagnosis code 2
* ICD9 diagnosis code 3
* ICD9 diagnosis code 4
* ICD9 diagnosis code 5
* ICD9 diagnosis code 6
Annotations: Year; 3-digit ZIP; Syndromic groups

Description                              Name
Date of visit (month, day and year)      Date
Patient 5-digit ZIP code                 ZIP
ICD9 diagnosis code 1                    Dx1
ICD9 diagnosis code 2                    Dx2
ICD9 diagnosis code 3                    Dx3
ICD9 diagnosis code 4                    Dx4
ICD9 diagnosis code 5                    Dx5
ICD9 diagnosis code 6                    Dx6
Month, day and Year of Birth             DOB
Gender                                   Sex
Annotations: Year; 3-digit ZIP; Year
[Chart: Cumulative Percentage of Patients (0% to 100%) by Binsize (1 to 26); series: Unaltered, Safe]

Callout on the Description/Name table: DOB to month and year of birth?
[Chart: Cumulative Percentage of Patients (0% to 100%) by Binsize (1 to 26); series: MonDayYr, Safe, MonYr]

Callout on the Description/Name table: Generalize more?
[Chart: Cumulative Percentage of Patients (0% to 100%) by Binsize (1 to 26); series: MonDayYr, Safe, MonYr, Year, Age]
No change!

[Diagram: two overlapping field lists]
Bio-Surveillance Data: Visit date, Diagnoses, ZIP, Month & Year of Birth, Sex
Voter List: Name, Address, Date registered, Party affiliation, Date last voted

[Diagram: three overlapping field lists]
Bio-Surveillance: ZIP, 5-year age range, Visit date, Diagnoses, Sex
Hospital Data: ZIP, Month & Year of Birth, Sex
Voter List: Name, Address, Date registered, Party affiliation, Date last voted

Description/Name table with overlays:
ICD9 diagnosis codes 1 to 6 (Dx1 to Dx6): Syndrome subclass for dx1 to dx6
Month, day and Year of Birth (DOB): Year of birth
[Chart: Cumulative Percentage of Patients (-20% to 120%) by Binsize (0 to 35); series: Unaltered, Safe, Altered]
HIPAA CERTIFIED!

Old way: Informed Consent

Informed Consent
Open Consent

Informed Consent
Personal Access Control

Patient Control
MyDataCan.org
[Diagram labels: PERMISSIONS; INDICES; DIRECTORY; Applications (Apps): Rx Refill, ER Peace of Mind, EHR; API; Rx; DB API; MyDataCan; PHR (private)]
New Dx, Rx (jsmith)
New Rx (jsmith)
Current Rx’s, Allergies, Dx’s, Emergency contact (jsmith)
PHR (private) (jsmith)
Current Rx’s (jsmith)

Old way: Aggregation

Aggregation
Differential Privacy

Aggregation
Multi-party Computation

PrivaMix
multiple sources produce an anonymous linked dataset
privacy.cs.cmu.edu

Simplified Multiplication Example
Shelter1 (13): Mult(3, 13) = 39; Mult(7, 13) = 91
Shelter2 (23): Mult(3, 23) = 69; Mult(11, 23) = 253
Shelter1 on Shelter2's values: Mult(69, 13) = 897; Mult(253, 13) = 3289
Shelter2 on Shelter1's values: Mult(39, 23) = 897; Mult(91, 23) = 2093

Planning Office Learns
Shelter1: 39 -> 897; 91 -> 2093
Shelter2: 69 -> 897; 253 -> 3289

Planning Office
Completely Mixed
De-identifiers: 897, 2093, 3289
Shelter1: 39, 91
Shelter2: 69, 253

Simplified Multiplication Example
Shelter1 (13): Mult(3, 13) = 39
Shelter2 (23): Mult(3, 23) = 69
Planning Office:
(3 * 13) * 23 = 897
(3 * 23) * 13 = 897
Shelter1: 39 -> 897
Shelter2: 69 -> 897

How To Think About Privacy in What You Do
                        Likelihood of Occurrence
Likelihood of Impact    small        medium       high
high                    Priority 3   Priority 2   Priority 1
medium                  Priority 4   Priority 3   Priority 2
small                   Priority 5   Priority 4   Priority 3

PrivacyScholar(TM)
[email protected]

Social Network Datasets

On Facebook?
Facebook: Political view
Facebook: Interested In
Gender
Race, Ethnicity
Facebook: number of friends (school)
Facebook: total number of friends
Primary academic major*
Facebook: number of Picture Friends
Dorm*
Facebook: Favorite Movies*
Facebook: Favorite Music*
Network linkages of roommates
Facebook: Favorite Books*
Facebook: linkages of friends

Student Directory (like a public phone book)
Impute gender from common names
Use distributions to identify dorms
Use roommates to identify people

Big Data
Need Data
Privacy and Utility
Old and New

Can Technology Save Privacy?
Latanya Sweeney
dataprivacylab.org
[email protected]
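
Worked sketch of the Simplified Multiplication Example slides. The slides give only the numbers, so this is an illustration under stated assumptions, not the PrivaMix implementation: Mult is taken to be plain integer multiplication, 13 and 23 are treated as each shelter's private value, and the variable names are hypothetical. It shows why the order of the two multiplications does not matter, which is what lets the planning office see that both shelters hold the same client (897) without seeing the original identifier (3).

```python
def mult(value: int, key: int) -> int:
    """Toy stand-in for the slides' Mult(): plain integer multiplication."""
    return value * key


# Values taken from the slides; the variable names are assumptions.
shelter1_key = 13
shelter2_key = 23
shelter1_clients = [3, 7]
shelter2_clients = [3, 11]

# Step 1: each shelter multiplies its own client values by its own key.
shelter1_step1 = [mult(c, shelter1_key) for c in shelter1_clients]  # 39, 91
shelter2_step1 = [mult(c, shelter2_key) for c in shelter2_clients]  # 69, 253

# Step 2: each shelter multiplies the other shelter's step-1 values by its key.
shelter1_final = [mult(v, shelter2_key) for v in shelter1_step1]  # 897, 2093
shelter2_final = [mult(v, shelter1_key) for v in shelter2_step1]  # 897, 3289

# Multiplication commutes, so (3 * 13) * 23 == (3 * 23) * 13 == 897.
# A value present in both final lists marks a client seen by both shelters.
shared = set(shelter1_final) & set(shelter2_final)

print("Shelter1 final:", shelter1_final)
print("Shelter2 final:", shelter2_final)
print("Seen by both shelters:", sorted(shared))
```

The slides call the multiplication "simplified"; the sketch reproduces only that arithmetic.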