Workshop in Information Security: Privacy in a Demographic Database
Lecturer: Dr. Eran Tromer
Teaching assistant: Mr. Nir Atias
Advisor: Prof. Kobbi Nissim
Razi Mukatren, Golan Salman
Israel Central Bureau of Statistics (CBS, הלמ"ס)
• Conducts a comprehensive annual survey that provides information on the welfare and living conditions of Israel's population.
• The survey covers about 7,000 people.
• The survey is called 'The Social Survey'.
• The questionnaire is very comprehensive (68 A4 pages) and contains a large number of questions.
• The results of the survey are published online on the CBS website.
Some of the questions that appear in the survey:
• What was the subject of your studies toward a first degree (B.A.)?
• Do you own a dwelling?
• Are you satisfied, in general, with the area in which you live?
• Are you satisfied with the amount of parks, public gardens or greenery in the area in which you live?
• Did you study in a learning institution toward an academic degree?
• In what year did you complete your studies toward a third degree (Ph.D.)?
• When you speak, do you mix languages?
• Are you satisfied with your relationships with family members?
How the data is presented: The website lets users view the results divided and filtered by different categories, using up to two filters and four variables.
The question we addressed in this project: Given the data on the website, is it possible to retrieve all the answers of individual participants?
Let's look at the table produced by the following selection:
Filter A - Status / Widower
Filter B - Military service / Yes
Variable A - Sex
Variable B - Number of children
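To make the selection above concrete, here is a minimal sketch of how such a filtered cross-tabulation could be modelled. It is not the CBS website's implementation, which the slides do not describe. The function `cross_tab`, the attribute names and the toy records are all hypothetical.

```python
from collections import Counter

# Toy stand-in for the survey microdata; attribute names and values are invented.
RECORDS = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2},
    {"status": "widower", "military": "yes", "sex": "male", "children": 2},
    {"status": "widower", "military": "yes", "sex": "female", "children": 1},
    {"status": "married", "military": "yes", "sex": "male", "children": 0},
    {"status": "widower", "military": "no", "sex": "female", "children": 3},
]

MAX_FILTERS = 2    # limit stated on the slide
MAX_VARIABLES = 4  # limit stated on the slide


def cross_tab(records, filters, variables):
    """Count the records that pass every filter, grouped by the chosen variables."""
    if len(filters) > MAX_FILTERS or len(variables) > MAX_VARIABLES:
        raise ValueError("at most 2 filters and 4 variables are allowed")
    table = Counter()
    for rec in records:
        if all(rec[attr] == value for attr, value in filters.items()):
            table[tuple(rec[v] for v in variables)] += 1
    return table


# Filter A: Status / Widower, Filter B: Military service / Yes,
# Variable A: Sex, Variable B: Number of children.
table = cross_tab(
    RECORDS,
    {"status": "widower", "military": "yes"},
    ["sex", "children"],
)
for cell, count in sorted(table.items()):
    print(cell, count)
```

Each key of the returned table is one cell (a combination of variable values) and each value is the number of respondents in that cell.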
Identifying unique records: Every 1 in the table created by choosing two filters and two variables represents a single participant in the survey. Now we can extract that participant's record. How? Recall that we obtained the table on the previous slide by choosing two filters and two variables. Now we create a table with the same two filters and the same two variables, plus one new variable. Example of a unique cell: a widower, served in the army, without children, male.
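As a small illustration of the first step, the sketch below scans a table of counts for cells equal to 1. The counts are invented, and `singleton_cells` is a hypothetical helper name rather than the project's code.

```python
def singleton_cells(table):
    """Return the variable-value combinations matched by exactly one respondent."""
    return [cell for cell, count in table.items() if count == 1]


# Invented counts for filters Status=Widower and Military service=Yes,
# keyed by (sex, number of children).
table = {
    ("male", 0): 1,
    ("male", 1): 4,
    ("male", 2): 7,
    ("female", 0): 3,
    ("female", 2): 1,
}

for sex, children in singleton_cells(table):
    print(f"unique respondent: widower, served in the army, {sex}, {children} children")
```

Every cell returned here pins down one person, so any further attribute queried together with the same filters and variables belongs to that person.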
Querying unique records: This is the same table as the previous one, with a third attribute (employment status) added. From the previous table: a widower, served in the army, without children, male. From this table: employed. That makes 5 things.
At this point we know 5 things. How can we get the others? Repeat the query, each time replacing the third attribute with a different one.
After extracting the records from the CBS website, we cannot compare them to any reference database to make sure they are correct. So how can we know that the extracted records reflect the real data?
• We created a random database of 7064 records using the attribute distribution observed in the real database.
• We ran our algorithm on this random database and extracted records from it.
• Because we generated the random database ourselves, we can check the extracted records against it and confirm that they are correct.
• The random database helped us confirm that the algorithm works.
Let's look at the real results.
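Before the real results, the validation procedure on this slide can be sketched as follows. The slides do not give the actual distributions or say how attributes were combined, so this sketch assumes invented marginal distributions and samples each attribute independently. All names (`MARGINALS`, `make_database`, `extract_all`) are hypothetical.

```python
import random
from collections import Counter
from itertools import product

# Invented marginal distributions (value -> weight); not CBS figures.
MARGINALS = {
    "status": {"married": 0.60, "single": 0.30, "divorced": 0.07, "widowed": 0.03},
    "military": {"yes": 0.5, "no": 0.5},
    "sex": {"male": 0.5, "female": 0.5},
    "children": {0: 0.30, 1: 0.15, 2: 0.25, 3: 0.18, 4: 0.07,
                 5: 0.03, 6: 0.015, 7: 0.005},
    "employment": {"employed": 0.6, "unemployed": 0.1, "not in labour force": 0.3},
    "owns_dwelling": {"yes": 0.7, "no": 0.3},
}


def make_database(n, rng):
    """Sample n records, drawing each attribute independently (an assumption)."""
    return [
        {attr: rng.choices(list(dist), weights=list(dist.values()))[0]
         for attr, dist in MARGINALS.items()}
        for _ in range(n)
    ]


def cross_tab(db, filters, variables):
    """Counts of records passing the filters, grouped by the variables."""
    table = Counter()
    for rec in db:
        if all(rec[a] == v for a, v in filters.items()):
            table[tuple(rec[v] for v in variables)] += 1
    return table


def extract_all(db, filter_attrs, variables, others):
    """Extract every respondent who is unique under some filter/variable cell."""
    extracted = []
    for values in product(*(MARGINALS[a] for a in filter_attrs)):
        filters = dict(zip(filter_attrs, values))
        base = cross_tab(db, filters, variables)
        for cell in [c for c, n in base.items() if n == 1]:
            record = {**filters, **dict(zip(variables, cell))}
            for attr in others:
                wider = cross_tab(db, filters, variables + [attr])
                record[attr] = next(c[-1] for c in wider if c[:-1] == cell)
            extracted.append(record)
    return extracted


rng = random.Random(0)
db = make_database(7064, rng)
extracted = extract_all(db, ["status", "military"], ["sex", "children"],
                        ["employment", "owns_dwelling"])
correct = sum(rec in db for rec in extracted)
print(f"extracted {len(extracted)} records, {correct} match the ground truth")
```

Because the ground truth is known here, any extracted record that does not appear in `db` would point to a bug in the extraction logic.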
The results:
• From the 7064 records in the real survey, we managed to extract 1005 records. Why did we stop at 1005?
• Each record includes personal information about a person who was promised confidentiality.
• How can we be sure that the roughly 1000 records we retrieved indeed correspond to answers given by real respondents?
• Answer: we cannot be 100% sure at this time, but the fact that we managed to re-identify one respondent and confirm her data is a good indication.
Head hunting: An attempt to find one of the people who took part in the 2011 survey.
• Friends and family.
• Internet forums.
• The data we extracted.
• Facebook.
At the same time, we examined the extracted records for specific details that would help us find a survey participant.
• Then we noticed the following information:
• A Muslim woman, 30 years old and single.
• Monthly salary of over 21,000 ₪.
• Lives in a small village in the north.
• Commute time to work: an hour and a half.
• Razi suspected he knew the woman. To make sure, he contacted her and asked whether she had participated in the CBS survey. She said yes.
She is a group manager at a leading high-tech company and lives in Kfar Iksal.
Some of the details we managed to find out about Tal were personal:
• Personal gross income
• Family gross income
• Health condition
• Military service
• Religion
• Personal feelings
• Number of cars in the family
• And more…
On the morning of Yom Kippur eve, 7.10.2011, a CBS representative visited her house. He emphasized that the survey is anonymous.
When we presented the information to her, she was unhappy that we knew her salary.
Conclusions
1. The survey answers of individual participants can be easily retrieved with a bit of programming.
• Note: We didn't attack or analyze the security of the website implementation.
• We only used statistical and logical analysis of the published data.
2. Participants are unaware of the potential violation of their privacy.