Privacy in a Demographic Database
Razi Mokatren, Golan Salman
Israel Central Bureau of Statistics (CBS, Hebrew: הלמ"ס)
1. A governmental unit that operates under the auspices of the Prime Minister's Office.
2. Conducts a comprehensive annual survey that provides information on the welfare and living conditions of Israel's population.
3. The survey sample is on the order of 7,000 people.
4. The survey results are published online on the CBS website.
How the data is presented: The website lets users view the results broken down and filtered by different categories, using up to two filters and four variables.
The question this project addresses: Given the data on the website, is it possible to restore the answers of an individual participant?
Consider the following table, created with this selection:
Filter A: Status / Widower
Filter B: Military service / Yes
Variable A: Sex
Variable B: Number of children
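Each table on the website behaves like a filtered cross-tabulation: participants matching the filters are counted per combination of the chosen variables. A minimal Python sketch of that behaviour follows, assuming the microdata is a list of dicts. The field names, values, and the helper `cross_tab` are hypothetical and do not reflect the CBS schema or how the website is actually implemented.

```python
from collections import Counter

# Hypothetical microdata standing in for the survey (not the CBS schema).
RECORDS = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2},
    {"status": "married", "military": "yes", "sex": "male", "children": 0},
]

def cross_tab(records, filters, variables):
    """Count participants per combination of variable values.

    filters:   dict {field: required value}, at most two (as on the website)
    variables: list of fields to break the counts down by, at most four
    """
    if len(filters) > 2 or len(variables) > 4:
        raise ValueError("the website allows up to 2 filters and 4 variables")
    table = Counter()
    for r in records:
        if all(r[f] == val for f, val in filters.items()):
            table[tuple(r[var] for var in variables)] += 1
    return table

# The example selection: Status / Widower, Military service / Yes,
# broken down by sex and number of children.
table = cross_tab(RECORDS, {"status": "widower", "military": "yes"},
                  ["sex", "children"])
for cell, count in sorted(table.items()):
    print(cell, count)
# ('female', 2) 2
# ('male', 0) 1
```

The married participant is excluded by the filter, so the table only counts the three widowers who served.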
We found the following rule: every cell with a count of 1 in a table created from two filters and two variables represents a single participant in the survey. We can then restore that participant's record. How? Recall that the table above was produced by choosing two filters and two variables. We now create a table with the same two filters and the same two variables, plus one new variable. From the previous table we already know that this participant is a widower, served in the army, has no children, and is male.
Consider the following table: the same as the previous one, with the addition of a third variable (employment status).
From the previous table: a widower, served in the army, no children, male.
From this table: employed.
That gives us five facts about this participant.
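The rule and the extra-variable step can be sketched as follows. This is a minimal illustration, assuming (hypothetically) that the survey is a list of dicts and that `cross_tab` reproduces the website's counts; in practice the attacker sees only the published tables, never the records. The helpers `unique_cells` and `reveal_next` and all field names are illustrative, not the project's code.

```python
from collections import Counter

# Hypothetical stand-in for the survey; an attacker only sees the counts.
RECORDS = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0, "employment": "employed"},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2, "employment": "employed"},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2, "employment": "not employed"},
    {"status": "married", "military": "yes", "sex": "male", "children": 0, "employment": "not employed"},
]

def cross_tab(records, filters, variables):
    # Simulates one website table: participant counts per value combination.
    table = Counter()
    for r in records:
        if all(r[f] == val for f, val in filters.items()):
            table[tuple(r[var] for var in variables)] += 1
    return table

def unique_cells(table):
    # A cell whose count is 1 describes exactly one participant.
    return [cell for cell, count in table.items() if count == 1]

def reveal_next(records, filters, variables, cell, new_variable):
    """Read one more answer of the participant isolated by `cell`.

    Adding new_variable splits the count-1 cell; exactly one refined cell
    extends it, and its last coordinate is that participant's value.
    """
    refined = cross_tab(records, filters, variables + [new_variable])
    (key,) = [k for k in refined if k[:-1] == cell]
    return key[-1]

filters = {"status": "widower", "military": "yes"}
variables = ["sex", "children"]
for cell in unique_cells(cross_tab(RECORDS, filters, variables)):
    print(cell, reveal_next(RECORDS, filters, variables, cell, "employment"))
# ('male', 0) employed
```

The two filters, the two coordinates of the unique cell, and the revealed employment status together give the five facts.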
At this point we know five facts. How can we obtain the rest? In a loop, we swap the third variable for each of the remaining variables. How can we make sure that the restored records are correct?
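The loop can be sketched as below. It is a minimal illustration, assuming the same hypothetical list-of-dicts stand-in for the survey and a count-1 `cell` already found in the two-variable table; `query_table`, `reconstruct`, and the field names are illustrative, not the project's code. Each pass swaps in a different third variable and reads off one more answer.

```python
from collections import Counter

def query_table(records, filters, variables):
    # Simulated website table (the real attacker only sees these counts).
    table = Counter()
    for r in records:
        if all(r[f] == val for f, val in filters.items()):
            table[tuple(r[var] for var in variables)] += 1
    return table

def reconstruct(records, filters, variables, cell, other_fields):
    """Rebuild the record of the participant isolated by a count-1 `cell`."""
    record = dict(filters)               # answers fixed by the two filters
    record.update(zip(variables, cell))  # answers fixed by the unique cell
    for field in other_fields:           # the loop: swap the third variable
        refined = query_table(records, filters, variables + [field])
        (key,) = [k for k in refined if k[:-1] == cell]  # exactly one match
        record[field] = key[-1]
    return record

# Hypothetical records; only the first is isolated by ("male", 0).
RECORDS = [
    {"status": "widower", "military": "yes", "sex": "male", "children": 0,
     "employment": "employed", "region": "north"},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2,
     "employment": "employed", "region": "center"},
    {"status": "widower", "military": "yes", "sex": "female", "children": 2,
     "employment": "not employed", "region": "south"},
]
rec = reconstruct(RECORDS, {"status": "widower", "military": "yes"},
                  ["sex", "children"], ("male", 0), ["employment", "region"])
print(rec)
```

Each extra answer costs one additional table query, so the number of queries grows only with the number of survey questions.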
After extracting the records from the CBS website, we cannot compare them against any database to verify them. So how can we know that the extracted records reflect the real data?
• We generated many random samples of size 7,064.
• We ran our algorithm on those samples and extracted records from them.
• Because we generated each random sample ourselves and know its contents, we can check that every extracted record is correct.
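The validation idea can be sketched as follows. It assumes a made-up schema with skewed category weights (so that rare combinations, and therefore count-1 cells, actually occur) and the same hypothetical table helper as above. Because each sample is generated locally, every recovered record can be checked against the known ground truth. The schema, weights, and function names are illustrative assumptions, not the project's code or the survey's real distributions.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical schema: field -> (categories, relative weights).
SCHEMA = {
    "status": (["single", "married", "divorced", "widowed"], [40, 45, 10, 5]),
    "military": (["yes", "no"], [60, 40]),
    "sex": (["male", "female"], [50, 50]),
    "children": ([0, 1, 2, 3, 4], [30, 25, 25, 15, 5]),
    "religion": (["jewish", "muslim", "christian", "druze", "other"], [74, 18, 2, 2, 4]),
    "employment": (["employed", "unemployed", "not in labour force"], [60, 5, 35]),
    "region": (["north", "haifa", "center", "tel aviv", "jerusalem", "south"],
               [15, 12, 25, 15, 13, 20]),
}

def random_sample(n, rng):
    # Synthetic participants whose true answers we keep as ground truth.
    return [{f: rng.choices(vals, weights)[0] for f, (vals, weights) in SCHEMA.items()}
            for _ in range(n)]

def query_table(records, filters, variables):
    # Stand-in for one website table: counts per combination of values.
    table = Counter()
    for r in records:
        if all(r[f] == val for f, val in filters.items()):
            table[tuple(r[var] for var in variables)] += 1
    return table

def attack(records, filters, variables):
    """Recover every participant isolated by a count-1 cell."""
    others = [f for f in SCHEMA if f not in filters and f not in variables]
    base = query_table(records, filters, variables)
    # One refined table per remaining field (the "swap the third variable" loop).
    refined = {f: query_table(records, filters, variables + [f]) for f in others}
    recovered = []
    for cell, count in base.items():
        if count != 1:
            continue
        rec = dict(filters)
        rec.update(zip(variables, cell))
        for f in others:
            (key,) = [k for k in refined[f] if k[:-1] == cell]
            rec[f] = key[-1]
        recovered.append(rec)
    return recovered

def validate(n=7064, trials=3, seed=0):
    """Run the attack on random samples and check each recovered record."""
    rng = random.Random(seed)
    filter_fields, variables = ("status", "military"), ["sex", "children", "religion"]
    total = 0
    for _ in range(trials):
        sample = random_sample(n, rng)
        truth = {frozenset(r.items()) for r in sample}
        # Try every combination of values for the two filters.
        for values in product(*(SCHEMA[f][0] for f in filter_fields)):
            for rec in attack(sample, dict(zip(filter_fields, values)), variables):
                if frozenset(rec.items()) not in truth:
                    return total, False
                total += 1
    return total, True

print(validate())  # (number of records recovered, whether all were correct)
```

Computing each refined table once per remaining field, rather than once per isolated participant, reflects that a handful of table queries per filter selection is enough to expose every participant that the base table isolates.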
The results: The random samples confirmed that the algorithm works. Now the real result: of the 7,064 records in the real survey, we managed to restore XXX records. Each record contains personal information about a person who had been promised confidentiality.
Once we realized that we could restore the records, we moved on to the next goal: trying to find one of the people who took part in the 2011 survey, using:
• Friends and family.
• Internet forums.
• The data we extracted.
• Facebook.
At the same time, we looked through the records for specific details that could help us find a survey participant. We noticed the following record:
1. A Muslim woman, 30 years old, single.
2. Monthly salary of over 21,000 ₪.
3. Lives in a small locality in the north (probably a village).
4. Commute time to work: an hour and a half.
Razi suspected he knew her. To make sure, he contacted her and asked whether she had participated in the CBS survey. She said yes.
Nada Shaladi, 30 years old, lives in Kpar Eicsel. On the morning of Yom Kippur eve, 7.10.2011, a CBS representative knocked on Nada's door. He emphasized that the survey was anonymous. When we presented Nada with the information we had, she couldn't believe it.
Some of the details we managed to find out about Nada:
• Did you serve in the army? No.
• What was your gross salary last month, before deductions, from all places of work (in NIS)? More than 21,000 NIS.
• What is your religion? Muslim.
• How long does it usually take you to get to your workplace? 60-89 minutes.
Returning to the question we set out to answer: is it possible to restore the answers of a particular person from the data published on the website? This project shows that the answer is YES. What does this say about security on the CBS website?
We found a severe breach of the survey participants' privacy on the CBS website.
1. Even though it is not immediate, anyone who wants to find out personal details of a specific participant could easily do so.
2. Most participants are not aware that their personal data is exposed to everyone on the website.
3. It is not clear whether the CBS staff are aware of this flaw.