Deoxyribonucleic acid (DNA) Biometrics
CPSC 4600 Biometrics and Cryptography
DNA • DNA analysis is no longer confined to genetic and medical research. • Criminal Forensics: • Forensic science relies heavily on the ability of DNA to identify the source of biological substances and determine who is most likely to have committed a crime. • This ability to identify an individual is enhanced by the variety of substances that contain DNA, including blood, hair, urine, bone, teeth, and tissues.
DNA • Criminal Forensics: • Using saliva, the FBI was able to match DNA samples from letters mailed to relatives by Theodore Kaczynski with DNA obtained from stamps on letters mailed by the Unabomber (University and Airline Bomber). • Identification of specimens using DNA has had other benefits: in one third of the cases where this technique has been used, DNA analysis has exonerated people wrongly accused of crimes.
DNA • Establishing paternity • DNA analysis is now a common tool for establishing paternity, and it has been called on to identify remains after tragedies such as airline accidents. • Investigating migration of human beings and genetic disease • Anthropologists are using DNA analysis to study the migration of human beings across the oceans. • Historians employ these techniques to identify genetic disease in famous individuals. • Tracking endangered species • Wildlife biologists use the variation of DNA sequences between species to track endangered species.
Features of DNA • DNA is composed of FOUR different chemical building blocks called "bases". These four bases are: • adenine (A) • guanine (G) • thymine (T) • cytosine (C) • The bases along one strand are joined together by strong covalent bonds. Two such strands are held together in a double helix because bases with complementary shapes can pair with each other.
Features of DNA (cont’d) • Adenine (A) pairs with thymine (T), and guanine (G) pairs with cytosine (C). • Complementary base pairs are found along the entire length of the DNA duplex. • The complementary nature of the two strands provides a basis for copying genetic information and for passing this information on to offspring.
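Because the pairing rules are fixed, one strand fully determines the other. The minimal Python sketch below (the function name `complement` is hypothetical) builds the partner strand base by base; complementing twice returns the original strand, which is the property that makes copying possible. It deliberately ignores strand direction and rejects any character other than A, C, G, T.

```python
# Base-pairing rules from the slide: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the complementary strand, base by base (raises KeyError on non-ACGT input)."""
    return "".join(PAIRS[base] for base in strand.upper())


if __name__ == "__main__":
    template = "AGGCCTC"
    partner = complement(template)
    print(partner)              # TCCGGAG
    print(complement(partner))  # AGGCCTC: the template is recovered
```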
Features of DNA (cont’d) • Information is stored in DNA in the sequence of bases, just as information can be stored in a book in the sequence of letters. • The human genome contains approximately 3 billion base pairs of DNA, and a typical human cell carries two copies of it, organized in 23 pairs of chromosomes. • Every person inherits one set of 23 chromosomes from the mother and one set of 23 chromosomes from the father.
Techniques used for DNA fingerprinting • Isolating the DNA in question from the rest of the cellular material in the nucleus. • Cutting the DNA into several pieces of different sizes. • Sorting the DNA pieces by size. • Denaturing the DNA, so that all of the DNA is rendered single-stranded. This can be done either by heating or by chemically treating the DNA in the gel. • Blotting the DNA. • Detecting the DNA sequence, e.g. AGGCCTC. • More: http://protist.biology.washington.edu/fingerprint/dnaintro.html
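The cutting and sorting steps can be mimicked on a string. This toy Python sketch is not a lab protocol: it assumes, for illustration only, that the cutting enzyme is EcoRI, which recognizes GAATTC and cuts between the G and the A (the slides do not name an enzyme). Sorting by length stands in for separating fragments by size in a gel; the helper names `digest` and `sort_by_size` are hypothetical.

```python
import re

# Assumption for illustration: EcoRI, which recognizes GAATTC and cuts after the G.
SITE = "GAATTC"
CUT_OFFSET = 1


def digest(dna: str, site: str = SITE, offset: int = CUT_OFFSET) -> list[str]:
    """Cut `dna` at `offset` bases into every occurrence of `site`."""
    cuts = [match.start() + offset for match in re.finditer(re.escape(site), dna)]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def sort_by_size(fragments: list[str]) -> list[str]:
    """Stand-in for gel electrophoresis: order fragments from shortest to longest."""
    return sorted(fragments, key=len)


if __name__ == "__main__":
    fragments = digest("TTGAATTCAAAAGAATTCGG")
    print(fragments)                # ['TTG', 'AATTCAAAAG', 'AATTCGG']
    print(sort_by_size(fragments))  # ['TTG', 'AATTCGG', 'AATTCAAAAG']
```

A sequence with no recognition site comes back as a single uncut fragment.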
Polymerase Chain Reaction (PCR) for DNA Fingerprinting • Often DNA samples obtained from crime scenes are too small in quantity or too degraded by sunlight or high temperature to be analyzed by the restriction fragment length polymorphism (RFLP) method. • These samples are subjected to a different fingerprinting technique known as PCR. • PCR is a valuable technique because it provides a method for producing millions of copies of small regions of DNA.
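PCR reaches "millions of copies" through repeated doubling: each cycle can at best double the number of target copies, so n cycles multiply the starting amount by up to $2^n$. The sketch below (hypothetical function `pcr_copies`) computes this idealized count. The `efficiency` parameter is an assumption added for illustration, since real reactions copy less than 100% per cycle and eventually level off.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized PCR yield: each cycle multiplies the copy number by (1 + efficiency)."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    print(pcr_copies(1, 20))  # 1048576.0: one template, 20 perfect doublings
```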
DNA Matching -- Sequence Alignment
Two sequences:
AGGCTATCACCTGACCTCCAGGCCGATGCCC
TAGCTATCACGACCGCGGTCGATTTGCCCGAC
One alignment of them ("-" marks a gap):
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
Definition: Given two strings $x = x_1 x_2 \ldots x_M$ and $y = y_1 y_2 \ldots y_N$, an alignment is an assignment of gaps to positions $0, \ldots, M$ in x and $0, \ldots, N$ in y, so as to line up each letter in one sequence with either a letter or a gap in the other sequence.
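The definition can be checked mechanically. This Python sketch (hypothetical function `is_alignment`, with "-" as the gap symbol as in the example) accepts two aligned rows only if they have equal length, deleting the gaps gives back the original sequences, and no column pairs a gap with a gap.

```python
def is_alignment(x: str, y: str, ax: str, ay: str, gap: str = "-") -> bool:
    """True if rows ax and ay form a valid alignment of x and y."""
    if len(ax) != len(ay):
        return False  # every column must hold one symbol from each row
    if ax.replace(gap, "") != x or ay.replace(gap, "") != y:
        return False  # deleting the gaps must give back the original sequences
    # Each letter must line up with a letter or a gap, so gap-over-gap columns are rejected.
    return not any(a == gap and b == gap for a, b in zip(ax, ay))


if __name__ == "__main__":
    x = "AGGCTATCACCTGACCTCCAGGCCGATGCCC"
    y = "TAGCTATCACGACCGCGGTCGATTTGCCCGAC"
    print(is_alignment(x, y,
                       "-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---",
                       "TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC"))  # True
```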
What is a good alignment? • Alignment: the “best” way to match the letters of one sequence with those of the other. • How do we define “best”? • Alignment: a hypothesis that the two sequences come from a common ancestor through sequence edits. • Parsimonious explanation: find the minimum number of edits that transform one sequence into the other.
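Counting the minimum number of edits is the classic edit (Levenshtein) distance, named here plainly because the slides do not name it. A minimal Python sketch, assuming every mutation, insertion, and deletion costs 1 (the function name `edit_distance` is hypothetical):

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of mutations, insertions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # turning "" into b[:j] takes j insertions
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # mutate ca into cb (free if they are equal)
            ))
        prev = curr
    return prev[-1]
```

For example, AGGCCTC is one edit away from AGGACTC (a mutation), from AGGGCCTC (an insertion), and from AGGCTC (a deletion).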
Scoring Function • Sequence edits of AGGCCTC: • Mutation: AGGACTC • Insertion: AGGGCCTC • Deletion: AGG-CTC • Scoring function: Match: +m, Mismatch: -s, Gap: -d, so that
$$F = (\#\text{matches})\,m - (\#\text{mismatches})\,s - (\#\text{gaps})\,d$$
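The scoring function can be applied directly to two aligned rows. A minimal Python sketch (hypothetical function `alignment_score`), assuming "-" marks a gap and that m, s, and d are passed as positive numbers, exactly as they appear in the formula:

```python
def alignment_score(ax: str, ay: str, m: int, s: int, d: int, gap: str = "-") -> int:
    """F = (#matches)*m - (#mismatches)*s - (#gaps)*d for two aligned rows."""
    if len(ax) != len(ay):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gaps = 0
    for a, b in zip(ax, ay):
        if a == gap or b == gap:
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    return matches * m - mismatches * s - gaps * d
```

With m = s = d = 1, the mutation AGGCCTC/AGGACTC scores 6 - 1 = 5, and AGTA/A-TA scores 3 - 1 = 2.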
How do we compute the best alignment? • x (length M): AGTGCCCTGGAACCCTGACGGTGGGTCACAAAACTTCTGGA • y (length N): AGTGACCTGGGAAGACCCTGACCCTGGGTCACAAAACTC • Too many possible alignments: $O(2^{M+N})$
DNA Matching -- Dot matrix method • The dot matrix method (dot plot method) is a graphical way of comparing two sequences. • In a dot matrix, two sequences to be compared are represented as horizontal and vertical axes of a two-dimensional diagram. • The comparison is done by scanning each residue of one sequence for similarity with all residues in the other sequence.
Dot matrix method • If a residue match is found, a dot is placed within the graph; otherwise, the matrix position is left blank. • When the two sequences have substantial regions of similarity, many dots line up to form contiguous diagonal lines, which reveal the sequence alignment. • Interruptions in the middle of a diagonal line indicate insertions and deletions. Parallel diagonal lines represent repetition. • In short: diagonal lines = alignment; breaks and offsets between diagonals = gaps.
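A dot plot is easy to render as text. This Python sketch (hypothetical function `dot_plot`) puts x on the horizontal axis and y on the vertical axis and writes * for a match and . otherwise, so runs of * along a diagonal mark similar regions.

```python
def dot_plot(x: str, y: str, dot: str = "*", blank: str = ".") -> list[str]:
    """Return one text row per residue of y; column k marks whether it matches x[k]."""
    return ["".join(dot if xc == yc else blank for xc in x) for yc in y]


if __name__ == "__main__":
    x = "AGGCTATCAC"  # horizontal axis
    y = "TAGCTATCAC"  # vertical axis
    print("  " + x)
    for yc, row in zip(y, dot_plot(x, y)):
        print(yc + " " + row)
```

In this example both strings end in GCTATCAC, so that shared suffix appears as an unbroken run of eight * symbols on the main diagonal.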
Dynamic Programming • Dynamic programming is a method for determining the optimal alignment between two sequences. • Suppose we wish to align $x_1 \ldots x_M$ with $y_1 \ldots y_N$. • Let $F(i,j)$ be the optimal score of aligning $x_1 \ldots x_i$ with $y_1 \ldots y_j$.
Dynamic Programming (cont’d) Three steps: 1. Create a two-dimensional alignment grid, as in the dot matrix method. 2. Accumulate scores in the matrix for matches and mismatches between the sequences. 3. Trace back through the matrix in reverse order to identify the highest-scoring path.
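Dynamic Programming (cont’d)
[Figure: cell F(i, j) and its three predecessors: F(i-1, j-1) by a diagonal step (+m/-s), F(i-1, j) by a gap step (-d), and F(i, j-1) by a gap step (-d).]
Notice three possible cases:
1. $x_i$ aligns to $y_j$ ($x_1 \ldots x_{i-1}\,x_i$ over $y_1 \ldots y_{j-1}\,y_j$):
$$F(i,j) = F(i-1,j-1) + \begin{cases} m, & \text{if } x_i = y_j \\ -s, & \text{if not} \end{cases}$$
2. $x_i$ aligns to a gap ($x_1 \ldots x_{i-1}\,x_i$ over $y_1 \ldots y_j\,\text{-}$):
$$F(i,j) = F(i-1,j) - d$$
3. $y_j$ aligns to a gap ($x_1 \ldots x_i\,\text{-}$ over $y_1 \ldots y_{j-1}\,y_j$):
$$F(i,j) = F(i,j-1) - d$$
Match: +m Mismatch: -s Gap: -d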
Dynamic Programming (cont’d) • How do we know which case is correct? We do not need to know in advance. Inductive assumption: F(i, j-1), F(i-1, j), and F(i-1, j-1) are optimal. Then
$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(x_i,y_j) \\ F(i-1,j) - d \\ F(i,j-1) - d \end{cases}$$
where $s(x_i,y_j) = m$ if $x_i = y_j$, and $-s$ if not.
Match: +m Mismatch: -s Gap: -d
Intuitive understanding of the algorithm • F(i, j) is the maximum score from one of the three directions: diagonally from F(i-1, j-1) (+m or -s), from F(i-1, j) (-d), or from F(i, j-1) (-d). Match: +m Mismatch: -s Gap: -d
Example • x = AGTA, y = ATA, with m = 1, s = 1, d = 1
F(i, j):
            i=0  i=1  i=2  i=3  i=4
                   A    G    T    A
j=0           0   -1   -2   -3   -4
j=1 (A)      -1    1    0   -1   -2
j=2 (T)      -2    0    0    1    0
j=3 (A)      -3   -1   -1    0    2
Optimal alignment, F(4, 3) = 2:
AGTA
A-TA
Score = 3 matches $\cdot\, m$ - 0 mismatches $\cdot\, s$ - 1 gap $\cdot\, d$ = $3 \cdot 1 - 0 \cdot 1 - 1 \cdot 1 = 2$
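The example's values can be reproduced by filling in F one cell at a time with the recurrence. A minimal Python sketch (hypothetical function `score_matrix`), assuming positive m, s, d as in the formulas and the boundary values F(i, 0) = -id and F(0, j) = -jd (a prefix aligned entirely against gaps). The matrix is indexed F[i][j], with i running over x and j over y.

```python
def score_matrix(x: str, y: str, m: int = 1, s: int = 1, d: int = 1) -> list[list[int]]:
    """F[i][j] = optimal global score for aligning x[:i] with y[:j]."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    for i in range(1, M + 1):
        F[i][0] = -i * d  # x[:i] aligned entirely against gaps
    for j in range(1, N + 1):
        F[0][j] = -j * d  # y[:j] aligned entirely against gaps
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sub = m if x[i - 1] == y[j - 1] else -s  # s(x_i, y_j)
            F[i][j] = max(F[i - 1][j - 1] + sub,  # x_i aligns to y_j
                          F[i - 1][j] - d,        # x_i aligns to a gap
                          F[i][j - 1] - d)        # y_j aligns to a gap
    return F


if __name__ == "__main__":
    F = score_matrix("AGTA", "ATA")
    print(F[4][3])  # 2
```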
The Needleman-Wunsch Matrix [Figure: the score matrix, with x1 … xM along the top and y1 … yN down the side.] • Every nondecreasing path from (0, 0) to (M, N) corresponds to an alignment of the two sequences. • Can think of it as a divide-and-conquer algorithm.
The Needleman-Wunsch Algorithm
• Initialization.
  • $F(0, 0) = 0$
  • $F(0, j) = -jd$
  • $F(i, 0) = -id$
• Main iteration. Filling in partial alignments:
  • For each i = 1 … M, for each j = 1 … N:
$$F(i,j) = \max\begin{cases} F(i-1,j) - d & \text{[case 1]} \\ F(i,j-1) - d & \text{[case 2]} \\ F(i-1,j-1) + s(x_i,y_j) & \text{[case 3]} \end{cases}$$
$$\mathrm{Ptr}(i,j) = \begin{cases} \text{UP} & \text{if [case 1]} \\ \text{LEFT} & \text{if [case 2]} \\ \text{DIAG} & \text{if [case 3]} \end{cases}$$
• Termination. F(M, N) is the optimal score, and tracing back from Ptr(M, N) recovers an optimal alignment.
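Adding the pointer table and the traceback gives a complete global aligner. This is a Python sketch of the steps listed above, not a reference implementation (the function name `needleman_wunsch` is hypothetical). On ties it prefers DIAG, then UP, then LEFT; that order is an arbitrary choice, so other optimal alignments with the same score may exist.

```python
def needleman_wunsch(x: str, y: str, m: int = 1, s: int = 1, d: int = 1) -> tuple[int, str, str]:
    """Global alignment of x and y; returns (F(M, N), aligned x, aligned y)."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]
    ptr = [[""] * (N + 1) for _ in range(M + 1)]
    # Initialization: F(0, 0) = 0, F(i, 0) = -i*d, F(0, j) = -j*d.
    for i in range(1, M + 1):
        F[i][0], ptr[i][0] = -i * d, "UP"
    for j in range(1, N + 1):
        F[0][j], ptr[0][j] = -j * d, "LEFT"
    # Main iteration: keep the best of the three cases and remember which one won.
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sub = m if x[i - 1] == y[j - 1] else -s  # s(x_i, y_j)
            F[i][j], ptr[i][j] = max(
                (F[i - 1][j - 1] + sub, "DIAG"),  # case 3: x_i aligns to y_j
                (F[i - 1][j] - d, "UP"),          # case 1: x_i aligns to a gap
                (F[i][j - 1] - d, "LEFT"),        # case 2: y_j aligns to a gap
                key=lambda cand: cand[0],         # ties keep the first candidate
            )
    # Termination: follow the pointers from (M, N) back to (0, 0).
    ax, ay = [], []
    i, j = M, N
    while i > 0 or j > 0:
        step = ptr[i][j]
        if step == "DIAG":
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif step == "UP":
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:  # "LEFT"
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return F[M][N], "".join(reversed(ax)), "".join(reversed(ay))
```

On the worked example, `needleman_wunsch("AGTA", "ATA")` returns `(2, "AGTA", "A-TA")`. Filling the table touches each of the (M+1)(N+1) cells once, which is where the O(NM) time and space figures come from.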
Performance • Time: O(NM) • Space: O(NM)
The local alignment problem Given two strings x = x1……xM, y = y1……yN Find substrings x’, y’ whose similarity (optimal global alignment value) is maximum e.g. x = aaaacccccgggg y = cccgggaaccaacc
The Smith-Waterman algorithm • Idea: ignore badly aligning regions. • Modifications to Needleman-Wunsch: • Initialization: F(0, j) = F(i, 0) = 0 • Iteration:
$$F(i,j) = \max\begin{cases} 0 \\ F(i-1,j) - d \\ F(i,j-1) - d \\ F(i-1,j-1) + s(x_i,y_j) \end{cases}$$
The Smith-Waterman algorithm • Termination: • If we want the best local alignment: $F_{\mathrm{OPT}} = \max_{i,j} F(i,j)$ • If we want all local alignments scoring > t: for all i, j, find F(i, j) > t, and trace back.
Smith-Waterman Algorithm (Example) • Align S1 = ATCTCGTATGATG with S2 = GTCTATCAC. • Scoring: $s(x_i, y_j) = m$ if $x_i = y_j$, and $-s$ if not; gap penalty d = 1. The matrix values correspond to m = 2 and s = 1.
        A  T  C  T  C  G  T  A  T  G  A  T  G
     0  0  0  0  0  0  0  0  0  0  0  0  0  0
  G  0  0  0  0  0  0  2  1  0  0  2  1  0  2
  T  0  0  2  1  2  1  1  4  3  2  1  1  3  2
  C  0  0  1  4  3  4  3  3  3  2  1  0  2  2
  T  0  0  2  3  6  5  4  5  4  5  4  3  2  1
  A  0  2  1  2  5  5  4  4  7  6  5  6  5  4
  T  0  1  4  3  4  4  4  6  6  9  8  7  8  7
  C  0  0  3  6  5  6  5  5  5  8  8  7  7  7
  A  0  2  2  5  5  5  5  4  7  7  7 10  9  8
  C  0  1  1  4  4  7  6  5  6  6  6  9  9  8
• The highest score is 10. Tracing back from it through the cells 8, 9, 7, 5, 3, 4, 2 gives the local alignment
TCGTATGA
TC-TATCA
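Relative to Needleman-Wunsch, only a few lines change: a 0 in the max, zero boundaries, traceback from the best cell, and a stop at the first zero. This Python sketch (hypothetical function `smith_waterman`) returns only the single best local alignment, not all alignments above a threshold t. Its defaults m = 2, s = 1, d = 1 are an assumption chosen to match the example matrix, with all three given as positive numbers as in the formulas.

```python
def smith_waterman(x: str, y: str, m: int = 2, s: int = 1, d: int = 1) -> tuple[int, str, str]:
    """Best local alignment; returns (F_OPT, aligned part of x, aligned part of y)."""
    M, N = len(x), len(y)
    F = [[0] * (N + 1) for _ in range(M + 1)]  # F(0, j) = F(i, 0) = 0
    best, bi, bj = 0, 0, 0
    for i in range(1, M + 1):
        for j in range(1, N + 1):
            sub = m if x[i - 1] == y[j - 1] else -s  # s(x_i, y_j)
            F[i][j] = max(0,                      # start afresh: drop a bad prefix
                          F[i - 1][j] - d,        # x_i aligns to a gap
                          F[i][j - 1] - d,        # y_j aligns to a gap
                          F[i - 1][j - 1] + sub)  # x_i aligns to y_j
            if F[i][j] > best:                    # F_OPT = max over all cells
                best, bi, bj = F[i][j], i, j
    # Trace back from the best cell until a cell with score 0 is reached.
    ax, ay = [], []
    i, j = bi, bj
    while F[i][j] > 0:
        sub = m if x[i - 1] == y[j - 1] else -s
        if F[i][j] == F[i - 1][j - 1] + sub:
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))
```

On the example, `smith_waterman("ATCTCGTATGATG", "GTCTATCAC")` returns `(10, "TCGTATGA", "TC-TATCA")`.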
An example of Smith-Waterman • Align ATTGC with AGGC using DP. • Scores (added directly to the running total): Match: +1 Gap: -1 Mismatch: 0
An example of Smith-Waterman (cont’d) [Figure: the filled-in scoring matrix for ATTGC and AGGC.] Match: 1 Gap: -1 Mismatch: 0 • Score = 3 matches + 1 mismatch + 1 gap = $3 \cdot 1 + 1 \cdot 0 + 1 \cdot (-1) = 2$
Issues and concerns • Excessive focus on the biometric itself can overshadow weaknesses in the rest of the system. One could: • plant DNA at the scene of a crime • associate another person's identity with one's own biometrics, thereby impersonating that person without arousing suspicion • interfere with the interface between a biometric device and the host system, so that a "fail" message is converted to a "pass".
Identity theft and privacy issues • Two types of privacy concerns: • Informational privacy relates to the unauthorized collection, storage, and use of biometric information. For example, if someone’s iris scan is stolen and used to access their personal information or financial accounts, the damage could be irreversible, because an iris, unlike a password, cannot be replaced. • Personal privacy relates to the inherent discomfort individuals may feel when encountering biometric technology. • Informational privacy is the more critical of the two.
Defining Application-Specific Privacy Risk: The BioPrivacy Impact Framework • Certain types of biometric deployments are more prone than others to lead to privacy-invasive uses, while other types of deployments have little or no bearing on privacy. • Biometrics, in and of themselves, are neither a protector nor an enemy of privacy. • The type of deployment determines the relation between biometrics and privacy.
Biometric Deployments • Overt versus Covert • User awareness and consent • Notices and signs • A covert system cannot permanently store biometric information collected from individuals who do not match watch lists. • Opt-in versus Mandatory • A mandatory system carries greater privacy risks than a voluntary (opt-in) system. • Choice over whether to provide one’s personal information is a central privacy principle.
Biometric Deployments • Verification versus Identification • An identification (1:N) system is more susceptible to privacy-related abuse than a system capable only of 1:1 matching. • Fixed Duration versus Indefinite Duration • When a system is deployed for an indefinite duration, the risk increases. • Public Sector versus Private Sector • Data held in the public sector are more likely to be misused.
Biometric Deployments • Citizen, Employee, Traveler, Student, Customer, Individual • User ownership versus Institutional Ownership of Biometric Data • Personal Storage versus Storage in Template Database
Sociological concerns • Physical concerns: • Some people fear that biometric devices could physically harm the individuals using them, or that the instruments are unsanitary. • Personal information concerns: • Whether personal information collected through biometric methods can be misused, tampered with, or sold, e.g. by criminals who steal, alter, or copy biometric data. • Data obtained through biometrics could be used in unauthorized ways without the individual's consent.
Sociological concerns • Societal fears about biometrics will persist over time. As the public becomes more educated about these practices and the methods are used more widely, these concerns will become more and more evident. • Biometric technology is used at border crossings equipped with electronic readers that can read the chip in a card and verify the information stored on the card and in the passport. • Biometric methods increase the efficiency and accuracy of identifying people at border crossings. CANPASS, run by Canada Customs, is currently used at some major airports, where kiosks take digital pictures of a person’s eye as a means of identification.
Conclusions • Despite these misgivings, biometric systems have the potential to identify individuals with a very high degree of certainty. • Forensic DNA evidence currently enjoys a particularly high degree of public trust. • Substantial claims are also being made for iris recognition technology, which can discriminate between individuals with identical DNA, such as monozygotic twins.