CFTR – gene cloning and initial bioinformatic analysis. Riordan et 12(*) et Tsui (1989) Science 245:1066. Carlow IT Bioinformatics, November 2006. * Including Francis Collins, later leader of the Human Genome Sequencing Project.
Cystic fibrosis • Horrible inherited disease • Affects lungs, pancreas and sweat glands • Abnormally high transmembrane electrical potential • Decreased Cl- ion transport across membranes • Often associated with failure to respond to ATP-dependent kinase • No phosphorylation: no function
More symptoms etc. • Difficulty breathing • Early death (1959: 6 months; 2006: 38 years) • More prone to infections (thicker mucus) • Pre-natal diagnosis or a sweat test is possible • "Woe is the child who tastes salty from a kiss on the brow, for he is cursed, and soon must die" (German proverb, 1700s) • We modify AMPs (defensins): can we make one that is effective in a high-salt environment?
Genetics & epidemiology • Located on chromosome 7q31.2; 180 kb gene • 1 in 25 Europeans carries a CFTR mutation, so 1 in 2500 live births has the disease (1/25 × 1/25 chance that both parents are carriers, × 1/4 chance that their child inherits both mutant copies) • Males and females equally affected • Life expectancy higher in males: nobody knows why • Why so common? • Cholera toxin requires normal CFTR • Also a possible connection with typhoid
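The step from carrier frequency to birth incidence can be checked with a few lines of Python. This is a minimal sketch, assuming random mating and a simple recessive model in which only carrier × carrier couples have affected children; the function name `recessive_incidence` is made up for illustration.

```python
from fractions import Fraction


def recessive_incidence(carrier_freq: Fraction) -> Fraction:
    """Expected birth incidence of a recessive disease.

    Assumptions (not from the slides): random mating, and affected
    children arise only from carrier x carrier couples, each child
    having a 1-in-4 chance of inheriting both mutant copies.
    """
    both_parents_carriers = carrier_freq * carrier_freq
    return both_parents_carriers * Fraction(1, 4)


# 1 in 25 carriers, as quoted on the slide
print(recessive_incidence(Fraction(1, 25)))  # 1/2500
```

Using `Fraction` keeps the result exact, so the 1 in 2500 figure falls straight out of the 1 in 25 carrier rate.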
Mapping • Genetic association with markers pinpoints chromosome 7 • Chromosome walking to zero in • NO genome sequence in those days
Clone and sequence • Why bother? • Because we can! • Can predict features/functions? • Can compare CF v normal to identify the mutation? • Working with cDNA, not genomic DNA • Generate cDNA libraries from cells & cell lines • Screen for cDNAs that hybridise with a known CFTR fragment • Eventually (after much hard work) got 19 overlapping cDNA clones
Fig 1: 19 normal clones, 2 CF clones
Fig 3: where expressed • Patchy expression profile
Gene sequence • Clones span 6.1 kb of RNA • ORF encodes a protein of 1480 amino acids • So bigger than the 300 AA average • In 1989 << 1000 human genes sequenced • Bioinformatic analysis possible then: • Start codon: consensus sequence for translation start + AUG • Secondary structure prediction • Hydropathy plot • Homology searches (pre-BLAST) • Glycosylation and Ser/Thr kinase sites
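A hydropathy plot is just a sliding-window average of per-residue hydrophobicity. The sketch below assumes the Kyte-Doolittle scale and a 19-residue window (a common choice for spotting membrane-spanning stretches); the slides do not say which scale or window was used, and the toy peptide is invented.

```python
# Kyte-Doolittle hydropathy values (assumed scale, not stated in the slides)
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of `window` residues along `seq`."""
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Invented toy peptide: charged ends around a 19-residue hydrophobic core
toy = "DDKEE" + "L" * 19 + "KKDEE"
profile = hydropathy_profile(toy)
peak = max(profile)
print(profile.index(peak))  # 0-based start of the most hydrophobic window
```

Peaks in the profile are candidate transmembrane segments; troughs are hydrophilic regions such as the NBFs and the R region.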
Start of ORF • 5’-AGACCAUGCA-3’ in CFTR • 5’-(CC)[A/G]CCAUGG(G)-3’ consensus • Convinced? • I’m not
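To see where the doubt comes from, the CFTR start context can be compared with the consensus position by position. This is a rough sketch: the position numbering (A of AUG = +1, no position 0) is the usual convention, and reading the slide's consensus into the `CONSENSUS` table, parenthesised bases included, is my assumption.

```python
# Consensus from the slide, keyed by position relative to the A of AUG (+1).
# Parenthesised bases on the slide (-5, -4, +5) are treated like the others
# here, which is a simplifying assumption.
CONSENSUS = {-5: "C", -4: "C", -3: "AG", -2: "C", -1: "C", 4: "G", 5: "G"}


def kozak_matches(rna, aug_index):
    """Return {position: True/False} for each consensus position present."""
    rna = rna.upper().replace("T", "U")
    if rna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at aug_index")
    result = {}
    for pos, allowed in CONSENSUS.items():
        # there is no position 0, so positive positions shift by one
        offset = pos if pos < 0 else pos - 1
        i = aug_index + offset
        if 0 <= i < len(rna):
            result[pos] = rna[i] in allowed
    return result


matches = kozak_matches("AGACCAUGCA", aug_index=5)
print(matches)
print(sum(matches.values()), "of", len(matches), "positions match")
```

Under these assumptions the purine at -3 and the CC at -2/-1 agree with the consensus, while the outer flanks and the G at +4 do not.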
The sequence 1 • Figure annotations: exon splice sites, transcription start, AA count, RNA count, 2 TM domains, predicted kinase sites
The sequence 2 • First ATP-binding fold is underlined • ΔF508 is circled
Protein analysis • The whole protein is two similar halves, each with 6 membrane-spanning domains (hydrophobic peaks) and one nucleotide-binding fold (NBF, a hydrophilic region), giving two NBFs in total, plus a charged R region
ΔF508 • Fig 6: homology/similarity • Conserved, hydrophobic, aromatic position at 508 • Compares two conserved regions in CFTR and other proteins, some with two and some with one similar region: multidrug resistance proteins, transporters etc.
Structure of the fold • The two halves have similar structure but low AA conservation (best is only 27/66 identities) • Others in the family have much tighter conservation • The absence of a signal peptide implies that the orientation of the first TM domain is inside to outside (i – o) • External loops are very short • …except between TM7 and TM8, where there is an N-glycosylation site
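For scale, 27 identities out of 66 aligned positions is about 41%. A minimal sketch of how such a figure is counted from a pairwise alignment follows; skipping gap columns is my assumption about the denominator, and the example strings are invented.

```python
def percent_identity(a, b):
    """Identities, compared columns and % identity for two aligned strings.

    Columns where either sequence has a gap ("-") are skipped; that
    choice of denominator is an assumption, not taken from the paper.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a.upper(), b.upper())
               if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no comparable columns")
    identical = sum(x == y for x, y in columns)
    return identical, len(columns), 100.0 * identical / len(columns)


print(percent_identity("ACDEFG", "ACDQFG"))  # invented toy alignment
print(round(100 * 27 / 66, 1))               # 40.9, the slide's best case
```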
More… • R domain is one exon; 69/241 residues are polar, in alternating +ve and –ve charged regions • It also holds most of the predicted kinase phosphorylation sites • All family members secrete something: • Chloride (CFTR) • Pigment (Drosophila white gene) • Lytic peptide (E. coli hemolysin) • …so what about the “function unknown” mbpX gene in liverwort chloroplasts?
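Predicted glycosylation and kinase sites of this kind come from simple pattern matching on the protein sequence. The sketch below uses two PROSITE-style patterns, N-{P}-[ST]-{P} for N-glycosylation and [RK](2)-x-[ST] for cAMP-dependent kinase sites; whether the 1989 analysis used exactly these patterns is an assumption, and the toy peptide is invented.

```python
import re

# PROSITE-style patterns written as regular expressions (assumed patterns)
MOTIFS = {
    "N-glycosylation": r"N[^P][ST][^P]",
    "PKA phosphorylation": r"[RK]{2}.[ST]",
}


def find_motifs(seq):
    """Return {motif name: [(1-based position, matched residues), ...]}."""
    seq = seq.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # a lookahead lets overlapping sites be reported too
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [(m.start() + 1, m.group(1)) for m in regex.finditer(seq)]
    return hits


toy = "MARRASNGTLKKPSNPSA"  # invented peptide
print(find_motifs(toy))
```

Short patterns like these match by chance quite often, so hits are predictions to be tested, not proof of modification.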
More… • Hypothesise that CFTR is the ion channel • 10/12 of TM domains have >1 +ve AA • i.e. amphipathic helix • cf. brain Na+ channel & GABA-R Cl- channel • Contrast P-glycoprotein • Closely related but no +ve TM AAs • Big protein: maybe also other functions
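Counting positively charged residues per TM segment is easy to reproduce. In this sketch the segment names and sequences are invented, treating only Lys and Arg as positive is an assumption, and the threshold is left as a parameter because the slide's ">1" could be read more than one way.

```python
POSITIVE = frozenset("KR")  # Lys and Arg; His left out as an assumption


def positive_count(segment):
    """Number of positively charged residues in one segment."""
    return sum(aa in POSITIVE for aa in segment.upper())


def charged_segments(segments, minimum=1):
    """Names of segments with at least `minimum` positive residues."""
    return [name for name, seq in segments.items()
            if positive_count(seq) >= minimum]


# Invented segments, not CFTR sequence
toy = {
    "TM_a": "LLIVALLKAGLLIVFFLLAGL",
    "TM_b": "ALLIVGGLLAVVILLAGFFLL",
    "TM_c": "LLRVALLKAGLLIVFFLLAGL",
}
print(charged_segments(toy))
print(charged_segments(toy, minimum=2))
```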
Fig 7: a composite model, showing glycosylation
Conclude • From very little data, and a very small DB to compare with:
Year    N bases           N seqs
1988    23,800,000        20,579
1989    34,762,585        28,791
1990    49,179,285        39,533
2000    11,101,066,288    10,106,023
• one can make predictions about structure and function that have stood the test of time.
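To put the database sizes in perspective, the growth between any two years can be worked out directly from the figures on the slide. A small sketch (the names `DB_SIZE` and `fold_change` are made up for illustration):

```python
# (bases, sequences) per year, copied from the slide
DB_SIZE = {
    1988: (23_800_000, 20_579),
    1989: (34_762_585, 28_791),
    1990: (49_179_285, 39_533),
    2000: (11_101_066_288, 10_106_023),
}


def fold_change(start, end):
    """Fold increase in (bases, sequences) between two years."""
    bases0, seqs0 = DB_SIZE[start]
    bases1, seqs1 = DB_SIZE[end]
    return bases1 / bases0, seqs1 / seqs0


bases_fold, seqs_fold = fold_change(1989, 2000)
print(f"{bases_fold:.0f}-fold more bases, {seqs_fold:.0f}-fold more sequences")
```

Between 1989 and 2000 both measures grew by more than 300-fold, which is the point of the slide: the 1989 predictions were made against a tiny reference set.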
Postscript • ΔF508 may be about delivery of the protein to the membrane • Functions fine if you trick cells into delivering it! • By 1995, 300 different mutations identified in the gene • Last month, 1531 different mutations listed at http://www.genet.sickkids.on.ca/cftr/StatisticsPage.html • With the human genome, SNPs and ESTs, it is much easier to interpret sequence information