Slides for the "Typography and Color Theory for Bioinformatics" session.
IT Basic for Bioinformatics, Spring 2019
Interdisciplinary Program in Bioinformatics, Seoul National University
IT Basic for Bioinformatics, Spring 2019
Typography and Color Theory for Bioinformatics
장혜식, School of Biological Sciences, Seoul National University
Typography for Bioinformatics
Typo graphy
© Melissa Harrison Typo?
Goenner / imgur.com; gembarrett / twitter.com; TheOnlyUsernameLeftThatIsntTaken / imgur.com
Systematic Identification of Factors for Provirus Silencing in Embryonic Stem Cells

Abstract: Embryonic stem cells (ESCs) repress the expression of exogenous proviruses and endogenous retroviruses (ERVs). Here, we systematically dissected the cellular factors involved in provirus repression in embryonic carcinomas (ECs) and ESCs by a genome-wide siRNA screen. Histone chaperones (Chaf1a/b), sumoylation factors (Sumo2/Ube2i/Sae1/Uba2/Senp6), and chromatin modifiers (Trim28/Eset/Atf7ip) are key determinants that establish provirus silencing. RNA-seq analysis uncovered the roles of Chaf1a/b and sumoylation modifiers in the repression of ERVs. ChIP-seq analysis demonstrates direct recruitment of Chaf1a and Sumo2 to ERVs. Chaf1a reinforces transcriptional repression via its interaction with members of the NuRD complex (Kdm1a, Hdac1/2) and Eset, while Sumo2 orchestrates the provirus repressive function of the canonical Zfp809/Trim28/Eset machinery by sumoylation of Trim28. Our study reports a genome-wide atlas of functional nodes that mediate proviral silencing in ESCs and illuminates the comprehensive, interconnected, and multi-layered genetic and epigenetic mechanisms by which ESCs repress retroviruses within the genome.

A situation commonly seen in domestic conference abstract books (the content is reproduced by quoting an unrelated abstract).

The expression of proviruses and endogenous retroviruses (ERVs) is restricted in pluripotent stem cells (Feuer et al., 1989; Niwa et al., 1983; Teich et al., 1977). This silencing has likely evolved for the protection of germline cells from insertional mutagenesis (Gaudet et al., 2004; Walsh et al., 1998). The expression and DNA methylation profiles of the Moloney murine leukemia virus (MMLV) have been investigated in embryonic carcinoma
The old PLOS layout, which ranked near the top for being hard on the eyes.

The Contribution of RNA Decay Quantitative Trait Loci to Inter-Individual Variation in Steady-State Gene Expression Levels

Athma A. Pai¹*, Carolyn E. Cain¹, Orna Mizrahi-Man¹, Sherryl De Leon², Noah Lewellen², Jean-Baptiste Veyrieras¹,³, Jacob F. Degner¹,⁴, Daniel J. Gaffney¹,², Joseph K. Pickrell¹, Matthew Stephens¹,⁵, Jonathan K. Pritchard¹,³*, Yoav Gilad¹*

¹Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America, ²Howard Hughes Medical Institute, University of Chicago, Chicago, Illinois, United States of America, ³BioMiningLabs, Lyon, France, ⁴Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America, ⁵Department of Statistics, University of Chicago, Chicago, Illinois, United States of America

Abstract
Recent gene expression QTL (eQTL) mapping studies have provided considerable insight into the genetic basis for inter-individual regulatory variation. However, a limitation of all eQTL studies to date, which have used measurements of steady-state gene expression levels, is the inability to directly distinguish between variation in transcription and decay rates. To address this gap, we performed a genome-wide study of variation in gene-specific mRNA decay rates across individuals. Using a time-course study design, we estimated mRNA decay rates for over 16,000 genes in 70 Yoruban HapMap lymphoblastoid cell lines (LCLs), for which extensive genotyping data are available. Considering mRNA decay rates across genes, we found that: (i) as expected, highly expressed genes are generally associated with lower mRNA decay rates, (ii) genes with rapid mRNA decay rates are enriched with putative binding sites for miRNA and RNA binding proteins, and (iii) genes with similar functional roles tend to exhibit correlated rates of mRNA decay. Focusing on variation in mRNA decay across individuals, we estimate that steady-state expression levels are significantly correlated with variation in decay rates in 10% of genes. Somewhat counter-intuitively, for about half of these genes, higher expression is associated with faster decay rates, possibly due to a coupling of mRNA decay with transcriptional processes in genes involved in rapid cellular responses. Finally, we used these data to map genetic variation that is specifically associated with variation in mRNA decay rates across individuals. We found 195 such loci, which we named RNA decay quantitative trait loci (“rdQTLs”). All the observed rdQTLs are located near the regulated genes and therefore are assumed to act in cis. By analyzing our data within the context of known steady-state eQTLs, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.

Citation: Pai AA, Cain CE, Mizrahi-Man O, De Leon S, Lewellen N, et al. (2012) The Contribution of RNA Decay Quantitative Trait Loci to Inter-Individual Variation in Steady-State Gene Expression Levels. PLoS Genet 8(10): e1003000. doi:10.1371/journal.pgen.1003000
Editor: Greg Gibson, Georgia Institute of Technology, United States of America
Received April 27, 2012; Accepted August 14, 2012; Published October 11, 2012
Copyright: © 2012 Pai et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by NIH grant HG006123 to Y Gilad, Howard Hughes Medical Institute funds to JK Pritchard, an American Heart Association pre-doctoral fellowship to AA Pai, and an NIH Genetics and Regulation Training grant (AA Pai and JF Degner). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
* E-mail: athma@uchicago.edu (AA Pai); pritch@uchicago.edu (JK Pritchard); gilad@uchicago.edu (Y Gilad)

Introduction
Substantial variation in gene expression levels exists in natural populations [1–5]. Over the past decade, we have learned that much of this inter-individual regulatory variation is associated with specific genetic polymorphisms, which can be identified by mapping expression quantitative trait loci (eQTLs) [6–10]. Expression QTL mapping studies in different organisms have led to important insights into the genetic basis for gene regulation and, in a number of cases, into the mechanistic basis for complex phenotypes. In particular, recent eQTL mapping studies in humans have identified thousands of genetic variants affecting gene expression levels [11–14], some of which are loci that are also associated with complex diseases [15–18]. Nearly all human eQTLs, regardless of the tissue in which they were found, have been identified near the regulated genes and hence are assumed to act in cis. A partial explanation for the relatively small number of trans eQTLs that have been identified is the low power to map such loci compared to cis acting eQTLs (due to the stringent significance criteria required to avoid false positives when mapping across the entire genome, and generally small effect sizes of trans-QTLs [8,19–25]).

Despite the recent success in mapping gene expression phenotypes, we still know little about the specific regulatory mechanisms that underlie eQTLs [26–29]. Partly, this gap is being addressed by a growing number of large-scale mapping studies of inter-individual variation in genetic and epigenetic regulatory mechanisms (which complement studies of gene expression variation [13,30–34]). Yet, even by incorporating such studies, the processes underlying regulatory variation and their relative importance remain difficult to infer, because all eQTL studies to date, regardless of the model system or species, have relied on measures of steady-state gene expression levels.

PLOS Genetics | www.plosgenetics.org 1 October 2012 | Volume 8 | Issue 10 | e1003000

Inter-Individual Variation in RNA Decay

time points (see Methods for more details). We excluded from all further analyses genes that were not detected as expressed even before the arrest of transcription (time point zero) in at least 80% of individuals (see Methods). Overall, we obtained individual-specific estimates of mRNA decay rates for 16,823 Ensembl genes (see Table S1).

Characterization of genome-wide decay rates
As a first step of our analysis, we characterized the genome-wide distribution of mRNA decay rates. To do so, for each gene we used the median decay rate across individuals as a measure of the gene-specific mRNA decay rate. We observed a wide range of mRNA decay rates across genes (Figure 1A), consistent with findings of previous studies. We also observed a substantial amount of variation in decay rates across individuals within each gene (Figure 1B), consistent with expectations from previous studies in human cells [1,35,40]. We classified genes as either consistently slow or fast decaying when their decay rates in at least 80% of individuals in our study were classified as slow or fast relative to the individual-mean decay rate (see Methods). We thus identified 146 genes that consistently decayed slower than average across individuals and 716 genes that consistently decayed faster than average.

In agreement with previous observations, we found that genes with related biological functions often decayed at similar rates [1,52,52]. Genes with slower decay rates tend to be involved in cellular and organelle-related housekeeping processes, such as cytoplasmic and mitochondrial processes (Table S2). Genes with faster decay rates are enriched for gene regulatory functions that might require rapid mRNA decay to ensure rapid turnover of expression levels in response to changing cellular conditions (Table S3). This includes enrichments for functional annotations such as metabolic processes, regulation of gene expression, and regulation of transcription.

We next investigated possible mechanisms that could account for variation in mRNA decay rates across genes. Previous studies have suggested that increased transcript length [3,41], and specifically 3′ UTR length [1,3], might significantly influence mRNA decay rates. Indeed, we find that both are slightly but significantly positively correlated with decay rates across genes (Spearman r=0.17, P<10⁻¹⁶ for gene length and Spearman r=0.09, P<10⁻¹⁶ for 3′ UTR length). This association is also evident when we limit this analysis only to genes classified as decaying slower or faster than the mean decay rate (Figure 2A; Figure S4; Spearman r=0.15; P<10⁻¹⁶ for gene length and Spearman r=0.09; P<10⁻⁸ for 3′ UTR length). The increased 3′ UTR length in faster decaying genes is thought to indicate an increase in potential regulatory space that could harbor RNA-decay regulatory elements (reviewed in [6]).

Studies of mRNA decay of individual genes have previously identified two main classes of cis regulatory elements that might play roles in decay processes: microRNA (miRNA) binding sites [11] and AU-rich elements [15,17]. To determine the possible influence of miRNA binding on decay rates in the LCLs, we curated several miRNA databases [19,20,22–25] to create a list of confident miRNA target binding sites (see Methods S1). To account for the confounding effect of transcript length (more binding sites in longer 3′ UTRs), we standardized the number of miRNA target binding sites by the 3′ UTR length (see Methods). Using this approach, we found a slightly positive correlation between the density of miRNA target sites and decay rates. Again, when we focused exclusively on the genes classified as decaying slower or faster than the mean decay rate, we observed a stronger association (Figure 2B, Spearman r=0.16; P<0.003).

We then considered the presence of AU-rich elements (AREs) in slower versus faster decaying genes. To do so, we used the AREScore algorithm [26], which searches within 3′ UTRs for features associated with typical type-II AREs, to assign an AREScore to each gene. A larger AREScore essentially implies increased potential for binding by an ARE-recognizing RNA binding protein to regulate the decay processes of the gene. We found that there is a significantly increased median AREScore in faster decaying genes compared to slower decaying genes (Figure 2C, Spearman r=0.14; P<10⁻¹⁶).

As our findings support the general notion that cis regulatory elements, such as miRNA bindings sites or AU-rich elements, are important determinants of mRNA decay rates, we next searched for additional sequence motifs that might represent novel binding sites for specific decay factors in LCLs. To do so, we used the FIRE algorithm [30] to search for motifs in the 146 slow decaying genes and 716 fast decaying genes. We identified three

Figure 1. Profiles of decay rates. A. Distribution of genome-wide decay profiles across the timecourse experiment (x-axis), where each decay curve shows the decrease in gene expression level (y-axis) relative to the untreated time point. Each line represents the gene-specific median decay profile, while the darkness of the lines indicates the number of genes sharing that decay profile (darker indicates more genes). B. Representative examples of individual-specific decay profiles (dotted lines) for two genes: NFKBIE (in red), which decays faster than average and DCTN2 (in blue), which decays slower than average. Solid lines indicate the gene-specific median decay profile across all 70 individuals. doi:10.1371/journal.pgen.1003000.g001
Typesetting that shows the touch of a professional: BMJ

RESEARCH

Longitudinal histories as predictors of future diagnoses of domestic abuse: modelling study

Ben Y Reis, assistant professor,¹,² Isaac S Kohane, professor,¹,² Kenneth D Mandl, associate professor¹,²

¹Children’s Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Children’s Hospital Boston, Boston, MA, USA
²Harvard Medical School, Boston, MA
Correspondence to: B Y Reis, 1 Autumn St, Room 540.1, Boston, MA 02115 Ben_Reis@harvard.edu

ABSTRACT
Objective To determine whether longitudinal data in patients’ historical records, commonly available in electronic health record systems, can be used to predict a patient’s future risk of receiving a diagnosis of domestic abuse.
Design Bayesian models, known as intelligent histories, used to predict a patient’s risk of receiving a future diagnosis of abuse, based on the patient’s diagnostic history. Retrospective evaluation of the model’s predictions using an independent testing set.
Setting A state-wide claims database covering six years of inpatient admissions to hospital, admissions for observation, and encounters in emergency departments.
Population All patients aged over 18 who had at least four years between their earliest and latest visits recorded in the database (561216 patients).
Main outcome measures Timeliness of detection, sensitivity, specificity, positive predictive values, and area under the ROC curve.
Results 1.04% (5829) of the patients met the narrow case definition for abuse, while 3.44% (19303) met the broader case definition for abuse. The model achieved sensitive, specific (area under the ROC curve of 0.88), and early (10-30 months in advance, on average) prediction of patients’ future risk of receiving a diagnosis of abuse. Analysis of model parameters showed important differences between sexes in the risks associated with certain diagnoses.
Conclusions Commonly available longitudinal diagnostic data can be useful for predicting a patient’s future risk of receiving a diagnosis of abuse. This modelling approach could serve as the basis for an early warning system to help doctors identify high risk patients for further screening.

Cite this as: BMJ 2009;339:b3677 doi:10.1136/bmj.b3677

INTRODUCTION
Despite the critical importance of historical data in medical decision making¹⁻³ and the growing amount of longitudinal data available in electronic health record systems, clinicians often do not have the time or the resources to reliably access, absorb, and review all the information available to them during brief consultations.⁴⁻⁷ Even with unlimited time and resources, assimilating all available information is a difficult task. Furthermore, Bodenheimer et al describe the “tyranny of the urgent”, where the brief patient-doctor visit allows time to deal with only acute situations, rather than optimise long term care.⁸ As a result, much of the electronic health information might not be properly interpreted, used, or even accessed, leading to potential missed diagnoses of certain clinical conditions.

One such condition is domestic abuse,⁹⁻¹¹ which is often difficult to diagnose from a single encounter and might go unrecognised for long periods of time as it is masked by acute conditions that form the basis of clinical visits.¹¹⁻¹⁴ Typically, after a diagnosis of abuse is made, a retrospective review of the longitudinal record reveals a discernable pattern of diagnoses suggestive of abuse.

Domestic abuse is the most common cause of non-fatal injury to women in the United States⁹ and accounts for more than half the murders of women every year.¹⁵ It affects women and men and involves up to 16% of US couples a year,¹⁶ with estimates of lifetime prevalence as high as 54%¹¹ and lifetime risk of injury as high as 22%.⁹ As undetected abuse can result in serious injury and fatality, it is critical that those at risk should be identified as early as possible.¹² ¹⁷ ¹⁸

Studies have shown that screening for domestic abuse, along with appropriate follow-up,¹⁴ ¹⁹ ²⁰ can be beneficial for early detection, treatment, and prevention of future violence, and carries few if any adverse effects.¹² ¹⁴ ¹⁷ ²¹ ²² For example, one study used screening to identify 528 women as victims of intimate partner violence, of whom 443 (84%) agreed to speak to an advocate, 234 (54%) accepted case management follow-up, and 115 (49%) reported that they no longer believed they were at risk of violence from their abuser three to six weeks later.²² Studies have also shown that both abused and non-abused patients favour routine screening.¹² ²³ ²⁴ As a result, the American Medical Association and the Joint Commission on Accreditation of Healthcare Organizations (JCAHO) have recommended routine screening for domestic abuse in the healthcare setting.¹¹ ¹⁴ ²⁵ A recent report from the BMA (British Medical Association) urged doctors and healthcare professionals to be more vigilant for signs of domestic abuse.²⁶ Even though some do not call for universal screening,²⁷ many still emphasise the importance of identifying and screening high risk patients.²⁸

Screening for domestic abuse is particularly important in the emergency department, where victims are

BMJ | ONLINE FIRST | bmj.com page 1 of 9
RESEARCH

Figure 1 shows the sensitivity versus the false alarm rate. Table 3 shows the performance achieved by the models at different benchmark specificities with the narrower and broader case definitions. As expected, the relatively low prevalence of the abuse diagnosis as a percentage of all patients in the dataset resulted in a low positive predictive value, depending on the chosen level of specificity. The positive predictive value was higher for the broader case definition, where cases were relatively more common.

The model could detect high levels of risk of abuse far in advance of the first diagnosis of abuse recorded in the system (fig 2). The model detected risk of abuse an average of 10-30 months in advance, depending on the chosen level of specificity.

Table 3 | Performance of intelligent histories models using narrower case definition of abuse and broader case definition of abuse, assault, or intentional injury

Sensitivity (%)   Specificity (%)   PPV (%)   Mean days from detection to first abuse diagnosis
Narrow case definition
 1.8              99.9              14.4      280
 3.5              99.8              14.3      331
 3.9              99.75             13.0      350
 6.5              99.5              10.9      390
10.3              99.0               8.9      459
17.5              98.0               7.6      501
21.1              97.5               7.4      523
35.5              95.0               6.3      613
50.8              92.5               6.0      661
64.2              90.0               5.7      749
82.6              85.0               4.9      890
87.3              80.0               4.0      898
Broad case definition
 0.7              99.9              18.9      382
 1.4              99.8              18.6      364
 1.7              99.75             17.6      398
 2.8              99.5              15.0      421
 5.5              99.0              14.8      435
 9.6              98.0              13.0      501
11.5              97.5              12.6      509
20.9              95.0              11.6      564
29.2              92.5              10.9      585
37.3              90.0              10.5      620
51.2              85.0               9.7      696
64.7              80.0               9.2      775
PPV=positive predictive value.

[Plot: early detection horizon (days) against false alarm rate, for the narrower and broader case definitions]
Fig 2 | Average time in days (with 95% confidence intervals) from initial detection of high risk of abuse to first diagnosis of abuse recorded in dataset, measured for both narrow and broad case definitions. Plot includes detected abuse cases only. Model detects risk an average of 10-30 months in advance of first recorded diagnosis, depending on desired levels of specificity (shown on log scale for clarity). At high levels of specificity, fewer cases are detected, resulting in larger confidence intervals

Model composition
Examination of the internal parameters of the model showed interesting findings. Firstly, we examined the effects of frequency of visits. As described above, each range of average number of visits a year was assigned a partial risk score. Figure 3 shows that partial risk score rises with the average number of visits a year. An increase in the number of visits would therefore increase a patient’s overall abuse score. The effect seems slightly stronger (steeper slope) among women than among men.

Next, we examined the risks associated with different categories of illness. Figure 4 shows the distribution of partial risk scores in each of 12 general clinical categories. (For visualisation purposes, the diagnoses were grouped into 12 general clinical categories, based on the clinical classification software (CCS)⁴² published by the Agency for Healthcare Research and Quality (see table A on bmj.com). These categories were used for visualisation. For modelling, each ICD-9 code was treated individually.) The category related to psychological and mental health had the highest average risk score distribution overall, followed by the injury category.

We also examined sex based differences in risk profiles. Figure 5 shows a “treemap”⁴³ visualisation of the model for women and men. (Again, for purposes of visualisation, individual ICD-9 codes were grouped into CCS-level 2 diagnostic categories.⁴²) The size of the rectangle for each diagnostic category indicates the prevalence in the abused population. The colour of each region indicates a continuous range of associated partial risk scores (from white = lowest to dark red = highest) for the category as a whole. Several interesting trends became evident when we compared the risks for certain diagnostic categories between the two sexes (table 4). While more abused men have alcohol related disorders, alcohol related disorders are more predictive of abuse in women than they are in men. Similarly, poisoning and injuries due to external causes are more predictive of abuse in women than they are in men. On the other hand, affective disorders, psychoses, and other mental conditions are more predictive of abuse in men than they are in women.

Prototype visualisation
We also took the first steps towards describing how these models might form the basis of an early warning system to help doctors identify high risk patients for further screening. Figure 6 shows two sample
Each line represents the gene-specific median decay profile, while the darkness of the lines indicates the number of genes sharing that decay profile (darker indicates more genes). B. Representative examples of individual-specific decay profiles (dotted lines) for two genes: NFKBIE (in red), which decays faster than average and DCTN2 (in blue), which decays slower than average. Solid lines indicate the gene-specific median decay profile across all 70 individuals. doi:10.1371/journal.pgen.1003000.g001 known steady-state eQTLs, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates. page 4 of 9 BMJ | ONLINE FIRST | bmj.com PLOS Genetics | www.plosgenetics.org 3 October 2012 | Volume 8 | Issue 10 | e1003000 Citation: Pai AA, Cain CE, Mizrahi-Man O, De Leon S, Lewellen N, et al. (2012) The Contribution of RNA Decay Quantitative Trait Loci to Inter-Individual Variation in Steady-State Gene Expression Levels. PLoS Genet 8(10): e1003000. doi:10.1371/journal.pgen.1003000 Editor: Greg Gibson, Georgia Institute of Technology, United States of America Received April 27, 2012; Accepted August 14, 2012; Published October 11, 2012 Copyright: ! 2012 Pai et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported by NIH grant HG006123 to Y Gilad, Howard Hughes Medical Institute funds to JK Pritchard, an American Heart Association pre-doctoral fellowship to AA Pai, and an NIH Genetics and Regulation Training grant (AA Pai and JF Degner). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. 
* E-mail: athma@uchicago.edu (AA Pai); pritch@uchicago.edu (JK Pritchard); gilad@uchicago.edu (Y Gilad) trans eQTLs that have been identified is the low power to map such loci compared to cis acting eQTLs (due to the stringent significance criteria required to avoid false positives when mapping across the entire genome, and generally small effect sizes of trans- QTLs [8,19–25]). Despite the recent success in mapping gene expression phenotypes, we still know little about the specific regulatory mechanisms that underlie eQTLs [26–29]. Partly, this gap is being addressed by a growing number of large-scale mapping studies of inter-individual variation in genetic and epigenetic regulatory mechanisms (which complement studies of gene expression variation [13,30–34]). Yet, even by incorporating such studies, the processes underlying regulatory variation and their relative importance remain difficult to infer, because all eQTL studies to date – regardless of the model system or species - have relied on measures of steady-state gene expression levels. Introduction Substantial variation in gene expression levels exists in natural populations [1–5]. Over the past decade, we have learned that much of this inter-individual regulatory variation is associated with specific genetic polymorphisms, which can be identified by mapping expression quantitative trait loci (eQTLs) [6–10]. Expression QTL mapping studies in different organisms have led to important insights into the genetic basis for gene regulation and, in a number of cases, into the mechanistic basis for complex phenotypes. In particular, recent eQTL mapping studies in humans have identified thousands of genetic variants affecting gene expression levels [11–14], some of which are loci that are also associated with complex diseases [15–18]. Nearly all human eQTLs, regardless of the tissue in which they were found, have been identified near the regulated genes and hence are assumed to act in cis. 
A partial explanation for the relatively small number of PLOS Genetics | www.plosgenetics.org 1 October 2012 | Volume 8 | Issue 10 | e1003000
zety.com (A good résumé layout should be adjusted to the preferences of both the person writing it and the person reading it.)
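These days, journals increasingly encourage manuscripts that are typeset in advance so that reviewers can read them comfortably.
Before: Text separately, Figures separately, Figure Legends separately, Suppl. Figures separately, Suppl. Figure Legends separately…
Now: Typeset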
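Typography: why should you know about it?
Question | Me | Readers/audience
Very interested? | Extremely interested | Just dropped by
Ready to pay close attention? | Working really hard at it | Brought a smartphone and a freshly printed paper
More interested in gossip about other people? | My own words matter more | All ears for the gossip
Wants the presenter (me) to do well? | Sincerely | Could not care less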
(Image that cannot be redistributed) Good typography?
RESEARCH

Table 3 | Performance of intelligent histories models using narrower case definition of abuse and broader case definition of abuse, assault, or intentional injury

Sensitivity (%)   Specificity (%)   PPV (%)   Mean days from detection to first abuse diagnosis
Narrow case definition
 1.8              99.9              14.4      280
 3.5              99.8              14.3      331
 3.9              99.75             13.0      350
 6.5              99.5              10.9      390
10.3              99.0               8.9      459
17.5              98.0               7.6      501
21.1              97.5               7.4      523
35.5              95.0               6.3      613
50.8              92.5               6.0      661
64.2              90.0               5.7      749
82.6              85.0               4.9      890
87.3              80.0               4.0      898
Broad case definition
 0.7              99.9              18.9      382
 1.4              99.8              18.6      364
 1.7              99.75             17.6      398
 2.8              99.5              15.0      421
 5.5              99.0              14.8      435
 9.6              98.0              13.0      501
11.5              97.5              12.6      509
20.9              95.0              11.6      564
29.2              92.5              10.9      585
37.3              90.0              10.5      620
51.2              85.0               9.7      696
64.7              80.0               9.2      775
PPV=positive predictive value.

Figure 1 shows the sensitivity versus the false alarm rate. Table 3 shows the performance achieved by the models at different benchmark specificities with the narrower and broader case definitions. As expected, the relatively low prevalence of the abuse diagnosis as a percentage of all patients in the dataset resulted in a low positive predictive value, depending on the chosen level of specificity. The positive predictive value was higher for the broader case definition, where cases were relatively more common.
The model could detect high levels of risk of abuse far in advance of the first diagnosis of abuse recorded in the system (fig 2). The model detected risk of abuse an average of 10-30 months in advance, depending on the chosen level of specificity.

[Fig 2 plot: y-axis, early detection horizon (days), 0 to 1000; x-axis, false alarm rate, 0.001 to 0.20 on a log scale; series: narrower case definition, broader case definition]
Fig 2 | Average time in days (with 95% confidence intervals) from initial detection of high risk of abuse to first diagnosis of abuse recorded in dataset, measured for both narrow and broad case definitions. Plot includes detected abuse cases only. Model detects risk an average of 10-30 months in advance of first recorded diagnosis, depending on desired levels of specificity (shown on log scale for clarity). At high levels of specificity, fewer cases are detected, resulting in larger confidence intervals.

Model composition
Examination of the internal parameters of the model showed interesting findings. Firstly, we examined the effects of frequency of visits. As described above, each range of average number of visits a year was assigned a partial risk score. Figure 3 shows that partial risk score rises with the average number of visits a year. An increase in the number of visits would therefore increase a patient’s overall abuse score. The effect seems slightly stronger (steeper slope) among women than among men.
Next, we examined the risks associated with different categories of illness. Figure 4 shows the distribution of partial risk scores in each of 12 general clinical categories. (For visualisation purposes, the diagnoses were grouped into 12 general clinical categories, based on the clinical classification software (CCS)⁴² published by the Agency for Healthcare Research and Quality (see table A on bmj.com). These categories were used for visualisation. For modelling, each ICD-9 code was treated individually.) The category related to psychological and mental health had the highest average risk score distribution overall, followed by the injury category.
We also examined sex based differences in risk profiles. Figure 5 shows a “treemap”⁴³ visualisation of the model for women and men. (Again, for purposes of visualisation, individual ICD-9 codes were grouped into CCS-level 2 diagnostic categories.⁴²) The size of the rectangle for each diagnostic category indicates the prevalence in the abused population. The colour of each region indicates a continuous range of associated partial risk scores (from white = lowest to dark red = highest) for the category as a whole. Several interesting trends became evident when we compared the risks for certain diagnostic categories between the two sexes (table 4). While more abused men have alcohol related disorders, alcohol related disorders are more predictive of abuse in women than they are in men. Similarly, poisoning and injuries due to external causes are more predictive of abuse in women than they are in men. On the other hand, affective disorders, psychoses, and other mental conditions are more predictive of abuse in men than they are in women.

Prototype visualisation
We also took the first steps towards describing how these models might form the basis of an early warning system to help doctors identify high risk patients for further screening. Figure 6 shows two sample

page 4 of 9 BMJ | ONLINE FIRST | bmj.com
Components of a Latin typeface
Sample word: Sphinx
Parts: ascender, serif, descender
Guide lines: ascender line, cap line, x-line, baseline, descender line
Measure: x-height
Sans-serif (sample: "there"): As the name suggests, a font without overhanging serifs.
Serif (sample: "there"): A serif font is one that has foot-like edges that extend beyond the bottom or edges of the characters.
Slab serif (sample: "there"): A slab serif font is a category of serifed fonts characterized by serifs with angular transitions.
Serif or sans-serif? Baskerville Garamond Georgia Helvetica Futura Impact Myriad Tahoma Rockwell Times New Roman
What is leading?
Sample words on two consecutive lines: Sphinx, Luckily
Guide lines on each line: ascender line, cap line, x-line, baseline, descender line
Label between the two lines: leading (line spacing)
Pronounced "ledding", not "leeding": strips of lead were inserted between the lines of type. © Urban Cottage Industries
How big is a “12-point” font?
1 point = 1/72 inch = 0.3528 mm
Sample word: Sphinx (ascender line, cap line, x-line, x-height, baseline, descender line)
So from where to where is the 0.3528 × 12 mm supposed to be measured?
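The conversion on this slide can be checked with a few lines of Python. This is only a minimal sketch: it assumes the 1/72-inch point used on the slide, and the function name `points_to_mm` is made up for illustration. It says nothing about which part of a glyph that length corresponds to, which is exactly the question the slide raises.

```python
# Convert typographic points (1 pt = 1/72 inch, as on the slide) to millimetres.
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72


def points_to_mm(points: float) -> float:
    """Return the length in millimetres of a size given in points."""
    return points * MM_PER_INCH / POINTS_PER_INCH


if __name__ == "__main__":
    # 1 pt, the 12 pt example from this slide, and a 14 pt size for comparison.
    for size in (1, 12, 14):
        print(f"{size:>2} pt = {points_to_mm(size):.4f} mm")
```

A 12-point font therefore has a nominal size of about 4.23 mm, but that figure is the size of the em, not the height of any particular letter.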
Ballot text was required to be set in 14-point bold type, but the printed text looked smaller than 14 points, and the dispute went all the way to the Supreme Court.
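point size = 1 em
In the past: 1 em was the height of the capital “M”.
Today: the reference differs from font to font.
Samples: Perpetua.py, Calisto.py
This much is the point size!
Sample word: Sphinx (ascender line, cap line, x-line, x-height, baseline, descender line)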
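Why two fonts set at the same point size can look different in size can be sketched numerically. Digital fonts describe glyphs in design units, with a fixed number of units per em (1000 and 2048 are common values), and the point size only fixes how large 1 em is. The metric values below are hypothetical and are not measured from Perpetua, Calisto, or any real font. The class and function names are also made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontMetrics:
    """Vertical metrics in font design units (all values here are hypothetical)."""
    name: str
    units_per_em: int  # number of design units that make up 1 em
    cap_height: int    # distance from the baseline to the cap line
    x_height: int      # distance from the baseline to the x-line


def units_to_points(metrics: FontMetrics, units: int, size_pt: float) -> float:
    """Scale a length in design units to points, given that 1 em == size_pt."""
    return size_pt * units / metrics.units_per_em


# Hypothetical numbers chosen only to show the effect.
FONT_A = FontMetrics("Font A", units_per_em=2048, cap_height=1300, x_height=800)
FONT_B = FontMetrics("Font B", units_per_em=1000, cap_height=700, x_height=500)

if __name__ == "__main__":
    size_pt = 12.0
    for font in (FONT_A, FONT_B):
        cap = units_to_points(font, font.cap_height, size_pt)
        xh = units_to_points(font, font.x_height, size_pt)
        print(f"{font.name} at {size_pt:g} pt: cap height {cap:.2f} pt, x-height {xh:.2f} pt")
```

With these made-up metrics, both fonts are "12 pt", yet the x-height comes out at roughly 4.7 pt for one and 6 pt for the other, so the second looks noticeably larger on the page.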
"Line spacing 120%" differs from font to font (and even between versions of the same font). Sample text, set once in Perpetua 20pt, 100% and once in Calisto 20pt, 100%: "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum." So a font's point size is roughly somewhere in this range… (…) Sphinx: ascender line, cap line, x-line, x-height, baseline, descender line. Which span, from where to where, is supposed to measure 0.3528 × 12 mm?
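One plausible reason the same percentage gives different results is that layout software usually derives its "100%" line height from each font's own vertical metrics rather than from the point size alone. The sketch below illustrates that idea under stated assumptions: the `VerticalMetrics` class, the `line_height_pt` function, and all metric values are hypothetical and are not the real metrics of Perpetua or Calisto.

```python
from dataclasses import dataclass


@dataclass
class VerticalMetrics:
    """Vertical metrics in font design units (hypothetical values)."""
    units_per_em: int
    ascent: int
    descent: int   # positive distance below the baseline
    line_gap: int


def line_height_pt(metrics: VerticalMetrics, size_pt: float, spacing: float = 1.0) -> float:
    """Baseline-to-baseline distance in points for a spacing multiplier."""
    natural = (metrics.ascent + metrics.descent + metrics.line_gap) / metrics.units_per_em
    return natural * size_pt * spacing


# Two made-up fonts with different built-in metrics (assumption for illustration).
font_a = VerticalMetrics(units_per_em=2048, ascent=1600, descent=500, line_gap=0)
font_b = VerticalMetrics(units_per_em=2048, ascent=1900, descent=600, line_gap=100)

if __name__ == "__main__":
    for name, m in [("font A", font_a), ("font B", font_b)]:
        height = line_height_pt(m, size_pt=20, spacing=1.2)
        print(f"{name}: 20 pt at 120% -> {height:.2f} pt between baselines")
```

With these made-up numbers, the two fonts end up several points apart at the same nominal setting, which is the kind of discrepancy the slide describes.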
How much line spacing is appropriate? Sample text, set in Avenir, Garamond, Georgia, and Helvetica Neue: "Some typefaces, like Bernhard Modern, have shorter x-heights. This gives them the appearance of using less space. On the other hand, typefaces like Helvetica Neue have taller x-heights that can make words appear more crowded. Typefaces with taller x-heights or longer ascenders and descenders often need more leading to make them more legible. This is [typeface name]." All samples are 20pt, 120% ➔ Decide by eye.
Text is easier to read when the word shapes rise and fall. The quick brown fox jumps over the lazy dog. Avoid setting text in all capitals where possible.
Kerning EGoatMan / imgur.com
Kerning: ADVANTAGE (kerning off) vs. ADVANTAGE (kerning on)
Kerning: Tree Three Lysine Dyson Gasoline Gawky
An example of kerning that anyone can see is bad: Batang. Tree Three Lysine Dyson Gasoline Gawky
Kerning practice! https://type.method.ac/
Fonts with poor built-in kerning tables: Arial, Batang, Comic Sans, Gulim. Monospaced fonts cannot be kerned, so they are unsuitable for body text: Monaco, Courier.
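To make the idea of a built-in kerning table concrete, here is a minimal sketch of how pair adjustments change the width of a word. The advance widths and kerning values in `ADVANCE` and `KERN_PAIRS` are hypothetical numbers chosen for illustration, not the data of any real font, and `text_width` is a name introduced here.

```python
# Hypothetical advance widths and kerning pairs, in 1/1000 em.
ADVANCE = {"A": 667, "V": 667, "T": 611, "o": 556}
KERN_PAIRS = {("A", "V"): -74, ("V", "A"): -74, ("T", "o"): -111}


def text_width(text: str, kerning: bool = True) -> int:
    """Sum the advance widths, optionally adding pair adjustments."""
    width = sum(ADVANCE[ch] for ch in text)
    if kerning:
        # Look up every adjacent pair; pairs missing from the table add 0.
        width += sum(KERN_PAIRS.get(pair, 0) for pair in zip(text, text[1:]))
    return width


if __name__ == "__main__":
    for word in ("AVA", "To"):
        print(word, text_width(word, kerning=False), text_width(word, kerning=True))
```

A font whose table lacks entries for common pairs behaves like the `kerning=False` case for those pairs. In a monospaced font every advance width is the same by design, so pair adjustments would break the fixed grid.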
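Why you should avoid the "defaults": Defaults are often optimized for speed or broad compatibility rather than appearance (Arial, Comic Sans, Times New Roman). Defaults are overused; in particular, because they are what people who pay no attention to typography use, they can subtly give a similar impression. There is no guarantee that a default is the best fit for the message you want to convey. There are plenty of good fonts.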
Good alternatives to Arial (each shown with the sample "1247RG"): Arial 1247RG, Helvetica 1247RG, Roboto 1247RG, San Francisco 1247RG, Inter 1247RG
Ligatures (image cannot be redistributed)
Tracking. 500: L U C Y I N T H E S K Y W I T H D I A M O N D S. 100: LUCY IN THE SKY WITH DIAMONDS. 0: LUCY IN THE SKY WITH DIAMONDS. −50: LUCY IN THE SKY WITH DIAMONDS. −100: LUCY IN THE SKY WITH DIAMONDS.
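The tracking values on this slide can be turned into physical spacing with a short calculation. This sketch assumes the values are expressed in 1/1000 em, the convention used in Adobe applications, and that 1 em equals the font size; the slide itself does not state the unit. The function name `tracking_offset_pt` and the 20 pt font size are introduced here for illustration.

```python
def tracking_offset_pt(tracking: int, font_size_pt: float) -> float:
    """Extra space added after each glyph, in points.

    Assumes tracking is given in 1/1000 em and 1 em equals the font size.
    """
    return tracking / 1000 * font_size_pt


if __name__ == "__main__":
    # The five tracking values shown on the slide, at an assumed 20 pt size.
    for value in (500, 100, 0, -50, -100):
        offset = tracking_offset_pt(value, font_size_pt=20)
        print(f"tracking {value:>4}: {offset:+.1f} pt per glyph")
```

Unlike kerning, which adjusts specific letter pairs, this offset is applied uniformly to every glyph in the selected text.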
Adjusting in Adobe Illustrator: Kerning, Tracking
Adjusting in Microsoft Word: Tracking, Kerning (often turned off by default!)
Text alignment: Align left, Align right, Align center, Justify. Sample text shown in each alignment: "Picture yourself in a boat on a river with tangerine trees and marmalade skies."
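The four alignment modes can be imitated in plain text to see where each one puts the leftover space. This is a character-based sketch using the standard `textwrap` module; real layout engines work with glyph widths instead of character counts, and the helper names `justify` and `align` are introduced here for illustration.

```python
import textwrap

TEXT = ("Picture yourself in a boat on a river "
        "with tangerine trees and marmalade skies.")


def justify(line: str, width: int) -> str:
    """Widen the gaps between words so the line is exactly `width` characters."""
    words = line.split()
    if len(words) == 1:
        return words[0].ljust(width)
    total_spaces = width - sum(len(w) for w in words)
    base, extra = divmod(total_spaces, len(words) - 1)
    parts = []
    for i, word in enumerate(words[:-1]):
        # The first `extra` gaps receive one additional space.
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


def align(text: str, width: int, mode: str) -> list:
    """Wrap `text` to `width` columns and align every line according to `mode`."""
    lines = textwrap.wrap(text, width)
    if not lines:
        return []
    if mode == "left":
        return [line.ljust(width) for line in lines]
    if mode == "right":
        return [line.rjust(width) for line in lines]
    if mode == "center":
        return [line.center(width) for line in lines]
    if mode == "justify":
        # The last line of a justified paragraph is conventionally left-aligned.
        return [justify(line, width) for line in lines[:-1]] + [lines[-1].ljust(width)]
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("left", "right", "center", "justify"):
        print(f"--- {mode} ---")
        for line in align(TEXT, 30, mode):
            print(f"|{line}|")
```

In the justified output the extra space goes into the gaps between words, which is why narrow justified columns can show uneven word spacing.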