Disease Informatics: Brush up the terms describing techniques and resources R. P. Deolankar Half knowledge is always dangerous
Wet lab A laboratory allowing for hands-on scientific research and equipped with • Appropriate plumbing • Ventilation • Equipment
High-throughput technology • Technology that handles high volumes of data or material • Large-scale methods to purify, identify, and characterize DNA, RNA, proteins and other molecules. These methods are usually automated, allowing rapid analysis of very large numbers of samples.
Microarray • A tool used to sift through and analyze the information contained within a genome. A microarray consists of different nucleic acid probes that are chemically attached to a substrate, which can be a microchip, a glass slide or a microsphere-sized bead.
DNA microarray • A microarray of immobilized single-stranded DNA fragments of known nucleotide sequence that is used especially in the identification and sequencing of DNA samples and in the analysis of gene expression (as in a cell or tissue)
Protein microarray • Protein microarray is a piece of glass on which different molecules of protein have been affixed at separate locations in an ordered manner thus forming a microscopic array.
Mass spectrometry • An instrumental method for identifying the chemical constitution of a substance by separating gaseous ions according to their differing mass and charge (also called mass spectroscopy) • A method used to determine the masses of atoms or molecules: an electrical charge is placed on the molecule and the resulting ions are separated by their mass-to-charge ratio
Tandem mass spectrometry • Multiple steps of mass spectrometry selection, with some form of fragmentation occurring in between the stages • Immunofluorescence and immunocytochemistry, ELISA, immunoblotting
Dry lab • A laboratory for making computer simulations or for data analysis, especially by computers (as in bioinformatics). Also called a dry laboratory.
Gene prioritization • The results of experimental or computational analyses in the post-genomic era (e.g., those from microarrays, proteomics, ChIP-chip, genome-wide in silico searches, genetic linkages, etc.) often consist of long lists of candidate genes. Some methods assign a score to each gene and rank the genes by that score. This process is known as gene prioritization.
PhenoGO • PhenoGO is a multiorganism database that provides phenotypic context, such as the cell type, disease, and tissue and organ to existing associations between gene products and Gene Ontology (GO) terms as specified in the Gene Ontology Annotations (GOA).
BioMedLEE • One existing Natural Language Processing (NLP) system, known as BioMedLEE, automatically extracts biological information consisting of bio-molecular substances and phenotypic data.
MeSH • Medical Subject Headings • MeSH is the National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity.
PhenOS • Phenotype Organizer System • PhenOS is a system under development by the Lussier research group with the purpose of bridging the gap between heterogeneous biomedical terminologies.
Inparanoid algorithm • The protein interaction networks of two species are aligned by assigning proteins to sequence homology clusters using the Inparanoid algorithm
POCUS • Prioritization of candidate genes using statistics • Reference: Turner FS, Clutterbuck DR, Semple CA. POCUS: mining genomic sequence annotation to predict disease genes. Genome Biol. 2003;4(11):R75.
OMIM • Online Mendelian Inheritance in Man • A catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and provided through NCBI. The database contains information on disease phenotypes and genes, including extensive descriptions, gene names, inheritance patterns, map locations and gene polymorphisms.
TOM • A web-based integrated approach for identification of candidate disease genes, Transcriptomics of OMIM • Reference: Rossi S, Masotti D, Nardini C, Bonora E, Romeo G, Macii E, Benini L, Volinia S. TOM: a web-based integrated approach for identification of candidate disease genes. Nucleic Acids Res. 2006 Jul 1;34
Data mining • Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information
Online Predicted Human Interactions Database or OPHID • Designed to be both a resource for the laboratory scientist to explore known and predicted protein-protein interactions, and to facilitate bioinformatics initiatives exploring protein interaction networks.
Single nucleotide polymorphisms (SNPs) • A single nucleotide polymorphism (SNP, pronounced snip), is a DNA sequence variation occurring when a single nucleotide - A, T, C, or G - in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual).
Synonymous - nonsynonymous substitutions • Substitutions that result in amino acid replacements are said to be nonsynonymous while substitutions that do not cause an amino acid replacement (such as a GGG to GGC change - both codons still encode glycine) are said to be synonymous substitutions. Because of the difference in their effects on the physiology of the organism, synonymous and nonsynonymous substitutions can have quite different dynamics. For example, synonymous substitutions usually occur at a much faster rate than do nonsynonymous substitutions. Hence, for coding sequence it is often desirable to separate these two.
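The GGG to GGC example above can be checked mechanically. The following is a minimal sketch, assuming the standard genetic code (NCBI translation table 1); the function name `classify_substitution` is hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in
# T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Return 'synonymous' if both codons encode the same amino acid,
    otherwise 'nonsynonymous'. (Hypothetical helper for illustration.)"""
    aa_before = CODON_TABLE[codon_before.upper()]
    aa_after = CODON_TABLE[codon_after.upper()]
    return "synonymous" if aa_before == aa_after else "nonsynonymous"


print(classify_substitution("GGG", "GGC"))  # both codons encode glycine
print(classify_substitution("GGG", "GAG"))  # glycine replaced by glutamate
```

The sketch compares single codons only; real analyses work on aligned coding sequences and must also handle gaps and ambiguous bases.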
Ka/Ks values • In genetics, the Ka/Ks ratio or dN/dS ratio is the ratio of the rate of non-synonymous substitutions (Ka) to the rate of synonymous substitutions (Ks), which can be used as an indication of selection on a protein-coding gene.
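As a rough illustration of how the ratio is read, the sketch below divides Ka by Ks and applies the usual rule of thumb: a ratio well below 1 suggests purifying selection, a ratio near 1 suggests neutral evolution, and a ratio well above 1 suggests positive selection. The function names and the `tolerance` band around 1 are assumptions made for this example, not part of any standard tool, and the Ka and Ks values are taken as already estimated.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of the non-synonymous rate (Ka) to the synonymous rate (Ks)."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form the ratio")
    return ka / ks


def interpret_ratio(ratio: float, tolerance: float = 0.1) -> str:
    """Rule-of-thumb reading of Ka/Ks; the tolerance band is an assumption."""
    if ratio > 1 + tolerance:
        return "positive (diversifying) selection"
    if ratio < 1 - tolerance:
        return "purifying (negative) selection"
    return "approximately neutral"


# Illustrative, made-up rates for a single gene.
ratio = ka_ks_ratio(ka=0.02, ks=0.10)
print(ratio, interpret_ratio(ratio))
```

In practice the estimation of Ka and Ks themselves is the hard part and is done with dedicated software; the ratio alone should be interpreted with care.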
dbSNP • Database of single nucleotide polymorphisms • A public-domain archive for a broad collection of single nucleotide polymorphisms (SNPs), hosted at the National Center for Biotechnology Information.
Orthodisease • OrthoDisease, a comprehensive database of model organism genes that are orthologous to human disease genes • Orthodisease is constructed primarily using Inparanoid analysis. Inparanoid is a program that automatically detects orthologs (or groups of orthologs) from 2 species
Field Biology • Biology of organisms living in their natural environments • Applications in Ecology and Evolutionary Biology
Epidemiology • Epidemiology is the study of how often diseases occur in different groups of people and why • Used to plan and evaluate strategies to prevent illness • Used as a guide to the management of patients in whom disease has already developed • Reference: Epidemiology for the uninitiated by Coggon, Rose and Barker
Population at risk • The population at risk is the group of people, healthy or sick, who would be counted as cases if they had the disease being studied • It defines the denominator for the calculation of incidence and prevalence rates • It is the number of persons potentially capable of experiencing the event or outcome of interest
Floating numerator • Numerator floating without its denominator • Common error occurring in field investigations • The error occurs due to the number of cases not relating to the “at risk” population • Epidemiological conclusions (on risk) cannot be drawn from purely clinical data (on the number of sick people seen)
Target population • It is the population about which the conclusions are to be drawn • Sometimes measurements can be made on the full target population; otherwise, study samples are used
Study population and study sample • The group of individuals in a study • In a clinical trial, the participants make up the study population • Study sample is chosen from study population
Aetiology • The study of the factors that predispose to or precipitate the disease • An external agent, a susceptible host, and an environment that brings the host and agent together form the disease aetiology triad
Surveillance • Watching over a population and recording data likely to have epidemiological significance, usually with the aim of early detection of disease. Essentially an interventionist exercise compared with monitoring, which is passive.
Case • Disease in populations exists as a continuum of severity rather than as an all or none phenomenon • The real question in population studies is not “has the person got the disease?” but “How much of the disease has he or she got?” • Diagnostic continuum is dichotomized into “cases” and “non-cases” on the basis of statistical, clinical, prognostic or operational options • Hence case definition should be precise and unambiguous. • Epidemiological case definitions are narrower and more rigid than clinical ones
Incidence • It is the rate at which new cases occur in a population during a specified period • Incidence = (number of new cases) / [(population at risk) * (time during which cases were ascertained)]
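A worked example may help with the incidence formula, read here as new cases divided by the product of the population at risk and the observation time (person-time). This is a minimal sketch: the function name and the numbers are made up for illustration, and it assumes the population at risk stays constant over the period.

```python
def incidence_rate(new_cases: int, population_at_risk: int, years: float) -> float:
    """New cases per person-year, assuming a constant population at risk."""
    person_years = population_at_risk * years
    if person_years <= 0:
        raise ValueError("person-time must be positive")
    return new_cases / person_years


# Illustrative numbers: 12 new cases among 2,000 people followed for 3 years.
rate = incidence_rate(new_cases=12, population_at_risk=2000, years=3)
print(f"{rate * 1000:.1f} new cases per 1,000 person-years")
# prints: 2.0 new cases per 1,000 person-years
```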
Prevalence Point prevalence • The proportion of a population that are cases at a point in time Period prevalence • The proportion of a population that are cases at any time within a stated period
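The difference between point and period prevalence can be shown with a small sketch. It assumes each person has at most one illness episode, recorded as a hypothetical `(onset_day, recovery_day)` pair, and that the population size is fixed; all names and numbers are illustrative.

```python
def point_prevalence(episodes, population_size, day):
    """Proportion of the population that is a case on the given day."""
    cases = sum(1 for onset, recovery in episodes if onset <= day <= recovery)
    return cases / population_size


def period_prevalence(episodes, population_size, start_day, end_day):
    """Proportion that is a case at any time within [start_day, end_day]."""
    cases = sum(
        1
        for onset, recovery in episodes
        if onset <= end_day and recovery >= start_day  # episode overlaps period
    )
    return cases / population_size


# Illustrative data: four episodes in a population of 100 people.
episodes = [(0, 5), (3, 10), (8, 12), (20, 25)]
print(point_prevalence(episodes, 100, day=4))                     # 2 of 100
print(period_prevalence(episodes, 100, start_day=0, end_day=10))  # 3 of 100
```

Period prevalence is never smaller than point prevalence at any day inside the period, because every case counted at a point is also a case within the period.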
Attributable risk and relative risk • Attributable risk is the difference between the disease rate in exposed persons and that in people who are unexposed • Relative risk is the ratio of the disease rate in exposed persons to that in people who are unexposed • Attributable risk = rate of disease in unexposed persons * (relative risk – 1)
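The two measures and the identity linking them can be verified numerically. The sketch below takes attributable risk as the difference between the exposed and unexposed rates, which is the reading consistent with the identity attributable risk = unexposed rate * (relative risk – 1). The rates are made up for illustration.

```python
import math


def relative_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Ratio of the disease rate in the exposed to that in the unexposed."""
    return rate_exposed / rate_unexposed


def attributable_risk(rate_exposed: float, rate_unexposed: float) -> float:
    """Difference between the disease rates in the exposed and the unexposed."""
    return rate_exposed - rate_unexposed


# Illustrative rates: 30 per 1,000 in the exposed, 10 per 1,000 in the unexposed.
exposed, unexposed = 0.030, 0.010
rr = relative_risk(exposed, unexposed)
ar = attributable_risk(exposed, unexposed)

# Check the identity: AR = unexposed rate * (RR - 1).
assert math.isclose(ar, unexposed * (rr - 1))
print(f"relative risk = {rr:.1f}, attributable risk = {ar:.3f}")
```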
Confounding • Causing confusion about causation due to 2 or more variables associated with the disease • Confounding may give rise to spurious associations when in fact there is no causal relation, or at other extreme, it may obscure the effects of a true cause
Bias • Bias is the deviation of inferences from the truth • Selection bias is the biased selection of individuals into the study • Information bias is the biased collection or biased analysis of the data • Motto of the epidemiologist could well be “dirty hands but a clean mind” (manus sordidae, mens pura)
Chance • A measure of how likely it is that some event will occur • Random, unpredictable influences on events • By convention, the association between the exposure and disease is considered to be “statistically significant” if the probability (p-value) associated with the test statistic is < 0.05
Sensitivity • The proportion of persons with the disease who are correctly identified by defined criteria • The proportion of persons with the disease who are correctly identified by a screening test • The ability of a system to detect epidemics and other changes in disease occurrence • A sensitive test detects a high proportion of the true cases
Specificity • The proportion of persons without a disease who are correctly identified by a test • The number of true negative results divided by the total number of all those without the disease
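Sensitivity and specificity can both be computed from the four cells of a screening-test table. This is a minimal sketch with made-up counts; the function names are chosen for illustration only.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of persons with the disease whom the test identifies."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """True negatives divided by all persons without the disease."""
    return true_negatives / (true_negatives + false_positives)


# Illustrative screening results: 100 diseased and 1,000 non-diseased persons.
print(sensitivity(true_positives=90, false_negatives=10))   # 0.9
print(specificity(true_negatives=950, false_positives=50))  # 0.95
```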
Randomization • Randomization is used to obtain a similar allocation of individuals to each group; the groups are followed at the same time • Purpose of randomization: To obtain unbiased estimates of differences among treatment responses (means or effects) and to obtain an unbiased estimate of the random error variation in the experiment
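One simple way to randomize is to shuffle the participants and deal them out to the groups in turn, which gives groups of equal or near-equal size. The sketch below shows this approach under stated assumptions: the function name, group labels, and seed are hypothetical, and real trials typically use more elaborate schemes such as blocked or stratified randomization.

```python
import random


def randomize(participants, groups=("treatment", "control"), seed=None):
    """Shuffle participants, then deal them out to the groups in turn."""
    rng = random.Random(seed)  # a fixed seed makes the allocation reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    allocation = {group: [] for group in groups}
    for index, participant in enumerate(shuffled):
        allocation[groups[index % len(groups)]].append(participant)
    return allocation


# Illustrative use: allocate ten participant IDs to two groups.
allocation = randomize(range(1, 11), seed=42)
for group, members in allocation.items():
    print(group, sorted(members))
```

Because allocation depends only on the shuffle, neither the investigator nor the participant influences who ends up in which group.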
Replication and Local control • Replication is the repetition of an experiment in order to test the validity of its conclusion • Local control is blocking or grouping to eliminate or to control the various sources of variation (error) • Replication and local control are necessary to achieve a reduction in the random variation among treatment effects in the experiment
Observational (non-experimental) studies • Person-level unit of observation 1. Longitudinal measurements a. Cohort samples b. Case control samples 2. Cross-sectional measurements • Aggregate level units of observation (ecological studies) • Reference: Epidemiology Kept Simple: An Introduction to Traditional and Modern Epidemiology; by B. Burt Gerstman
Person-level vs. Aggregate-level • A person-level study on smoking might collect information on each person’s smoking habits, age and disease status • An aggregate-level study on smoking might collect information on each region’s per capita cigarette consumption, age distribution and disease rate
Longitudinal studies • Longitudinal studies are studies in which the sequence of events in individuals can be delineated over time • In cohort studies the incidence of disease in exposed and non-exposed groups is compared • In case-control studies people with disease (cases) and people without disease (controls) are sampled from the source population and exposure histories of cases and controls are compared
Longitudinal vs. Cross-sectional studies • Longitudinal measurements relate exposures and diseases in individuals at various time references • Cross-sectional measurements are not definitively time sequenced in individuals • In cross-sectional studies the data are gathered from samples at one point in time. Since both the outcome and the exposure variables are measured at the same time, these studies are not strong at showing cause-effect relationships.
Experimental studies • In experimental studies, the investigator introduces or removes an exposure in order to observe its influence on a health outcome. Such allocations may be based on chance mechanism (randomized trials) or on other deliberate mechanisms built into the study’s protocol (non-randomized trials)
Other disease informatics lectures: Supercourse: Epidemiology, the Internet and Global Health Lecture numbers 31981, 30331, 28921, 25381, 25371, and 34011 Thank you