Postgres and the Genome

Jeff Pennington • Director, Translational Informatics • Center for Biomedical Informatics and Department of Pathology • The Children’s Hospital of Philadelphia

Outline • Background • Genome analysis in the clinic • Application • Database • DB Tuning

DNA as Data • 4 letter ‘alphabet’ of bases – A T C G • 3,000,000,000 base pairs • Sequence codes for biological function
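
To make the scale concrete: a four-letter alphabet needs only two bits per base, so 3,000,000,000 bases pack into roughly 750 MB. The sketch below is a back-of-the-envelope illustration only; the helper names and the bit assignment for A, C, G and T are hypothetical and are not taken from Varify.

```python
# Hypothetical 2-bit encoding; the bit assignment is arbitrary.
BASE_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
GENOME_BASES = 3_000_000_000  # base pairs, from the slide


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed to store n_bases at 2 bits per base, rounded up."""
    return (n_bases * 2 + 7) // 8


def pack_bases(seq: str) -> bytes:
    """Pack a DNA string into bytes, 4 bases per byte, first base in the high bits."""
    out = bytearray(packed_size_bytes(len(seq)))
    for i, base in enumerate(seq.upper()):
        shift = 6 - 2 * (i % 4)
        out[i // 4] |= BASE_BITS[base] << shift
    return bytes(out)


print(packed_size_bytes(GENOME_BASES))  # size in bytes of a packed genome
print(pack_bases("ATCG").hex())
```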

Mutations

Clinical Mutation = ‘Variant’

Sequencing = 100K – 4M Variants

VARIFY

VARIFY Architecture • Three-tier web application • Harvest (http://harvest.research.chop.edu) • JavaScript client • Python server using Django ORM • Postgres 9.2

Database • Physical – Postgres 9.2, RHEL VM, VMware w/ storage on host • Round 1 – 4G RAM, 80G disk • Round 2 – 32G RAM, 250G disk

Tuning • max_connections – too big • shared_buffers – memory allocated to PG's shared buffer cache • work_mem – memory available to each sort operation • default_statistics_target – how much detail ANALYZE collects, which gives the query planner something to work with
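
How these settings interact can be sketched numerically. work_mem applies to each sort (or hash) operation rather than to the server as a whole, so a rough upper bound on memory demand is shared_buffers plus work_mem for every connection sorting at once. The function name and the one-sort-per-connection simplification are assumptions made for illustration.

```python
def worst_case_memory_mb(shared_buffers_mb, work_mem_mb, max_connections,
                         sorts_per_connection=1):
    """Rough upper bound in MB: shared buffers plus every connection sorting at once.

    Assumes each connection runs `sorts_per_connection` work_mem-sized
    operations simultaneously; real queries may use more or fewer.
    """
    sort_memory_mb = work_mem_mb * max_connections * sorts_per_connection
    return shared_buffers_mb + sort_memory_mb


# Postgres 9.2 defaults: shared_buffers 32MB, work_mem 1MB, max_connections 100
print(worst_case_memory_mb(32, 1, 100))
```

The bound grows linearly with max_connections, which is one reason an oversized connection limit makes a generous work_mem risky.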

Resources • Book: PostgreSQL 9.0 High Performance • Ch 5 and 6 • Page 145 • Tools: pg_buffercache • Benchmarking: • \timing (in psql) • EXPLAIN • log_min_duration_statement = 5000 (value is in milliseconds: logs statements that run for 5 seconds or longer)
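
With log_min_duration_statement set, slow statements appear in the server log with their duration. The sketch below pulls them out of log text. It assumes the simple-query log form (`duration: ... ms  statement: ...`); the sample lines, the timestamp prefix and the `variant` table name are made up for illustration, and the real prefix depends on log_line_prefix.

```python
import re

# Matches the line Postgres writes when a statement reaches
# log_min_duration_statement, e.g.
#   LOG:  duration: 5012.345 ms  statement: SELECT ...
DURATION_RE = re.compile(
    r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+statement: (?P<sql>.*)"
)


def slow_statements(lines, threshold_ms=5000.0):
    """Yield (duration_ms, sql) for logged statements at or above threshold_ms."""
    for line in lines:
        match = DURATION_RE.search(line)
        if match and float(match.group("ms")) >= threshold_ms:
            yield float(match.group("ms")), match.group("sql").strip()


# Made-up sample log lines (hypothetical table name and timestamps).
SAMPLE_LOG = [
    "2013-06-01 10:00:00 EDT LOG:  duration: 7421.118 ms  statement: SELECT count(*) FROM variant",
    "2013-06-01 10:00:05 EDT LOG:  duration: 12.500 ms  statement: SELECT 1",
    "2013-06-01 10:00:09 EDT LOG:  checkpoint starting: time",
]

for duration_ms, sql in slow_statements(SAMPLE_LOG):
    print(f"{duration_ms:.0f} ms: {sql}")
```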

Tuning Round 1 (4G RAM) • max_connections = 100 • shared_buffers = 1024MB (default 32MB) • work_mem = 200MB (default 1MB) • Tried 1G: bad trade-off on count (slow) vs. list (not much faster)

Tuning Round 2 (32G RAM) • max_connections = 100 • shared_buffers = 24576MB (Increased from 1024MB) • work_mem = 150MB (Decreased from 200MB)
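
The two rounds can be compared with simple arithmetic: what share of RAM goes to shared_buffers, and how many full work_mem-sized sorts fit in what is left. This is a sketch under stated assumptions: "4G" and "32G" are taken as 4096MB and 32768MB, and the OS, per-connection overhead and filesystem cache are ignored.

```python
# Settings from the two tuning rounds; RAM sizes assume 1G = 1024MB.
ROUNDS = {
    "Round 1": {"ram_mb": 4 * 1024, "shared_buffers_mb": 1024, "work_mem_mb": 200},
    "Round 2": {"ram_mb": 32 * 1024, "shared_buffers_mb": 24576, "work_mem_mb": 150},
}


def summarize(cfg):
    """Return (shared_buffers share of RAM, full-size sorts fitting in the rest)."""
    share = cfg["shared_buffers_mb"] / cfg["ram_mb"]
    remaining_mb = cfg["ram_mb"] - cfg["shared_buffers_mb"]
    concurrent_sorts = remaining_mb // cfg["work_mem_mb"]
    return share, concurrent_sorts


for name, cfg in ROUNDS.items():
    share, sorts = summarize(cfg)
    print(f"{name}: shared_buffers = {share:.0%} of RAM, "
          f"room for about {sorts} full-size sorts")
```

Under these assumptions, neither round leaves room for all 100 connections to run a full work_mem-sized sort at the same time.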

Tuning Round 3 • Everything in Round 2 • default_statistics_target = 1000 (default 100)
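
One way to see what the larger statistics target buys: ANALYZE samples roughly 300 rows per unit of statistics target, so raising the target from 100 to 1000 samples ten times as many rows. The sketch below assumes that 300x rule and a hypothetical 4,000,000-row table (the upper end of the variant counts quoted earlier); the function names are illustrative.

```python
ROWS_PER_TARGET = 300  # assumed: ANALYZE samples about 300 rows per unit of target


def analyze_sample_rows(statistics_target):
    """Approximate number of rows ANALYZE samples for a given statistics target."""
    return ROWS_PER_TARGET * statistics_target


def sampled_fraction(statistics_target, table_rows):
    """Fraction of a table covered by the sample, capped at 1.0."""
    return min(1.0, analyze_sample_rows(statistics_target) / table_rows)


TABLE_ROWS = 4_000_000  # hypothetical table size

for target in (100, 1000):
    rows = analyze_sample_rows(target)
    frac = sampled_fraction(target, TABLE_ROWS)
    print(f"target={target}: about {rows} rows sampled ({frac:.2%} of table)")
```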
