Batch Primer Design

By Charles Comstock, October 4th, 2006

How does it turn, how does it work
• batch-primer.pl workflow and logic
  • Preprocessing
  • Setup
  • Design
  • Finish
• Remaining Problems
• Conclusion

Preprocessing
• Create a mispriming library
  • list of fasta files in /bio/extra
  • fasta files can be multi-entry
• fltr_candidates.pl filters for
  • minimum exon count
  • minimum exon length
  • minimum intron length
  • missing start/stop codon
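The filtering criteria above can be sketched as a single predicate. This is a minimal illustration, not the logic of fltr_candidates.pl itself: the `Transcript` class, the function name, and all three threshold values are hypothetical, since the slide does not state them.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transcript:
    """Hypothetical in-memory view of one transcript from the source gtf."""
    tx_id: str
    exons: List[Tuple[int, int]]  # (start, end), 1-based inclusive, sorted by start
    has_start_codon: bool
    has_stop_codon: bool


def passes_filters(tx, min_exons=2, min_exon_len=20, min_intron_len=30):
    """Return True if the transcript survives all four filters.

    The default thresholds are placeholders chosen for illustration only.
    """
    # Reject transcripts missing a start or stop codon.
    if not (tx.has_start_codon and tx.has_stop_codon):
        return False
    # Minimum exon count.
    if len(tx.exons) < min_exons:
        return False
    # Minimum exon length (inclusive coordinates, hence the +1).
    if any(end - start + 1 < min_exon_len for start, end in tx.exons):
        return False
    # Introns are the gaps between consecutive exons.
    for (_, prev_end), (next_start, _) in zip(tx.exons, tx.exons[1:]):
        if next_start - prev_end - 1 < min_intron_len:
            return False
    return True
```

Keeping the thresholds as keyword arguments makes it easy to tighten or relax one criterion per run without touching the others.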

Setup
• Inputs:
  • source gtf
  • species, assembly, target chromosomes
  • mispriming library
  • intron_verified.txt

batch-primer.pl setup filtered.gtf \
    --intronver intron_verified.txt \
    --mispriming ePCRdb.list \
    --group_by 10

Setup – Initialization
• assembly sequences
• intron verified
• rePCR database (from RTPCR)
• mispriming library
• db/self.fa – for self mispriming

Setup – Transcripts
• for each transcript in the input gtf, create:
  • a directory for the transcript and a log file
  • gtf files
    • prediction.gtf – same as source gtf
    • local.gtf – local coordinates on + strand
      • intron verified info in verified field
    • spliced.gtf – same as local.gtf but including inferred exons
  • fasta files
    • unspliced.fa – raw genomic sequence
    • spliced.fa – cds sequence only
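One way to read "local coordinates on + strand" is that each feature is re-expressed relative to the transcript's own genomic interval, with minus-strand transcripts flipped so that everything reads left to right. The sketch below assumes that reading and 1-based inclusive coordinates; `to_local` and its parameters are hypothetical names, not taken from batch-primer.pl.

```python
def to_local(start, end, tx_start, tx_end, strand):
    """Map a genomic feature (start, end) into transcript-local coordinates.

    Assumes 1-based inclusive coordinates and that the feature lies inside
    the transcript interval [tx_start, tx_end]. The result is always given
    on the + strand of the extracted transcript sequence.
    """
    if strand == "+":
        offset = tx_start - 1
        return start - offset, end - offset
    if strand == "-":
        # Reverse-complementing the interval swaps and mirrors the ends.
        return tx_end - end + 1, tx_end - start + 1
    raise ValueError(f"unknown strand: {strand!r}")
```

For a transcript spanning 1000-2000, a feature at 1900-2000 on the minus strand maps to local 1-101, the same place a feature at 1000-1100 would land on the plus strand.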

Design – Overview
• Invocation
• primer3 target selection
  • candidate targets
  • spans
  • scoring
• Primer3 and Filtering

Design – Invocation
• Runs on queue automatically after setup
• Alternatively, run locally for debugging
• Output from queue runs is written to tx/<txid>/design.log
• Example invocation:

batch-primer.pl design chr17.1.005.a

Design
• Generate candidate targets
  • one for every splice site and start/stop in transcript
• Generate spans for candidate targets
  • one for every subset of candidate targets with a combined length less than 500 bases with 3 flanking bases
• Until at least one primer is designed or score drops below 0.90, design primers for best scoring span
• Output successful primer pairs
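Span generation can be sketched as enumerating runs of neighbouring targets. This assumes that "subset" means a contiguous run of targets in transcript order and that the 3 flanking bases are added on each side before the 500-base limit is checked; neither detail is spelled out on the slide, and `candidate_spans` is a hypothetical name.

```python
def candidate_spans(targets, max_len=500, flank=3):
    """Enumerate spans over contiguous runs of targets.

    targets: list of (start, end) tuples in spliced coordinates,
             1-based inclusive.
    Returns a list of (span_start, span_end, covered_targets).
    """
    ordered = sorted(targets)
    spans = []
    for i, (first_start, _) in enumerate(ordered):
        start = first_start - flank
        last_end = first_start
        for j in range(i, len(ordered)):
            # Track the right-most end seen so far in this run.
            last_end = max(last_end, ordered[j][1])
            end = last_end + flank
            # Extending the run only makes the span longer, so stop here.
            if end - start + 1 >= max_len:
                break
            spans.append((start, end, ordered[i:j + 1]))
    return spans
```

With targets at 100-101, 300-301 and 700-701, this yields five spans: each target alone, the first two together, and the last two together. The run covering all three is 608 bases long and is skipped.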

Candidate Targets
• Fields:
  • range
  • verified – if both edges of target are verified
  • P(uncovered) – likelihood not covered by a prior primer
  • score:
    • 1 point for utr/start/stop
    • 2 points for cds
• Targets filtered by rePCR
  • success halves uncovered probability
  • 2 failures remove target
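The bookkeeping for a candidate target might look like the sketch below. Several details are assumptions made for illustration: the `Target` class and `record_repcr` function are hypothetical, P(uncovered) is assumed to start at 1.0, and a "success" is taken to mean that rePCR found an existing primer pair covering the target.

```python
from dataclasses import dataclass


@dataclass
class Target:
    start: int
    end: int
    kind: str                 # "utr", "start", "stop" or "cds"
    verified: bool = False    # True if both edges of the target are verified
    p_uncovered: float = 1.0  # assumed starting value
    failures: int = 0

    @property
    def score(self):
        # 2 points for cds, 1 point for utr/start/stop.
        return 2 if self.kind == "cds" else 1


def record_repcr(targets, target, success):
    """Update one target after a rePCR check, mutating `targets` in place."""
    if success:
        # Each success halves the probability that the target is uncovered.
        target.p_uncovered /= 2
    else:
        target.failures += 1
        # A second failure removes the target from further consideration.
        if target.failures >= 2:
            targets.remove(target)
```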

Candidate Span
• Fields:
  • range
  • targets – list of targets in span
  • P(success) – likelihood of success
• Score:
  • P(success) *= 0.66 if range covers 1 utr/start/stop
  • P(success) *= 0.5 if range covers more than one utr/start/stop
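The two multipliers can be sketched as a small penalty function. How the initial P(success) is computed is not stated on the slide, so the sketch takes it as an input; `apply_noncoding_penalty` is a hypothetical name.

```python
NONCODING_KINDS = {"utr", "start", "stop"}


def apply_noncoding_penalty(p_success, target_kinds):
    """Scale P(success) by the number of utr/start/stop targets in a span.

    target_kinds: iterable of kind strings, one per target in the span.
    """
    n = sum(1 for kind in target_kinds if kind in NONCODING_KINDS)
    if n == 1:
        return p_success * 0.66
    if n > 1:
        return p_success * 0.5
    # Spans covering only cds targets are not penalised.
    return p_success
```

Note that under these multipliers a span starting from P(success) = 1.0 and covering any utr/start/stop target already scores below 0.90.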

Primer3 and filters
• Using the best available span, design primers using primer3
• Primer3 tries size ranges 400-600, 300-399, 200-299, 100-199
• Filter the resulting primers for:
  • Prior coverage in this run of primer design
  • ePCR Mispriming – against self and library
  • Dust – low complexity filter
  • 4 G’s in primer pair
• If filters remove all of initial 100 primers, rerun for 1000 primer pairs
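The filter-and-retry loop can be sketched as follows. The `run_primer3` callable is a hypothetical stand-in for however batch-primer.pl invokes primer3, and the G filter assumes that "4 G's" means a run of four consecutive G's in either primer. The size ranges are expressed through primer3's `PRIMER_PRODUCT_SIZE_RANGE` tag, which accepts a space-separated list tried in order.

```python
SIZE_RANGES = [(400, 600), (300, 399), (200, 299), (100, 199)]


def size_range_tag(ranges=SIZE_RANGES):
    """Format the ranges as a primer3 Boulder-IO line."""
    value = " ".join(f"{lo}-{hi}" for lo, hi in ranges)
    return f"PRIMER_PRODUCT_SIZE_RANGE={value}"


def passes_g_filter(pair, run_length=4):
    """Reject a (left, right) primer pair if either primer has a G run."""
    return not any("G" * run_length in primer.upper() for primer in pair)


def design_with_retry(run_primer3, filters, counts=(100, 1000)):
    """Ask for 100 pairs first; if every pair is filtered out, ask for 1000.

    run_primer3: hypothetical callable taking num_return and returning a
                 list of (left, right) primer sequences.
    filters:     list of predicates; a pair is kept only if all return True.
    """
    for num_return in counts:
        pairs = run_primer3(num_return=num_return)
        kept = [p for p in pairs if all(f(p) for f in filters)]
        if kept:
            return kept
    return []
```

The other filters on the slide (prior coverage, ePCR mispriming, Dust) would slot into the `filters` list as further predicates with the same signature.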

Finish
• Concatenates primers for each transcript into pp.list
• Generates some statistics
  • Transcripts in input
  • Primer pairs designed
  • Transcripts missing primer pairs
  • Average primer pairs for successful transcripts

batch-primer.pl finish
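The four statistics can be computed from a mapping of transcript id to designed primer pairs. This is a minimal sketch; the `summarize` function, its input shape, and the output key names are hypothetical rather than the format batch-primer.pl actually emits.

```python
def summarize(pairs_by_tx):
    """Compute the finish-step statistics.

    pairs_by_tx: dict mapping transcript id to a list of primer pairs
                 (an empty list means no pair was designed).
    """
    total = len(pairs_by_tx)
    successful = [pairs for pairs in pairs_by_tx.values() if pairs]
    n_pairs = sum(len(pairs) for pairs in successful)
    return {
        "transcripts_in_input": total,
        "primer_pairs_designed": n_pairs,
        "transcripts_missing_pairs": total - len(successful),
        # Averaged over successful transcripts only; 0.0 if there are none.
        "avg_pairs_per_successful": n_pairs / len(successful) if successful else 0.0,
    }
```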

Remaining Problems
• Bug on Negative Strand
• Missing advanced configuration
• Starts/stops and single exon design
• Documentation beyond this presentation, the comments in the code, and source control log
• Finding sufficient time to fix these problems

Conclusion
• Questions?
