Dovecot Mail Storage Timo Sirainen
Me: Timo Sirainen • Born 1979 in Finland • First C64 BASIC programs around 1988 • Open source coding since about 1998 • Irssi IRC client 1999-2004, still widely used • Worked as a programmer since 1999 • Went to university in 2006 • Dovecot project started in 2002 • Working full time on it since about 2007 • 2009: Rackspace, USA • 2010: SAPO, Portugal
Dovecot • Open source IMAP/POP3 server • Only serves mail to clients, does not send mail • First version released in 2002 • Mostly written by me • Except Sieve, written by Stephan Bosch • High performance is an important goal • Disk I/O is the typical bottleneck -> everything is optimized to reduce it
Talk Overview • Traditional mailbox formats • Dovecot indexes • Dovecot mailbox formats • Full text search indexes • Future ideas
mbox • One file per mailbox • Metadata stored in headers that are filtered out • X-UID, Status, X-Status, X-Keywords, etc. • Deleting requires moving data around • Fragile: a crash in the middle of this corrupts the mailbox • Slow when deleting old messages • May become fragmented with constant appends • But a non-fragmented file is fast to read
Maildir • One file per message • Reading through all files can be slow • Message flags in the filename (name:2,<flags>) • Lots of renaming • Finding the current filename can be difficult • Maildir is lockless? Not really: Dovecot uses a write/sync lock • Otherwise files can temporarily be lost during renames • Was the file really deleted or just renamed?
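To make the filename-based flag storage concrete, here is a minimal Python sketch that parses and rewrites the `:2,<flags>` suffix. The letter-to-IMAP mapping follows the common Maildir convention (D, F, R, S, T); the helper names are hypothetical and this is not Dovecot's code. `find_current()` has to scan a directory listing, which illustrates why locating a message's current filename after renames is awkward.

```python
from __future__ import annotations

# Hypothetical helpers for Maildir "name:2,<flags>" filenames.
# "P" (passed) has no IMAP equivalent and is ignored here.
MAILDIR_TO_IMAP = {
    "D": "\\Draft",
    "F": "\\Flagged",
    "R": "\\Answered",
    "S": "\\Seen",
    "T": "\\Deleted",
}


def split_filename(filename: str) -> tuple[str, set[str]]:
    """Split 'base:2,FS' into ('base', {'F', 'S'})."""
    base, sep, info = filename.partition(":2,")
    return base, set(info) if sep else set()


def imap_flags(filename: str) -> set[str]:
    _, letters = split_filename(filename)
    return {MAILDIR_TO_IMAP[c] for c in letters if c in MAILDIR_TO_IMAP}


def with_flags(filename: str, letters: set[str]) -> str:
    """Filename after a flag change (the file would then be renamed)."""
    base, _ = split_filename(filename)
    return base + ":2," + "".join(sorted(letters))


def find_current(names: list[str], base: str) -> str | None:
    """The base name is stable but the full name is not, so scan a listing."""
    for name in names:
        if split_filename(name)[0] == base:
            return name
    return None
```

Every flag change turns into an `os.rename(old, with_flags(old, ...))` on disk, and a concurrent reader holding the old name will find it missing.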
Dovecot Index Files • Main index • List of messages • Message flags • Offsets to cache records • Cache file • Message size, some headers, etc. • Keeps only the data that clients actually use • Different clients want different data for different amounts of time
Dovecot Main Index • In two files: • dovecot.index: Somewhat recent snapshot • dovecot.index.log: Recent changes • All changes go through the log • Readers read the snapshot into memory and apply the latest changes from the log • Once opened, only log updates need to be read • Very efficient with remote filesystems (NFS, cluster FSes)! • The snapshot is updated “once in a while” • Tries to minimize disk I/O • Writes are usually more expensive than reads • The log is also useful for finding “what changed” events for IMAP clients
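The snapshot-plus-log idea can be sketched in a few lines of Python. This is a hypothetical in-memory model, not Dovecot's on-disk format: log records are assumed to be `(op, uid, flags)` tuples, and the snapshot remembers how many log records it already contains. A reader opens the snapshot once and afterwards only replays the new tail of the log, and the replayed records double as "what changed" events.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical model: log records are (op, uid, flags) tuples.


@dataclass
class Snapshot:
    log_offset: int                    # log records already merged in
    flags: dict[int, frozenset[str]]   # uid -> message flags


@dataclass
class IndexView:
    flags: dict[int, frozenset[str]] = field(default_factory=dict)
    log_offset: int = 0

    @classmethod
    def open(cls, snapshot: Snapshot, log: list) -> "IndexView":
        # Read the snapshot into memory, then catch up from the log.
        view = cls(dict(snapshot.flags), snapshot.log_offset)
        view.sync(log)
        return view

    def sync(self, log: list) -> list:
        """Apply only records appended since the last sync and return them."""
        new_records = log[self.log_offset:]
        for op, uid, flags in new_records:
            if op in ("append", "flags"):
                self.flags[uid] = frozenset(flags)
            elif op == "expunge":
                self.flags.pop(uid, None)
        self.log_offset = len(log)
        return new_records

    def to_snapshot(self) -> Snapshot:
        # Written "once in a while" so that new readers replay less log.
        return Snapshot(self.log_offset, dict(self.flags))
```

Because an already-open reader touches only the log tail, a sync on a remote filesystem costs a small read instead of re-reading the whole index.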
Dovecot Cache • The main reason for Dovecot’s good performance • Different IMAP clients want different data • Caching data that the client doesn’t use wastes disk space and disk I/O • Flexible format, allows adding any number of fields • Per-field caching decisions: “no”, “temporary”, “permanent” • Cached fields never change (IMAP guarantees that messages are immutable) • Data is added without locking -> duplicate data is possible • Once in a while the file is recreated -> deleted and unwanted records are dropped
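The periodic recreation of the cache file can be illustrated as a compaction pass. Everything here is an assumption for illustration: the `(uid, field, value)` record layout, the function name, and especially the interpretation that "temporary" fields are kept only for messages at or above a cutoff UID. The sketch does show why lockless appends are tolerable: since cached fields never change, duplicate records carry identical values and compaction simply keeps one.

```python
# Hypothetical cache records: (uid, field_name, value). Fields never change
# once cached, so duplicate (uid, field) pairs always carry the same value.


def compact_cache(records, decisions, existing_uids, temp_min_uid):
    """Return the records a recreated cache file would keep.

    decisions: field name -> "no" | "temporary" | "permanent"
    existing_uids: UIDs of messages that still exist (others were expunged)
    temp_min_uid: assumed cutoff; "temporary" fields are kept only for
                  messages with uid >= temp_min_uid
    """
    kept = {}
    for uid, name, value in records:
        decision = decisions.get(name, "no")
        if uid not in existing_uids or decision == "no":
            continue                        # deleted message or unwanted field
        if decision == "temporary" and uid < temp_min_uid:
            continue                        # temporary data for an old message
        kept.setdefault((uid, name), value)  # drop duplicates from lockless appends
    return [(uid, name, value) for (uid, name), value in kept.items()]
```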
Locking • Lock waits are bad • Higher user-visible latency • Timeout failures during high load • Dovecot v0.99 used traditional read/write index locks • Locking timeout problems • v1.0 was redesigned to do lockless reads
Lockless reads: rename() • For: • Small files • Rarely changing files • Files where a large part changes at once • Writer • Lock • If the file has changed, read it and update internal state • Write the updated data to a temp file • rename() over the original file • Unlock • Reader • Just read the file
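A minimal Python sketch of the rename()-based writer, assuming POSIX (`fcntl.flock`) and a hypothetical `<path>.lock` file that serializes writers only. `os.replace()` performs the rename, so a reader opening the file sees either the complete old version or the complete new one. The `update` callback stands in for "read + update internal state" under the lock; it is an illustrative interface, not Dovecot's.

```python
import fcntl
import os
import tempfile


def update_file(path: str, update) -> None:
    """Writer: replace `path` atomically; update(old_bytes) -> new_bytes."""
    # Hypothetical lock-file name; only writers take this lock.
    with open(path + ".lock", "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Re-read under the lock in case another writer changed the file.
            try:
                with open(path, "rb") as f:
                    old = f.read()
            except FileNotFoundError:
                old = b""
            new = update(old)
            # Temp file in the same directory so rename() stays atomic.
            fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
            try:
                with os.fdopen(fd, "wb") as tmp:
                    tmp.write(new)
                    tmp.flush()
                    os.fsync(tmp.fileno())
                os.replace(tmp_path, path)  # readers see old or new, never partial
            except BaseException:
                if os.path.exists(tmp_path):
                    os.unlink(tmp_path)
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def read_file(path: str) -> bytes:
    """Reader: no lock needed."""
    with open(path, "rb") as f:
        return f.read()
```

Rewriting the whole file on each change is why this approach suits small or rarely changing files.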
Lockless reads: Appends • For append-only files with a “size” header in each written record • Writer • Lock • Write the data with size=0 • Then write the size, with each byte’s highest bit set to 1 • Unlock • Reader • Read one record at a time • Stop when seeing a size that isn’t fully written
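The append protocol above can be sketched in Python against an in-memory buffer. The 4-byte header layout (four 7-bit groups, each byte with its highest bit set) is an assumption chosen to match the description; Dovecot's exact encoding may differ. Since a header byte that is still 0 cannot be a written byte, the reader detects a partially written size and stops, without taking a lock.

```python
# Sketch of a lockless-readable append-only file, kept in a bytearray here.
# Assumed record layout: 4-byte size header + data. Each header byte holds
# 7 bits of the size with its highest bit set, so a 0 byte = "not written".
HEADER_LEN = 4
MAX_SIZE = (1 << 28) - 1          # 4 bytes x 7 usable bits


def encode_size(size: int) -> bytes:
    if not 0 <= size <= MAX_SIZE:
        raise ValueError("record too large for a 28-bit size")
    # Most significant group first; 0x80 marks each byte as written.
    return bytes(0x80 | ((size >> shift) & 0x7F) for shift in (21, 14, 7, 0))


def decode_size(header: bytes):
    """Return the size, or None if the header is incomplete."""
    if len(header) < HEADER_LEN or any((b & 0x80) == 0 for b in header):
        return None
    size = 0
    for b in header:
        size = (size << 7) | (b & 0x7F)
    return size


def append_record(buf: bytearray, data: bytes) -> None:
    """Writer; a real writer would hold the file's write lock here."""
    start = len(buf)
    buf += bytes(HEADER_LEN) + data                          # 1. data with size=0
    buf[start:start + HEADER_LEN] = encode_size(len(data))  # 2. then the size


def read_records(buf: bytes) -> list:
    """Reader; takes no lock and stops at the first incomplete record."""
    records, pos = [], 0
    while pos + HEADER_LEN <= len(buf):
        size = decode_size(buf[pos:pos + HEADER_LEN])
        if size is None or pos + HEADER_LEN + size > len(buf):
            break
        records.append(bytes(buf[pos + HEADER_LEN:pos + HEADER_LEN + size]))
        pos += HEADER_LEN + size
    return records
```

Setting the high bit also keeps a legitimately empty record (size 0, header `0x80808080`) distinguishable from an unwritten header.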
Lockless writes in the future? • open(path, O_APPEND) usually provides atomic writes • Except with NFS • write() may also write fewer bytes than intended? (signal, out of space) • read() during a write may see incomplete data?
Single-dbox • One file per message (u.<IMAP UID>) • Files have an immutable metadata section • GUID, POP3 UIDL, received date, etc. • Advantages over Maildir: • Filenames don’t change • No IMAP UID <-> filename mapping required • Flags stored only in Dovecot index files • Automatically creates dovecot.index.backup once in a while • When fixing corruption, tries very hard to preserve flags based on the (corrupted) index and backup files
Multi-dbox • Multiple messages in a single file (m.<id>) • Same file format as single-dbox • Multiple files per mailbox • Files are about 2 MB (configurable) • Larger files -> less fragmentation, but slower deletion • Preallocation • Can be rotated every n days (for incremental backups) • Delayed (ioniced) nightly deletions (“doveadm purge”) • A crash or power loss can’t corrupt or lose data • Tries very hard to preserve as much data as possible in case of (filesystem) corruption • Saves a backup of the original broken file
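The size limit and time-based rotation can be illustrated with a hypothetical file-selection helper. The assumptions are: messages are always appended to the newest m.<id> file, a new file is started when the message would push it past the size limit or when the file is older than `rotate_days`, and file IDs increase by one. Preallocation and the nightly purge are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical sketch of choosing which m.<id> file receives a new message.
MAX_FILE_SIZE = 2 * 1024 * 1024   # "about 2 MB (configurable)"
DAY = 24 * 60 * 60


@dataclass
class MFile:
    file_id: int
    size: int
    created: float


def append_message(files: list[MFile], msg_size: int, now: float,
                   max_size: int = MAX_FILE_SIZE,
                   rotate_days: int | None = None) -> str:
    """Append to the newest file if it has room and is not too old,
    otherwise start a new file. Returns the file name used."""
    current = files[-1] if files else None
    if current is not None:
        full = current.size + msg_size > max_size
        too_old = rotate_days is not None and now - current.created >= rotate_days * DAY
        if full or too_old:
            current = None
    if current is None:
        next_id = files[-1].file_id + 1 if files else 1
        current = MFile(next_id, 0, now)
        files.append(current)
    current.size += msg_size
    return f"m.{current.file_id}"
```

With rotation enabled, files older than n days stop changing, so an incremental backup only needs to copy the newer ones.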
Benchmarks • Realistic IMAP benchmarks are difficult to do • Depends on clients and user behavior
Benchmarks • Reading 10k messages via IMAP
Benchmarks: # NFS ops • Reading 10k messages via IMAP • Above: uncached, below: cached
Benchmarks: # NFS ops • Random IMAP commands sent with: imaptest logout=5 msgs=1000 delete=10 expunge=10 secs=60 seed=1 • L+A+G = lookup + access + getattr
Alternative Mail Storage • Users rarely access their old mails • Lower-performance storage is cheaper -> move old mails there • dbox supports an “alternative path” setting: if a u.* or m.* file isn’t found in the primary path, it’s looked up in the alternative path • Files could even be moved with /bin/mv • But it’s easier and safer with “doveadm altmove” • This would be difficult with Maildir because its filenames change
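The lookup order can be sketched as a small Python helper; `open_mail_file` is a hypothetical name, not part of Dovecot. It simply tries the primary directory first and falls back to the alternative one. Because dbox filenames never change, the same name works in both places, which is what makes a plain move safe.

```python
import os
import tempfile
from typing import Optional


def open_mail_file(primary_dir: str, alt_dir: Optional[str], filename: str):
    """Open a u.* or m.* file, falling back to the alternative path."""
    for directory in (primary_dir, alt_dir):
        if directory is None:
            continue
        try:
            # Try open() directly instead of checking os.path.exists() first:
            # the file could be moved between the check and the open.
            return open(os.path.join(directory, filename), "rb")
        except FileNotFoundError:
            continue
    raise FileNotFoundError(filename)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as primary, tempfile.TemporaryDirectory() as alt:
        # Simulate an old message that was moved to cheaper storage.
        with open(os.path.join(alt, "u.1"), "wb") as f:
            f.write(b"old message")
        with open_mail_file(primary, alt, "u.1") as f:
            print(f.read())
```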
Detached Mail Attachments • MIME parts can be saved to external files • Only if they’re large enough (default: 128 kB) • Can also be filtered based on Content-Type and other headers • Avoids an extra disk seek when downloading attachments that clients automatically display inline • Base64-encoded MIME parts can be saved decoded (25% less disk space) • Only if re-encoding reproduces the original exactly • dbox only • Metadata contains pointers to the external parts • Saving is done via a simplified “filesystem API”
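The "store decoded only if re-encoding is exact" rule can be sketched in Python. Assumptions: the canonical re-encoding is 76-character lines each ending in CRLF, the size threshold is the slide's 128 kB default, and Content-Type filtering is left out; `plan_part` is a hypothetical helper. A part encoded with any other line length fails the round-trip check and is stored as-is, so the original message can always be reconstructed byte for byte.

```python
import base64
import binascii

MIN_EXTERNAL_SIZE = 128 * 1024   # default threshold from the slide


def reencode_base64(raw: bytes, line_len: int = 76) -> bytes:
    """Assumed canonical form: fixed-length lines, CRLF after every line."""
    encoded = base64.b64encode(raw)
    lines = [encoded[i:i + line_len] for i in range(0, len(encoded), line_len)]
    return b"".join(line + b"\r\n" for line in lines)


def plan_part(body: bytes, is_base64: bool, min_size: int = MIN_EXTERNAL_SIZE):
    """Return (store_externally, bytes_to_store, stored_decoded)."""
    if len(body) < min_size:
        return False, body, False           # small parts stay in the dbox file
    if is_base64:
        try:
            decoded = base64.b64decode(body)  # CR/LF are discarded by default
        except binascii.Error:
            decoded = None
        # Store decoded only if re-encoding gives back the exact original.
        if decoded is not None and reencode_base64(decoded) == body:
            return True, decoded, True
    return True, body, False
```

Base64 turns every 3 bytes into 4 characters, so the decoded form needs about 25% less space, before even counting the line breaks.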
Single Instance Storage • The storage’s internal deduplication • Could be enabled only for the attachment storage • Dovecot’s SIS • FS API backend • Based on file hashes and hard links • Hash is configurable (e.g. SHA256 + size) • Byte-by-byte verification after a matching hash is found: • Never (trust hash uniqueness; not implemented) • Immediate comparison during saving • Delayed (nightly) comparison and deduplication
Dovecot SIS • Attachments are saved to “HA/SH/HASH-GUID” under the global attachment dir (e.g. /var/attachments/) • The GUID guarantees filename uniqueness • e.g. a file with hash “123456” is saved to 12/34/123456-GUID • “HA” and “SH” may be symlinks to other mounts • SIS is done by hard linking HA/SH/hashes/HASH to HA/SH/HASH-GUID if it exists • Basically: “ln hashes/123456 123456-GUID” • No attempts to create cross-mount hard links • Safe to move/back up/restore attachment files • But hashes/HASH is auto-deleted only when its link count drops from 2 to 1. External changes may leak it.
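The directory layout and hard-link scheme can be sketched in Python. Assumptions: plain SHA-256 as the hash (the slide allows e.g. SHA256 + size), a random UUID as the GUID, immediate byte-by-byte comparison during saving, a filesystem that supports hard links, and minimal race handling. The function names are hypothetical; this is not Dovecot's FS API backend.

```python
import hashlib
import os
import tempfile
import uuid


def sis_dir(root: str, digest: str) -> str:
    # "123456..." -> root/12/34
    return os.path.join(root, digest[0:2], digest[2:4])


def save_attachment(root: str, data: bytes) -> str:
    """Save data as HA/SH/HASH-GUID, sharing an inode with identical content."""
    digest = hashlib.sha256(data).hexdigest()   # assumption: plain SHA-256
    directory = sis_dir(root, digest)
    os.makedirs(os.path.join(directory, "hashes"), exist_ok=True)
    path = os.path.join(directory, f"{digest}-{uuid.uuid4().hex}")
    hash_path = os.path.join(directory, "hashes", digest)
    try:
        with open(hash_path, "rb") as f:
            existing = f.read()
    except FileNotFoundError:
        existing = None
    if existing == data:                 # byte-by-byte verification
        os.link(hash_path, path)         # like "ln hashes/HASH HASH-GUID"
        return path
    with open(path, "xb") as f:          # new content (or a hash collision)
        f.write(data)
    if existing is None:
        try:
            os.link(path, hash_path)     # later saves can link to this copy
        except FileExistsError:
            pass                         # a concurrent save got there first
    return path


def delete_attachment(root: str, path: str) -> None:
    digest = os.path.basename(path).split("-", 1)[0]
    hash_path = os.path.join(sis_dir(root, digest), "hashes", digest)
    os.unlink(path)
    try:
        # Only hashes/HASH still points at the inode -> delete it too.
        if os.stat(hash_path).st_nlink == 1:
            os.unlink(hash_path)
    except FileNotFoundError:
        pass


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        a = save_attachment(root, b"same attachment")
        b = save_attachment(root, b"same attachment")
        print(os.stat(a).st_ino == os.stat(b).st_ino)
```

Each HASH-GUID name stays unique and can be moved or restored independently, while the shared inode means the data is stored once. Deleting something behind the scheme's back (for example removing a HASH-GUID file outside this code) skips the link-count check, which is how hashes/HASH can leak.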
Full Text Search Indexes • Dovecot has an abstract FTS API • The IMAP protocol says search is about “substring matching” (e.g. “ello” matches “hello”) • Almost no FTS engines support this • Few people seem to care about this anymore • Currently supported FTS backends: • Squat: Dovecot’s own indexer, supports substring matching • Currently index updating is too inefficient • Apache Solr
FTS: Solr • Solr is a search engine server using Lucene • Dovecot talks to Solr via HTTP • Sharding via per-user fts_solr setting
Future • FS API used for indexes and dbox • Support for key-value databases • Asynchronous disk I/O