Teuthology

Presented 2011-07-01
tommi.virtanen@dreamhost.com
image credit: http://www.flickr.com/photos/peterblapps/3250800528/
Ceph, as in Cephalopoda (Mollusca, invertebrates)
Teuthology: the study of cephalopods
Malacology: the study of molluscs
We tried Autotest
... and quickly discovered its limitations.
Currently at 15 independent patches; 24 files changed, 575 insertions(+), 19 deletions(-).
Realized Autotest's architecture is working against us.
We still use it for its packaged "client side" tests, but not its multi-machine features.
Multi-machine control

Python + Paramiko (SSH) + gevent = orchestra
Real-time, interactive, central controller
Full SSH protocol (channels!)
Not Chef. Not Fabric.

cluster = Cluster(...)
cluster.run(...)
cluster.only('x86').run(...)
cluster.exclude('x86').run(...)

http://github.com/tv42/orchestra
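To make the only/exclude selection idea concrete, here is a toy, in-process stand-in for that API shape. This is purely illustrative: the real orchestra runs commands over SSH via Paramiko and gevent, while this sketch keeps remotes as plain dicts and makes run() just record the command, so only the selection logic is shown.

```python
# Toy stand-in for orchestra's Cluster filtering API (illustration only;
# not orchestra's real code). Remotes are host -> set-of-tags mappings.
class ToyCluster:
    def __init__(self, remotes):
        self.remotes = remotes

    def only(self, tag):
        # Keep only remotes that carry the given tag.
        return ToyCluster({h: tags for h, tags in self.remotes.items()
                           if tag in tags})

    def exclude(self, tag):
        # Drop remotes that carry the given tag.
        return ToyCluster({h: tags for h, tags in self.remotes.items()
                           if tag not in tags})

    def run(self, args):
        # Stand-in for running `args` on every selected remote; returns
        # one record per remote, like the one-RemoteProcess-per-command
        # behaviour described later in the talk.
        return [(host, list(args)) for host in sorted(self.remotes)]

cluster = ToyCluster({
    'sepiaXX': {'x86'},
    'sepiaYY': {'x86'},
    'sepiaZZ': {'arm'},
})
print(cluster.only('x86').run(['uptime']))
print(cluster.exclude('x86').run(['uptime']))
```

Chaining works the same way as in the slide: each filter returns a new cluster object, so selections compose.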
Teuthology is a test runner

Run tasks on targets as told to by roles.

Automatically:
- Set up
- Monitor health
- Run test(s)
- Archive results
- Archive logs, core dumps, etc.
- Clean up

http://github.com/tv42/teuthology
Read the README
Run tasks on targets as told to by roles.

targets:
- ubuntu@sepiaXX.ceph.dreamhost.com
- ubuntu@sepiaYY.ceph.dreamhost.com
- ubuntu@sepiaZZ.ceph.dreamhost.com

YAML format: lists, dicts, strings, numbers.
You need to have SSH working, without passphrases.
You need passphraseless sudo on the remote host.
Run tasks on targets as told to by roles.

roles:
- [mon.0, mds.0, osd.0]
- [mon.1, osd.1]
- [mon.2, client.0]
Run tasks on targets as told to by roles.

targets:
- ubuntu@sepiaXX...
- ubuntu@sepiaYY...
- ubuntu@sepiaZZ...
roles:
- [mon.0, mds.0, osd.0]
- [mon.1, osd.1]
- [mon.2, client.0]
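The pairing rule implied by the two lists is positional: the first roles list lands on the first target, and so on. A minimal sketch of that mapping (variable and helper names here are illustrative, not teuthology's real internals):

```python
# Positional pairing of roles lists with targets, as in the YAML above.
# Illustrative sketch only; the real parsing lives inside teuthology.
targets = [
    'ubuntu@sepiaXX.ceph.dreamhost.com',
    'ubuntu@sepiaYY.ceph.dreamhost.com',
    'ubuntu@sepiaZZ.ceph.dreamhost.com',
]
roles = [
    ['mon.0', 'mds.0', 'osd.0'],
    ['mon.1', 'osd.1'],
    ['mon.2', 'client.0'],
]

# The i-th roles list is assigned to the i-th target; lengths must match.
assert len(targets) == len(roles)
assignment = dict(zip(targets, (set(r) for r in roles)))

def hosts_with_role(role):
    # Which target(s) carry a given role, e.g. 'client.0'?
    return sorted(h for h, rs in assignment.items() if role in rs)

print(hosts_with_role('client.0'))
```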
Run tasks on targets as told to by roles.

tasks:
- ceph:
- kclient: [client.0]
- autotest:
    client.0: [dbench]
Interactive mode

tasks:
- interactive:

INFO:teuthology.run_tasks:Running task interactive...
Ceph test interactive mode, use ctx to interact with the cluster, press control-D to exit...
>>> 1+1
2
>>>
Interactive mode

>>> ctx.cluster.only('osd.0').run(args=['uptime'])
INFO:orchestra.run.out: 13:05:38 up 42 days, 23:17, 0 users, load average: 0.12, 0.09, 0.07
[<orchestra.run.RemoteProcess object at 0x28bd110>]

One RemoteProcess per command run.
Using just one Remote first

>>> (remote,) = ctx.cluster.only('osd.0').remotes.keys()
>>> proc = remote.run(args=['echo', '*'])
INFO:orchestra.run.out:*
>>> proc
<orchestra.run.RemoteProcess ...>
>>> proc.command
"echo '*'"

Shell quoting is done for you. Works like ctx.cluster.run, but returns just one RemoteProcess, not a list.
Failing processes

>>> remote.run(args=['bork'])
INFO:orchestra.run.err:bash: bork: command not found
...
CommandFailedError: Command failed with status 127: 'bork'
>>> proc = remote.run(args=['bork'],
...                   check_status=False)
INFO:orchestra.run.err:bash: bork: command not found
>>> proc.exitstatus
127
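The check_status semantics can be mimicked locally with the standard subprocess module instead of SSH — a sketch of the behaviour shown above, not orchestra's implementation: by default a non-zero exit raises, and with check_status=False you get the status back to inspect.

```python
# Local analogy for orchestra's check_status behaviour (sketch only).
import subprocess
import sys

class CommandFailedError(Exception):
    pass

def run(args, check_status=True):
    proc = subprocess.run(args)
    if check_status and proc.returncode != 0:
        # Mirrors the error shown in the transcript above.
        raise CommandFailedError(
            'Command failed with status %d: %r' % (proc.returncode, args))
    return proc

# Exit with status 127, like a shell's "command not found".
failing = [sys.executable, '-c', 'import sys; sys.exit(127)']

proc = run(failing, check_status=False)
print(proc.returncode)  # 127

try:
    run(failing)
except CommandFailedError as e:
    print('raised:', e)
```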
Concurrency

>>> proc = remote.run(args=['uptime'], wait=False)
>>> proc
<orchestra.run.RemoteProcess object at 0x28bd1d0>
>>> proc.exitstatus
<gevent.event.AsyncResult object at 0x28c2a10>
Concurrency

>>> proc.exitstatus
<gevent.event.AsyncResult object at 0x28c2a10>
>>> import time; time.sleep(0)
INFO:orchestra.run.out: 13:16:48 up 42 days, 23:28, 0 users, load average: 0.35, 0.15, 0.08
>>> proc.exitstatus
<gevent.event.AsyncResult object at 0x28c2a10>
>>> proc.exitstatus.get()
0
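With wait=False the exit status is a future-like object you redeem later. gevent's AsyncResult is cooperative (greenlets), but the same pattern can be sketched with the standard library's concurrent.futures as a rough analogy, using real threads:

```python
# Rough stdlib analogy for the wait=False / AsyncResult pattern above.
# Not orchestra code: a thread-backed Future stands in for gevent's
# cooperative AsyncResult.
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

def run_background(args):
    # Start the command and return a Future holding its exit status,
    # roughly like proc.exitstatus in the transcript.
    executor = ThreadPoolExecutor(max_workers=1)
    return executor.submit(lambda: subprocess.run(args).returncode)

exitstatus = run_background([sys.executable, '-c', 'pass'])
# .done() plays the role of .ready(): it may still be False right
# after starting, while the command runs in the background.
print(exitstatus.done())
# .result() blocks until the command finishes, like .get().
print(exitstatus.result())
```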
Capturing stdout/stderr

>>> from orchestra import run
>>> proc = remote.run(args=['uname', '-m'],
...                   wait=False, stdout=run.PIPE)
>>> proc.exitstatus
<gevent.event.AsyncResult object at 0x28c2dd0>
>>> proc.exitstatus.ready()  # just for debug
False
>>> proc.stdout.read()
'x86_64\n'
>>> proc.exitstatus.get()
0
Deadlocks you must avoid:
- stdout vs stderr
- stdout/err vs stdin
- stdout/err vs exit
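The "stdout/err vs exit" deadlock is the classic pipe-buffer trap, and it is easy to reproduce locally with subprocess: if you wait for the exit status while the child is blocked writing into a full stdout pipe, neither side can make progress. The safe order is to drain the pipe first, then wait. A minimal local demonstration (not orchestra code):

```python
# Demonstrates the safe ordering that avoids the stdout-vs-exit
# deadlock. The child writes well past a typical 64 KiB pipe buffer.
import subprocess
import sys

child = [sys.executable, '-c',
         "import sys; sys.stdout.write('x' * 1000000)"]

proc = subprocess.Popen(child, stdout=subprocess.PIPE)
# WRONG (can deadlock): calling proc.wait() here, before reading
# stdout, leaves the child blocked on a full pipe forever.
out = proc.stdout.read()   # drain stdout first...
status = proc.wait()       # ...then it is safe to wait for exit
print(len(out), status)
```

The same reasoning applies pairwise to the other combinations on the slide: whenever two streams (or a stream and the exit status) are both pending, consume them concurrently or in an order that cannot block.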
Using Cluster

>>> processes = ctx.cluster.run(
...     args=['uname', '-m'],
...     wait=False,
...     stdout=run.PIPE)
>>> processes
[<orchestra.run.RemoteProcess object at 0x28bdbf0>,
 <orchestra.run.RemoteProcess object at 0x28bdb90>,
 <orchestra.run.RemoteProcess object at 0x28bdad0>]
>>> [p.stdout.read() for p in processes]
['x86_64\n', 'x86_64\n', 'x86_64\n']
>>> run.wait(processes)
>>>
Controlling stdout/stderr logging

Usually looks like teuthology.task.foo

>>> import logging
>>> log = logging.getLogger(__name__)
>>> log.info('foo')
INFO:__builtin__:foo
>>> ctx.cluster.only('osd.0').run(
...     args=['uptime'],
...     logger=log.getChild('uptime'))
INFO:__builtin__.uptime.out: 13:52:49 up 43 days, 4 min, 0 users, load average: 0.00, 0.01, 0.05
[<orchestra.run.RemoteProcess object at 0x28bdb90>]
>>>
Tasks can be context managers

tasks:
- ceph:
- kclient: ...
- autotest: ...
- interactive:
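The context-manager idea is that each task sets up on entry and cleans up on exit, so a stack of tasks unwinds in reverse order even when something fails in the middle. A minimal sketch of that shape (task names are illustrative; this is not teuthology's real task code):

```python
# Sketch of tasks as context managers: setup on entry, cleanup on
# exit, unwinding in reverse order. Illustration only.
import contextlib

events = []

@contextlib.contextmanager
def task(name):
    events.append('start ' + name)     # setup phase
    try:
        yield
    finally:
        events.append('stop ' + name)  # cleanup runs even on failure

# Nest tasks like the YAML list above: ceph, then kclient, then autotest.
with task('ceph'), task('kclient'), task('autotest'):
    events.append('run tests')

print(events)
```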
/tmp/cephtest

- Must not exist already, or the target is dirty (see teuthology-nuke, later)
- Used by tasks to store things
- Tasks are responsible for cleaning up after themselves (no top-level rm -rf, to flush out the bugs)
- Anything in /tmp/cephtest/archive gets archived
- Please bzip2 -9 any big files your task leaves in archive
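The "bzip2 -9 any big files" step can be done from Python with the standard bz2 module, where compresslevel=9 is the equivalent of -9. A small sketch under assumed paths (the temporary directory and log name here are made up for the example):

```python
# Sketch: compress a large log before leaving it in the archive dir,
# the library equivalent of `bzip2 -9`. Paths here are illustrative.
import bz2
import os
import tempfile

with tempfile.TemporaryDirectory() as archive:
    log_path = os.path.join(archive, 'osd.0.log')
    with open(log_path, 'wb') as f:
        f.write(b'ceph log line\n' * 10000)

    # Compress at level 9 and drop the uncompressed original,
    # as bzip2 itself would.
    with open(log_path, 'rb') as src, \
         bz2.open(log_path + '.bz2', 'wb', compresslevel=9) as dst:
        dst.write(src.read())
    os.unlink(log_path)

    # Round-trip check: the compressed file decompresses intact.
    with bz2.open(log_path + '.bz2', 'rb') as f:
        data = f.read()
    print(len(data))
```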
Cleanups & failures

Cleanup can fail; further cleanups are still attempted, so always study the first error, not the last one.
If a task fails to clean up, the targets are left "dirty".
teuthology-nuke is a Big Hammer.
Archived results

2011-06-21T10-00-44/
├── ceph-sha1
├── config.yaml
├── remote
│   ├── ubuntu@sepia70.ceph.dreamhost.com
│   │   ├── log
│   │   │   ├── client.admin.log.bz2
│   │   │   ├── mds.0.log.bz2
│   │   │   ├── mon.0.log.bz2
│   │   │   └── osd.0.log.bz2
│   │   └── syslog
│   │       ├── kern.log.bz2
│   │       └── misc.log.bz2
│   ├── ubuntu@sepia71.ceph.dreamhost.com ...
│   └── ubuntu@sepia72.ceph.dreamhost.com
│       ├── autotest
│       │   └── ...
│       ├── log ...
│       └── syslog ...
├── summary.yaml
└── teuthology.log
gitbuilder

A low-key, low-hype continuous integration tool:
- Builds tags and heads of branches
- On a bad build, tries older commits until it finds a green one

We have it building ceph and our kernel fork:
http://ceph.newdream.net/gitbuilder/
http://ceph.newdream.net/gitbuilder-i386/
http://ceph.newdream.net/gitbuilder-gcov-amd64/
http://ceph.newdream.net/gitbuilder-deb-amd64/
http://ceph.newdream.net/gitbuilder-kernel-amd64/
We made gitbuilder create tarballs

http://ceph.newdream.net/gitbuilder/output/ref/origin_master/

Index of /output/ref/origin_master/
mode  links  bytes      last-changed  name
dr-x      2       4096  Jun 29 13:58  ./
dr-x     28      12288  Jun 29 15:16  ../
-r--      1  149323650  Jun 29 13:58  ceph.x86_64.tgz
-r--      1         41  Jun 29 13:57  sha1

Don't trust the links; ProxyPass confuses the web server.
Fetch .../output/origin_master/sha1, then fetch .../output/sha1/SHA1_HERE/ceph.x86_64.tgz
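The two-step fetch can be sketched as a pair of URL builders: first get the sha1 for a ref, then build the tarball URL from that sha1. No network access here; the base URL comes from the slide, while the exact path layout (and the ref-name encoding of `/` as `_`, as seen in origin_master) is an assumption based on the listing above:

```python
# Sketch of the two-step lookup described above: URL construction only,
# no fetching. Path layout is an assumption from the slide's listing.
BASE = 'http://ceph.newdream.net/gitbuilder/output'

def sha1_url(ref):
    # e.g. origin/master appears as origin_master in the path
    return '%s/ref/%s/sha1' % (BASE, ref.replace('/', '_'))

def tarball_url(sha1, arch='x86_64'):
    # Second step: the tarball lives under the resolved sha1.
    return '%s/sha1/%s/ceph.%s.tgz' % (BASE, sha1, arch)

print(sha1_url('origin/master'))
print(tarball_url('deadbeef'))
```

In a real script you would fetch sha1_url(), strip the trailing newline from the 41-byte response, and pass the result to tarball_url().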
Future and topics not covered

- teuthology-suite
- nightly runs
- machine allocation
- gcov flavors
- custom ceph builds
- installing custom kernels
- failure testing
- monitor health
Thank You Questions? tommi.virtanen@dreamhost.com