Chipy. Dan Griffin. Why Am I Here? OpDemand: 1-click cloud deploys; dynamic configuration; automatic and customizable app monitoring; real time log feedback; complete audit trail; easy collaboration with other users; EC2, Heroku and soon OpenStack! Simple Cloud Management.
OpDemand • 1-click cloud deploys • Dynamic configuration • Automatic and customizable app monitoring • Real time log feedback • Complete audit trail • Easy collaboration with other users • EC2, Heroku and soon OpenStack! Simple Cloud Management
Two Reasons for Concurrency 1 • I want to use the time that I am spending waiting for IO or other events. Problems that are IO bound. 2 • I want to do a lot of work as fast as I can. Problems that are CPU bound and can be parallelized.
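The two cases can be sketched with the standard library alone. This is a minimal illustration, not code from the talk: `fake_io` and `cpu_work` are hypothetical stand-ins for a blocking call and a computation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay):
    """Hypothetical stand-in for a blocking network or disk call."""
    time.sleep(delay)
    return delay


def cpu_work(n):
    """Hypothetical stand-in for CPU-bound work: sum of squares below n."""
    return sum(i * i for i in range(n))


# Reason 1 (IO bound): threads overlap the waiting, so the four sleeps
# run side by side instead of one after another.
with ThreadPoolExecutor(max_workers=4) as pool:
    io_results = list(pool.map(fake_io, [0.1] * 4))

# Reason 2 (CPU bound): the computation itself is the bottleneck. In
# CPython, spreading it over cores needs processes rather than threads.
cpu_results = [cpu_work(n) for n in (10, 100)]
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` is the usual way to move the second case onto several cores.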
Event loops (Twisted, Asyncore, etc.)
• Every significant bit of work slows everything down
• Still constrained to 1 process
• Library compatibility is terrible
• Callback hell: d.addCallback(lambda _: self)
• Inline deferreds are better: a = yield db.find(id)
Let the operating system tell you when you have work to do. Usually based on select, poll, kqueue.
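What "let the operating system tell you" means can be shown with the standard-library `selectors` module, which wraps select, poll, epoll and kqueue. This is a bare-bones sketch, not Twisted's reactor; the socket pair and the `on_readable` callback are illustrative assumptions.

```python
import selectors
import socket

sel = selectors.DefaultSelector()  # picks the best mechanism the OS offers
received = []


def on_readable(sock):
    """Callback the loop runs once the OS reports that data is waiting."""
    received.append(sock.recv(1024))
    sel.unregister(sock)


# A connected socket pair stands in for a real network connection.
reader, writer = socket.socketpair()
reader.setblocking(False)
sel.register(reader, selectors.EVENT_READ, on_readable)

writer.send(b"ping")

# The event loop: sleep until something is ready, then dispatch its callback.
while sel.get_map():
    for key, _events in sel.select(timeout=1.0):
        key.data(key.fileobj)

reader.close()
writer.close()
sel.close()
```

Everything runs in one thread, which is why any slow callback stalls every other connection.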
deferToThread
• CouchDB-Python
    d = threads.deferToThread(template_model.assemble, serv)
Use blocking libraries in Twisted by deferring them to threads
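The same idea exists outside Twisted. The sketch below uses asyncio's `run_in_executor` as a stand-in for `deferToThread` (a swapped-in technique, not the talk's code); `blocking_assemble` is a hypothetical blocking function.

```python
import asyncio
import time


def blocking_assemble(name):
    """Hypothetical stand-in for a blocking library call."""
    time.sleep(0.05)
    return "assembled:" + name


async def main():
    loop = asyncio.get_running_loop()
    events = []

    async def heartbeat():
        # Keeps running while the blocking call sits in a worker thread.
        for _ in range(3):
            events.append("tick")
            await asyncio.sleep(0.01)

    # Hand the blocking call to the default thread pool and keep the loop free.
    result, _ = await asyncio.gather(
        loop.run_in_executor(None, blocking_assemble, "serv"),
        heartbeat(),
    )
    return result, events


result, events = asyncio.run(main())
```

The heartbeat ticks show that the loop stays responsive while the blocking call is in flight.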
Processes • The root of “real” concurrency for Python systems • Process per core + 1 to distribute work and collect results Fork - create a copy of current process and continue execution
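A minimal fork sketch, assuming a POSIX system (`os.fork` is not available on Windows). The parent plays the "+ 1" role: it distributes chunks to one child per core and collects results over pipes. The function name and the summing workload are illustrative only.

```python
import os


def fork_workers(chunks):
    """Fork one child per chunk; each child sums its chunk and reports via a pipe."""
    children = []
    for chunk in chunks:
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: a copy of the parent that continues from this point.
            status = 1
            try:
                os.close(read_fd)
                os.write(write_fd, str(sum(chunk)).encode())
                status = 0
            finally:
                os._exit(status)  # never fall through into the parent's code
        os.close(write_fd)
        children.append((pid, read_fd))

    results = []
    for pid, read_fd in children:
        with os.fdopen(read_fd) as pipe:
            results.append(int(pipe.read()))
        os.waitpid(pid, 0)  # reap the child
    return results


workers = os.cpu_count() or 1  # one worker per core; the parent is the "+ 1"
data = list(range(1000))
chunks = [data[i::workers] for i in range(workers)]
total = sum(fork_workers(chunks))
```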
The Celery Project • Parent process forks n workers • Relies on RabbitMQ and multiprocessing to handle concurrency Celery is a perfect example
Threads • Shared memory • Mutation with locks (hopefully) • Everyone knows about the GIL • Still useful in Python
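Shared memory plus a lock, in its smallest form. The `Counter` class is a hypothetical example, not from the talk; without the lock, the read-modify-write in `increment` could lose updates.

```python
import threading


class Counter:
    """Shared mutable state guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:  # only one thread mutates at a time
            self.value += 1


def add_many(counter, n):
    for _ in range(n):
        counter.increment()


counter = Counter()
threads = [
    threading.Thread(target=add_many, args=(counter, 10_000)) for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```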
A Quick Clojure Detour • Software Transactional Memory - SQL like transactions for modifying data from different threads • Embracing mutation of shared data • Everything is based on Threads, you can dosync, send, promise and deliver • Mostly immutable BUT you can change refs with ref-set inside transactions
Why Does Erlang Exist? • Wraps all the concepts into 1 heavy duty package • Pins schedulers to different cores • Uses thread pools • Has transparent inter-process/server communication • Makes use of OS event loops • You would never want to write many common tasks in it
How OpDemand Works
[Architecture diagram. Labels: Client, Node Proxy, Twisted (x3), RabbitMQ, Celery, Monitor]
What does Node do? • Reference SocketIO Implementation • Take service updates and log output from ZMQ and re-publish over SocketIO • Serve static content • Round robin HTTP requests between reactors • Replace with Python or Nginx soon hopefully
Explicitly Saving, Implicitly Publishing
    d = defer.Deferred()
    d.addCallback(self.transition_state, core_fsm.DEPLOYING)
    d.addCallback(self._set_status_detail, 'deploy in progress')
    d.addCallback(self._save_obj, **kwargs)
    d.addCallback(self._start_interval, context, 'deploy')
    d.addCallback(self._deploy, context, **kwargs)
    d.addCallback(self._set_time, 'deploy')
    d.addCallback(self._set_interval, context, 'deploy')
    d.addCallback(self.transition_state, core_fsm.ACTIVE)
    d.addCallback(self._set_status_detail, 'deploy operation successful')
    d.addCallback(self._save_obj, **kwargs)
Real-time Publishing
    def save_obj(self, this, ctx, **kwargs):
        # here is where we save to couch
        saved_obj = self.db.save(this)
        if ctx and "service" in ctx:
            if settings.ZMQ_PUBLISHER:
                tag = 'service-%s' % ctx["service"]["_id"]
                settings.ZMQ_PUBLISHER.publish(
                    view.to_json(saved_obj), tag=str(tag))
Publish documents over ZMQ when they are saved
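The save-then-publish pattern can be tried without CouchDB or ZMQ. In this sketch `InMemoryPublisher` and `Store` are hypothetical stand-ins for the ZMQ publisher and the database layer; only the tagging scheme (`service-<id>`) comes from the slide.

```python
import json


class InMemoryPublisher:
    """Hypothetical stand-in for the ZMQ publisher: records what was sent."""

    def __init__(self):
        self.sent = []

    def publish(self, payload, tag):
        self.sent.append((tag, payload))


class Store:
    """Hypothetical stand-in for the database layer."""

    def __init__(self, publisher=None):
        self.docs = {}
        self.publisher = publisher

    def save_obj(self, doc, ctx=None):
        # Explicit step: persist the document.
        self.docs[doc["_id"]] = doc
        # Implicit step: publish it, tagged by the owning service.
        if ctx and "service" in ctx and self.publisher:
            tag = "service-%s" % ctx["service"]["_id"]
            self.publisher.publish(json.dumps(doc, sort_keys=True), tag=tag)
        return doc


publisher = InMemoryPublisher()
store = Store(publisher)
store.save_obj({"_id": "doc1"}, ctx={"service": {"_id": "svc9"}})
store.save_obj({"_id": "doc2"})  # no service in context: saved, not published
```

Callers only ever ask for a save; subscribers see the update as a side effect.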
Wrapping Celery in Twisted
• A "polling" deferred using twisted.internet.task
    from twisted.internet.task import cooperate

    def _do_poll():
        # Generator: yield to the reactor until the Celery task is ready
        while not celery_task.ready():
            yield
    task = cooperate(_do_poll())
    return task.whenDone()
Essentially launch Celery tasks and poll for completion
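Launch-and-poll can be sketched without Twisted or Celery. Here asyncio replaces the reactor and a `concurrent.futures` future replaces the Celery result, with `done()` playing the role of `ready()`. `slow_task` and `wait_for` are hypothetical names.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(x):
    """Hypothetical stand-in for work done by a remote worker."""
    time.sleep(0.05)
    return x * 2


async def wait_for(task, interval=0.01):
    """Poll until the task reports completion, yielding to the loop between checks."""
    while not task.done():
        await asyncio.sleep(interval)
    return task.result()


async def main():
    with ThreadPoolExecutor(max_workers=1) as pool:
        task = pool.submit(slow_task, 21)  # launch
        return await wait_for(task)        # poll for completion


poll_result = asyncio.run(main())
```

The polling interval trades latency against wasted wake-ups. The slide's `cooperate` version leaves that scheduling to Twisted.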
A Common Interface for Celery Tasks
    # Celery Task Definition
    @aws_celery.task
    def refresh(comp, config, creds):
        doctype = comp.get("doctype")
        if doctype == "server":
            i = Instance()
            return i.refresh(comp, config)
Celery transforms a component and its configuration
Returning the Finished Product
    # AWS Instance Code
    def refresh(self, comp, config, **kwargs):
        boto = self.get_boto(comp, config)
        comp, config = self.sync(comp, config, boto)
        return comp, config
The Provider code returns the new Comp and Config
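The task-to-provider contract can be exercised in plain Python: the task picks a provider by `doctype`, and the provider returns a new `(comp, config)` pair. This is a sketch under assumptions: the `PROVIDERS` table, the `status` field, the placeholder instance id and the `LookupError` for unknown doctypes are all illustrative, not taken from OpDemand's code.

```python
class Instance:
    """Hypothetical provider; a real one would query the cloud API."""

    def refresh(self, comp, config, **kwargs):
        # Return updated copies rather than mutating the inputs.
        new_comp = dict(comp, status="running")
        new_config = dict(config)
        new_config["server/instance_id"] = "i-example"  # placeholder value
        return new_comp, new_config


# Hypothetical dispatch table: doctype -> provider class.
PROVIDERS = {"server": Instance}


def refresh(comp, config, creds=None):
    """Common entry point: pick the provider by doctype and delegate."""
    doctype = comp.get("doctype")
    provider_cls = PROVIDERS.get(doctype)
    if provider_cls is None:
        raise LookupError("no provider for doctype %r" % doctype)
    return provider_cls().refresh(comp, config)


old_comp = {"doctype": "server"}
new_comp, new_config = refresh(old_comp, {})
```

Because every task takes and returns the same pair, the caller can save the result the same way for every provider.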
Why Bother with Celery?
• Code from the first AWS provider using Twisted
    # this is one path through this
    d = threads.deferToThread(self.conn.get_all_images, [dc['image_id']])
    d.addErrback(self._handle_error)
    d.addCallback(self.__get_image)
    d.addCallback(self.__create_reservation,
                  self.__prepare_kwargs(context, kwargs, resolved))
    d.addCallback(self.__construct_instances, context, resolved)
    d.addCallback(self.__sync_instances, context)
    d.addCallback(self._save_obj, **kwargs)
    d.addCallback(self._poll_state, context, 'running', **kwargs)
    if 'elastic_ip' in dc and dc['elastic_ip'] is not None:
        d.addCallback(self.__associate_address, context)
        d.addCallback(self._save_obj, **kwargs)
        d.addCallback(self.__poll_address, context, **kwargs)
        d.addCallback(self._save_obj, **kwargs)
    if not context.config.get("server/instance_id"):
        d.addCallback(self._poll_signal, context, 22, **kwargs)
    # transition the server to built state so it gets destroyed
    # I cut like 20 more lines of code
Using Celery
• Much better
    image_id = self._get_image_id(config)
    images = conn.get_all_images([image_id])
    if len(images) != 1:
        raise LookupError('Could not find AMI: %s' % image_id)
    image = images[0]
    kwargs = self._prepare_run_kwargs(config)
    reservation = image.run(**kwargs)
    instances = reservation.instances
    boto = instances[0]
    config['ec2-instance/id'] = boto.id
    config['ec2-instance/region_name'] = boto.region.name
    config['ec2-instance/zone_name'] = boto._placement.zone
    return comp, config
Using Pika
    mq.create_async_subscriber("c2-service", "service", handle_service_updates)

    def create_async_subscriber(exchange, queue, callback, amqtype="topic"):
        tw = TwistedHandler(exchange, queue, callback, amqtype=amqtype)
        connection = TwistedConnection(
            pika.ConnectionParameters(
                host=settings.RABBITMQ_HOST,
                port=settings.RABBITMQ_PORT,
                virtual_host=settings.RABBITMQ_VHOST),
            tw.on_connected)
        return tw
• Modified from Pika repository (maybe HEAD works now?)
Subscribe with a Twisted handler