
Sandbox Learning: Try without error? Prof. Dr.-Ing. C. Müller-Schloer, Universität Hannover






Presentation Transcript


  1. Sandbox Learning: Try without error?
     Prof. Dr.-Ing. C. Müller-Schloer
     Universität Hannover, Institut für Systems Engineering – System- und Rechnerarchitektur
     Appelstraße 4, 30159 Hannover
     cms@sra.uni-hannover.de, +49 (0)511 762 19730
     Based on joint work with Hartmut Schmeck, University of Karlsruhe, and Theo Ungerer, University of Augsburg
     Team: Jörg Hähner, Holger Prothmann, Fabian Rochner, Sven Tomforde

  2. Outline
     • Online learning and errors
     • A first solution
     • Organic Traffic Control
     • Organic Network Control
     • Open questions

  3. Learning
     Learning
     • Observation of the world, update of a world model
     • Acting in the world: try & error
     • Reinforcement learning: reward/penalty assigned to an action → influences future decisions
     • But: immediate real-world effects
     Nature
     • Collective level (genotype)
       • 4 bn. years
       • Huge populations
       • Redundancy (neglect of the individual)
     • Individual level (phenotype)
       • Modification of behavior or preferences based on experience (try) …
       • … as long as the individual survives.

  4. Learning in technical systems
     Requirements
     • Immediate reaction (even if sub-optimal)
     • Guaranteed prevention of illegal actions (e.g. 4-way green)
     • Adaptation and long-term improvement
     • How long is long-term? Learning speed!
     Example
     • Learning traffic light controller
     • Genetic algorithm with selection based on real-world evaluation
     • Number of tries until a reasonable solution: 1000
     • Assessment time constant (traffic): 15 minutes
     • → 999 unsuitable tries
     • → about 10 days (1000 × 15 min = 250 h)
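The ten-day figure on slide 4 follows directly from the two numbers given there. A minimal sketch of the arithmetic, assuming the tries are evaluated one after another on the real junction (the variable names are illustrative, not from the slides):

```python
from datetime import timedelta

TRIES = 1000            # tries until a reasonable solution (from the slide)
MINUTES_PER_TRY = 15    # assessment time constant for traffic (from the slide)

# Assumption: tries are assessed strictly one after another in the real world.
total = timedelta(minutes=TRIES * MINUTES_PER_TRY)
unsuitable_tries = TRIES - 1  # every try before the final one runs on live traffic

print(f"{total.days} days, {total.seconds // 3600} hours")  # 10 days, 10 hours
print(f"{unsuitable_tries} unsuitable tries")               # 999 unsuitable tries
```

The point of the slide is the second number: each of those unsuitable tries is imposed on real traffic for a full assessment period.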

  5. Generic 3-level architecture
     User
     • Definition of system objectives
     Layer 2
     • Sandbox: off-line parameter optimization
     • Evolutionary Algorithm (EA)
     • Simulation-based evaluation
     • Only legal parameter sets are sent to level 1
     Layer 1
     • Immediate reaction
     • Observer: situation classification
     • Selection from legal parameter sets
     • Might be suboptimal
     SuOC (System under Observation/Control)
     • Real world
     • Sensors
     • Actuators
     [Diagram labels: User; objectives (LoS, …); Level 2: Simulator, EA; Level 1: Observer, LCS; SuOC: System under Observation/Control (productive system); arrows: detector data, actuator settings.]
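The division of labour on slide 5 can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the parameter set (two green times), the legality limits, the `simulate_delay` stand-in for a simulator and the tiny evolutionary loop are all hypothetical.

```python
import random

MIN_GREEN, MAX_GREEN, CYCLE = 5, 60, 90  # seconds; hypothetical legality limits

def is_legal(params):
    """Legality check applied before anything reaches layer 1."""
    a, b = params
    return MIN_GREEN <= a <= MAX_GREEN and MIN_GREEN <= b <= MAX_GREEN and a + b <= CYCLE

def simulate_delay(params, demand_share):
    """Stand-in for the simulator (lower is better); demand_share is approach A's share."""
    a, b = params
    return abs(a / (a + b) - demand_share) * 100 + (CYCLE - (a + b))

def mutate(params, rng):
    return tuple(g + rng.randint(-5, 5) for g in params)

def layer2_optimize(demand_share, rng, generations=30, pop_size=20):
    """Sandbox: candidates are tried in simulation only, never on the real system."""
    pop = [(MIN_GREEN, MIN_GREEN)]  # known-legal seed so the legal set is never empty
    pop += [(rng.randint(MIN_GREEN, MAX_GREEN), rng.randint(MIN_GREEN, MAX_GREEN))
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        legal = sorted((p for p in pop if is_legal(p)),
                       key=lambda p: simulate_delay(p, demand_share))
        parents = legal[: pop_size // 4]  # elitist selection among legal candidates
        pop = parents + [mutate(rng.choice(parents), rng)
                         for _ in range(pop_size - len(parents))]
    legal = [p for p in pop if is_legal(p)]
    return min(legal, key=lambda p: simulate_delay(p, demand_share))

def classify(demand_share):
    """Observer: map raw sensor data to a situation class."""
    if demand_share > 0.6:
        return "A-heavy"
    if demand_share < 0.4:
        return "B-heavy"
    return "balanced"

class Layer1:
    def __init__(self, default):
        assert is_legal(default)
        self.default = default
        self.rules = {}  # situation class -> legal parameter set

    def install(self, situation, params):
        if not is_legal(params):
            raise ValueError("only legal parameter sets may reach layer 1")
        self.rules[situation] = params

    def react(self, situation):
        # Immediate reaction: a known rule if available, else the legal default.
        return self.rules.get(situation, self.default)

rng = random.Random(0)
layer1 = Layer1(default=(30, 30))
situation = classify(0.7)
first = layer1.react(situation)                        # immediate, possibly suboptimal
layer1.install(situation, layer2_optimize(0.7, rng))   # sandbox result arrives later
later = layer1.react(situation)
```

Layer 1 never waits for layer 2: it answers at once with something legal, and the sandbox result only replaces that answer once it has passed the legality check.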

  6. Example 1: Organic Traffic Control
     Goals
     • Network of adaptive learning traffic light controllers (TLCs).
     • TLCs learn with some limited sensory horizon.
     • TLCs cooperate to achieve a global goal (e.g. reduced avg. travel time).
     • Explore possibilities/limitations of decentralized control systems.
     Phase 1
     • Single, isolated junction
     Phase 2
     • Collaborating TLCs
     • Progressive signals (green wave; German: Grüne Welle)

  7. Traffic Control Architecture
     User
     • Definition of system objectives
     Layer 2
     • Off-line parameter optimisation
     • Evolutionary Algorithm (EA) evolves TLC parameters
     • Simulation-based evaluation (AIMSUN)
     Layer 1
     • On-line parameter selection
     • Observer monitors traffic
     • Learning Classifier System (LCS) selects TLC parameters and learns rule quality
     SuOC (System under Observation/Control): Traffic Light Controller (TLC)
     • Control of traffic signals
     • Industry-standard TLC
       • Fixed-time
       • Traffic-responsive
     • Parameters determine performance
     [Diagram labels: User; objectives (LOS, …); Level 2: Simulator, EA; Level 1: Observer, LCS; SuOC: Traffic Light Controller (TLC); arrows: detector data, signal settings.]
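Slide 7 says the LCS on layer 1 selects TLC parameters and learns rule quality. The slides do not state which LCS variant or update rule is used, so the following is only a sketch under assumptions: rules match on a traffic-flow interval, quality is updated with a simple Widrow-Hoff step, and optimistic initial qualities stand in for exploration. All names, flows and rewards are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    low: float             # condition: matches when low <= flow < high
    high: float
    action: str            # name of a TLC parameter set delivered by layer 2
    quality: float = 1.0   # optimistic start so every matching rule gets tried

    def matches(self, flow):
        return self.low <= flow < self.high

def select(rules, flow):
    """Pick the matching rule with the highest learned quality (None if no rule matches)."""
    matching = [r for r in rules if r.matches(flow)]
    return max(matching, key=lambda r: r.quality) if matching else None

def update(rule, reward, beta=0.2):
    """Widrow-Hoff step: move the quality estimate towards the observed reward."""
    rule.quality += beta * (reward - rule.quality)

rules = [
    Rule(0, 500, "long_cycle"),
    Rule(0, 500, "short_cycle"),
    Rule(500, 2000, "long_cycle"),
]

# Hypothetical rewards in [0, 1] observed at low traffic (flow = 300 vehicles/hour).
observed_reward = {"long_cycle": 0.2, "short_cycle": 1.0}

for _ in range(10):
    rule = select(rules, 300)
    update(rule, observed_reward[rule.action])

best = select(rules, 300)
print(best.action, round(best.quality, 2))  # short_cycle 1.0
```

Because every action comes from a parameter set that layer 2 has already cleared, a low-quality choice here costs performance but never legality.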

  8. Hamburg

  9. OTC: Performance
     [Figure: OTC performance during three consecutive days; legend entry: manually designed reference.]

  10. Example 2: Organic Network Control
      • Organic control of data communication networks
      • Control and management of network protocol clients in data communication networks
      • Autonomous control system for each network entity
      • Collaboration between neighbouring network entities

  11. ONC: Motivation
      • Network protocol configuration is static
      • Goal: dynamic adaptation of network protocol parameter settings to a changing environment
      • Client acts within large computer networks
        • Current network status influences the performance of the network protocol.
      • Computer is used for different tasks simultaneously
        • Current usage of system resources influences the performance of the network protocol.

  12. ONC: BitTorrent
      • Current focus: BitTorrent (1)
      • Tracker responsible for meeting of peers
      • Fairness-based distribution
      • Files are split into smaller parts ("chunks")
      • Variable parameters (most important ones):
        • Delays
        • Intervals (choking, …)
        • Number of peers (minimum, maximum, initially from tracker, etc.)
        • Number of open connections
        • Chunk size
      [Diagram labels: File, Chunk]
      (1) "Incentives Build Robustness in BitTorrent", Bram Cohen, Proc. 1st Workshop on Economics of Peer-to-Peer Systems, Berkeley 2003.
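For an EA to optimise the variable parameters listed on slide 12, they have to be bundled into one candidate parameter set. A minimal sketch of such a bundle, assuming a subset of the listed parameters; the field names, the consistency rule and the example values are hypothetical placeholders, not settings taken from the slides or from any particular BitTorrent client.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class BitTorrentParams:
    choke_interval_s: int       # interval between choking decisions
    min_peers: int
    max_peers: int
    initial_peers: int          # peers requested from the tracker at start
    max_open_connections: int
    chunk_size_kib: int

    def is_consistent(self):
        """Basic sanity check a candidate must pass before it is evaluated."""
        return (
            self.choke_interval_s > 0
            and 0 < self.min_peers <= self.initial_peers <= self.max_peers
            and self.max_open_connections > 0
            and self.chunk_size_kib > 0
        )

# Placeholder values for illustration only.
candidate = BitTorrentParams(
    choke_interval_s=10, min_peers=20, max_peers=80,
    initial_peers=50, max_open_connections=40, chunk_size_kib=256,
)
genome = astuple(candidate)  # flat tuple an EA could mutate and recombine
```

A frozen dataclass keeps each candidate immutable, so a parameter set that has been evaluated cannot change afterwards.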

  13. ONC architecture
      User interface
      • User defines system objectives
      • E.g. download rate for BitTorrent or coverage rate for MANETs
      Level 2
      • Extend behavioral repertoire of level 1
      • Off-line learning (protocol parameter sets)
      Level 1
      • Adapt SuOC parameters (rules)
      • On-line learning (rule fitness)
      SuOC
      • System under Observation and Control
      • Network protocol client
      • E.g. BitTorrent client
      [Diagram labels: objectives (download rate, etc.); Level 2: Simulator, Observer, EA; Level 1: Observer, LCS; SuOC: network protocol client; arrows: network data, protocol configuration.]

  14. ONC Evaluation: Off-line (1)
      • Off-line optimisation: influence of number of peers

  15. ONC Evaluation: On-line (2)
      • Adaptation to background client usage profile

  16. Open questions, future work (1/2)
      Incongruent model
      • Model adjustment
      Abstraction of non-local environment
      • Influence of neighboring nodes?
      Verification
      • Optimized parameter sets could be verified before they are deployed to layer 1
      Stateless behavior → multi-step LCS
      • LCSs are stateless (stimulus-response)
      • Learning of action sequences?
      [Diagram labels: objectives; Layer 2: Simulator, EA; Layer 1: Observer, LCS; Productive system.]

  17. Open questions, future work (2/2)
      • So far: simulation of the local neighborhood with assumptions about the behavior of other nodes.
      Communication between nodes
      • Level 1: increase learning performance by exchange of learnt rule sets: rule generalization?
      • Level 2: exchange of populations → distributed EA
      Parallel "sandbox" world on layer 2
      • Network-wide distributed simulation: synchronization? Convergence?
      • Influence on real world?
      • Analogy from human society: social discourse
      [Diagram: two nodes side by side, each with Layer 2 (Simulator, EA), Layer 1 (Observer, LCS) and a productive system.]
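Slide 17 raises the idea of exchanging populations between the level-2 EAs of different nodes. One common way to do this is an island model with periodic migration; the sketch below assumes that scheme, a ring topology and a toy one-dimensional objective. None of these choices come from the slides.

```python
import random

def fitness(x):
    """Hypothetical objective to maximise; its peak is at x = 3."""
    return -(x - 3.0) ** 2

def evolve(pop, rng, sigma=0.3):
    """One generation: keep the better half, refill with mutated copies."""
    ranked = sorted(pop, key=fitness, reverse=True)
    survivors = ranked[: len(ranked) // 2]
    children = [s + rng.gauss(0, sigma) for s in survivors]
    return survivors + children

def migrate(islands, k=1):
    """Ring migration: each island's k best replace the k worst of the next island."""
    best = [sorted(p, key=fitness, reverse=True)[:k] for p in islands]
    result = []
    for i, pop in enumerate(islands):
        keep = sorted(pop, key=fitness, reverse=True)[: len(pop) - k]
        result.append(keep + best[i - 1])  # index -1 wraps around to the last island
    return result

rng = random.Random(1)
# Three nodes, each running its own population of eight candidates.
islands = [[rng.uniform(-10, 10) for _ in range(8)] for _ in range(3)]
start_best = max(fitness(x) for pop in islands for x in pop)

for gen in range(40):
    islands = [evolve(pop, rng) for pop in islands]
    if gen % 5 == 4:              # exchange individuals every fifth generation
        islands = migrate(islands)

end_best = max(fitness(x) for pop in islands for x in pop)
```

Migration interval and the number of migrants are the knobs that trade communication cost against convergence, which is where the slide's synchronization and convergence questions would show up in practice.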

  18. Thank you for your attention!
