
Dynamic Single Machine Scheduling Using Q-Learning




  1. Dynamic Single Machine Scheduling Using Q-Learning

  2. Outline • Description of Problem (Dynamic Single Machine Scheduling Problem) • The work of Q-learning • Action & State • Simulated annealing-based Q-learning

  3. Description of Problem • A finite set of n jobs • Each job consists of a chain of operations • A finite set of m machines • Each machine can handle at most one operation at a time • Each operation needs to be processed during an uninterrupted period of a given length on a given machine • The purpose is to find a schedule, that is, an allocation of the operations to time intervals on the machines, of minimal total length (the data involved is sketched below)
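
A minimal sketch of the problem data, assuming nothing beyond the slide's definitions; class and field names are illustrative, not from the presentation:

```python
from dataclasses import dataclass

@dataclass
class Operation:
    machine: int   # the machine that must process this operation
    duration: int  # uninterrupted processing time required

@dataclass
class Job:
    operations: list  # chain of Operation objects, processed in this order

# A schedule assigns each operation a start time on its machine; the goal
# is to minimize the schedule's total length (the makespan).
```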

  4. Single Machine Scheduling • [Figure: other machines feed jobs into a bottleneck machine, which is then scheduled as a single machine]

  5. Description of Problem (cont.) • Dispatching Rules • Essentially “selecting” the next job from the queue in front of a machine based on a rule • Common rules: • SPT – shortest processing time • EDD – earliest due date • FCFS – first come, first served • LTWK – least total work • LWKR – least work remaining • WINQ – work in next queue • There are probably hundreds of dispatching rules (three are sketched below)
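
A hedged sketch of the first three rules; the job attribute names (processing_time, due_date, arrival_time) are assumptions, not from the slides:

```python
def spt(queue):
    """SPT: pick the waiting job with the shortest processing time."""
    return min(queue, key=lambda job: job.processing_time)

def edd(queue):
    """EDD: pick the waiting job with the earliest due date."""
    return min(queue, key=lambda job: job.due_date)

def fcfs(queue):
    """FCFS: pick the job that arrived first."""
    return min(queue, key=lambda job: job.arrival_time)
```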

  6. Description of Problem (cont.) • No single rule has been found to perform well for all system objectives. • So we need an intelligent agent-based scheduling system. • The agent's function is to use the Q-learning technique to select a dispatching rule for the machine.

  7. Outline • Description of Problem (Dynamic Single Machine Scheduling Problem) • The work of Q-learning • Action & State • Simulated annealing-based Q-learning

  8. The work of Q-learning • Reinforcement Learning problem • Direct utility estimation • Adaptive dynamic programming (ADP) • Temporal-difference (TD)

  9. The work of Q-learning (cont.) • Q-learning (Q(s, a)) • Q-learning learns a Q-function, giving the expected utility of taking a given action in a given state. • The Q-value is updated whenever action a is executed in state s leading to state s′ (the standard update is sketched below). • It can compare the values of its available choices without needing to know their outcomes. • U(s) = max_a Q(s, a) • More details in Chapter 21.
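
The slide does not show the update formula; the standard tabular rule it refers to is Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)]. A minimal Python sketch, with the learning rate and discount factor as assumed values:

```python
from collections import defaultdict

ACTIONS = ["SPT", "EDD", "RR"]  # dispatching-rule actions (defined on slide 11)
Q = defaultdict(float)          # Q[(state, action)] -> learned expected utility
ALPHA, GAMMA = 0.1, 0.9         # learning rate and discount factor (assumed values)

def q_update(s, a, reward, s_next):
    """Tabular update after executing action a in state s and reaching s_next."""
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] += ALPHA * (reward + GAMMA * best_next - Q[(s, a)])
```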

  10. Outline • Description of Problem (Dynamic Single Machine Scheduling Problem) • The work of Q-learning • Action & State • Simulated annealing-based Q-learning

  11. Action & State • Action A = {a1, a2, a3} • a1: SPT – shortest processing time • a2: EDD – earliest due date • a3: RR – round-robin • Environment's states (buffer) • We define different descriptions of the states according to different system objectives: • AST: average slack of the jobs waiting in the buffer • MST: the maximum slack among the jobs waiting in the buffer • PTJ: the number of tardy jobs at the time (all three features are sketched below)
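
A hedged sketch of the three state features, taking a job's slack to be due_date − current_time − expected_processing_time, consistent with the definitions on the next slide; the attribute names are assumptions:

```python
def ast(buffer, ct):
    """AST: average slack of the jobs waiting in the buffer at time ct."""
    return sum(j.due_date - ct - j.expected_processing_time for j in buffer) / len(buffer)

def mst(buffer, ct):
    """MST: maximum slack among the jobs waiting in the buffer."""
    return max(j.due_date - ct - j.expected_processing_time for j in buffer)

def ptj(buffer, ct):
    """PTJ: number of jobs already past their due dates at time ct."""
    return sum(1 for j in buffer if j.due_date < ct)
```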

  12. Action & State (cont.) • AST (S1) • ct: the current time at which the agent makes its decision • dd_i: job i's due date • ept_i: job i's expected processing time • n: the number of jobs waiting in the buffer at that time
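
The equation image on this slide is not preserved; from the definitions above, the average slack is presumably

$$\mathrm{AST} = \frac{1}{n}\sum_{i=1}^{n}\left(dd_i - ct - ept_i\right)$$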

  13. Action & State (cont.) • Reward function • The tardiness of the job (TT) • TT = dd_i - ft_i, where ft_i is the actual finishing time of job i
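
The reward formula itself is not transcribed; a minimal sketch, assuming a simple sign-based reward on TT (the exact form is an assumption):

```python
def reward(job, finish_time):
    """Illustrative reward from TT = dd_i - ft_i; the slide's exact form is assumed."""
    tt = job.due_date - finish_time  # positive: on time, negative: tardy
    return 1.0 if tt >= 0 else -1.0
```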

  14. Action & State (cont.) • MST (S2)
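
As with AST, the equation image is missing; under the same definitions, the maximum slack is presumably

$$\mathrm{MST} = \max_{1 \le i \le n}\left(dd_i - ct - ept_i\right)$$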

  15. Action & State (cont.) • Reward function

  16. Action & State (cont.)

  17. Action & State (cont.) • Reward function

  18. Outline • Description of Problem (Dynamic Single Machine Scheduling Problem) • The work of Q-learning • Action & State • Simulated annealing-based Q-learning

  19. Simulated annealing-based Q-learning
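
The algorithm on this slide is not transcribed; a minimal sketch of Metropolis-criterion action selection, the usual core of simulated annealing-based Q-learning, assuming the Q table and ACTIONS from the earlier sketch:

```python
import math
import random

def sa_select(state, temperature):
    """Accept a random candidate over the greedy action with probability
    exp((Q_rand - Q_greedy) / T); otherwise act greedily."""
    greedy = max(ACTIONS, key=lambda a: Q[(state, a)])
    candidate = random.choice(ACTIONS)
    delta = Q[(state, candidate)] - Q[(state, greedy)]
    if random.random() < math.exp(min(delta, 0.0) / temperature):
        return candidate
    return greedy

# High temperature -> near-random exploration; as T is lowered over time
# (e.g. T <- lam * T with 0 < lam < 1), selection becomes increasingly greedy.
```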

  20. Action & State (cont.)

  21. Simulated annealing-based Q-learning (cont.) • where t_i is job i's arrival time and k is the coefficient of tightness, which represents the pressure of the job's due date.
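
The formula this sentence annotates is missing from the transcript; a standard due-date assignment consistent with these symbols (an assumption, not confirmed by the slides) is

$$dd_i = t_i + k \cdot ept_i$$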

  22. Simulated annealing-based Q-learning(cont.)

  23. Simulated annealing-based Q-learning(cont.)

  24. Simulated annealing-based Q-learning(cont.)

  25. Conclusion • How should the Q-learning parameters be set? • If the state's parameters change, what will happen? • Would using other dispatching rules be better?
