Dynamic Single Machine Scheduling Using Q-Learning
Outline • Description of Problem (Dynamic Single Machine Scheduling Problem) • The work of Q-learning • Action&State • Simulated annealing-based Q-learning
Description of Problem • A finite set of n jobs • Each job consists of a chain of operations • A finite set of m machines • Each machine can handle at most one operation at a time • Each operation needs to be processed during an uninterrupted period of a given length on a given machine • The purpose is to find a schedule, that is, an allocation of the operations to time intervals on machines, that has minimal length
[Figure: jobs from other machines converge on a bottleneck machine, illustrating single machine scheduling]
Description of Problem(cont.) • Dispatching Rules • Essentially “selecting” the next job from the queue in front of a machine based on a rule. • Common rules: • SPT – shortest processing time • EDD – earliest due date • FCFS – first come, first served • LTWK – least total work • LWKR – least work remaining • WINQ – work in next queue • Probably hundreds of dispatching rules
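A dispatching rule is essentially a ranking criterion over the jobs waiting in the queue. The following is a minimal sketch of that idea; the Job fields and the small rule set are illustrative, not taken from the slides:

```python
from dataclasses import dataclass

# Illustrative job record; field names are assumptions for this sketch.
@dataclass
class Job:
    processing_time: float
    due_date: float
    arrival_time: float

# Each rule maps a job to the key it is ranked by; the smallest key wins.
RULES = {
    "SPT":  lambda job: job.processing_time,  # shortest processing time
    "EDD":  lambda job: job.due_date,         # earliest due date
    "FCFS": lambda job: job.arrival_time,     # first come, first served
}

def dispatch(queue, rule):
    """Return the job that the given dispatching rule would process next."""
    return min(queue, key=RULES[rule])
```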
Description of Problem(cont.) • No single rule has been found to perform well for all system objectives. • So we need an intelligent agent-based scheduling system. • The agent's function is to use the Q-learning technique to select a dispatching rule for the machine.
Outline • Description of Problem (Dynamic Single Machine Scheduling Problem) • The work of Q-learning • Action&State • Simulated annealing-based Q-learning
The work of Q-learning • Reinforcement learning problem • Direct utility estimation • Adaptive dynamic programming (ADP) • Temporal difference (TD)
The work of Q-learning(cont.) • Q-learning (Q(s, a)) • Q-learning learns a Q-function, which gives the expected utility of taking a given action in a given state. • Q(s, a) is updated whenever action a is executed in state s leading to state s'. • It lets the agent compare the values of its available choices without needing to know their outcomes. • U(s) = max_a Q(s, a) • More details in Chapter 21.
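A minimal sketch of the tabular Q-learning update described above; the learning rate and discount factor are generic illustrative values, not parameters from the slides:

```python
from collections import defaultdict

Q = defaultdict(float)      # Q[(state, action)] -> estimated utility
alpha, gamma = 0.1, 0.9     # learning rate and discount factor (illustrative)

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step after executing `action` in `state` and observing
    `reward` and the resulting `next_state`."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def utility(state, actions):
    """U(s) = max_a Q(s, a)."""
    return max(Q[(state, a)] for a in actions)
```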
Outline • Description of Problem (Dynamic Single Machine Scheduling Problem) • The work of Q-learning • Action&State • Simulated annealing-based Q-learning
Action&State(cont.) • Action A = {a1, a2, a3} • a1: SPT – shortest processing time • a2: EDD – earliest due date • a3: RR – round-robin • Environment's states (buffer) • We define different descriptions of the states according to different system objectives. • AST: average slack of the jobs waiting in the buffer. • MST: the maximum slack among the jobs waiting in the buffer. • PTJ: the number of tardy jobs at the current time.
Action&State(cont.) • AST (S1) • ct: the current time at which the agent makes a decision • ddi: job i's due date • epti: job i's expected processing time • n: the number of jobs waiting in the buffer at that time
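The AST, MST, and PTJ formulas appeared as figures in the original slides; the sketch below reconstructs them from the definitions above, taking a job's slack to be dd_i - ct - ept_i. The field names and the reading of "tardy" are assumptions:

```python
from collections import namedtuple

# Illustrative record for a job waiting in the buffer.
WaitingJob = namedtuple("WaitingJob", ["due_date", "expected_processing_time"])

def slack(job, ct):
    """Slack of a waiting job at decision time ct: dd_i - ct - ept_i."""
    return job.due_date - ct - job.expected_processing_time

def ast(buffer, ct):
    """AST (S1): average slack of the jobs waiting in the buffer."""
    return sum(slack(j, ct) for j in buffer) / len(buffer)

def mst(buffer, ct):
    """MST (S2): maximum slack among the jobs waiting in the buffer."""
    return max(slack(j, ct) for j in buffer)

def ptj(buffer, ct):
    """PTJ: number of waiting jobs whose due date has already passed at ct
    (one plausible reading of 'tardy jobs at the time')."""
    return sum(1 for j in buffer if j.due_date < ct)
```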
Action&State(cont.) • Reward function • The tardiness of the job (TT) • TT = ddi – fti, where fti is the actual finishing time of job i
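The exact reward formulas in the original slides were figures; the sketch below only illustrates how a reward signal could be built from the TT term defined above:

```python
def tardiness_term(due_date, finish_time):
    """TT = dd_i - ft_i as defined above: positive when job i finishes before
    its due date, negative when it finishes late."""
    return due_date - finish_time

def interval_reward(finished):
    """One plausible reward: the average TT over the (due_date, finish_time)
    pairs of jobs completed since the last dispatching decision."""
    if not finished:
        return 0.0
    return sum(tardiness_term(dd, ft) for dd, ft in finished) / len(finished)
```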
Action&State(cont.) • MST(S2)
Action&State(cont.) • Reward function
Outline • Description of Problem (Dynamic Single Machine Scheduling Problem) • The work of Q-learning • Action&State • Simulated annealing-based Q-learning
Simulated annealing-based Q-learning(cont.) • where ti is job i's arrival time and k is the coefficient of tightness, which represents the pressure of the job's due date.
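The details of the simulated annealing-based Q-learning algorithm are not reproduced here, so the sketch below only illustrates the general idea: action selection uses a Metropolis-style acceptance test whose temperature is annealed over time, so early decisions explore and later decisions exploit. The acceptance form and cooling schedule are assumptions:

```python
import math
import random

def sa_select_action(state, actions, Q, temperature):
    """Choose an action with a Metropolis-style acceptance criterion.

    A randomly proposed action replaces the greedy (max-Q) action with
    probability exp((Q_proposal - Q_greedy) / temperature), so a high
    temperature favors exploration and a low temperature favors exploitation.
    """
    greedy = max(actions, key=lambda a: Q[(state, a)])
    proposal = random.choice(actions)
    delta = Q[(state, proposal)] - Q[(state, greedy)]
    if delta >= 0 or random.random() < math.exp(delta / temperature):
        return proposal
    return greedy

def anneal(temperature, rate=0.99, minimum=1e-3):
    """Geometric cooling applied after each dispatching decision (illustrative)."""
    return max(minimum, temperature * rate)
```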
Conclusion • How should the Q-learning parameters be set? • If the state's parameters change, what will happen? • Would using other dispatching rules be better?