Advanced Paper Reading Seminar

Advanced Paper Reading Seminar, B4, Tetsuya Tsurusaki (鶴崎徹也)

Paper presented: An approach to the pursuit problem on a heterogeneous multiagent system using reinforcement learning

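Abstract1
• This paper studies cooperation among agents in a multiagent system with a common goal, using a pursuit problem in which four hunters capture one target.
• Q-learning is used as the reinforcement learning algorithm.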
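As a reminder of what Q-learning does, here is a minimal tabular sketch. The learning rate `alpha`, discount `gamma`, the epsilon-greedy policy, and the use of hashable state keys are assumptions for illustration; the slides do not give the paper's actual settings or state encoding.

```python
import random
from collections import defaultdict

N_ACTIONS = 8  # eight movement directions, as stated on the slides


def make_q_table():
    # Q[state] is a list of action values, zero for states not yet seen
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def choose_action(q, state, epsilon=0.1, rng=random):
    # Epsilon-greedy (assumed policy): explore with probability epsilon
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q[state]
    return max(range(N_ACTIONS), key=lambda a: values[a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

Each hunter would hold its own table, which is one way initially identical hunters can drift toward different behaviors as their experiences diverge.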

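Abstract2
• Hunters that are initially homogeneous come to behave differently from one another as learning proceeds.
• The simulations are run in a continuous action-state space, and the results are discussed in terms of their dependence on the hunters' initial positions and on the speeds of the hunters and the target.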

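Introduction
• The paper examines whether the following factors affect the emergence of organized behavior:
(1) the target's movement rule
(2) differences among the agents, especially in speed
(3) the initial position of each agent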

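Problem domain1
• The environment is a continuous grid environment enclosed by walls.
• There are five agents: four hunters and one target. An episode ends when the hunters capture the target.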

Problem domain2
• Set of hunter agents: A = {A_i | i = 1, 2, …, N}
• Each hunter agent A_i consists of six elements: <S_i, I_i, O_i, P_i, F_i, G_i> (1)
(1) Internal state S: S = {θ, dφ, dφ, xt, xc, v}
(2) Input I: I = {dφ, dφ}
(3) Output O: O = {a}

Problem domain3 Fig. 1. The action space of (a) each hunter agent and (b) target agent. The line between the hunter agent and the target agent shows the relative central axis, whose angle is 0.

Problem domain4
• (4) Prediction function P, state transition function F, and output function G:
F: S×I→S, G: S→O (2)
F′: S×I→O (F′ = F×G′) (3)
G′: S→O (G′ = G) (4)
P: S(θ)×I→S′(θ) (5)
F′: S×P×I→O (6)
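The agent tuple and the function signatures above can be pictured as a small data structure. This is only a sketch: the class name `HunterAgent`, the tuple-of-floats state, and the order "predict, then transition, then output" used as one reading of Eq. (6) are assumptions, not the paper's definitions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

State = Tuple[float, ...]  # stands in for the internal state S
Inp = Tuple[float, ...]    # stands in for the input I
Action = int               # stands in for the output O = {a}


@dataclass
class HunterAgent:
    state: State
    F: Callable[[State, Inp], State]  # state transition, F: S x I -> S
    G: Callable[[State], Action]      # output function, G: S -> O
    P: Callable[[State, Inp], State]  # prediction, P: S(theta) x I -> S'(theta)

    def step(self, inp: Inp) -> Action:
        # Assumed reading of Eq. (6): prediction feeds the transition,
        # and the resulting state is mapped to an action.
        predicted = self.P(self.state, inp)
        self.state = self.F(predicted, inp)
        return self.G(self.state)
```

With toy functions plugged in for `F`, `G`, and `P`, calling `step` once per time step yields the action index the hunter would take.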

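The method1
• To capture the target, each hunter makes two kinds of predictions, learns its actions, and chooses its own action according to a decision rule. See Fig. 2 for details.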

The method2 Fig. 2. The heterogeneous multiagent architecture

Target movement rules
• The target can move in one of eight directions according to its movement rule.
• The target cannot move to a position occupied by a hunter.

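Hunter movement rules
• Each hunter can move in one of eight directions defined relative to the target (Fig. 1a).
• A hunter cannot move to a position occupied by another agent.
• Each hunter has a restricted field of view.
• A hunter cannot see the other hunters, but it can communicate with them under the constraints of the organization.
• Each hunter either finds the target when it is within its field of view or predicts the target's direction from communication with the other hunters (Fig. 3a, 3b).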
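The target-relative action space can be sketched as follows. The even 45-degree spacing of the eight directions and the function name `hunter_move` are assumptions; the slides only state that the axis from hunter to target is angle 0 (Fig. 1).

```python
import math


def hunter_move(pos, target_pos, action, speed):
    """Return the hunter's new (x, y) after one step.

    action is an index 0..7; direction k is assumed to lie k * 45 degrees
    from the hunter-to-target axis, which is angle 0.
    """
    axis = math.atan2(target_pos[1] - pos[1], target_pos[0] - pos[0])
    heading = axis + action * math.pi / 4
    return (pos[0] + speed * math.cos(heading),
            pos[1] + speed * math.sin(heading))
```

Under this convention, action 0 always moves straight toward the target and action 4 straight away from it, wherever the hunter stands.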

The method3 Fig. 3. (a) The state space of each hunter agent when the target lies within the range of its restricted field of view. (b) The prediction of the location of the target agent and other hunter agent which found the target agent, rφ is the observation range and r the communication range.

The method4
• The terminal state is reached when at least two hunters are adjacent to the target and the target cannot make any permitted move.
• Fig. 4 shows situations in which no agent can move without a collision.
Fig. 4. Examples of capturing states. (a) Two hunter agents capture the target agent in the corner. (b) Three hunter agents capture the target agent at the wall. (c) and (d) Captures in the open with three and four hunters, respectively.
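The capture condition can be sketched as a geometric check. The field size and agent radius come from the Experiments slide; the adjacency tolerance `touch_tol`, the absolute 45-degree move directions for the target, and all function names are assumptions made for illustration.

```python
import math

RADIUS = 4.0  # agent radius (Experiments slide)
SIZE = 200.0  # side length of the walled field (Experiments slide)


def is_free(p, hunters):
    # A position is free if the disc stays inside the walls
    # and does not overlap any hunter disc.
    inside = RADIUS <= p[0] <= SIZE - RADIUS and RADIUS <= p[1] <= SIZE - RADIUS
    return inside and all(math.dist(p, h) >= 2 * RADIUS for h in hunters)


def target_moves(target, hunters, step):
    # Candidate moves in eight directions, filtered to collision-free ones
    moves = []
    for k in range(8):
        ang = k * math.pi / 4
        p = (target[0] + step * math.cos(ang), target[1] + step * math.sin(ang))
        if is_free(p, hunters):
            moves.append(p)
    return moves


def is_captured(target, hunters, step, touch_tol=1.0):
    # At least two adjacent hunters and no permitted move left
    adjacent = sum(math.dist(target, h) <= 2 * RADIUS + touch_tol for h in hunters)
    return adjacent >= 2 and not target_moves(target, hunters, step)
```

For example, a target in the corner at (4, 4) with hunters at (12, 4) and (4, 12) has no free move with step 2, which corresponds to the two-hunter corner capture of Fig. 4(a).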

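About each hunter 1
• When the target is within its restricted field of view, a hunter can observe the relative angle between itself and the target.
• Through communication with the other hunters, it can obtain the relative angle between the target and the centroid of the other hunters.
• While another hunter is observing the target, a hunter can estimate the target's direction even if it cannot observe the target itself.
• When a hunter cannot see the target, the other agents know this.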
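The two angle observations described above can be sketched with basic trigonometry. The function names, the use of absolute bearings in radians, and measuring the centroid angle from the target's position are assumptions; the paper's exact angle conventions are not given on the slides.

```python
import math


def observe(hunter, target, r_obs):
    # Bearing from hunter to target, or None if the target lies
    # outside the observation range r_obs.
    if math.dist(hunter, target) > r_obs:
        return None
    return math.atan2(target[1] - hunter[1], target[0] - hunter[0])


def centroid(points):
    # Mean position of a non-empty list of (x, y) points
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def bearing_to_centroid(target, other_hunters):
    # Bearing from the target to the centroid of the other hunters
    # (assumed interpretation of the communicated angle).
    c = centroid(other_hunters)
    return math.atan2(c[1] - target[1], c[0] - target[0])
```

A hunter for which `observe` returns `None` would have to rely on the communicated angle instead of its own observation.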

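About each hunter 2
• A hunter that has found the target can send the angle θself between the target and the centroid of the other hunters (θself in Fig. 3b).
• Each hunter that has not found the target uses the communication to compute the angle θself′ between itself and the hunter that found it (θself′ in Fig. 3b).
• From the estimated position of that hunter and the angle θself′, each agent predicts the target position xt.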

The method5
Equations (7) to (13) (equation bodies not captured in this transcript)

Experiments1
• The problem domain is 200×200, and each agent is a circle of radius 4.
• The possible initial positions of the hunters are shown in a figure on the original slide.

Experiments2
• Three relations between the hunter speed Δh and the target speed Δt are tested: Δh > Δt, Δh = Δt, and Δh < Δt.
• The possible movement rules of the target are shown in Fig. 7.
Fig. 7. Types of motion rules. Target agent moves (a) in the opposite direction of a hunter agent (left-hand side) or toward the largest space in which there is no hunter agent (right-hand side), (b) on a straight line, or (c) on a circle.
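The three motion rules of Fig. 7 can be sketched as simple step functions. Fleeing from the nearest hunter, the continuous (unsnapped) headings, and the counterclockwise circle are assumptions made for brevity; the function names are hypothetical.

```python
import math


def flee_step(target, hunters, speed):
    # Rule (a), simplified: move directly away from the nearest hunter
    nearest = min(hunters, key=lambda h: math.dist(h, target))
    ang = math.atan2(target[1] - nearest[1], target[0] - nearest[0])
    return (target[0] + speed * math.cos(ang), target[1] + speed * math.sin(ang))


def line_step(target, heading, speed):
    # Rule (b): keep moving along a fixed heading (radians)
    return (target[0] + speed * math.cos(heading),
            target[1] + speed * math.sin(heading))


def circle_step(target, center, speed):
    # Rule (c): advance an arc length `speed` along the circle through
    # `target` around `center` (requires target != center)
    r = math.dist(target, center)
    ang = math.atan2(target[1] - center[1], target[0] - center[0]) + speed / r
    return (center[0] + r * math.cos(ang), center[1] + r * math.sin(ang))
```

Passing a hunter speed Δh to the hunters' step function and a target speed Δt as `speed` here reproduces the three speed relations compared in the experiments.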

Simulation results and discussion1
Fig. 8. Target agent moves away from hunters: (a) hunter agent speed > target agent speed, (b) hunter agent speed = target agent speed, and (c) hunter agent speed < target agent speed.
Fig. 9. Target agent moves on a straight line: (a) hunter agent speed > target agent speed, (b) hunter agent speed = target agent speed, (c) hunter agent speed < target agent speed.

Simulation results and discussion2
Fig. 10. Target agent moves on a circle: (a) hunter agent speed > target agent speed, (b) hunter agent speed = target agent speed, and (c) hunter agent speed < target agent speed.
Fig. 11. Target agent moves randomly: hunter agent speed > target agent speed, hunter agent speed = target agent speed, and hunter agent speed < target agent speed.

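Simulation results and discussion3
• The distribution of the target's final locations can be compared across the hunters' initial positions. When the hunters are faster than or as fast as the target there is no difference; when they are slower there is a difference.
• If the target is within a hunter's field of view at the start, the hunter does not lose sight of it when it is faster than or as fast as the target. When it is slower, it loses sight of the target, can no longer observe it, and cannot approach it for some time.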

Simulation results and discussion4
• Probability of the predicted direction for each hunter
• Arrows mark the high-probability direction for each agent
Fig. 12. Probability of prediction direction of each hunter agent. Each circle graph shows the probability of prediction of each corner’s agent, and arrow shows the high probability direction of each agent.

Simulation results and discussion5
• Trajectories of the hunters and the target
Fig. 13. The trajectories of hunter agents and target agent when the target is captured which is filled in. Each hunter agent is numbered with ID.

Simulation results and discussion6 Fig. 14. An example of the trajectory of the agents. Hunter agents are faster than target; target moves away from hunter agents; hunter agents are initially located in the corners.

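Conclusion
• According to the simulation results, when the hunters are faster than the target the outcome does not depend on the hunters' initial positions. The nature of the emergent organization depends on the relationship between the hunters and the target.
• In the transition from homogeneous to heterogeneous abilities, specialized roles such as ambushing agents and resting agents emerged.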

Thank you for your attention.
