Develop an AI agent for the game of bridge using machine learning techniques to enhance gameplay decision-making. Implement a naive treatment of digital sentience in a stochastic environment to improve agent performance over time.
Computer Systems Lab, TJHSST
Current Projects 2004-2005, Third Period
Current Projects, 3rd Period
• Robert Brady: A Naive Treatment of Digital Sentience in a Stochastic Game
• Blake Bryce Bredehoft: Robot World: A Evolution Simulation
• Michael Feinberg: Computer Vision: Edge Detections: Vertical diff., Roberts, Sobels
• Scott Hyndman: Agent Modeling and Optimization of a Traffic Signal
• Greg Maslov: Machine Intelligence Walking Robot
• Eugene Mesh:
• Thomas Mildorf: A Self-Propagating Continuous Differential System as Limiting Discrete Numerical Construct and Device of Sorting
• Carey Russell: Graphical Modeling of Atmospheric Change
• Matthew Thompson: Genetic Algorithms and Music
• Justin Winkler: Modeling a Saturnian Moon
Developing a Learning Agent
The goal of this project was to create a learning agent for the game of bridge. I think my current agent, which knows the rules, plays legally, and finds some basic good plays, is a step in the right direction. The agent will be improved over the course of the year so that it plays smarter and learns faster.
Abstract
My tech-lab project deals with the field of Artificial Intelligence or, more specifically, Machine Learning. I am designing an agent and environment for the card game of bridge. After the agent learns the rules, I will run simulations in which it decides on its own what the best play is. The agent's level of play will increase as the year continues because it will look up past decisions in its history to determine the best bid or play in the current state of the environment.
A Naive Treatment of Digital Sentience in a Stochastic Game, Robert Brady
Background
Machine learning has been researched in the past and has been applied to bridge before. The field is still new, though, and anything from intelligent agents for games to the traveling salesman problem counts as part of it. An algorithm used for one problem, such as the minimax algorithm or backtracking search, can be applied in a similar manner to another. To build on current work, there would have to be some sort of improvement on current bridge-playing agents such as Bridge Baron or GIB. Both of these programs play at a moderate level, but neither can compare to an expert player. The reason an intelligent bridge-playing agent has been hard to program in the past is that bridge is a partially observable environment.
In games such as chess or checkers, the agent could conceivably come up with the best solution (given enough time to think about it) because it knows where everything is. In bridge, certain cards have not been played yet, and although you may be able to guess where they are, you cannot determine this with 100% certainty. This makes programming a learning agent for a partially observable environment much harder.
Progress
For the first semester, I worked on programming the rules of the game and some simple AI commands.
When this was completed, about a week before the semester was over, I began researching different AI algorithms that could be implemented for searching. This research halted once I realized that the tree to be searched was difficult to construct. I consulted my professional contact, Fred Gitelman, who had encountered the same problem when programming a similar search algorithm. He talked me through the problems I had and gave some advice on where to find information that would help me with them. During the third quarter, I worked solely on running the minimax algorithm through a depth-first search with pruning of bad nodes.
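The search described above (minimax run depth-first with pruning) can be sketched on a toy game tree. This is only an illustration, not the project's code: the nested-list tree, the `minimax` function name, and the choice of alpha-beta as the pruning rule are all assumptions, since the slides say only that bad nodes are pruned.

```python
import math

def minimax(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Depth-first minimax with alpha-beta pruning.

    `node` is either a number (a leaf score from the maximizing side's
    point of view) or a list of child nodes. Hypothetical toy format.
    """
    if not isinstance(node, list):
        return node  # leaf: return its score
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, minimax(child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing side will never allow this branch
        return best
    best = math.inf
    for child in node:
        best = min(best, minimax(child, True, alpha, beta))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing side already has a better option
    return best

# Two-ply example: we choose a branch, then the opponent picks the worst leaf for us.
tree = [[3, 5], [2, 9], [0, 7]]
value = minimax(tree, maximizing=True)
```

In the example, the second and third branches are cut off after their first leaf, because each is already worse than the 3 guaranteed by the first branch.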
The algorithm still has a few problems and will hopefully be finished soon. Another portion of the code that I added this quarter deals with the machine learning part of my project. This part of the project stores information from the hand that the computer just played in a file that is essentially the computer's "brain." The brain stores information about how many tricks were taken with the combined hands in a trump suit or in no-trump. It uses this information for the bidding stage of the following hands. If the numbers it reads from the file are much lower than what it believes the current state of the environment is, it will bid higher, and if the numbers are higher, it will try to refrain from bidding.
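The "brain" file and the bidding rule described above might look something like the sketch below. Everything concrete here is an assumption made for illustration: the JSON file format, the function names, the strain labels, and the `margin` that decides what counts as "much lower" or "higher" do not come from the project.

```python
import json
import os
import tempfile
from statistics import mean

def load_brain(brain_path):
    """Read the stored history; an absent file means an empty brain."""
    if not os.path.exists(brain_path):
        return {}
    with open(brain_path) as f:
        return json.load(f)

def record_hand(brain_path, strain, tricks_taken):
    """Append the tricks taken in `strain` (a trump suit or 'NT') to the brain file."""
    brain = load_brain(brain_path)
    brain.setdefault(strain, []).append(tricks_taken)
    with open(brain_path, "w") as f:
        json.dump(brain, f)

def bidding_adjustment(brain, strain, estimated_tricks, margin=1.5):
    """Compare past results with the current estimate, as the text describes."""
    history = brain.get(strain)
    if not history:
        return "neutral"  # nothing learned yet for this strain
    past = mean(history)
    if past < estimated_tricks - margin:
        return "bid higher"  # stored numbers are much lower than the current estimate
    if past > estimated_tricks + margin:
        return "hold back"   # stored numbers are higher than the current estimate
    return "neutral"

# Usage: play three no-trump hands, then consult the brain before bidding.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "brain.json")
    for tricks in (7, 8, 7):
        record_hand(path, "NT", tricks)
    brain = load_brain(path)
    advice = bidding_adjustment(brain, "NT", estimated_tricks=10)
```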
My future plans include testing the program against other players at the school. I will use students from my tech-lab class for a preliminary test, and once the program is competitive with them, it will play against the bridge club on Fridays during school. I hope it will be able to compete with the students of the club at some point in the next quarter, but if this is an unrealistic goal, I will just try to improve its algorithm as much as I possibly can before the end of the year. As it is currently, the program needs a little more work on the algorithm to make it fully operational before I open it up to tests from fellow students.
Code
This section is pretty much self-explanatory. Some important sections are in bold and commented, while the less important parts have been left out.
References
1. Fred Gitelman (fred@bridgebase.com). A programmer who is also an expert bridge player. He advised me on how to look through a tree to find the solution, comparable to a minimax search.
2. Russell, S. and Norvig, P. Artificial Intelligence: A Modern Approach, Second Edition. Prentice Hall, NJ, 2003.
Robot Swarms
My project is an agent-based simulation that places robots in a "game of life". Each new generation of robots receives new genes through a random number selection process, which creates mutations and evolution like those we see in real life from DNA crossover and similar mechanisms.
Abstract: My project is an agent-based simulation that places robots in a "game of life". Each new generation of robots receives new genes through a random number selection process, which creates mutations and evolution like those we see in real life from DNA crossover and similar mechanisms. There are two versions of my program: a Simulation and a Game.
Robot World: A Evolution Simulation, Blake Bryce Bredehoft
The Simulation: As stated in my abstract.
The Process: The base of my program was not the genetics but the graphics. First came one robot, then a random Artificial Intelligence for it. I then modified the world to sustain several robots, with dying and breeding. After that I introduced two more Artificial Intelligences, a "group" Artificial Intelligence and a "battery" Artificial Intelligence, which promote grouping and collecting batteries respectively. I then tweaked the environment and code until it could sustain life. I then programmed in several possible places for environmental interaction, like viruses and the batteries. I added the graphical output for easier analysis. Finally, I created the random number selection process that splices the genes of the parents to create a child. This is the heart of the program.
The Game: There is the "simulation" version of my program described above, and a later "game" version. The game version includes all the same components as the simulation but also has a user-controlled robot with "laser eyes" and "grenade launchers" used to kill the other robots. It also includes "Bosses".
The Process: Taking the simulation version and modifying it was easy. I first removed the natural deaths and the graph. Then I introduced and tweaked the user controls and the user-controlled robot. After adding lasers and grenades and all the necessary code, I finally added a status display in an embedded window in the top left corner. I continually add new pieces of flair to the program, such as bosses.
1 Introduction
My project has several basic components: 3D modeling, Artificial Intelligence, the selection process, and basic game theory. All these components form the amalgam that makes up my project. Both the simulation and the game use all these components, except that the simulation doesn't have any game theory.
2 Background
2.1 Monte Carlo Simulation
Since Monte Carlo is a Simulation technique, let's first define exactly what we mean by Simulation. A true Simulation will merely describe a system, not optimize it! (However, it should be noted that a true simulation may be modified in a manner such that it can be used to significantly enhance the efficiency of a system.) Therefore, our primary goal in Simulation is to build an experimental model that will accurately and precisely describe the real system. However, the breadth and extent of Simulation models is extensive!
This can be illustrated by considering the three general "classifications" of Simulation Models, below. And in each of these "classifications", I have defined two possible "characteristics".
1. Functional Classification
• Deterministic Characteristic - These are "exact" models that will produce the same outcome each time they are run.
• Stochastic Characteristic - These models include some "randomness" that may produce a different outcome each time it is run. This randomness forces us to make a large number of runs to develop a "trend" in our "collection" of outcomes. Further, the exact number of how many "runs" you must make to obtain the "right trend" is simply a matter of statistics.
2. Time Dependence
• Static Characteristic - These models are not time-dependent. This even includes the calculation of a specific variable after a fixed period of time.
• Dynamic Characteristic - These models depict the change in a system over many time intervals during the calculation process.
3. Input Data
• Discrete Characteristic - The input data form a discrete frequency distribution. Discrete frequency distributions are characterized by the random variable $X$ taking on an enumerable number of values $x_i$ that each have a corresponding frequency, or count, $p_i$.
• Continuous Characteristic - The input data can be described by a continuous frequency distribution. Continuous frequency distributions are characterized by a continuous analytical function of the form $y = f(x)$ where $y$ is defined as the frequency of $x$. This definition is valid for all possible values of $x$ (over the domain of the function).
We can now say that Monte Carlo Simulations are "True Stochastic Simulations" in that they describe the "final state" of a model by just knowing the frequency distributions of the parameters describing the "beginning state" and the appropriate metric that maps, or transforms, the beginning state to the final state. They can also be either static (easy) or dynamic (more difficult). If a prediction were required, then "every possible" option would have to be considered and this is where the well-known "Variance Reduction Methods" (antithetic variables, correlated sampling, geometry splitting, source biasing, etc.) would be used to reduce the number of iterations required in the simulation.
Definition courtesy of JAMES F. WRIGHT, Ph.D Ltd. Co. at http://www.drjfwright.com/
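As a small illustration of the definition above, the sketch below draws a "beginning state" from a discrete frequency distribution, maps it to a "final state", and repeats the run many times until a trend appears. The dice example and the names `monte_carlo` and `roll_two_dice` are assumptions chosen for illustration; they are not part of the robot simulation.

```python
import random
from statistics import mean

def monte_carlo(sample_start, transform, runs, rng):
    """Run a static stochastic model `runs` times and collect the final states."""
    return [transform(sample_start(rng)) for _ in range(runs)]

def roll_two_dice(rng):
    """Beginning state: two values, each drawn from a discrete uniform distribution."""
    return (rng.randint(1, 6), rng.randint(1, 6))

rng = random.Random(0)  # fixed seed so the experiment can be repeated
outcomes = monte_carlo(roll_two_dice, sum, runs=10_000, rng=rng)
estimate = mean(outcomes)  # approaches the true mean of 7 as the number of runs grows
```

A single run says almost nothing; the average over thousands of runs is what settles near the true value, which is the "trend" the definition refers to.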
3 Theory
3.1 3D Modeling
My graphics are done using OpenGL. In the simulation version there are three different aspects to the graphical output, plus the population graph: the agents (the robots), the environment (the floor and batteries), and the events (explosions). The game version has all these same components, except that the agents include a user-controlled robot and bosses, the events include grenades, lasers, and grenade explosions, there is no graph, and there are a stat indicator and a mini map. There is also the interface of my program, from which you launch the simulations and games.
The environment consists of a floor, a background, and small cubes representing batteries. Simple enough (below right). The robots consist of prisms and spheres that form arms, legs, torso, head, and facial features (below left). The explosions are tori spinning on the y-axis that get more transparent as they grow in size (below center). There is also a program that is able to output numbers for the counters in the game version. See appendix A.1 for example code.
There is also a graph output that is fairly simple. Every iteration it plots a new point on the grid and never erases, therefore creating a line graph of the population of each artificial intelligence type (below left). The game mode uses a mini map and a status bar; the bar includes life, number of grenades, and number of kills (bottom right).
3.2 Artificial Intelligence
There are three main artificial intelligences in the program and one that uses a combination of the others. The first is a random artificial intelligence, which is the most basic. The second is the group artificial intelligence, which encourages forming groups for reproduction. The third is the battery or food artificial intelligence, which promotes eating. The fourth is more advanced: the agent uses the battery artificial intelligence when it requires energy and the group artificial intelligence when it doesn't. The random artificial intelligence first randomly decides whether to turn left, turn right, or go forward.
It has a preference to turn if it turned the iteration before. Over time this produces interesting behavior. Because offspring spawn close to their parents, and parents have to be close to each other to produce offspring, the robots group together after a while. Any robot that randomly strays from the group will die off, whereas those in the group respawn as fast as they die. Code is located in Appendix A.2.
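A minimal sketch of the random artificial intelligence described above follows. It is not the code from Appendix A.2: the function name, the action labels, and the `turn_bias` probability are assumptions, and "preference to turn" is read here as an increased chance of turning again in either direction.

```python
import random

ACTIONS = ("left", "right", "forward")

def random_ai_step(previous_action, rng, turn_bias=0.7):
    """Pick the next move, favoring another turn if the last move was a turn."""
    if previous_action in ("left", "right") and rng.random() < turn_bias:
        return rng.choice(("left", "right"))
    return rng.choice(ACTIONS)

# Usage: simulate twenty iterations for one robot.
rng = random.Random(1)
action = "forward"
path = []
for _ in range(20):
    action = random_ai_step(action, rng)
    path.append(action)
```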
The group artificial intelligence goes through the list of robots and first picks out all the robots whose color makes them suitable for reproduction with the given robot. From this list the closest robot is chosen, and the given robot then turns towards it and walks. This forms groups that are more efficient than the groups produced by the random artificial intelligence, because robots will not stray off. Code is located in Appendix A.2.
The battery artificial intelligence goes through the list of batteries, finds the one closest to the robot, and then turns the robot towards it and walks. Robots end up heading after the same battery and form groups. These groups may or may not be able to reproduce, but when a compatible pair find each other, the groups prove to be fairly strong. Code is located in Appendix A.2.
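Both the group and the battery artificial intelligence come down to the same step: filter a list, pick the nearest entry, and turn toward it. The sketch below shows that step under assumed data structures (robots as dictionaries with `pos`, `color`, and `compatible_colors` keys, batteries as coordinate pairs); none of these names come from Appendix A.2.

```python
import math

def nearest(origin, candidates):
    """Return the candidate position closest to `origin`, or None if there are none."""
    return min(candidates, key=lambda p: math.dist(origin, p), default=None)

def heading_toward(origin, target):
    """Angle in degrees (counterclockwise from the +x axis) from origin to target."""
    return math.degrees(math.atan2(target[1] - origin[1], target[0] - origin[0]))

def group_ai_target(robot, others):
    """Closest robot whose color is suitable for reproduction with `robot`."""
    mates = [o["pos"] for o in others if o["color"] in robot["compatible_colors"]]
    return nearest(robot["pos"], mates)

def battery_ai_target(robot, batteries):
    """Closest battery to `robot`."""
    return nearest(robot["pos"], batteries)

robot = {"pos": (0, 0), "color": "red", "compatible_colors": {"red"}}
others = [
    {"pos": (5, 0), "color": "red"},
    {"pos": (1, 1), "color": "blue"},  # nearest overall, but the wrong color
    {"pos": (0, 3), "color": "red"},
]
mate = group_ai_target(robot, others)
battery = battery_ai_target(robot, [(4, 4), (-1, 0)])
turn_to = heading_toward(robot["pos"], mate)
```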
The strongest group is an amalgamation of group artificial intelligence robots and battery artificial intelligence robots. These groups stay together because of the group robots and search for food because of the battery robots, which proves extremely effective in keeping them alive. Sooner or later, however, one of the artificial intelligences ends up getting bred out. There is a fourth artificial intelligence that is an amalgamation of the group artificial intelligence and the battery artificial intelligence. When the agent is low on energy it uses the battery artificial intelligence until it has a decent amount, and then it uses the group artificial intelligence.
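The switching rule of the fourth artificial intelligence can be sketched as a small state function. The thresholds `low` and `high` and the behavior labels are hypothetical; the text says only "low on energy" and "a decent amount".

```python
def choose_behavior(energy, current, low=30, high=70):
    """Return 'battery' or 'group' for the combined artificial intelligence.

    The agent starts seeking batteries when energy drops below `low` and
    keeps doing so until it has recovered to at least `high`.
    """
    if energy < low:
        return "battery"
    if current == "battery" and energy < high:
        return "battery"  # still refuelling
    return "group"

# Usage: energy falls, the agent refuels, then it rejoins the group.
behavior = "group"
trace = []
for energy in (80, 40, 25, 50, 75):
    behavior = choose_behavior(energy, behavior)
    trace.append(behavior)
```

Using two thresholds instead of one keeps the agent from flipping between behaviors on every iteration when its energy hovers near a single cutoff.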
3.3 Selection Process
The selection process doesn't start with selecting but instead starts at the beginning of every iteration. The process begins by inventorying the population, tallying the number of robots and their characteristics. This produces a few tables stored as a "genome" (code for the class is in Appendix A.3). These tables are used to make graphs of the frequency of the specific gene settings. When the selection is called, it goes through and finds the optimal gene in the parents' gene pool, using a random number process and the aforementioned graphs. This is done for each gene. A level of randomness is also calculated in that allows for mutations. These mutations give the gene a new status that is not in the gene pool. Code can be seen in Appendix A.3.
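One way to read the selection process above is sketched below: tally gene frequencies across the population, then for each gene pick one of the two parents' settings with a random draw weighted by those frequencies, with a small chance of mutating to a setting neither parent carries. This is an interpretation, not the code from Appendix A.3; the dictionary-based robots, the `all_values` table, and the `mutation_rate` are assumptions.

```python
import random
from collections import Counter

def gene_frequencies(population):
    """Tally how often each setting of each gene occurs in the population."""
    tables = {}
    for robot in population:
        for gene, value in robot.items():
            tables.setdefault(gene, Counter())[value] += 1
    return tables

def make_child(parent_a, parent_b, tables, all_values, rng, mutation_rate=0.05):
    """Splice two parents gene by gene, weighted by population frequency."""
    child = {}
    for gene in parent_a:
        pool = [parent_a[gene], parent_b[gene]]
        if rng.random() < mutation_rate:
            # Mutation: a setting that is not in the parents' gene pool.
            novel = [v for v in all_values[gene] if v not in pool]
            if novel:
                child[gene] = rng.choice(novel)
                continue
        counts = tables.get(gene, Counter())
        weights = [max(counts[v], 1) for v in pool]  # avoid all-zero weights
        child[gene] = rng.choices(pool, weights=weights, k=1)[0]
    return child

population = [
    {"color": "red", "ai": "group"},
    {"color": "red", "ai": "battery"},
    {"color": "blue", "ai": "group"},
]
all_values = {"color": ["red", "blue", "green"], "ai": ["random", "group", "battery"]}
tables = gene_frequencies(population)
rng = random.Random(3)
child = make_child(population[1], population[2], tables, all_values, rng)
```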
While the theory behind this selection process may seem somewhat simple, the code is not.
3.4 Game Theory
There are lasers and grenades, your tools for destroying the surrounding robot population. Combine their power by shooting a grenade with your laser as it falls to produce a powerful explosion. After every 50 kills, boss robots appear: one after the first 50, two after the second 50, and so on. The bosses use a boss artificial intelligence of their own, which finds the user-controlled robot, turns the boss toward it, and walks.
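The boss schedule described above reduces to a short calculation. The function name is hypothetical, and the sketch assumes bosses appear exactly when the kill count reaches a multiple of 50.

```python
def bosses_to_spawn(kills):
    """Number of bosses that appear when the kill count reaches `kills`."""
    if kills > 0 and kills % 50 == 0:
        return kills // 50  # one at 50 kills, two at 100, and so on
    return 0

waves = {kills: bosses_to_spawn(kills) for kills in (49, 50, 100, 120, 150)}
```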
Computer Vision: Edge Detections: Vertical diff., Roberts, Sobels
Michael Feinberg
Abstract and paper needed
Optimization of a Traffic Signal
The purpose of this project is to produce an intelligent transport system (ITS) that controls a traffic signal in order to achieve maximum traffic throughput at the intersection. To produce an accurate model of the traffic flow, it is necessary to have each car be an autonomous agent with its own driving behavior. A learning agent will be used to optimize a traffic signal for the traffic of the autonomous cars.
Abstract
Traffic in the Washington, D.C. area is known to be some of the worst traffic in the nation. Optimizing traffic signal changes at intersections would help traffic on our roads flow better. This project produces an intelligent transport system (ITS) that controls a traffic signal in order to achieve maximum traffic throughput at the intersection. In order to produce an accurate model of the traffic flow through an intersection, it is necessary to have each car be an autonomous agent with its own driving behavior.
Agent Modeling and Optimization of a Traffic Signal, Scott Hyndman
The cars cannot all drive the same way because the drivers on our roads do not all drive the same way. A learning agent is used to optimize a traffic signal for the traffic of the autonomous cars. Note: The results and conclusion pieces of the abstract are not included yet because the project is not finished.
Introduction
Traffic in the Washington, D.C. area is known to be some of the worst traffic in the nation. Optimizing traffic signal changes at intersections would help traffic on our roads flow better. The purpose of this project is to produce an intelligent transport system (ITS) that controls a traffic signal in order to achieve maximum traffic throughput at the intersection. In order to produce an accurate model of the traffic flow through an intersection, it is necessary to have each car be an autonomous agent with its own driving behavior. The cars cannot all drive the same way because the drivers on our roads do not all drive the same way. A learning agent will be used to optimize a traffic signal for the traffic of the autonomous cars.
Background
1.1 Traffic Signal Control Strategies
There are three main traffic signal control strategies: pretimed control, actuated control, and adaptive control.
1.1.2 Pretimed Control
Pretimed control is the most basic of the three strategies. In the pretimed control strategy, the lights change based on fixed time values. The values are chosen based on data concerning previous traffic flow through the intersection. This control strategy operates the same no matter what the traffic volume is.
1.1.3 Actuated Control
The actuated control strategy uses sensors to tell where cars are at the intersection. It then uses what it learns from the sensors to figure out how long it should wait before changing the light colors. For example, if the signal picks up a car coming just before the green light is scheduled to change, the length of the green light can be extended for the car to go through.
1.1.4 Adaptive Control
The adaptive control strategy is similar to the actuated control strategy. It differs in that it can change more parameters than just the light interval length. Adaptive control estimates what the intersection will be like based on data from a long way up the road. For example, if the signal notices that there is a lot of traffic building up down the road during rush hour, it might lengthen the green light intervals on the main road and shorten them on the smaller road.
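The difference between the three strategies can be made concrete with a small sketch. All function names, parameters, and default values below (`extension`, `max_green`, `min_green`, the proportional split) are illustrative assumptions and are not taken from the project or from any traffic engineering standard.

```python
def pretimed_green(base_green):
    """Pretimed control: the green time is fixed regardless of traffic."""
    return base_green

def actuated_green(base_green, detections, extension=3.0, max_green=60.0):
    """Actuated control: extend the green when a car is detected near its end.

    `detections` are times (seconds after the green began) at which the
    sensor picks up an approaching car.
    """
    green_end = base_green
    for t in sorted(detections):
        if green_end - extension < t <= green_end:
            green_end = min(t + extension, max_green)
    return green_end

def adaptive_split(cycle, main_demand, side_demand, min_green=10.0):
    """Adaptive control: divide the cycle according to upstream demand."""
    total = main_demand + side_demand
    if total == 0:
        return cycle / 2, cycle / 2
    main = cycle * main_demand / total
    main = min(max(main, min_green), cycle - min_green)
    return main, cycle - main

fixed = pretimed_green(20.0)
extended = actuated_green(20.0, detections=[5.0, 19.0, 21.5, 40.0])
rush_hour = adaptive_split(90.0, main_demand=30, side_demand=10)
```

In the actuated example, the car detected at 19 s extends the green to 22 s, and the car at 21.5 s extends it again to 24.5 s; the detections at 5 s and 40 s have no effect.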
1.2 Driver Behavior
None at this time.
1.3 Machine Learning
This piece of the project has not been started.
Development
2.1 Model
I am using MASON software to do my traffic simulation. MASON is a Java-based modeling package that is distributed by George Mason University. My simulation is based on the MAV simulation included with the MASON download. In MASON, everything runs from the Schedule class. The Schedule keeps track of time and moves the simulation along one step at a time. Objects that move implement the Steppable interface. Thus, each has its own step method that the Schedule calls at each step in time. There is also a Stoppable interface that takes objects off the Schedule.
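The scheduling pattern described above can be illustrated with a small Python analogy. This is not MASON's Java API: the class and method names below only mirror the roles named in the text (a schedule that advances time, objects with a step method, and a way to take an object off the schedule), and the `Car` fields are hypothetical.

```python
class Schedule:
    """Keeps track of time and steps every scheduled object once per tick."""

    def __init__(self):
        self.time = 0
        self._agents = []

    def schedule_repeating(self, agent):
        """Add an agent and return a function that takes it off the schedule."""
        self._agents.append(agent)

        def stop():
            if agent in self._agents:
                self._agents.remove(agent)

        return stop

    def step(self):
        for agent in list(self._agents):  # copy so agents may stop mid-step
            agent.step(self)
        self.time += 1

class Car:
    """A steppable object: moves forward by its speed on every tick."""

    def __init__(self, x, speed):
        self.x = x
        self.speed = speed

    def step(self, schedule):
        self.x += self.speed

schedule = Schedule()
car = Car(x=0, speed=2)
stop_car = schedule.schedule_repeating(car)
for _ in range(3):
    schedule.step()
stop_car()       # the car leaves the schedule
schedule.step()  # time still advances, but the car no longer moves
```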
In this program, the visible simulation is made by the CarUI (Car User Interface) class. The CarUI starts the CarRun class running. The CarRun class is what starts the Schedule and creates everything in the simulation. CarUI takes information from CarRun to display on the screen. CarRun creates Continuous2D objects, one for each of the object types used in CarRun: Car, Region, Signal, and eventController. A Continuous2D stores objects in a continuous 2D environment, which makes it easier to keep track of the objects in the simulation. The Continuous2D breaks the space of the simulation into "buckets."
If you want to find an object in a certain area of the simulation, you can check in the bucket there. For example, if you want to see if a car has another car near its front, you can look in the bucket that the car is in and check where the other cars in that bucket are. The Car class contains the information for how each autonomous car runs. It implements both the Steppable and Stoppable interfaces. Regions are what go on the background of the visual output. Examples of Region objects are the roads and medians. The Signal class is almost identical to the Region class. However, the signals are redrawn at every time iteration, while the Regions are only redrawn if they change location or size.
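The bucket idea described above can be sketched as a simple grid that maps each object to a cell. This is a Python illustration of the general technique, not the Continuous2D implementation; the class name `BucketGrid`, its methods, and the bucket size are assumptions.

```python
from collections import defaultdict

class BucketGrid:
    """Stores objects by position and groups them into square buckets."""

    def __init__(self, bucket_size):
        self.bucket_size = bucket_size
        self._buckets = defaultdict(set)  # bucket key -> objects inside it
        self._where = {}                  # object -> current position

    def _key(self, pos):
        return (int(pos[0] // self.bucket_size), int(pos[1] // self.bucket_size))

    def set_location(self, obj, pos):
        """Place or move an object, updating the bucket it belongs to."""
        old = self._where.get(obj)
        if old is not None:
            self._buckets[self._key(old)].discard(obj)
        self._where[obj] = pos
        self._buckets[self._key(pos)].add(obj)

    def same_bucket(self, obj):
        """Other objects that share a bucket with `obj`."""
        key = self._key(self._where[obj])
        return self._buckets[key] - {obj}

grid = BucketGrid(bucket_size=10)
grid.set_location("car_a", (1, 1))
grid.set_location("car_b", (3, 4))
grid.set_location("car_c", (25, 1))
before = grid.same_bucket("car_a")
grid.set_location("car_c", (9, 9))  # car_c drives into car_a's bucket
after = grid.same_bucket("car_a")
```

A car close to the edge of its bucket may have neighbors in an adjacent bucket, so a fuller version would also check the surrounding cells.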
Lastly, because there is no way in MASON to control when actions happen in the Schedule, I made the eventController class to tell actions when to happen. The eventController class uses functions defined in other classes to control the objects of those classes.
2.2 Driver Behavior
None at this time.
2.3 Optimization
This piece of the project has not been started.
Machine Intelligence Walking Robot, Greg Maslov
Paper?
Eugene Mesh
Project? Research paper?