RDDL Blocks World

Blocks World

A good portion of my graduate work has had to do with automated planning applied to interactive narrative generation. Planning is the process of finding a sequence of actions that will transform some world's initial state into a goal state. A famous problem world, called a domain, used as an example in planning is called Blocks World. Three things exist in most Blocks World examples: a robot arm, a table, and a collection of three wooden blocks labeled A, B, and C. In Blocks World, the three blocks can rest on the table or on each other. The robot arm can pick a block up if no other block is resting on top of it and the robot is not already holding a block. Finally, the robot arm can place a block it is holding on the table or on top of another block. The goal of Blocks World problems is to start from some initial configuration of blocks and use the robot arm to unstack and re-stack blocks until some pre-specified configuration is achieved.

PDDL

The language used by most planners to specify a world, its mechanics, and a particular problem set in the world is called Planning Domain Definition Language (PDDL). Every PDDL problem has two components, a problem and a domain. The problem specifies what exists in the world, the initial world configuration, and the goal state configuration. The domain specifies what actions can be taken in the world, what must be true for an action to be taken, and how the action changes the world. For example, here is a Blocks World planning problem and domain modeled in PDDL. The problem introduces three objects to reason over, blocks A, B, and C. It specifies that initially all three blocks are resting on the table and the robot arm is not holding anything. Finally, the problem specifies that the goal state is to have block B resting on block C and block C resting on block A. The domain specifies four actions that can be taken by the robot arm: pickup, putdown, stack, and unstack. Pickup takes a block off the table if the block has nothing on top of it and the robot is not holding anything. Putdown places a block the robot is holding on the table. Stack places a block the robot is holding on top of another block that has nothing on top of it. Unstack is the final action, it picks a block up off another block if the first block has nothing on top of it.

If you like, you can try the example Blocks World domain and problem out at this website. Upload the two PDDL files linked earlier and hit Solve. You should be given a sequence of actions that solves the planning problem, like this:

  1. Pickup C
  2. Stack C on A
  3. Pickup B
  4. Stack B on C
This sequence of moves places B on top of C and C on top of A, which is the solution state to the planning problem. Clever computer!

RDDL

Planning has been around quite a while, at least since Shakey the robot was created at Stanford Research Institute along with STRIPS and A* in the late 1960s and early 1970s. Since then, academics who study planning have established an annual conference called the International Conference on Automated Planning and Scheduling (ICAPS) where they publish and present peer reviewed research advances in the field of planning. Starting in 1998, ICAPS has held competitions where different planning systems are tested against each other on benchmark problems to see which of them produces solutions the fastest and most efficiently (if at all!). At the 2006 planning competition, a new probabilistic track was introduced to the competition. This track allowed planning domains to have actions with non-deterministic outcomes. This means that actions in the planning world can have outcomes determined by probabilities. This new track introduced a new planning language that allows stochastic outcomes, called Probabilistic PDDL (PPDDL).

But PPDDL's reign did not last long! For the 2011 ICAPS International Probabilistic Planning Competition, Scott Sanner introduced the Relational Dynamic influence Diagram Language (RDDL, pronounced "riddle"). RDDL allows domain designers to model partial observability, or situations where the robot being controlled by the planner can't see everything in the world at all times. This allows probabilistic planners to create Partially Observable Markov Decision Processes (POMDPs) in addition to the regular Markov Decision Processes (MDPs) generated by planners from PPDDL. Here is a great introduction to RDDL by Sanner (in which he pokes fun at planning researchers for using Blocks World as a test domain).

At first, RDDL was strange from my perspective because unlike PDDL and PPDDL it represents actions as fluents. Instead of defining actions in terms of preconditions and effects, RDDL specifies what combinations of action and state fluents can cause a state fluent to change. It's an interesting paradigm shift from the perspective of someone used to working with PDDL.

Blocks World in RDDL

In order to adjust to the RDDL worldview, I decided to recreate the classic Blocks World PDDL problem with RDDL. Here is the domain I engineered and here is a problem instance that corresponds to the PDDL problem I linked earlier. In order to run the RDDL files you need two components: a world simulator and a planner. The RDDL world simulator can be downloaded and installed from Scott Sanner's GitHub project. This is the same world simulator used at the ICAPS Probabilistic Planning Competition. To create the solution MDP I used Gourmand, a planner developed by Andrey Kolobov, Mausam, and Dan Weld at The University of Washington. A different planner, PROST by Thomas Keller of The University of Freiburg, won the 2011 and 2014 competitions, but I found the runner up Gourmand to be more user friendly.

If you're not interested in setting up and running the RDDL world simulator and the planner, you can see Gourmand's output on my domain here. Sure enough, the planner performs the following steps:
  1. Pickup C
  2. Stack C on A
  3. Pickup B
  4. Stack B on C
And then it performs null actions, called noops, for the rest of its turns. It does this because each action it takes incurs a punishment, so it wants to take the least number of actions possible to accomplish the goal state.

If you're interested in learning more about RDDL, I recommend you download and play around with the simulator and planner. There are several interesting toy domains, including a Frogger and Mars rover model, that are a lot of fun to play with. Both domains are bundled with Scott Sanner's RDDL world simulator software.

Comments

Popular posts from this blog