Seong Jae Lee
CSE557 Computer Graphics
Winter 2009
The model consists of ten cubic segments and nine joints. Each joint is a hinge joint whose axis is perpendicular to the straight body and parallel to the ground plane. Every three adjacent joints are controlled by a "muscle", which applies the same torque to each of them. Since there are nine joints, the model has three muscles.
The model has only one state: time. Each muscle is controlled by a periodic function composed of tent bases. If we instead defined the state to be the shape of the model, we would need 22 state variables (the position, velocity, and angular velocity of the head (4), plus the rotation angle and angular velocity of each joint (2 x 9 = 18)), which is too large a space to compute a controller over.
The model alternates between two phases: march and pause. Since the controller does not know the current shape of the model, we cannot guarantee that the model keeps advancing under a given control, even if we observe it advancing with that control within some time window. Another issue is that the ODE engine sometimes produces hugely different simulation results from very small numerical differences, which makes most controllers unstable.
To fix this, I added a pausing phase so that the model can return to a stable pose: a straight pose. In this project, after a certain amount of time has passed in the marching phase, the controller stops marching and adjusts the model back to the initial straight pose. The picture on the right visualizes two periods. As you can see, the length and the trajectory of the pausing phase differ slightly between periods, which is caused by the butterfly effect in the ODE engine.
The pausing controller is hand-tuned: while the angle of a joint is not zero, apply a constant force that drives it toward zero. Therefore, in this section, we will focus on learning the marching controller.
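The hand-tuned pausing rule can be sketched as follows. This is only an illustration: the function name `pause_torque`, the torque magnitude, and the dead-band tolerance are assumptions, since the report does not give the actual values.

```python
def pause_torque(angle, magnitude=1.0, tolerance=1e-3):
    """Constant-magnitude restoring torque for one hinge joint.

    `magnitude` and `tolerance` are illustrative assumptions,
    not values taken from the project.
    """
    if abs(angle) <= tolerance:
        return 0.0  # joint is already close enough to straight
    # Push against the sign of the angle so the joint moves back toward zero.
    return -magnitude if angle > 0 else magnitude
```

Because the torque has a fixed magnitude rather than being proportional to the angle, the joint can overshoot slightly, which is consistent with the small period-to-period differences described above.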
The marching controller for each muscle is a weighted sum of uniformly distributed tent functions. That is, the torque of muscle i at time t is a weighted sum of N tent functions B_{i,j}:
$$\tau_i(t) = \sum_{j=1}^{N} w_{i,j}\, B_{i,j}(t)$$
Here, each tent function has a magnitude of 80 force units and a length of T / N, where T is the period of the marching phase. Each weight w_{i,j} lies between -0.5 and 0.5. The range of T is [50, 250].
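A minimal sketch of the weighted sum of tent functions described above. It assumes that "80 force units" is the peak height of each tent, that "length T / N" is the width of its base, and that tent j is centred in the j-th slot of the period; the function names `tent` and `muscle_torque` are hypothetical.

```python
def tent(t, j, n, period, peak=80.0):
    """Tent basis j of n: `peak` at the centre of slot j, zero at the slot edges."""
    width = period / n                 # assumed base width T / N
    centre = (j + 0.5) * width         # assumed centre of the j-th slot
    phase = t % period                 # the control repeats every marching period
    dist = abs(phase - centre)
    return peak * max(0.0, 1.0 - dist / (width / 2.0))


def muscle_torque(t, weights, period):
    """Torque of one muscle at time t: weighted sum of its N tent functions."""
    n = len(weights)
    return sum(w * tent(t, j, n, period) for j, w in enumerate(weights))
```

For example, with N = 9 and T = 90, `muscle_torque(5.0, [0.5] + [0.0] * 8, 90.0)` evaluates the first tent at its peak and returns half of the 80-unit magnitude.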
The objective function is simple: since the model is designed to move forward only, the objective is the distance advanced per unit time. Since the pausing phase can vary in duration, we averaged the advance over three marching-pausing periods. I defined the advance to be zero if the model fell on its side due to a huge torque generated by abrupt controls.
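One way to read this objective is sketched below. It assumes the average is taken as total advance divided by total elapsed time over the measured periods; the function name `objective` and its arguments are hypothetical and stand in for values that would come from the simulation.

```python
def objective(advances, durations, fell_over):
    """Average advance per unit time over several march-pause periods.

    advances[k] and durations[k] describe the k-th period; `fell_over`
    flags a run in which the model tipped onto its side.
    """
    if fell_over:
        return 0.0  # a fallen model scores zero, as in the report
    total_time = sum(durations)
    if total_time <= 0:
        raise ValueError("durations must sum to a positive time")
    return sum(advances) / total_time
```

Dividing by the total time rather than averaging per-period speeds keeps a long pausing phase from being under-weighted.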
The discretized controller is represented using 3N + 1 parameters, since there are three muscles with N parameters plus one additional parameter T.
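As an illustration of the 3N + 1 count, the flat parameter vector could be unpacked as below. The layout (three weight blocks followed by T as the last entry) and the function name `unpack` are assumptions; the report does not specify the ordering.

```python
def unpack(params, n_muscles=3):
    """Split a flat parameter vector into per-muscle weights and the period T.

    Assumed layout (hypothetical): muscle 0 weights, muscle 1 weights,
    muscle 2 weights, then T as the final entry.
    """
    if (len(params) - 1) % n_muscles != 0:
        raise ValueError("expected n_muscles * N + 1 parameters")
    n = (len(params) - 1) // n_muscles
    weights = [list(params[i * n:(i + 1) * n]) for i in range(n_muscles)]
    period = params[-1]
    return weights, period
```

With N = 9 the vector has 28 entries: 27 weights and one period.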
For the learning, we used a simulated annealing algorithm.
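vector w = randomW();            // start from random weights
vector w_best = w;
double d = distance(w);          // objective: advance per unit time
double d_best = d;
double r = 1.0;                  // exploration factor
while (r > epsilon) {
    // with probability r jump to a random point, otherwise perturb w
    w = random() < r ? randomW() : randomNeighbor(w);
    d = distance(w);
    if (d_best < d) {            // keep the best controller seen so far
        w_best = w;
        d_best = d;
    }
    r *= 0.99;                   // decay the exploration factor
}
return w_best;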
We ran 100 iterations with distributed computing. At each step, we gathered data from 105 processes. The exploration factor γ (r in the pseudocode above) is multiplied by 0.99 at each iteration.
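The loop above can be tried on a toy problem with the single-process sketch below. It is not the project's distributed implementation: the objective `toy`, the neighbor step size, the stopping threshold `epsilon`, and the fixed seed are all assumptions chosen for illustration.

```python
import random


def anneal(objective, dim, epsilon=1e-3, decay=0.99, step=0.05, seed=0):
    """Random search with a decaying exploration factor; maximizes `objective`."""
    rng = random.Random(seed)

    def random_w():
        return [rng.uniform(-0.5, 0.5) for _ in range(dim)]

    def random_neighbor(w):
        # Small uniform perturbation, clipped to the weight range.
        return [min(0.5, max(-0.5, x + rng.uniform(-step, step))) for x in w]

    w = random_w()
    d = objective(w)
    w_best, d_best = w, d
    r = 1.0
    while r > epsilon:
        # Explore a fresh random point with probability r, else perturb w.
        w = random_w() if rng.random() < r else random_neighbor(w)
        d = objective(w)
        if d > d_best:
            w_best, d_best = w, d
        r *= decay
    return w_best, d_best


def toy(w):
    """Toy objective (assumption): maximal when every weight equals 0.25."""
    return -sum((x - 0.25) ** 2 for x in w)
```

Calling `anneal(toy, 3)` returns the best weight vector found and its score. As in the pseudocode, neighbors are drawn around the current point rather than around the best point so far.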
The picture on the right shows the learning curve. The period of the marching controller is 0.736576 seconds, and the weights are (0.419852, 0.449261, 0.107661, 0.939927, 0.678625, 0.021192, 0.0284052, 0.507091, 0.554658), (0.924705, 0.886051, 0, 0.158451, 0.293836, 0.969938, 0.233705, 0.349936, 0.127819), and (0.966276, 0.828953, 0.674043, 0.638104, 0.502207, 0.878337, 0.832906, 0.95565, 0.672078) for the three muscles, respectively. The graph on the right also shows the control function of each muscle.
The speed of the model with this controller is about 0.52 length units per second. It took 304.4 seconds to advance 100 length units.
Since the learning stage only considers a plane without any slope, we cannot guarantee that the learned controller performs even sub-optimally on other terrains. In fact, we cannot even be sure that it will move forward.
I created a few inclined terrains with different slopes. The graph on the right shows the simulation results for different slope ratios. The blue line is the performance on the flat plane, which is the same as the learning environment. As the slope ratio increases, the advance decreases. When the slope ratio is too high (in this case 0.2), the model even moves backward: it performs worse than doing nothing.
The biggest frustration in this project is that the simulation does not produce consistent results. For example, if I change dt from 0.05, the simulation result changes completely. Moreover, if I set dt to match the rendering frame rate, the model does not even move forward. Even a difference of one millionth in the weights might produce a totally different result. I do not know the reason: the control function might not be smooth, the Euler method might have this limitation, or the ODE engine might be unstable.
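One of the suspected causes, step-size sensitivity of the Euler method, can be illustrated on a toy system that is unrelated to the snake model and to the ODE engine's own integrator. The sketch below integrates dx/dt = -k x with explicit Euler; the function name `euler_decay` and the constant k = 50 are assumptions chosen so that dt = 0.05 falls outside the stable range dt < 2 / k.

```python
def euler_decay(k, dt, steps, x0=1.0):
    """Integrate dx/dt = -k * x with explicit Euler and return the final x."""
    x = x0
    for _ in range(steps):
        x += dt * (-k * x)  # each step multiplies x by (1 - k * dt)
    return x


stable = euler_decay(50.0, 0.01, 100)    # factor 0.5 per step: decays to zero
unstable = euler_decay(50.0, 0.05, 100)  # factor -1.5 per step: blows up
```

The true solution decays in both cases, so the divergence at the larger step is purely a numerical artifact. Whether the same mechanism explains the behavior observed in this project is not established here.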
This controller learned its policy on a 1-dimensional terrain. If a bump on the ground changes the model's direction, there is no way to rotate it back with the current structure. To fix this, we would need another joint that changes the model's direction. In that case, learning would be much harder, since the pausing controller would face a harder task than it does now. With a joint rotating around an axis perpendicular to the ground plane, it would be VERY hard to remove this butterfly effect: no matter how the model tries to straighten itself, the joint angle would always be slightly off from exactly zero.