Within the previous post, we historical an LQR controller to clear up the classic cart-pole pain. There, the circulate command is continuous, and our controller is allowed to output any force. This outcomes in some very soft management.

We are able to space the force over time:

We are able to ogle that our force begins off at 8 N and oscillates down to about -4 N earlier than settling down as the cart-pole system stabilizes. This raises a quiz of – don’t trusty world actuators saturate? What happens if we mannequin that saturation?
LQR with Saturation
The new LQR pain finds the sequence of (N) actions (right here, cart lateral forces) (a^{(1)}, ldots, a^{(N)}) that produces the sequence of states (s^{(1)}, ldots, s^{(N 1)}) that maximizes the discounted reward all over states and actions:
[begin{matrix}underset{boldsymbol{x}}{text{maximize}} & R(s^{(1)}, a^{(1)}) R(s^{(2)}, a^{(2)}) ldots R(s^{(N)}, a^{(N)}) R_text{final}(s^{(N 1)}) \ text{subject to} & s^{(i 1)}=T_s s^{(i)} T_a a^{(i)} qquad text{ for } i=1:N end{matrix}]
Successfully, in actuality we good buy future rewards by some ingredient (gamma in (0,1)) and we don’t effort with the terminal reward (R_textual verbalize{final}):
[begin{matrix}underset{boldsymbol{x}}{text{maximize}} & R(s^{(1)}, a^{(1)}) gamma R(s^{(2)}, a^{(2)}) ldots gamma^{N-1} R(s^{(N)}, a^{(N)}) \ text{subject to} & s^{(i 1)}=T_s s^{(i)} T_a a^{(i)} qquad text{ for } i=1:N end{matrix}]
Or more merely:
[begin{matrix}underset{boldsymbol{x}}{text{maximize}} & sum_{i=1}^N gamma^{i-1} R(s^{(i)}, a^{(i)}) & \ text{subject to} & s^{(i 1)}=T_s s^{(i)} T_a a^{(i)} & text{for } i=1:N end{matrix}]
Our dynamics are linear (as viewed within the constraint) and our reward is quadratic:
[R(s, a)=s^top R_s s a^top R_a a]
It turns out that efficient solutions exist for fixing this pain precisely (Perceive allotment 7.8 of Alg4DM), and the answer does no longer count upon the initial command (s^{(1)}).
With actuator limits comes force saturation. Now, we are able to also lawful mutter the controller that we got without fascinated about saturation, and lawful clamp its output to a feasible differ:
