
rlDemo

Q-learning

Base code

# it keeps complaining that I don't have permission, which is really annoying...
sudo /home/hesy/.conda/envs/py36/bin/python main.py # use default config 0.9,0.9,0.1,200,0.1,500
sudo /home/hesy/.conda/envs/py36/bin/python main.py --gamma 0.95 --me 100
sudo /home/hesy/.conda/envs/py36/bin/python main.py --gamma 0.95 --es 0.99 --me 100
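For reference, a hypothetical argparse sketch of how these flags could be wired up in main.py; --gamma is the discount factor, and I am assuming --me stands for max episodes and --es for ε-start, while the defaults below are guesses rather than the repo's actual values.

import argparse

parser = argparse.ArgumentParser(description="Q-learning demo (hypothetical flag wiring)")
parser.add_argument("--gamma", type=float, default=0.9, help="discount factor")
parser.add_argument("--es", type=float, default=0.9, help="epsilon at the start of training (assumed meaning)")
parser.add_argument("--me", type=int, default=500, help="max number of training episodes (assumed meaning)")
cfg = parser.parse_args()
print(cfg)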
  • ε-decay is coupled with ε-start and ε-end; the first one feels harder to tune, so I only adjust the latter two (see the schedule sketch after this list)

    • First I ran with the default parameters and found that training had already converged comfortably by episode 100 (right-hand plot), so I confidently set me to 100; the result still looks good (see below)

      (figures: training and evaluation reward curves with the default config)

    • The shortest path is 15 steps, so I picked gamma on the order of 1 - 1/15 ≈ 0.93 and rounded it up to 0.95

      (figures: reward curves with gamma = 0.95)

      The results so far look good (see above), so training has clearly worked; keep tuning

    • Turned es (ε-start) up to 0.99, hoping for more exploration at the beginning

      (figures: reward curves with ε-start = 0.99)

      With more exploration at the start, learning actually got faster as well, which suggests the extra exploration found good strategies

    • Next, try a larger learning rate (0.15) and a smaller one (0.05) separately

      (figures: reward curves with the adjusted learning rates)

      As expected, a larger learning rate does learn faster, haha
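To make the coupling from the first bullet concrete, here is one common exponential ε schedule (a sketch, not necessarily the exact formula used in the repo): ε starts near ε-start, decays toward ε-end, and ε-decay controls how fast it gets there, so the three parameters jointly shape the exploration curve.

import math

def epsilon_by_episode(episode, eps_start=0.99, eps_end=0.1, eps_decay=200):
    # Starts near eps_start and approaches eps_end; eps_decay sets the time constant,
    # which is why tuning any one of the three shifts the whole schedule.
    return eps_end + (eps_start - eps_end) * math.exp(-episode / eps_decay)

for ep in (0, 10, 50, 100):
    print(ep, round(epsilon_by_episode(ep), 3))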

DQN

Environment setup

Game: CartPole-v0. The action space is discrete with two actions (push left and push right, encoded as 0 and 1), and the state is four-dimensional ($x, \dot{x}, \theta, \dot{\theta}$): cart position, cart velocity, angle of the pole from vertical, and the rate of change of that angle. After each left/right action on the cart, the env returns a reward of +1. In CartPole-v0 the episode also ends once the accumulated reward reaches 200, whereas in CartPole-v1 the limit is 500. The maximum reward threshold can be modified through the registry introduced earlier.
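A minimal sketch to probe the spaces described above, assuming the classic gym API (gym < 0.26, where step returns a 4-tuple):

import gym

env = gym.make("CartPole-v0")
print(env.action_space)       # Discrete(2): 0 = push left, 1 = push right
print(env.observation_space)  # Box(4,): x, x_dot, theta, theta_dot

state = env.reset()
done, total_reward = False, 0.0
while not done:
    # Random policy just to exercise the env; each step yields reward +1.
    state, reward, done, info = env.step(env.action_space.sample())
    total_reward += reward
print(total_reward)  # capped at 200 for CartPole-v0 (500 for CartPole-v1)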

Error log & fix

Traceback (most recent call last):
  File "main.py", line 158, in <module>
    eval(cfg)
  File "main.py", line 130, in eval
    action = agent.choose_action(state, train=False) # choose an action based on the current state
  File "/home/hesy/rlreview/leedeeprl-notes/codes/dqn/agent.py", line 76, in choose_action
    q_value = self.target_net(state)
  File "/home/hesy/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hesy/rlreview/leedeeprl-notes/codes/dqn/model.py", line 29, in forward
    x = F.relu(self.fc1(x))
  File "/home/hesy/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/hesy/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/hesy/.conda/envs/py36/lib/python3.6/site-packages/torch/nn/functional.py", line 1370, in linear
    ret = torch.addmm(bias, input, weight.t())
RuntimeError: Expected object of device type cuda but got device type cpu for argument #2 'mat1' in call to _th_addmm

choose_action defaults to the CPU at eval time, but the model may have been loaded onto the GPU...

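A minimal sketch of the kind of fix I have in mind: build the state tensor on whatever device the network lives on instead of hard-coding CPU. self.target_net and the train argument come from the traceback above; everything else (the Agent wrapper, the stand-in network) is only for illustration.

import torch
import torch.nn as nn

class Agent:
    def __init__(self, target_net):
        self.target_net = target_net

    def choose_action(self, state, train=False):
        # Put the state tensor on the same device as the network parameters
        # instead of hard-coding CPU; this avoids the cuda/cpu mismatch above.
        device = next(self.target_net.parameters()).device
        state_t = torch.as_tensor(state, dtype=torch.float32, device=device).unsqueeze(0)
        with torch.no_grad():
            q_values = self.target_net(state_t)
        return q_values.argmax(dim=1).item()  # greedy action, i.e. the eval path

# smoke test with a stand-in network (4-dim state -> 2 actions, as in CartPole)
net = nn.Linear(4, 2)
if torch.cuda.is_available():
    net = net.cuda()
print(Agent(net).choose_action([0.0, 0.1, 0.02, -0.1]))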

  • ==?== I want to ask: is it necessary to evaluate on the CPU here? Was it hard-coded to evaluate on the CPU to avoid the overhead of moving tensors onto the GPU? If it is hard-coded like this... the problem above is exactly what happens...