The bug is in the act function:
def act(self, some_input, state):
    # mu carries the graph info required for the gradient
    mu, var, state_value = self.model(some_input, state)
    # mu is detached here and has now forgotten all the operations
    # performed in self.action_head
    mu = mu.data.cpu().numpy()
    sigma = torch.sqrt(var).data.cpu().numpy()
    action = np.random.normal(mu, sigma)
    action = np.clip(action, 0, 1)
    action = torch.from_numpy(action / 1000)
    return action, state_value
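To see the detachment concretely, here is a tiny standalone check (the tensor names are made up for illustration): accessing .data, like calling .detach(), returns a tensor that has been cut out of the computation graph.

import torch

w = torch.randn(3, requires_grad=True)  # stands in for self.action_head weights
mu = w * 2                              # some differentiable op
print(mu.requires_grad)                 # True  - still in the graph
print(mu.data.requires_grad)            # False - .data drops the graph
print(mu.detach().requires_grad)        # False - same effect, preferred API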
Further on, if the loss is calculated using tensor operations performed on action, it cannot be traced back to update the self.action_head weights: by detaching the tensor mu you removed it from the computation graph, so self.action_head never receives any updates.
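One common fix is to keep the sampling inside torch via torch.distributions.Normal and return a log-probability that stays in the graph. This is a minimal sketch, assuming the loss is a policy-gradient term and the caller can accept the extra log_prob return value (both are assumptions, not part of your original code):

import torch

def act(self, some_input, state):
    mu, var, state_value = self.model(some_input, state)
    sigma = torch.sqrt(var)
    # Build the distribution from tensors that are still in the graph
    dist = torch.distributions.Normal(mu, sigma)
    action = dist.sample()                     # the sample itself carries no gradient
    log_prob = dist.log_prob(action).sum(-1)   # differentiable w.r.t. mu and var
                                               # (assumes a trailing action dimension)
    action = torch.clamp(action, 0, 1) / 1000  # same post-processing, kept in torch
    return action, log_prob, state_value

The loss is then built from log_prob (for example -(advantage * log_prob).mean()), and backpropagation reaches self.action_head through mu and var instead of stopping at the detached numpy copy.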