The bug is in the act method:
    def act(self, some_input, state):
        # mu contains info required for gradient
        mu, var, state_value = self.model(some_input, state)
        # mu is detached and has now forgotten all the operations performed
        # in self.action_head
        mu = mu.data.cpu().numpy()
        sigma = torch.sqrt(var).data.cpu().numpy()
        action = np.random.normal(mu, sigma)
        action = np.clip(action, 0, 1)
        action = torch.from_numpy(action/1000)
        return action, state_value
Further along, if the loss is calculated using tensor operations performed on action, it cannot be traced back to update the self.action_head weights, because you detached the tensor mu. Detaching removes mu from the computation graph, so you do not see any updates in self.action_head.
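One possible way to keep the gradient path intact is to leave mu as a tensor and build the loss from the log-probability of the sampled action. The sketch below is only an illustration under assumptions: action_head here is a hypothetical single linear layer standing in for self.action_head, the variance is a fixed constant, and advantage is a made-up placeholder value, none of which come from the original code.

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

torch.manual_seed(0)

# Hypothetical stand-in for self.action_head: one linear layer that outputs mu.
action_head = nn.Linear(4, 1)
features = torch.randn(1, 4)
var = torch.tensor([[0.25]])  # fixed variance, for illustration only

mu = action_head(features)        # mu stays attached to the computation graph
dist = Normal(mu, torch.sqrt(var))
action = dist.sample()            # the sample itself carries no gradient
log_prob = dist.log_prob(action)  # but log_prob depends on mu through the graph

# Placeholder policy-gradient style loss; `advantage` is an assumed constant.
advantage = 1.0
loss = -(log_prob * advantage).sum()
loss.backward()

# The gradient now reaches the weights that produced mu.
attached_grad = action_head.weight.grad.clone()

# For contrast: a round trip through NumPy, as in the original act(),
# yields a tensor with no gradient history.
detached_action = torch.from_numpy(mu.detach().cpu().numpy())
detached_requires_grad = detached_action.requires_grad
```

Here the sampled action is still a plain value, but the loss is a function of mu, so backward() can populate action_head.weight.grad. If the loss needs to be differentiated through the action itself, dist.rsample() (the reparameterized sampler) may be a better fit than dist.sample().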