I used to evaluate my past actions based on the actual outcomes they produced. I was sort of mind blown when I realized I should evaluate the rationality of an action separately from the outcome it actually produced. An explanation from Russell and Norvig (R+N):
We need to be careful to distinguish between rationality and omniscience. An omniscient agent knows the actual outcome of its actions and can act accordingly; but omniscience is impossible in reality. Consider the following example: I am walking along the Champs-Élysées one day and I see an old friend across the street. There is no traffic nearby and I’m not otherwise engaged, so, being rational, I start to cross the street. Meanwhile, at 33,000 feet, a cargo door falls off a passing airliner, and before I make it to the other side of the street I am flattened. Was I irrational to cross the street? It is unlikely that my obituary would read “Idiot attempts to cross street.”
Another important point to remember is that information-gathering should also be considered an action.
Our definition of rationality does not require omniscience, then, because the rational choice depends only on the percept sequence to date. We must also ensure that we haven’t inadvertently allowed the agent to engage in decidedly underintelligent activities. For example, if an agent does not look both ways before crossing a busy road, then its percept sequence will not tell it that there is a large truck approaching at high speed. Does our definition of rationality say that it’s now OK to cross the road? Far from it! First, it would not be rational to cross the road given this uninformative percept sequence: the risk of accident from crossing without looking is too great. Second, a rational agent should choose the “looking” action before stepping into the street, because looking helps maximize the expected performance. Doing actions in order to modify future percepts, sometimes called information gathering, is an important part of rationality and is covered in depth in Chapter 16. A second example of information gathering is provided by the exploration that must be undertaken by a vacuum-cleaning agent in an initially unknown environment.
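To make the "looking helps maximize the expected performance" point concrete, here is a minimal sketch of the road-crossing example as an expected-utility comparison. Every number and name in it (`P_TRUCK`, the payoffs, `COST_LOOK`, and the assumption that looking is perfectly reliable) is made up for illustration and does not come from R+N.

```python
# All values below are hypothetical, chosen only to illustrate the comparison.
P_TRUCK = 0.1         # assumed chance that a truck is approaching
U_CROSS_SAFE = 10.0   # assumed payoff for reaching the other side
U_HIT = -1000.0       # assumed payoff for being hit
U_WAIT = 0.0          # assumed payoff for staying on the curb
COST_LOOK = 0.1       # assumed small cost of pausing to look


def eu_cross_blind() -> float:
    """Expected utility of crossing without looking."""
    return P_TRUCK * U_HIT + (1 - P_TRUCK) * U_CROSS_SAFE


def eu_wait() -> float:
    """Expected utility of never crossing."""
    return U_WAIT


def eu_look_then_act() -> float:
    """Look first (assumed perfectly reliable), then pick the best action
    for whatever was observed."""
    if_truck = max(U_HIT, U_WAIT)         # truck seen: waiting beats crossing
    if_clear = max(U_CROSS_SAFE, U_WAIT)  # road clear: crossing beats waiting
    return P_TRUCK * if_truck + (1 - P_TRUCK) * if_clear - COST_LOOK


if __name__ == "__main__":
    for name, fn in [("cross blind", eu_cross_blind),
                     ("wait", eu_wait),
                     ("look, then act", eu_look_then_act)]:
        print(f"{name}: {fn():.2f}")
```

Under these assumed numbers, looking first comes out ahead of both crossing blind and waiting, which is the sense in which information gathering is just another action to be scored.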
Maybe a year or two ago I realized that I try to decide way too much a priori, and ever since then I’ve worked on forcing myself to remember that information-gathering is a possible action whenever I’m making a decision.
But interestingly, even though I would consider information-gathering to be a possible action while deciding on a future action, I did not consider it to be a possible action while reflecting on past actions.
So if $A$ is my set of possible actions excluding information-gathering and $i$ is the act of information-gathering, I made the move from considering actions in $A$ to considering actions in $A \cup \{i\}$ while choosing future actions. But while reflecting on past actions, I would still only think about whether I picked the best action in just $A$.
And yes, this presupposes that I was not reflecting on the value of picking $i$. I think that’s because of some mix of (1) picking $i$ rarely (2) still having some conceptual difference between $i$ and other actions in $A$ that makes $i$ something I don’t think about reflecting about.
So anyway why is there this weird discrepancy between considering $A \cup \{i\}$ for future choices and $A$ for past decisions? Well, while looking forward the goal is to pick the action that gives me the best expected outcome, and I completely realize that I need to consider $i$ in order to do that.
But while looking backwards, the outcomes have already happened. The point of reflection is to perform metareasoning to improve my judgements in the future. But instead of actually doing that, what I was really doing was trying to minimize my feelings of regret.
The only way I could minimize my regret after the fact is to feel like my action was the rational choice, despite resulting in a possibly negative outcome. I could make my action seem more rational if it was only picked out of $A$ rather than $A \cup \{i\}$.
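The loophole can be shown with a small sketch. Here `A` stands for the set of actions excluding information-gathering and `i` for the information-gathering action (labels assumed for this example), and the expected values are hypothetical estimates as of decision time, not realized outcomes. The helper names `best_action` and `audit` are also invented for illustration.

```python
from typing import Dict, Set


def best_action(expected_value: Dict[str, float], allowed: Set[str]) -> str:
    """Return the allowed action with the highest expected value."""
    return max(allowed, key=lambda a: expected_value[a])


def audit(chosen: str, expected_value: Dict[str, float],
          A: Set[str], i: str) -> Dict[str, bool]:
    """Check a past choice against A alone and against A plus i."""
    return {
        "best_in_A": chosen == best_action(expected_value, A),
        "best_in_A_with_i": chosen == best_action(expected_value, A | {i}),
    }


# Hypothetical decision-time estimates, for illustration only.
expected_value = {"act_now": 5.0, "do_nothing": 0.0, "gather_info": 8.0}
A = {"act_now", "do_nothing"}
i = "gather_info"

result = audit("act_now", expected_value, A, i)
print(result)
```

With these made-up estimates, the past choice passes the audit restricted to `A` and fails the one that includes `i`. Reflecting only over `A` is what lets the choice look rational.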
It’s a really subtle difference, so I didn’t realize I was doing it for a long time. It’s another lesson in always being conscious of the approximate evaluation metrics you use and the loopholes present in them.
See Wireheading by Emotional Control for a similar sort of problem.