Conventions are powerful because they allow for coordination without communication.

Consider a game with two agents, $A_1$ and $A_2$. Each agent can take one of two actions, left or right ($L$ or $R$).[1] Joint actions are tuples of $A_1$'s action and $A_2$'s action; for example, if $A_1$ goes left and $A_2$ goes right, then the joint action is $(L, R)$. Also suppose that $A_1$ and $A_2$ are fully cooperative: they get the same reward. They get a reward of 0 if $(L, L)$ or $(R, R)$ occurs and a reward of 5 if $(L, R)$ or $(R, L)$ occurs.
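The payoff structure is small enough to write out directly. Here is a minimal sketch of the game's shared reward function (the names `reward` and `payoffs` are mine, just for illustration):

```python
# The two actions available to each agent.
ACTIONS = ("L", "R")

def reward(a1: str, a2: str) -> int:
    """Shared reward: 5 if the agents pick different actions, 0 if they match."""
    return 5 if a1 != a2 else 0

# All four joint actions and the reward both agents receive:
payoffs = {(a1, a2): reward(a1, a2) for a1 in ACTIONS for a2 in ACTIONS}
# {('L', 'L'): 0, ('L', 'R'): 5, ('R', 'L'): 5, ('R', 'R'): 0}
```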

Both agents know that the optimal joint policies are $(L, R)$ and $(R, L)$. The hard part is coordinating on which joint policy to follow, so they don't both accidentally pick the same action.

One way to do this is for them to communicate. It's like when you're playing a sport and you yell, "Got it!", so that your teammates know that you'll get the ball and they can focus on other things. $A_1$ can say, "I'm going left!" and then $A_2$ knows it must go right. But communication is complicated: you could run into synchronization issues, messages can be corrupted by noise, or communication might simply not be possible in certain applications.

A simple way to ensure coordination without any communication between the two agents is to pre-establish a convention. For example, if you already know what the optimal joint policies are, the absolute easiest thing to do is agree to follow one of them: "Hey agents, you're going to do $(R, L)$!"

But you generally don't even need to precommit to a specific policy; instead, you can precommit to a method for picking which optimal joint policy to execute. For example, you can simply agree that all agents will sort the optimal policies in lexicographic order and then pick the first one to follow. This really only requires two things: (1) an order on the agents and (2) an order on the actions.
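A sketch of this convention in Python: the order on agents is just tuple position, and the order on actions is string comparison, so `sorted` does all the work. (Variable names here are illustrative.)

```python
# The optimal joint policies of the game, in arbitrary order.
optimal_joint_policies = [("R", "L"), ("L", "R")]

# The convention: sort lexicographically and everyone follows the first entry.
chosen = sorted(optimal_joint_policies)[0]  # ('L', 'R')

# Each agent independently computes `chosen` and plays its own slot:
a1_action = chosen[0]  # A_1 goes left
a2_action = chosen[1]  # A_2 goes right
```

No messages are exchanged: both agents run the same deterministic procedure and are guaranteed to land on the same joint policy.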

This contrast between communication and conventions for solving coordination problems came up again today when I was making a quick fix to BookNLP. BookNLP is an NLP pipeline for literature, and there is a part where the list of characters extracted from a story is finalized. Characters in a story can be referenced by multiple names, but you still only want to add each character once: "Joe" and "Mr. Joe Garger" are actually the same person, and we don't want to double-count him.

One part of how this is ensured is by only keeping the longest name for each character (a convention!). For each name $n$, let $s(n)$ be the set containing the parts of $n$ split on whitespace. E.g. if $n = \text{"Mr. Joe Garger"}$, then $s(n) = \{\text{"Mr."}, \text{"Joe"}, \text{"Garger"}\}$. Now for every name $n_1$, if there is a name $n_2$ such that $n_2 \neq n_1$ and $s(n_1) \subseteq s(n_2)$, then $n_1$ is flagged as a name not to add.

Code below:
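(The original BookNLP snippet isn't reproduced here; the following is a rough sketch of the logic just described, with function and variable names of my own.)

```python
def flag_subsumed_names(names):
    """Return the set of names to skip: any name whose whitespace-split
    token set is a subset of a *different* name's token set."""
    token_sets = {n: set(n.split()) for n in names}
    flagged = set()
    for n1 in names:
        for n2 in names:
            if n2 != n1 and token_sets[n1] <= token_sets[n2]:
                flagged.add(n1)
                break
    return flagged

# "Joe" is subsumed by "Mr. Joe Garger", so only the longer name survives:
flag_subsumed_names(["Joe", "Mr. Joe Garger"])  # {'Joe'}
```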

But the problem with the way this was written is that if a character was mentioned both as “Sakura Kinomoto” and “Kinomoto Sakura”, then she wouldn’t get added at all!

This is because $s(\text{"Sakura Kinomoto"}) = s(\text{"Kinomoto Sakura"})$, so each set is a subset of the other and both names get flagged. Alright, so the way to fix this is to check for when the sets are equal and only flag one of the names. One way to do this would be to flag $n_1$ if $s(n_1) = s(n_2)$ and $n_2$ has already been added to the final list of names. Another way is to use a convention: just add the name that comes first in lexicographic order.
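The convention-based fix can be sketched like this (again with hypothetical names, not BookNLP's actual code): a strict subset is flagged as before, but when two token sets are equal, every name except the lexicographically first one gets flagged, so exactly one copy survives.

```python
def flag_subsumed_names_fixed(names):
    """Like the subset check above, but when two names have identical token
    sets, keep only the lexicographically first name (a convention)."""
    token_sets = {n: set(n.split()) for n in names}
    flagged = set()
    for n1 in names:
        for n2 in names:
            if n2 == n1:
                continue
            s1, s2 = token_sets[n1], token_sets[n2]
            if s1 < s2:               # strict subset: always flag n1
                flagged.add(n1)
                break
            if s1 == s2 and n2 < n1:  # equal sets: flag everything except
                flagged.add(n1)       # the lexicographically first name
                break
    return flagged

# Exactly one of the two orderings survives now:
flag_subsumed_names_fixed(["Sakura Kinomoto", "Kinomoto Sakura"])
# {'Sakura Kinomoto'} -- "Kinomoto Sakura" sorts first and is kept
```

Both agents of the earlier game would approve: no extra state, no communication with the already-built list, just a deterministic tie-breaking rule.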

Anyways, this is all nice and quaint, but the real question is where do conventions come from? There are a lot of people studying this and I think it’s super interesting.

[1] This example is taken from Boutilier, Craig. "Planning, Learning and Coordination in Multiagent Decision Processes." Theoretical Aspects of Rationality and Knowledge, 1996.