Friday, 8 November 2013

A win for the powers of generosity - Part 1

Lokee kindly drew my attention to an interesting article recently – "Generosity leads to evolutionary success" – which is largely based on a paper by Alexander Stewart and Joshua Plotkin, "From extortion to generosity, the evolution of zero-determinant strategies in the prisoner’s dilemma".  The Stewart&Plotkin paper was, again largely, a response to what strikes me as a somewhat more technical paper by William Press and Freeman Dyson, “Iterated Prisoner’s Dilemma contains strategies that dominate any evolutionary opponent” (this latter paper certainly seems less accessible to a lay reader such as myself, irrespective of how technical it is).

I spent a bit of time mulling over these papers (and a couple more that are referenced by one or both of them) and thought I might share the outcome of those ponderings.

-----------------------------

My first reaction was to think that, based on the popular article at the Archaeology News Network (ANN), there might be scientific vindication of some of the ideas raised in the series of articles Morality as Playing Games.  In that series, I primarily suggested that self-interest might lie behind our morality (and more specifically the avoidance of loss).  I also suggested that successful ethical systems would involve overt generosity and kindness during good times combined with an ability to act less generously either during bad times, or when no-one is watching.  I further suggested that cooperation might arise when you compete against a third party (for example, in the prisoner’s dilemma, the prisoners can compete against each other, which is the standard assumption, or they can cooperate in order to compete against the prosecutor).

The key evidence that seems to vindicate the idea that bad times promote less generous behaviour is in Figure 3 of Stewart&Plotkin.  They found that in an evolving population “generous” strategies were more successful than “extortionate” strategies, basically because “extortionate” strategies don’t do well against themselves so they are in effect self-limiting – but when the populations were small (for example when it comes down to a population consisting of just you and me, or my family and your family), “extortionate” strategies prevail.

But what, you might ask, is an “extortionate” strategy?  

Let’s return to our prisoners, Larry and Wally.  I’ll have to modify the scenario slightly to introduce the idea of T, R, P and S which are “maximum payoff”, “mutual cooperation payoff”, “mutual defection payoff” and “minimum payoff” and are conventionally set to values of T=5, R=3, P=1 and S=0 (note however that Stewart&Plotkin use a “donation game” variant in which T=B, R=B-C, P=0 and S=-C, where B>C so that R>P).

-----------------------------------

In Ethical Prisoners, I explained that Larry and Wally are faced with a dilemma in which they can choose to either cooperate with each other (by remaining silent with respect to a crime they are accused of committing) or defect (by confessing).

I used a table to explain how these options play out, which I’ve updated below:

 
| | Larry defects (confesses) |
|---|---|
| Wally defects (confesses) | payoffLARRY=1 (P), payoffWALLY=1 (P) |
| Wally cooperates (remains silent) | payoffLARRY=5 (T), payoffWALLY=0 (S) |

or

| | Larry cooperates (remains silent) |
|---|---|
| Wally defects (confesses) | payoffLARRY=0 (S), payoffWALLY=5 (T) |
| Wally cooperates (remains silent) | payoffLARRY=3 (R), payoffWALLY=3 (R) |

The results are more traditionally represented in terms of cooperation (c) and defection (d) like this (where Larry is the focal player):

 
| | cWALLY | dWALLY |
|---|---|---|
| cLARRY | R=3 | S=0 |
| dLARRY | T=5 | P=1 |

Larry and Wally can choose different strategies depending on not only what sort of outcome they want but also what sort of overall scenario they find themselves in.  As originally framed, the prisoners’ dilemma (PD) is a one-shot affair, meaning that Larry and Wally face off against each other once and make a single decision with no historical context or potential for future consequences.  We can, however, consider an iterated prisoners’ dilemma (IPD) in which Larry and Wally face off many times, making equivalent decisions repeatedly, and, presumably, can learn about each other’s behaviour and react accordingly.

If we ignore the prosecutor and any pre-existing moral imperative to not abandon your partner in crime (as discussed in Ethical Prisoners), in the one-shot PD Larry and Wally are likely to choose a simple defection strategy since defection not only opens up the possibility of a maximum payoff, but also avoids the possibility of the minimum payoff.
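As a quick sanity check on that reasoning, here is a minimal sketch in Python using the conventional payoffs.  The dictionary layout and the function name best_reply are my own, purely for illustration.  It simply asks which move pays Larry more against each possible move by Wally.

```python
# Conventional payoffs from the text: T=5, R=3, P=1, S=0.
T, R, P, S = 5, 3, 1, 0

# Larry's payoff, indexed by (Larry's move, Wally's move).
# "c" = cooperate (remain silent), "d" = defect (confess).
LARRY_PAYOFF = {
    ("c", "c"): R,
    ("c", "d"): S,
    ("d", "c"): T,
    ("d", "d"): P,
}


def best_reply(wally_move):
    """Return Larry's payoff-maximising move against a fixed move by Wally."""
    return max("cd", key=lambda larry_move: LARRY_PAYOFF[(larry_move, wally_move)])


for wally_move in "cd":
    print(f"Wally plays {wally_move}: Larry's best reply is {best_reply(wally_move)}")
```

Whatever Wally does, Larry's best reply comes out as defection, which is all that "simple defection strategy" means in the one-shot game.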

Things get more interesting with the IPD.  Strategies can now involve a consideration of previous moves and there has been a wealth of evidence to show that the most successful strategy in terms of overall payoff is what is known as Tit-For-Tat (TFT).

(In terms of head-to-head battles, the most successful strategy is (and remains) the simple defection strategy Always Defect (ALLD), since it either wins or draws.)

We can represent the TFT strategy as a probability table like this:

| previous round | cWALLY | dWALLY |
|---|---|---|
| cLARRY | p1=1 | p2=0 |
| dLARRY | p3=1 | p4=0 |

where,

·         p1 is the probability of Larry cooperating if both cooperated last round,

·         p2 is the probability of Larry cooperating if Wally defected while Larry cooperated last round,

·         p3 is the probability of Larry cooperating if Wally cooperated while Larry defected last round, and

·         p4 is the probability of Larry cooperating if both defected last round. 

In short, if Wally defected in the previous round, then Larry will defect this round but if Wally cooperated, then Larry will cooperate.

Different strategies can be tested against each other using PD-bots in tournaments.  With simple strategies such as TFT and ALLD, the whole outcome of a match-up is determined by the first round.  Assuming that a TFT-bot starts off with cooperation: two TFT-bots will cooperate forever, obtaining an average score of 3 each; two ALLD-bots will defect forever, obtaining an average score of 1 each; and a TFT-ALLD pairing will result in an initial win for ALLD followed by mutual defection forever, giving average scores that approach 1 (down from 5 for ALLD and up from 0 for TFT).

(If we don’t assume that TFT-bots start off with cooperation, but instead with defection, then all results default to an average score of 1.)
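These match-ups are easy to replay.  The following is a rough sketch of a pair of PD-bots in Python; the function names and the choice of 1000 iterations are my own assumptions, not anything taken from the papers.

```python
T, R, P, S = 5, 3, 1, 0

# Payoffs as (first bot, second bot), indexed by (first bot's move, second bot's move).
PAYOFF = {
    ("c", "c"): (R, R),
    ("c", "d"): (S, T),
    ("d", "c"): (T, S),
    ("d", "d"): (P, P),
}


def tft(opponent_history, first_move="c"):
    """Tit-For-Tat: open with first_move, then copy the opponent's last move."""
    return opponent_history[-1] if opponent_history else first_move


def defecting_tft(opponent_history):
    """Tit-For-Tat that opens with defection instead of cooperation."""
    return tft(opponent_history, first_move="d")


def alld(opponent_history):
    """Always Defect, whatever the opponent has done."""
    return "d"


def average_scores(bot_a, bot_b, iterations=1000):
    """Play bot_a against bot_b and return their average payoffs per iteration."""
    history_a, history_b = [], []
    total_a = total_b = 0
    for _ in range(iterations):
        move_a = bot_a(history_b)  # each bot only sees the other's past moves
        move_b = bot_b(history_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        total_a += pay_a
        total_b += pay_b
        history_a.append(move_a)
        history_b.append(move_b)
    return total_a / iterations, total_b / iterations


print("TFT  v TFT :", average_scores(tft, tft))
print("ALLD v ALLD:", average_scores(alld, alld))
print("TFT  v ALLD:", average_scores(tft, alld))
print("defecting TFT v defecting TFT:", average_scores(defecting_tft, defecting_tft))
```

Over 1000 iterations the TFT-ALLD pairing averages 0.999 for TFT and 1.004 for ALLD, which is the "approaching 1 from either side" behaviour described above.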

When assessing the average score of strategies over many rounds against a range of opponents and many iterations in each round, TFT has been shown to be a clear winner (with a cooperative start) despite losing one iteration per round in any match-up against an ALLD.

What Press&Dyson discovered is that there are more complex strategies, a subset of “zero-determinant” (ZD) strategies, in which an “extortionate” player can drive a self-interested, evolutionary opponent to always cooperate.  A “concrete” example of an extortionate strategy on the part of Larry (per Press&Dyson) is:

| previous round | cWALLY | dWALLY |
|---|---|---|
| cLARRY | p1=11/13 | p2=1/2 |
| dLARRY | p3=7/26 | p4=0 |

Even before going into more detail, it might be pretty easy to see that Wally’s best option is to always cooperate in order to maximise the frequency with which Larry cooperates.  The likelihood of Larry cooperating is always higher if Wally has just cooperated, and if also Larry has just cooperated.  The flip side of this is that the likelihood of Larry defecting is higher if either of them has just defected.  These facts combine to drive Wally towards unilateral cooperation in order to maximise his score.

What Press&Dyson showed is that if Wally does unilaterally cooperate to maximise his score against the extortionate Larry, he does so at the cost of maximising Larry’s score above his own.  What I interpret out of this is that since Wally can’t win against Larry in the long term, his obvious choices are to:

·         maximise his own score, thereby ensuring that Larry wins, or

·         minimise both scores by locking into mutual defection (and thereby secure a draw).
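To see those two choices in numbers, here is a hedged sketch that works out long-run average payoffs when Larry plays the extortionate table above against a few simple Wallys.  The function long_run_scores, the three opponents and the iteration count are my own choices.  The factor of 3 in the printed check is my reading of the Press&Dyson example (Larry's surplus over P being three times Wally's), not something stated above.

```python
T, R, P, S = 5, 3, 1, 0

# Outcomes ordered cc, cd, dc, dd with Larry's move first.
LARRY_PAYOFFS = (R, S, T, P)
WALLY_PAYOFFS = (R, T, S, P)


def long_run_scores(p, q, iterations=5000):
    """Average payoffs when Larry plays memory-one strategy p and Wally plays q.

    p and q are (p1, p2, p3, p4), each written from its owner's point of view.
    The outcome distribution is stepped forward from mutual cooperation; this
    sketch assumes it settles down, which it does for the opponents used here.
    """
    # Wally's "I cooperated, he defected" is Larry's dc, so swap the middle two.
    q_seen_by_larry = (q[0], q[2], q[1], q[3])
    dist = [1.0, 0.0, 0.0, 0.0]
    for _ in range(iterations):
        nxt = [0.0, 0.0, 0.0, 0.0]
        for weight, pl, qw in zip(dist, p, q_seen_by_larry):
            nxt[0] += weight * pl * qw              # both cooperate
            nxt[1] += weight * pl * (1 - qw)        # Larry c, Wally d
            nxt[2] += weight * (1 - pl) * qw        # Larry d, Wally c
            nxt[3] += weight * (1 - pl) * (1 - qw)  # both defect
        dist = nxt
    s_larry = sum(w * x for w, x in zip(dist, LARRY_PAYOFFS))
    s_wally = sum(w * x for w, x in zip(dist, WALLY_PAYOFFS))
    return s_larry, s_wally


EXTORT = (11 / 13, 1 / 2, 7 / 26, 0)
OPPONENTS = {
    "always cooperate": (1, 1, 1, 1),
    "coin flip": (0.5, 0.5, 0.5, 0.5),
    "always defect": (0, 0, 0, 0),
}

for name, q in OPPONENTS.items():
    s_larry, s_wally = long_run_scores(EXTORT, q)
    print(f"{name:16s} sLARRY={s_larry:.3f}  sWALLY={s_wally:.3f}  "
          f"sLARRY-P={s_larry - P:.3f}  3*(sWALLY-P)={3 * (s_wally - P):.3f}")
```

Against an unconditional cooperator, Larry averages 41/11 (about 3.73) to Wally's 21/11 (about 1.91); against an unconditional defector both end up on 1.  Those are the two options listed above.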

----------------------

Stewart&Plotkin were interested in ZD strategies in general but somewhat more intrigued by a different subset of them, not the “extortionate” strategies but rather what they called “generous” ZD strategies.

The generosity in question can be interpreted in two ways.  Firstly, and I think possibly most importantly, a generous ZD strategy is more forgiving and this can be brought about by having a non-zero value of p4 (which avoids being locked eternally into mutual defection).  Secondly, a generous ZD strategy tends to maximise average payoffs for both players, which can be done by using strategies which have a low value of χ and a value of κ that approaches R, where R is the value of mutual cooperation and, in combination, κ and χ constitute an indication of how “extortionate” the strategy is.  The latter term, χ, is referred to by Press&Dyson as an “extortion factor” where χ=1 is “fairness” and higher values are increasingly extortionate.  (This term might otherwise be referred to as “leverage”.)  One can obtain an indication of how wide the gulf is between the average payoffs for each player (sLARRY and sWALLY) using an equation that applies for ZD strategies:

sLARRY - κ = χ . (sWALLY - κ)

If Larry’s strategy produces χ=1, then the payoffs are equal for both players, irrespective of the value of κ for that strategy.

--------------------------

Note that determining precisely what the parameters χ and κ relate to in reality is a little complex.  So far I’ve seen no simple method that one can use to select values of χ and κ and from them generate a strategy.  It is possible however to fiddle with the values of p1, p2, p3 and p4 in a coordinated way to raise or lower the value of the parameters.
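One candidate recipe, offered tentatively: the linear relationship between the two scores can be turned around so that χ and κ, together with a scale factor, pin down p1 to p4.  The sketch below is my own reconstruction from that relationship rather than a formula quoted from either paper.  The function name zd_strategy and the scale factor phi are assumptions introduced here, and phi still has to be found by trial so that all four values land between 0 and 1.

```python
from fractions import Fraction

T, R, P, S = 5, 3, 1, 0

# Outcomes ordered cc, cd, dc, dd with Larry's move first.
LARRY_PAYOFFS = (R, S, T, P)
WALLY_PAYOFFS = (R, T, S, P)


def zd_strategy(chi, kappa, phi):
    """Return (p1, p2, p3, p4) built from chi, kappa and a scale factor phi.

    Raises ValueError when any entry falls outside [0, 1], i.e. when this
    phi (or this chi/kappa pair) does not give valid probabilities.
    """
    chi, kappa, phi = Fraction(chi), Fraction(kappa), Fraction(phi)
    # For each outcome: phi * ((Larry's payoff - kappa) - chi * (Wally's payoff - kappa)).
    offsets = [
        phi * ((s_larry - kappa) - chi * (s_wally - kappa))
        for s_larry, s_wally in zip(LARRY_PAYOFFS, WALLY_PAYOFFS)
    ]
    # The offsets are taken to equal (p1 - 1, p2 - 1, p3, p4).
    p = (1 + offsets[0], 1 + offsets[1], offsets[2], offsets[3])
    if not all(0 <= x <= 1 for x in p):
        raise ValueError(f"not a valid set of probabilities: {p}")
    return p


# chi=3, kappa=P with phi=1/26 gives back the Press&Dyson example.
print(zd_strategy(chi=3, kappa=P, phi=Fraction(1, 26)))

# A generous variant: chi=3/2, kappa=R, with phi=1/10 picked by hand.
print(zd_strategy(chi=Fraction(3, 2), kappa=R, phi=Fraction(1, 10)))
```

With χ=3, κ=P and phi=1/26 this returns 11/13, 1/2, 7/26 and 0.  The generous variant comes out as 1, 2/5, 13/20 and 1/10, with the non-zero p4 that stops it being locked into mutual defection.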

Note also that with conventional values of T, R, P and S, there is a limitation to the value of sLARRY and sWALLY, such that 2P <= sLARRY + sWALLY <= 2R, due to the typical assumption that 2R > T + S.

------------------------------

Extortionate strategies are defined as those for which κ=P (the payoff for mutual defection) and χ>1.  In such strategies, the equation sLARRY - κ = χ . (sWALLY - κ) shows quantitatively that the other player can only increase their payoff by simultaneously increasing the extortionist’s payoff, for example for a strategy with χ=4 and κ=P=1:

| sLARRY | sWALLY |
|---|---|
| 1.000 | 1.00 |
| 1.250 | 1.06 |
| 1.500 | 1.13 |
| 1.750 | 1.19 |
| 2.000 | 1.25 |
| 2.250 | 1.31 |
| 2.500 | 1.38 |
| 2.750 | 1.44 |
| 3.000 | 1.50 |
| 3.250 | 1.56 |
| 3.500 | 1.63 |
| 3.750 | 1.69 |
| 4.000 | 1.75 |
| 4.200 | 1.80 |

where sLARRY + sWALLY = 6.0 is a limiting value.
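A table like this can be regenerated from the relationship sLARRY - κ = χ . (sWALLY - κ) together with the limits 2P <= sLARRY + sWALLY <= 2R.  The sketch below is one way to do it; the function names and the step of 0.25 are my own, and it applies only those two sum limits, without clipping individual scores at S or T.

```python
T, R, P, S = 5, 3, 1, 0


def wally_score(s_larry, chi, kappa):
    """Solve sLARRY - kappa = chi * (sWALLY - kappa) for sWALLY."""
    return kappa + (s_larry - kappa) / chi


def larry_limits(chi, kappa):
    """Larry's scores where the line meets sLARRY + sWALLY = 2P and = 2R."""
    def meet(total):
        # Substitute sWALLY = total - sLARRY into the relationship and solve.
        return (total * chi - kappa * (chi - 1)) / (chi + 1)
    return meet(2 * P), meet(2 * R)


def score_table(chi, kappa, step=0.25):
    """Rows of (sLARRY, sWALLY) from the lower limit up to the upper limit."""
    low, high = larry_limits(chi, kappa)
    rows = []
    s_larry = low
    while s_larry < high - 1e-9:
        rows.append((s_larry, wally_score(s_larry, chi, kappa)))
        s_larry += step
    rows.append((high, wally_score(high, chi, kappa)))  # finish exactly on the limit
    return rows


for s_larry, s_wally in score_table(chi=4, kappa=P):
    print(f"{s_larry:.3f}  {s_wally:.4f}")
```

For χ=4 and κ=P=1 the rows run from sLARRY=1.0 up to the limiting value of 4.2, where Wally is on 1.8 and the two scores add up to 6.0.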

Strategies that are more generous have a higher value of κ, for example for a strategy with χ=4 and κ=R=3:

| sLARRY | sWALLY |
|---|---|
| 0.000 | 2.25 |
| 0.250 | 2.31 |
| 0.500 | 2.38 |
| 0.750 | 2.44 |
| 1.000 | 2.50 |
| 1.250 | 2.56 |
| 1.500 | 2.63 |
| 1.750 | 2.69 |
| 2.000 | 2.75 |
| 2.250 | 2.81 |
| 2.500 | 2.88 |
| 2.750 | 2.94 |
| 3.000 | 3.00 |

This strategy is still manipulative in a way, since Larry is still encouraging Wally to increase Larry’s score while Wally tries to increase his own, but it’s more generous since until parity is reached (where sLARRY = sWALLY), Wally’s score will be higher than Larry’s.

What the strategy does do, however, is open Larry up to sabotage by Wally.  Wally could, at the cost of a fraction of a point, drive Larry’s score to 1.0.  Wally might not see any benefit in increasing his score above 2.5, so even without malice on his part, a lack of incentive to cooperate further might damage Larry catastrophically.

Larry can mitigate this vulnerability to sabotage by choosing a strategy with a lower value of χ, that is, by lowering the “extortion factor” which, in a generous strategy, works against him (hence the suggestion that the term “leverage” be used instead).  For example, if Larry’s strategy produces χ=1.5 and κ=R=3, he will get the following results:

| sLARRY | sWALLY |
|---|---|
| 0.600 | 1.40 |
| 0.750 | 1.50 |
| 1.000 | 1.67 |
| 1.250 | 1.83 |
| 1.500 | 2.00 |
| 1.750 | 2.17 |
| 2.000 | 2.33 |
| 2.250 | 2.50 |
| 2.500 | 2.67 |
| 2.750 | 2.83 |
| 3.000 | 3.00 |

where sLARRY + sWALLY = 2.0 is a limiting value.

This strategy narrows the gap in scores between yourself and the other (thus limiting the effectiveness of any attempts to sabotage you), increases the self-harm done by a saboteur and makes it impossible for an opponent to drive your score to zero – while still being generous.

If χ is reduced even further, to a value below 1, a potential saboteur would do more damage to themselves than they would to their opponent, for example with χ=0.75 and κ=R=3:

| sLARRY | sWALLY |
|---|---|
| ~1.285 | ~0.715 |
| 1.500 | 1.00 |
| 1.750 | 1.33 |
| 2.000 | 1.67 |
| 2.250 | 2.00 |
| 2.500 | 2.33 |
| 2.750 | 2.67 |
| 3.000 | 3.00 |

where sLARRY + sWALLY = 2.0 is a limiting value.
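The trade-off running through these κ=R=3 tables can be put in one line: every point Wally gives up costs Larry χ points.  Here is a small sketch of that arithmetic; the function name and the half-point sacrifice are simply illustrative choices of mine.

```python
def sabotage_cost(chi, kappa, wally_sacrifice):
    """Scores on the ZD line after Wally drops wally_sacrifice below kappa.

    Uses sLARRY - kappa = chi * (sWALLY - kappa); returns (sLARRY, sWALLY).
    """
    s_wally = kappa - wally_sacrifice
    s_larry = kappa + chi * (s_wally - kappa)
    return s_larry, s_wally


KAPPA = 3  # kappa = R, as in the generous tables above

for chi in (4, 1.5, 0.75):
    s_larry, s_wally = sabotage_cost(chi, KAPPA, wally_sacrifice=0.5)
    print(f"chi={chi}: Wally gives up {KAPPA - s_wally:.2f}, "
          f"Larry loses {KAPPA - s_larry:.3f}")
```

With χ=4, a Wally who settles for 2.5 leaves Larry on 1.0 (a loss of 2 points).  With χ=1.5 the same half point costs Larry 0.75, and with χ=0.75 it costs him only 0.375, less than Wally's own loss.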

Another potential way to limit sabotage is to set κ to a lower value – within the range P < κ < R.  Doing so again increases the harm a saboteur does to themselves in their efforts to harm you.  For example for a strategy with χ=1.5 and κ=(P+R)/2=2:

| sLARRY | sWALLY |
|---|---|
| 0.800 | 1.20 |
| 1.000 | 1.33 |
| 1.250 | 1.50 |
| 1.500 | 1.67 |
| 1.750 | 1.83 |
| 2.000 | 2.00 |
| 2.250 | 2.17 |
| 2.500 | 2.33 |
| 2.750 | 2.50 |
| 3.000 | 2.67 |
| 3.200 | 2.80 |

where sLARRY + sWALLY = 6.0 and sLARRY + sWALLY = 2.0 are limiting values.

Note however that the payoffs for sLARRY and sWALLY always reach parity at κ, no matter what value κ is set to, so reducing κ would be a self-defeating strategy on the part of a generous Larry, since a malign Wally can still force him into an inferior position by reducing his payoff to below whatever level Larry sets for κ.  Therefore, employing a strategy with a lower value of κ would just decrease Larry’s score.  On the other hand, against a more reasonable, evolutionary player who aims only to increase their own score, without worrying about Larry’s, this strategy is slightly superior, giving Larry a marginally higher payoff than a strategy with a higher value for κ.

A better result, however, can be obtained using a lower value of χ and a higher value of κ, for example with χ=0.11 and κ=2.875:

| sLARRY | sWALLY |
|---|---|
| 2.625 | 0.60 |
| 2.750 | 1.74 |
| 2.875 | 2.875 |
| 2.900 | 3.10 |

where sLARRY + sWALLY = 6.0 is a limiting value.

Here, Wally is strongly encouraged to score well, even slightly higher than Larry, and is punished severely if he acts malignly, while being unable to damage Larry in any significant way, particularly since Larry has signalled that he is uninterested in a maximal score.  I’m not totally convinced that this strategy is better than one in which κ=R and χ is set low, but there may be situations in which it has its benefits (specifically thinking of situations in which success brings with it some other risk, so that being a dog that’s doing nicely without actually being top dog is preferable).

As in Ethical Prisoners, the strategies that Larry and Wally select will depend at least partially on who they believe they are playing against – in terms of this scenario, their strategies will depend on whether they want to maximise their own scores or to beat the other.

Perhaps the best strategy is somewhat more complex, one in which Larry identifies what sort of opponent he has and what the playing environment is like, and then chooses an appropriate level of generosity or extortion to match.

----------------

There’s something else to unpack from this.  If this mathematical modelling can be applied to real life situations, and there are indications that it can, then something that we can take away is that if the “players” consider that either times are poor or the population is small, then they will tend to act less generously, since generosity tends to prevail only in good times with larger populations.

With creatures as intellectually complex as humans, we need to be careful about how we consider population size.  Some people might consider that there is only a population size of two – people like me (i.e. primarily me) and everyone else.  Very few will, in practical terms, consider the population to be a little over 7 billion.

These findings could be considered support for the notion that we should encourage people to regard themselves as part of a larger inclusive society because by doing so we would encourage more cooperative and generous behaviour.

Similarly, we could consider that a sense that times are poor can dissuade people from engaging in generous behaviour (examples abound in dystopian-themed tales in which people turn on each other during tougher times) – so constant proclamations of doom from the media and politicians (especially those in opposition) can become self-fulfilling.  If we take the opposite approach by highlighting the positive, we might be able to generate a virtuous circle in which a feeling that things are getting better motivates people to act more generously, resulting in evidence that things are in fact getting better, thus locking in further generosity.

----------------

In the next part, I’ll look at what evolution means in Press&Dyson and Stewart&Plotkin and share the results of my own very simple modelling, modelling which indicates that generosity may in fact be the best strategy for evolving populations.
