Monday, April 30, 2018

Where Game Theory Optimal (GTO) Fails

I realize this will be a controversial post among poker players, but it's a discussion I haven't seen around the topic, and I believe it would benefit everyone.

Disclaimer: I'm not a game theory expert, nor do I possess expertise in the vast majority of its related subject matter.

First, what is game theory?

Taken from Wikipedia:

"The study of mathematical models of conflict and cooperation between intelligent and rational decision makers."

And what is game theory optimal, or GTO as it's commonly called in poker? Several definitions are listed below to illustrate some of the confusion surrounding this phrase.

This isn't so clear. The origins of the phrase itself aren't clear either, although it seems to have originated in poker. Google Trends shows interest in the phrase peaked in December 2004, not long after Chris Moneymaker won the 2003 World Series of Poker Main Event and the poker boom that followed.

Here's one of the first definitions returned by a "What is GTO poker?" Google search, taken from a PokerNews article:

"GTO stands for "game theory optimal".  In poker, this term gets thrown around to signal a few different concepts.  It refers to thoughts about opponent modeling, and thinking about poker situations in terms of ranges and probabilities, as opposed to being strictly results oriented."

Still not clear.

Here's another definition taken from a different article:

"It refers to a decision in some particular situation for which an opponent cannot make a profitable counter."

A little clearer...

The best definition I could find, and also the earliest I came across, comes from a 2003 University of Alberta research paper titled "Approximating Game-Theoretic Optimal Strategies for Full-scale Poker."

http://poker.cs.ualberta.ca/publications/IJCAI03.pdf

It reads, "Of particular interest is the existence of optimal solutions, or Nash equilibria.  An optimal solution provides a randomized strategy, basically a recipe of how to play in each possible situation.  Using this strategy ensures an agent will obtain at the least the game-theoretic value of the game regardless of the opponent's strategy."

Much clearer, and it seems like a good way to solve poker games, until the next sentence:

"Unfortunately finding exact optimal solutions is limited to relatively small problem sizes, and is not practical for most domains."

Although this was written in 2003, and significant progress has since been made in overcoming the lack of computing power, it remains the primary obstacle to solving poker games today.

And still today, Nash equilibrium is the logic powering GTO, as well as two of the most popular GTO solver programs, PioSolver and GTORangeBuilder.

In the developers' words, PioSolver "calculates optimal strategies, exact value and plays for every situation." GTORB goes as far as to say, "It's the holy grail of poker."

But do they?  And is it?

A University of Alberta research paper from January 2017, titled "Equilibrium Approximation Quality of Current No-Limit Poker Bots," may cast some doubt on the above claims:

http://webdocs.cs.ualberta.ca/~games/poker/publications/aaai17ws-lisy-lbr.pdf

Here's the last sentence of that paper's abstract. It refers to a method the University of Alberta researchers developed to evaluate the quality of current bots:

"Using this method, we show that existing poker-playing programs, based on solving abstract games, are remarkably poor Nash equilibrium approximations."

The paper looked at bots that competed in the 2016 Annual Computer Poker Competition (ACPC).  In their words, "These bots are developed by top research teams, use principled AI approaches, and the techniques they use are to large extent well documented."

One of the techniques all of these bots use is something called abstraction. To put it in the simplest of terms, there isn't enough computing power to handle every possible permutation of flop, turn, and river cards, betting sequences, stack sizes, bet sizes, etc. (more on this later), so similar situations are lumped, or bucketed, together.
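
To make the idea of lumping things together a little more concrete, here's a minimal Python sketch of its mildest form: treating flops that differ only in suit labels as the same flop (Ac 8h 4s and Ad 8c 4h, for example, when no other cards are involved). It counts how many of the 22,100 possible flops remain once suits are relabeled. This is an illustration I've chosen, not how PioSolver, GTORB, or any ACPC bot is actually built. This particular merge loses no information; the abstractions discussed here go much further and bucket together situations that are merely similar, which is where the accuracy loss comes from.

```python
from itertools import combinations, permutations

# A card is (rank, suit): rank 0..12 (deuce..ace), suit 0..3.
DECK = [(rank, suit) for rank in range(13) for suit in range(4)]


def canonical_flop(flop):
    """Return one representative shared by every suit-relabeling of this flop.

    Try all 24 ways of renaming the four suits and keep the
    lexicographically smallest sorted tuple of cards.
    """
    return min(
        tuple(sorted((rank, perm[suit]) for rank, suit in flop))
        for perm in permutations(range(4))
    )


def count_flops():
    all_flops = list(combinations(DECK, 3))
    distinct = {canonical_flop(flop) for flop in all_flops}
    return len(all_flops), len(distinct)


total, distinct = count_flops()
print(total, distinct)  # 22100 1755
```

So even before any lossy bucketing, 22,100 flops collapse to 1,755 strategically distinct ones; everything beyond that is an approximation.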

However, this comes at the cost of accuracy, and that loss is essentially what the paper attempts to measure. The paper's conclusion is, well, a bit shocking:

"Using this method we show that existing poker bots, including the second and the third best performing bots in the ACPC in 2016, all have exploitability substantially larger than folding all hands.  The bots that use card abstraction are losing over 3 big blinds per hand on average against their worst case opponent.  Exploitability can be reduced by not using card abstraction, but that necessarily leads to using a very sparse betting abstraction, which can be heavily exploited as well.  Therefore, we assume that a substantial paradigm shift is necessary to create bots that would closely approximate equilibrium in full no-limit Texas hold'em."

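To put "worse than folding all hands" in numbers: exploitability is, roughly, how much a best-responding opponent wins per hand. A strategy that folds every hand heads-up gives away the small blind whenever it's in the small blind and, against an opponent who always raises, the big blind whenever it's in the big blind. The Python sketch below assumes blinds of 0.5 and 1 big blind and alternating positions (my assumptions for illustration), which works out to 0.75 big blinds per hand, so "over 3 big blinds per hand" is more than four times worse than folding everything.

```python
from fractions import Fraction


def always_fold_exploitability(small_blind=Fraction(1, 2), big_blind=Fraction(1)):
    """Big blinds per hand a best-responding opponent wins from a strategy
    that folds every hand, assuming heads-up play with alternating positions."""
    # In the small blind: we act first preflop, fold, and forfeit the small blind.
    loss_in_small_blind = small_blind
    # In the big blind: the opponent raises, we fold, and forfeit the big blind.
    loss_in_big_blind = big_blind
    # Each position comes up half the time.
    return (loss_in_small_blind + loss_in_big_blind) / 2


fold_all = always_fold_exploitability()
print(fold_all)                # 3/4
print(Fraction(3) / fold_all)  # 4
```
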
I should stop here and be clear that commercially available GTO programs like PioSolver and GTORB were not included in this study. However, to the best of my understanding, these programs use abstraction to varying degrees, and, as mentioned above, abstraction leads to reduced accuracy.

If I used these programs, the extent of this accuracy loss would concern me. It's one of many concerns.

Let's take a look at what Nash equilibrium is, since it's the engine powering the intelligence behind these programs.

"It's a solution concept of a non-cooperative game involving two or more players in which each player is assumed to know equilibrium strategies of the other players, and no player has anything to gain by changing only their own strategy."

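The "no player has anything to gain by changing only their own strategy" part of that definition can be checked mechanically in a small two-player zero-sum game. Here's a minimal Python sketch using rock-paper-scissors, a stand-in I've picked because it's tiny (poker is vastly larger): a pair of mixed strategies is an equilibrium when neither player's best pure deviation beats their current expected payoff.

```python
from fractions import Fraction

# Row player's payoff; the column player receives the negative (zero-sum).
# Action order for both players: rock, paper, scissors.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def expected_payoff(A, x, y):
    """Row player's expected payoff when row mixes with x and column with y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def is_nash(A, x, y):
    """True if neither player can gain by unilaterally switching strategies.

    Checking pure deviations is enough: any mixed deviation's payoff is an
    average of pure ones, so it can't beat the best pure deviation.
    """
    value = expected_payoff(A, x, y)
    # Best payoff the row player could get by switching to a pure action.
    row_best = max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))
    # Lowest row payoff the column player could force (best for column).
    col_best = min(sum(x[i] * A[i][j] for i in range(len(x))) for j in range(len(y)))
    return row_best <= value and col_best >= value


third = Fraction(1, 3)
uniform = [third, third, third]
always_rock = [Fraction(1), Fraction(0), Fraction(0)]

print(is_nash(RPS, uniform, uniform))      # True
print(is_nash(RPS, always_rock, uniform))  # False: column gains by switching to paper
```
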
Now that we have some understanding, albeit a limited one, let's dive into some of my major issues with GTO itself. We don't have to look any further than the definition of Nash equilibrium:

"It's a solution concept of a non-cooperative game involving two or more players in which each player is assumed to know equilibrium strategies of the other players."

The previously linked 2003 University of Alberta research paper also alludes to this assumption when referring to a GTO player:

"An implicit assumption is that the opponent is also playing optimally, and nothing can be gained by observing that opponent for patterns or weaknesses."

When we, human beings that is, play poker, we absolutely cannot assume that the other players know and/or are implementing equilibrium strategies. We can in fact assume the opposite: that no one is playing GTO. So from the start, we're relying on a strategy that's predicated on a false assumption.

The Nash equilibrium page quoted above continues with a section called "Occurrence":

According to this section, if the following conditions are met, then we should adopt the NE strategy.

Sufficient conditions to guarantee that the Nash equilibrium is played are:

     1.  The players all will do their utmost to maximize their expected payoff as described by the game.
     2.  The players are flawless in execution.
     3.  The players have sufficient intelligence to deduce the solution.
     4.  The players know the planned equilibrium of all other players.
     5.  The players believe that a deviation in their own strategy will not cause deviations by any other players.
     6.  There is common knowledge that all players meet these conditions, including this one. So not only must each player know the other players meet the conditions, but they must know that they all know that they meet them, and know that they know that they meet them, and so on.

Do any of these, much less all of them, ring true in the games you're playing in? These are the conditions, or constraints, set forth as to when Nash equilibrium should be adopted.

At this point, if you're a GTO proponent, you may be saying to yourself, "But we still have a strategy that can't be exploited by another opponent." I won't argue the validity of this statement, except to say I'd be willing to bet it's false for real programs due to the aforementioned abstraction issues alone.

Nevertheless, let's assume we do have a strategy that can't be exploited by an opponent. As poker players, our goal is to make the most money possible. The goal is NOT "to not lose money."

We should set out to maximize the expected value of every decision within the context of all our decisions, not merely to ensure that we always have some positive expected value. If we only care about positive expected value, as GTO does, then it's a virtual certainty we'll fail to maximize that expected value.

A recent Twitter poll by Olivier Busquet shows just how much confusion there is around GTO and the above statements.

https://mobile.twitter.com/olivierbusquet/status/966351748028862471

He asks, "If a perfect GTO bot played only live tournaments 25K entry or higher it would be:"

-Far and away the best
-Marginally the best
-Among the elite
-A winner but not elite

At the time of this writing, the poll had 8,071 votes, with the choices receiving 32%, 17%, 23%, and 29% respectively. That's about as far from a consensus as we'll ever see from a four-option poll.

To be fair to the participants, there's inherent confusion in the question itself. In order to judge whether someone, or in this case something, would be the "best" or "elite," we need to define those terms. Does "best" mean the player who makes the most money? Or does it mean the most skilled player, assuming that could somehow be measured? Does it mean something else?

Personally, I'd define "best" as having the highest positive monetary expectation.

I do not believe the perfect GTO bot would make the most money. It would make some money because it would never make a decision that resulted in negative expected value. But it wouldn't make the most money, because it wouldn't fully exploit the errors of its opponents. And I'd argue that even at this level of play, sizable mistakes are made with meaningful frequency.
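
A toy game shows the gap between "makes some money" and "makes the most money." Below is a minimal Python sketch, assuming a made-up variant of rock-paper-scissors with a fourth, dominated action (a hypothetical "dud" that loses to everything) and a hypothetical opponent who overplays rock and sometimes picks the dud. An equilibrium strategy (a third each of rock, paper, and scissors, never the dud) profits only from the dud mistakes; the best response to this opponent also punishes the rock habit. Nothing here is poker data; it only illustrates the shape of the argument.

```python
from fractions import Fraction as F

ACTIONS = ["rock", "paper", "scissors", "dud"]
# PAYOFF[mine][theirs] from my point of view. "dud" is a hypothetical
# dominated move that loses to rock, paper and scissors.
PAYOFF = {
    "rock":     {"rock": 0,  "paper": -1, "scissors": 1,  "dud": 1},
    "paper":    {"rock": 1,  "paper": 0,  "scissors": -1, "dud": 1},
    "scissors": {"rock": -1, "paper": 1,  "scissors": 0,  "dud": 1},
    "dud":      {"rock": -1, "paper": -1, "scissors": -1, "dud": 0},
}


def ev(mine, theirs):
    """Expected payoff per round of mixed strategy `mine` against `theirs`."""
    return sum(p * q * PAYOFF[a][b] for a, p in mine.items() for b, q in theirs.items())


# Hypothetical flawed opponent: overplays rock, sometimes picks the dud.
opponent = {"rock": F(1, 2), "paper": F(1, 5), "scissors": F(1, 10), "dud": F(1, 5)}

# An equilibrium strategy: never play the dud, mix the rest evenly.
equilibrium = {"rock": F(1, 3), "paper": F(1, 3), "scissors": F(1, 3)}

# Best response: the single action with the highest EV against this opponent.
best_action = max(ACTIONS, key=lambda a: ev({a: F(1)}, opponent))
best_response = {best_action: F(1)}

print(ev(equilibrium, opponent))                 # 1/5
print(best_action, ev(best_response, opponent))  # paper 3/5
```

Against this opponent the equilibrium earns 1/5 of a unit per round and the best response 3/5. The flip side, which the equilibrium avoids, is that always playing paper loses a full unit per round to anyone who switches to scissors.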

David Sklansky, noted poker author and creator of the Two Plus Two poker forums, is on record saying the following about Cepheus, the bot that "solved" heads-up fixed limit hold'em:

"If the computer is playing a bad player, it will win but it won't win as quickly as a human being playing a bad player."  He then goes on to say, "I will destroy that beginner to a greater degree than this computer program will."

A perfect GTO bot doesn't care about the size of your mistakes. By definition, it assumes it's playing against players who are also using an unexploitable strategy, and therefore plays its own unexploitable strategy in response. Once it has (incorrectly) established that the other players are playing flawlessly, Nash equilibrium is adopted, and the other players become, in a sense, irrelevant.

Again, we're back to the issue of the false assumption that our opponents are playing perfectly.

The crux of the debate generated by Olivier's poll question is whether the best exploitive human being extracts more money from their opponents than the best unexploitable robot extracts from those same opponents.

There really are two ends of the spectrum here that I think just about everyone would agree on:

1.  A human being will extract more money from a beginner than a robot will
2.  A robot will extract more money from an expert player than a human being will

So if you agree with the above, the answer to Olivier's question lies at some unknown point between "beginner" and "expert" that represents our opponents' collective skill level.

I have no indisputable proof to answer this question, but I do have a strong opinion after playing over six million hands online. Tens of thousands of these were played against robots with GTO aspirations, and hundreds of thousands more against human beings who fell closer to the GTO end of the spectrum than the exploitive end.

I think you can guess my opinion. Perhaps unsurprisingly, though, these GTO-based opponents were the toughest to play against. However, they weren't the big winners in the game, as evidenced by results on PokerTableRatings and, to a lesser extent, the results in my own database.

It's worth noting that we have no idea how many players failed in an attempt to adopt a GTO strategy relative to how many failed in an attempt to adopt an exploitive strategy. In other words, there's inherent selection bias in only looking at the results of players who amassed a high volume of hands. And to some degree, all players are always playing, or at least striving to implement, some mixture of approximate GTO (balance) and exploitive play, as it's impossible for a human being to play "perfect" poker.

Let's put all this aside for a moment to discuss what I believe is the most compelling argument against GTO. Let's assume we don't care that we're using a strategy that doesn't meet the requirements of the game it was designed for. Let's also assume we don't care about maximizing expected value, but do care about having positive expected value.

Let's also assume that (insert your favorite GTO program here) *accurately* provides these strategies, and that they do meet the Nash equilibrium criteria and have positive expected value.

Note: Accurately is starred above because I'm not sure whether there has been any third-party testing of the accuracy of the output of the poker GTO programs on the market. If I did use them, this would be another concern in addition to the aforementioned abstraction issues.

Given these assumptions, how do we, as human beings, plan on implementing these strategies?

Let's hypothetically take something as seemingly simple as whether we should open T6 offsuit from the button in limit hold'em. We launch our program and input the assumed range of our perfectly playing opponent. It recommends T6o as a profitable open, so we dutifully open it from the button from that point forward.

There are a host of problems with this even after setting the aforementioned issues aside.

We don't get dealt T6 offsuit every hand; it's one of many hands we're dealt. We always have to think of this hand in the context of our range of opening hands, as the program does. So maybe you're thinking, "Not a problem, I can memorize the action the program suggests for every preflop hand."

But can you also memorize the ~9 possible betting sequences on each street for T6o across each of the 17,296 flops, 45 turns, and 44 rivers that went into the program's calculation of the profitability of opening that hand? If my math is right, that's 24,965,392,320 combinations. Yes, 25 billion. And remember, this is only for T6o.
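
Here's that arithmetic as a short Python sketch, under the assumptions the counts imply: both players' hole cards are fixed, leaving 48 unseen cards (hence 45 turns and 44 rivers), and roughly 9 betting sequences on each of the three postflop streets. Under those assumptions, the flop count is C(48, 3) = 17,296.

```python
from math import comb

SEQUENCES_PER_STREET = 9  # rough figure from the text, applied to flop, turn and river
unseen = 52 - 4           # assumption: our two hole cards plus the opponent's two are known

flops = comb(unseen, 3)   # 17296
turns = unseen - 3        # 45
rivers = unseen - 4       # 44

total = SEQUENCES_PER_STREET ** 3 * flops * turns * rivers
print(f"{flops:,} flops -> {total:,} combinations")  # 17,296 flops -> 24,965,392,320 combinations
```
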

Here's the calculation for all hands, taken from https://arxiv.org/pdf/1302.7008.pdf

That's roughly 319 trillion decision points (information sets) in heads-up limit hold'em. If every person on earth today memorized 10,000 of them, we'd still cover less than a quarter, nowhere close to committing them all to our collective consciousness.
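
A quick sanity check of that comparison, as a Python sketch. It assumes a world population of roughly 7.6 billion (approximately the 2018 figure) and uses the rounded 319 trillion from above.

```python
WORLD_POPULATION = 7.6e9  # assumption: approximate 2018 world population
PER_PERSON = 10_000       # decision points each person memorizes
GAME_SIZE = 319e12        # the rounded 319 trillion figure above

covered = WORLD_POPULATION * PER_PERSON
print(f"{covered:.2e} memorized, {covered / GAME_SIZE:.0%} of the total")
# 7.60e+13 memorized, 24% of the total
```
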

Essentially, what we're seeing when we look at the output of one of these programs is a microscopic view of a single drop of water drawn from a vast ocean. It's like assuming that because we can see, say, the end result of a shooting star, we can see the entire universe. We can't see it, much less understand it.

This short discussion taken from a 2+2 thread I stumbled across a few days ago illustrates the point:

https://forumserver.twoplustwo.com/53/mid-high-stakes-limit/2018-nc-lc-thread-we-ever-going-get-title-1700128/index2.html

The poster states he's "super confused" when playing against Cepheus. Playing heads-up, Cepheus defends Js3s against a button open. The flop is Ac 8h 4s, and Cepheus check/calls.

The poster "can't see how that's possibly in his range," referring to the flop call.

Another poster chimes in with some good information.  "Looks like the bot continues 100% on this flop, check raises 18% with the backdoor outs as a bluff.  When I switch the 4s to the 4h it moves it to a 100% fold.  If I change the Ace to Kc, it again continues 100%, check raising 15% as a bluff."

The poster then points out that Cepheus is check raising 100% of the time on this flop when it holds Ks3s and is understandably surprised at this.

Another poster responds to the fact that Cepheus is check raising Ks3s 100% of the time and says, "K3s is not a hand you really want to call 3 times with. Yes better to raise for thin value!"

A simple check of an equity calculator shows this isn't a "value raise" even with the most liberal of assumptions to support the claim.

I could take this haphazard assignment of reason further and surmise that Cepheus is combining his equity (derived from the high card value, pair outs, and backdoor draws) with the fold equity he gains by raising. To take one example, maybe the 5c comes on the turn and Cepheus gets his opponent to fold 22 or 33.

The reality is, we have little idea why Cepheus is making these plays, much less why at these specific frequencies. Sure, we can evaluate the strength of his hand in relation to the board and his opponent's range by observing things like his backdoor flush draw, backdoor straight draw, and some high card value.

But why are we check raising Js3s 18% of the time and Ks3s 100% of the time? Why not 14% and 91%? Why are we check raising or calling at all? Why aren't we leading? Why didn't we 3Bet preflop? Maybe we did 3Bet preflop 72% of the time and are unknowingly looking at the other 28%. The questions are numerous and unanswerable.

Only Cepheus, which has examined every possible turn and river card, along with every conceivable betting sequence, hammered out over trillions of hands against another presumably omniscient opponent, "understands" why it's doing that.

I have to reiterate this point because it's an important one.  This is just one hand in a range of hands.  In fact, it's possible to make plays with negative expected value in a vacuum that increase the overall profitability of all our hands in the same situation.

When I refer to it being "just one hand in a range of hands", I'm talking about this:

Not only all the hands we have in this situation ("situation" being defined as our range that calls versus a button open), but also all the different action sequences that can arise on the flop, turn, and river, in addition to all the flop, turn, and river cards themselves.

For example, we'd like to have hands that can bet/3-bet, bet/call, bet/fold, check/call, check raise, check raise/fold, check raise/4-bet, check/fold, and so on, on the flop. And not only on the flop, but on the turn and river as well. So we're targeting balance within all the conceivable action sequences on all the conceivable turn and river cards, within the context of our entire range. Even this paragraph is an oversimplification of the immense complexity involved.

You can view a range of hands as the instruments in an orchestra, all working in harmony to produce a beautiful melody. To take one hand like Js3s out of a range on the flop might be analogous to walking over to the flute player, listening to him or her play a perfect note or two, and concluding you're ready to conduct the symphony.

The answer as to why Cepheus is doing this is unknowable for a human being. This can be proven at the most basic level because Cepheus is making these plays in response to another assumed perfectly playing opponent. To even begin investigating the true reasons behind these plays, we'd have to fully understand how its opponent is playing.

I'd like to end on a bit of a positive note towards GTO. I'm not opposed to the idea of GTO itself, and we should all strive to have a better understanding of it. In simple terms, we want a firm grasp of balance, particularly against more skilled players and in the more common situations that arise at a poker table. Examining GTO programs can certainly help point us in the right direction, particularly in seeking answers to more theoretical questions, though I think there are simpler and more effective ways.

I look at balance like the volume knob on a stereo. The better my opponent and/or the more common the situation, the higher I dial up the balance knob. The worse my opponent, or the less common the situation, the lower the volume.

Granted, the volume on my stereo doesn't go nearly as high as, say, Cepheus', but neither do my opponents'.
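
One way to picture the knob, as a minimal Python sketch: blend an equilibrium strategy with a best response to one specific opponent, where `balance` is the share given to the equilibrium. It uses a toy, hypothetical game (rock-paper-scissors plus a dominated "dud" move) rather than poker, and the linear blend is my own illustrative choice, not a method from any solver. For each setting it prints the expected value against one flawed opponent and the worst case against an opponent who adapts perfectly.

```python
from fractions import Fraction as F

ACTIONS = ["rock", "paper", "scissors", "dud"]
# PAYOFF[mine][theirs] from my point of view; "dud" is a hypothetical dominated move.
PAYOFF = {
    "rock":     {"rock": 0,  "paper": -1, "scissors": 1,  "dud": 1},
    "paper":    {"rock": 1,  "paper": 0,  "scissors": -1, "dud": 1},
    "scissors": {"rock": -1, "paper": 1,  "scissors": 0,  "dud": 1},
    "dud":      {"rock": -1, "paper": -1, "scissors": -1, "dud": 0},
}


def ev(mine, theirs):
    """Expected payoff per round of mixed strategy `mine` against `theirs`."""
    return sum(p * q * PAYOFF[a][b] for a, p in mine.items() for b, q in theirs.items())


# Hypothetical flawed opponent: overplays rock, sometimes picks the dud.
opponent = {"rock": F(1, 2), "paper": F(1, 5), "scissors": F(1, 10), "dud": F(1, 5)}
equilibrium = {"rock": F(1, 3), "paper": F(1, 3), "scissors": F(1, 3)}
best_response = {"paper": F(1)}  # paper has the highest EV against this opponent


def blend(balance):
    """Put weight `balance` on the equilibrium and the rest on the best response."""
    return {
        a: balance * equilibrium.get(a, 0) + (1 - balance) * best_response.get(a, 0)
        for a in ACTIONS
    }


def worst_case(mine):
    """EV if the opponent switches to their best pure counter-strategy."""
    return min(ev(mine, {b: F(1)}) for b in ACTIONS)


for balance in (F(0), F(1, 2), F(1)):
    mine = blend(balance)
    print(balance, ev(mine, opponent), worst_case(mine))
# 0 3/5 -1
# 1/2 2/5 -1/2
# 1 1/5 0
```

Turning the knob up gives back some of what this particular opponent is donating, in exchange for a smaller worst case when an opponent adapts.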

-Tony Pirone
TPirahna
----------------------------------------------------------------------------------------------
I will make a post in the near future that goes into a lot more detail on balance and its practical implementation.