Advice on Convergence Rates of Dynamic Games to (Nash) Equilibriums

I’m a PhD student and spend most of my time at the intersection of econometrics and machine learning.  As a side project, I have been thinking about the problem of the convergence rate of dynamic games to equilibrium.  Obviously, this is a more of an algorithmic game theory question than economic game theory, but seeing as I don’t know anyone in the former field, I might as well try asking people in my actual field.  I did discuss this with people at school, but again I couldn't find much in this area in the econ department and hoping a wider net may yield better results.

There is a class of algorithms based on the policy gradient algorithm in reinforcement learning like Trust Region Policy Optimization (TRPO) and natural gradients policy optimization.  These algorithms are attractive because they have sub-linear convergence rates (as a function of distance).   For example see this paper: https://openreview.net/pdf?id=rJl3S2A9t7 for a proof of logarithmic convergence of this algorithm to solve arbitrary dynamic optimization problems (eq 8 for deterministic and eq 9 for stochastic).  One good way to get the intuition for the log convergence rate is to think about value function iteration., which would be logarithmic as a function of distance if one didn’t have to worry about the grid.  These algorithms use a functional approximator to the policy function and not a grid and are able to keep the logarithmic guarantees.

I assumed agents in a dynamic game were optimizing according to these algorithms.  Then for example, given the strategies of every other player, you can find your best response in a logarithmic number of actions (as a function of how far away you are from that optimum).  Thus, every dynamic game can be reduced to a static game where you know your best response to everyone else's strategies in logarithmic time.  Thus if the reduced static game can be solved in polynomial time, then so can the dynamic game.

An application of this is to solve dynamic and even stochastic potential games in logarithmic time by using the algorithm to find the global maximum of the potential function, taking everyone’s actions as controls and the environment as states.  Then trivially this proves logarithmic convergence to Nash equilibrium of dynamic potential games.  This was proved recently here: https://arxiv.org/pdf/1707.06465.pdf see section 6.   Also, you can apply this to zero-sum games, where you can solve the max-min formulation in log^2 time (I think).  I am not very knowledgeable about game theory and wondering if people in the know can provide more sophisticated insight into this problem.  Is this potentially useful?  Has this been thought of before?  Are there other dynamic games this can provide convergence rate guarantees for?  Thanks for your help,

Cameron