Computing Good Nash Equilibria in Graphical Games ∗

Edith Elkind, Hebrew University of Jerusalem, Israel, and University of Southampton, Southampton, SO17 1BJ, U.K.
Leslie Ann Goldberg, University of Liverpool, Liverpool L69 3BX, U.K.
Paul Goldberg, University of Liverpool, Liverpool L69 3BX, U.K.

ABSTRACT

This paper addresses the problem of fair equilibrium selection in graphical games. Our approach is based on the data structure called the best response policy, which was proposed by Kearns et al. [13] as a way to represent all Nash equilibria of a graphical game. In [9], it was shown that the best response policy has polynomial size as long as the underlying graph is a path. In this paper, we show that if the underlying graph is a bounded-degree tree and the best response policy has polynomial size then there is an efficient algorithm which constructs a Nash equilibrium that guarantees certain payoffs to all participants. Another attractive solution concept is a Nash equilibrium that maximizes the social welfare. We show that, while exactly computing the latter is infeasible (we prove that solving this problem may involve algebraic numbers of an arbitrarily high degree), there exists an FPTAS for finding such an equilibrium as long as the best response policy has polynomial size. These two algorithms can be combined to produce Nash equilibria that satisfy various fairness criteria.

Categories and Subject Descriptors: F.2 [Theory of Computation]: Analysis of Algorithms and Problem Complexity; J.4 [Computer Applications]: Social and Behavioral Sciences - economics

1. INTRODUCTION

In a large community of agents, an agent's behavior is not likely to have a direct effect on most other agents: rather, it is just the agents who are close enough to him that will be affected. However, as these agents respond by adapting their behavior, more agents will feel the consequences, and eventually the choices made by a single agent will propagate throughout the entire community. This is the intuition behind graphical games, which were introduced by Kearns, Littman and Singh in [13] as a compact representation scheme for games with many players. In an n-player graphical game, each player is associated with a vertex of an underlying graph G, and the payoffs of each player depend on his action as well as on the actions of his neighbors in the graph. If the maximum degree of G is Δ, and each player has two actions available to him, then the game can be represented using n·2^{Δ+1} numbers. In contrast, we need n·2^n numbers to represent a general n-player 2-action game, which is only practical for small values of n. For graphical games with constant Δ, the size of the game is linear in n.

One of the most natural problems for a graphical game is that of finding a Nash equilibrium, the existence of which follows from Nash's celebrated theorem (as graphical games are just a special case of n-player games). The first attempt to tackle this problem was made in [13], where the authors consider graphical games with two actions per player in which the underlying graph is a bounded-degree tree. They propose a generic algorithm for finding Nash equilibria that can be specialized in two ways: an exponential-time algorithm for finding an (exact) Nash equilibrium, and a fully polynomial time approximation scheme (FPTAS) for finding an approximation to a Nash equilibrium.
For any ε > 0 this algorithm outputs an ε-Nash equilibrium, which is a strategy profile in which no player can improve his payoff by more than ε by unilaterally changing his strategy. While ε-Nash equilibria are often easier to compute than exact Nash equilibria, this solution concept has several drawbacks. First, the players may be sensitive to a small loss in payoffs, so a strategy profile that is only an ε-Nash equilibrium will not be stable. This will be the case even if there is only a small subset of players who are extremely price-sensitive, and for a large population of players it may be difficult to choose a value of ε that will satisfy everyone. Second, the strategy profiles that are close to being Nash equilibria may be much better with respect to the properties under consideration than exact Nash equilibria. Therefore, the (approximation to the) value of the best solution that corresponds to an ε-Nash equilibrium may not be indicative of what can be achieved under an exact Nash equilibrium. This is especially important if the purpose of the approximate solution is to provide a good benchmark for a system of selfish agents, as the benchmark implied by an ε-Nash equilibrium may be unrealistic. For these reasons, in this paper we focus on the problem of computing exact Nash equilibria.

Building on ideas of [14], Elkind et al. [9] showed how to find an (exact) Nash equilibrium in polynomial time when the underlying graph has degree 2 (that is, when the graph is a collection of paths and cycles). By contrast, finding a Nash equilibrium in a general degree-bounded graph appears to be computationally intractable: it has been shown (see [5, 12, 7]) to be complete for the complexity class PPAD. [9] extends this hardness result to the case in which the underlying graph has bounded pathwidth.

A graphical game may not have a unique Nash equilibrium; indeed, it may have exponentially many. Moreover, some Nash equilibria are more desirable than others. Rather than having an algorithm which merely finds some Nash equilibrium, we would like to have algorithms for finding Nash equilibria with various socially desirable properties, such as maximizing overall payoff or distributing profit fairly. A useful property of the data structure of [13] is that it simultaneously represents the set of all Nash equilibria of the underlying game. If this representation has polynomial size (as is the case for paths, as shown in [9]), one may hope to extract from it a Nash equilibrium with the desired properties. In fact, in [13] the authors mention that this is indeed possible if one is interested in finding an (approximate) ε-Nash equilibrium. The goal of this paper is to extend this to exact Nash equilibria.

1.1 Our Results

In this paper, we study n-player 2-action graphical games on bounded-degree trees for which the data structure of [13] has size poly(n). We focus on the problem of finding exact Nash equilibria with certain socially desirable properties. In particular, we show how to find a Nash equilibrium that (nearly) maximizes the social welfare, i.e., the sum of the players' payoffs, and we show how to find a Nash equilibrium that (nearly) satisfies prescribed payoff bounds for all players.

Graphical games on bounded-degree trees have a simple algebraic structure. One attractive feature, which follows from [13], is that every such game has a Nash equilibrium in which the strategy of every player is a rational number. Section 3 studies the algebraic structure of those Nash equilibria that maximize social welfare.
We show (Theorems 1 and 2) that, surprisingly, the set of Nash equilibria that maximize social welfare has a much more complex structure. In fact, for any algebraic number α ∈ [0, 1] with degree at most n, we exhibit a graphical game on a path of length O(n) such that, in the unique social welfare-maximizing Nash equilibrium of this game, one of the players plays the mixed strategy α. (A related result in a different context was obtained by Datta [8], who shows that n-player 2-action games are universal in the sense that any real algebraic variety can be represented as the set of totally mixed Nash equilibria of such games.) This result shows that it may be difficult to represent an optimal Nash equilibrium. It seems to be a novel feature of the setting we consider here that an optimal Nash equilibrium is hard to represent, in a situation where it is easy to find and represent a Nash equilibrium.

As the social welfare-maximizing Nash equilibrium may be hard to represent efficiently, we have to settle for an approximation. However, the crucial difference between our approach and that of previous papers [13, 16, 19] is that we require our algorithm to output an exact Nash equilibrium, though not necessarily the optimal one with respect to our criteria. In Section 4, we describe an algorithm that satisfies this requirement. Namely, we propose an algorithm that for any ε > 0 finds a Nash equilibrium whose total payoff is within ε of optimal. It runs in polynomial time (Theorems 3 and 4) for any graphical game on a bounded-degree tree for which the data structure proposed by [13] (the so-called best response policy, defined below) is of size poly(n) (note that, as shown in [9], this is always the case when the underlying graph is a path). More precisely, the running time of our algorithm is polynomial in n, Pmax, and 1/ε, where Pmax is the maximum absolute value of an entry of a payoff matrix, i.e., it is a pseudopolynomial algorithm, though it is fully polynomial with respect to ε. We show (Section 4.1) that under some restrictions on the payoff matrices, the algorithm can be transformed into a (truly) polynomial-time algorithm that outputs a Nash equilibrium whose total payoff is within a 1 − ε factor of the optimal.

In Section 5, we consider the problem of finding a Nash equilibrium in which the expected payoff of each player Vi exceeds a prescribed threshold Ti. Using the idea from Section 4, we give (Theorem 5) a fully polynomial time approximation scheme for this problem. The running time of the algorithm is bounded by a polynomial in n, Pmax, and 1/ε. If the instance has a Nash equilibrium satisfying the prescribed thresholds then the algorithm constructs a Nash equilibrium in which the expected payoff of each player Vi is at least Ti − ε.

In Section 6, we introduce other natural criteria for selecting a good Nash equilibrium and we show that the algorithms described in the two previous sections can be used as building blocks in finding Nash equilibria that satisfy these criteria. In particular, in Section 6.1 we show how to find a Nash equilibrium that approximates the maximum social welfare, while guaranteeing that each individual payoff is close to a prescribed threshold. In Section 6.2 we show how to find a Nash equilibrium that (nearly) maximizes the minimum individual payoff. Finally, in Section 6.3 we show how to find a Nash equilibrium in which the individual payoffs of the players are close to each other.
1.2 Related Work

Our approximation scheme (Theorem 3 and Theorem 4) shows a contrast between the games that we study and two-player n-action games, for which the corresponding problems are usually intractable. For two-player n-action games, the problem of finding Nash equilibria with special properties is typically NP-hard. In particular, this is the case for Nash equilibria that maximize the social welfare [11, 6]. Moreover, it is likely to be intractable even to approximate such equilibria. In particular, Chen, Deng and Teng [4] show that there exists some ε, inverse polynomial in n, for which computing an ε-Nash equilibrium in 2-player games with n actions per player is PPAD-complete.

Lipton and Markakis [15] study the algebraic properties of Nash equilibria, and point out that standard quantifier elimination algorithms can be used to compute them. Note that these algorithms are not polynomial-time in general. The games we study in this paper have polynomial-time computable Nash equilibria in which all mixed strategies are rational numbers, but an optimal Nash equilibrium may necessarily include mixed strategies with high algebraic degree.

A correlated equilibrium (CE) (introduced by Aumann [2]) is a distribution over vectors of players' actions with the property that if any player is told his own action (the value of his own component) from a vector generated by that distribution, then he cannot increase his expected payoff by changing his action. Any Nash equilibrium is a CE, but the converse does not hold in general. In contrast with Nash equilibria, correlated equilibria can be found for low-degree graphical games (as well as other classes of concisely represented multiplayer games) in polynomial time [17]. But, for graphical games it is NP-hard to find a correlated equilibrium that maximizes total payoff [18]. However, the NP-hardness results apply to more general games than the ones we consider here; in particular, the graphs are not trees. From [2] it is also known that there exist 2-player, 2-action games for which the expected total payoff of the best correlated equilibrium is higher than that of the best Nash equilibrium, and we discuss this issue further in Section 7.

2. PRELIMINARIES AND NOTATION

We consider graphical games in which the underlying graph G is an n-vertex tree, in which each vertex has at most Δ children. Each vertex has two actions, which are denoted by 0 and 1. A mixed strategy of a player V is represented as a single number v ∈ [0, 1], which denotes the probability that V selects action 1.

For the purposes of the algorithm, the tree is rooted arbitrarily. For convenience, we assume without loss of generality that the root has a single child, and that its payoff is independent of the action chosen by the child. This can be achieved by first choosing an arbitrary root of the tree, and then adding a dummy parent of this root, giving the new parent a constant payoff function, e.g., 0.

Given an edge (V, W) of the tree G, and a mixed strategy w for W, let G_{(V,W),W=w} be the instance obtained from G by (1) deleting all nodes Z which are separated from V by W (i.e., all nodes Z such that the path from Z to V passes through W), and (2) restricting the instance so that W is required to play mixed strategy w.

Definition 1. Suppose that (V, W) is an edge of the tree, that v is a mixed strategy for V and that w is a mixed strategy for W.
We say that v is a potential best response to w (denoted by v ∈ pbr_V(w)) if there is an equilibrium in the instance G_{(V,W),W=w} in which V has mixed strategy v. We define the best response policy for V, given W, as B(W, V) = {(w, v) | v ∈ pbr_V(w), w ∈ [0, 1]}.

The upstream pass of the generic algorithm of [13] considers every node V (other than the root) and computes the best response policy for V given its parent. With the above assumptions about the root, the downstream pass is straightforward. The root W selects a mixed strategy w for itself and, for each child V of W, a mixed strategy v such that (w, v) ∈ B(W, V). It instructs each child V to play v. The remainder of the downstream pass is recursive. When a node V is instructed by its parent to adopt mixed strategy v, it does the following for each child U: it finds a pair (v, u) ∈ B(V, U) (with the same v value that it was given by its parent) and instructs U to play u.

The best response policy for a vertex U given its parent V can be represented as a union of rectangles, where a rectangle is defined by a pair of closed intervals (I_V, I_U) and consists of all points in I_V × I_U; it may be the case that one or both of the intervals I_V and I_U consists of a single point. In order to perform computations on B(V, U), and to bound the number of rectangles, [9] used the notion of an event point, which is defined as follows. For any set A ⊆ [0, 1]^2 that is represented as a union of a finite number of rectangles, we say that a point u ∈ [0, 1] on the U-axis is a U-event point of A if u = 0 or u = 1 or the representation of A contains a rectangle of the form I_V × I_U and u is an endpoint of I_U; V-event points are defined similarly.

For many games considered in this paper, the underlying graph is an n-vertex path, i.e., a graph G = (V, E) with V = {V1, . . . , Vn} and E = {(V1, V2), . . . , (Vn−1, Vn)}. In [9], it was shown that for such games, the best response policy has only polynomially many rectangles. The proof that the number of rectangles in B(Vj+1, Vj) is polynomial proceeds by first showing that the number of event points in B(Vj+1, Vj) cannot exceed the number of event points in B(Vj, Vj−1) by more than 2, and using this fact to bound the number of rectangles in B(Vj+1, Vj).

Let P_0(V) and P_1(V) be the expected payoffs to V when it plays 0 and 1, respectively. Both P_0(V) and P_1(V) are multilinear functions of the strategies of V's neighbors. In what follows, we will frequently use the following simple observation.

CLAIM 1. For a vertex V with a single child U and parent W, given any A, B, C, D ∈ Q and A′, B′, C′, D′ ∈ Q, one can select the payoffs to V so that P_0(V) = Auw + Bu + Cw + D and P_1(V) = A′uw + B′u + C′w + D′, where u and w are the mixed strategies of U and W, respectively. Moreover, if all of A, B, C, D, A′, B′, C′, D′ are integer, the payoffs to V are integer as well.

PROOF. We will give the proof for P_0(V); the proof for P_1(V) is similar. For i, j = 0, 1, let Pij be the payoff to V when U plays i, V plays 0 and W plays j. We have P_0(V) = P00(1 − u)(1 − w) + P10·u(1 − w) + P01(1 − u)w + P11·uw. We have to select the values of Pij so that P00 − P10 − P01 + P11 = A, −P00 + P10 = B, −P00 + P01 = C, P00 = D. It is easy to see that the unique solution is given by P00 = D, P01 = C + D, P10 = B + D, P11 = A + B + C + D.

The input to all algorithms considered in this paper includes the payoff matrices for each player. We assume that all elements of these matrices are integer. Let Pmax be the greatest absolute value of any element of any payoff matrix. Then the input consists of at most n·2^{Δ+1} numbers, each of which can be represented using log Pmax bits.
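To make Claim 1 concrete, the following sketch (Python, with hypothetical helper names; not taken from the paper) builds the four payoff entries of V for action 0 from given coefficients A, B, C, D and checks the multilinear identity on a few sample strategy pairs. The analogous construction with A′, B′, C′, D′ gives the payoffs for action 1.

# Minimal sketch of the construction in Claim 1 (hypothetical helper names,
# not from the paper). payoffs_for_action_zero returns the table
# P[(i, j)] = payoff to V when the child U plays i, V plays 0 and the parent
# W plays j, chosen so that P_0(V) = A*u*w + B*u + C*w + D.

def payoffs_for_action_zero(A, B, C, D):
    return {(0, 0): D,               # P00 = D
            (0, 1): C + D,           # P01 = C + D
            (1, 0): B + D,           # P10 = B + D
            (1, 1): A + B + C + D}   # P11 = A + B + C + D

def expected_payoff_action_zero(P, u, w):
    """Expected payoff to V (playing 0) when U plays 1 w.p. u and W plays 1 w.p. w."""
    return (P[(0, 0)] * (1 - u) * (1 - w) + P[(1, 0)] * u * (1 - w)
            + P[(0, 1)] * (1 - u) * w + P[(1, 1)] * u * w)

if __name__ == "__main__":
    A, B, C, D = -1, 0, 3, 0          # e.g. P_0(V) = -uw + 3w, as used in Theorem 1 below
    P = payoffs_for_action_zero(A, B, C, D)
    for u, w in [(0.0, 0.0), (0.25, 0.75), (1.0, 0.5)]:
        assert abs(expected_payoff_action_zero(P, u, w) - (A*u*w + B*u + C*w + D)) < 1e-12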
3. NASH EQUILIBRIA THAT MAXIMIZE THE SOCIAL WELFARE: SOLUTIONS IN R \ Q

From the point of view of social welfare, the best Nash equilibrium is the one that maximizes the sum of the players' expected payoffs. Unfortunately, it turns out that computing such a strategy profile exactly is not possible: in this section, we show that even if all players' payoffs are integers, the strategy profile that maximizes the total payoff may have irrational coordinates; moreover, it may involve algebraic numbers of an arbitrary degree.

3.1 Warm-up: quadratic irrationalities

We start by providing an example of a graphical game on a path of length 3 with integer payoffs such that in the Nash equilibrium that maximizes the total payoff, one of the players has a strategy in R \ Q. In the next subsection, we will extend this example to algebraic numbers of arbitrary degree n; to do so, we have to consider paths of length O(n).

THEOREM 1. There exists an integer-payoff graphical game G on a 3-vertex path UVW such that, in any Nash equilibrium of G that maximizes social welfare, the strategy u of the player U and the total payoff p satisfy u, p ∈ R \ Q.

PROOF. The payoffs to the players in G are specified as follows. The payoff to U is identically 0, i.e., P_0(U) = P_1(U) = 0. Using Claim 1, we select the payoffs to V so that P_0(V) = −uw + 3w and P_1(V) = P_0(V) + w(u + 2) − (u + 1), where u and w are the (mixed) strategies of U and W, respectively. It follows that V is indifferent between playing 0 and 1 if and only if w = f(u) = (u + 1)/(u + 2). Observe that for any u ∈ [0, 1] we have f(u) ∈ [0, 1]. The payoff to W is 0 if it selects the same action as V and 1 otherwise.

CLAIM 2. All Nash equilibria of the game G are of the form (u, 1/2, f(u)). That is, in any Nash equilibrium, V plays v = 1/2 and W plays w = f(u). Moreover, for any value of u, the vector of strategies (u, 1/2, f(u)) constitutes a Nash equilibrium.

PROOF. It is easy to check that for any u ∈ [0, 1], the vector (u, 1/2, f(u)) is a Nash equilibrium. Indeed, U is content to play any mixed strategy u no matter what V and W do. Furthermore, V is indifferent between 0 and 1 as long as w = f(u), so it can play 1/2. Finally, if V plays 0 and 1 with equal probability, W is indifferent between 0 and 1, so it can play f(u).

Conversely, suppose that v > 1/2. Then W strictly prefers to play 0, i.e., w = 0. Then for V we have P_1(V) = P_0(V) − (u + 1), i.e., P_1(V) < P_0(V), which implies v = 0, a contradiction. Similarly, if v < 1/2, player W prefers to play 1, so we have w = 1. Hence, P_1(V) = P_0(V) + (u + 2) − (u + 1), i.e., P_1(V) > P_0(V), which implies v = 1, a contradiction. Finally, if v = 1/2 but w ≠ f(u), player V is not indifferent between 0 and 1, so he would deviate from playing 1/2. This completes the proof of Claim 2.

By Claim 2, the total payoff in any Nash equilibrium of this game is a function of u. More specifically, the payoff to U is 0, the payoff to V is −uf(u) + 3f(u), and the payoff to W is 1/2. Therefore, the Nash equilibrium with the maximum total payoff corresponds to the value of u that maximizes

g(u) = −u(u + 1)/(u + 2) + 3(u + 1)/(u + 2) = −(u − 3)(u + 1)/(u + 2).

To find the extrema of g(u), we compute h(u) = −(d/du)g(u). We have

h(u) = [(2u − 2)(u + 2) − (u − 3)(u + 1)] / (u + 2)^2 = (u^2 + 4u − 1) / (u + 2)^2.

Hence, h(u) = 0 if and only if u ∈ {−2 + √5, −2 − √5}. Note that −2 + √5 ∈ [0, 1]. The function g(u) changes sign at −2, −1, and 3. We have g(u) < 0 for u > 3 and g(u) > 0 for u < −2, so the extremum of g(u) that lies between −1 and 3, i.e., u = −2 + √5, is a local maximum. We conclude that the social welfare-maximizing Nash equilibrium for this game is given by the vector of strategies (−2 + √5, 1/2, (5 − √5)/5). The respective total payoff is

0 − (√5 − 5)(√5 − 1)/√5 + 1/2 = 13/2 − 2√5.

This concludes the proof of Theorem 1.
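As an illustrative sanity check on the computation above (not part of any algorithm in the paper), the following sympy snippet maximizes g(u) on [0, 1] symbolically and confirms both the maximizer −2 + √5 and the total payoff 13/2 − 2√5; the variable names are of course ours.

# Symbolic check of the example in Theorem 1 (illustration only).
import sympy as sp

u = sp.symbols('u', real=True)
g = -(u - 3) * (u + 1) / (u + 2)            # expected total payoff minus W's constant 1/2
crit = [r for r in sp.solve(sp.diff(g, u), u) if r.is_real and 0 <= r <= 1]
u_star = crit[0]
print(sp.simplify(u_star - (-2 + sp.sqrt(5))))                       # 0, so u* = -2 + sqrt(5)
total = g.subs(u, u_star) + sp.Rational(1, 2)                        # add W's payoff of 1/2
print(sp.simplify(total - (sp.Rational(13, 2) - 2 * sp.sqrt(5))))    # 0, so total = 13/2 - 2*sqrt(5)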
3.2 Strategies of arbitrary degree

We have shown that in the social welfare-maximizing Nash equilibrium, some players' strategies can be quadratic irrationalities, and so can the total payoff. In this subsection, we will extend this result to show that we can construct an integer-payoff graphical game on a path whose social welfare-maximizing Nash equilibrium involves arbitrary algebraic numbers in [0, 1].

THEOREM 2. For any degree-n algebraic number α ∈ [0, 1], there exists an integer-payoff graphical game on a path of length O(n) such that, in all social welfare-maximizing Nash equilibria of this game, one of the players plays α.

PROOF. Our proof consists of two steps. First, we construct a rational expression R(x) and a segment [x′, x″] such that x′, x″ ∈ Q and α is the only maximum of R(x) on [x′, x″]. Second, we construct a graphical game whose Nash equilibria can be parameterized by u ∈ [x′, x″], so that at the equilibrium that corresponds to u the total payoff is R(u) (up to an additive constant) and, moreover, some player's strategy is u. It follows that to achieve the payoff-maximizing Nash equilibrium, this player has to play α. The details follow.

LEMMA 1. Given an algebraic number α ∈ [0, 1] with deg(α) = n, there exist K_2, . . . , K_{2n+2} ∈ Q and x′, x″ ∈ (0, 1) ∩ Q such that α is the only maximum of

R(x) = K_2/(x + 2) + · · · + K_{2n+2}/(x + 2n + 2)

on [x′, x″].

PROOF. Let P(x) be the minimal polynomial of α, i.e., a polynomial of degree n with rational coefficients whose leading coefficient is 1 such that P(α) = 0. Let A = {α_1, . . . , α_n} be the set of all roots of P(x). Consider the polynomial Q_1(x) = −P^2(x). It has the same roots as P(x), and moreover, for any x ∉ A we have Q_1(x) < 0. Hence, A is the set of all maxima of Q_1(x). Now, set R(x) = Q_1(x)/((x + 2)(x + 3) · · · (x + 2n + 2)). Observe that R(x) ≤ 0 for all x ∈ [0, 1] and R(x) = 0 if and only if Q_1(x) = 0. Hence, the set A is also the set of all maxima of R(x) on [0, 1]. Let d = min{|α_i − α| | α_i ∈ A, α_i ≠ α}, and set α′ = max{α − d/2, 0}, α″ = min{α + d/2, 1}. Clearly, α is the only zero (and hence, the only maximum) of R(x) on [α′, α″]. Let x′ and x″ be some rational numbers in (α′, α) and (α, α″), respectively; note that by excluding the endpoints of the intervals we ensure that x′, x″ ≠ 0, 1. As [x′, x″] ⊂ [α′, α″], we have that α is the only maximum of R(x) on [x′, x″]. As R(x) is a proper rational expression and all roots of its denominator are simple, by the partial fraction decomposition theorem, R(x) can be represented as R(x) = K_2/(x + 2) + · · · + K_{2n+2}/(x + 2n + 2), where K_2, . . . , K_{2n+2} are rational numbers.

Consider a graphical game on the path U_{−1} V_{−1} U_0 V_0 U_1 V_1 . . . U_{k−1} V_{k−1} U_k, where k = 2n + 2. Intuitively, we want each triple (U_{i−1}, V_{i−1}, U_i) to behave similarly to the players U, V, and W from the game described in the previous subsection. More precisely, we define the payoffs to the players in the following way.

• The payoff to U_{−1} is 0 no matter what everyone else does.
• The expected payoff to V_{−1} is 0 if it plays 0 and u_0 − (x″ − x′)u_{−1} − x′ if it plays 1, where u_0 and u_{−1} are the strategies of U_0 and U_{−1}, respectively.

• The expected payoff to V_0 is 0 if it plays 0 and u_1(u_0 + 1) − u_0 if it plays 1, where u_0 and u_1 are the strategies of U_0 and U_1, respectively.

• For each i = 1, . . . , k − 1, the expected payoff to V_i when it plays 0 is P_0(V_i) = A_i u_i u_{i+1} − A_i u_{i+1}, and the expected payoff to V_i when it plays 1 is P_1(V_i) = P_0(V_i) + u_{i+1}(2 − u_i) − 1, where A_i = −K_{i+1} and u_{i+1} and u_i are the strategies of U_{i+1} and U_i, respectively.

• For each i = 0, . . . , k, the payoff to U_i does not depend on V_i and is 1 if U_i and V_{i−1} select different actions and 0 otherwise.

We will now characterize the Nash equilibria of this game using a sequence of claims.

CLAIM 3. In all Nash equilibria of this game V_{−1} plays 1/2, and the strategies u_{−1} and u_0 satisfy u_0 = (x″ − x′)u_{−1} + x′. Consequently, in all Nash equilibria we have u_0 ∈ [x′, x″].

PROOF. The proof is similar to that of Claim 2. Let f(u_{−1}) = (x″ − x′)u_{−1} + x′. Clearly, the player V_{−1} is indifferent between playing 0 and 1 if and only if u_0 = f(u_{−1}). Suppose that v_{−1} < 1/2. Then U_0 strictly prefers to play 1, i.e., u_0 = 1, so we have P_1(V_{−1}) = P_0(V_{−1}) + 1 − (x″ − x′)u_{−1} − x′. As 1 − x″ ≤ 1 − (x″ − x′)u_{−1} − x′ ≤ 1 − x′ for u_{−1} ∈ [0, 1] and x″ < 1, we have P_1(V_{−1}) > P_0(V_{−1}), so V_{−1} prefers to play 1, a contradiction. Similarly, if v_{−1} > 1/2, the player U_0 strictly prefers to play 0, i.e., u_0 = 0, so we have P_1(V_{−1}) = P_0(V_{−1}) − (x″ − x′)u_{−1} − x′. As x′ < x″ and x′ > 0, we have P_1(V_{−1}) < P_0(V_{−1}), so V_{−1} prefers to play 0, a contradiction. Finally, if V_{−1} plays 1/2 but u_0 ≠ f(u_{−1}), player V_{−1} is not indifferent between 0 and 1, so he would deviate from playing 1/2. Also, note that f(0) = x′, f(1) = x″, and, moreover, f(u_{−1}) ∈ [x′, x″] if and only if u_{−1} ∈ [0, 1]. Hence, in all Nash equilibria of this game we have u_0 ∈ [x′, x″].

CLAIM 4. In all Nash equilibria of this game, for each i = 0, . . . , k − 1 we have v_i = 1/2, and the strategies of the players U_i and U_{i+1} satisfy u_{i+1} = f_i(u_i), where f_0(u) = u/(u + 1) and f_i(u) = 1/(2 − u) for i > 0.

PROOF. The proof of this claim is also similar to that of Claim 2. We use induction on i to prove that the statement of the claim is true and, additionally, that u_i ≠ 1 for i > 0.

For the base case i = 0, note that u_0 ≠ 0 by the previous claim (recall that x′, x″ are selected so that x′, x″ ≠ 0, 1) and consider the triple (U_0, V_0, U_1). Let v_0 be the strategy of V_0. First, suppose that v_0 > 1/2. Then U_1 strictly prefers to play 0, i.e., u_1 = 0. Then for V_0 we have P_1(V_0) = P_0(V_0) − u_0. As u_0 ≠ 0, we have P_1(V_0) < P_0(V_0), which implies v_0 = 0, a contradiction. Similarly, if v_0 < 1/2, player U_1 prefers to play 1, so we have u_1 = 1. Hence, P_1(V_0) = P_0(V_0) + 1. It follows that P_1(V_0) > P_0(V_0), which implies v_0 = 1, a contradiction. Finally, if v_0 = 1/2 but u_1 ≠ u_0/(u_0 + 1), player V_0 is not indifferent between 0 and 1, so he would deviate from playing 1/2. Moreover, as u_1 = u_0/(u_0 + 1) and u_0 ∈ [0, 1], we have u_1 ≠ 1.

The argument for the inductive step is similar. Namely, suppose that the statement is proved for all i′ < i and consider the triple (U_i, V_i, U_{i+1}). Let v_i be the strategy of V_i. First, suppose that v_i > 1/2. Then U_{i+1} strictly prefers to play 0, i.e., u_{i+1} = 0. Then for V_i we have P_1(V_i) = P_0(V_i) − 1, i.e., P_1(V_i) < P_0(V_i), which implies v_i = 0, a contradiction. Similarly, if v_i < 1/2, player U_{i+1} prefers to play 1, so we have u_{i+1} = 1.
Hence, P_1(V_i) = P_0(V_i) + 1 − u_i. By the inductive hypothesis, we have u_i < 1. Consequently, P_1(V_i) > P_0(V_i), which implies v_i = 1, a contradiction. Finally, if v_i = 1/2 but u_{i+1} ≠ 1/(2 − u_i), player V_i is not indifferent between 0 and 1, so he would deviate from playing 1/2. Moreover, as u_{i+1} = 1/(2 − u_i) and u_i < 1, we have u_{i+1} < 1.

CLAIM 5. Any strategy profile of the form (u_{−1}, 1/2, u_0, 1/2, u_1, 1/2, . . . , u_{k−1}, 1/2, u_k), where u_{−1} ∈ [0, 1], u_0 = (x″ − x′)u_{−1} + x′, u_1 = u_0/(u_0 + 1), and u_{i+1} = 1/(2 − u_i) for i ≥ 1, constitutes a Nash equilibrium.

PROOF. First, the player U_{−1}'s payoffs do not depend on other players' actions, so he is free to play any strategy in [0, 1]. As long as u_0 = (x″ − x′)u_{−1} + x′, player V_{−1} is indifferent between 0 and 1, so he is content to play 1/2; a similar argument applies to players V_0, . . . , V_{k−1}. Finally, for each i = 0, . . . , k, the payoffs of player U_i only depend on the strategy of player V_{i−1}. In particular, as long as v_{i−1} = 1/2, player U_i is indifferent between playing 0 and 1, so he can play any mixed strategy u_i ∈ [0, 1]. To complete the proof, note that (x″ − x′)u_{−1} + x′ ∈ [0, 1] for all u_{−1} ∈ [0, 1], u_0/(u_0 + 1) ∈ [0, 1] for all u_0 ∈ [0, 1], and 1/(2 − u_i) ∈ [0, 1] for all u_i ∈ [0, 1], so we have u_i ∈ [0, 1] for all i = 0, . . . , k.

Now, let us compute the total payoff under a strategy profile of the form given in Claim 5. The payoff to U_{−1} is 0, and the expected payoff to each of the U_i, i = 0, . . . , k, is 1/2. The expected payoffs to V_{−1} and V_0 are 0. Finally, for any i = 1, . . . , k − 1, the expected payoff to V_i is T_i = A_i u_i u_{i+1} − A_i u_{i+1}. It follows that to find a Nash equilibrium with the highest total payoff, we have to maximize Σ_{i=1}^{k−1} T_i subject to the conditions u_{−1} ∈ [0, 1], u_0 = (x″ − x′)u_{−1} + x′, u_1 = u_0/(u_0 + 1), and u_{i+1} = 1/(2 − u_i) for i = 1, . . . , k − 1. We would like to express Σ_{i=1}^{k−1} T_i as a function of u_0. To simplify notation, set u = u_0.

LEMMA 2. For i = 1, . . . , k, we have u_i = (u + i − 1)/(u + i).

PROOF. The proof is by induction on i. For i = 1, we have u_1 = u/(u + 1). Now, for i ≥ 2 suppose that u_{i−1} = (u + i − 2)/(u + i − 1). We have u_i = 1/(2 − u_{i−1}) = (u + i − 1)/(2u + 2i − 2 − u − i + 2) = (u + i − 1)/(u + i).

It follows that for i = 1, . . . , k − 1 we have

T_i = A_i · (u + i − 1)/(u + i) · (u + i)/(u + i + 1) − A_i · (u + i)/(u + i + 1) = −A_i · 1/(u + i + 1) = K_{i+1}/(u + i + 1).

Observe that as u_{−1} varies from 0 to 1, u varies from x′ to x″. Therefore, to maximize the total payoff, we have to choose u ∈ [x′, x″] so as to maximize

K_2/(u + 2) + · · · + K_k/(u + k) = R(u).

By construction, the only maximum of R(u) on [x′, x″] is α. It follows that in the payoff-maximizing Nash equilibrium of our game U_0 plays α.

Finally, note that the payoffs in our game are rational rather than integer. However, it is easy to see that we can multiply all payoffs to a player by their least common denominator without affecting his strategy. In the resulting game, all payoffs are integer. This concludes the proof of Theorem 2.
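The construction in Lemma 1 is entirely mechanical and can be reproduced for a concrete α. The sketch below (sympy; purely an illustration, with α = 1/√2 and therefore n = 2, and all names ours) builds R(x), reads off the partial-fraction coefficients K_2, . . . , K_{2n+2}, and checks numerically that α maximizes R on a small rational interval around it.

# Illustration of the construction in Lemma 1 (not part of the paper's algorithm).
import sympy as sp

x = sp.symbols('x')
alpha = 1 / sp.sqrt(2)                                  # a degree-2 algebraic number in [0, 1]
minpoly = sp.minimal_polynomial(alpha, x)               # 2*x**2 - 1
n = int(sp.degree(minpoly, x))
P = minpoly / sp.LC(minpoly, x)                         # monic minimal polynomial: x**2 - 1/2
Q1 = -P**2                                              # <= 0 everywhere, zero exactly at the roots of P
denominator = sp.Mul(*[x + j for j in range(2, 2 * n + 3)])   # (x+2)(x+3)...(x+2n+2)
R = sp.cancel(Q1 / denominator)

print(sp.apart(R, x))                                   # the partial-fraction coefficients K_2, ..., K_{2n+2}

# numerically confirm that alpha is the maximizer of R on a rational interval around it
lo, hi = sp.Rational(7, 10), sp.Rational(3, 4)          # a rational interval containing 1/sqrt(2)
samples = [lo + (hi - lo) * sp.Rational(j, 200) for j in range(201)]
best = max(samples, key=lambda t: R.subs(x, t))
print(float(best), float(alpha))                        # both are approximately 0.707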
4. APPROXIMATING THE SOCIALLY OPTIMAL NASH EQUILIBRIUM

We have seen that the Nash equilibrium that maximizes the social welfare may involve strategies that are not in Q. Hence, in this section we focus on finding a Nash equilibrium that is almost optimal from the social welfare perspective. We propose an algorithm that for any ε > 0 finds a Nash equilibrium whose total payoff is within ε of optimal. The running time of this algorithm is polynomial in 1/ε, n and Pmax (recall that Pmax is the maximum absolute value of an entry of a payoff matrix).

While the negative result of the previous section is for graphical games on paths, our algorithm applies to a wider range of scenarios. Namely, it runs in polynomial time on bounded-degree trees as long as the best response policy of each vertex, given its parent, can be represented as a union of a polynomial number of rectangles. Note that path graphs always satisfy this condition: in [9] we showed how to compute such a representation, given a graph with maximum degree 2. Consequently, for path graphs the running time of our algorithm is guaranteed to be polynomial. (Note that [9] exhibits a family of graphical games on bounded-degree trees for which the best response policies of some of the vertices, given their parents, have exponential size, when represented as unions of rectangles.) Due to space restrictions, in this version of the paper we present the algorithm for the case where the graph underlying the graphical game is a path. We then state our result for the general case; the proof can be found in the full version of this paper [10].

Suppose that s is a strategy profile for a graphical game G. That is, s assigns a mixed strategy to each vertex of G. Let EP_V(s) be the expected payoff of player V under s and let EP(s) = Σ_V EP_V(s). Let M(G) = max{EP(s) | s is a Nash equilibrium for G}.

THEOREM 3. Suppose that G is a graphical game on an n-vertex path. Then for any ε > 0 there is an algorithm that constructs a Nash equilibrium s′ for G that satisfies EP(s′) ≥ M(G) − ε. The running time of the algorithm is O(n^4 Pmax^3 / ε^3).

PROOF. Let {V1, . . . , Vn} be the set of all players. We start by constructing the best response policies for all Vi, i = 1, . . . , n − 1. As shown in [9], this can be done in time O(n^3).

Let N > 5n be a parameter to be selected later, set δ = 1/N, and define X = {jδ | j = 0, . . . , N}. We say that a point v ∈ [0, 1] is an event point for a player Vi if it is a Vi-event point for B(Vi, Vi−1) or B(Vi+1, Vi). For each player Vi, consider a finite set of strategies Xi given by Xi = X ∪ {v | v is an event point for Vi}. It has been shown in [9] that for any i = 2, . . . , n, the best response policy B(Vi, Vi−1) has at most 2n + 4 Vi-event points. As we require N > 5n, we have |Xi| ≤ 2N; assume without loss of generality that |Xi| = 2N. Order the elements of Xi in increasing order as x_i^1 = 0 < x_i^2 < · · · < x_i^{2N}. We will refer to the strategies in Xi as discrete strategies of player Vi; a strategy profile in which each player has a discrete strategy will be referred to as a discrete strategy profile.

We will now show that even if we restrict each player Vi to strategies from Xi, the players can still achieve a Nash equilibrium, and moreover, the best such Nash equilibrium (with respect to the social welfare) has total payoff at least M(G) − ε as long as N is large enough.

Let s be a strategy profile that maximizes social welfare. That is, let s = (s1, . . . , sn), where si is the mixed strategy of player Vi and EP(s) = M(G). For i = 1, . . . , n, let ti = max{x_i^j | x_i^j ≤ si}. First, we will show that the strategy profile t = (t1, . . . , tn) is a Nash equilibrium for G. Fix any i, 1 < i ≤ n, and let R = [v1, v2] × [u1, u2] be the rectangle in B(Vi, Vi−1) that contains (si, si−1). As v1 is a Vi-event point of B(Vi, Vi−1), we have v1 ≤ ti, so the point (ti, si−1) is inside R. Similarly, the point u1 is a Vi−1-event point of B(Vi, Vi−1), so we have u1 ≤ ti−1, and therefore the point (ti, ti−1) is inside R.
This means that for any i, 1 < i ≤ n, we have ti−1 ∈ pbr_{Vi−1}(ti), which implies that t = (t1, . . . , tn) is a Nash equilibrium for G. Now, let us estimate the expected loss in social welfare caused by playing t instead of s.

LEMMA 3. For any pair of strategy profiles t, s such that |ti − si| ≤ δ for all i, we have |EP_{Vi}(s) − EP_{Vi}(t)| ≤ 24 Pmax δ for any i = 1, . . . , n.

PROOF. Let P^i_{klm} be the payoff of the player Vi when he plays k, Vi−1 plays l, and Vi+1 plays m. Fix i = 1, . . . , n and for k, l, m ∈ {0, 1}, set

t^{klm} = t_{i−1}^k (1 − t_{i−1})^{1−k} · t_i^l (1 − t_i)^{1−l} · t_{i+1}^m (1 − t_{i+1})^{1−m},
s^{klm} = s_{i−1}^k (1 − s_{i−1})^{1−k} · s_i^l (1 − s_i)^{1−l} · s_{i+1}^m (1 − s_{i+1})^{1−m}.

We have

|EP_{Vi}(s) − EP_{Vi}(t)| ≤ Σ_{k,l,m∈{0,1}} |P^i_{klm}| · |t^{klm} − s^{klm}| ≤ 8 Pmax · max_{k,l,m} |t^{klm} − s^{klm}|.

We will now show that for any k, l, m ∈ {0, 1} we have |t^{klm} − s^{klm}| ≤ 3δ; clearly, this implies the lemma. Indeed, fix k, l, m ∈ {0, 1}. Set x = t_{i−1}^k (1 − t_{i−1})^{1−k}, x′ = s_{i−1}^k (1 − s_{i−1})^{1−k}, y = t_i^l (1 − t_i)^{1−l}, y′ = s_i^l (1 − s_i)^{1−l}, z = t_{i+1}^m (1 − t_{i+1})^{1−m}, z′ = s_{i+1}^m (1 − s_{i+1})^{1−m}. Observe that if k = 0 then x − x′ = (1 − t_{i−1}) − (1 − s_{i−1}), and if k = 1 then x − x′ = t_{i−1} − s_{i−1}, so |x − x′| ≤ δ. A similar argument shows |y − y′| ≤ δ and |z − z′| ≤ δ. Also, we have x, x′, y, y′, z, z′ ∈ [0, 1]. Hence,

|t^{klm} − s^{klm}| = |xyz − x′y′z′| = |xyz − x′yz + x′yz − x′y′z + x′y′z − x′y′z′| ≤ |x − x′|yz + |y − y′|x′z + |z − z′|x′y′ ≤ 3δ.

Lemma 3 implies Σ_{i=1}^n |EP_{Vi}(s) − EP_{Vi}(t)| ≤ 24 n Pmax δ, so by choosing δ < ε/(24 n Pmax), or, equivalently, setting N > 24 n Pmax/ε, we can ensure that the total expected payoff for the strategy profile t is within ε of optimal.

We will now show that we can find the best discrete Nash equilibrium (with respect to the social welfare) using dynamic programming. As t is a discrete strategy profile, this means that the strategy profile found by our algorithm will be at least as good as t.

Define m_i^{l,k} to be the maximum total payoff that V1, . . . , Vi−1 can achieve if each Vj, j ≤ i, chooses a strategy from Xj, for each j < i the strategy of Vj is a potential best response to the strategy of Vj+1, and, moreover, Vi−1 plays x_{i−1}^l and Vi plays x_i^k. If there is no way to choose the strategies for V1, . . . , Vi−1 to satisfy these conditions, we set m_i^{l,k} = −∞.

The values m_i^{l,k}, i = 1, . . . , n; k, l = 1, . . . , 2N, can be computed inductively, as follows. We have m_1^{l,k} = 0 for k, l = 1, . . . , 2N. Now, suppose that we have already computed m_j^{l,k} for all j < i; k, l = 1, . . . , 2N. To compute m_i^{l,k}, we first check if (x_i^k, x_{i−1}^l) ∈ B(Vi, Vi−1). If this is not the case, we have m_i^{l,k} = −∞. Otherwise, consider the set Y = X_{i−2} ∩ pbr_{Vi−2}(x_{i−1}^l), i.e., the set of all discrete strategies of Vi−2 that are potential best responses to x_{i−1}^l. The proof of Theorem 1 in [9] implies that the set pbr_{Vi−2}(x_{i−1}^l) is non-empty: the player Vi−2 has a potential best response to any strategy of Vi−1, in particular, to x_{i−1}^l. By construction of the set X_{i−2}, this implies that Y is not empty. For each x_{i−2}^j ∈ Y, let p_{jlk} be the payoff that Vi−1 receives when Vi−2 plays x_{i−2}^j, Vi−1 plays x_{i−1}^l, and Vi plays x_i^k. Clearly, p_{jlk} can be computed in constant time. Then we have m_i^{l,k} = max{m_{i−1}^{j,l} + p_{jlk} | x_{i−2}^j ∈ Y}.

Finally, suppose that we have computed m_n^{l,k} for l, k = 1, . . . , 2N. We still need to take into account the payoff of player Vn. Hence, we consider all pairs (x_n^k, x_{n−1}^l) that satisfy x_{n−1}^l ∈ pbr_{Vn−1}(x_n^k), and pick the one that maximizes the sum of m_n^{l,k} and the payoff of Vn when he plays x_n^k and Vn−1 plays x_{n−1}^l. This results in the maximum total payoff the players can achieve in a Nash equilibrium using discrete strategies; the actual strategy profile that produces this payoff can be reconstructed using standard dynamic programming techniques.

It is easy to see that each m_i^{l,k} can be computed in time O(N), i.e., all of them can be computed in time O(nN^3). Recall that we have to select N ≥ 24 n Pmax/ε to ensure that the strategy profile we output has total payoff that is within ε of optimal. We conclude that we can compute an ε-approximation to the best Nash equilibrium in time O(n^4 Pmax^3 / ε^3). This completes the proof of Theorem 3.
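For concreteness, here is a minimal sketch of this dynamic program (Python; the interfaces X, in_policy and payoff are hypothetical stand-ins for the output of the upstream pass and for the payoff matrices, and players are indexed from 0). It computes only the optimal value; the profile itself can be recovered by additionally recording the maximizing choices. Each entry is filled by scanning at most |X_{i−2}| candidates, which is where the O(nN^3) bound comes from.

def best_discrete_equilibrium_value(n, X, in_policy, payoff):
    """Value of the best discrete Nash equilibrium on a path of n players.

    X[i]               -- sorted list of discrete strategies of player V_i
    in_policy(i, v, w) -- True iff (v, w) lies in a rectangle of B(V_i, V_{i-1}),
                          i.e. w is a potential best response of V_{i-1} to v
    payoff(i, a, b, c) -- expected payoff to V_i when V_{i-1}, V_i, V_{i+1} play
                          a, b, c (a is None for i = 0, c is None for i = n - 1)
    """
    # m[(l, k)] = best total payoff of V_0, ..., V_{i-1} given that V_{i-1} plays
    # X[i-1][l] and V_i plays X[i][k]; infeasible pairs are simply absent.
    m = {(l, k): payoff(0, None, X[0][l], X[1][k])
         for l in range(len(X[0])) for k in range(len(X[1]))
         if in_policy(1, X[1][k], X[0][l])}

    for i in range(2, n):
        new_m = {}
        for k, xi in enumerate(X[i]):
            for l, xprev in enumerate(X[i - 1]):
                if not in_policy(i, xi, xprev):
                    continue
                candidates = [m[(j, l)] + payoff(i - 1, xpp, xprev, xi)
                              for j, xpp in enumerate(X[i - 2]) if (j, l) in m]
                if candidates:
                    new_m[(l, k)] = max(candidates)
        m = new_m

    # close off with the payoff of the last player V_{n-1}
    return max(val + payoff(n - 1, X[n - 2][l], X[n - 1][k], None)
               for (l, k), val in m.items())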
To state our result for the general case (i.e., when the underlying graph is a bounded-degree tree rather than a path), we need additional notation. If G has n players, let q(n) be an upper bound on the number of event points in the representation of any best response policy. That is, we assume that for any vertex U with parent V, B(V, U) has at most q(n) event points. We will be interested in the situation in which q(n) is polynomial in n.

THEOREM 4. Let G be an n-player graphical game on a tree in which each node has at most Δ children. Suppose we are given a set of best response policies for G in which each best response policy B(V, U) is represented by a set of rectangles with at most q(n) event points. For any ε > 0, there is an algorithm that constructs a Nash equilibrium s′ for G that satisfies EP(s′) ≥ M(G) − ε. The running time of the algorithm is polynomial in n, Pmax and ε^{−1} provided that the tree has bounded degree (that is, Δ = O(1)) and q(n) is a polynomial in n. In particular, if N = max((Δ + 1)q(n) + 1, n·2^{Δ+2}(Δ + 2)·Pmax·ε^{−1}) and Δ > 1, then the running time is O(nΔ(2N)^Δ).

For the proof of this theorem, see [10].

4.1 A polynomial-time algorithm for multiplicative approximation

The running time of our algorithm is pseudopolynomial rather than polynomial, because it includes a factor which is polynomial in Pmax, the maximum (in absolute value) entry in any payoff matrix. If we are interested in a multiplicative approximation rather than an additive one, this can be improved to polynomial.

First, note that we cannot expect a multiplicative approximation for all inputs. That is, we cannot hope to have an algorithm that computes a Nash equilibrium with total payoff at least (1 − ε)M(G). If we had such an algorithm, then for graphical games G with M(G) = 0, the algorithm would be required to output the optimal solution. To show that this is infeasible, observe that we can use the techniques of Section 3.2 to construct two integer-payoff graphical games on paths of length O(n) such that for some X ∈ R the maximal total payoff in the first game is X, the maximal total payoff in the second game is −X, and for both games, the strategy profiles that achieve the maximal total payoffs involve algebraic numbers of degree n. By combining the two games so that the first vertex of the second game becomes connected to the last vertex of the first game, but the payoffs of all players do not change, we obtain a graphical game in which the best Nash equilibrium has total payoff 0, yet the strategies that lead to this payoff have high algebraic complexity.

However, we can achieve a multiplicative approximation when all entries of the payoff matrices are positive and the ratio between any two entries is polynomially bounded. Recall that we assume that all payoffs are integer, and let Pmin > 0 be the smallest entry of any payoff matrix.
In this case, for any strategy profile the payoff to player Vi is at least Pmin, so the total payoff in the social welfare-maximizing Nash equilibrium s satisfies M(G) ≥ nPmin. Moreover, Lemma 3 implies that by choosing δ < εPmin/(24Pmax), we can ensure that the Nash equilibrium t produced by our algorithm satisfies

Σ_{i=1}^n EP_{Vi}(s) − Σ_{i=1}^n EP_{Vi}(t) ≤ 24 Pmax δ n ≤ ε n Pmin ≤ ε M(G),

i.e., for this value of δ we have Σ_{i=1}^n EP_{Vi}(t) ≥ (1 − ε)M(G). Recall that the running time of our algorithm is O(nN^3), where N has to be selected to satisfy N > 5n, N = 1/δ. It follows that if Pmin > 0 and Pmax/Pmin = poly(n), we can choose N so that our algorithm provides a multiplicative approximation guarantee and runs in time polynomial in n and 1/ε.

5. BOUNDED PAYOFF NASH EQUILIBRIA

Another natural way to define what is a good Nash equilibrium is to require that each player's expected payoff exceeds a certain threshold. These thresholds do not have to be the same for all players. In this case, in addition to the payoff matrices of the n players, we are given n numbers T1, . . . , Tn, and our goal is to find a Nash equilibrium in which the payoff of player i is at least Ti, or report that no such Nash equilibrium exists. It turns out that we can design an FPTAS for this problem using the same techniques as in the previous section.

THEOREM 5. Given a graphical game G on an n-vertex path and n rational numbers T1, . . . , Tn, suppose that there exists a strategy profile s such that s is a Nash equilibrium for G and EP_{Vi}(s) ≥ Ti for i = 1, . . . , n. Then for any ε > 0 we can find in time O(max{n Pmax^3/ε^3, n^4/ε^3}) a strategy profile s′ such that s′ is a Nash equilibrium for G and EP_{Vi}(s′) ≥ Ti − ε for i = 1, . . . , n.

PROOF. The proof is similar to that of Theorem 3. First, we construct the best response policies for all players, choose N > 5n, and construct the sets Xi, i = 1, . . . , n, as described in the proof of Theorem 3.

Consider a strategy profile s such that s is a Nash equilibrium for G and EP_{Vi}(s) ≥ Ti for i = 1, . . . , n. We construct the strategy profile t given by ti = max{x_i^j | x_i^j ≤ si} and use the same argument as in the proof of Theorem 3 to show that t is a Nash equilibrium for G. By Lemma 3, we have |EP_{Vi}(s) − EP_{Vi}(t)| ≤ 24 Pmax δ, so choosing δ < ε/(24Pmax), or, equivalently, N > max{5n, 24Pmax/ε}, we can ensure EP_{Vi}(t) ≥ Ti − ε for i = 1, . . . , n.

Now, we will use dynamic programming to find a discrete Nash equilibrium that satisfies EP_{Vi}(t) ≥ Ti − ε for i = 1, . . . , n. As t is a discrete strategy profile, our algorithm will succeed whenever there is a Nash equilibrium s with EP_{Vi}(s) ≥ Ti for i = 1, . . . , n.

Let z_i^{l,k} = 1 if there is a discrete strategy profile such that for any j < i the strategy of the player Vj is a potential best response to the strategy of Vj+1, the expected payoff of Vj is at least Tj − ε, and, moreover, Vi−1 plays x_{i−1}^l and Vi plays x_i^k. Otherwise, let z_i^{l,k} = 0. We can compute z_i^{l,k}, i = 1, . . . , n; k, l = 1, . . . , 2N, inductively, as follows.

We have z_1^{l,k} = 1 for k, l = 1, . . . , 2N. Now, suppose that we have already computed z_j^{l,k} for all j < i; k, l = 1, . . . , 2N. To compute z_i^{l,k}, we first check if (x_i^k, x_{i−1}^l) ∈ B(Vi, Vi−1). If this is not the case, clearly, z_i^{l,k} = 0. Otherwise, consider the set Y = X_{i−2} ∩ pbr_{Vi−2}(x_{i−1}^l), i.e., the set of all discrete strategies of Vi−2 that are potential best responses to x_{i−1}^l. It has been shown in the proof of Theorem 3 that Y ≠ ∅. For each x_{i−2}^j ∈ Y, let p_{jlk} be the payoff that Vi−1 receives when Vi−2 plays x_{i−2}^j, Vi−1 plays x_{i−1}^l, and Vi plays x_i^k. Clearly, p_{jlk} can be computed in constant time. If there exists an x_{i−2}^j ∈ Y such that z_{i−1}^{j,l} = 1 and p_{jlk} ≥ Ti−1 − ε, set z_i^{l,k} = 1. Otherwise, set z_i^{l,k} = 0.

Having computed z_n^{l,k}, l, k = 1, . . . , 2N, we check if z_n^{l,k} = 1 for some pair (l, k). If such a pair of indices exists, we instruct Vn to play x_n^k and use dynamic programming techniques (or, equivalently, the downstream pass of the algorithm of [13]) to find a Nash equilibrium s′ that satisfies EP_{Vi}(s′) ≥ Ti − ε for i = 1, . . . , n (recall that Vn is a dummy player, i.e., we assume Tn = 0 and EP_{Vn}(s′) = 0 for any choice of s′). If z_n^{l,k} = 0 for all l, k = 1, . . . , 2N, there is no discrete Nash equilibrium s′ that satisfies EP_{Vi}(s′) ≥ Ti − ε for i = 1, . . . , n and hence no Nash equilibrium s (not necessarily discrete) such that EP_{Vi}(s) ≥ Ti for i = 1, . . . , n. The running time analysis is similar to that for Theorem 3; we conclude that the running time of our algorithm is O(nN^3) = O(max{n Pmax^3/ε^3, n^4/ε^3}).

REMARK 1. Theorem 5 can be extended to trees of bounded degree in the same way as Theorem 4.
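Relative to the sketch given after Theorem 3, the only changes are that the table stores feasibility bits rather than partial sums, and that each transition also checks the threshold of the player whose payoff becomes fixed. A minimal variant, under the same hypothetical interfaces and with thresholds T and accuracy eps:

def exists_discrete_equilibrium_with_thresholds(n, X, in_policy, payoff, T, eps):
    """True iff some discrete Nash equilibrium gives every V_i at least T[i] - eps."""
    # z holds the feasible pairs (l, k): V_{i-1} plays X[i-1][l], V_i plays X[i][k],
    # and V_0, ..., V_{i-1} can be completed so that each is a potential best
    # response to its parent and earns at least its threshold minus eps.
    z = {(l, k) for l in range(len(X[0])) for k in range(len(X[1]))
         if in_policy(1, X[1][k], X[0][l])
         and payoff(0, None, X[0][l], X[1][k]) >= T[0] - eps}

    for i in range(2, n):
        z = {(l, k)
             for k, xi in enumerate(X[i]) for l, xprev in enumerate(X[i - 1])
             if in_policy(i, xi, xprev)
             and any((j, l) in z and payoff(i - 1, xpp, xprev, xi) >= T[i - 1] - eps
                     for j, xpp in enumerate(X[i - 2]))}

    return any(payoff(n - 1, X[n - 2][l], X[n - 1][k], None) >= T[n - 1] - eps
               for (l, k) in z)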
5.1 Exact Computation

Another approach to finding Nash equilibria with bounded payoffs is based on inductively computing subsets of the best response policies of all players so as to exclude the points that do not provide sufficient payoffs to some of the players. Formally, we say that a strategy v of the player V is a potential best response to a strategy w of its parent W with respect to a threshold vector T = (T1, . . . , Tn) (denoted by v ∈ pbr_V(w, T)) if there is an equilibrium in the instance G_{(V,W),W=w} in which V plays mixed strategy v and the payoff to any player Vi downstream of V (including V) is at least Ti. The best response policy for V with respect to a threshold vector T is defined as B(W, V, T) = {(w, v) | v ∈ pbr_V(w, T), w ∈ [0, 1]}.

It is easy to see that if any of the sets B(Vj, Vj−1, T), j = 1, . . . , n, is empty, then it is impossible to provide all players with the expected payoffs prescribed by T. Otherwise, one can apply the downstream pass of the original algorithm of [13] to find a Nash equilibrium. As we assume that Vn is a dummy vertex whose payoff is identically 0, the Nash equilibrium with these payoffs exists as long as Tn ≤ 0 and B(Vn, Vn−1, T) is not empty. Using the techniques developed in [9], it is not hard to show that for any j = 1, . . . , n, the set B(Vj, Vj−1, T) consists of a finite number of rectangles, and one can compute B(Vj+1, Vj, T) given B(Vj, Vj−1, T).

The advantage of this approach is that it allows us to represent all Nash equilibria that provide the required payoffs to the players. However, it is not likely to be practical, since it turns out that the rectangles that appear in the representation of B(Vj, Vj−1, T) may have irrational coordinates.

CLAIM 6. There exists a graphical game G on a 3-vertex path UVW and a vector T = (T1, T2, T3) such that B(W, V, T) cannot be represented as a union of a finite number of rectangles with rational coordinates.

PROOF. We define the payoffs to the players in G as follows. The payoff to U is identically 0, i.e., P_0(U) = P_1(U) = 0. Using Claim 1, we select the payoffs to V so that P_0(V) = uw and P_1(V) = P_0(V) + w − 0.8u − 0.1, where u and w are the (mixed) strategies of U and W, respectively.
It follows that V is indifferent between playing 0 and 1 if and only if w = f(u) = 0.8u + 0.1; observe that for any u ∈ [0, 1] we have f(u) ∈ [0, 1]. It is not hard to see that we have B(W, V) = [0, 0.1]×{0} ∪ [0.1, 0.9]×[0, 1] ∪ [0.9, 1]×{1}. The payoffs to W are not important for our construction; for example, set P_0(W) = P_1(W) = 0.

Now, set T = (0, 1/8, 0), i.e., we are interested in Nash equilibria in which V's expected payoff is at least 1/8. Suppose w ∈ [0, 1]. The player V can play a mixed strategy v when W is playing w as long as U plays u = f^{−1}(w) = 5w/4 − 1/8 (to ensure that V is indifferent between 0 and 1) and P_0(V) = P_1(V) = uw = w(5w/4 − 1/8) ≥ 1/8. The latter condition is satisfied if w ≤ (1 − √41)/20 < 0 or w ≥ (1 + √41)/20. Note that we have 0.1 < (1 + √41)/20 < 0.9. For any other value of w, any strategy of U either makes V prefer one of the pure strategies or does not provide it with a sufficient expected payoff. There are also some values of w for which V can play a pure strategy (0 or 1) as a potential best response to W and guarantee itself an expected payoff of at least 1/8; it can be shown that these values of w form a finite number of segments in [0, 1]. We conclude that any representation of B(W, V, T) as a union of a finite number of rectangles must contain a rectangle of the form [(1 + √41)/20, w′]×[v′, v″] for some w′, v′, v″ ∈ [0, 1].

On the other hand, it can be shown that for any integer payoff matrices and threshold vectors and any j = 1, . . . , n − 1, the sets B(Vj+1, Vj, T) contain no rectangles of the form [u′, u″]×{v} or {v}×[w′, w″], where v ∈ R \ Q. This means that if B(Vn, Vn−1, T) is non-empty, i.e., there is a Nash equilibrium with the payoffs prescribed by T, then the downstream pass of the algorithm of [13] can always pick a strategy profile that forms a Nash equilibrium, provides a payoff of at least Ti to each player Vi, and has no irrational coordinates. Hence, unlike in the case of the Nash equilibrium that maximizes the social welfare, working with irrational numbers is not necessary, and the fact that the algorithm discussed in this section has to do so can be seen as an argument against using this approach.

6. OTHER CRITERIA FOR SELECTING A NASH EQUILIBRIUM

In this section, we consider several other criteria that can be useful in selecting a Nash equilibrium.

6.1 Combining welfare maximization with bounds on payoffs

In many real-life scenarios, we want to maximize the social welfare subject to certain restrictions on the payoffs to individual players. For example, we may want to ensure that no player gets a negative expected payoff, or that the expected payoff to player i is at least P^i_max − ξ, where P^i_max is the maximum entry of i's payoff matrix and ξ is a fixed parameter.

Formally, given a graphical game G and a vector T1, . . . , Tn, let S be the set of all Nash equilibria s of G that satisfy Ti ≤ EP_{Vi}(s) for i = 1, . . . , n, and let ŝ = argmax_{s∈S} EP(s). If the set S is non-empty, we can find a Nash equilibrium that is ε-close to satisfying the payoff bounds and is within ε of ŝ with respect to the total payoff by combining the algorithms of Section 4 and Section 5. Namely, for a given ε > 0, choose δ as in the proof of Theorem 3, and let Xi be the set of all discrete strategies of player Vi (for a formal definition, see the proof of Theorem 3). Combining the proofs of Theorem 3 and Theorem 5, we can see that the strategy profile t̂ given by t̂i = max{x_i^j | x_i^j ≤ ŝi} satisfies EP_{Vi}(t̂) ≥ Ti − ε and |EP(ŝ) − EP(t̂)| ≤ ε.
Define m̂_i^{l,k} to be the maximum total payoff that V1, . . . , Vi−1 can achieve if each Vj, j ≤ i, chooses a strategy from Xj, for each j < i the strategy of Vj is a potential best response to the strategy of Vj+1 and the payoff to player Vj is at least Tj − ε, and, moreover, Vi−1 plays x_{i−1}^l and Vi plays x_i^k. If there is no way to choose the strategies for V1, . . . , Vi−1 to satisfy these conditions, we set m̂_i^{l,k} = −∞. The m̂_i^{l,k} can be computed by dynamic programming similarly to the m_i^{l,k} and z_i^{l,k} in the proofs of Theorems 3 and 5. Finally, as in the proof of Theorem 3, we use m̂_n^{l,k} to select the best discrete Nash equilibrium subject to the payoff constraints.

Even more generally, we may want to maximize the total payoff to a subset of players (who are assumed to be able to redistribute the profits fairly among themselves) while guaranteeing certain expected payoffs to (a subset of) the other players. This problem can be handled similarly.

6.2 A minimax approach

A more egalitarian measure of the quality of a Nash equilibrium is the minimal expected payoff to a player. The optimal solution with respect to this measure is a Nash equilibrium in which the minimal expected payoff to a player is maximal. To find an approximation to such a Nash equilibrium, we can combine the algorithm of Section 5 with binary search on the space of potential lower bounds.

Note that the expected payoff to any player Vi under any strategy profile s satisfies −Pmax ≤ EP_{Vi}(s) ≤ Pmax. For a fixed ε > 0, we start by setting T′ = −Pmax, T″ = Pmax, T* = (T′ + T″)/2. We then run the algorithm of Section 5 with T1 = · · · = Tn = T*. If the algorithm succeeds in finding a Nash equilibrium s′ that satisfies EP_{Vi}(s′) ≥ T* − ε for all i = 1, . . . , n, we set T′ = T*, T* = (T′ + T″)/2; otherwise, we set T″ = T*, T* = (T′ + T″)/2, and loop. We repeat this process until |T″ − T′| ≤ ε. It is not hard to check that for any p ∈ R, if there is a Nash equilibrium s such that min_{i=1,...,n} EP_{Vi}(s) ≥ p, then our algorithm outputs a Nash equilibrium s′ that satisfies min_{i=1,...,n} EP_{Vi}(s′) ≥ p − 2ε. The running time of our algorithm is O(max{n Pmax^3 log(1/ε)/ε^3, n^4 log(1/ε)/ε^3}).
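In code, the minimax search is a short loop around the threshold algorithm of Theorem 5. The sketch below assumes a hypothetical routine bounded_payoff_equilibrium(T, eps) implementing that algorithm (returning an exact Nash equilibrium whose payoffs are at least T_i − ε for all i, or None when no Nash equilibrium meets the thresholds T); everything else is standard binary search.

def maximin_equilibrium(n, p_max, eps, bounded_payoff_equilibrium):
    """Approximately maximize the minimum expected payoff over Nash equilibria.

    bounded_payoff_equilibrium(T, eps) is assumed to behave as in Theorem 5:
    it returns an exact Nash equilibrium with payoffs >= T[i] - eps for all i,
    or None if no Nash equilibrium satisfies the thresholds T exactly.
    """
    lo, hi = -p_max, p_max
    best = None
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        candidate = bounded_payoff_equilibrium([mid] * n, eps)
        if candidate is not None:
            best, lo = candidate, mid       # a guarantee of mid - eps is achievable; try higher
        else:
            hi = mid                        # no Nash equilibrium guarantees mid to everyone
    return best

If some Nash equilibrium guarantees every player at least p, thresholds up to p are never rejected by the loop, so the returned equilibrium guarantees at least p − 2ε, matching the bound stated above.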
6.3 Equalizing the payoffs

When the players' payoff matrices are not very different, it is reasonable to demand that the expected payoffs to the players do not differ by much either. We will now show that Nash equilibria in this category can be approximated in polynomial time as well.

Indeed, observe that the algorithm of Section 5 can easily be modified to deal with upper bounds on individual payoffs rather than lower bounds. Moreover, we can efficiently compute an approximation to a Nash equilibrium that satisfies both an upper bound and a lower bound for each player. More precisely, suppose that we are given a graphical game G, 2n rational numbers T1, . . . , Tn, T′1, . . . , T′n, and ε > 0. Then if there exists a strategy profile s such that s is a Nash equilibrium for G and Ti ≤ EP_{Vi}(s) ≤ T′i for i = 1, . . . , n, we can find a strategy profile s′ such that s′ is a Nash equilibrium for G and Ti − ε ≤ EP_{Vi}(s′) ≤ T′i + ε for i = 1, . . . , n. The modified algorithm also runs in time O(max{n Pmax^3/ε^3, n^4/ε^3}).

This observation allows us to approximate Nash equilibria in which all players' expected payoffs differ by at most ξ, for any fixed ξ > 0. Given an ε > 0, we set T1 = · · · = Tn = −Pmax, T′1 = · · · = T′n = −Pmax + ξ + ε, and run the modified version of the algorithm of Section 5. If it fails to find a solution, we increment all Ti, T′i by ε and loop. We continue until the algorithm finds a solution, or Ti ≥ Pmax.

Suppose that there exists a Nash equilibrium s that satisfies |EP_{Vi}(s) − EP_{Vj}(s)| ≤ ξ for all i, j = 1, . . . , n. Set r = min_{i=1,...,n} EP_{Vi}(s); we have r ≤ EP_{Vi}(s) ≤ r + ξ for all i = 1, . . . , n. There exists a k ≥ 0 such that −Pmax + (k − 1)ε ≤ r ≤ −Pmax + kε. During the kth step of the algorithm, we set T1 = · · · = Tn = −Pmax + (k − 1)ε, i.e., we have r − ε ≤ Ti ≤ r and r + ξ ≤ T′i ≤ r + ξ + ε. That is, the Nash equilibrium s satisfies Ti ≤ r ≤ EP_{Vi}(s) ≤ r + ξ ≤ T′i, which means that when Ti is set to −Pmax + (k − 1)ε, our algorithm is guaranteed to output a Nash equilibrium t that satisfies r − 2ε ≤ Ti − ε ≤ EP_{Vi}(t) ≤ T′i + ε ≤ r + ξ + 2ε. We conclude that whenever such a Nash equilibrium s exists, our algorithm outputs a Nash equilibrium t that satisfies |EP_{Vi}(t) − EP_{Vj}(t)| ≤ ξ + 4ε for all i, j = 1, . . . , n. The running time of this algorithm is O(max{n Pmax^3/ε^4, n^4/ε^4}).

Note also that we can find the smallest ξ for which such a Nash equilibrium exists by combining this algorithm with binary search over the space ξ ∈ [0, 2Pmax]. This identifies an approximation to the fairest Nash equilibrium, i.e., one in which the players' expected payoffs differ by the smallest possible amount. Finally, note that all results in this section can be extended to bounded-degree trees.

7. CONCLUSIONS

We have studied the problem of equilibrium selection in graphical games on bounded-degree trees. We considered several criteria for selecting a Nash equilibrium, such as maximizing the social welfare, ensuring a lower bound on the expected payoff of each player, etc. First, we focused on the algebraic complexity of a social welfare-maximizing Nash equilibrium, and proved strong negative results for that problem. Namely, we showed that even for graphical games on paths, any algebraic number α ∈ [0, 1] may be the only strategy available to some player in all social welfare-maximizing Nash equilibria. This is in sharp contrast with the fact that graphical games on trees always possess a Nash equilibrium in which all players' strategies are rational numbers.

We then provided approximation algorithms for selecting Nash equilibria with special properties. While the problem of finding approximate Nash equilibria for various classes of games has received a lot of attention in recent years, most of the existing work aims to find ε-Nash equilibria that satisfy (or are ε-close to satisfying) certain properties. Our approach is different in that we insist on outputting an exact Nash equilibrium which is ε-close to satisfying a given requirement. As argued in the introduction, there are several reasons to prefer a solution that constitutes an exact Nash equilibrium. Our algorithms are fully polynomial time approximation schemes, i.e., their running time is polynomial in the inverse of the approximation parameter ε, though they may be pseudopolynomial with respect to the input size. Under mild restrictions on the inputs, they can be modified to be truly polynomial. This is the strongest positive result one can derive for a problem whose exact solutions may be hard to represent, as is the case for many of the problems considered here. While we prove our results for games on a path, they can be generalized to any tree for which the best response policies have compact representations as unions of rectangles. In the full version of the paper we describe our algorithms for the general case.
Further work in this vein could include extensions to the kinds of guarantees sought for Nash equilibria, such as guaranteeing total payoffs for subsets of players, selecting equilibria in which some players receive significantly higher payoffs than their peers, etc. At the moment, however, it is perhaps more important to investigate whether Nash equilibria of graphical games can be computed in a decentralized manner, in contrast to the algorithms we have introduced here.

It is natural to ask if our results or those of [9] can be generalized to games with three or more actions. However, it seems that this will make the analysis significantly more difficult. In particular, note that one can view the bounded-payoff games as a very limited special case of games with three actions per player. Namely, given a two-action game with payoff bounds, consider a game in which each player Vi has a third action that guarantees him a payoff of Ti no matter what everyone else does. Then checking if there is a Nash equilibrium in which none of the players assigns a nonzero probability to his third action is equivalent to checking if there exists a Nash equilibrium that satisfies the payoff bounds in the original game, and Section 5.1 shows that finding an exact solution to this problem requires new ideas.

Alternatively, it may be interesting to look for similar results in the context of correlated equilibria (CE), especially since the best CE may have higher value (total expected payoff) than the best NE. The ratio between these values is called the mediation value in [1]. It is known from [1] that the mediation value of 2-player, 2-action games with non-negative payoffs is at most 4/3, and they exhibit a 3-player game for which it is infinite. Furthermore, a 2-player, 3-action example from [1] also has infinite mediation value.

8. REFERENCES

[1] I. Ashlagi, D. Monderer and M. Tennenholtz, On the Value of Correlation, Proceedings of Dagstuhl seminar 05011 (2005)
[2] R. Aumann, Subjectivity and Correlation in Randomized Strategies, Journal of Mathematical Economics 1, pp. 67-96 (1974)
[3] B. Blum, C. R. Shelton, and D. Koller, A Continuation Method for Nash Equilibria in Structured Games, Proceedings of IJCAI'03
[4] X. Chen, X. Deng and S. Teng, Computing Nash Equilibria: Approximation and Smoothed Complexity, Proceedings of FOCS'06
[5] X. Chen and X. Deng, Settling the Complexity of 2-Player Nash-Equilibrium, Proceedings of FOCS'06
[6] V. Conitzer and T. Sandholm, Complexity Results about Nash Equilibria, Proceedings of IJCAI'03
[7] C. Daskalakis, P. W. Goldberg and C. H. Papadimitriou, The Complexity of Computing a Nash Equilibrium, Proceedings of STOC'06
[8] R. S. Datta, Universality of Nash Equilibria, Mathematics of Operations Research 28:3, 2003
[9] E. Elkind, L. A. Goldberg, and P. W. Goldberg, Nash Equilibria in Graphical Games on Trees Revisited, Proceedings of ACM EC'06
[10] E. Elkind, L. A. Goldberg, and P. W. Goldberg, Computing Good Nash Equilibria in Graphical Games, http://arxiv.org/abs/cs.GT/0703133
[11] I. Gilboa and E. Zemel, Nash and Correlated Equilibria: Some Complexity Considerations, Games and Economic Behavior 1, pp. 80-93 (1989)
[12] P. W. Goldberg and C. H. Papadimitriou, Reducibility Among Equilibrium Problems, Proceedings of STOC'06
[13] M. Kearns, M. Littman, and S. Singh, Graphical Models for Game Theory, Proceedings of UAI'01
[14] M. Littman, M. Kearns, and S. Singh, An Efficient Exact Algorithm for Singly Connected Graphical Games, Proceedings of NIPS'01
[15] R. Lipton and E. Markakis, Nash Equilibria via Polynomial Equations, Proceedings of LATIN'04
[16] L. Ortiz and M. Kearns, Nash Propagation for Loopy Graphical Games, Proceedings of NIPS'03
[17] C. H. Papadimitriou, Computing Correlated Equilibria in Multi-Player Games, Proceedings of STOC'05
[18] C. H. Papadimitriou and T. Roughgarden, Computing Equilibria in Multi-Player Games, Proceedings of SODA'05
[19] D. Vickrey and D. Koller, Multi-agent Algorithms for Solving Graphical Games, Proceedings of AAAI'02