Journal of Experimental Psychology
1961, Vol. 61, No. 6, 462-468
THE EMPIRICAL DETERMINATION OF GAME-THEORETICAL STRATEGIES 1
HERBERT KAUFMAN AND GORDON M. BECKER 2
Electric Boat Division, General Dynamics Corporation
The present paper reports a study
of the behavior of naive subjects in
a strategy-making situation where a
normative solution, i.e., game theory,
can be stated. In simplest game
theory terms, Ss were required to
find optimal strategies in a two person
zero sum game under conditions to be
described. The game theory solution
provides a logical reference or ideal
to which the actual game playing is
compared. The procedure provides
a dynamic behavioral analysis of
decision problems usually handled by
the game theory model.
Atkinson and Suppes (1958, 1959),
Lieberman (1959), and Simon (1956)
have previously reported empirical
studies involving game theory con-
cepts. These studies, almost uni-
versally, are concerned with behavior
in game situations where individual
moves, rather than strategies, are
made at each trial. The present
paper represents a radical departure
from this usual procedure in that the
individual playing a matrix game is
required to indicate a game strategy
at every trial, i.e., to select 100 moves
at each trial, rather than simply one
move.
In order to explain the experimental
procedure, some characteristics of the
game theory model will be stated:
(a) In the two person, two choice, zero sum game represented by a 2 × 2 matrix there exist optimal strategies for each player. (b) If either player plays his optimal strategy the play of the other has no effect on the average payoff. (c) Playing an iterated game involves, for each player, both a decision about the relative frequencies of play for the two choices, the game strategy, and a decision about the sequence of play, the game tactics. In the absence of any predictive criteria on the play of the opponent, a random sequence of the choices is to be preferred in order to guarantee the expected payoff for a given strategy.
1 This research was done as part of the Basic Research Program of General Dynamics Corporation. The authors wish to thank Mortimer H. Applezweig for his help in providing Connecticut College psychology laboratory facilities and students.
2 Now at the University of California, Los Angeles.
In the experiment to be described,
the strategy-making aspect of the
game decisions was isolated by re-
moving the burden of tactical play
from Ss. The major questions to be
answered were: (a) Do naive Ss learn,
through the experience of playing a
game, to find and maintain a strategy,
optimal or otherwise? (b) Do games
having the same values (average
payoff for optimal strategy), but
different optimal strategies, differ
in ease of learning?
METHOD
Strategy decisions were emphasized by
forcing Ss to choose relative frequencies of
play, and not allowing them to decide upon
the sequence of play. At each trial S divided
100 choices between his two alternatives.
Each block of 100 choices amounts to a
strategy. Thus, at each trial S announced
the strategy he wanted on that trial, e.g.,
42 of one alternative and 58 of the other.
S's strategy was paired with a strategy
chosen by E to be punishing to S (see below).
That is to say, in every game S's opponent
was in fact E. The payoff to S for this pair
of opposing strategies was calculated under
an assumption of random play by both
players, equivalent to arranging the two
sequences of choices each in random order
and summing the 100 payoffs thus determined.
Figure 1 is the graphical representation of
Game D shown in Table 1. T and B are the
alternative choices for the row player, L and
R the alternatives for the column player,
and the cell entries represent the payoffs
to the row player for the given combinations
of row and column choices. In this figure
the strategies of the row player are shown
along the abscissa, labeled as the proportion
of top row (T choices). For example, the
point 92 on the abscissa corresponds to the
row player's choice of 92T and 8B. The
ordinate is payoff points to the row player.
The four ordinate values, a, b, c, and d, are
the cell entries of the matrix in the top row
(a, b) and the bottom row (c, d), from left to
right. For example, in Game D (Table 1),
a is 1.12, b is 0.78, c is 0.45, and d is 1.47.
The column strategies are represented para-
metrically by the lines cutting across from
the line X = 0 to the line X = 100, labeled
as left column proportion. For example,
the line running from (100, 85) to (0, 126)
corresponds to the column player's strategy
of 20L and 80R. The payoff to the row
player for 92T-8B against a column strategy
of 20L-80R is found on the graph to be 88
points.
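The graphical reading above can be checked arithmetically. The following is a minimal sketch, assuming (as stated in the Method) random play by both players, so that the payoff for 100 moves is 100 times the expected single-move payoff. The function name `expected_payoff` and the tuple layout of the matrix are illustrative choices, not part of the original procedure.

```python
def expected_payoff(matrix, t_count, l_count, moves=100):
    """Payoff to the row player for `moves` plays under random pairing.

    matrix  -- ((a, b), (c, d)): rows T and B, columns L and R
    t_count -- number of T choices out of `moves` (row player)
    l_count -- number of L choices out of `moves` (column player)
    """
    (a, b), (c, d) = matrix
    p = t_count / moves          # proportion of T choices
    q = l_count / moves          # proportion of L choices
    per_move = (p * q * a + p * (1 - q) * b
                + (1 - p) * q * c + (1 - p) * (1 - q) * d)
    return moves * per_move


# Game D cell entries (a, b in the top row; c, d in the bottom row)
GAME_D = ((1.12, 0.78), (0.45, 1.47))

# 92T-8B against 20L-80R, the example read from Fig. 1
print(round(expected_payoff(GAME_D, 92, 20)))   # 88
```

The same function evaluated at 100T and at 0T against 20L-80R gives the two end points of the corresponding line in the figure, to within graph-reading accuracy.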
TABLE 1
FIVE EXPERIMENTAL GAMES
        S Chooses    Payoffs to S When     Optimal Strategy
        Rows         E Chooses Column      for S
Game    T & B        L         R           (100 Moves)
A       T            0.95      0.95        100
        B            0.28      1.65          0
B       T            0.68      1.23         60
        B            1.35      0.53         40
C       T            1.02      0.88         90
        B            0.35      1.58         10
D       T            1.12      0.78         75
        B            0.45      1.47         25
E       T            0.60      1.28         50
        B            1.30      0.62         50
All the lines intersect at the point (X',
Y'). X' is the row player's optimal strategy;
in this case 75T-25B. Y' is the value, V,
of the game; in this case 95 points.
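The intersection point (X', Y') can also be found algebraically. The sketch below assumes the standard closed-form solution for a 2 × 2 zero-sum game without a saddle point; the authors obtained their games graphically, so this is a cross-check rather than their method, and `solve_2x2` is a hypothetical helper name.

```python
def solve_2x2(a, b, c, d):
    """Return (p_top, value) for the row player of the 2 x 2 zero-sum game
    with rows T = (a, b), B = (c, d) and columns L, R.

    Assumes the game has no saddle point, so the optimal row mixture
    gives the same payoff against L as against R.
    """
    denom = (a - b) + (d - c)
    p_top = (d - c) / denom              # optimal proportion of T choices
    value = (a * d - b * c) / denom      # payoff per move at the optimum
    return p_top, value


p_top, value = solve_2x2(1.12, 0.78, 0.45, 1.47)    # Game D
print(round(100 * p_top), round(100 * value))       # 75 95

# At the optimal row mixture the column player's mixture has no effect.
for q in (0.0, 0.2, 1.0):                           # proportion of L choices
    against_q = (p_top * (q * 1.12 + (1 - q) * 0.78)
                 + (1 - p_top) * (q * 0.45 + (1 - q) * 1.47))
    assert abs(against_q - value) < 1e-9
```

The final loop illustrates why all the column-strategy lines in Fig. 1 pass through a single point.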
The S was always the row player. His
opponent (E) was the column player. The
E's choices were obtained as follows (see
Fig. 1). Each strategy of S defines an inter-
val (containing V) of possible payoffs to S.
Each point in this interval corresponds to a
strategy for E. All points with payoff
greater than V (shaded area in Fig. 1) were
eliminated so that S could never do better
than V. The remaining interval shown in
the unshaded portion of the graph was divided
into 10 equal parts, each part corresponding
to the E strategies 100-0, 95-5, . . ., 55-45.
One of the 10 strategies in the interval
determined by S's strategy was selected
randomly to give S his payoff on that play.
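One possible reading of this selection rule is sketched below. It rests on an assumption that the text does not spell out: the ten E strategies "100-0, 95-5, ..., 55-45" are taken to place 100, 95, ..., 55 of E's 100 moves on whichever column is worse for S given S's announced strategy. The function name `e_reply` is hypothetical.

```python
import random


def e_reply(matrix, t_count, rng=random):
    """Pick E's column mixture against S's row mixture (100 moves each).

    ASSUMPTION (not stated in the source): the ten admissible E strategies
    weight the column that is worse for S by 100, 95, ..., 55 moves.
    Returns (l_count, payoff_to_S).
    """
    (a, b), (c, d) = matrix
    p = t_count / 100
    vs_left = p * a + (1 - p) * c        # per-move payoff if E plays only L
    vs_right = p * b + (1 - p) * d       # per-move payoff if E plays only R
    weight = rng.choice(range(100, 50, -5))          # 100, 95, ..., 55
    l_count = weight if vs_left <= vs_right else 100 - weight
    q = l_count / 100
    payoff = 100 * (q * vs_left + (1 - q) * vs_right)
    return l_count, payoff


GAME_D = ((1.12, 0.78), (0.45, 1.47))
rng = random.Random(0)                   # seeded only to make the run repeatable
print(e_reply(GAME_D, 92, rng))          # S announces 92T-8B
print(e_reply(GAME_D, 75, rng))          # S announces the optimal 75T-25B
```

Under this reading a non-optimal S strategy is always paid less than V, while the optimal strategy is paid V whatever E draws.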
The five 2 × 2 matrix games shown in
Table 1 were used. These games were selected
according to the following specifications:
(a) All the games have the same value
(V = .95). (b) The optimal S strategies
for the games are: 100-0, 90-10, 75-25,
60-40, 50-50, these numbers referring to the
proportions of T and B, respectively.
(c) The average S payoffs for strategies
deviating from the optimum are approxi-
mately equal for equal absolute deviations
and E's most punishing choices (always
either 100-0 or 0-100). Graphically this
means that the lines representing E's 0-100
and 100-0 strategies have slopes of approxi-
mately equal magnitude, but opposite in sign.
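Specifications (a) through (c) can be checked numerically against the cell entries of Table 1. The sketch below is an illustrative cross-check, not the authors' graphical procedure; the dictionary `GAMES` and the function `solve` are names introduced here, and the closed-form solution is the standard one for 2 × 2 zero-sum games.

```python
# Cell entries ((a, b), (c, d)): rows T and B, columns L and R, from Table 1.
GAMES = {
    "A": ((0.95, 0.95), (0.28, 1.65)),
    "B": ((0.68, 1.23), (1.35, 0.53)),
    "C": ((1.02, 0.88), (0.35, 1.58)),
    "D": ((1.12, 0.78), (0.45, 1.47)),
    "E": ((0.60, 1.28), (1.30, 0.62)),
}


def solve(matrix):
    """Return (proportion of T, value per move) for the row player."""
    (a, b), (c, d) = matrix
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:               # saddle point: a pure row is optimal
        p_top = 1.0 if min(a, b) >= min(c, d) else 0.0
        return p_top, maximin
    denom = (a - b) + (d - c)
    return (d - c) / denom, (a * d - b * c) / denom


for name, matrix in GAMES.items():
    (a, b), (c, d) = matrix
    p_top, value = solve(matrix)
    # game, optimal T count, value, slope of E's 100-0 line, slope of E's 0-100 line
    print(name, round(100 * p_top), round(value, 2),
          round(a - c, 2), round(b - d, 2))
```

Run on the five matrices, this gives optimal T counts of 100, 60, 90, 75, and 50 and a value of .95 to two places for every game, with the two slopes opposite in sign and close to 0.7 in magnitude.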
FIG. 1. Graphical illustration of Game D.
The payoffs for the five games were ob-
tained graphically, by drawing the lines ac
and bd, as in Fig. 1, and reading off the
intercepts at pairs of perpendiculars 100 x
units apart; the left perpendicular at x'
(100-0 game), 10 units to the left of x' (90-10
game), etc. The values were rounded to the
nearest hundredth. The slopes of the lines
ac and bd were chosen with two ideas in
mind.
In the first place, if the slopes were too
small, differences in players' responses would
not be differentially rewarded and it would
be difficult to locate the optimal strategy
by trial and error. On the other hand, if the
slopes were large, the range of payoffs would
be large, leading both to easier response
differentiation and to awkward point-to-
money conversion rates.
In the second place, the two lines were
chosen with slightly different absolute slopes
to avoid obvious symmetries in the 50-50
game and to avoid the easily learned char-
acteristic of equal slopes, i.e., probability
of playing a given row is simply the difference
between the payoffs in the other row divided
by twice the slope. Furthermore, the slope
for any line is simply the difference between
the column payoffs. Therefore, for equal
slopes, the optimal strategy would always be
the difference between the payoffs in the
opposite row, divided by twice the difference
between the payoffs in either column. The
values actually used in the experiment reflect
a compromise among these conflicting con-
siderations. The slopes of the lines ac and
bd were the same for all five games. Since
these slopes determine a reward gradient
for responses deviating from optimal strate-
gies, it is clear that the reward gradients were
the same for all games. This means that the
average reward and range of rewards for
responses deviating by a given absolute
amount from optimal strategies was the same
for all games. The games were obtained
empirically rather than analytically partly
because of the ease with which the graphic
method provided a solution to these problems
and partly because two place accuracy was
considered more than sufficient for the
purposes of this experiment.
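The equal-slope shortcut mentioned above follows from equating the row player's payoffs against the two columns. A short derivation is given here, using the cell labels a, b, c, d of Fig. 1; the symbols p (proportion of T choices) and s (common slope magnitude) are introduced only for this illustration.

```latex
\begin{align*}
pa + (1-p)c &= pb + (1-p)d \\
p\,\bigl[(a-c) + (d-b)\bigr] &= d - c \\
p &= \frac{d-c}{(a-c)+(d-b)}
\intertext{With slopes of equal magnitude and opposite sign, $a-c = s$ and $b-d = -s$, so}
p &= \frac{d-c}{2s}.
\end{align*}
```

That is, the proportion of T is the difference between the payoffs in the bottom row divided by twice the slope, which is the easily learned rule the slightly unequal slopes were meant to hide.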
The Ss were told that their point total
would be converted to money after the experi-
ment—the more points, the more money—
with a conversion factor of about 100 points
equal to 1 cent. They could receive up to
$1.00 more than the minimum guaranteed
($1.50) for participating in the experiment.
They were urged to make all reasonable
efforts to maximize their point total for
each trial and for the entire series of tasks.
The Ss were, in fact, paid on a linear scale
where the lowest point total equaled $1.50
and the highest equaled $2.50.
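The linear pay scale can be written as a one-line interpolation. This is a sketch only: the point totals in the usage line are invented for illustration, since the actual lowest and highest totals are not reported here, and `payment` is a hypothetical name.

```python
def payment(points, lowest, highest, base=1.50, bonus=1.00):
    """Dollars paid on a linear scale: `lowest` total -> base,
    `highest` total -> base + bonus."""
    if highest == lowest:                # degenerate case: everyone tied
        return base
    return base + bonus * (points - lowest) / (highest - lowest)


# Hypothetical totals: lowest S earned 20000 points, highest earned 24000.
print(payment(22000, lowest=20000, highest=24000))   # 2.0
```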
The same game value was used for all five
games to avoid confounding the effect of
value with that of the major variable, optimal
strategy. The value of 95 points was chosen
so that the point-to-money conversion could
be made at reasonable cost, i.e., it was felt
that a ratio of approximately 100 points to
1 cent would give a return satisfactory to Ss
and tolerable to the experimental budget.
The value of the games was not made a
rounded figure, say 100 points, in order to
avoid making this particular aspect of the
procedure too transparent to Ss.
Instructions. In the instructions given S
the following points were emphasized: (a)
Your ability to make decisions is being tested.
The better your decisions, the higher will be
your payoff. (b) The E is single-minded in
his determination to keep your winnings as
low as possible. He will be informed of your
choices but his information will not be
complete. He will be told enough, however,
so that unwise choices on your part can be
punished severely. (c) A given payoff will
be determined entirely by two factors: your
strategy and your opponent's strategy.
The procedure for obtaining the payoffs
from the two strategies at each trial was
explained carefully to Ss. One reason was
to avoid the misconception that if, for ex-
ample, the two strategies were T90-B10 and
L90-R10, there would be 90TL combinations,
10BR combinations and no BL or TR com-
binations, when in fact there would be 81TL,
9TR, 9BL, and 1BR combinations.
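The counts quoted above are the expected numbers of each row-column combination when the two sets of 100 choices are paired at random. A minimal sketch, with `combination_counts` as an illustrative name:

```python
def combination_counts(t_count, l_count, moves=100):
    """Expected number of TL, TR, BL, and BR pairings when the row and
    column choices are matched in random order."""
    b_count = moves - t_count
    r_count = moves - l_count
    return {
        "TL": t_count * l_count / moves,
        "TR": t_count * r_count / moves,
        "BL": b_count * l_count / moves,
        "BR": b_count * r_count / moves,
    }


# T90-B10 against L90-R10
print(combination_counts(90, 90))
# {'TL': 81.0, 'TR': 9.0, 'BL': 9.0, 'BR': 1.0}
```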
The procedure was represented to Ss as
an analog of real life situations involving two
choices each for two antagonists. The pay-
off rules of the game as given by the matrix
were explained, but no instructions were
given involving game theory concepts.
Specifically, Ss were not told that all games
had the same value, nor what the value of
any game was, nor indeed did they have
any explanation of what a game value was.
They were told nothing of the possible payoff
beyond the information contained in the
matrix and the important facts labeled (b)
immediately above. All this was aimed at
creating a situation in which Ss were con-
fronted by a hostile and knowledgeable
opponent. By choosing their optimal strategy
Ss could guarantee themselves a certain
number of points each trial. They could
never do better than the optimal strategy
payoff but they could do considerably worse.
Procedure. The procedure for each game
was as follows: S was given a three-page
answer sheet with the matrix game repro-
duced on each page. The alternatives were
denoted by capital letters and were changed
from game to game and from S to S. The S
announced his strategy (frequencies of the T
and B alternatives adding to 100). The E
chose a strategy punishing to S as described
above. The E then announced the strategy
he had chosen together with S's payoff. The
S recorded both strategies and his payoff.
This procedure was continued for 50 trials
or until criterion was reached (see Results
and Analysis).
Experimental design. The design, shown
in Table 2, was a 5 × 5 Latin square replicated
four times; each replication had the 5 games,
5 orders of presentation (each with a different
S), and 5 temporal positions.
TABLE 2
EXPERIMENTAL DESIGN
Group N Order of Presentation of Games
1 2 D 3 4 C s A
I II III IV
4 4
A B C D E C B
The order of the rows in each game matrix was varied randomly from one S to another. Since each game in each of the five order of presentation groups was presented to four different Ss, two of the four Ss in each group had the matrix as given in Table 1, and the other two had the rows inverted.
Subjects. Twenty paid volunteers, undergraduate women students from Connecticut College, served as Ss.
RESULTS
Two measures were used, each corresponding to a somewhat different aspect of performance. To measure the overall correspondence of the response to the optimal strategy an "integrated error" score was used. This was the average deviation (AD) from optimal strategy over all trials for each game. These are shown in Tables 3 and 4. The other measure was trials to achieve a steady-state response. Steady-state was defined as repetition, without deviation, of the same response up to and including the last trial (Trial 50), or S's announced intention to continue in this way. The percentage of Ss reaching criterion in each of the five positions was 10, 20, 40, 50, and 65, and in each of the five games was 50, 40, 30, 35, and 30 for Games A through E, respectively. All steady-state solutions to the game problems were the "correct" game theory (maximin) solutions, but a steady-state was not always achieved.
TABLE 3
FREQUENCY DISTRIBUTIONS FOR ADs FROM OPTIMAL STRATEGY BY POSITIONS
AD Position
l 3 19.5
2 3 0 1 8 12.0
4 1 11 11.5
S 5 11 13.5
40 + 30-39.9 20-29.9 10-19.9 0-9.9 Mean
1 1 4 19.0
TABLE 4
FREQUENCY DISTRIBUTIONS FOR ADs FROM OPTIMAL STRATEGY BY GAMES
AD Game
40 + 30-39.9 20-29.9 10-19.9 0-9.9 Mean
A 4 20.5
B 7 12 10.0
C 7 16.0
D 8 14.5
E 0 1 6 14.5
Because a steady-state solution was not achieved in many games, the analyses of the criterion scores were performed using a nonpara-
metric ranking test, the Friedman two-way analysis of variance on ranks.
Average deviation scores. The analysis of variance performed on the AD scores is shown in Table 5. The results show that performance, using this measure, varied with the person playing the game, the game being played, and the point in the sequence at which the game was played.
TABLE 5
ANALYSIS OF VARIANCE OF AD SCORES
Source                      df          MS        F
Between Ss
  Order of presentation      4          109.86
  Replicates (Groups)        3          375.65    1.62
  Residual among Ss         12          231.84    3.94*
Within Ss
  Games                      4          268.59    4.56*
  Position                   4          388.14    6.59*
  Residual from L.S.        12 \
  Residual within Ss        60 /  72     58.91
*P < .01.
Trials-to-criterion scores. In performing the analysis of variance on ranks, parallel analyses were done on rankings with ties, and on rankings with ties broken by appeal to the AD scores. The results, shown in Table 6, indicate that performance, using this measure, varied with the person playing the game and the point in the sequence at which the game was played, but did not vary with the game.
TABLE 6
FRIEDMAN TWO-WAY ANALYSIS OF VARIANCE ON RANKS OF TRIALS-TO-CRITERION SCORES
                   χ²r
Source      df     Ties      No Ties
Games        4      2.4       5.0
Position     4     15.0*     27.7*
Ss          19     36.7*     45.1*
*P
Of the 100 games played by the 20 Ss, 37 were solved by 13 Ss. Among the 13 Ss who solved at least one game the mean number of solutions was 2.8 (median of 3) out of a possible five. In only one case did an S fail to solve a game following a solution to an earlier game.
In addition to the above two response measures the point totals collected by each S for each game were analyzed to see what effects were reflected by the amount of points received by Ss. Although the main concern was centered on what Ss did (responses given as strategies) rather than what they received (reinforcement given as number of points received) there is a possibility that the lack of response differentiation might have been mediated by reward constancy, or in simpler terms, that Ss failed to seek and find better strategies because the extra effort did not lead to any particular gain in payoff. The analysis of variance on the point totals reveals the same effects as those demonstrated in the response measure. The tau correlation between earning (points) and deviations from optimal strategy, computed over Ss, was -0.947 (P < .001).
The "how-much" scores of average error and points, and the "how-long" score of trials-to-criterion lead to similar conclusions concerning effects of Ss and temporal position. A major discrepancy is found for the five games which differ in their optimal strategies for S. Here it is found that Ss tend on the average to be further from the optimal strategies on the 100-0 and 90-10 games than on the other three games in the presolution portions of the game, but do not differ significantly in the number of trials taken to achieve a solution.
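The two response measures and the rank statistic of Table 6 can be sketched as follows. This is an illustration under stated assumptions, not the authors' computation: the criterion is taken to require at least two identical final responses, the Friedman statistic is the uncorrected textbook form with tied scores given their mean rank, and the example data are invented. The function names are hypothetical.

```python
def average_deviation(t_choices, optimal_t):
    """AD score: mean absolute deviation of announced T counts from the optimum."""
    return sum(abs(t - optimal_t) for t in t_choices) / len(t_choices)


def trials_to_criterion(t_choices):
    """First trial (1-based) of the final unbroken run of identical responses.

    ASSUMPTION: a run of at least two is required; otherwise None is returned.
    """
    last = t_choices[-1]
    start = len(t_choices)
    while start > 1 and t_choices[start - 2] == last:
        start -= 1
    run_length = len(t_choices) - start + 1
    return start if run_length >= 2 else None


def average_ranks(values):
    """Ranks 1..k, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman_chi_square(table):
    """Friedman statistic for rows = blocks (e.g., Ss), columns = conditions."""
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    return 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)


# Invented example: one S's announced T counts in a game whose optimum is 75T.
responses = [50, 60, 75, 75, 75]
print(average_deviation(responses, 75), trials_to_criterion(responses))   # 8.0 3
```

With 5 conditions the statistic is referred to chi-square with 4 df, and with 20 Ss as conditions to 19 df, matching the df column of Table 6.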