Primate Social Intelligence

Preserved from the old Primate Info Net



Charteris Ltd, 6 Kinghorn Street, London EC1A 7HT


A computational theory of primate social intelligence is proposed, in which primates represent social situations internally by discrete symbol structures, called scripts. Three well-defined computational operations on scripts are sufficient to support social learning, planning and prediction. This gives a formal, predictive model in which to analyse how primate social knowledge is acquired, as well as how it is used.

The theory is compared with primate data, such as Cheney and Seyfarth’s observations of vervet monkeys. It gives simple, understandable script-based analyses of many observed phenomena – such as the recognition and use of kin relations, learning of alarm calls, habituation to calls, knowledge of rank, tactical deception, and attachment behaviour.

I argue that a tight, concise theory of social cognition, such as script theory, is needed to explain the rapid learning and social guile seen in primates. It also has the benefits of simplicity and testability. The extension of scripts to incorporate a primate theory of mind is described in a subsequent paper.

Published in Cognitive Science, 20(4): 579-616, 1996

1. Introduction

In recent years our knowledge of primate behaviour and intelligence has grown rapidly, giving new insights into the origins and nature of our own intelligence. It has been proposed that the richness and complexity of primate social interactions have been a forcing-house for the growth of primate intelligence (Jolly 1966; Humphrey 1976).

Primate social cognition is often approached by informal verbal descriptions (eg Byrne and Whiten 1988; Dennett 1983; Cheney and Seyfarth 1990). This paper presents a working computational model of social intelligence. By making the model complete and consistent, we force all its assumptions into the open and can calculate its predictions unambiguously. The main results are:

– There are good reasons to expect that primate social cognition is based on discrete, symbolic representations of social situations. Scripts are such a representation, chosen to be as simple as possible.

– A complete and consistent theory of social cognition can be built using scripts and three basic operations on them.

– The theory gives simple, understandable accounts of many observations, such as primates’ understanding of kin and status relations in their group, of alarm calls and attachment behaviour.

– The theory gives highly adaptable social intelligence, with rapid learning of new social regularities – in broad agreement with observed primate behaviour.

A formal notation to describe primate social knowledge and behaviour has also been proposed by Byrne (1993), using a production rule formalism. The script analysis proposed here has features in common with Byrne’s proposal, differing mainly in having an explicit theory of learning, tailored to the social domain.

The theory describes general primate social intelligence, as seen in monkeys and most primates, but not the extended social intelligence – which seems to require a knowledge of others’ knowledge and intentions – seen in the great apes and mankind (Premack & Woodruff 1978; de Waal 1982; Byrne and Whiten 1988). The extension of this theory to include the primate ‘theory of mind’ is described in a subsequent paper (Worden 1995a).

Section 2 discusses the problem of primate social intelligence, and the types of computation in the brain which might underlie it, motivating the approach taken in this paper. Section 3 presents the computational model, which uses tree-like information structures called scripts, and three key operations on them for learning and performance. Scripts are easily envisaged, and the operations can be done with pencil and paper. I describe how these operations are used for learning, planning and prediction of social situations.

Section 4 compares the model with observations – particularly those of Cheney and Seyfarth (1990) on vervet monkeys. I give script-based analyses of monkey alarm calls, use of kin and rank relations, and attachment behaviour. Section 5 discusses further tests of the theory, while Section 6 compares the theory with other work and discusses its general implications.

In spite of the use of the term ‘scripts’, this computational model of social intelligence contains elements of scripts, mental models and production rule systems; the same script structures can serve as a specialised mental model of social situations, or as rules defining how those situations may develop. In this, the model has much in common with the framework for induction of Holland, Holyoak, Nisbett and Thagard (1986), which combines the same elements. Their models are more elaborate, being applied to human cognition; this simpler model, which maps onto a subset of theirs, applies to general primate social cognition. Both put strong emphasis on learning.

The theory of this paper tackles not only the problem of how social knowledge is represented and used in the brain, but also the related (and harder) problem of how that knowledge is acquired. I hope that by presenting a defined and predictive theory of primate social intelligence I may stimulate those who work with primates to express their findings in its terms, and to devise tests of the theory.

2. The Need for Social Intelligence

2.1 Social Intelligence in the Primate Brain

Social interactions in primates are more complex than those in other mammals. Some examples:

– Kin recognition : Dasser (1987) has shown that monkeys recognise kin relationships amongst their peers.

– Redirected Aggression : (Judge 1982; Smuts 1985) After a fight between two monkeys, relatives of one are likely to threaten relatives of the other, showing again that monkeys recognise kin relations and use them in social exchanges.

– Protective Threat : (Kummer 1967) Female baboons will stay close to a dominant male for protection, and will use elaborate tactics to try to separate some other female from this protection.

Other examples are described in section 4, where they are compared with the theory. These examples show that primates have a detailed knowledge of others in their group, of their kin, status and alliance relations, of their current state and activities, and of the cause-effect regularities of their society; that they combine all this knowledge in flexible ways to achieve diverse goals, such as:

– Attachment to a parent

– Feeding

– Reproduction

– Avoidance of predators

– Maintaining status in the group

– Caregiving to offspring

Each one of these goals involves complex coordinated patterns of behaviour, and can be studied as a behavioral system (Hinde 1982). At any one time, an animal is typically involved in one, or at most two or three, behavioural systems. In higher animals, a behavioral system involves not just stereotyped reflexes, but also goal-directed behaviour.

To achieve the goals of any behavioral system, complex locomotor problems, problems of navigation, and social problems may need to be solved. For instance, in order to feed, a primate might have to navigate to a food source, negotiate social obstacles in the form of dominant peers, and then climb a tree to pick fruit.

We assume that there are common ‘modules’ in the brain to help solve these problems across many behavioral systems (Fodor 1983). In particular, to solve immediate problems of locomotion, there is an internal representation, or mental model, of Local Space and Motion – abbreviated as the LSM model – which is closely linked to the visual system.

We then postulate a Social Intelligence Module (SIM) which is used to achieve social goals, resulting from any behavioral system (eg reproductive, feeding, attachment, …). Jackendoff (1992) has proposed a similar ‘faculty of social cognition’. Since social situations can depend on sense data of any modality (vision, smell, hearing, …), the SIM must receive inputs from all sensory areas of the brain, many of them via the LSM model.

This paper presents a formalism and a theory to analyse the workings of the SIM – in particular, to analyse the learning problem of how social knowledge is acquired. The modularity assumption helps to keep this social learning problem within tractable bounds, by assuming that certain hard problems of learning are already solved by other modules of the brain.

For instance, to learn directly from complex, multi-dimensional sets of input stimuli (such as visual data) there are problems of individuation (deciding which features in the visual field relate to some individual entity or part-entity in the environment), and categorisation (deciding which subspaces of the input space form significant clusters, and which are categorically distinct). Categorisation may involve hierarchically-structured taxonomies. Learning in visual, spatial and olfactory domains depends crucially on solving such problems.

The problems of individuation and categorisation occur in many domains of cognition; they arise for (and are solved by) many non-primate species, and so in evolutionary terms were probably quite well solved (within visual, olfactory, locomotor and other modules of the brain) well before the period around 50 million years ago, when primate social life started to become complex. I therefore assume that feature individuation and categorisation are solved by other modules in the brain, which deliver categorised, individuated symbols to the SIM. Its role is to learn and use social knowledge in the newly-complex domain of kin, alliances, etc.

This assumption is doubtless an approximation, but is a necessary one in order to proceed to a first understanding of the SIM. As we shall see, the social domain has enough complexity of its own, without mixing in those other challenges; maybe a later theory will tackle the interactions – how the SIM itself may contribute to individuation, categorisation, and so on.

2.2 The Structure of the Social Domain

A good strategy in many domains of cognition seems to be to form internal representations of situations in the domain; running an internal simulation of external reality is a low-cost way to check the consequences of possible actions, before doing them for real (for some relevant considerations, see Vera & Simon 1993, the responses to their article, and Worden 1995c).

To apply the idea of internal representation to the social domain, we first list some important properties of social situations; the theory will use internal representations which match these properties. I shall use examples from a hypothetical troop of monkeys with Roman names; and will contrast the social domain with the spatial/physical domain represented in the LSM. The social domain is:

(S1) A Structured Domain: A social situation is not just an unstructured set of components (such as Romulus, Remus, Portia, and threatening); it is important that Romulus is threatening Portia (rather than Remus threatening Romulus). The structure and interrelations between the components are crucial.

(S2) A Systematic Domain : If it is possible for Romulus to threaten Remus, then it is equally possible for Remus to threaten Romulus. The set of possible social situations is a systematic set, which we can enumerate systematically (Fodor 1987; Fodor & Pylyshyn 1988); so is the set of possible causal relations between situations.

(S3) A Productive Domain: The set of possible situations is very large. If there are many individuals in a monkey’s group, then any subset of them can be involved in the current situation; they can be in many different binary relationships (grooming, fighting, mating, …) and each one may have many attributes (large, male, hungry, angry, …). This makes a combinatorially large set of possible situations; and the set of possible causal relations (situation A causes situation B) is even larger.

(S4) A Domain of Discrete Values: A monkey’s social milieu involves discrete, identified individuals, who tend to be in discrete, all-or-nothing relations to one another (two monkeys either are siblings, or they are not); and their behaviour tends to be discrete, as defined by their on/off behavioral systems. A monkey is feeding, or not; is in oestrus, or not; and so on. Many of the key variables describing the social situation are discrete variables, each with a few possible discrete values. (The categorisation to find these discrete values is done outside the SIM.)

This is a key difference between the social and spatial/physical domains. Physical situations also are structured, systematic and productive; but they are described by continuous variables such as sizes, distances and velocities.

(S5) Causal Relations Hold Over Long Intervals : The interval between social cause and effect may extend over minutes, hours or days. Remus, being intelligent, can remember for long periods and may bide his time. This is a second major difference between social and physical domains; in the domain of local physical movement, effect generally follows cause within a fraction of a second.

(S6) Generalisations Across Individuals are Important : Many causal regularities, such as “When X makes a distress call, X’s mother will react” (Cheney & Seyfarth 1990) are generalisations across individuals; X may denote any juvenile in the troop. These generalisations are very prevalent and important in primate social life.

(S7) There is Chaining of Cause and Effect : If A causes B, and B causes C, then effectively A causes C. This can be used both for anticipation of outcomes and for planning one’s own actions.

2.3 Cognitive Models of Social Intelligence

We next compare these seven properties (S1-S7) with four possible classes of cognitive model, to see how well they match:

A. Conditioning Models such as the Rescorla-Wagner (1972) model do not capture the structured, systematic and productive character of social situations (S1-S3), because they represent each causal relation by a single local coupling strength; there is no representation of the structure of the relation, or systematic enumeration of possible relations. They can represent discrete values (S4), causal relations over long intervals (S5) and chaining of cause and effect (S7), but have no way of discovering or representing the generalisations across individuals (S6) which are important in social cognition.

B. Neural Net models (Rumelhart 1991; Denker et al 1987) do not capture the structured, systematic and productive character of the social domain (S1-S3) (Fodor & Pylyshyn 1988). While they can generalise from examples, they have no special sensitivity to the generalisations across individuals which are important in the social domain (S6); most neural nets would not form such generalisations without extensive and exhaustive training data (e.g. thousands of examples), which is not available to the average primate in its lifetime.

C. Mental models (eg analogue representations of local space and motion such as the LSM: Johnson-Laird 1983) are probably used by higher animals to predict the movements of objects around them and to plan their own movements – for instance in hunting. To the extent that this spatial/physical domain resembles the social domain (as in properties S1 – S3 and S7) these mental models are suited to the social domain. However, they are not sensitive to many of the key variables of the social domain (eg kin and status relations) (S4) or generalisations across individuals (S6), and do not model causal relations which hold over long time intervals (S5). Also, a detailed, continuous space-time model of a situation would be an overkill to represent a few simple discrete social facts.

D. Symbolic Processing (Charniak & McDermott 1988) has the structured, systematic and productive character needed for the social domain (S1-S3). It is also well suited to handle the discrete values involved in social situations (S4), the generalisations across individuals (S6) and the chaining of cause and effect (S7). It has no intrinsic bias against representing causal relations which hold over long time intervals (S5).

The match between features of the social domain and these styles of computational model is summarised in table 1.

                               Assoc. Condit.   Neural Nets   Mental Models   Symb. Proc.
S1 Structured                        -               -              y              y
S2 Systematic                        -               -              y              y
S3 Productive                        -               -              y              y
S4 Discrete Values                   y               y              -              y
S5 Long Time Intervals               y               y              -              y
S6 Generalise across indivs          -               -              -              y
S7 Chaining of Cause/Effect          y               y              y              y

Table 1: The match between the social domain and four styles of cognitive model

Symbolic processing techniques, as developed in Artificial Intelligence to handle problems such as planning, language and logic (Charniak & McDermott 1988), are well suited to the social domain. Furthermore, there are well-developed theories of symbolic learning. The model of social cognition which we shall describe is largely symbolic, but is not simply a symbol processing model; it takes important features from the other major styles of computation. Like mental models, it uses internal representations of the external situation; and like conditioning models, it can learn regularities from very few training examples, using a statistical criterion of sufficient evidence.

3 A Theory of Social Intelligence

3.1 Structure and Meaning of Scripts

I shall describe the theory at Marr’s (1982) algorithmic level – an abstract description of information structures and operations on them – not going to the implementation level to consider possible neural realisations (that is probably the level at which neural nets are relevant, as components of the SIM).

The SIM uses sense data of all modalities, and is concerned with discrete-valued information (S4), which can be encoded concisely. We assume that the sensory systems of the brain and the LSM model send concise, discrete information to the SIM; for instance, the visual cortex and LSM reduce the great volume of information from the eyes to a much smaller volume of output information, containing, for instance, discrete tokens whose meaning is essentially “leopard, over there” or “Remus, angry”. Similarly for the auditory cortex, and other sensory modalities. Categorisation and individuation problems are solved in these other brain modules, in this approximation.

In like manner, the outputs of the SIM consist of concisely encoded command symbols such as “attack Romulus” or “run away” or “submit”; the conversion of these high-level commands into detailed motor sequences, changes in hormone levels and so on, is done by other brain subsystems, acting on concise commands from the SIM.

We look for the simplest internal representation of social situations which captures their important properties – the properties (S1) – (S7) of section 2. A script is a tree-like information structure designed to capture these properties. Scripts are derived from the scripts introduced by Schank & Abelson (1977) and are notationally similar to them (and to many other AI knowledge representations). They differ from Schank’s scripts in having a precise mathematical theory for their learning and use, which can be used to show that these scripts are an optimal solution to the problem of social cognition – giving the best possible fitness under defined conditions. There is not space here to present the mathematical theory of scripts, or the proof of their optimality; these are the subject of a later paper. Here we use examples to illustrate the key properties of scripts, and to show how they are used for social learning and intelligence. Any sequence of primate social events can be represented as a script, such as that in figure 1.

Figure 1: A script F1 representing a simple sequence of social events. Node types are denoted by sr: script node; se: scene node; and en: entity node.

This script shows a sequence of two scenes. In the first scene, the monkey which ‘owns’ this script (who is denoted by the identity ‘self’) bites another monkey, Portia. In the second scene, as he is eating a nut, Portia bites him back. The whole script is denoted by a symbol F1 which will be used later.

The script is constructed of nodes (circles in the diagram) connected in a tree-like structure. The tree is rooted at the top script node, to which are connected several scene nodes – each one denoting the events happening at a place and time. The arrow between the scene nodes indicates that one scene precedes the other.

Below each scene node are entity nodes, denoting animals (peers of the script owner) or things. Each node has some slots, each with a value denoting some property of the node. These are shown as slot:value pairs written next to the node. A slot typically has a small number of allowed discrete values (eg gender can be ‘male’ or ‘female’). The slot ‘id:’ denotes the identity of an individual.

Further nodes are used to denote binary relationships between individuals and other entities – relationships such as grooming, eating, mother-of, and so on. The slot ‘rel:’ describes which relationship is involved; it too has a few discrete allowed values.

Using suitable slots and values, scripts can describe social situations and sequences of some complexity; there is no limit to the size of script trees. Many important facets of primate social behaviour can be described using simple scripts, such as the examples in this paper.

Scripts embody by design several of the properties (S1) – (S7) of social situations. They are a structured representation (S1) using tree structures and linking information (in slots) with individuals (nodes). They are systematic (S2), in that there is a systematic set of possible tree structures; and productive (S3) in that the number of possible script trees grows exponentially with their size. Finally, as slots have discrete values, scripts are a discrete-valued representation (S4). It is hard to envisage any more concise information structure which could capture these important properties of the social domain.
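The tree structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name Node, the helper functions, the node label "rel" for relationship nodes, and the choice to hang entity nodes beneath relationship nodes are all hypothetical, since the exact layout of figure 1 is not reproduced here. The content of the script (self bites Portia; then, while self eats a nut, Portia bites self) follows the description of figure 1.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a script tree, carrying slot:value pairs."""
    kind: str                                   # "sr" script, "se" scene, "en" entity, "rel" (assumed label)
    slots: dict = field(default_factory=dict)   # e.g. {"id": "Portia"} or {"rel": "bites"}
    children: list = field(default_factory=list)


def entity(identity, **slots):
    """Entity node; the 'id' slot holds the identity of the individual or thing."""
    return Node("en", {"id": identity, **slots})


def relation(rel, agent, patient):
    """Binary relationship node linking two entity nodes (layout is an assumption)."""
    return Node("rel", {"rel": rel}, [agent, patient])


def scene(*relations):
    """Scene node: the events happening at one place and time."""
    return Node("se", {}, list(relations))


def script(*scenes):
    """Script node at the root; the order of children encodes scene order."""
    return Node("sr", {}, list(scenes))


# The factual script described for figure 1.
F1 = script(
    scene(relation("bites", entity("self"), entity("Portia"))),
    scene(
        relation("eats", entity("self"), entity("nut")),
        relation("bites", entity("Portia"), entity("self")),
    ),
)


def count_nodes(node):
    """Total number of nodes in the tree rooted at `node`."""
    return 1 + sum(count_nodes(child) for child in node.children)


print(count_nodes(F1))  # 12 nodes in this particular layout
```

Because every slot takes one of a few discrete values and nodes nest freely, the same four constructors can describe arbitrarily large social sequences, which is the sense in which the representation is systematic and productive.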

3.2 Factual Scripts and Rule Scripts

In the theory, each primate continually forms script representations of the social events which he or she observes. These are called factual scripts, and form a sort of historic record of the primate’s life (or recent past). The purpose of having this representation is to predict likely social outcomes before they happen, and take appropriate actions. To predict outcomes, you need to know the causal relations by which the present influences the future. We need a flexible and expressive way to represent both general and local social causal laws. Scripts also provide this representation, in the form of rule scripts.

Suppose that the same monkey as in figure 1 also observes the sequence described by the script F2 of figure 2. Again he bites another monkey, and again he is bitten back.

Figure 2: A script F2 representing another biting incident, involving the same monkey ‘self’

There seems to be an underlying regularity here, of the form “If you bite someone, he or she will bite you back.” This regularity is represented by the rule script R in figure 3.

Figure 3 : A rule script R which underlies the examples of figures 1 and 2

This rule script R is interpreted like the factual scripts F1 and F2, but with the following extensions:

– Every rule script has one or more cause scenes and an effect scene; it says that if the cause scenes occur, the effect scene is likely to follow, with a probability defined in the rule.

– The effect scene may follow some time after the cause scenes.

– A generalisation across individuals is expressed by using a wild card identity (the slot id:?X) on two nodes of the script. On its first occurrence, the slot ‘id:?X’ effectively means ‘any individual’; on its other occurrences, it means ‘the same individual’. Wild cards are like variables in algebra, or in programming languages such as Prolog (Clocksin & Mellish 1979).

Rule scripts embody two further important properties of the social domain; they allow us to express causal relations which act over long time intervals (S5) and generalisations across individuals (S6).

3.3 Social Planning and Prediction – Applying Rule Scripts

Suppose that a monkey has a number of rule scripts R, S, T …, similar in form to the rule script R of figure 3, each describing some causal regularity of monkey social life. These can be used in several ways to guide his social actions:

(1) Prediction : Suppose that the factual script F, which describes the current situation, matches with the cause scenes of a rule script R. This means that the rule R is applicable to the current situation, and the effect scene of rule R predicts what will ensue; the monkey may then take appropriate action to anticipate what will ensue.

(2) Forward Planning : Suppose a monkey is considering some action, after which the current situation will be F′. Again, if F′ matches the cause scene of some rule R, the effect scene of R predicts what will ensue from his action, and may indicate that the action should or should not be taken.

(3) Goal-directed planning : suppose a monkey has a social goal which can be described by a script G. Now if G matches the effect scene of some rule R, the cause scenes of R may indicate what the monkey needs to do to reach the goal – to bring about the required effect.

Clearly, then, having a good set of rule scripts can be a major asset in predicting and exploiting social situations. I shall illustrate just one of these cases, the case (2) of forward planning.

Suppose that the same monkey as in the previous examples is considering biting yet a third monkey. His intention to bite is described by the script F′ of figure 4; but suppose he also has the rule script R of figure 3. He may use this script to anticipate the consequences of his intention F′. By the test of script inclusion, he may realise that the rule script R matches the script F′ which would arise if he carried out this intention to bite; he can then unify the rule script R with his intention script F′ to find out the likely consequence.

Figure 4: A script F′ describing an intention, to bite someone else.

Unification is a process of matching two scripts, node by node, to get the maximum possible overlap and including all the information from both scripts in the result; it cannot be done if the two scripts have conflicting information. It is much like unification in Prolog (Clocksin & Mellish 1979). The result of unifying R with F′ is written as R U F′, and is shown in figure 5.

Figure 5: The result F′ U R of unifying the scripts in figures 3 and 4, to calculate the consequences of the rule R in situation F′.

Unifying with the rule script does not alter any of the information in F′, but adds to it the information implicit in the rule R – drawing out the consequence that Cassius is likely to bite back. In this way, the monkey may anticipate the consequences of his actions, and save himself injury.
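The forward-planning step can be sketched in Python as follows. This is an illustrative simplification, not the full node-by-node unification of the theory: scenes are flattened to lists of (relationship, agent, patient) triples, a rule is assumed to have a single cause scene, and the names unify_fact, match_scene and apply_rule are hypothetical. Wild cards are strings beginning with "?", as in the slot id:?X.

```python
def is_wild(value):
    """Wild card identities are written '?X', '?Y', ..."""
    return isinstance(value, str) and value.startswith("?")


def unify_fact(pattern, fact, bindings):
    """Return extended bindings if `pattern` matches the ground `fact`, else None."""
    if len(pattern) != len(fact):
        return None
    new_bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_wild(p):
            # First occurrence binds the wild card; later ones must agree.
            if new_bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None  # conflicting information: unification fails
    return new_bindings


def match_scene(patterns, facts, bindings):
    """Find bindings under which every pattern matches some fact (backtracking)."""
    if not patterns:
        return bindings
    first, rest = patterns[0], patterns[1:]
    for fact in facts:
        extended = unify_fact(first, fact, bindings)
        if extended is not None:
            result = match_scene(rest, facts, extended)
            if result is not None:
                return result
    return None


def apply_rule(rule, situation):
    """Unify a rule with a situation; return [situation, predicted effect scene] or None."""
    bindings = match_scene(rule["cause"], situation, {})
    if bindings is None:
        return None
    effect = [tuple(bindings.get(v, v) for v in fact) for fact in rule["effect"]]
    return [list(situation), effect]


# Rule R: "if you bite someone, he or she will bite you back".
R = {"cause": [("bites", "self", "?X")], "effect": [("bites", "?X", "self")]}
# Intention script: self is considering biting Cassius.
F_prime = [("bites", "self", "Cassius")]

print(apply_rule(R, F_prime))
# [[('bites', 'self', 'Cassius')], [('bites', 'Cassius', 'self')]]
```

The situation is returned unchanged as the first scene, and the second scene is the effect of R with ?X bound to Cassius. A situation that conflicts with the cause scene (for example, a grooming event) yields None.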

Prediction by script unification can be taken further, as the ‘effect’ scene of one rule may match with the ‘cause’ scene of another rule; then the second rule script can also be unified to predict a further consequence. Similarly the backward chaining of rules for goal-directed planning (from a desired goal to the required actions) can be chained through several steps if necessary. Thus the rule script mechanism embodies the chaining of cause and effect (S7) in social encounters.

Script unification is similar to the firing of a production rule, as in many AI systems, and as used by Byrne (1993) in his formal notation for primate social intelligence. Scripts can express the same information as these production rules. They are also similar to the scripts introduced by Schank and Abelson (1977) to describe children’s social knowledge. In AI terms, therefore, the use of scripts for planning and prediction is not new (apart from its application to the social domain).

3.4 Learning Rule Scripts

Having a notation to describe primates’ social knowledge, and a mechanism for them to apply that knowledge, does not yet give us a predictive theory of primate social behaviour. We might still endow a primate with an arbitrarily powerful set of rule scripts, giving it great (and unrealistic) powers of social anticipation. The theory is not predictive until we include a theory of social learning – so we can predict, from a primate’s previous history, what particular set of rule scripts it is likely to know.

Some rule scripts may be an innate part of primates’ cognitive makeup. Innate scripts cannot be assumed without limit; an arbitrarily powerful endowment of innate scripts would make the theory non-predictive. I shall return to the issue of innate rule scripts in section 4.6; for the moment we shall assume that there are very few innate scripts, and that the majority of useful rule scripts are acquired by learning.

Making a good cognitive model of script learning is harder than modelling the use of scripts. It is an example of a class of problems which have been extensively studied in AI and machine learning – the class of concept learning problems, where some complex concept, or structure (such as a production rule, or rule script) must be induced from examples (Michalski 1986). The theory described here embodies a concept learning procedure which, under well-defined but fairly broad conditions, is optimal for the primate social domain. This is the form of learning which gives best possible fitness, and which we would therefore expect to evolve under the pressure of primate social competition. It is compared with other computational models of concept learning in section 6.2.

The problem of learning rule scripts consists of two sub-problems:

(1) Finding candidate rule scripts : The space of possible rule scripts is a very large one – the number of allowed rule scripts, even including just the simpler structures, may run into many billions. Some means is required to find a few good candidate rules to investigate, out of this vast space of possibilities.

(2) Knowing whether to ‘believe’ a candidate rule script : In this regard, there are two possible penalties from poor performance: (a) the penalty of not believing a true rule script (and therefore failing to apply it for social planning and prediction), and (b) the penalty of believing some untrue rule script (which is not a true causal regularity of your social milieu, but which appears to be true because of fluke events). Both these penalties lead to decreased fitness, and the learning mechanism needs to minimise the combined penalty of (a) and (b).

To solve both these problems, the concept of the information content of a script is important. For any script S, its information content I(S) can be approximately calculated as a sum of the information content on each node, which in turn is a sum of the information content from each slot on the node; for instance a slot ‘gender: male’ contributes one bit to this sum. From inspection of examples, typical primate rule scripts appear to have an information content in the range 20 – 100 bits.
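As a rough illustration of this sum, the Python sketch below assumes that a slot with k equally likely allowed values contributes log2(k) bits, which reproduces the one bit quoted for ‘gender: male’. The table ALLOWED_VALUES and its counts for ‘rel’ and ‘id’ are hypothetical, chosen only to make the arithmetic concrete.

```python
import math

# Hypothetical number of allowed values per slot (only 'gender' = 2 is implied by the text).
ALLOWED_VALUES = {"gender": 2, "rel": 8, "id": 16}


def slot_bits(slot):
    """Information carried by one slot, assuming equally likely values."""
    return math.log2(ALLOWED_VALUES[slot])


def information_content(nodes):
    """I(S): sum over nodes of the sum over the slots on each node."""
    return sum(slot_bits(slot) for node in nodes for slot in node)


# Nodes of a tiny one-scene script, each written as a dict of slot:value pairs.
nodes = [
    {"id": "self", "gender": "male"},      # 4 + 1 bits
    {"id": "Portia", "gender": "female"},  # 4 + 1 bits
    {"rel": "bites"},                      # 3 bits
]

print(information_content(nodes))  # 13.0
```

With these assumed counts a three-node scene carries 13 bits, so a rule script of a few scenes falls naturally into the range of tens of bits.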

If there is some rule script R which underlies a factual script F1, then all the information in R is also contained in F1, but there may also be extra information in F1 which is not in R; in this case we say that F1 includes R, written as F1 incl R. (Script inclusion is the inverse of subsumption in logic programming.) If the same rule script also underlies another factual script F2, then similarly F2 incl R. Given only the examples F1 and F2, but not knowing the rule R, what is the most likely form of the rule? Their script intersection, written as X = F1 int F2, is defined as the script with the largest possible information content which obeys both F1 incl X and F2 incl X; thus it is a good candidate for the rule R.

There is a simple procedure to calculate the intersection of any two or more scripts. This involves matching the scripts together, node by node, to maximise the overlap of information, and retaining only the slots and nodes which match, keeping only structure which is common to the two scripts. For instance, the rule script R of figure 3 is just the script intersection of the factual scripts F1 and F2 of figures 1 and 2. Script intersection automatically discovers the generalisations across individuals (creating wild card identities in R) which are an important property of the social domain (S6).
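A minimal Python sketch of script intersection follows, under simplifying assumptions that are not part of the theory as stated: scripts are lists of scenes, scenes are lists of (relationship, agent, patient) triples, scenes are aligned by position, and facts within a scene are paired greedily by relationship name rather than by a search for the maximum overlap. The name "Brutus" for the second victim is hypothetical, as is the wild card naming ?X0, ?X1, ….

```python
def generalise(a, b, table):
    """Keep a value shared by both scripts; otherwise use a wild card.

    The same pair of differing values always maps to the same wild card,
    which is how a generalisation across individuals emerges.
    """
    if a == b:
        return a
    if (a, b) not in table:
        table[(a, b)] = f"?X{len(table)}"
    return table[(a, b)]


def intersect(script_a, script_b):
    """Retain only the structure common to both scripts (greedy alignment)."""
    table = {}
    result = []
    for scene_a, scene_b in zip(script_a, script_b):
        unused = list(scene_b)
        common = []
        for fact_a in scene_a:
            for fact_b in unused:
                if fact_a[0] == fact_b[0]:  # same relationship
                    unused.remove(fact_b)
                    args = tuple(generalise(x, y, table)
                                 for x, y in zip(fact_a[1:], fact_b[1:]))
                    common.append((fact_a[0],) + args)
                    break
        result.append(common)
    return result


F1 = [
    [("bites", "self", "Portia")],
    [("eats", "self", "nut"), ("bites", "Portia", "self")],
]
F2 = [
    [("bites", "self", "Brutus")],
    [("bites", "Brutus", "self")],
]

print(intersect(F1, F2))
# [[('bites', 'self', '?X0')], [('bites', '?X0', 'self')]]
```

The nut-eating event, present in only one script, drops out, and the two different victims are replaced by one shared wild card, giving a rule of the same form as R in figure 3.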

Suppose that a primate has a set of N factual scripts F1, F2, …, FN, recording his recent social history. Form all the script intersections (Fi int Fj) between pairs of factual scripts. If two scripts Fi and Fj do not arise from some common underlying regularity R, then any similarities between them are mere coincidence, so the information content of their intersection Fi int Fj will be very small. If, however, Fi and Fj arise from the action of a rule script R, their intersection obeys (Fi int Fj) incl R, and must therefore have at least the information content of R; in fact (Fi int Fj) will be a good approximation to R, having only a few extra bits of information from other, coincidental, similarities between Fi and Fj. So taking pairwise intersections of factual scripts, and keeping only those results whose information content is above some threshold, is an effective and efficient way to find candidate rule scripts, giving a practical solution to sub-problem (1). Any rule whose effects have arisen more than once will be found in this way.
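A minimal sketch of this search follows. To keep it short, each factual script is flattened to a set of facts, so that intersection becomes plain set intersection; this is a cruder stand-in for node-by-node matching. The figure of 4 bits per fact, the 10-bit threshold and the three invented incidents are all assumptions made for the illustration.

```python
from itertools import combinations

BITS_PER_FACT = 4     # assumed average information per shared fact
THRESHOLD_BITS = 10   # assumed cut-off for a candidate rule script

def candidate_rules(factual_scripts):
    """Intersect every pair of factual scripts; keep the information-rich ones."""
    candidates = []
    for (i, fi), (j, fj) in combinations(enumerate(factual_scripts), 2):
        common = fi & fj                      # flattened 'Fi int Fj'
        bits = BITS_PER_FACT * len(common)
        if bits >= THRESHOLD_BITS:
            candidates.append((i, j, common))
    return candidates

# An invented social history of three incidents.
history = [
    frozenset({"infant in play", "infant screams", "mother attends",
               "near river", "morning"}),
    frozenset({"infant in play", "infant screams", "mother attends",
               "in tree", "evening"}),
    frozenset({"male grooms", "near river", "evening"}),
]

for i, j, common in candidate_rules(history):
    print(i, j, sorted(common))
```

Only the first two incidents share enough information to pass the threshold; the single coincidental facts shared with the third incident are discarded. With N factual scripts there are N(N-1)/2 pairs to examine.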

However, even if some candidate rule script seems to be indicated by two examples, it might have arisen just from spurious coincidences between those two incidents. A primate which accumulated many spurious rules, and acted as if they were true, would be at a disadvantage. How many examples are needed to ‘believe’ a rule script – without falling into the opposite trap of being an over-cautious slow learner ?

There is a Bayesian probabilistic criterion for learning, which minimises the combined penalty of (a) being a slow learner of true rules, and (b) believing spurious rules. Since there are many millions of possible rule scripts, and only a finite number of them are actually true, the prior probability for any rule script R to be true is very small; we model this small probability approximately by a form $P(R) = C \cdot 2^{-\lambda I(R)}$, where $\lambda$ is of order 2 or 3, thus penalising complex rule scripts with large I(R). Then if a set of factual scripts {F} appears to indicate some rule script R, we calculate the probability that R is true in the light of this evidence, in the usual Bayesian manner – comparing P(R) P({F}|R) with P(not R)P({F}|not R). In this way we can calculate the average expected penalties from (a) failing to believe a true R and (b) believing a spurious R, and minimise the sum of these penalties.
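The flavour of this calculation can be shown with a toy model in Python. Everything beyond the exponential form of the prior is an assumption made for the sketch: a true rule is taken to produce its pattern with certainty, a chance match is given probability 2^(-I(R)) per example, and a simple threshold on the posterior odds stands in for the full minimisation of expected penalties. The names `lam`, `c` and `threshold_log2_odds` are hypothetical.

```python
from math import log2

def log2_posterior_odds(info_bits, n_examples, lam=2.5, c=1.0):
    """log2 of the odds that rule R is true after n matching examples.

    Assumed toy model (a simplification, not taken from the paper):
      prior odds          ~ c * 2**(-lam * I(R))
      P(example | R)      = 1
      P(example | not R)  = 2**(-I(R))   (a pure coincidence)
    """
    return log2(c) - lam * info_bits + n_examples * info_bits

def examples_needed(info_bits, lam=2.5, c=1.0, threshold_log2_odds=0.0):
    """Smallest number of examples at which the rule is more likely true than not."""
    n = 1
    while log2_posterior_odds(info_bits, n, lam, c) < threshold_log2_odds:
        n += 1
    return n

# A 30-bit rule script under exponents of 2, 2.5 and 3.
for lam in (2.0, 2.5, 3.0):
    print(lam, examples_needed(30, lam=lam))
```

Under these assumptions a 30-bit rule needs two or three examples, and a single example is never enough for a complex rule while the exponent exceeds 1.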

The result of this Bayesian analysis is that most rule scripts can be believed as soon as they have occurred in a rather small number – typically fewer than half a dozen – of examples in the set of factual scripts. The learning procedure is very fast, being able to learn a rule script from a few examples (any faster learning is not useful, because it would incur a greater penalty of learning spurious scripts). This fast learning contrasts with that given by neural nets and other reinforcement learning techniques, which typically require thousands of training examples to learn a regularity.

Note that the prior probability function favours simpler rule scripts, giving animals a kind of Occam’s Razor-like tendency to believe the simplest set of rule scripts which can account for their experience; any extra rule is only believed when the evidence for it is statistically significant. At the same time, however, script intersection finds the most complex (information-rich) possible rule underlying two or more factual scripts; this enables animals to learn complex rules if they are true, and not to over-generalise.

This subtle tradeoff in the learning procedure allows an animal to learn both general rules and more specific exception rules at the same time, if both are true. For instance, it can learn the general ‘retaliation’ rule of figure 3, and a more specific rule that some individual (eg Claudius) tends not to retaliate. More examples are required to learn both a general rule and an exception, and the theory predicts how many examples are required. In this way primates can rapidly learn the important regularities of their social milieu.

3.5 A Consequence of the Learning Theory

While this learning mechanism is very efficient – learning most rule scripts from just a few examples – it has one simple consequence which may be important for experimental and observational studies. It implies that primates cannot learn any complex rule script from just one example. This result follows in the theory for two reasons:

1. The prior probability of a complex rule script being true is so small, that just one example cannot ‘overcome’ this small probability; it is more likely that the one case arose just by chance, so the rule should not be believed.

2. With only one example, there is no way to prune away irrelevant information about things which just happened to be going on in the example script, separating it from information which is genuinely involved in the causal relation; so the resulting rule is likely to be too specific to be useful. (With two or more examples, script intersection is a very efficient way to prune out irrelevant information).

These reasons do not depend on the precise details of this theory, and may also hold in many other theories of social learning. The constraint only applies to complex scripts, with fairly large information content; simpler scripts might have such a large prior probability that they can be learnt from one example, just like taste-nausea conditioning in rats (Dickinson 1980) or may even be innate. However, this constraint against one-shot learning does apply to the kinds of complex scripts which would be needed, for instance, for tactical deception (Byrne & Whiten 1990, 1992).

3.6 Scripts in the Architecture of the Brain

In this theory, therefore, the primate Social Intelligence Module (SIM) continually receives pre-categorised, symbolic inputs from other cognitive subsystems such as the visual system. It arranges these inputs into factual scripts which form a record of the primate’s social life. The factual scripts are continually input to the rule learning procedure (in 3.4 above) to find out new rule scripts, as soon as the evidence for each one becomes significant. At any moment the whole stock of rule scripts (learnt so far) can be used for prediction and planning of social actions, as described in section 3.3. This results in the SIM sending outputs to other motor subsystems, to execute the actions required by the SIM.

All this may take place as an automatic computation in the SIM, not necessarily linked to conscious awareness. Since our own conscious awareness is generally awareness of sense data (eg visual images, sounds of words) rather than of abstract symbolic structures like scripts, it seems likely that the SIM itself is not in conscious awareness; although it may cause activity in other brain modules, such as the LSM model, which does result in awareness.

An adult monkey may have many hundreds of rule scripts as well as the factual scripts from its experience. At any one time, typically only two or three of the rule scripts may apply. Any interference from other rule scripts would tend to lead to wrong conclusions.

This suggests that there are at least two distinct logical components to the SIM – a processing module where the script for the current situation is constructed, and a few appropriate rule scripts are unified with it to plan and predict, and a long-term memory where all rule scripts, and the historic scripts which are intersected together to form rule scripts, are stored. The long-term memory has a retrieval capability, to retrieve into the processing module just those scripts likely to be relevant to the current situation.

The script operations of unification, intersection and inclusion form a neat mathematical structure – the script algebra – which is similar to elementary set theory. A typical relation of the script algebra, true for any two scripts A and B, is that $A = A \cup (A \cap B)$. These relations help to guarantee the self-consistency of the whole theory; for instance, if a rule script R is induced by script intersection from example scripts A, B, C, then the algebra shows that this can be done in any order, and R will not conflict with the examples which gave it.
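The analogy with set theory can be checked directly if scripts are flattened to sets of facts, with unification playing the part of union and script intersection the part of set intersection. That flattening is an assumption of this sketch (real scripts have node structure and wild cards), and the three example scripts are invented.

```python
from functools import reduce
from itertools import permutations

# Three invented example scripts, flattened to sets of facts.
A = frozenset({"X bites Y", "Y bites X", "near river"})
B = frozenset({"X bites Y", "Y bites X", "in tree", "morning"})
C = frozenset({"X bites Y", "Y bites X", "evening"})

# Absorption: unifying A with any intersection involving A adds nothing new.
print(A == A | (A & B))  # True

# Inducing the rule by intersecting A, B and C in every possible order.
rules = {reduce(lambda x, y: x & y, order) for order in permutations([A, B, C])}
print(len(rules))  # 1, so the order makes no difference

# The induced rule is included in each example, so it cannot conflict with them.
R = next(iter(rules))
print(all(R <= example for example in (A, B, C)))  # True
```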

Are scripts a declarative or a procedural knowledge representation? They can be anywhere along the spectrum between the two. A script with many scenes may represent a fixed procedure to achieve some goal. The same knowledge may also be represented as several smaller scripts (each with fewer scenes) which can be unified together to reach the same goal; but the smaller scripts are more like declarative pieces of cause-effect knowledge, and can be used more flexibly than the single large script. Finally, as we shall see in the next section, a script can represent a purely declarative piece of factual knowledge.

This script theory is distinctive in linking together the operations for inference (script unification) with the operation for learning (script intersection) in a tight, self-consistent structure, to make clear predictions about what can be learnt, how fast it can be learnt, and how it is used.

4. Comparisons With Observation

We shall compare the script theory with some examples of primate social intelligence, particularly of vervet monkeys. Some examples which can be analysed in script terms are:

4.1 Using Kin Relations

Cheney and Seyfarth (1990) have made a series of observations on vervet monkeys, using hidden loudspeakers to replay various types of call of specific individuals to others in the group, in their natural surroundings. In one of these experiments, they replayed the screams of infant vervets to groups of females, including the infant’s own mother and controls.

Vervets can recognise the calls of individuals in their troop, and mothers generally go to help their infant if a scream indicates that juvenile play has got too rough. As expected, the mothers consistently paid more direct attention to the replayed calls of their own infants than did the controls. More interestingly, when a particular infant’s call was replayed, the control females would look towards that infant’s mother, often before the mother herself had responded.

Dasser (1987) has shown in laboratory conditions that monkeys know kin relations of others in their group. The control females’ reaction shows that they can combine this knowledge with other general knowledge (that mothers respond to their children’s calls), to anticipate who will respond in a particular case.

Factual knowledge of a kin relation can be embodied in a script, such as that in figure 6(a), which states that Shelley is Profumo’s child. This is a script in the mind of some other monkey (not Shelley or Profumo). I shall not discuss here how they learn these relations, although a script-based account can be given.

Figure 6: (a) A factual script, which says that Shelley is the child of Profumo ; (b) A typical incident of an infant screaming, and some individual paying attention; (c) the general rule which can be learnt from such incidents.

These kinship fact scripts are so important that, we suppose, they are continually and automatically unified in with the script of the current scene – so that whenever any monkey observes Shelley, he or she automatically includes the fact that Profumo is Shelley’s mother in the script. A typical scene of an infant screaming, and some individual going to help, would be encoded as in the script of figure 6(b). The knowledge of who is the infant’s mother has been automatically included, by unifying a directly observed script with the script of figure 6(a).

After observing several scenes like figure 6(b), with different infant/mother pairs, taking the script intersection of these will give the rule script of 6(c) – that when any infant screams, his mother pays attention.

Suppose that the control animals in Cheney and Seyfarth’s experiment had learnt the factual script of figure 6(a) – that Profumo is Shelley’s mother – and the rule script of figure 6(c). Hearing Shelley’s scream, they made a script “Shelley screams”; unified in the script 6(a) “Shelley is Profumo’s child”; and then unified in the rule script 6(c) “Mothers pay attention to their infants’ screams”, to correctly deduce “Profumo will pay attention”. Thus they looked towards Profumo with this expectation.
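This chain of unifications can be imitated with a small pattern matcher. The sketch below is only an analogy: it writes facts as tuples and wild cards as strings beginning with ‘?’, which is a notation assumed here for brevity and not the script notation of the figures.

```python
def unify_fact(pattern, fact, bindings):
    """Match one pattern against one fact, extending the wild-card bindings."""
    if len(pattern) != len(fact):
        return None
    new_bindings = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            # A wild card must stand for the same individual throughout.
            if new_bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new_bindings

def apply_rule(conditions, consequence, facts):
    """Return every consequence which the rule predicts from the known facts."""
    bindings_list = [{}]
    for condition in conditions:
        extended = []
        for bindings in bindings_list:
            for fact in facts:
                new_bindings = unify_fact(condition, fact, bindings)
                if new_bindings is not None:
                    extended.append(new_bindings)
        bindings_list = extended
    return [tuple(b.get(term, term) for term in consequence)
            for b in bindings_list]

# Current scene, with the kin fact of figure 6(a) unified in.
facts = [
    ("screams", "Shelley"),
    ("child_of", "Shelley", "Profumo"),
]

# The rule of figure 6(c): when any infant screams, his mother pays attention.
conditions = [("screams", "?infant"), ("child_of", "?infant", "?mother")]
consequence = ("attends", "?mother", "?infant")

print(apply_rule(conditions, consequence, facts))
# [('attends', 'Profumo', 'Shelley')]
```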

4.2 Habituation to calls

Cheney and Seyfarth have shown that if a vervet habituates to a call from a certain individual, it does not thereby habituate to the same call given by different individuals, or to completely distinct calls from the same individual. There is habituation to similar calls given by the same individual, but ‘similarity’ depends on the denotation of the call, rather than acoustic similarity.

We can use the script theory, first to give an account of the meanings of calls, and then to describe the learning processes which (a) give calls their meaning to vervets, and (b) explain some of the habituation effects described by Cheney and Seyfarth.

Consider two different calls – vervets’ ‘wrr’ and ‘chutter’ calls – which are acoustically distinct but tend to be given in similar circumstances, when members of another group are seen. There must be at least two scripts associated with each call – one script which causes monkeys to make the call when appropriate, and another script which they use when hearing it. For the ‘wrr’ call, these scripts are shown in figure 7.

Figure 7: (a) The script which causes a monkey to utter a ‘wrr’ call when seeing a monkey from a different group; (b) the script activated in another monkey’s mind when she hears the call

Figure 7a is the simplest script which could cause a monkey to utter a ‘wrr’ call on seeing a monkey from another group. Slots which are in effect ‘executive commands’ from the SIM to other cognitive subsystems, to cause a monkey to do something, are marked with a ‘*’. Thus the *call slot in figure 7(a) is a command slot which causes the monkey to give a ‘wrr’ call.

Figure 7(b) is the simplest possible script which could enable a monkey to understand the meaning of a call – to convert a perception that the call has occurred to an expectation of an alien monkey.

The meaning of the ‘wrr’ call in a monkey group depends on both these scripts existing in the brains of all monkeys; for the ‘wrr’ call to serve as a useful communication, these two scripts must stay in line, associating the call with the same referent. The same applies to any other call. If, for instance, a call-giving script depended on one stimulus, whereas the call hearing script for the same call mentioned another, that call would systematically mislead, and so might not enhance vervets’ survival.

We might hypothesise that both scripts are innate, and that natural selection has ensured that they stay in line, with the same meaning. However, there is an alternative hypothesis, that at least the ‘hearing’ script of figure 7(b) is learnt; we can investigate that alternative.

A vervet will observe many occasions when some other vervet gives a ‘wrr’ call, and a member of another group is present. By forming scripts of these occasions, and intersecting them together as described in section 2, she will learn just the script of figure 7(b). If, for every call, the ‘hearing’ script analogous to that of figure 7(b) is learnt rather than innate, this guarantees that the meaning of each call in its two scripts stays in line.

Given that the ‘understanding’ script 7(b) can be learnt (and if ‘wrr’ calls are made, will be useful to the monkey) then one hypothesis is that the ‘calling’ script 7(a) is innate, and evolved through kin selection effects. (A more complex case, where the calling script is not innate, is analysed in the next section).

Then if a ‘wrr’ call from a particular individual (eg Brutus) is repeatedly played in circumstances when no monkey from another group is present (ie when the call is misleading) the same learning mechanism will lead the hearer to learn a more specialised rule script – that when the caller is Brutus, no monkey from another troop is present. A monkey can learn, and use, both the general script of figure 8b and the exception script at the same time.

Figure 8: (a) Script for innate fear of birds; (b) Script for giving an alarm call.

This can give rise to the habituation effects observed by Cheney and Seyfarth. It gives a simple account of the observations that:

(a) A monkey can habituate to a particular call by a particular individual

(b) Habituation to one call by one individual does not cause habituation to the same call by other individuals

(c) Habituation to one call by one individual does not cause habituation to completely different calls by the same individual.

The final observation – that habituation to ‘wrr’ leads to habituation to ‘chutter’, which is acoustically distinct but has a very similar referent – can be understood within the script theory, but not so simply; possible explanations depend on some detailed considerations and parameters.

Whenever presented with data which are consistent with one ‘target’ script, there is some tendency to learn more general scripts (which include the target script) at the same time. So if, for instance, there is a ‘wrr or chutter’ script which is only slightly more general than a ‘wrr’ script – if, for instance, ‘wrr’ and ‘chutter’ are subclasses of the same class of call – there will be a strong tendency to learn or habituate to the more general script.

In this way (or others) the theory can be made to accommodate this last finding, rather than giving an immediate and satisfying account of it.

In general, however, the script theory gives a fairly satisfactory and economical theory of the evolution, learning and use of vervet monkey calls. It gives a minimal computational theory of the meaning of the calls, without, for instance, having to postulate that vervets represent the knowledge of others or intend to influence the knowledge of others – or even intend to influence the behaviour of others. Vervet meaning may be much simpler than human language meaning.

4.3 Learning Alarm calls

The adult vervet’s “eagle alarm” call is highly specific, given only on seeing those raptors which prey on vervets. Young monkeys’ eagle alarm calls are initially non-specific – being triggered by any bird, not just predators. However, they soon learn to be specific – long before they have seen enough predator attacks to learn directly which species are predators; it appears that they learn from the responses of older peers, who ignore their false alarms (Seyfarth & Cheney 1980).

To analyse this in the script theory, we need to postulate several different scripts, some of which are innate. Assume that:

(a) Vervets are born with an innate fear of birds. This is summarised in the script of figure 8a, where the slot *fear is a command slot (from the SIM to the monkey’s autonomic nervous system, endocrine system and so on) to show the symptoms of fear.

(b) Being fearful in the presence of a bird leads a vervet innately to utter an ‘eagle alarm’ call. This is summarised in the innate script of figure 8b, which also contains an executive command to give the call.

(c) Any fear reaction is enhanced or diminished by knowing whether one’s peers are fearful; this is summarised by the scripts of figure 9. These say “If your peers are frightened, you should be too” and “If your peers are not frightened, you need not be”.

Figure 9: a pair of scripts instructing a primate to show fear (or not) depending on whether his peers are showing fear

For a young vervet, scripts 8a and 8b will together lead it to give eagle alarm cries to any bird – as observed.

However, as it grows, it observes instances in which a martial eagle appears and its peers are very frightened, and other instances in which, for instance, a vulture appears and its peers are not at all frightened. Combining these instances by script intersection, it will learn the scripts of figure 10 – that martial eagles always frighten its peers and vultures do not.

Figure 10: Learned scripts to the effect that (a) martial eagles always inspire fear in one’s peers (b) vultures do not.

These learnt scripts can then unify with the script of figure 9 to alter the monkey’s own level of fear appropriately [1]. For a vulture, the anticipation of an un-scary situation will damp the fear reaction enough to suppress the alarm call; for a martial eagle, the reverse will occur.

This gives a script-based analysis of how vervets learn, from their peers’ reactions, which birds are worth fearing. The explanation is not unique; we could devise alternative explanations, and express them too in the script notation. It is not entirely black-and-white; it depends on graded quantities such as ‘level of fear’ and on how different scripts influence this quantity.

It also leaves some questions open. Suppose a group of vervets became unnecessarily afraid of some harmless bird – would this fear be propagated socially from generation to generation for ever? There must also be mechanisms whereby, in the long term, the real predatory habits of birds influence vervets’ fear of them.

The proposed script mechanism makes specific predictions as to how long it will take a young vervet to learn that a given species of bird is (or is not) feared. It predicts how many examples (typically a rather small number) must be observed to reliably learn a script such as that in figure 10a or 10b. We can start to compare these numbers with observations.

4.4 Rank and Alliances

In most primate groups there is a defined rank ordering of animals, which determines access to key resources, for feeding, reproduction, shelter and so on. The effects of rank are greatly complicated by alliances (Harcourt 1988), either permanent (eg based on matrilineal kin relations) or temporary; if one has a high-ranking ally one may, for short periods, be able to enjoy some of the privileges of high rank oneself. In most monkey groups, individuals of lower rank attempt to form alliances with those of higher rank, for instance by grooming them. We can describe many aspects of this behaviour in script terms.

The relative rank of two individuals defines how they interact with one another in a large number of ways – which one gives way to the other, and so on. So there are many ways in which a primate, observing two others together, can judge which one is of higher rank. A typical rank-judging script is shown in figure 11a. A typical fact about individual ranks, which can be learnt using this script [2], is shown in the script of figure 11b. If Cassius retreats from Caesar, then Caesar must out-rank Cassius.

Figure 11: (a) A rule script which can be used to learn about rank from behaviour; (b) a typical fact about rank which can be learnt in this way.

In this respect, learning about rank is much like learning about kin relations. Some rank-determining scripts, such as that in figure 11a, may be innate; but other similar scripts, describing other accompaniments of rank, may be learnt.

The rank facts such as that in figure 11b are very important for a monkey. In a troop of N monkeys, there are N(N-1)/2 rank facts to know, and it might be a disadvantage for an individual to have to learn every one of them by observation; it might take a long time to observe all the necessary dyadic interactions. As has been discussed by several authors (Cheney & Seyfarth 1990; d’Amato and Colombo 1988) it would be useful to use the fact that rank is transitive; if A out-ranks B and B out-ranks C, then A out-ranks C. This general rule is easily represented in a script, shown in figure 12.

Figure 12: a script which expresses the fact that rank is transitive.

In this way, a monkey could determine the ranks of all the members of his or her group with comparatively few observations.
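The saving can be illustrated by repeatedly applying the transitivity rule to a handful of observed rank facts. In the sketch below only the fact that Caesar out-ranks Cassius comes from the example above; the other two observations, and the representation of a rank fact as a (higher, lower) pair, are assumptions for the illustration.

```python
def transitive_closure(observed):
    """Apply 'A out-ranks B and B out-ranks C => A out-ranks C' until nothing new appears."""
    known = set(observed)
    changed = True
    while changed:
        changed = False
        for higher, middle in list(known):
            for middle2, lower in list(known):
                if middle == middle2 and (higher, lower) not in known:
                    known.add((higher, lower))
                    changed = True
    return known

# Three observed dyadic interactions in a hypothetical troop of four.
observed = {
    ("Caesar", "Cassius"),
    ("Cassius", "Brutus"),    # hypothetical
    ("Brutus", "Claudius"),   # hypothetical
}

all_ranks = transitive_closure(observed)
n = 4
print(len(observed), len(all_ranks), n * (n - 1) // 2)  # 3 6 6
```

Three well-chosen observations are enough to recover all six rank facts for the four animals; in general a chain of N-1 observations yields all N(N-1)/2 facts.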

We suppose that the facts of rank, such as that in figure 11b, are, like the facts of kin, so important that they are continually, automatically combined with the visible facts of the current scene (by script unification), so that rank-dependent rule scripts can then be applied. It seems likely that monkeys have many scripts enabling them to judge rank, to know when and how to challenge it, to make alliances, to stop others making alliances, to exploit alliances and to call for help, and to know when it is worth helping an ally.

Monkeys may have an innate goal script – to try to increase their own rank – and many learned scripts to call upon to achieve it. Gaining rank is an autonomous goal within the SIM itself, rather than a goal defined by some other behavioural system.

4.5 Primate Emotional Responses

If emotion is regarded as a set of bodily responses (endocrine, expression, posture, vocalisation…) ensuing from a cognitive appraisal of the present situation, this is largely an appraisal of the social situation and its possibilities. Therefore many emotions arise from appraisal of the current situation by scripts in the SIM. In this view, many rule scripts (both innate and learned) result in emotional responses. When the current script matches the rule script, it is unified with it, causing the response.

We can give a script-based account of many aspects of primate emotion, including, for instance, the attachment behaviour which Bowlby (1969) noted is common across many primate species, including man. For instance, one initially puzzling aspect of attachment behaviour is the fact that infants of many species seem to show stronger and more persistent attachment behaviour to a parent who rejects them, than to a more loving parent.

As Bowlby (1980) has described, the attachment response (a goal to be close to a caregiver) is enhanced in situations of stress and anxiety. This serves a sound evolutionary purpose, because those situations (eg when a predator is near) are just the situations when a caregiver is likely to be most useful. This can be described by an innate script, shown in figure 13b. On the other hand, a rejecting parent is likely to cause anxiety. This (also innate) reaction is described by the script of figure 13a.

Figure 13: A script description of anxious attachment; (a) Parental rejection leads to anxiety (b) Anxiety leads to the goal of being close to a parent.

The two scripts of (13a) and (13b) combine to give the observed effect; rejecting parental behaviour leads the infant to cling to the parent. Note that they do not combine directly by script unification, as the slot *anxiety on the first script is a command slot which sets off the bodily symptoms of anxiety, while the slot ‘anxiety’ on the second script refers to perceiving those bodily symptoms; the two slots are distinct, and do not unify together. The chain of cause and effect runs through the body.

In this way, script theory could be used to build a principled computational model of emotional response (innate and learned) in typical primates such as monkeys, before going on to tackle the much more complex emotional responses (in chimps and mankind) which ensue when one appraises not only the actual situation, but also what others may think about it.

4.6 Tactical Deception

Amongst the most suggestive evidence for primate social intelligence are reports of deception, where primates appear deliberately to mislead one another. These reports are open to a wide variety of interpretations, from full-blown ‘theory of mind’ accounts through to basic behavioral accounts. The theory of this paper gives a framework in which possible accounts of some incidents of deception can be framed, without invoking a theory of mind, for comparison with alternative accounts.

Byrne and Whiten (1990) define tactical deception as ‘acts from the normal repertoire of the animal, deployed such that another individual is likely to misinterpret what the acts signify, to the advantage of the agent’. By compiling data from many observers, Byrne and Whiten (1988, 1990, 1992) have built up a strong body of evidence that this kind of behaviour is widespread in some primate species, rare in others. It is most common in Cercopithecines (vervets, macaques and baboons) and in the great apes, particularly chimps.

Byrne and Whiten group their 253 reports of tactical deception into classes, depending on the evidence in the report. In level-0 reports, interpretations other than tactical deception are possible; for level-1 incidents the evidence for tactical deception outweighs competing explanations, and finally level-2 deception ‘implies that the primate can represent the mental states of others’ – which requires a primate theory of mind, and so does not fall within the scope of this theory. Reports of level-2 deception are almost entirely confined to the great apes. I therefore assume, for the moment, that great apes have some capacity to represent the mental states of others, so that their deceptions should probably not be analysed in the simple script-based terms of this theory. However, we may use scripts to analyse deception in the Cercopithecinae (for which Byrne and Whiten report 45 incidents of deception at level-1 and above), assuming (as in previous examples) that the cercopithecines use simple scripts without representing others’ mental states.

Byrne (1993) has analysed several of these incidents in a production-rule formalism. Typical of these is his analysis of report no. 104, where a juvenile baboon, to get a food item (a deep growing corm, partially dug out by an adult of rank below his own mother), screamed as if hurt, so his mother came and chased away the adult; when both were out of sight the juvenile then continued to dig out the corm. Byrne proposes a production rule of the form:

(need to remove A) & (mother dominant-to A) & (mother out-of-sight) => (scream).

Usually Byrne’s production rules are of this form (pattern) => (procedure), or (X) => (do Y), whereas scripts are of the form (pattern) & (do procedure) => (consequence) , or (X) & (do Y) => (Z) (in the rule script form (cause) => (effect) ); a more declarative form of knowledge, but one which will lead to the same action if Z is a desirable consequence. Apart from this small difference, it is straightforward to translate from Byrne’s production rules to rule scripts, or vice versa. Thus all the production rule analyses of tactical deception have closely equivalent script forms.
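The difference between the two forms can be made concrete with a small sketch. The translation used here, in which ‘need to remove A’ moves out of the pattern and becomes the desired consequence ‘A removed’, is one reading of the correspondence and is not spelled out by Byrne; the function names are hypothetical.

```python
# Byrne-style production rule: (pattern) => (procedure).
PRODUCTION_RULE = (
    frozenset({"need to remove A", "mother dominant-to A", "mother out-of-sight"}),
    "scream",
)

# Rule script: (pattern) & (do procedure) => (consequence).
# Treating 'need to remove A' as a goal rather than as part of the
# pattern is an assumption of this sketch.
RULE_SCRIPT = (
    frozenset({"mother dominant-to A", "mother out-of-sight"}),
    "scream",
    "A removed",
)

def act_on_production(rule, situation):
    """Fire the procedure whenever the whole pattern is present."""
    pattern, procedure = rule
    return procedure if pattern <= situation else None

def act_on_script(rule, situation, goal):
    """Choose the procedure only if its predicted consequence is the current goal."""
    pattern, procedure, consequence = rule
    if pattern <= situation and consequence == goal:
        return procedure
    return None

def predict_with_script(rule, situation, observed_action):
    """The declarative form can also predict what follows from an observed act."""
    pattern, procedure, consequence = rule
    if pattern <= situation and observed_action == procedure:
        return consequence
    return None

situation = {"need to remove A", "mother dominant-to A", "mother out-of-sight"}

print(act_on_production(PRODUCTION_RULE, situation))          # scream
print(act_on_script(RULE_SCRIPT, situation, "A removed"))     # scream
print(predict_with_script(RULE_SCRIPT, situation, "scream"))  # A removed
```

Both forms lead to the same action, but only the script form can be run ‘forwards’ to predict the outcome of a scream.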

The script learning theory makes interesting predictions about the learning of this script (or its equivalent production rule). First, the juvenile could not have learnt the script from just one previous incident, or lucky accident. At least two previous ‘accidental’ successes are needed.

Second, we may ask: how is the qualification ‘mother out of sight’ learned as part of the rule ? Does the baboon need explicit negative evidence (that when mother is present, the trick does not work) to learn the full rule script ?

Following the previous discussion of attachment behaviour, we expect that the presence or absence of its mother is a very important variable, always represented in a young baboon’s factual scripts. We might also expect that when its mother is absent, it has a greater tendency to scream – giving it more opportunities to learn this rule script. But why should it not learn a more general rule script, which has no qualification ‘mother absent’? A priori, the simpler rule without the qualification is more likely to be true, by the ‘Occam’s Razor’ weighting of the prior probabilities.

Suppose the juvenile has three successful examples, when mother was absent and the trick worked. The script intersection mechanism projects out all the common information in these examples, including the fact ‘mother absent’. This more specific rule ‘explains’ more about these three examples, and so is favoured over a simpler alternative without the qualification (in spite of the smaller prior probability of the more complex rule). The more positive examples accumulate, the more the specific, qualified rule is favoured. This enables it to learn the specific rule, without over-generalising, in the absence of explicit negative evidence.
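A toy calculation shows how this crossover can arise. It assumes a simple model which is not taken from the paper: the prior odds of a rule with I bits fall as 2^(-lam * I), and each example which matches the rule multiplies the odds by 2^I. The information contents of 30 and 32 bits, and the exponent of 2.5, are invented for the illustration.

```python
def log2_odds(info_bits, n_examples, lam=2.5):
    """log2 odds in favour of a rule after n matching examples (assumed toy model)."""
    return (n_examples - lam) * info_bits

GENERAL_BITS = 30    # rule without the 'mother absent' qualification (assumed)
SPECIFIC_BITS = 32   # the same rule plus the qualification (assumed)

favoured = []
for n in range(1, 6):
    general = log2_odds(GENERAL_BITS, n)
    specific = log2_odds(SPECIFIC_BITS, n)
    favoured.append("specific" if specific > general else "general")

print(favoured)
# ['general', 'general', 'specific', 'specific', 'specific']
```

With one or two examples the prior dominates and the simpler rule is less disfavoured; from the third example onwards the extra bits explained by the qualified rule outweigh its smaller prior, and the margin grows with each further example.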

Pieces of explicit negative evidence – examples of ‘mother present, trick failed’ – are consistent with the specific rule, but do not actually help the baboon to learn it. Only if it experienced some ‘mother present, trick worked’ examples would there be any tendency to learn the more general, unqualified, rule instead; and this is unlikely to happen, as its mother could see the trick.

Finally, the learning theory helps us to analyse why primate tactical deception is tactical – why it cannot be used more regularly with success. If, on some occasions, the mother can gain evidence that the third party whom she attacked was actually ‘innocent’, these examples would lead her to habituate to her child’s distress call, just as in the discussion of 4.2; the learning theory tells us how many examples are needed. It tells us not only how primates can learn to cry ‘wolf’, but also how their peers can learn to ignore them – all without needing any theory of mind.

4.7 Innate and Learned Scripts

In some of the previous examples, we have postulated certain innate scripts, as a basis from which script learning can begin. This might seem to be an uncontrolled process; could we not postulate as many innate scripts as we wanted, and perhaps even do without any script learning in the theory ? Fortunately this is not the case; there are firm evolutionary grounds to limit the number and complexity of innate scripts. Every script has an information content (typically 20 – 200 bits); if it is to be an innate script, this requires at least that much extra innate information in the design of the brain. Such extra design information can only accumulate, through selection, at a very slow rate. This places a lower bound on the time required to evolve new innate scripts.

In (Worden 1995b) I derived a speed limit for evolution, which bounds the rate at which useful new genetic information, expressed in the phenotype, can accumulate through natural selection. This leads to a quantitative relation between (1) the information content of a script (2) the selective advantage of having it innate, rather than having to learn it, and (3) the minimum number of generations needed to evolve it as an innate script.

If a certain selection pressure leads to differential survival rates of ±D percent per generation, then the evolutionary response to this selection pressure can accumulate useful new information in the phenotype only at a rate of dG/dn bits per generation, where approximately

dG/dn ≤ D/80 (4.1)

For instance, a selection pressure which leads to variations in survival rate of ±10% can accumulate useful new genetic information in the phenotype at a rate of no more than 1/8 bit per generation [3]. This means that the minimum number of generations N needed to evolve an innate script with information content B bits, which gives a selective advantage of D percent, must obey

N ≥ 80 B / D (4.2)

Probably the simpler scripts involve around 20-50 bits of information; so under an 8% selection pressure, these would take at least 200 generations to evolve as innate scripts (80 × 20 / 8 = 200 for a 20-bit script, rising to 500 for a 50-bit script). For universal, species-dependent facts (such as those in figures 7 and 8) 200 generations is not a long time; these scripts might well be part of the innate makeup of the brain of any vervet monkey.

However, the scripts of figure 10, since they each mention a particular species of bird, and must depend on the specific sensory cues for that species, probably have an information content of 100 bits or more; and the differential survival value of knowing (from birth) that one species of bird is harmless is probably more like 1% than 10% (as we saw in 4.3, there are ways to learn such scripts, and an innate script only gives extra fitness at ages before this learning can take place). So to make the ‘vultures are harmless’ script innate would take of the order of 8,000 generations. Since primates often depend on their flexibility to colonise new habitats (where different predators prevail), an 8,000 generation evolution time is often too slow; predator-dependent scripts must be learned.
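The arithmetic behind these estimates follows directly from relations (4.1) and (4.2). The short Python sketch below simply evaluates the two bounds for the cases quoted in the text; the function names are introduced here for illustration, and the bounds are approximate averages over many generations, as noted in footnote [3].

```python
def max_bits_per_generation(advantage_percent):
    """Upper bound on dG/dn from (4.1): D / 80 bits per generation."""
    return advantage_percent / 80.0


def min_generations(bits, advantage_percent):
    """Lower bound on N from (4.2): 80 * B / D generations."""
    if advantage_percent <= 0:
        raise ValueError("selective advantage must be positive")
    return 80.0 * bits / advantage_percent


# Cases quoted in the text: (information content in bits, advantage in %).
cases = {
    "simple script, low end": (20, 8),
    "simple script, high end": (50, 8),
    "'vultures are harmless' script": (100, 1),
}
for label, (bits, advantage) in cases.items():
    print(label, min_generations(bits, advantage))

print(max_bits_per_generation(10))  # the 1/8 bit per generation example
```

The three cases give 200, 500 and 8,000 generations respectively, matching the figures used in the argument above.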

The evolutionary speed limit therefore gives a well-defined criterion for the dividing line between innate and learned scripts. It leads us to expect that a few simple general scripts are innate, but that complex, habitat-specific or group-specific scripts must be learnt.

5. Testing the Theory

From the above examples, script theory seems to be in broad agreement with the evidence. Scripts have the descriptive power to express the kinds of social knowledge which most primates show; the learning mechanism enables them to learn rule scripts rapidly, as primates do; and the mechanism of script unification provides enough inferential power to do the kinds of social reasoning which primates apparently do.

Yet these examples, on their own, leave much to be desired. We can devise a set of scripts, inferences and learning sets to account for each example – but what does this add to what we already knew? Does it bring any new insights, or will it simply adapt itself as required to each new observation? What data might prove the theory wrong ?

The test of the script theory comes not as we devise new scripts to account for each new observation, but when the same scripts appear repeatedly in accounts of different behaviour. (We began to see this in section 4, in the links between call habituation, attachment behaviour and tactical deception.) At that point, the precise computational basis of the theory constrains us, to stop us handwaving or bending the theory ad hoc to account for each new fact. It can then start making definite predictions, which can be proved wrong.

To make these tests, we need first to construct (for some well-studied species) a set of scripts which accounts – to a first approximation – for most of the social behaviour we observe. This would mean constructing the sum of social knowledge for a species; a sort of Primate Social Encyclopaedia expressed in scripts. For a species such as the vervet monkey this might involve of the order of 20 – 50 innate scripts and 100 – 300 learned scripts.

For each innate script, there should be a plausible account of the selection pressure which gave rise to it; and for each learned script, we should be able to observe the examples from which an individual can learn it. So constructing the encyclopaedia is not an unconstrained exercise of invention; in itself it is a useful test of the theory. Doing so will define what nodes, slots and values are needed for the construction of scripts.

This will define a framework and parameters, within which we can consider some specific aspect of social behaviour – such as predator alarm calls, or competition for food – which is describable using only a few (preferably simple) scripts. For that aspect we can use the theory to predict what is learnable, and how fast; and to devise new tests of the theory.

6. Discussion

6.1 Computational Theories of Primate Social Intelligence

Formal computational descriptions of primate social intelligence have been proposed by Byrne (1993), Shultz (1991) and Schmidt and Marsella (1991). Shultz, and Schmidt and Marsella, are mainly concerned with the higher-order problems of recognising agency and others’ plans within a primate theory of mind, rather than the first-order problem of primate social intelligence (without a theory of mind). Only Byrne addresses this issue, in a production rule formalism, so I shall only discuss his work.

Scripts are very similar in spirit to production rules; and as shown in section 4.6, we can make a close equivalence between scripts and production rules for describing any particular observation. The script theory differs from Byrne’s production rule formalism mainly by having a worked-out theory of learning, tailored to the social domain, which Byrne’s production rules do not yet have – but could be extended to have. Alternatively, as in section 4.6, we can simply translate the script learning theory into production rule terms, assuming that any near-optimal theory of production rule learning must have approximately this form.

6.2 Scripts in Human Cognition

The introduction and discussion of scripts by Schank and Abelson (1977) and observations of others (eg Bower et al 1979; Graesser et al 1980) have built up a wealth of evidence that some form of script-like information structure is an important component of human social cognition. In particular Nelson (1978,1985; Nelson & Gruendel 1981) has studied the development of script structures in childhood and its close relation to the development of language.

As noted in the introduction, this computational model has much in common with the models discussed by Holland, Holyoak, Nisbett and Thagard (1986) in their framework for induction. Like their models, it combines elements of scripts, mental models and rule systems, paying attention to how rules are induced and modified through experience. Several other features of the q-morphism models of Holland et al. are shared in this model – in particular, the induction of default hierarchies of rules, rule competition, and the use of statistical criteria of variability to decide when a new rule is supported by the evidence. However, this model does not share some of the mechanisms which they postulate, such as the learning of ‘inference rules’ and analogies. This difference is justified by the fact that their model is designed to account for human cognition, whereas this is a minimal theory, to model the social cognition of primates such as vervet monkeys, which is expected to be much simpler than human cognition.

The evidence for scripts in mankind provides an important corroboration of the idea explored in this paper, that scripts are important in general primate social cognition. At the same time, however, the human evidence is harder to interpret because of two very important, and largely human-specific, complications – the existence of a well-developed theory of mind in mankind, and language. Both of these give the growing child an enormous advantage over other primates in forming and using scripts, and therefore complicate any analysis of script learning and use. That is why the examples used in this paper (section 4) have concentrated on primates which have neither a theory of mind nor language; they form a simpler test case in which the basic script mechanism can be studied. This basic script theory then forms a starting point from which the later developments – of a primate theory of mind, and language – can be discussed (Worden 1995a).

6.3 Computational Models of Learning

The script learning theory is an example of concept induction – inducing some complex concept or structure (in this case, rule scripts for social causal regularities of a primate group) from examples (in this case, an individual’s social history, expressed in factual scripts).

Concept induction has been extensively studied in the literature of AI and machine learning over many years (Michalski 1986), and much of this work is directly comparable with the script learning theory. Broadly, one can discern two main flavours of concept induction work – approaches based on computational heuristics, and approaches based on a mathematical analysis of performance.

The space of possible concepts is typically very large, and many computational heuristics have been devised to arrive rapidly at interesting parts of this space. Typical of these are the ‘information gain’ heuristics embodied in algorithms such as ID3 (Quinlan 1986), which builds up a decision tree from its root by putting the largest information gains nearest the root, and in many conceptual clustering methods (e.g. Fisher 1987; Lebowitz 1986). While Mitchell (1980) has shown that any induction method needs to have some form of inductive bias (towards some parts of the concept space rather than others) if it is to do useful learning, the bias built into these heuristics is not always transparent. A drawback of heuristic methods is that they give no simple guarantee of performance; often one must simply try out the method on sets of ‘typical’ data to see how it performs. For instance, setting the bias towards simple concepts (Occam’s Razor) too strongly may lead to over-generalisation. Nevertheless, techniques quite similar to the script intersection method of finding likely rule scripts have been extensively explored.

Neural nets and other reinforcement learning techniques tend to have a very weak inductive bias (Denker et al 1987), and so to be very slow learners – much slower than the fast social learning seen in primates. Primate evolution has clearly gone a long way to provide the required inductive bias; the problem is to know just what inductive bias has been built in by evolution.

Other approaches to concept learning start not from a plausible heuristic but from a mathematical analysis of the performance required. Much work in this vein uses Valiant’s (1984) framework for Probably Approximately Correct learning, or pac-learning. This framework defines a sub-class of the concept space (a restrictive bias) explicitly, and then analyses the number of training examples needed to find (with high probability) a concept which correctly classifies new examples (with high probability). However, the pac-learning framework is a worst-case analysis – guaranteeing performance for any concept in the sub-class, and any probabilistic mix of training examples, and for any consistent learning algorithm (Haussler et al 1994). For this reason, its predicted learning times (its sample complexity) tend to be over-pessimistic (Buntine 1990).

One can see intuitively (and it can be shown mathematically, as is done in (Worden 1995c)) that natural selection tends to optimise average performance, rather than worst-case performance; it is average learning performance which determines lifetime survival. A monkey which failed to learn some ‘worst case’ rule script, but learnt most scripts rather well, would do better than one which handled the worst case at the cost of slower learning of many other scripts. So the measure of performance in pac-learning analyses is not appropriate for this problem.

Average performance is optimised by Bayesian methods, where the inductive bias towards some concept (or rule script) is defined by a prior probability for different sets of rule scripts to hold in the habitat. Evolution effectively builds some moderately realistic model of these priors into the species’ brain. To learn the best set of rule scripts means to find the peak of the posterior probability, in the light of the factual scripts. The Bayesian approach to learning is also well represented in the ML literature; one important example is Anderson’s (1990) Rational Analysis, which uses an approach similar to this one, to successfully analyse several human problem-solving tasks, and classical conditioning; but has not been applied to learning of structures as complex as rule scripts. Anderson and Matessa (1991) have applied this rational approach to human categorisation; the success of their comparisons illustrates two points:

(1) The theoretical optimality of the Bayesian approach does in practice lead to good performance – at least as good as the many heuristic approaches which have been used for the same problem.

(2) If it did prove to be necessary to include categorisation directly within social learning, then it would be fairly straightforward to combine Anderson and Matessa’s (1991) model of categorisation with this model of scripts, as they are both Bayesian – by defining joint prior probabilities over a larger space.
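The Bayesian criterion described above, finding the peak of the posterior probability in the light of the factual scripts, can be written compactly. The notation is an assumption introduced here for illustration: R stands for a candidate set of rule scripts, F for the individual’s accumulated factual scripts, P(R) for the prior built in by evolution, and P(F | R) for the likelihood of that social history if R holds.

```latex
\hat{R}
  \;=\; \arg\max_{R} \; P(R \mid F)
  \;=\; \arg\max_{R} \; \frac{P(F \mid R)\, P(R)}{P(F)}
  \;=\; \arg\max_{R} \; P(F \mid R)\, P(R)
```

The last step holds because P(F) does not depend on R. In this form the inductive bias enters only through the prior P(R), which is where an Occam’s Razor weighting towards simpler rule scripts would appear.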

Haussler et al (1994) have developed a unified framework within which both Bayesian and pac-learning performance bounds can be derived as ends of a spectrum. Although the case they analyse (learning boolean-valued functions from concentrated ‘pure’ training data) is not as complex as script learning (learning probabilistic scripts from noisy training data ‘diluted’ in many irrelevant scripts) the results they derive at the Bayesian end of the spectrum are broadly extensible to this case – showing how script learning can be fitted into the general framework of guaranteed-performance learning.

Therefore the script learning mechanism is closely related to a number of existing computational learning theories, both heuristic and mathematically-based; but since previous approaches have not been explicitly designed for optimum fitness in the social learning problem, it is not identical to any of them.

6.4 Neat Theories Versus Piecemeal Theories

The theory proposed here is a tight, concise computational theory; scripts are very simple information structures, and three basic operations on them (intersection, inclusion and unification) support all the learning and inference needed in the theory. However, one might wonder whether such simple ‘neat’ mechanisms can really be the basis of primate social behaviour, or whether some more piecemeal account is more valid. Perhaps different bits of social intelligence evolved at different times in different ways – a neural net here, a reflex circuit there – without the tight coherent structure I propose. Script theory may seem more of a computer scientist’s theory than a biologist’s; would not a larger, looser theory be more biologically plausible?

Arguments in support of a small neat theory are:

1. High Performance Demands Tight Design: A large, loose theory would, I believe, discount both the direct evidence that primate social cognition is so flexible and powerful, and the evolutionary argument that 50 million years of intense social competition must have made it so.

Whatever the beginnings of primate social intelligence, evolution has honed it into a faculty with great representational power, fast learning and flexible inference. To be this powerful, social cognition must be coherent and consistent; it should not contradict itself when faced with some new problem, as a loose, ad hoc design might do. Script theory can be shown to be self-consistent, and to be a near-optimal solution to the problem of social cognition.

We have abundant evidence that when really high performance is required, nature chooses simple, precise designs – such as the optical design of the eye, or the protein-encoding in DNA. While it may be hard to discern such simplicity in the primate brain, we should at least think it possible that social intelligence is based on a simple, spare mechanism such as the script theory, which demonstrably gives the high performance (eg fast learning) which we observe in primates.

2. It is understandable and testable: Scripts can be easily envisaged, and their information content understood; the key operations of script intersection and unification are easily done by hand. So incisive tests of the script theory, as discussed in the previous section, are feasible.

In contrast, a theory which relied on an ad hoc collection of neural nets and specific mechanisms, tied together in arbitrary ways, would be much more difficult to envisage and test. It could always be bent to accommodate new data.

3. It is the Occam’s candidate: Script theory is designed to be the simplest possible cognitive model which can account for the data, and so far it seems to be descriptively adequate. Occam’s Razor requires us to consider simple theories first; so we should try to test this theory and prove it wrong before developing more complex ones. The ways in which script theory fails may be the clues to building a better theory.

4. It may be the origin of human symbol processing: The human mind has a powerful symbol processing capability; the main evidence for this is our remarkable and unique faculty of language. There is evidence that language, like the script theory, uses neat, powerful operations on tree-like information structures (eg syntax trees). While language is clearly much more powerful, it is possible that the basic symbolic script operations of primate social intelligence – as described in this paper – were extended first to the primate theory of mind, then to human symbol processing and language.

You may still feel that such a concise computational theory must somehow belittle the great richness of primate social behaviour. There are three reasons why it does not – first, the script theory is itself capable of generating quite complex learning and behaviour; second, the SIM interacts with other parts of the brain in complex ways to produce the behaviour we see; and third, we need to extend the theory to give a primate ‘theory of mind’ for higher apes and mankind.

Those are the arguments for favouring a tight, concise theory such as this over any looser, piecemeal theory. I hope readers are persuaded to try using scripts to express their own observations and ideas of primate social behaviour.

References


Anderson, J. R. (1990) The Adaptive Character of Thought, Lawrence Erlbaum Associates

Anderson, J. R. and M. Matessa (1991) A rational analysis of categorisation, Machine Learning, proceedings of the seventh international workshop (ML90)

Bower, G. H., J. B. Black and T. J. Turner (1979) Scripts in memory for text. Cognitive Psychology 11, 177-220

Bowlby, J. (1969) Attachment and Loss 1: Attachment, Hogarth, London

Buntine, W. (1990) A theory of learning classification rules. PhD thesis, University of Technology, Sydney

Byrne, R.W. and A. Whiten (1988) Machiavellian Intelligence: Social intelligence and the evolution of intellect in monkeys, apes and humans, Clarendon Press

Byrne, R.W. and A. Whiten (1990) Tactical deception in primates: the 1990 database. Primate report 27, 1-101

Byrne, R.W. and A. Whiten (1992) Cognitive evolution in primates: evidence from tactical deception, Man 27, 609-627

Byrne, R. W. (1993) A formal notation to aid analysis of complex behaviour: understanding the tactical deception of primates, Behaviour 127 (3-4) 231 – 246

Charniak, E. and McDermott, D. (1989) Introduction to Artificial Intelligence

Cheney, D.L. and R.M.Seyfarth (1990) How monkeys see the world, University of Chicago Press

Clocksin, W. F. and Mellish C. S. (1979) Programming in Prolog

D’Amato, M. and Colombo, M. (1988) Representation of serial order in monkeys (Cebus Apella) J. Exp. Psychol. Anim. Behav. Proc. 14 131-9

Dasser, V. (1987) A Social Concept in Java Monkeys, Animal Behaviour 36, 225-30

Denker, J. , Schwarz, D., Wittner, B., Solla, S. Howard, R., Jackel, L. and Hopfield, J. (1987) Automatic learning, rule extraction and generalisation. Complex Systems 1: 877-922

Dennett, D. C. (1983) The Intentional Stance, Behavioral and Brain Sciences 3, 343-350

de Waal, F. (1982) Chimpanzee politics: power and sex among apes, Johns Hopkins University Press

Dickinson, A. (1980) Contemporary animal learning theory, Cambridge University Press

Fisher, D. (1987) Knowledge acquisition via incremental conceptual clustering, Machine Learning 2:139-172

Fodor, J. A. (1983) The Modularity of Mind, MIT Press, Cambridge, Mass.

Fodor, J. A. (1987) Psychosemantics, MIT Press, Cambridge, Mass.

Fodor J. and Z. Pylyshyn (1988) Connectionism and Cognitive Architecture, Cognition 28, 3-71

Graesser, A. C., S. B. Woll, D. J. Kowalski, and D. A. Smith (1980) Memory for typical and atypical actions in scripted activities. Journal of experimental psychology: human learning and memory 6(5) 503-515

Harcourt, A. H. (1988) Alliances in Contests and Social Intelligence, in Machiavellian Intelligence: Social intelligence and the evolution of intellect in monkeys, apes and humans, ed. Byrne, R.W. and A. Whiten , Clarendon Press

Haussler, D., M. Kearns and R. E. Schapire (1994) Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension, Machine Learning 14, 83-113

Hinde, R. A. (1982) Ethology, Fontana

Holland, J. H. , K. J. Holyoak, R. E. Nisbett and P. R. Thagard (1986) Induction: Processes of Inference, Learning and Discovery, MIT press, Cambridge, Mass.

Humphrey, N. K. (1976) The Social Function of Intellect, in Growing Points in Ethology, ed. P. P. G. Bateson and R. A. Hinde, Cambridge

Lebowitz, M. (1986) Concept learning in a rich domain: generalisation-based memory, in Machine Learning: an artificial intelligence approach, Vol. II, R. S. Michalski, J. G. Carbonell and T. M. Mitchell (eds), Morgan Kaufmann, Los Angeles

Jackendoff, R. A. (1992) Languages of the Mind: Essays on Mental Representation, MIT Press.

Jolly, A. (1966) Lemur Social Behaviour and Primate Intelligence, Science 153, 501-6

Jolly, A. (1985) The Evolution of Primate Behaviour, 2nd edition, Macmillan, New York

Johnson-Laird, P. N. (1983) Mental Models, Cambridge University Press, Cambridge

Judge, P. G. (1982) Redirection of Aggression Based on Kinship in a Captive Group of Pigtail Macaques, International Journal of Primatology, 3, 301

Kummer, H. (1967) Tripartite Relations in Hamadryas Baboons, in Social Communication Among Primates, ed. S. A. Altmann, University of Chicago Press.

Marr, D. H. (1982) Vision, W. H. Freeman

Michalski, R. S. (1986) Understanding the nature of learning: issues and research directions, in Machine Learning: an artificial intelligence approach, Vol. II, R. S. Michalski, J. G. Carbonell and T. M. Mitchell (eds), Morgan Kaufmann, Los Angeles

Mitchell, T. M. (1980) The need for biases in learning generalisations, Rutgers University technical report, reprinted in Readings in Machine Learning (1990), J. W. Shavlik and T. G. Dietterich (eds), Morgan Kaufmann, San Mateo, Calif.

Nelson, K. (1978) How young children represent knowledge of their world in and out of language. In R. S. Siegler (ed) Children’s thinking: what develops ? Erlbaum, Hillsdale, N.J.

Nelson, K. (1985) Making sense: the acquisition of shared meaning, Academic press, N. Y.

Nelson, K. and J. M. Gruendel (1981) Generalised event representations: basic building blocks of cognitive development. In A. Brown and M. Lamb (eds) Advances in developmental psychology (vol 1) Erlbaum, Hillsdale, N. J.

Premack, D. and Woodruff, G. (1978) Does the Chimpanzee Have a Theory of Mind ? Behavioural and Brain Sciences 3, 111-32

Quinlan, J. R. (1986) Induction of decision trees, Machine learning 1, 81-106

Rumelhart, D. E. (1991) The architecture of mind: a connectionist approach, in M.I.Posner, ed., Foundations of Cognitive Science , MIT Press, Cambridge Mass.

Schank, R. C. and R. P. Abelson (1977) Scripts, Plans, Goals and Understanding: an Inquiry into Human Knowledge Structures, Lawrence Erlbaum Associates, Hillsdale, New Jersey

Schank, R. C. (1982) Dynamic memory: a theory of reminding and learning in computers and people, Cambridge University Press, Cambridge, UK.

Schmidt, C. F. and Marsella, S. C. (1991) Planning and plan recognition from a computational point of view, in Natural theories of mind: evolution, development and simulation of everyday mindreading, A. Whiten, ed., Blackwell, Oxford

Shultz, T. R. (1991) From agency to intention: a rule-based, computational approach, in Natural theories of mind: evolution, development and simulation of everyday mindreading, A. Whiten, ed., Blackwell, Oxford

Seyfarth R. M. and Cheney D. L. (1980) The Ontogeny of Vervet Monkey Alarm Calling Behaviour: A Preliminary Report, Z. Tierpsychol. 54, 37-56

Smuts, B. (1985) Sex and friendship in Baboons, Aldine, Chicago

Valiant, L. G. (1984) A theory of the learnable, Communications of the ACM, 27, 1134-1142

Vera, A. H. and Simon, H. A. (1993) Situated Action: A Symbolic Interpretation, Cognitive Science, 17:49-59

Worden, R.P. (1995a) The Primate Theory of Mind (Paper in draft)

Worden, R.P. (1995b) A Speed Limit for Evolution, Journal of Theoretical Biology 176, 137-152

Worden, R.P. (1995c) An Optimal Yardstick for Cognition (published in the electronic journal Psycoloquy)


[1] That is, without necessarily seeing the level of fear of one’s peers, merely predicting their level of fear is enough to alter one’s own level of fear.

[2] This example works in just the same way as the “Profumo is Shelley’s mother” example of section 4.1.

[3] The bound is approximate, and holds for the average rate over many generations; for details see (Worden 1995b).