Artificial Intelligence
D. Partridge, University of Exeter, Exeter, United Kingdom
© 2017 Elsevier Inc. All rights reserved.
Reference Module in Neuroscience and Biobehavioral Psychology. http://dx.doi.org/10.1016/B978-0-12-809324-5.02995-3

Contents
Approaches
Functional Equivalence
Structural Equivalence
Abstract Structure: Cognitive Science
Physical Structure: Cognitive Neuroscience
Computational Paradigms
Classical Programming
Procedural Programming
Rule-Based Systems
Logic Programming
Neural Computing
Inductive Methods
Subdivisions
Knowledge Representation
Semantic Networks
Knowledge Bases
Expert Systems
Machine Learning
Rote Learning
Parameter Optimization
Inductive Generalization
Vision and Perception
Pattern Recognition
Image Understanding
Natural Language Processing
Spoken or Written
Syntactic Analysis and Grammars
Semantics
Pragmatics
Machine Translation
Machine-Readable Dictionaries
Robotics
Moving, Sensing, and Reasoning
Hand–Eye Systems
Cognitive Systems
Philosophical Perspectives
Conclusions
Further Reading

Glossary
Cognitive level A viewpoint in which structures and mechanisms to support a scientific interpretation of intelligent behavior bridge the gap between neurophysiology and intelligent behavior. Structures and mechanisms at the cognitive level may or may not be found to have a simple mapping to the structures and mechanisms of brain anatomy.
Heuristic A rule of thumb used to generate adequate solutions to intractable problems.
Knowledge base A set of "if–then" rules and facts that captures the information necessary to support intelligent activity within some circumscribed domain.
Neural network A computational system composed of highly interconnected simple processing units that operate in parallel to process all incoming activity values, compute an output activity value, and send it to all directly connected neighboring units. A unit's output value transferred by each interunit link is modified by the "weight" (a real number) associated with each link. It is this set of link weights that is altered during the initial network training.

Artificial intelligence (AI) has no firm definition. The term refers to attempts to simulate or replicate with computer systems functions that are normally assumed to require intelligence when manifest in humans. The goal of AI is to reproduce some aspect of human intelligence within a computer system. Thus, the bulk of AI work consists of programming computers, although significant engineering can be involved in important subareas such as robotics. Intelligence is thus seen as running the right program on an adequate computer, and the grand goal of AI is to write that program.

Approaches There are a variety of approaches to the question of what aspect of human intelligence to attempt to simulate or replicate with a computer system as well as how to go about doing it. The basic division, which is particularly germane for brain scientists, is whether the approach is to reproduce functional equivalence with no regard for the mechanisms and structures of the human brain or whether it is important to reproduce structure as well as function.

Functional Equivalence Within a functional equivalence approach to AI the goal is to reproduce (or supersede) some aspect of human intelligence, such as medical diagnosis or chess playing, by using whatever means are available. It is believed, for example, that the best human chess players examine only a few of the many possible move sequences that may be available at any given point in a game. The success of chess-playing computers has been founded on very fast exploration of thousands of possible move sequences. The result is very high-quality chess from these machines, but it is based on mechanisms that do not appear to be used by humans who exhibit equally impressive chess-playing behavior. The bulk of work in AI has taken this purely functional approach for two good reasons. First, the mechanisms underlying human intelligence tend to be matters of debate rather than secure knowledge. At the cognitive level, the mechanisms and structures, such as short-term and long-term memories, are by their very nature difficult to describe in detail with certainty. Also, although details of brain physiology may be asserted unequivocally, the uncertainty lies in their roles in the production of the aspects of human intelligence being reproduced. The second reason for neglect of structure and mechanisms is the suspicion that the human brain may not be an optimum architecture on which to build intelligence. Its basic design evolved long before it was required to support intelligent behavior, and there are many seemingly obvious weaknesses in the human brain when viewed as a computational machine, such as error-prone memory retrieval. Therefore, most of the work in AI must be viewed as attempts to reproduce some aspects of intelligence without regard for the way that the human brain might be thought to operate in producing human intelligence.
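
To make the contrast concrete, the following is a minimal sketch (in Python) of the kind of brute-force game-tree search on which chess programs rely; it is not drawn from any particular chess engine, and legal_moves, apply_move, and evaluate are hypothetical placeholders for a real game representation.

def negamax(position, depth, legal_moves, apply_move, evaluate):
    """Score a position by exhaustively exploring move sequences to a fixed depth."""
    moves = legal_moves(position)
    if depth == 0 or not moves:
        return evaluate(position)  # static evaluation at the search horizon
    best = float("-inf")
    for move in moves:
        child = apply_move(position, move)
        # The opponent searches the child position in the same way; its best
        # score is the negative of ours, hence the name "negamax".
        best = max(best, -negamax(child, depth - 1, legal_moves, apply_move, evaluate))
    return best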

Structural Equivalence For those researchers of AI who do claim some measure of structural validity for their work, two levels of structure and mechanisms may usefully be distinguished: the cognitive level and the level of brain anatomy. At the cognitive level, the psychologist posits structures and mechanisms to account for the observed behavior, and although these objects may have no clear implementation in terms of brain anatomy, it is a requirement that such a mapping could plausibly be found. The other level of structure is that of brain anatomy: the structures and mechanisms that can be physically observed, probed, and generally examined. At the cognitive level, we deal with abstract structures such as short-term memory and mechanisms such as forgetting caused by attention shift. At the physical level the structures may be an observed pattern of interconnectivity of neurons, and the mechanisms may be the pulse frequencies passed between the component neurons. Neither level is simple, nor are they always distinct, but this twofold division is a useful approximation.

Abstract Structure: Cognitive Science The majority of AI work that has not eschewed structural equivalence has accepted a structural model of the brain at the cognitive level, and much of the AI work in this subarea is also known as cognitive science. Viewed another way, most of the AI work in this subarea is aimed at evaluating and exploring cognitive models of human intelligence. Cognitive-level structures may be posited without any grounding in brain anatomy; neuron-based structures with any but the most general connections to cognition are still few and far between, despite the continued advances in brain-scan technologies (see Chapter 3 of Partridge, 2014).


Physical Structure: Cognitive Neuroscience There is a relatively small but growing body of work in AI that does accept constraints derived from anatomical knowledge of brain structure and function. The impetus for this type of approach to AI has been the surge in interest and activity based on computation with neural networks. A neural network is a computational system that is founded on simple communication between highly interconnected networks of simple processing elements, ie, a computational basis that appears to reflect some aspects of brain physiology. Conventional programming is also used to construct AI models of this type, but it tends to be restricted to the reproduction of a limited aspect of human intelligence. A system might, for example, attempt to reproduce certain aspects of human learning behavior based on a model of synaptic facilitation at the molecular level. The reality is that neural-network models typically bear no close correspondence to neuroanatomy, simply because close, detailed connections between brain structures and cognitive functions are still virtually nonexistent. We know that certain brain structures may be "important for" certain aspects of cognition, but our knowledge is typically restricted to this level of generality.

Computational Paradigms In addition to the type of approach taken, work in AI can be distinguished by the computational paradigm used. By this I mean the style of computational mechanism, both in general and in the specific class of programming language used (eg, procedural versus declarative languages).

Classical Programming Classical or conventional programming has dominated AI (as all other forms of computer usage). It is characterized by the processes of first deriving an algorithmic procedure for achieving some desired goal within the accepted constraints and then casting this algorithm into a machine-executable form [ie, the classical task of writing (or coding) a program]. Typically, the elements of the program are the conceptual objects of the problem [eg, a small array of storage locations might be labeled short-term memory (STM)], and the algorithmic mechanisms programmed will directly implement the conceptualizations of the way the model works (eg, subparts of the program using STM will be direct implementations of storing and removing objects from the STM object in the program). In this classical usage of computers, the speed and accuracy of the computer are used to perform a series of actions that could, in principle, be performed by hand with a pencil and paper. The main drawback from an AI perspective is that the programmer must decide in advance exactly what detailed sequence of actions will constitute the AI system.
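
As an illustration of this style, here is a minimal sketch (in Python) of the kind of direct implementation described above: a hypothetical short-term memory object with explicit store and remove operations. The capacity limit and the displacement rule are illustrative assumptions, not part of any particular cognitive model.

class ShortTermMemory:
    """A small buffer of storage locations labeled as an STM."""

    def __init__(self, capacity=7):
        self.capacity = capacity   # assumed fixed capacity for this sketch
        self.items = []

    def store(self, item):
        self.items.append(item)
        if len(self.items) > self.capacity:
            self.items.pop(0)      # oldest item is displaced when the buffer overflows

    def remove(self, item):
        if item in self.items:
            self.items.remove(item)

    def recall(self, item):
        return item in self.items

stm = ShortTermMemory()
stm.store("phone number")
print(stm.recall("phone number"))   # True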

Procedural Programming Most programming languages are procedural languages; that is, they permit the programmer to specify sequences of actions (procedures) that the computer will then execute. Several programming languages have a particular association with AI. One such language is LISP, which dominated the early decades of AI. LISP serves the needs of AI by being a general symbol manipulation language (rather than a numeric computation language such as FORTRAN); it is also particularly useful for AI system development because it demands a minimum of predefined conditions on the computational objects, which permits maximum freedom for changing program structure in order to explore an AI model by observation of the computational consequences. This flexibility comes at the cost of automatic error checking and can lead to fragile programs that may also be large and slow to execute. It is thus not a favored language for mainstream computing, in which the emphasis is on fast and reliable computation of predefined procedures.

Rule-Based Systems Early in the history of AI it was realized that the complexity of AI models could be reduced by dividing the computational procedures and the objects they operate on into two distinct parts: (1) a list of rules that capture the knowledge of some domain and (2) the mechanisms for operating on these rules. The former part became known as the knowledge base and the latter as the inference engine or control regime. The following might be a simple rule in a chess-playing system: IF a potential exchange of pieces will result in the opponent losing the more valuable piece THEN do the exchange of pieces. The general form of such rules is IF <conditions> THEN <actions>. When the <conditions> are met, the <actions> are executed, which generally results in new conditions being met and/or answers being generated. The control regime might be that the first rule whose condition is satisfied is executed, with this simple control repeated until some goal is achieved or no rules can be executed. This strategy for building AI systems became inextricably linked with the AI subarea of expert systems.
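
A minimal sketch (in Python) of this knowledge-base/inference-engine division is given below; the rules and facts are invented for illustration, and the control regime is simply "the first satisfied rule fires," as described above.

# Each rule is a (conditions, action) pair over a growing set of facts.
rules = [
    ({"exchange_possible", "opponent_piece_more_valuable"}, "do_exchange"),
    ({"do_exchange"}, "material_advantage"),
]

def inference_engine(facts, rules):
    """Forward-chain: repeatedly fire the first rule whose conditions are met."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, action in rules:
            if conditions <= facts and action not in facts:
                facts.add(action)   # firing the rule asserts its action as a new fact
                changed = True
                break               # restart: the first satisfied rule fires
    return facts

print(inference_engine({"exchange_possible", "opponent_piece_more_valuable"}, rules))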

Logic Programming Development of the knowledge-base and inference-engine division was formalized in the idea of logic programming. The basis for logical reasoning, which has always been an attractive foundation on which to build AI, is a set of axioms and the mechanism of logical deduction, and this pair maps easily onto the knowledge base and inference engine, respectively. This particular development was implemented in detail in the logic programming language PROLOG, which became a popular AI language in the 1970s, especially in Europe and Japan.


Neural Computing Neural networks are computational structures that are composed of many simple, but highly interconnected, processing units or nodes. Each interconnection, or link, is associated with a weight: a number that controls the strength of the link; for example, with the weight as a simple multiplier, positive weights larger than 1.0 will increase the activity value passed from one node to the next. There are many varieties of neural network in use, and most are trained in order to generate the desired system. Training consists of adapting an initial set of link weights (initially set as random numbers) so that the network reproduces a training set of input–output pairs: a set of examples of the behavior that we wish to simulate. The training procedure (which consists of adjusting the link weights until the network reproduces the training set) is an automatic algorithmic process. The training cycle typically consists of inputting a training pattern, executing the network, calculating the error between what the network computes and the correct result for this particular input, and adjusting the link weights to reduce this error. It is not usually possible (and not desirable) for a given network to reproduce the training set exactly. Training is thus repeated until a measure of the error between network performance and training set is acceptably small. The training algorithms usually guarantee that this error will not increase with further training, but there is no guarantee that it will always be reducible to the desired value. This process of AI system development (random initialization followed by repetitive training) is clearly different from that of classical programming, and it does not always succeed. At best, training only promises a local minimum error, which means that training can become stuck, with no further improvement from successive repetitions of the training cycle, while the network remains a poor approximation to the training data. Fig. 1 presents a neural network known as a multilayer perceptron. It is a system designed to predict risk of the bone degradation disease osteoporosis. It can be used to prioritize patients for screening in order to improve early detection of the disease. The inputs are 15 risk factors (measured from a given patient) and the single output is a measure of how likely the patient is to develop osteoporosis, with 1.0 the highest likelihood and 0.0 the lowest. Approximately 500 patients both with and without osteoporosis were used to train the network, and it proved to be approximately 72% correct on a set of 200 previously unseen cases.
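
The training cycle just described can be sketched in a few lines of Python (using NumPy); this is a minimal one-hidden-layer illustration with randomly generated placeholder data (15 made-up risk factors per case), not the osteoporosis system of Fig. 1.

import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 15))              # 200 hypothetical cases, 15 invented risk factors
y = rng.integers(0, 2, size=(200, 1))  # placeholder targets: 1 = developed the condition

W1 = rng.normal(0, 0.5, (15, 8))       # input -> hidden link weights (random start)
W2 = rng.normal(0, 0.5, (8, 1))        # hidden -> output link weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(1000):
    hidden = sigmoid(X @ W1)           # execute the network on the training patterns
    output = sigmoid(hidden @ W2)
    error = output - y                 # difference from the correct results
    # Adjust the link weights to reduce this error (gradient descent / backpropagation).
    grad_out = error * output * (1 - output)
    grad_hid = (grad_out @ W2.T) * hidden * (1 - hidden)
    W2 -= 0.1 * hidden.T @ grad_out / len(X)
    W1 -= 0.1 * X.T @ grad_hid / len(X)

print("mean absolute error after training:", float(np.mean(np.abs(error))))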

Figure 1. A multilayer perceptron.

Figure 2. A system designed to recognize shape from shading. Adapted from Lehky, S.R., Sejnowski, T.J., 1988. Network model of shape from shading: neural function arises from both receptive and projective fields. Nature 333, 452–454.

The network structure and mechanisms illustrated in Fig. 1 bear little more than a tenuous analogical relationship to actual brain structures. Other uses of neural computing have accepted more realistic constraints from neuroanatomy. Fig. 2 presents a simplified version of a system that was designed to recognize shape from shading (ie, to reproduce the human ability to perceive a three-dimensional shape from a two-dimensional pattern of light and dark). The neural net system used was built from hexagonal arrays of formal neurons that were configured to mimic the various known receptor types in the optic tract (eg, circular receptive fields). Although AI systems constructed with neural computing methods are necessarily approximations, they offer the advantage of not requiring a prior problem specification. For many AI subareas the details of how the input information is transformed into an "intelligent" output are incompletely understood; in fact, the essence of the problem is to develop such an understanding. A data-driven technology, such as neural computing, that enables the construction of a system from a set of examples is thus particularly useful. Neural computing is a powerful technology, but it tends to suffer from lack of transparency: trained neural networks are resistant to easy interpretation at the cognitive level. In the trained neural network illustrated in Fig. 1, for example, all the individual risk factors are totally and equally (except for the individual link weightings) intermingled in the computation at the hidden layer (every input node has a connection to every hidden-layer node). How, then, is each risk factor used in the computation of disease risk? It is impossible to determine from the neural net system; its computational infrastructure cannot be "read off" as it can be for a classical program. However, different styles of explication are emerging: it is possible, for example, to associate a confidence with each network prediction based on an analysis of the particular input values with respect to the set of values used for training. Another result of the total intermingling of all the input features, and of the fact that the internal structure of a neural network is nothing but units with different combinations of weighted links in and out, is that no conceptually meaningful representation can be isolated within the computational system. In the osteoporosis system, for example, the input feature "age" is assessed in comparison to some threshold in order to add its contribution to the final outcome. However, no comparison operations or threshold values are evident within the trained network. It is usual to say that such computational mechanisms use distributed representations. In contrast to neural networks, there are other inductive, or data-driven, technologies that operate directly at the cognitive level.

Figure 3. Partial decision tree for predicting whether an aircraft will level off at a certain flight level.

Inductive Methods An inductive method of system building is one that produces a generalized computational system from a set of specific examples. Neural computing offers one general class of inductive method, and rule-induction algorithms provide another. From a set of examples, a rule-induction algorithm will construct a decision tree, which is a computational system that embodies general rules that encapsulate the classifications (predictions, decisions, diagnoses, etc.) provided in the set of examples. Just as with neural networks, a decision tree cannot always be constructed, and when it can it is unlikely to be 100% correct even when classifying the training examples. Decision trees tend to train faster than neural networks, but they tend to be more difficult to train to low error because they are more sensitive than neural networks to the choice of input features. However, the final decision tree is structured as sequences of decisions at the cognitive level; therefore, these computational systems may admit an interpretation and an understanding of the computational details of the problem in terms of cognitive-level features. Fig. 3 presents part of a decision tree for predicting whether an aircraft will level off at the flight level it is approaching or not. Such trees have been automatically generated using several thousand examples of aircraft leveling and not leveling at various flight levels at London’s Heathrow Airport. The examples consisted of radar tracks (sampled at 4-sec intervals) and the particular flight level that was being approached (eg, 7000-ft level, which is FL7000). Various features are extracted from the radar tracks; for example, in Fig. 3 d45 is the vertical distance traveled by the aircraft between radar points at interval 4 and interval 5 before the flight level is reached. The internal structure of this (simplified) decision tree is composed of branch choices, and the tree leaves are the predicted outcomes (ie, level off or non-level off). The branch choice at the top of the illustrated tree tests feature d45: If the value of d45 for the sample to be predicted is less than or equal to 100 feet, then the left branch is taken; otherwise, the right branch is taken and non-level off is immediately predicted.
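
The following is a minimal sketch (in Python, using the scikit-learn library) of rule induction from examples; the feature values are invented stand-ins for radar-track features such as d45, not real Heathrow data.

from sklearn.tree import DecisionTreeClassifier, export_text

# Each row: [d45, d56], vertical distances (feet) between successive radar points.
X = [[40, 35], [60, 20], [250, 240], [300, 280], [80, 10], [220, 200]]
y = ["level", "level", "non-level", "non-level", "level", "non-level"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["d45", "d56"]))   # the induced branch choices
print(tree.predict([[90, 30]]))                          # classify a new, unseen track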

Subdivisions Intelligence, even if we limit interest to human intelligence rather than intelligent possibilities in some abstract sense, is a broad topic. Since the early days of AI (the 1950s and 1960s), when initial attempts that dealt with many aspects of intelligence were revealed to be far too ambitious, the history has been one of scaling down. Consequently, the field of AI has been subdivided into many (often more or less isolated) aspects of intelligent behavior.

Knowledge Representation There is a long-held belief in AI that knowledge is crucial to intelligent behavior. If a system does not have a significant fund of knowledge (about the world or about some particular subarea), then no matter how sophisticated the reasoning, intelligent behavior will not be forthcoming. Quite apart from the difficult questions of what knowledge is needed and how much of it is needed, there is an initial question of how that knowledge must be structured, or represented, within an AI system.

Semantic Networks One enduring belief about how knowledge should be structured is that there should be rich and varied interconnectivity between the elements of knowledge, whatever they may be. Concepts should be associated with each other with various strengths and according to various relationships. Words, for example, seem to be associated in our brains by semantic similarity and by frequency of co-occurrence. A representational scheme known as semantic networks is one attempt to provide the necessary associativity. A semantic network is a set of nodes and links between them in which each node is labeled with a concept and each link is labeled with a relationship between the concepts. This scheme can be effective for small and static domains. However, in the wider fields of intelligent behavior it is evident that both the relationships and their strengths are context dependent, which implies that the semantic network must be dynamically reconfigurable. The concept "chair," for example, might be represented in terms of physical relationships as having legs (usually four), a seat, and a back (not always). In terms of functional relationships, it is used for sitting on. However, there are chairs with no legs (eg, when suspended), and there are chairs that must not be sat on (eg, in a museum). Neither of these situations would be expected to disrupt normal intelligent behavior, but such exceptional circumstances, as well as a detailed computational realization of common qualifiers such as "usually," cause major problems in AI systems. Dynamic reconfigurability is the first example of adaptivity, of learning, which seems to be fundamental to intelligent behavior but is a source of great difficulty in computational systems. Neural networks have a fundamental ability to adapt and learn (they are constructed by adapting the link weights to learn the training examples), but they are severely limited in scope and do not begin to approach the apparent complexities of human learning. In AI, learning is treated as yet another subfield: machine learning.
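
A minimal sketch (in Python) of a semantic network as labeled nodes and labeled links follows; the concepts and relations are the illustrative "chair" examples from the text, stored in a hypothetical dictionary keyed by (concept, relationship).

# Each key is a (concept, relationship) pair; each value lists the linked concepts.
semantic_net = {
    ("chair", "has-part"): ["legs", "seat", "back"],
    ("chair", "used-for"): ["sitting"],
    ("chair", "is-a"): ["furniture"],
    ("legs", "typical-number"): ["four"],
}

def related(concept, relation, net=semantic_net):
    """Follow one labeled link out of a concept node."""
    return net.get((concept, relation), [])

print(related("chair", "used-for"))   # ['sitting']
print(related("chair", "is-a"))       # ['furniture']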

Knowledge Bases A further drawback of the semantic network representation of knowledge is the complexity of the representation. For example, if a node is removed from a network, then all links to it (or from it) must be removed or redirected to other nodes. Therefore, explicit representation of the desired associativity actually works against efforts to introduce dynamic adaptivity. A knowledge base, or rule base, is a representation of the knowledge as a set of facts and rules that capture much the same information as a semantic network but do so with a set of distinct elements (the individual facts and rules). This does make the addition and deletion of knowledge elements (and hence adaptivity of the knowledge as a whole) computationally much simpler. The cost is that relationships between the rules are no longer explicit within the representation. They reside implicitly in the rule ordering in conjunction with the searching strategies.

Expert Systems Despite the loss of explicit representation of the interrelationships between the various elements of knowledge, knowledge bases have become firmly established as a useful way to represent knowledge in relatively limited, fixed, and well-structured domains of human endeavor. They have provided the basis for the large and relatively successful subfield of expert systems. An expert system reproduces (or even exceeds) human capability in some specialized domain, some area of human expertise. Initially, the use of knowledge bases with appropriate control strategies so dominated the expert systems subfield that this representational scheme was used to define an expert system. Subsequently, it has been accepted that it is the reproduction of intelligent activity within a very limited domain that characterizes an expert system, whatever the means of realization. However, it is not uncommon to find the terms expert system and knowledge base used as virtual synonyms.

Machine Learning The ability to adapt to changing circumstances (ie, to learn) appears to be fundamental to the nature of intelligence. However, the construction of computer programs is a difficult and demanding exercise that usually requires that much time be spent on what is euphemistically called debugging: tracking down and eliminating errors in a program. The intricacies and complexity of debugging can escalate dramatically if a program is self-modifying (ie, if it learns), because learning changes some aspects of the original program. This means that when a programmer needs to delve into a learning program to alter or correct its behavior, he or she first has to understand the detailed results of the learning in addition to the complexities of the basic program as originally written. Consequently, most AI systems have little or no learning behavior built into them. Explorations of mechanisms to support adaptive behavior tend to be pursued as an end in themselves, and these explorations constitute the AI subarea of machine learning.


Rote Learning Rote learning is the process of memorizing specific new items as they are encountered. The basic idea is simple and easy to realize within a computer program: Each time a new and useful piece of information is encountered, it is stored away for future use. For example, an AI system might be designed to recognize faces by extracting a variety of features (such as distance between the eyes) from an image and searching for a match within a database of 1000 stored feature sets. If it finds a match, it has recognized the person; if not, it reports “unknown person.” In this latter case, the person or some human operator can provide the necessary details of this unknown person, and the system can enter the details in its database so that next time this person is presented to the system he or she will be correctly recognized. However, every rote learning event increases the size of the database, and speed of recognition (as well as accuracy if new faces are similar to old ones) will decrease as the size of the database grows. Depending on the application of this system, certain new faces may be seen once and never again, in which case rote learning wastes time and space. Alternatively, this new face may be commonly seen at first but may over time become rare and ultimately no longer seen. In the first case, rote learning is immediately detrimental to system performance, and in the second case it becomes detrimental over time. Rote learning (and in general any sort of learning) implies some compensating mechanism of forgetting, which suggests that the apparent human weakness of imperfect recall may be a fundamental necessity to long-term intelligence. AI has largely been preoccupied with learning mechanisms usually without the compensating mechanism of forgetting. The second general point is that most situations change over time, so an AI system with any longevity will have to be able to track and react appropriately to the natural drifts in the problem domain. This is not an easy problem to solve. The fundamental difficulty is one of distinguishing between short-term variation and long-term drift when both short term and long term are ill-defined context-dependent quantities. If after appearing twice a day, a face is not seen for a week, should it be deleted? There is no correct answer and probably no more than guidelines even if the full details of the application of this face recognition system are known. Therefore, a good rote learning system must exhibit flexibility and sophistication. It cannot simply store every new event and delete every one that is not used during some specified period of time. Human intelligence appears to embody some mechanisms of partial recall that, like forgetting, may not be weaknesses; for example, learned faces may gradually fade over time if not used but may be more quickly and easily reestablished if they reappear before they have been totally forgotten. Neural networks, in which input information is blended and distributed across the network (distributed representation), do offer hope of a mechanism for partial learning and forgetting. Currently, however, sophisticated control of such abilities proves elusive. Classical programs appear to be less amenable to schemes for partial representation.
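
A minimal sketch (in Python) of such a rote-learning recognizer follows; the feature vectors, the distance measure, and the matching tolerance are all illustrative assumptions.

class RoteRecognizer:
    def __init__(self, tolerance=2.0):
        self.known = {}                 # name -> stored feature vector
        self.tolerance = tolerance

    def recognize(self, features):
        for name, stored in self.known.items():
            distance = sum(abs(a - b) for a, b in zip(features, stored))
            if distance <= self.tolerance:
                return name
        return "unknown person"

    def learn(self, name, features):
        self.known[name] = features     # rote learning: store the new item verbatim

recognizer = RoteRecognizer()
print(recognizer.recognize([64.0, 50.0, 70.0]))        # "unknown person"
recognizer.learn("A. Person", [64.0, 50.0, 70.0])
print(recognizer.recognize([63.5, 50.2, 70.1]))        # recognized on the next encounter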

Parameter Optimization If a problem-solving procedure can be specified in terms of a set of parameters, then mechanisms of machine learning can be used to reset, or learn, optimum values for these parameters. To grossly oversimplify the face recognition system introduced previously, identification of a face might be based on the distance between the eyes (DE), the length of the mouth (LM), and the depth of the forehead (DH). However, what is the relative importance of each of these features? They are probably not exactly equally valuable in the recognition task. The system can learn this information. Suppose we define recognition score as follows: Recognition score = X × DE + Y × LM + Z × DH, where X, Y, and Z are the parameters that determine the relative importance of the three features for face recognition. By trying different values for these parameters against recognition speed and accuracy for a test set of faces, the system can set them to optimal values. Therefore, from some arbitrary first-guess values for X, Y, and Z, optimal values have been learned and a better recognition score will be computed. Quite apart from the issue of problem drift, this technique presents many difficulties. First, the form of the initial equation severely constrains what can be learned, and it might be a poor choice. The most important features might have been omitted because their importance was not recognized. More subtly, there may be interactions between features such that attempts to optimize each parameter independently lead to a highly suboptimal system. Finally, this mechanism offers very little scope for the system to learn anything that was not foreseen explicitly by its designer.
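
A minimal sketch (in Python) of learning X, Y, and Z by crude search over candidate values follows; the test faces, the gallery of known faces, and the scoring scheme are all invented placeholders.

import itertools

# Hypothetical test set: (feature vector (DE, LM, DH), correct person id).
test_set = [((64.0, 50.0, 70.0), "A"), ((62.0, 55.0, 68.0), "B"), ((70.0, 48.0, 75.0), "C")]
gallery = {"A": (64.5, 49.5, 70.5), "B": (61.5, 55.5, 67.5), "C": (70.5, 47.5, 74.5)}

def accuracy(x, y, z):
    """Fraction of test faces matched to the correct gallery entry by the weighted score."""
    def score(f):
        return x * f[0] + y * f[1] + z * f[2]
    correct = 0
    for features, truth in test_set:
        best = min(gallery, key=lambda name: abs(score(features) - score(gallery[name])))
        correct += (best == truth)
    return correct / len(test_set)

# Crude grid search over candidate parameter values; keep the best combination.
best_params = max(itertools.product([0.5, 1.0, 2.0], repeat=3), key=lambda p: accuracy(*p))
print(best_params, accuracy(*best_params))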

Inductive Generalization As discussed previously, inductive generalization is the process by which a general principle is extracted from a set of instances. It is the basis for several computing technologies (neural networks and rule-induction algorithms). It can also be viewed as a method for machine learning quite independent of the technology used to implement it. The inductive technologies described under Computational Paradigms are thus ways of building an AI program by learning from a set of examples. Further interest in this general idea has arisen because the inductive technologies are each based on a specific class of induction algorithm, and many other types of induction algorithm remain to be explored and possibly developed into new computational technologies. There is unlimited scope for further developments in this subarea of machine learning because there are an unlimited number of ways to extract generalities from a set of instances, and there is no correct way. Inductive generalization is a fragile procedure. The classic illustrative example derives from philosophy: Suppose you see a large white waterbird and are told it is a swan, and then you see another that is also white, and then another, and so on. How many specific instances of white swans do you need to see before it is safe to generalize to the rule that all swans are white? The answer is that the inductive generalization can never be guaranteed no matter how many white swans you see. However, a single sighting of a black swan can tell you that your generalization is false; it represents an apparent contradiction to your rule. Alternatively, you might choose a different interpretation of the black swan sighting. The first is that it is not a swan because it is not white. In this case you have transformed your initial tentative generalization into a firm rule that is used to further interpret the world. An intelligent agent must do this eventually because this is the whole purpose of generating such generalizations. A different rejection of the black swan sighting might be based on the idea that your generalization is essentially correct, it just needs refining, and this is always possible to do. The black swan sighting might have been on a Saturday, so a new (and now uncontradicted) version of your rule might be that all swans are white except on Saturdays. This last, somewhat absurd dodge is particularly instructive because it highlights the problem of what features to extract from the instances to include in the generalized rule. We know that a swan's color does not depend on the day of the week, but an AI system, with its necessarily limited knowledge, might not know this to be true. Therefore, the difficulties in constructing algorithms for inductive generalization are determining what features (from the infinitely many possible ones, such as "3 p.m. on a cloudy day") to extract to focus the generalization on and determining when features from two instances are sufficiently similar. For example, most swans will not appear pure white, and we must accept that swans that are shades of gray or discolored due to dirt are all still essentially the same color (ie, white). At what point do we have to admit that a bird's plumage is not sufficiently similar to white, and that the bird is thus an exception to our rule? This, like so much else in AI, is an ill-defined and context-dependent issue.
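
A minimal sketch (in Python) of one simple inductive generalization algorithm follows: starting from the first positive instance, any attribute whose value varies across further positive instances is generalized away. The bird attributes are illustrative only.

def generalize(instances):
    """Return the most specific attribute-value description covering all instances."""
    hypothesis = dict(instances[0])
    for instance in instances[1:]:
        for attribute, value in instance.items():
            if hypothesis.get(attribute) != value:
                hypothesis[attribute] = "?"     # drop the attribute from the induced rule
    return hypothesis

sightings = [
    {"colour": "white", "neck": "long", "day": "Monday"},
    {"colour": "white", "neck": "long", "day": "Friday"},
    {"colour": "white", "neck": "long", "day": "Saturday"},
]
print(generalize(sightings))   # {'colour': 'white', 'neck': 'long', 'day': '?'}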

Vision and Perception Sight, although not essential, appears to be an important component of human intelligence. AI vision systems have many potential applications, from automated security camera systems to automatic detection of pathological images in medicine. Because of the accessibility (and assurance of general functional commitment) of the optic tract, mammalian visual systems offer some of the best opportunities for basing AI systems on neuroanatomical data rather than cognitive-level abstractions. Fig. 2 is one example of an attempt to build an AI vision system with regard to neuroanatomy.

Pattern Recognition Pattern recognition usually refers to attempts to analyze two-dimensional images and recognize (ie, label) within them prespecified subareas of interest. Mathematics and statistics feature strongly in this subarea by providing algorithms for noise reduction, smoothing, and segmentation. Processing of images at the pixel level leads naturally into the classic bottom-up approach to pattern recognition. In this strategy, the idea is to associate areas in the image of similar texture or intensity with features of interest and to associate discontinuities with boundaries that might be developed to circumscribe features of interest. When the image has been segmented, segments may be collected into recognizable objects or may represent the expected objects. This final process of labeling image segments may be classically programmed or may be learned through a set of examples and the use of an inductive technology. This latter approach has been found to be valuable when it is difficult to specify exactly what is to be labeled (eg, a tumor in a radiogram), but many examples are available that can be used to train a system. It has been argued, however, that an intelligent agent generally sees what it expects to see; this is the top-down approach to vision. In this case, an image is scanned with an expectation of what the system wants to find in the image. The image need only be processed where and to whatever depth necessary to confirm or reject the expectation. In the bottom-up approach, it is anticipated that the features will emerge from the image, and in the top-down approach it is hoped that knowing what one is looking for will facilitate quick and accurate recognition or rejection. Of course, it is possible to combine bottom-up and top-down processing.
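
A minimal sketch (in Python, using NumPy and SciPy) of the bottom-up step follows: pixels of similar intensity are grouped by thresholding, and connected regions become candidate features of interest. The tiny image is an invented placeholder.

import numpy as np
from scipy import ndimage

image = np.array([
    [0.1, 0.2, 0.9, 0.9, 0.1],
    [0.1, 0.8, 0.9, 0.8, 0.1],
    [0.1, 0.1, 0.2, 0.1, 0.1],
    [0.7, 0.8, 0.1, 0.1, 0.1],
    [0.8, 0.9, 0.1, 0.1, 0.1],
])

mask = image > 0.5                        # associate similar-intensity pixels
labels, n_segments = ndimage.label(mask)  # group them into connected segments
print(n_segments)                         # 2 candidate regions in this toy image
print(labels)                             # per-pixel segment labels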

Image Understanding One point of distinction in AI vision systems is that of finding and labeling objects in an image (from a stored list of possibilities) as opposed to the development of an understanding of an image. The latter, somewhat ill-defined goal seems to be what an AI vision system must achieve, and it probably involves some object labeling as a constituent process. Image understanding is the placement of an image in a global context. A pattern recognition system, for example, might find and label a tree, sky, clouds, and grass in an image, whereas an image understanding system might declare that the image is a countryside scene in some temperate climatic zone, perhaps land used for grazing cows or sheep (an implication of the short-cropped grass). Sophisticated image understanding has not been achieved, except in some very restricted subdomains (eg, automatic monitoring of the configuration of in-use and free gates at a specific airport terminal). The inductive technology of neural computing has been trained to assess the crowdedness of specific underground station platforms. In the latter case, a neural net system can only be trained to generate answers along one dimension; ie, it can assess crowdedness but no other characteristic of the people, such as category (eg, business or tourist). Other neural nets might be trained to assess other desired features, but this implies that the important features are decided in advance of system construction and that a general understanding will only be gained from a set of trained networks. This approach to image understanding implies that a classically programmed harness system will be needed to collect and combine the outputs of the individual neural networks. Image understanding, however it is achieved, is expected to require substantial knowledge of the world within which the image is to be interpreted. In the previous example of a rural scene, the only basis for drawing the implication about cows and sheep must be knowledge of why grass is cropped in the countryside (as opposed to a lawn, for which the grazing implication is likely to be wrong) and which grazing animals are likely to be responsible in a temperate climate. Thus, intelligent image understanding will require image processing, pattern recognition, and a knowledge base together with an appropriate control strategy, and all must be smoothly integrated. Like so much else in AI, image understanding beyond very limited and heavily constrained situations is currently beyond the state of the art.

Natural Language Processing Human language, which is generally recognized to be a phenomenon quite distinct from (and in advance of) any known animal communication mechanism, is often considered to be a characteristic feature of intelligence. The Turing Test for AI hinges on whether a computer system can communicate in natural language sufficiently well that a human observer is likely to mistake it for another person; if so, then we must concede that the system exhibits AI. From this widely accepted viewpoint, natural language understanding, both analysis and synthesis, is the key to AI. This view is defensible because, as with image understanding, natural language understanding implies many other AI subfields: a knowledge base, inferential control strategies, adaptivity, and so on. Natural language processing (NLP), computational linguistics, computational semantics, cognitive linguistics, and psycholinguistics are interrelated areas within which AI work on language processing may be found.

Spoken or Written A first issue for any proposed AI NLP system is the modality of the input (and output): will the language be written or spoken? Speech, and thus sound waves, as the input medium is normally viewed as a separate pattern recognition task (ie, the recognition of words within the sound wave pattern). This is not an easy task, but considerable progress has been achieved in recent years, particularly with systems that employ some inductive technology and can be trained to the particular idiosyncrasies of a user's voice. Many relatively cheap systems are commercially available for speech recognition, but they do not extend (other than trivially) to language understanding. This is a different and separate subfield, and it typically requires that the language utterances be typed.

Syntactic Analysis and Grammars The formal basis of language appears to be syntactic structure, and such structure is defined with a grammar. Prior to AI, linguists formulated specific grammars to capture the allowable structures of various languages, and they also formulated classification schemes such that a given grammar (and hence the language it purports to specify) could be assigned to a class. Formal definitional schemes, such as these grammars, were ideal for computerization and were seized on by the first computational linguists. Simply stated, the expectation was that computers could apply such grammars to check the correctness of a sentence in, for example, English, and also determine the syntactic category of each word. From this point a dictionary of word meanings could be applied, and a meaning for the sentence could thus be generated. Even the most optimistic did not think that this would yield perfect understanding, but they thought it would be good enough, for example, to add the dictionary of meanings from another language and thus translate from one language to another. However, like all other aspects of AI, it was not this simple. Much human communication, especially spoken communication, is ungrammatical. We often communicate in broken structures; furthermore, neither we nor those we speak to appear to be unsettled by, or even notice, the ungrammaticalities. In addition, there is no fixed grammar for any living language, precisely because any language in use evolves and is modified by those who use it. Further complicating this is the fact that a widely used language develops differently at different points in its range. This leads to dialects and eventually to distinct languages. These problems do not bode well for any AI system that aspires to deal with the NLP problem beyond the confines of a small, well-defined, and fixed subset of some language. Consequently, there was a movement among NLP workers to develop systems that were not based on a complete and correct prior syntactic analysis. Efforts in this direction have been founded on finding main verbs and then using these to seek parts of the surrounding sentence that fulfill the anticipated roles. Therefore, for example, if a sentence in English contains the word "hit," the implication will be that the subject of the sentence will be a potential "hitter" (which probably requires an animate object) and that the object of the sentence will be something that "is hit." In addition, the sentence could have the role of "what was used for the hitting" specified and introduced by a word such as "with." Clearly, some initial syntactic analysis can be used to build a framework of further syntactic expectations that may be further constrained by semantic expectations. Continuing the previous example, the hitter is likely to be an active animate object (a person rather than a ball or a bat, which might also be mentioned in the sentence). A shift to consider such constraints on words, rather than simply grammatical categories such as verb or noun, leads us into the realm of semantics.
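
A minimal sketch (in Python) of this verb-driven role filling follows; the case frame for "hit" and the string heuristics that fill it are illustrative assumptions, not a description of any real parser.

# A case frame lists the roles that a main verb leads us to expect.
case_frames = {
    "hit": ["hitter", "thing-hit", "instrument"],   # the instrument is introduced by "with"
}

def fill_frame(sentence, frames=case_frames):
    words = sentence.lower().rstrip(".").split()
    for verb, roles in frames.items():
        if verb in words:
            v = words.index(verb)
            frame = {"verb": verb, roles[0]: " ".join(words[:v])}  # what precedes: the hitter
            rest = words[v + 1:]
            if "with" in rest:
                w = rest.index("with")
                frame[roles[1]] = " ".join(rest[:w])
                frame[roles[2]] = " ".join(rest[w + 1:])           # "with ..." supplies the instrument
            else:
                frame[roles[1]] = " ".join(rest)
            return frame
    return None

print(fill_frame("The boy hit the ball with a bat."))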

Semantics Words have meanings, and so it seems reasonable that the NLP researcher might expect to build up through word meanings to sentence meanings, paragraph meanings, and complete document meanings. However, words sometimes have a variety of meanings. Consider the sentence, "The pig is in the pen." The meaning of the word "pen" here must be "enclosure" rather than "writing implement" or "female swan." The semantics of "pen" is disambiguated by the meaning of the sentence as a whole, so the sentence meaning cannot simply be built up from word meanings. In general, NLP requires a complex interaction between expectations at all levels that yields a most likely meaning when all (or a maximum) of the mismatches have been eliminated. In the previous example sentence, the mismatch between the large animal "pig" being inside the small writing implement "pen" is eliminated by consideration of the enclosure meaning of "pen." However, suppose this sentence is followed by another: "When will that bird learn to stop swallowing the children's toys?" A whole new semantics for the previous sentence emerges, but only if this latter sentence is taken as a further comment on the first sentence and not as a break to a new topic. Decades of work on AI NLP systems have resulted in progress, but they have also revealed the complexity of language. The surprise is not that we cannot make computers understand anything beyond small and well-defined language subsets, but that we as humans all intercommunicate freely in an unbounded medium. Part of the answer, of course, is that miscommunication occurs, and that communication is seldom perfect (whatever that might mean) because it does not need to be.
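
A minimal sketch (in Python) of sense disambiguation by context overlap (a much simplified Lesk-style approach) follows; the sense-to-word associations are invented placeholders.

# Each candidate sense of "pen" is associated with words that tend to accompany it.
senses_of_pen = {
    "enclosure": {"pig", "sheep", "animal", "farm", "fence"},
    "writing implement": {"ink", "write", "paper", "signature"},
    "female swan": {"swan", "cob", "cygnet", "bird"},
}

def disambiguate(sentence, senses):
    """Pick the sense whose associated words overlap most with the sentence."""
    words = set(sentence.lower().replace(".", "").split())
    return max(senses, key=lambda sense: len(words & senses[sense]))

print(disambiguate("The pig is in the pen.", senses_of_pen))   # "enclosure"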

Pragmatics If the NLP problem is not daunting enough already, consider that what we might term the direct or explicit meaning of a string of words may bear little relation to the "real" meaning. Said in an ironic tone, "That's smart" may mean "that was a stupid thing to do." Ultimately, the meaning of a natural language utterance is colored by the predisposition of the listener, then by what the listener thinks the utterer is intending to convey, and then by what the listener thinks the utterer thinks the listener will think. There is, of course, no end to this sequence of presuppositions. The utterer may bring an entirely equivalent, but quite different, set of suppositions to bear on what he or she intends to mean. Pragmatics is the collection of all these factors beyond and outside of language that influence meaning. AI NLP systems have hardly come to grips with any aspects of pragmatics other than in the context of theoretical analyses of pragmatic issues.

Machine Translation Due to the commercial value of adequate machine translation (MT), attempts to construct such systems date from the birth of AI NLP. Another impetus for MT springs from the potential of such systems to begin to answer the difficult question of whether some text has been adequately understood. For example, given an English-to-Russian and a Russian-to-English translation system, an utterance in English can be translated into Russian and then translated back into English. In that case, the question of whether the two systems extract accurate meanings can be simplified (without major distortion, given the current state of the art) to one of meaning preservation: does the initial English sentence say the same thing as the final English sentence? MT systems have been in use for years, but only in niche applications: either simple and restricted domains or settings where a first crude translation is valuable. The former case might be technical manuals that are written according to simple, well-defined structures using a small, well-defined vocabulary. The latter situation might arise as a precursor to human refinement of the documents.

Machine-Readable Dictionaries One spin-off from MT work has been the computerization of dictionaries. Words with their various meanings can be stored in a computer for fast access by any other computer system such as a word processor or by an inquiring human. This is little more than a fast form of the traditional book dictionary, but appropriate computerization can offer much more. A computerized dictionary can be proactive in suggesting words and even phrases, which implies an AI aspect to certain operations.
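
A minimal sketch (in Python) of such proactive behavior follows: a stored word list supports both fast sense lookup and prefix-based suggestion. The tiny dictionary is an invented placeholder.

import bisect

dictionary = {
    "pen": ["enclosure", "writing implement", "female swan"],
    "pencil": ["writing implement"],
    "penguin": ["flightless seabird"],
    "swan": ["large white waterbird"],
}
sorted_words = sorted(dictionary)

def suggest(prefix):
    """Return all stored words beginning with the typed prefix."""
    start = bisect.bisect_left(sorted_words, prefix)
    return [w for w in sorted_words[start:] if w.startswith(prefix)]

print(suggest("pen"))          # ['pen', 'pencil', 'penguin']
print(dictionary["pen"])       # fast access to the stored senses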

Robotics The idea of a robot embodies the ultimate goal of AI, and robots have been designed and built since AI’s inception.

Moving, Sensing, and Reasoning One feature of the early years of AI was that researchers did not appreciate the depth and difficulty of the problems they faced. Many early robotics systems were mobile machines with a variety of sensors (vision, infrared, tactile, etc.). They were also supposed to analyze their situation and solve problems (such as how to traverse realistically cluttered rooms and corridors). They were conceived as crude simulations of complete humans with the hope that refinements and additions would improve the similarity year by year. This did not happen; instead, each component problem escalated into a full-blown subarea of AI.

Hand–Eye Systems One common simplification of the robotics problem has been to construct hand–eye robots. These systems attempt to integrate visual processing and manipulation of the seen objects. This difficult task requires pattern recognition and some degree of image understanding to be integrated with fine motor control and tactile feedback. Industrial companies such as automobile manufacturers employ a dazzling array of robotics systems but these include little or no AI. The elaborate welding systems are preset to weld at specific points in three-dimensional space. The problem of fast, accurate welding is then reduced to one of engineering the means by which the parts to be welded are moved to precisely the correct placement.

Cognitive Systems Some of the most imaginative and ambitious work on AI robots is being done at one of the long-term bastions of AI. At the Massachusetts Institute of Technology, research teams are developing various systems to implement cognitive robotic functions (eg, an attentional system for social robots).


Philosophical Perspectives The field of AI has always spurred extensive philosophical controversy. Can human intelligence be no more than the right algorithm running on a biological computer, sometimes called wetware as distinct from the hardware of modern computers? The separation of mind and brain (the Computational Theory of Mind), which permits the researcher to ignore brain structure, is the underlying assumption of almost all AI projects. There are various arguments that human intelligence must be more than this; see, for example, "emergent mind" theory. Is the spiritual or emotional dimension of humanness an essential ingredient of intelligence, or are such aspects merely side issues that humanity happens to exhibit? Whatever the outcome, some assert that an intelligent machine can never really feel as we do (enjoy the taste of strawberries or the thrills of love), so it must always fall short of full human intelligence. At best, a machine will only simulate such feelings, and this is a far cry from actually experiencing them as a human does. A distinction is sometimes drawn between strong AI and weak AI. The former is a machine intelligence that encompasses all aspects of human intelligence (whatever they may be), whereas the latter category only aims to reproduce limited aspects of intelligence. Further controversy centers on the notion of intelligence as an abstract concept within which human intelligence is just one particular manifestation (the only one we currently know). In this view, arguments for the nonoptimality of human intelligence emanate from several sources: empirical evidence of, for example, imperfect memory suggests that certain specific improvements should be possible, and the realization that the basic "machinery" of the brain was not designed to support intelligent activity but evolved for other purposes is also suggestive of nonoptimality. Given a custom-designed machine executing a similarly crafted suite of computational procedures, an AI that supersedes the human version in terms of speed, accuracy, etc. should be possible. However, the notion of a super-intelligence runs into all sorts of problems: for example, if the system forgets nothing, does it remember everything, including all the moment-to-moment trivia it encounters? Could we communicate with such a system? There are a variety of reasons why this seems unlikely (see Chapter 13 of Partridge, 2014). Supporting this latter possibility, evolution may have produced a set of compromises that is optimal for intelligence on Earth. It is plausible that imperfect memory, for example, is an unavoidable negative outcome from a system that would be seriously disadvantaged by the clutter of useless memories if it forgot nothing. Similarly, feelings and emotions might be vital signals of what courses of action to pursue, and which ones to avoid, in a world full of choices that must be made on incomplete evidence. What claim can there be that a machine is intelligent just because it behaves as if it is? The famous Turing Test (devised in 1950 by the mathematician Alan Turing) rests on the idea that if a machine can converse in a human language as if it is intelligent, then we must concede that AI has been achieved. This opens the more general issue of whether and to what extent form can be inferred from function: by observing what a system does, can we determine anything about how it is doing it?
If we ignore the physical components of the brain (as most AI work does), then what can we expect to deduce about the detailed mechanisms that constitute intelligence based only on observations of intelligent behavior? Finally, there are challenges to the accepted wisdom that intelligence is, or can be exhibited by, any algorithmic procedure running on any modern digital computer. There may be something about the determinism and definiteness of this combination that is totally at odds with the seemingly unconstrained openness of human intelligence. Objections of this nature tend to be founded on either the known limitations of formal systems (such as abstract algorithms) or the apparent difficulties posed by certain aspects of human intelligence (eg, free will).

Conclusions In the field of AI, there is a range of often autonomous subfields. This occurred as a reaction to the extreme difficulty revealed by all approaches to the more global goal of complete AI. It is a classic divide-and-conquer approach to excessive complexity, but can it work for AI? On the negative side, note that investigations within most (if not all) subfields seem to reveal that knowledge of the world and sophisticated contextual awareness are essential for high-quality AI. In other words, real progress in each subfield seems to require major progress in every other as a corequisite, if not a prerequisite. On the positive side, there are some clear indications that intelligence is not all-or-nothing. There are many examples of intelligent humans who lack sight, speech, and hearing. In addition, there is a case to be made for the modularity of mind both from localization of functional components of intelligence in specific anatomical regions and from observed phenomena such as optical illusions. Many optical illusions remain compelling illusions even when the basis for the illusions is physically demonstrated or explained. This persistence might be interpreted as an absence of close integration between visual processing and reasoning abilities. One pervasive conundrum is that a broad and deep knowledge of the world appears repeatedly in the subfields of AI as a necessity for intelligent activity, with an associated implication that more knowledge supports higher levels of intelligence. However, from a computational perspective more information (in a larger knowledge base) must lead to more searching for relevant pieces, which takes more time. Stated succinctly, the more you know, the slower you go. This is true for AI systems but not, it seems, for humans. In fact, the growth of a human’s expertise is often associated with a faster performance rather than a slowdown. Why? For some, the answer is that classical programming on conventional computers is the wrong basis for intelligence. Therefore, other computational paradigms are being explored. For others, the answer is less radical but no less obscure: Information cannot just be added on, it must be properly integrated and indexed in sophisticated ways.


A final general point of difficulty is that intelligence in humans is characterized by tentative conclusions and sophisticated recovery from errors. We do not always get things right, but we can usually readjust everything appropriately once we accept the error. In AI systems, the compounding of tentative conclusions is difficult to manage, and recovery from error is often no easier because we typically have to foresee each source of error and supply a recovery procedure that will always work. One general conclusion to be drawn from this survey is that AI has not been as successful as is popularly believed. One rejoinder to this common charge is that as soon as an AI problem has been solved by the construction of a programmed system, that problem ceases to be called AI. There is some truth in this, but not much. The real AI problems of adaptivity, natural language understanding, image understanding, and robotics are still out there and waiting to be solved. Work in AI does not undermine a belief in humanity by aspiring to mechanize it. Quite the contrary: the more one examines the problems of intelligence, the more one's admiration grows for a system that seems to solve all of them at once, often effortlessly.

Further Reading
Brooks, R.A., 1999. Cambrian Intelligence: The Early History of the New AI. Bradford/MIT Press, Cambridge, MA.
Dennett, D.C., 1998. The Dennett Quartet. Bradford/MIT Press, Cambridge, MA.
Dreyfus, H.L., Dreyfus, S.E., 1986. Mind over Machine. Macmillan, New York.
Gazzaniga, M.S. (Ed.), 2000. The New Cognitive Neurosciences. Bradford/MIT Press, Cambridge, MA.
Heinke, D., Humphreys, G.W., Olson, A. (Eds.), 1999. Connectionist Models in Cognitive Neuroscience. Springer-Verlag, London.
Hinton, G., Dayan, P., Frey, B., Neal, R., 1995. The wake–sleep algorithm for unsupervised neural networks. Science 268, 1158–1161.
Hutchins, W., Somers, H., 1992. Introduction to Machine Translation. Academic Press, San Diego.
Kistler, W.M., van Hemmen, J.L., 2000. Modeling synaptic plasticity in conjunction with the timing of pre- and postsynaptic action potentials. Neural Comput. 12, 385–405.
Luger, G.F., Stubblefield, W.A., 1998. Artificial Intelligence. Addison-Wesley, Reading, MA.
Parks, R.W. (Ed.), 1998. Fundamentals of Neural Network Modeling. Bradford/MIT Press, Cambridge, MA.
Partridge, D., 1991. A New Guide to Artificial Intelligence. Ablex, Norwood, NJ.
Partridge, D., 2014. What Makes You Clever: The Puzzle of Intelligence. World Scientific, Singapore.
Pazienza, M.-T. (Ed.), 1997. Information Extraction. Springer, New York.
Rich, E., Knight, K., 1991. Artificial Intelligence. McGraw-Hill, New York.
Searle, J.R., 1992. The Rediscovery of the Mind. Bradford/MIT Press, Cambridge, MA.
Wilks, Y., Guthrie, L., Slator, B., 1996. Electric Words: Dictionaries, Computers, and Meanings. MIT Press, Cambridge, MA.