
Robotics and Autonomous Systems 8 (1991) 19-29, North-Holland

Learning by an autonomous agent in the pushing domain

Tatjana Zrimec
University of Ljubljana, Faculty of Electrical Engineering and Computer Science, Tržaška 25, 61001 Ljubljana, Slovenia

Peter Mowforth
The Turing Institute, 36 North Hanover St., Glasgow G1 2AD, UK

Abstract

Zrimec, T. and Mowforth, P., Learning by an autonomous agent in the pushing domain, Robotics and Autonomous Systems 8 (1991) 19-29.

The work presented in this paper is concerned with developing an algorithm for the extraction and representation of knowledge which allows a real robot to learn predictable behaviour. We have designed an experiment in which a robot can randomly explore the domain of object pushing using signals recorded before and after a controlled movement. The goal is to let the system discover how its world domain works through experimentation and unsupervised learning. The learning method involves a combination of learning from examples along with partitioning, constructive induction and determination of dependencies, which together allow the system to be independent of human help. Each experiment may be considered as a state transformation recorded as a sequence of attribute values. Treating each transformation as a training example, a large set of data was collected and subjected to the learning system. Robust and useful transformations were discovered, which were represented as a hierarchical qualitative model. One important result is that representation in an actor-oriented coordinate frame provides the most compact description for the problem domain. Unlike other domains, this style of experimentation offers the potential for closed-loop learning where, as far as the robot is concerned, its world is the oracle.

Keywords: Unsupervised learning; Constructive induction; Causal qualitative modelling.

1. Introduction

While learning systems have found considerable application in the domain of expert systems, their use in the emerging field of advanced robotics

Tatjana Zrimec received a B.Sc. in computer science in 1977, an M.Sc. in electrical engineering in 1980 and a Ph.D. in computer science in 1990 from the University of Ljubljana, Slovenia. She is now working as an assistant professor at the Faculty of Electrical Engineering and Computer Science, University of Ljubljana, where she is a member of both the Artificial Intelligence research group and the Robotics research group. Dr. Zrimec has written several recent research papers on her current research interest, which is the intersection of logic programming, advanced robotics and autonomous learning.

Peter Mowforth was born in Sheffield, England. He originally studied neurophysiology and physics in London and then moved to Cambridge to study human psychophysics and neurophysiology. He returned to Sheffield for a Ph.D. on the requirements and constraints for human stereo vision. Following a brief lectureship in psychology at Coleg Harlech in Wales, he moved to Edinburgh's Machine Intelligence Research Unit in 1982 to investigate the role of induction in 2D vision systems. In 1983 the Turing Institute was formed and the group moved to Glasgow. Since that time, the author has been one of the Institute's Directors and has had overall responsibility for a wide range of commercial and basic research projects in the areas of computer vision and advanced robotics, in addition to providing a wide base of consultancy for many international companies. The author has published over 40 recent articles in artificial intelligence whilst maintaining a particular interest in the representational requirements for early visual processing. The author is a member of the British Computer Society and the IEEE, Chairman of the British Machine Vision Association (Scottish branch), academic editor of Turing Institute Press, and organiser and chairman of the First International Robot Olympics held in Glasgow in 1990.

0921-8830/91/$03.50 © 1991 - Elsevier Science Publishers B.V. All rights reserved




has been rare. One reason for this is that machine learning has typically been studied in static domains, where learning algorithms have a different form and requirements from those found in dynamic situations. Thus, appropriate learning algorithms must first be developed as a precursor task. In static domains, the examples for inductive learning are typically in the form of 'situation-action' pairs. For dynamic systems, examples need to be in the form 'situation-action-situation', a loop we may loosely refer to as behaviour [12]. An early related work by Michie and Chambers [6] involved a simulation of a pole on a cart (a two-degree-of-freedom mobile robot) for which a simple credit-assignment algorithm slowly learned to balance the pole. Another related work, by Dufay and Latombe [3], used a two-phase approach for building robot programs: a training phase which produced a number of execution traces, and an induction phase which transformed the traces into an executable program. The task was to insert a pin into a chamfered hole, a task which, due to

geometric uncertainty, involved the planner generating several different sequences of motion, which were then subjected to iterative transformations such as the merging of nodes and arcs labelled by motions and states regarded as equivalent by rewriting rules. One final related project, described by Dechter and Michie [2], involved the use of an inductive learning algorithm in the construction of a robot plan for arch building. The idea behind the current research is to discover whether a real robot is able to discover any generalizable knowledge by interacting with its world in a closed loop, while using the paradigm of causality learning [14]. Such knowledge should express regularities which relate perception and action and so underpin predictable behaviour. We decided to start the research by studying the problem of pushing on a flat surface. Through experimentation we were able to generate real-world data containing information about both action and perception. Our goal was to formalise a methodology for learning in a dynamic domain, via a

Fig. 1. Subset of the Freddy 3 advanced robotics research test-bed used for the experiment: a Puma 260 with pneumatic gripper and local sensors.


Fig. 2. The image sequence shows the process of pushing a block.

suitable case study in which human help was excluded. The requirements for unsupervised learning for this problem are:
(1) Automatic description of the domain - tessellation (pattern of partitions) of the variable space.
(2) Constructive induction - filtering or constructing relevant variables among all supplied variables.
(3) Automatic discovery of dependencies - distinguishing classes from attributes.
(4) Incremental learning - providing knowledge refinement in a loop.

2. Method

The experiments described in this paper were conducted using the Turing Institute's advanced robotics test-bed, Freddy 3 [7]. The Freddy 3 environment currently consists of two Puma 260 robot arms, vision systems, speech systems and other sensors. Each of them is controlled in real time from Quintus Prolog running on a Unix network. The experiment described in this paper uses one robot, a vision system, a speech synthesiser and a Prolog control process (see Fig. 1). Given that our goal is concerned with the bootstrapping of dynamic knowledge, the learning system should be given access to no more than the raw signals driving its actions and recordings from its sensors. Since there is no model to work from, the experiment consists of random movements of

a robot hand in the working area in which an object is placed. The working area is viewed by a camera mounted on the ceiling, which provides two-dimensional information about the robot and its local environment. Within a predefined area of the robot's workspace a rectangular block was randomly placed so as to start the experiment. Using the vision system, the state of the world was recorded. Next, the robot performed a random action, which was also noted, and then, finally, the state of the world following the action was again recorded (see Fig. 2). The vision system provides the position and orientation of the block, whilst the robot action is recorded by the start and end position of its straight-line movement. The robot carried out 106 experiments with a wooden block of dimensions 2 cm x 2 cm x 4 cm. The information from each experiment was written to a file. At a later stage, 157 further experiments were carried out using the same block. All of the experimental results were then used as a basis for machine learning experiments using the method of 'learning from examples' [1].
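As a concrete illustration, one trial of the kind described above might be recorded as follows. This is a Python sketch, not the original Prolog; `sense` and `move` are hypothetical stand-ins for the Freddy 3 vision and robot interfaces, and the attribute names follow those used later in the paper.

```python
import random

def run_trial(sense, move, workspace=((0.0, 40.0), (0.0, 40.0))):
    """Return one 'situation-action-situation' training example:
    record the world, perform a random straight-line movement, record again."""
    before = sense()                      # e.g. {'X_obj', 'Y_obj', 'Angle_obj'}
    (xlo, xhi), (ylo, yhi) = workspace
    action = {                            # random straight-line push
        'X_rob_start': random.uniform(xlo, xhi),
        'Y_rob_start': random.uniform(ylo, yhi),
        'X_rob_end':   random.uniform(xlo, xhi),
        'Y_rob_end':   random.uniform(ylo, yhi),
    }
    move(action)
    after = sense()
    return {'before': before, 'action': action, 'after': after}
```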

3. The learning process

Whilst the objective of the experiment was unsupervised learning, there were several stages in the development of the qualitative model where we had to intervene manually. The reason was that, first, we did not have a suitable algorithm


available for the task and, second, we did not even know what form the algorithm should take. Our task was twofold: (1) to develop a specific machine-learned model for the pushing experiment, and (2) to develop a general-purpose machine learning algorithm for dynamic domains. Hence, it was not simply a case of writing down the algorithm and then applying it, but rather a series of refining exercises trying to achieve both tasks in parallel. To understand what constituted an effective learning system for the domain, several steps taken in the original experiments [8] were manually developed but later automated. Automatic rule induction systems for inducing classification rules from examples have already proved valuable as tools for assisting in the task of knowledge acquisition from examples. Our first experiments were designed to determine whether any regularities existed within the data. We used the learning-from-examples system Assistant 86 [1], which is similar to Quinlan's ID3 algorithm [11]. This algorithm has the ability to induce a general description of concepts (classes) from examples of these concepts. Examples are objects of a known class described in terms of attributes along with their values. The product of this type of learning is a symbolic description of concepts (classes) in the form of a decision tree. The path from the root to a leaf is the description of that leaf-class (rule) in terms of the visited attributes. To work with this algorithm we first had to prepare the training examples in the form of a set of attribute values along with the corresponding classes. The data collected in the experiment described in this paper consists of information from the vision system and information about the robot movement during each trial. So one trial can be described by a sequence of three situations in which the data was stored as:

Situation before action: object position defined by

the X, Y coordinates and the orientation defined by the angle, all expressed with the symbolic names X_obj_start, Y_obj_start and Angle_obj_start.

Robot action: defined by the coordinates of the starting point (X_rob_start, Y_rob_start) and the ending point (X_rob_end, Y_rob_end).

Situation after action: again the information about

the object position, defined by the symbolic names (X_obj_end, Y_obj_end and Angle_obj_end).

The rule induction package requires that classes and attributes are defined, so the next step was to prepare the training examples by introducing this distinction. In the early experiments we acted as domain experts when we defined the domain. In later experiments, automated partitioning was used, which provided a better and human-independent description of the domain. The partitioning algorithm is as follows:
(1) Sort the data for each variable.
(2) Calculate the distance between points i and i+1:
    D(i, i+1) = X(i+1) - X(i).
(3) Find the maximum distance MaxD.
(4) Calculate the similarity S between two successive points on the basis of the distance:
    S(i, i+1) = -100 * D(i, i+1)/MaxD + 100,
    where S is expressed in %.
(5) Calculate the average similarity.
(6) Choose the M most dissimilar points (less than average similarity) and place partitions at their mid-points.

Because we did not specify a goal for the system, classes and attributes were compounded. Distinguishing classes and attributes goes a long way towards determining the outcome of learning. Hence, learning systems requiring that the distinction be made beforehand largely prejudge the outcome and, second, still require considerable domain expertise to be provided by experts. If we assume that any change in the object position is a result of the robot action, then this assumption can be used to define the class values. As the goal is to discover what causes a change in perception, the class values were related to perceptual change. In the first learning experiment we made a simplification and worked with two classes: change = 0, named Nochange, and change ≠ 0, named Change. Change is defined as an alteration in either position or orientation of the object following the robot action. Due to small quantisation effects of


Fig. 3. The decision trees shown are for the window-oriented coordinate frame, the object-oriented coordinate frame, and the actor-oriented coordinate frame.

the vision system, a small threshold was chosen to separate the two classes. The attributes were the values for the X, Y coordinates and orientation of the object before the action (X_obj_start, Y_obj_start, Angle_obj_start), and the coordinates of both the starting and ending points which determine the robot action (X_rob_start, Y_rob_start, X_rob_end and Y_rob_end). To avoid the requirement of domain experts, an algorithm was developed which performs a dependency analysis. Highly dependent variables were then used as the basis for specifying classes, whilst variables showing low dependency were assigned as attributes. The algorithm for this task is based on an excess entropy, defined as the difference between the sum of the entropies taken

separately compared with the usual single entropy measure [13]:

H(a) + H(b) - H(a x b) = C(a, b),

C(a, b) = 0 -> a is independent of b;
C(a, b) > 0 -> a is dependent on b.

3.1. Determining an appropriate representation

After numerous experiments we found that the induced rules were complex and the attribute values were strongly dependent on the experimental environment. Originally, the data was expressed in a coordinate frame which was common to the robot and the vision system, with the orientation and origin of the frame taken from the vision




Table 1

Parameter                                           Coordinate frame type
                                                    Window   Object   Actor
No. of nodes (pruned)                                 15       13       9
No. of leaves (pruned)                                 8        7       5
No. of null leaves                                     1        1       0
No. of different attributes appearing in the tree      6        -       4

system. After performing several learning experiments in which we were unable to produce any useful knowledge, we decided to perform experiments using different representations for the data. Two additional training sets were prepared by applying homogeneous transformations of the coordinate systems. In the first set, the data was represented in the coordinate frame of the object, whilst in the second set the data was represented in a robot-action (actor) coordinate frame. Each of the three data sets was then passed to the rule-induction algorithm. Both complete and pruned trees were generated for each of the three experimental conditions (see Fig. 3). The trees were analysed by looking at the number of nodes, leaves and attribute types present in the induced trees for each experimental condition (see Table 1). The results clearly indicate that the representation can be simplified following a simple change in geometry. Given that a prime goal for machine learning is to produce compact representations, we decided to follow the principle offered by Occam's Razor. This suggests that, given multiple descriptions, one should choose the simplest, i.e. low complexity is synonymous with high credibility [10]. We chose the actor-oriented coordinate frame representation. The presence of null leaves in the trees for the other two representations supports the choice. The transformation from one coordinate frame to another involves translation and rotation of the data, given by the homogeneous transformation:

tan a = (X_rob_end - X_rob_start) / (Y_rob_end - Y_rob_start),
X0 = X_rob_start,  Y0 = Y_rob_start,

[X']   [cos a   -sin a] [X - X0]
[Y'] = [sin a    cos a] [Y - Y0]
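The transformation into the actor-oriented frame might be coded as follows. This is a Python sketch (the function name and the action record are illustrative): the origin moves to the robot start point and the Y' axis is aligned with the action vector, so the action end point lands on the Y' axis at a distance equal to the movement's length.

```python
from math import atan2, cos, sin

def to_actor_frame(x, y, action):
    """Rotate and translate a point (x, y) into the actor-oriented frame:
    origin at the robot start point, Y' axis along the action vector."""
    dx = action['X_rob_end'] - action['X_rob_start']
    dy = action['Y_rob_end'] - action['Y_rob_start']
    a = atan2(dx, dy)                    # tan a = dx / dy, as in the text
    x0 = x - action['X_rob_start']       # translate to the action origin
    y0 = y - action['Y_rob_start']
    return (cos(a) * x0 - sin(a) * y0,   # rotate by the action angle
            sin(a) * x0 + cos(a) * y0)
```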

The transformation causes the values of three variables to become zero, and other variables to take on new values calculated by the above expressions. The change in geometry is an example of what is meant by constructive induction [5]. Here, a new set of variables is synthesised which is more relevant for representing a model for the domain. The automation of constructive induction must involve some global set of mathematical (or logical) operations which, when applied, result in a new set of synthesised variables which better describe the domain. By better we mean simpler or more understandable, although in practice the two are often taken to be the same. It is unlikely that the task of constructive induction could be performed without access to some form of background knowledge [9]. Following constructive induction, any patterns in the data should be more obvious; i.e. the data is more compressible (greater redundancy). An algorithm was developed for measuring the amount of redundancy in the set of data on the basis of an entropy measure:
(1) For a given k < n, find a subset (pi1, pi2, pi3, ..., pik) of the predicates such that:
    H(p1, p2, ..., pn) - H(pi1, pi2, pi3, ..., pik) -> min.
(2) For k = 1, 2, 3, ..., n, find the k such that the compression achieved in (1) is in some respect optimal.
Whilst this algorithm is able to give a measure of the success of the constructive induction, the form of the constructive induction itself is, at present, still performed manually. Information about the object provided by the vision system consists of three components: the coordinates X, Y and the orientation of the principal axis (longest axis of symmetry). To investigate all patterns in the data we therefore generated three new training sets in which the class values corresponded to the change in X, Y or angle, through the variables:

DX = (X_obj_end - X_obj_start),
DY = (Y_obj_end - Y_obj_start),
DA = (Angle_obj_end - Angle_obj_start).

The attributes for all three training sets were values transformed into the actor-robot coordinate frame:

X'_obj_start - X coordinate of the object centroid relative to the action vector;




Y'_obj_start - Y coordinate of the object centroid relative to the action vector;
Angle_obj_start - object orientation relative to the action vector;
Y_rob_end - length of the action vector.
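The excess-entropy dependency test introduced above, C(a, b) = H(a) + H(b) - H(a x b), is straightforward to sketch for discretised variables. A Python illustration (the original system was implemented in Prolog); `entropy` is the usual Shannon entropy over value frequencies.

```python
from collections import Counter
from math import log2

def entropy(seq):
    """Shannon entropy of a sequence of discrete values, in bits."""
    n = len(seq)
    return -sum(c / n * log2(c / n) for c in Counter(seq).values())

def excess_entropy(a, b):
    """C(a, b) = H(a) + H(b) - H(a x b): zero when a and b are
    independent, positive when they are dependent."""
    joint = list(zip(a, b))              # the product variable a x b
    return entropy(a) + entropy(b) - entropy(joint)
```

Variables with high C against the action variables become class candidates; those with low C are assigned as attributes.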

Whilst the original results [14] used these expert-derived classes, we have since automated this process. Indeed, the original experiments would not have required a distinction between classes and attributes if a proper learning system had been available. For the classes to be useful they should denote symbolic categories. To achieve this we automated the task of grouping-clustering for automatic class definition and description. The algorithm clusters data on a variable by progressively merging two adjoining variable instances so as to minimise the difference between the information content taken separately and the information content of the merged values. The algorithm terminates when only two groups remain. Each group defines symbolic (logical) regions, which are named by a human oracle. These automatically derived classes corresponded directly with the original, manually developed classes and split the training examples into approximately two equal parts (change and no change). The three groups of preprocessed data were then subjected independently to the learning algorithm, Assistant 86, whose output is in the form of a decision tree containing rules describing the relationships between classes and attributes. All three decision trees were heavily pruned so as to exclude both fine detail and noise from the results. The decision trees expressed as rules are:


if k1 < X_obj_start < k2 then DX ≠ 0 else DX = 0
if k1 < X_obj_start < k2 then DY ≠ 0 else DY = 0
if k1 < X_obj_start < k2 then DA ≠ 0 else DA = 0
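Cut-points such as k1 and k2 can be produced by the six-step partitioning procedure given earlier in this section. A Python sketch of that procedure (illustrative only; the original system was written in Prolog):

```python
def partition(values, m):
    """Place partition boundaries in a 1-D variable following the six steps:
    sort, take successive distances, convert them to similarities, and put
    cut-points at the midpoints of the m most dissimilar adjacent pairs."""
    xs = sorted(values)                                         # step 1
    dists = [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]     # step 2
    max_d = max(dists)                                          # step 3
    sims = [-100.0 * d / max_d + 100.0 for d in dists]          # step 4 (%)
    avg = sum(sims) / len(sims)                                 # step 5
    # step 6: the m most dissimilar pairs below average similarity
    candidates = [i for i in range(len(sims)) if sims[i] < avg]
    candidates.sort(key=lambda i: sims[i])
    cuts = [(xs[i] + xs[i + 1]) / 2.0 for i in candidates[:m]]
    return sorted(cuts)
```

On data with two clear gaps, the cut-points fall inside those gaps, which is the behaviour the algorithm description calls for.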

In the first experiment the parameters k1 and k2 take the values -2.5 cm and 2.5 cm, which correspond to the maximum length of the object as viewed by

[Figure: decision hierarchy with levels 1-4; branches include NO-PUSH, PUSH LEFT, PUSH RIGHT, LITTLE MOVE, BIG MOVE, TRANSLATION, ROTATION and BOTH, with tests on X_obj_start, the maximum projected radius and the relative angle (around 90 degrees or not).]

Fig. 4. The final machine-learned, hierarchical, qualitative model, PANIC.


[Figure: a loop of five stages - EXPERIMENTATION (data collection of variables), PREPROCESSING THE DATA (partitioning, constructive induction and deriving the dependencies), INDUCTIVE LEARNING (generation of rules), MODEL CONSTRUCTION (hierarchical construction, organisation of rules) and MOTIVATION (directing the experiment using the rules).]

Fig. 5. Sequence of operations constituting causality learning.

the actor (maximum projected radius). All trees share a common root node in which

X'_obj_start = maximum projected radius

of the object relative to the actor. For each decision tree, this attribute value splits the examples into those showing Nochange on one branch and Change on the other. By applying the logical operators and and or manually, the classes for DX, DY and DA were merged into two common classes, which we named PUSH (change in all dimensions) and NO-PUSH (no change in any dimension). To test the correctness of these operations, an additional learning experiment was performed using the same attributes and the new merged classes PUSH and NO-PUSH. The results produced a decision tree which supports the class compression operations. The result is shown as level 1 of the model depicted in Fig. 4. To formalise the class compression step we use:

Di = 0 -> Di' = 0;  Di ≠ 0 -> Di' = 1,  for i = X, Y, A.

We first express the left-hand branch of the trees:

if att = v then Di = 0.

By substitution, we can express the above with a single rule:

if att = v then DZ = 0,  DZ <- DX & DY & DA.

Applying the same procedure to the right-hand branch of the trees:

if att = v1 then Di = 1.

By substitution, we can express the above with a single rule:

if att = v1 then DZ = 1,  DZ <- DX v DY v DA.

Using level 1 of the model we can predict that whenever the action vector of the robot hand intersects with the object, the object is pushed. This is represented by a change in position or orientation. Alternatively, if there is no intersection then the object is left in the same place, i.e. it is not pushed. Such knowledge is the fundamental basis for collision avoidance. An important requirement for an autonomous agent is to be able to use discovered knowledge as a basis for learning further things. Because the experimental paradigm did not have a specific goal other than exploration




and learning, we had to introduce a mechanism for directing the behaviour of the system. We introduced a primitive form of motivation by directing behaviour towards maximum sensory change. The knowledge from level 1 was implemented as an operational module and used by the robot. Given the constraint added by the primitive motivation, the robot directed its exploratory behaviour into performing PUSH experiments. This meant that before the robot carried out an experiment, it first checked to see whether a PUSH would occur. If the model predicted that NO-PUSH would be the result, the robot calculated a new random action vector and repeated the experiment. A further training set of data was collected. One byproduct of this experiment was that it verified that the robot had learned the difference between pushing and not pushing. This new data was used as a basis for all later learning experiments. Fig. 5 shows the complete loop of the causality learning paradigm. The data derived from successful PUSH experiments was prepared for rule induction in the same way as for the earlier experiments. This time the classes for DX and DA were grouped into two large clusters: one containing positive values and the other containing negative values. The values of DY were all positive due to the choice of representation (for DY to be negative, a PULL would have had to occur). Again, with the learning algorithm we generated two decision trees. The induced trees had a similar form to those in the level 1 experiment, and the same logical operation was applied. The class merging operation produced an interesting result: the positive values of the class DX were combined with the negative values of the class DA, and vice versa for the other class values. The results were used to express level 2 of the model shown in Fig. 4. Level 2 of the model predicts that if an actor pushes an object on the left then the object tends to move away, to the right, and also to spin clockwise. If, however, the object is pushed on the right then the object moves away, to the left, and tends to spin counter-clockwise. By finding further subclusters in the data and repeating the learning procedure, two further levels were discovered, which are shown in Fig. 4. Level 3 allows us to predict that if the object is pushed on the extreme left or the extreme right then we may expect a small change in rotation and translation. Alternatively, if we push near the center of the object then



[Figure: the block viewed from above, with the actor position and qualitative outcome at each level - level one: push / no push; level two: move right / move left; level three: little move / big move; level four: rotation / translation.]

Fig. 6. Graphical representation of PANIC.

we would expect a large translational or rotational change to occur. Level 4 allows us to refine our predictions from level 3. The form of the refinement is that if we take into account the relative angle between the actor and the object, we can predict pure translation whenever we push in the centre with the force acting orthogonally to the contact surface. Alternatively, when we push orthogonally on the side of the object we may expect maximum rotation. Any other combination does not allow us to make a reliable prediction. Fig. 6 shows a graphical representation of the model. The position on the block, along with the qualitative class name resulting from an action, is indicated for all levels of the model.
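The four levels can be summarised as a qualitative prediction procedure. The sketch below is illustrative only: the 'extreme', 'centre' and 'around 90 degrees' bands are assumed thresholds for the purpose of the example, not values reported in the paper.

```python
def panic_predict(x, rel_angle, max_radius):
    """Qualitative prediction following the four PANIC levels. x is the X'
    coordinate of the object centroid in the actor frame; rel_angle is the
    object orientation relative to the action vector (degrees)."""
    if abs(x) >= max_radius:                       # level 1: no intersection
        return 'NO-PUSH'
    side = 'PUSH LEFT' if x < 0 else 'PUSH RIGHT'  # level 2
    if abs(x) > 0.8 * max_radius:                  # level 3 (assumed band)
        return side + ', LITTLE MOVE'
    near_90 = abs(rel_angle - 90.0) < 15.0         # level 4 (assumed tolerance)
    central = abs(x) < 0.2 * max_radius            # assumed 'centre' band
    if central and near_90:
        return side + ', BIG MOVE, TRANSLATION'    # orthogonal, central push
    if near_90:
        return side + ', BIG MOVE, ROTATION'       # orthogonal push on the side
    return side + ', BIG MOVE'                     # no reliable refinement
```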

[Figure: the four levels of PANIC - level 1 is more simple, more coarse and more general; level 4 is more complex, more detailed and more specific.]

Fig. 7. Levels of abstraction.


3.2. Levels of abstraction

At different levels of the hierarchy, the knowledge may be described in different ways. These descriptions may be summarised as in Fig. 7. The complete hierarchical qualitative model was named PANIC, which stands for Perception and Action as Naive Causality.

4. Discussion and conclusion

The experiment that was performed introduced a novel paradigm into robot learning research: causality learning, where an autonomous agent equipped with sensors carries out experiments in a partially random, partially directed fashion. The knowledge derived for object pushing using this method offers support for that derived using physical laws [4]. The computational steps involved in the automation of such a process are:
(1) Tessellation (domain description).
(2) Constructive induction (use background knowledge).
(3) Tessellation (repartition).
(4) Dependency discovery (distinguish classes and attributes).
(5) Rule induction (machine learning).
(6) Compression (logical merging and structuring).
These steps are carried out in a control loop where experiments are directed by motivation. One of the important findings of the experiment was that, by changing the representation, a considerable simplification of the model was possible. Once discovered, the representation change made a lot of sense: to describe actions relative to whoever or whatever carried them out is, after all, the natural way of describing causality. To make the learning more effective, the robot requires an explicit strategy by which it can direct its exploration towards 'interesting' clusters of results. The term 'interesting' suggests that the robot has some form of motivation which directs its behaviour in ways which increase the likelihood of sensory change. Such ideas are common in biological systems, where things that are bright, loud or quickly moving are typically more interesting than things which are not. Indeed, biological sensors appear to be specifically designed to respond more strongly to change rather than to steady-state stimulation. Unfortunately, if strong sensory change were the only parameter to affect primitive motivation, the robot would work its way into a single local minimum and get stuck there. To avoid such situations, we propose that a complementary process be used, which can be referred to as habituation or boredom. This means that whenever the robot has carried out a certain number of experiments within one portion of the problem space and has found some reasonably reliable transformation there, it gets bored and jumps out of that local minimum using a random number generator. Thus each cluster of the model would have an associated weighting factor whose value would be determined by the two opposing forces of strong sensory change and habituation. Whenever the weighting factor falls below a certain level, the robot redirects itself to play with some other problem. Future work will be directed towards producing closed-loop, incremental robot learning incorporating primitive motivation of the type described above.
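The proposed motivation/habituation balance might be sketched as follows. The class, its methods and all constants are illustrative assumptions about the proposal, not part of the implemented system.

```python
import random

class Motivation:
    """Illustrative sketch of the motivation/habituation proposal: each
    cluster's weight is boosted by sensory change and decayed by
    habituation; a cluster whose weight drops below the threshold is
    abandoned for a randomly chosen other one."""

    def __init__(self, clusters, decay=0.8, threshold=0.1):
        self.weights = {c: 1.0 for c in clusters}
        self.decay = decay
        self.threshold = threshold

    def record(self, cluster, sensory_change):
        # habituation decays the weight; sensory change restores it
        w = self.weights[cluster]
        self.weights[cluster] = self.decay * w + (1 - self.decay) * sensory_change

    def next_cluster(self, current):
        if self.weights[current] >= self.threshold:
            return current                   # still interesting: keep playing
        others = [c for c in self.weights if c != current]
        return random.choice(others)         # bored: jump to another problem
```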

Acknowledgements

This work was supported by SERC, the Slovenian Research Council and the British Council. The authors thank Franc Solina for many helpful comments, and their fellow researchers both in Ljubljana and in Glasgow for useful ideas.

References

[1] B. Cestnik, I. Kononenko and I. Bratko, ASSISTANT 86: a knowledge-elicitation tool for sophisticated users, in: I. Bratko and N. Lavrac, eds., Progress in Machine Learning: Proc. of EWSL 87, European Working Session on Learning (Sigma Press, Wilmslow, 1987) 31-45.
[2] R. Dechter and D. Michie, Induction of Plans, TIRM 84-006 (The Turing Institute, Glasgow, 1984).
[3] B. Dufay and J.-C. Latombe, An approach to automatic robot programming based on inductive learning, First Int. Symp. on Robotics Res., Bretton Woods (1983) 97-115.
[4] M.T. Mason, Mechanics of pushing, Robotics Research: Second International Symposium (MIT Press, Cambridge, MA, 1985) 421-428.
[5] R.S. Michalski, Inductive inference as rule-guided transformation of symbolic description, International Workshop on Program Construction, Chateau de Bonas, France (1980).
[6] D. Michie and R.A. Chambers, BOXES: an experiment in adaptive control, in: E. Dale and D. Michie, eds., Machine Intelligence 2 (Oliver and Boyd, Edinburgh, 1968) 137-152.
[7] P.H. Mowforth and I. Bratko, AI and robotics: flexibility and integration, Robotics 5 (1987) 93-98.
[8] P.H. Mowforth and T. Zrimec, Learning of causality by a robot, in: J. Hayes-Michie, ed., Machine Intelligence 12 (Turing Institute Press, Glasgow, 1991) 225-240.
[9] Yoh-Han Pao, A mathematical model system experimental exploration of constructive induction, Technical Report TR105-86, Case Western Reserve University (1986).
[10] J. Pearl, On the connection between the complexity and credibility of inferred models, Int. J. General Systems 4 (1978) 255-264.
[11] J.R. Quinlan, Discovering rules from large collections of examples: a case study, in: D. Michie, ed., Expert Systems in the Micro Electronic Age (Edinburgh University Press, Edinburgh, 1979) 168-201.
[12] E.D. Sacerdoti, A Structure for Plans and Behaviour (North-Holland, Amsterdam, 1977).
[13] M.H. van Emden, Hierarchical decomposition of complexity, in: Machine Intelligence 5 (Edinburgh University Press, Edinburgh, 1969) 361-380.
[14] T. Zrimec and P.H. Mowforth, An experiment in generating deep knowledge for robots, in: J.S. Gero, ed., Artificial Intelligence in Engineering: Robotics and Processes, AIENG-88 (Computational Mechanics Publications, California, 1988) 21-33.