Building a database for product design knowledge retrieval—A case study in robotic design database




ARTICLE IN PRESS Robotics and Computer-Integrated Manufacturing 26 (2010) 224–229

Contents lists available at ScienceDirect

Robotics and Computer-Integrated Manufacturing journal homepage: www.elsevier.com/locate/rcim

Jie Sun a,*, Wen Feng Lu a,b, Han Tong Loh a

a Department of Mechanical Engineering, National University of Singapore, Singapore
b Centre for Design Technology, National University of Singapore, Singapore

Article info

Article history: Received 25 October 2009; accepted 2 November 2009

Keywords: Product design; t-test; Labeling policy; F-test; Robotic design database

Abstract

In the product design process, when dealing with technical problems or initiating a new design, R&D personnel often turn to technical databases for inspiration. The building of a database of such documents has not been dealt with systematically. In this paper, several issues in building a product design database are investigated: input source, sampling scheme and quality control. A case study of building a database for robotic design is used to demonstrate the concept. It is an archive of more than 1500 relevant technical papers. A total of 16 graduate students are employed as operators in the labeling process, and hypothesis tests are subsequently used to process the labeling results. To ensure the quality of this database, the labeling consistency of each operator and the understanding of each category are tested. Using statistical methods, this work proposes a feasible and practical way to create such a database for product design. © 2009 Published by Elsevier Ltd.

* Corresponding author. E-mail address: [email protected] (J. Sun).

0736-5845/$ - see front matter © 2009 Published by Elsevier Ltd. doi:10.1016/j.rcim.2009.11.004

1. Introduction

With increasing global competition and dynamic market needs, companies are focusing more on product design. Product design is a problem-solving process that takes into account design materials, manufacturing processes, product personality, function, total price and environmental aspects. In this process, a list of requirements is generally translated into design objectives. All the available solutions are screened or translated into specific features, and historical designs with these features are then explored. Designers may also find designs closely related to a previous design and make use of the available information. A successful design depends on knowledge of the relevant field and on design experience. A lack of sufficient knowledge in product design results in delivery delays, cost increases and, probably, poor quality. A review of past papers has also revealed a shift in product design and manufacturing from being skill-based to being based on knowledge and information [1–4].

Data analysis of product design was first reported by Moteki and Arai [5], who used principal components analysis to analyze data from a polymer production facility. Jaeckle and MacGregor [6] used multivariate statistical methods such as partial least squares and principal components regression to investigate the product design problem. To achieve a desired product design, Lakshminarayanan [7] presented a methodology to analyze a database of historical operating conditions and product quality. However, all the above-mentioned linear data analysis tools are grossly inadequate when working with design databases for industry [8].

One promising and viable alternative is data mining. It has been applied to the manufacturing domain, especially in the areas of design, quality control and customer service. Ferguson [9] suggested that product designers access a range of corporate databases, in particular customer complaints, product material features and R&D testing, using data mining techniques. This investigation enabled information from later life cycle stages to be used in earlier ones, and made this information understandable and usable to other product functions. Data mining techniques have attracted much industrial attention since they are capable of extracting useful information from a large number of technical documents. They can provide valuable insights into these documents, which improves product understanding, quality and reliability.

To do automatic data mining, we need a collection of technical documents, i.e. a database. This database should include product structure, functionality, manufacturing processes and materials, all of which have an effect on the final design quality. It is a mixture of text, sketches, drawings, photos and numeric input. Such databases are becoming electronically accessible and are growing at an explosive rate. The increasing demand to formulate design tasks also greatly motivates the need to set up databases. Hence, the development of a product design database is essential to allow designers to be more effective and innovative in their tasks.

In order to use past design knowledge effectively, Wu [10] built up a design knowledge database through an elaborate process of data collection, mining, integration, storing, management and maintenance. It includes design standards and requirements, product samples, experimental data, computational models, designers' experiences and failure information. Product designers need material information relevant to product issues to adjust design solutions. To meet their needs, Kesteren [11] investigated the current use of material property databases and proposed some strategies for database developers to improve information presentation. MacGregor [12] built multi-block partial least squares models to integrate diverse industrial databases for previous and current products. The results are used in an optimization framework to select raw materials, the ratios in which to combine them and the conditions under which to process them, in order to yield a product with specified end properties at minimum cost. Hung et al. [13] developed a knowledge-based database to support the framework in product design planning. A practical application in the semiconductor industry, system-on-a-chip product design planning, was used to validate the completeness and benefits of this approach.

The existing database build-up methods for product design are thus largely problem specific. The lack of general and systematic methods is mainly due to the diversity and complexity of the product design process. In an attempt to establish a somewhat general methodology, this paper discusses the relevant concerns, including input source, sampling scheme, labeling policy and quality control. As a specific problem for the proposed general methodology, the development of a robotics design database is studied and evaluated; such a database could support designers or researchers in discovering and managing past corporate design knowledge. In this study, information from sketches, drawings and photos is not considered; our focus is largely directed towards textual components. With such a database, design information can be organized in a comprehensive and universally accessible way, which provides a path for innovation, or for exploration and integration through collaboration.

2. Robotic design database

Robotic design combines knowledge and experience from sensors and actuators, manipulation and planning, mechanics and control, working environments, behavior strategies and intelligence. A database for robotic design should archive technical documents on all the above-mentioned topics and take the following issues into account.

1. The ideal database should embody well the original documents related to design and the experience gained from the design process. It should be conceptually easy to understand, simple to manipulate and well supported by a large amount of relevant data. The content of this database should be explicitly stated, so that designers and customers can read and analyze it easily.
2. This database is intended not only to catalog every perspective of design knowledge in multiple data formats, but also to bring meaningful and relevant information to designers. With text mining techniques, designers can have a high-level view of the entire database and quickly drill down to relevant details. Thus, they may easily look for a solution or alternatives to a similar problem, or create a new solution.
3. This database requires constant upgrading, with ongoing accumulation and revision, to stay current. Since product design knowledge has its own life cycle, designers need to capture new knowledge and use it to improve product quality. Historical design solutions are also kept for the maintenance and repair of products.


With these considerations in mind, a robotic design database is set up as a case study. In the following sections, the document categories and input data sources of this database are discussed.

2.1. Categories in robotic design database

Robots usually work in extremely dirty, dangerous or tedious environments, and they may have to sense their surroundings. They do this much as humans do, with vision sensors (eyes), touch, force and tactile sensors (hands and fingers), chemical sensors (nose), hearing and sonar sensors (ears) and taste sensors (tongue). These sensors can be used separately or integrated using sensor network and sensor fusion techniques to give the robot awareness of its environment. Robot designs may also be divided according to working environment, such as field robots, mining robotics, service robots, autonomous underwater vehicles, marine robotics, space robots and construction robots.

The IEEE International Conference on Robotics and Automation (ICRA) is sponsored by the IEEE Robotics & Automation Society (RAC), a very active worldwide research association working on the future of robotics. The technical committee of ICRA has listed all the relevant topics in robotic design on its official website [14,15]. Table 1 lists the topics of ICRA 2005 and ICRA 2007 and the topics in the technical activities of RAC. In the database under development, the topics given in Table 1 are summarized into seven categories of robotic design information, as shown in Table 2. Each category consists of a few subcategories, which may have a direct or indirect relationship with the corresponding category.

2.2. Input sources and sampling scheme

The ICRA proceedings are one of the most important publications in the field of robotics and automation for technical communication and discussion. In this study, technical papers from ICRA 2003–2007 are used to construct this database.
Table 1
Topics in robotics design area.

ICRA 2005: Automation and Manufacturing; Actuators and Sensors; Vision and Sensing; Control; Planning; Learning and skills; Localization and navigation; Human–robot interaction; Special applications; Humanoid robots; Mobile robots; Biologically inspired robots.

ICRA 2007: Automation; Cognitive robotics; Field and service robotics; Human-centered and life-like robotics; Manipulation; Mechanics, design and control; Mobile and distributed robotics; Sensing and perception; Simulation, interfaces and virtual reality.

Technical activities in RAC: Aerial robotics and unmanned aerial vehicles; Agricultural robotics; Algorithms for planning and control of robot motion; Bio robotics; Computer and robot vision; Haptics; Human–robot interaction and coordination; Humanoid robotics; Intelligent transportation systems; Manufacturing automation; Micro/nano robotics and automation.

To ensure the analysis quality of text mining, an effective and sufficient database is a must. In general, three factors influence the text mining performance of a design document database: the quality of the database, the network architecture and the problem complexity (an uncontrollable factor). Among them, the database is the essential controlling factor. Design knowledge, technical documents and relevant techniques result in an extremely large number of documents. They do not all need to be included in the database under development, since manually labeling all the product design related documents is a heavy burden for database development personnel. Furthermore, some documents may carry redundant information, and others may be less relevant to the target. A random selection of documents cannot ensure that text mining works reliably in practical tasks, since different databases can give substantially different generalization errors. Even if some documents lie on the identification boundary, using an excessively large database is unlikely to provide additional information, since the identification boundary has already been established. A reasonable sampling scheme could improve the performance of the database under development and support robust text classification.

In order to cover the entire population space comprehensively and sufficiently, with consistent and reproducible results from limited sampled data, a multi-stage cluster sampling scheme is employed. First, the population $P$ (the whole collection of technical papers and documents from 2003 to 2007) is divided into non-overlapping clusters in terms of the above-mentioned seven categories, i.e. $P_C = \{P_{c1}, P_{c2}, \ldots, P_{c7}\}$. Then, each cluster $P_{cj}$, $j = 1, 2, \ldots, 7$, is further divided into five sub-clusters according to the conference year, i.e. $P_{cj} = \{P_{cj\_2003}, P_{cj\_2004}, \ldots, P_{cj\_2007}\}$. The target size of the database is about 1500 documents, and the sampling rates for the sub-clusters are $x_{cj} = \{10\%, 15\%, 25\%, 35\%, 15\%\}$, which are the same for all the categories. For example, the sampling rate for 2003 (10%) is the ratio between the number of sampled documents used to build the robotic design database and the total number of documents in sub-cluster $P_{cj\_2003}$. The sampling rates for 2005 and 2006 are 25% and 35%, respectively. Since the ICRA 2005–2006 technical papers are used as the reference for the robotic design database, the sampling rates for these two years are higher than those for the other years. To realize ongoing accumulation and revision and so stay current, 15% of the ICRA 2007 documents are used to upgrade this database. Historical information and experience are also kept in this database, represented by 10% of the ICRA 2003 documents and 15% of the ICRA 2004 documents. With this multi-stage cluster sampling scheme, the database under development can not only reflect the practical distribution of the original document population, but also cover all the relevant topics in robotic design. Finally, an archive of 1574 English-language engineering documents is collected in this robotic design database.
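The multi-stage cluster sampling scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the record layout `(doc_id, category, year)`, the function name `multistage_cluster_sample`, the fixed random seed and the rounding of sub-cluster sample sizes are all assumptions; only the year-specific sampling rates come from the text.

```python
import random
from collections import defaultdict

# Year-specific sampling rates quoted in the text (the same for every category).
SAMPLING_RATES = {2003: 0.10, 2004: 0.15, 2005: 0.25, 2006: 0.35, 2007: 0.15}


def multistage_cluster_sample(documents, rates=SAMPLING_RATES, seed=0):
    """Draw a sample from every (category, year) sub-cluster.

    `documents` is an iterable of (doc_id, category, year) tuples; this
    record layout is an assumption made for the sketch.
    """
    rng = random.Random(seed)  # a fixed seed keeps the draw reproducible

    # Stages 1 and 2: partition by category, then by conference year.
    sub_clusters = defaultdict(list)
    for doc_id, category, year in documents:
        sub_clusters[(category, year)].append(doc_id)

    # Stage 3: draw the year-specific fraction from each sub-cluster.
    sample = {}
    for (category, year), doc_ids in sorted(sub_clusters.items()):
        k = round(rates[year] * len(doc_ids))  # rounding rule is an assumption
        sample[(category, year)] = sorted(rng.sample(doc_ids, k))
    return sample
```

With 100 hypothetical documents in every sub-cluster, each category would contribute 10, 15, 25, 35 and 15 documents for 2003 to 2007, respectively. Keeping the rates in a single dictionary makes it easy to adjust them when a new conference year is added to the database.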

Table 1 (continued)
Technical activities in RAC (continued): Networked robots; Programming environments in robotics and automation; Prototyping for robotics and automation; Rehabilitation robotics; Robo-ethics; Safety security and rescue robotics; Semiconductor manufacturing automation; Service robotics; Space robotics; Surgical robotics; Telerobotics; Underwater robotics.

3. Labeling policy in the robotic design database

The labeling policies serve as the rules that guide operators. They need to be explained explicitly at the beginning of the labeling process. This not only reduces labeling errors and maintains the quality of the developed database, but also increases the consistency and usefulness of the labeling process [16]. Three labeling policies have been adopted in this work.

(1) Each article has at least one automatic category label. All the technical documents within this database already had subtopics, which were initially assigned by technical committee members of ICRA. These subtopics have been clustered into seven topics, so all the documents under a subtopic are mapped to one of the seven categories according to Table 2. Therefore, these documents automatically obtain one category label. For example, a document under "computer vision" would be automatically assigned to the category "Sensing, sensor and actuator".
(2) Overlapping information is widely found in product design documents. If a document involving multiple categories is forced into a single category, some important information may be lost in the knowledge retrieval process. To avoid this kind of loss, multiple labels are used. For example, the document titled "A UAV vision system for airborne surveillance" would be labeled with two categories: "Field and service robotics" and "Sensing, sensor and actuator".
(3) There is no upper limit on the number of the most specific suitable labels assigned to any document, since maximizing the information coverage is desirable. In the labeling process, operators are required to assign all the suitable labels to each document.
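The three policies above can be illustrated with a short sketch. The mapping below is a hypothetical three-entry excerpt in the spirit of Table 2, and the function name `label_document` is an assumption; which ICRA subtopic a given paper was originally filed under is not stated in the text, so the subtopic used in the example is also assumed.

```python
# Hypothetical excerpt of the subtopic -> category mapping implied by Table 2.
SUBTOPIC_TO_CATEGORY = {
    "computer vision": "Sensing, sensor and actuator",
    "aerial robotics": "Field and service robotics",
    "motion planning": "Manipulation and planning",
}


def label_document(subtopic, operator_labels=()):
    """Return the sorted list of category labels for one document.

    Policy 1: the ICRA subtopic supplies one automatic category label.
    Policies 2 and 3: operators may add any number of further labels.
    """
    labels = {SUBTOPIC_TO_CATEGORY[subtopic]}  # automatic label
    labels.update(operator_labels)             # no upper limit on extra labels
    return sorted(labels)


# A UAV vision paper, assumed here to be filed under "aerial robotics",
# keeps its automatic label and gains a second one from the operator.
print(label_document("aerial robotics", ["Sensing, sensor and actuator"]))
```

Storing the labels as a set means that an operator who repeats the automatic label does not create a duplicate.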

4. Labeling validation analysis

To develop this robotic design database, sixteen graduate students are involved as labeling operators. Most of them are working on a doctoral or master's degree in the area of robotics at the National University of Singapore.

4.1. Measuring inter-label consistency

As given by Eq. (1), $Z_{i\_j}$ denotes the labeling percentage of the $i$th operator (Op) at category $j$ ($C_j$), and $L_{i\_j}$ is the number of labels in $C_j$ assigned by Op $i$, $i = 1, 2, \ldots, M$, $j = 1, 2, \ldots, N$. $M$ is the total number of operators and $N$ is the total number of categories. $\sum_{j=1}^{N} L_{i\_j}$ is the total number of labels assigned by Op $i$. A higher value of $Z_{i\_j}$ means that Op $i$ categorizes more documents into $C_j$. With the use of the labeling percentage, the distribution of labels under each category can be compared among different operators.

$$Z_{i\_j} = \frac{L_{i\_j}}{\sum_{j=1}^{N} L_{i\_j}} \qquad (1)$$
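Eq. (1), together with the one-sigma and two-sigma screening that this section applies to the 1st round results in Table 3, can be sketched as follows. The function names are assumptions made for illustration; the numeric rows are the Op 1 and Op 14 labeling percentages and the category means and standard deviations as tabulated in Table 3.

```python
def labeling_percentages(label_counts):
    """Eq. (1): share of an operator's labels that falls in each category."""
    total = sum(label_counts)
    return [count / total for count in label_counts]


def sigma_band_counts(percentages, means, stds):
    """Count categories within one and within two standard deviations."""
    rows = list(zip(percentages, means, stds))
    within_1 = sum(abs(z - m) <= s for z, m, s in rows)
    within_2 = sum(abs(z - m) <= 2 * s for z, m, s in rows)
    return within_1, within_2


# Category means and standard deviations (C1..C7), 1st round, from Table 3.
MEANS = [0.1007, 0.0672, 0.1242, 0.1690, 0.1840, 0.1889, 0.1655]
STDS = [0.0130, 0.0181, 0.0227, 0.0243, 0.0583, 0.0180, 0.0163]
OP_1 = [0.0856, 0.1208, 0.1142, 0.1824, 0.1443, 0.1913, 0.1613]
OP_14 = [0.1266, 0.0962, 0.1492, 0.2111, 0.1201, 0.1630, 0.1338]

print(sigma_band_counts(OP_1, MEANS, STDS))   # (5, 6), as in Table 3
print(sigma_band_counts(OP_14, MEANS, STDS))  # (0, 7), as in Table 3
```

The raw label counts behind Table 3 are not given in the paper, so `labeling_percentages` is shown only as the direct translation of Eq. (1); for hypothetical counts `[2, 3, 5]` it returns `[0.2, 0.3, 0.5]`.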

Table 2
Categories in robotic design database (category Cj and its contents).

1. Cognitive robotics
   (1) agent-based systems, autonomous agents
   (2) artificial intelligence reasoning methods, learning and adaptive systems
   (3) human-robot interaction
   (4) teleoperation, telerobotics
   (5) virtual reality and interfaces, haptics and haptic interfaces

2. Field and service robotics
   (1) aerial robotics and unmanned aerial vehicle (UAV)
   (2) domestic robots and entertainment robotics
   (3) field robots, mining robotics, service robots and space robotics
   (4) marine robotics and autonomous underwater vehicles (AUV)
   (5) robotics in agriculture and forestry
   (6) robotics in construction, and hazardous fields
   (7) search and rescue robots

3. Human-centered and life-like robotics
   (1) biologically-inspired robots (biped robots, legged robots, snake robots, biorobotics)
   (2) biomimetics, humanoid robots, neurorobotics, medical robots and systems
   (3) personal robots and rehabilitation robotics
   (4) robot companions and social robots in home environments

4. Manipulation and planning
   (1) dexterous manipulation and compliant assembly
   (2) grasping, handling
   (3) motion planning, path planning, roadmap, obstacle avoidance
   (4) fingers and hands

5. Mechanics, design and control
   (1) control (adaptive control, force control, motion control) and control architecture
   (2) neural and fuzzy control
   (3) flexible arms
   (4) calibration, identification and fault diagnosis
   (5) dynamics, kinematics
   (6) mechanism design, modeling and simulation
   (7) parallel robots, redundant robots, underactuated robots, wheeled robots, micro/nano robots

6. Mobile and multirobotics
   (1) SLAM (simultaneous localization and mapping)
   (2) cellular and modular robots, self-reconfiguring robots
   (3) distributed robot systems
   (4) localization, navigation and mapping
   (5) cooperation system
   (6) nonholonomic robots, omnidirectional robots and multiple mobile robot systems

7. Sensing, sensor and actuator
   (1) computer vision and omnidirectional vision
   (2) force and tactile sensing
   (3) range sensing and surveillance systems, sensor networks and sensor fusion
   (4) sonars, visual servoing and visual tracking
   (5) smart actuators and microactuators


Table 3 shows the labeling percentage of each category in the 1st round. In this table, $\bar{x}_j$ is the average labeling percentage of $C_j$ and $s_j$ is the corresponding standard deviation. As shown in Table 3, Op 14 has an unusual pattern of label assignment: all the categories assigned by him lie outside one sigma, which differs greatly from the results of the other operators. This indicates that Op 14 has a very obvious bias or an incomplete understanding of the categories, of their potentially related documents, or of both. Since it is very difficult for him to link the categories with the corresponding technical documents, his labeling results were rejected after the 1st round. The remaining 15 operators move to the 2nd round.

From the results in Table 3, we also note that the labeling percentages $Z_{1\_2}$ (the 2nd category assigned by Op 1) and $Z_{12\_6}$ (the 6th category assigned by Op 12) are beyond two sigma, and that only two categories assigned by Op 11 are within one sigma. A summary of the labeling results is listed in the last two columns of Table 3: for each operator, the number of categories whose labeling percentage lies within one sigma and within two sigma is given separately. Generally, the number of labels from each category assigned by an individual operator should be controlled within two sigma. To enhance understanding and promote agreement, an expert is involved to communicate with the three operators (Op 1, 11 and 12) and improve their understanding of the related categories. They are required to modify their labeling results for the corresponding categories. Table 4 shows the labeling results of the 2nd round, after rejecting the results from Op 14 and collecting the revised results from Op 1, 11 and 12.

Table 3
Labeling percentage results Zi_j in the 1st round. The last two columns give the number of categories within one sigma (±Sj) and within two sigma (±2Sj).

i     C1      C2      C3      C4      C5      C6      C7      ±Sj   ±2Sj
1     0.0856  0.1208  0.1142  0.1824  0.1443  0.1913  0.1613  5     6
2     0.1037  0.0736  0.1348  0.1875  0.1671  0.1943  0.1388  6     7
3     0.1176  0.0603  0.1273  0.1818  0.1608  0.1919  0.1604  6     7
4     0.1174  0.0596  0.1258  0.1812  0.1643  0.1908  0.1608  6     7
5     0.0953  0.0512  0.1041  0.1451  0.2369  0.1787  0.1889  6     7
6     0.1178  0.0603  0.1275  0.1816  0.1610  0.1917  0.1602  6     7
7     0.0905  0.0558  0.0895  0.1403  0.2657  0.1905  0.1677  4     7
8     0.0973  0.0492  0.1162  0.1246  0.2484  0.1802  0.1840  4     7
9     0.0893  0.0642  0.1362  0.1644  0.1894  0.1967  0.1598  7     7
10    0.0968  0.0612  0.1061  0.1535  0.2580  0.1730  0.1515  6     7
11    0.0857  0.0575  0.1628  0.1979  0.0995  0.2198  0.1754  2     7
12    0.0968  0.075   0.1661  0.1782  0.0886  0.2349  0.1601  4     6
13    0.0931  0.0622  0.0859  0.1341  0.2622  0.1804  0.1821  3     7
14    0.1266  0.0962  0.1492  0.2111  0.1201  0.1630  0.1338  0     7
15    0.1084  0.0708  0.1211  0.1830  0.1618  0.1722  0.1827  6     7
16    0.0894  0.0584  0.1196  0.1577  0.2214  0.1801  0.1734  7     7
x̄j    0.1007  0.0672  0.1242  0.1690  0.1840  0.1889  0.1655
Sj    0.0130  0.0181  0.0227  0.0243  0.0583  0.0180  0.0163

Table 4
Labeling percentage results Zi_j in the 2nd round.

i     C1      C2      C3      C4      C5      C6      C7
1     0.1132  0.0639  0.1387  0.1837  0.1454  0.1927  0.1624
2     0.1037  0.0737  0.1348  0.1875  0.1671  0.1943  0.1388
3     0.1176  0.0603  0.1273  0.1818  0.1608  0.1919  0.1604
4     0.1174  0.0596  0.1258  0.1812  0.1643  0.1908  0.1608
5     0.0953  0.0512  0.1041  0.1451  0.2369  0.1787  0.1889
6     0.1178  0.0603  0.1275  0.1816  0.1610  0.1917  0.1602
7     0.0905  0.0558  0.0895  0.1403  0.2657  0.1905  0.1677
8     0.0973  0.0492  0.1162  0.1246  0.2484  0.1802  0.1840
9     0.0893  0.0642  0.1362  0.1644  0.1894  0.1967  0.1598
10    0.0968  0.0612  0.1061  0.1535  0.2581  0.1730  0.1515
11    0.1080  0.0578  0.1207  0.1537  0.2010  0.1814  0.1763
12    0.0829  0.0650  0.1420  0.1532  0.2366  0.1832  0.1368
13    0.0931  0.0622  0.0859  0.1341  0.2622  0.1804  0.1821
14    0.1084  0.0708  0.1211  0.1830  0.1618  0.1722  0.1827
15    0.0894  0.0584  0.1196  0.1577  0.2214  0.1801  0.1734
x̄j    0.1014  0.0609  0.1197  0.1617  0.2053  0.1852  0.1657
sj    0.0117  0.0064  0.0169  0.0205  0.0436  0.0079  0.0158

Table 5
Comparison of revised results between Op 11 and the average result. In both rounds, H0: Z11_j = x̄j and Ha: Z11_j ≠ x̄j.

1st round (n = 16), t.01/2 = 4.073
Category   C1       C2       C3       C4       C5       C6       C7
x̄j         0.1007   0.0673   0.1242   0.1690   0.1840   0.1889   0.1655
Sj         0.0130   0.0181   0.0227   0.0243   0.0583   0.0180   0.0163
Z11_j      0.0857   0.0575   0.1628   0.1979   0.0995   0.2198   0.1754
t11_j      4.6107   2.1544   -6.7988  -4.7526  5.7988   -6.8452  -2.4172
Decision   reject   accept   reject   reject   reject   reject   accept

2nd round (n = 15), t.01/2 = 4.140
Category   C1       C2       C3       C4       C5       C6       C7
x̄j         0.1014   0.0609   0.1197   0.1617   0.2053   0.1852   0.1657
Sj         0.0117   0.0064   0.0169   0.0205   0.0436   0.0079   0.0158
Z11_j      0.1080   0.0578   0.1207   0.1537   0.2010   0.1814   0.1763
t11_j      -2.1947  1.8941   -0.2354  1.5175   0.3784   1.8653   -2.5983
Decision   accept   accept   accept   accept   accept   accept   accept

Table 6
Comparison of revised results of Op 1 and 12 with the average results.

Round           Operator, category   x̄j      Sj      Zi_j    ti_j      t.01/2  Decision
1st (n = 16)    Op 1, C2             0.0673  0.0181  0.1208  -11.8232  4.073   reject
1st (n = 16)    Op 12, C6            0.1899  0.0180  0.2349  -10.2222  4.073   reject
2nd (n = 15)    Op 1, C2             0.0609  0.0064  0.0639  -1.8155   4.140   accept
2nd (n = 15)    Op 12, C6            0.1852  0.0079  0.1832  0.7466    4.140   accept
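The paper does not spell out the formula behind the t values in Tables 5 and 6. The sketch below assumes the form t = (x̄j − Zi_j) / (Sj / √n); this is an inference, not the authors' stated method, although it reproduces the Op 1, C2 entries of Table 6 to the printed precision. The function names are hypothetical, and the critical values are taken from the tables rather than computed.

```python
import math


def t_statistic(mean, std, value, n):
    """Assumed form of the test statistic: (mean - value) / (std / sqrt(n))."""
    return (mean - value) / (std / math.sqrt(n))


def decide(t, critical):
    """Two-sided decision: reject H0 when |t| exceeds the critical value."""
    return "reject" if abs(t) > critical else "accept"


# Op 1, category C2, using the values tabulated in Table 6.
t_round1 = t_statistic(mean=0.0673, std=0.0181, value=0.1208, n=16)
t_round2 = t_statistic(mean=0.0609, std=0.0064, value=0.0639, n=15)

print(round(t_round1, 4), decide(t_round1, critical=4.073))  # -11.8232 reject
print(round(t_round2, 4), decide(t_round2, critical=4.140))  # -1.8155 accept
```

Because the tabulated means and standard deviations are rounded to four decimals, other entries recomputed this way may differ from the printed t values in the last digits.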

Table 7
Comparison of the labeling variance in the 1st and 2nd rounds. Critical value: F0.10(15, 14) = 2.01.

Category         C1      C2      C3       C4      C5      C6      C7
Sj (1st round)   0.0130  0.0181  0.0227   0.0243  0.0583  0.0180  0.0163
Sj (2nd round)   0.0117  0.0064  0.01693  0.0205  0.0436  0.0079  0.0158
F12_j            1.2346  7.9983  1.7978   1.4051  1.7880  5.1915  1.0643
Decision         accept  reject  accept   accept  accept  reject  accept
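The F12_j row of Table 7 is consistent with the squared ratio of the two standard deviations, i.e. a ratio of variances. The sketch below recomputes it under that reading; the function names are assumptions, and the critical value 2.01 is taken from the table rather than computed.

```python
def variance_ratio(std_first, std_second):
    """F statistic as the ratio of 1st round to 2nd round variances."""
    return (std_first / std_second) ** 2


def f_decision(f_value, critical=2.01):
    """Reject equal variation when the ratio exceeds the critical value."""
    return "reject" if f_value > critical else "accept"


# Standard deviations for C1, C2 and C6 as tabulated in Table 7.
STD_PAIRS = {"C1": (0.0130, 0.0117), "C2": (0.0181, 0.0064), "C6": (0.0180, 0.0079)}

for category, (s1, s2) in STD_PAIRS.items():
    f_value = variance_ratio(s1, s2)
    print(category, round(f_value, 4), f_decision(f_value))
```

Under this reading, C2 and C6 exceed the critical value while C1 does not, in line with the decisions printed in the table.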

4.2. Analysis of labeling results

To verify that the improvement gained after the communication is significant, a t-test is used to evaluate the labeling results of the three operators (Op 1, 11 and 12) in the two rounds. In Table 5, two sets of results are compared: the labeling result of Op 11 is compared with the average labeling result of the 1st round, and the revised result of Op 11 is compared with the average labeling result of the 2nd round. In each comparison, the null hypothesis is $H_0: Z_{11\_j} = \bar{x}_j$ and the alternative hypothesis is $H_a: Z_{11\_j} \neq \bar{x}_j$, where $Z_{11\_j}$ is the labeling percentage of Op 11 in the $j$th category and $t_{11\_j}$ is the corresponding t-test statistic. In the 1st round, only the results for C2 and C7 are accepted as the same as the average result, and the rest are significantly different. In the 2nd round, no significant difference is found for any category. Table 6 shows the other two sets of comparisons: the results of Op 1 in C2 against the average result of C2, and the results of Op 12 in C6 against the average result of C6, for both rounds. In contrast with the 1st round, the two operators' understanding of the individual categories shows no significant difference from the average result in the 2nd round. Hence, their updated results can be used for further analysis.

4.3. Analysis of labeling variance

Table 7 compares the variation of the labeling percentage for all the categories. $F_{12\_j}$ is the ratio of the variances (squared standard deviations) of the 1st and 2nd rounds. The null hypothesis is that the standard deviation is the same in the 1st and 2nd rounds. The results of the F-test show that the labeling variation for C2 and C6 in the 2nd round is less than that in the 1st round, and that the rest are equal. Hence, the labeling quality for C2 (Op 1) and C6 (Op 11 and 12) has been improved.

These experiments have confirmed the merits of joint discussion and of the further step of organizing the results together. Since people dedicated to a certain domain may not be fully knowledgeable about all of its sub-branches, joint discussion with at least one expert in the domain has proved an effective means of promoting understanding during the labeling process. Meanwhile, the labeling history of different operators can be tracked, so that a complete database of robotic design can probably be created in the near future more efficiently, with fewer operators involved and without sacrificing quality. The results of all 15 operators are organized together as the final labels in the developed database. Organizing the operators' validated results in this way can enhance knowledge understanding and improve the overall database quality.

This database could greatly assist designers in retrieving information and mining for knowledge in the robotic design area. The automatic identification and indexing of concepts within this database will be realized, so that designers are relieved of the strenuous effort of going through many irrelevant records before zooming in on those of concern to them. Designers could also quickly extract salient information from large, previously unseen databases and get fast feedback.

5. Conclusion

In the product design process, designers usually begin with a prototype from a library of historical designs; similarly, R&D engineers may turn to archival documents to look for solutions or inspiration for their engineering problems. Therefore, building a database with rich sources to keep design information, experience and knowledge is essential, and such a database can be used by designers to solve design problems. In this paper, a robotic design database is built as a case study based on the proposed methodology. A t-test is used to analyze the operators' behavior, and an F-test is used to analyze the understanding of each category. Future work is to integrate the labeling results as the final labels in the database under development, so that further research on and application of text mining and information retrieval can be carried out.

Acknowledgement

The authors would like to acknowledge the support of the Singapore Ministry of Education's AcRF Tier 1 funding (R-265-000-209-112/113).

References

[1] Argyris C, Schon DA. Organizational learning: a theory of action perspective. Reading, MA: Addison-Wesley; 1978.
[2] Dosi G. The nature of the innovative process. In: Dosi G, editor. Technical change and economic theory. London: Pinter; 1988. p. 221–38.
[3] Brown JS, Duguid P. Organizational learning and communities of practice: toward a unified view of working, learning and innovation. Organization Science 1991;2/1:40–57.
[4] Nevis EC, DiBella AJ, Gould JM. Understanding organizations as learning systems. Sloan Management Review 1995;Winter:73–85.


[5] Moteki Y, Arai Y. Operation planning and quality design of a polymer process. IFAC DYCORD. Bournemouth, UK: Pergamon Press; 1986.
[6] Jaeckle CM, MacGregor JF. Product design through multivariate statistical analysis of process data. American Institute of Chemical Engineers Journal 1998;44(5):1105–18.
[7] Lakshminarayanan S, Fujii H, Grosman B, Dassau E, Lewin DR. New product design via analysis of historical databases. Computers and Chemical Engineering 2000;24:671–6.
[8] Borosy AP. Quantitative composition-property modeling of rubber mixtures by utilizing artificial neural networks. Chemometrics and Intelligent Laboratory Systems 1999;47:227–38.
[9] Ferguson C-J, Lees B, MacArthur E, Irgens C. An application of data mining for product design. In: Proceedings of the IEE colloquium on knowledge discovery and data mining, London, UK, 1998.
[10] Wu Lv, Fuyan Lin, Wenjuan Wang, Bingfeng Guo. Web-based knowledge reuse in product design. In: Proceedings of the ISCIT 2005 IEEE international symposium on communications and information technology, 2005, vol. 2, p. 1092–5.
[11] van Kesteren IEH. Product designers' information needs in materials selection. Materials and Design. PhD thesis, Delft University of Technology.
[12] MacGregor JF, Muteki K, Ueda T. On the rapid development of new products through empirical modeling with diverse data-bases. In: Marquardt W, Pantelides C, editors. Proceedings of the 16th European symposium on computer aided process engineering and ninth international symposium on process systems engineering. Elsevier B.V.; 2006.
[13] Hung H-F, Kao H-P, Juang Y-S. An integrated information system for product design planning. Expert Systems with Applications 2008;35(1–2):338–49.
[14] 2005 IEEE international conference on robotics and automation, Barcelona, Spain, April 18–22, 2005, <http://www.icra2005.org/frontal/Topics.asp>.
[15] 2007 IEEE international conference on robotics and automation, Roma, Italy, April 10–14, 2007, <http://www.icra07.org/>.
[16] Lancaster FW. Indexing and abstracting in theory and practice, 2nd ed. Champaign, IL: University of Illinois; 1998.