Advanced Engineering Informatics 25 (2011) 131–146
Semantic-based information retrieval in support of concept design
Rossitza Setchi *, Qiao Tang, Ivan Stankov
Cardiff University, School of Engineering, The Parade, Cardiff CF24 3AA, UK
Article info
Article history: Received 12 November 2009; received in revised form 1 June 2010; accepted 19 July 2010; available online 12 August 2010.
Keywords: Concept design; Creativity; Inspiration; Image retrieval; Semantic technology; Semantics
Abstract
This research is motivated by the realisation that semantic technology can be used to develop computational tools in support of designers’ creativity by focusing on the inspirational stage of design. The paper describes a semantic-based image retrieval tool developed for the needs of concept car designers from two renowned European companies. It is created to help them find and interpret sources of inspiration. The core innovation of the tool is its ability to provide a degree of diversity, ambiguity and uncertainty in the information gathering and idea generation process. The tool is based on the assumption that there is a semantic link between the images in a web page and the text around them. Furthermore, it uses the idea that the more frequently a term occurs in a document and the fewer documents it occurs in, the more representative this term is of that document. The new contribution is linking the most meaningful words in a document with ontological concepts, and then finding the most powerful set of concepts representing that document and consequently the images in it. This is based on the observation that monosemic words (with a single meaning) are more domain-oriented than polysemic ones (that have multiple meanings), and provide a greater amount of domain information. The tool tags images by first processing all significant words in the text around them, extracting all keywords and key phrases in it, ranking them according to their significance, and linking them to ontological concepts. It generates a set of concept numbers for each text, which is then used to retrieve information in a process called semantic expansion, where a keyword query is also processed semantically. The proposed approach is illustrated with examples using the tool developed for the needs of Stile Bertone and Fiat, Italy, two of the industrial partners in the TRENDS project sponsored by the European Community. © 2010 Elsevier Ltd. All rights reserved.
* Corresponding author. Tel.: +44 29 20875720. E-mail address: [email protected] (R. Setchi).
1474-0346/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.aei.2010.07.006
1. Introduction
Contrary to popular belief, great design ideas are not the result of a sudden breakthrough, a lightning strike of genius or a “Eureka” moment [13,5]. As testified by numerous experimental studies, creative design involves hard work and iterative cycles of prototyping, testing, and refinement. It is a thoughtful process, which is based on a systematic generation and evaluation of design concepts, whose form and function is to achieve users’ goals while satisfying a specified set of constraints [13].
Concept designers are sometimes referred to as “visual futurists” as they create visual designs for the future. These designs might be impractical, non-operational and too expensive; often they never reach a production line. Such designs however frequently dominate showrooms and trade shows with their style and unconventional look. Their mission is to convey a visual representation of an idea, mood, style or new technology before it is incorporated in an industrial design.
Research indicates that the originality and creativity of concept designers could be stimulated by using sources of inspiration, i.e. the conscious use of previous designs [14,55]. Sources of inspiration help designers define the context of their new designs, inform their creation and reflect on their emotional impact. By observing and interpreting sources of inspiration, creative designers form vocabularies of semantic expressions, palettes of colours, or mood boards with images, which express their emotions, inspire their creativity and help them communicate ideas to colleagues and clients [4,28,29,50].
This research is motivated by the realisation that semantic technology can be used to develop computational tools in support of designers’ creativity by focusing on the inspirational stage of design and the process of creating mood boards and colour palettes. The paper describes a semantic-based image retrieval tool developed for the needs of concept car designers from two renowned European companies. The core innovation of the tool is its ability to provide a degree of diversity, ambiguity and uncertainty in the information gathering and idea generation process.
The paper is organised as follows. Section 2 analyses the specific information requirements of concept designers. Section 3 reviews two main areas of research closely related to the semantic-based image retrieval approach proposed in this paper: (i) content-based retrieval of images and (ii) semantic-based information retrieval. Section 4 describes the semantic image retrieval tool created to
help concept designers find and interpret sources of inspiration, and its evaluation. The proposed approach is illustrated with examples using the tool developed for the needs of Stile Bertone and Fiat, Italy, two of the industrial partners in the TRENDS project sponsored by the European Community [51]. Section 5 presents the conclusions and the directions for future research.
2. Information requirements of concept designers
Concept designers (often called stylists in the automotive industry) and industrial designers have different goals and perspectives [49,50]. Industrial designers use their sensibility to capture users’ needs and convert them into customer values and market opportunities. The products they create must be technologically feasible and commercially viable. Concept designers on the other hand are driven by different goals. Their role is not just to make new technology more attractive to customers or introduce new functional features, as some industrial designers do. Concept designers have to imagine the world from multiple perspectives, generate novel ideas and create new forms of value. Their mission is to be creative.
The creative process normally starts with a design brief, which outlines the design intent and is often deliberately vague. For example, the design brief of a recently held design competition organised by a large car manufacturer asked the entrants to “imagine a car for the megalopolis of tomorrow” and consider four important aspects: environmental friendliness, social harmony, interactive mobility and economic efficiency. The concept development stage then starts; it aims to produce an initial representation of the design concept. Sketches are used to focus and guide non-verbal thinking, externalise and refine ideas [50]. In the automotive industry and other areas of design where visual identity and originality are important, designers also create mood boards displaying lifestyle images which help them find suitable semantic adjectives and create palettes of colours, shapes and forms [4,28,29,50]. Fig. 1 shows typical representations used by concept designers in the automotive industry.
The focus of this research is the idea generation process during which, as reported by Westerman et al. [57], designers extensively use various content-based image and text-based information retrieval tools.
In particular, the research described in this paper investigates the use of semantic technology in assisting designers in finding suitable images which they may use to create palettes and mood boards. This section concentrates on the specific information retrieval requirements of creative designers. As stated by Vijaykumar and Chakrabarti [53], the creative value of the design solutions generated and their novelty are directly dependent on the availability of information and knowledge influences. This is particularly valid for the inspiration stage of design and the availability of sources of inspiration. This section explores the information needs of creative designers during the process of collecting sources of inspiration. The starting point of this analysis is Ford’s assertion [18] that creative thinking involves both a high level of abstraction and a high level of dissimilarity. According to Sternberg [45], creativity requires a confluence of intellectual abilities, knowledge, styles of thinking, personality, motivation, and environment. The intellectual abilities are considered of particular importance. As stated in the same study, creative thinkers need synthetic skill to see problems in new ways and to break the bounds of stereotypical thinking, analytic skill to recognise the ideas worth pursuing, and contextual skill to know how to persuade others of the value of one’s ideas. Creative thinking involves the ability to break conventional rules of thinking, develop new strategies, tolerate ambiguity, maintain sight of the big picture, handle uncertainty and make decisions [13,17].
Design is often seen as a series of transformations from the problem to the solution domain. In this process, analytical and creative thinkers use different sets of cognitive tools. As indicated by Ansburg and Hill [2], good analytic thinking is characterised by sustained directed attention because solutions to analytic problems require focus on the problem elements. Creative thinkers on the other hand usually employ data that is not directly linked to the problem addressed, take advantage of incidentally presented cues, and tend to collect a wide range of ideas, sometimes seeming irrelevant and highly dissimilar, that may lead to insight [2,47]. Divergent thinking helps designers imagine the world from multiple perspectives, see problems in new ways and escape stereotypical thinking. Design thinking is regarded as a transformation from the concept domain to the knowledge domain. Research on divergent and convergent thinking has shown that the knowledge domain is dominated by convergent thinking (e.g. asking low-level questions and considering fine detail) while divergent thinking operates in the concept domain [13]. Research indicates that creativity is directly linked to the ability to conceptualise and think using abstract terms. As stated by Ford [18], ‘‘the level of creativity is dependent on the level of abstraction of the entities concerned”. This is particularly relevant to conceptual design which is defined as the act of handling problems and solutions at high level of abstraction [7]. As shown in the same study, customer requirements can exist in multiple levels of abstraction; their representation in limited abstraction depth can lead to design solutions of inadequate details. Maintaining a high level of abstraction helps concept designers keep sight of the big picture and avoid conventional thinking. Creativity involves associating ideas and concepts previously seen as unrelated. According to Gomes et al. 
[21], creative reasoning involves cross-domain transfer of ideas (analogy), combination of ideas, and exploration and transformation of conceptual spaces. Analogy is an important reasoning tool in creative design as it enables the generation of new solutions using ideas from semantically distant domains. Idea association is also an important strategy for stimulating creativity as it uses long-term memory to link ideas in the conceptual design stage and generate novel solutions. Idea association involves linking designer’s long-term memory internally with the various sources of inspiration used externally by applying similarity, contrast and contiguity principles to link ideas [26]. Taura et al. [47] study the use of concepts in creating new designs by employing concept abstraction (analogy), concept blending and concept integration. They claim that concept combination (i.e. combining two unrelated concepts to build a new design) creates innovative solutions especially in those cases when the two concepts are highly dissimilar (for example, a new type of chair shaped like a swan could be designed by taking a cue from the words ‘‘swan” and ‘‘chair”). The study concludes that although analogical reasoning (transferring the characteristics of an existing concept to a new concept) plays an important role in creative design, concept blending (the process of blending concepts at an abstract level and producing a new concept that inherits the abstract features of the two base concepts without reusing their concrete features) is much more creative. Furthermore, the need to support designers’ divergent thinking, abstract representations, analogical reasoning, and idea associations during the information retrieval stage of design is evidenced by a number of specific studies from two application domains: fashion and concept cars. A study by Ward et al. 
[55] of image browsing in the fashion industry shows that the fashion sector often relies on historical collections for design inspiration. Working in a highly competitive and creative environment, fashion designers expect their image
Fig. 1. Concept design in the automotive industry.
retrieval system not just to retrieve ranked similar matches as in the query-by-example paradigm. It has to support user-directed navigation and browsing leveraging human perception and design knowledge and to be able to provide relevance based on high-level semantics rather than low-level visual similarity. The paper concludes that most browsing/navigation models nowadays are more suitable for a retrieval task that assumes that the user has a mental image of what they hope to find and are very unsuitable for scenarios where users are looking for inspiration. Another study from the same domain [30] investigates the way creative designers analyse past fashion trends and create new colour proposals. The process involves selecting appropriate words and/or concepts, which reflect the designer’s ideas, associating them with images, and finally choosing colours, which best represent the concepts identified in the first step. As stated in the paper, the image retrieval system in support of designers’ creativity should be able to choose among all colours those, which better express the words, concepts and pictures chosen by the stylist. All words used in the study (e.g. ‘‘magnetico”, ‘‘passione”, ‘‘sensibilita”, ‘‘sportivo”, etc., all in Italian) are semantic words, which express abstract concepts. The use of abstract concepts in information search is also observed by Westerman et al. [56] who explore the use of computer search facilities to locate inspirational images for a concept car design task. The study reports that designers sometimes use one car-related keyword to ‘‘anchor” their search to the design domain, and couple this with more semantically distant search terms (e.g. ‘‘car” + ‘‘exciting” + ‘‘futuristic”; ‘‘bold” + ‘‘colourful” + ‘‘cars”). Designers use a variety of query strategies. They use emotionally evocative terms, search terms that are diverse to the target of their design task (i.e. 
cars), under-specified search queries using fewer rather than more search terms, or abstract rather than concrete search terms. The study concludes that structured diversity and serendipity can be advantageous in the context of creative design [57].
Serendipity is defined by Andre et al. [1] as the finding of unexpected information or making an intellectual leap of understanding while engaging in information activity. However, although obtaining information in an accidental, incidental, opportunistic or serendipitous manner is one of the three typical ways of acquiring information [48], most information search engines do not support serendipity. Another specific challenge addressed by this research is the realisation that what is normally considered to be an effective information retrieval algorithm (i.e. producing high precision and recall) may be rather poor for serendipity, and thereby creativity [56,57].
The literature review presented in the next section explores how current image and text retrieval search technologies address the specific information requirements of concept designers: support for divergent thinking, serendipity, and high levels of abstraction and dissimilarity.
3. State-of-the-art review
There are two main types of methods aimed at extracting semantic concepts from images: content-based and text-based. The content-based methods use classifiers to extract semantics directly from the images while the text-based approaches rely on the semantic link between the image and the text around it.
3.1. Content-based image retrieval
As discussed in the previous section, concept designers primarily operate in the concept space and use rich semantic and abstract terms to search for inspirational images. Most commercially available search tools however offer content-based image retrieval based on low-level feature extraction. A promising recent development in this area is the research aimed at reducing the semantic gap between the low-level image features used in content-based retrieval and the high-level concepts employed in queries [52,11]. As defined by Smeulders et al. [42], the semantic gap is “...the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data have for the user in a given situation”. In other words, the semantic gap is the discrepancy between the limited descriptive power of the low-level image features and the richness of the user semantics.
Content-based image retrieval uses features such as colour, texture, shape and spatial location. Normally the keywords used in the queries either correspond to identifiable items describing the visual content of an image or relate to the context and the interpretation of that image. The user may be searching for “an image of” a particular entity (e.g. a flower or a child) or “an image about something” (e.g. an adventurous trip abroad). Advances in image analysis, object detection and classification techniques may facilitate the automatic extraction of the first type of keywords. However, keywords belonging to the second category are unlikely to be automatically obtained from images [54,27,16] because, from the point of view of feature representation, there is no single visual feature which best describes the image content [52].
A particularly challenging aspect in this context is dealing with concepts which have no visual appearance in the images. Examples include concepts related to categories such as time, space, events and their significance, as well as abstract terms and emotions [15]. It should be noted that this aspect was identified in the previous section as a particularly important requirement for concept designers. Such concepts could be extracted from annotations or the text complementing the image.
Most advanced image retrieval approaches nowadays employ hybrid techniques, which combine the use of visual features and text [60,36,16].
3.2. Text-based image retrieval
Text-based image retrieval is based on processing metadata, annotations or free text. In its simplest form, it involves forming queries using database fields (such as “creator”, “location”, etc.) and Boolean logic. Keyword-based search methods however are not suitable in those cases when the user does not have a clear goal in mind. In addition, as stated by Hyvönen et al. [22], a keyword in a document does not necessarily mean that the document is relevant, and relevant documents may not contain the keyword used in the query.
An important development in recent years is the use of semantic technologies and metadata languages as they provide means for defining class terminologies with well-defined semantics and a flexible data model for representing metadata descriptions [22]. In particular, controlled vocabularies, taxonomies, free text descriptions and annotations are employed to describe or classify the images in order to ease the retrieval. Other approaches rely on the use of ontologies to provide different views for navigation and the terminology for creating the metadata or the annotations of the images [22,12,60,44].
It must be noted, however, that different ontologies may not have the same degree of formality. Controlled vocabularies, dictionaries, thesauri, and taxonomies are some of the most lightweight ontology types that have been widely used in annotation. These forms of vocabularies are not strictly formal and the annotations produced using them are normally pointers to terms in the vocabulary, which can be used to improve the search by using synonyms, antonyms, hyponyms and hypernyms [9]. On the other hand, heavyweight axiomatised and formal ontologies are employed to incorporate formal semantics in the description of documents’ content [3].
However, most of the formal ontologies do not include the vast number of terms that a thesaurus has. As stated by Corcho [9],
thesauri, controlled vocabularies, and heavyweight ontologies are complementary since the first two can be used to provide agreed terms in specific domains while the latter provides formal semantics and constraints evaluation. More advanced approaches use Natural Language Processing (NLP) algorithms, lexical ontologies and thesauri to index image collections, extract concepts, resolve terminology problems and disambiguate words. Indexing large document and image collections using ontologies is a relatively new area of research. Ontological indexing is defined as linking the words in the indexed text to ontological concepts [24]. A second definition proposed by Setchi and Tang [40] suggests that words can be treated as entities or concepts. They define concept indexing as ‘‘the analytic process of identifying instances (entities) and abstract ideas (concepts) within a text document, and linking them to ontological concepts”. In this definition, an entity is an identifiable and discrete instance existing in a text document, while a concept is an abstract idea inferred or derived from specific instances. A concept index is a machine understandable index of entities and concepts contained in a document collection. Concept indexing can be used both for representing a document using abstract terms, and for assigning concepts to specific words in documents. The process normally involves two steps: (i) extracting entities from unstructured text-based content using a lexical ontology, and (ii) identifying concepts with the help of a concept knowledge base. Once entities and concepts are isolated, they are used to build a concept index [41]. The semantic concepts could be extracted and identified by disambiguating the sense of each word using linguistic repositories [43,24], semantic repositories [8] or domain-specific ontologies [19]. As highlighted by Conesa et al. [8], linguistic repositories such as WordNet do not capture the semantic relationships between concepts [59]. 
On the other hand, semantic repositories such as OpenCyc [35] developed to capture and represent common sense, do not contain linguistic relationships (e.g. whether two concepts are synonyms), and domain dependent repositories like the Gene Ontology only represent certain aspects of a domain, not the complete domain. From the point of view of the problem in the focus of this paper, information support for concept design, it is clear that semantic technologies present an opportunity, which needs to be exploited. A recent study by Vijaykumar and Chakrabarti [53] suggests that the usefulness of the information captured in a design history depends on its indexing. Designers need help in information gathering, where they have to manage and categorise a huge amount of data. A study by Cox et al. [10] confirms that, compared to indexing based on low-level image features, indexing using keywords provides improved retrieval performance as it is much more suitable in terms of human similarity perception. A work in that direction is reported by Kuroda and Hagiwara [25] who combine words referring to specific objects with words which express ambiguous kansei feelings. Semantic technologies enable information retrieval using high-level concepts, which are closer to the way creative designers think and search for sources of inspiration. In addition, semantic expansion (defined as query expansion driven by the semantic similarity of the words and the concepts they are associated with) provides a degree of diversity and serendipity, both very important in the domain of creative design.
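The notion of semantic expansion defined above can be illustrated with a minimal sketch. It assumes a hypothetical word-to-concept lookup (`WORD_CONCEPTS`) and hypothetical concept tags attached to images (`IMAGE_TAGS`); the labels are made up for illustration and are not taken from any real ontology or image collection. Query words are mapped to concepts, and images are ranked by the number of concepts they share with the expanded query, so an image can be retrieved even if the text around it never contained the query word.

```python
# Hypothetical word -> concept labels (illustrative only, not a real ontology).
WORD_CONCEPTS = {
    "elegant": {"beauty", "good taste"},
    "refined": {"good taste", "purity"},
    "swan": {"bird", "beauty"},
    "engine": {"machine"},
}

# Hypothetical images, each tagged with a set of concepts at indexing time.
IMAGE_TAGS = {
    "swan.jpg": {"bird", "beauty"},
    "silk_dress.jpg": {"good taste", "purity"},
    "gearbox.jpg": {"machine"},
}


def expand_query(words):
    """Map query words to the union of the concepts they relate to."""
    concepts = set()
    for word in words:
        concepts |= WORD_CONCEPTS.get(word.lower(), set())
    return concepts


def search(words):
    """Rank images by the number of concepts shared with the expanded query."""
    query = expand_query(words)
    scored = [(len(query & tags), name) for name, tags in IMAGE_TAGS.items()]
    # Highest overlap first; ties are broken alphabetically by file name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for score, name in scored if score > 0]


results = search(["elegant"])
```

In this toy setting the query "elegant" reaches both the swan image and the dress image through shared concepts, which is the kind of structured diversity the section argues for.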
4. Semantic-based image retrieval
4.1. Algorithm
The semantic-based image retrieval algorithm proposed in this paper is based on the earlier work of the authors [40,41] on concept indexing. That approach involves generating ontology tags to index documents using supervised machine learning, a general purpose ontology OntoRo [33] and an ontologically tagged corpus OntoCorp. The experimental tests conducted show a tagging accuracy of 78.91%, which is encouraging in view of the possible improvements of the algorithm. A substantial limitation of that approach however is the need to have a large ontologically tagged corpus available. Similar to that approach, OntoTag, the algorithm reported in this paper, uses a general purpose ontology and links content to concepts by generating semantic tags. OntoTag however does not rely on machine learning, mainly due to the lack of a domain-specific ontologically tagged corpus.
OntoTag is based on the assumption that there is a semantic link between the images in a web page and the text around them. Furthermore, it uses the idea that the more frequently a term occurs in a document and the fewer documents it occurs in, the more representative this term is of that document. The tf-idf measure [38] widely used in information retrieval is the embodiment of this idea. The new contribution is linking the most meaningful words in a document (defined as the most representative words for that particular document) with ontological concepts, and then finding the most powerful set of concepts representing that document and consequently the images in it. This is based on the idea that monosemic words (with a single meaning) are more domain-oriented than polysemic ones (that have multiple meanings), and provide a greater amount of domain information. This converges with the common property of less frequent words being more informative, as they typically have fewer senses [20]. Therefore, the probability that a word belongs to a certain concept is directly dependent on its polysemy. Words that relate to one concept only are therefore more significant for that domain than words that relate to many concepts.
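The polysemy argument can be made concrete with a small sketch. It assumes a hypothetical term-to-concept lookup (`TERM_CONCEPTS`, with made-up concept labels standing in for a real ontology) and term weights supplied as plain numbers; neither comes from OntoRo. Each term shares its weight equally among the concepts it relates to, so a monosemic term contributes its full weight to a single concept while a polysemic term is diluted.

```python
from collections import defaultdict

# Hypothetical term -> concepts lookup; labels are illustrative only.
TERM_CONCEPTS = {
    "lotus": ["plant"],                            # monosemic: one concept
    "crown": ["headgear", "authority", "summit"],  # polysemic: three concepts
}


def concept_weights(term_weights, term_concepts):
    """Share each term's weight equally among the concepts it relates to."""
    weights = defaultdict(float)
    for term, weight in term_weights.items():
        concepts = term_concepts.get(term, [])
        for concept in concepts:
            # A term related to k concepts adds weight / k to each of them.
            weights[concept] += weight / len(concepts)
    return dict(weights)


# Assumed term weights; kept equal here to isolate the effect of polysemy.
weights = concept_weights({"lotus": 3.0, "crown": 3.0}, TERM_CONCEPTS)
ranked = sorted(weights, key=weights.get, reverse=True)
```

With equal term weights, the concept reached through the monosemic term ranks first, which mirrors the reasoning in the paragraph above.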
The algorithm involves four steps (Fig. 2) as follows. Step (i): Retrieving web pages by targeted crawling and creating a collection of documents and images. This involves grabbing pages from web sites and domains identified by the designers as their sectors of influence in a given context. For instance, designers of concept cars list 12 sectors of influence, among them ‘‘aerospace”, ‘‘automobile”, ‘‘architecture”, ‘‘advertisement”, ‘‘design”, ‘‘fashion” [32].
Step (ii): Identifying and ranking the most frequently used keywords and phrases using the tf-idf function (1) [38].
$$w_{tf\text{-}idf}(t_i, d_j) = \#(t_i, d_j) \cdot \log \frac{\#D}{\#(t_i, D)} \qquad (1)$$
where $w_{tf\text{-}idf}(t_i, d_j)$ represents the quantified weight of a term $t_i$ contained in a document $d_j$, $\#(t_i, d_j)$ is the term frequency in that document (i.e. the number of times a certain word appears in a document $d_j$), $\#D$ is the total number of documents in the collection $D$, and $\#(t_i, D)$ is the document frequency in the collection, i.e. the number of documents in the collection which contain the term $t_i$. The term $t_i$ is selected as a meaningful word in a document $d_j$ if $w_{tf\text{-}idf}(t_i, d_j) \geq \varepsilon$, where $\varepsilon$ represents an empirically validated threshold value, which might be different for different domains and collections of documents and images.
Step (iii): Associating the most frequently used keywords and phrases with ontological concepts, and computing the weight $w_{c_k}(d_j)$ of each concept $C_k$ using (2).
$$w_{c_k}(d_j) = \sum_{i=1}^{n} w_{tf\text{-}idf}(t_i, d_j) \cdot \frac{1}{C_k(t_i)} \qquad (2)$$
where $n$ is the number of terms $t_i$ related to a concept $C_k$ in a document $d_j$, $C_k(t_i)$ is the number of concepts $C_k$ the term $t_i$ is related to, and $w_{tf\text{-}idf}(t_i, d_j)$, computed using (1) in step (ii), indicates the significance of a certain term within a document.
Step (iv): Ranking the concepts according to their significance $w_{c_k}(d_j)$ and tagging the images with those concepts which have the highest weight.
A simple numerical example illustrating formulae (1) and (2) is given in [39]. The next section introduces the two ontologies used in this research.
4.2. Ontologies
As shown in Fig. 2, the algorithm links the terms which best represent the content of a document to ontological concepts. An ontology specifies a conceptualisation of a domain in terms of
Fig. 2. OntoTag: linking documents, terms and concepts.
concepts, attributes and relations. Concepts are typically organised in a tree structure and normally linked through semantic relations. The semantic algorithm proposed in this paper uses concepts from two ontologies: a generic lexical ontology called OntoRo [41] and a domain-specific ontology for designers based on the so-called Conjoint Trend Analysis (CTA) method [4]. The method is developed by studying the cognitive activities of the designers during the information phase. Fundamental to the CTA method is the establishment of a value-function-attributes chain, which uses semantic adjectives to link the marketing and design worlds. It has been found that the same semantic adjectives are used by designers when working with images and sketching new design concepts. Examples of such value-function-attributes chains in the context of this study are the sequences “comfortable life” (value) – “ergonomic” (semantic adjective) – “soft” (functional attribute), “inner harmony” – “delicate” – “fragile”, “inner harmony” – “pure” – “bright”, “accomplishment” – “perfect” – “proportioned” and “world of beauty” – “elegant” – “refined”. The CTA ontology is defined in OWL and developed in Protégé by creating instances and linking them using abstraction, aggregation and dependency-based semantically-rich relations. The current version of the CTA ontology contains 10 classes and 503 instances of CTA concepts.
The second ontology used in this research, OntoRo, is a general-purpose lexical ontology based on Roget’s Thesaurus [37]. The decision to build and use OntoRo instead of employing WordNet or OpenCyc was primarily based on the fact that WordNet is a linguistic rather than a semantic repository, and OpenCyc is a common sense ontology. To illustrate the differences, a simple experiment with the entry “perfection” was conducted (Fig. 3), which compared the output of each ontology.
As seen in the figure, OpenCyc recognises the concept as an instance of “quality” but produces no semantic links. WordNet generates three synsets for the word “perfection”, with the meaning: (i) flawless, (ii) idol, and (iii) making something perfect. In addition, WordNet provides part-of-speech (POS) information and examples. OntoRo, on the other hand, links the word “perfection” to 16 concepts, among them #54:completeness, #245:symmetry, #669:preparation, as well as #933:virtue, #950:purity and #965:divineness (see Fig. 3). One of the semantically related concepts, #694:knowledge, belongs to class “volition: the exercise of will” and section “voluntary action”. It contains 287 words and phrases semantically related to the concept of “perfection” such as “art”, “craft”, “talent” and “good at”. Concept #937:good person is an example of a completely different context. It belongs to class “emotion, religion and morality”, section “morality” and contains 46 words such as “angel”, “heart of gold” and “model of virtue”. Similar to WordNet, OntoRo also classifies the words using POS categories.
The two main arguments in support of the use of Roget’s (and subsequently OntoRo) in the context of this research, supporting the work of the creative designers, are as follows. Firstly, Roget’s has a structure established over more than 150 years, where the words/phrases are grouped and linked by their meaning. Secondly, although Roget’s is organised hierarchically, it also maintains non-hierarchical, associative links between the words and phrases in it. These findings are consistent with the observations made by a number of researchers who have analysed Roget’s structure and semantic relations. For example, Jarmasz and Szpakowicz [23] state that “in WordNet, only nouns are clearly organised in a hierarchy; verbs and adverbs are organised individually into various webs that are difficult to untangle”.
Next, as observed by Cassidy [6], Roget’s contains ‘‘semantic relations considered important for linguistic expression, which are not defined in other publicly available semantic networks, such as WordNet”. This type of relation called ‘‘non-classical” is further studied by Morris and Hirst [31]
who emphasise that “NLP methods and applications need to take account not only of ‘classical’ lexical relations, as found in WordNet, but the less structural, more context-dependent ‘non-classical’ relations that readers intuit in text”. Furthermore, as they state, although Roget’s is organised hierarchically, it also has a non-hierarchical, non-classified “structure” for representing non-classical relations such as “homeless” – “drunk”, “rain” – “flood”, “brutal” – “terrified”. The current version of OntoRo developed by Tang [46] includes 68,920 unique words and 228,130 entries, which are classified into 990 concepts, 610 head groups, 95 subsections, 39 sections and 6 top level classes. Fig. 4 shows the format of OntoRo (XML) and its hierarchical structure. This particular part of the code shows “motion” as a subclass of “space”, “vehicle on snow” as a subclass of “vehicle”, etc. Part-of-speech (POS) categories are used as object properties. “Sledge” is an instance of “vehicle on snow” and is a noun.

4.3. Illustrative example

This example illustrates the operation of the algorithm using a page retrieved from the collection of the Allentown Art Museum in the USA (Fig. 5). The text in the page is:

This small Tibetan bronze figure may be identified as a Bodhisattva because of its jewelry, crown, aureole (which surrounds the body), and nimbus. It sits on a lotus in a posture of meditation, with two of its four hands in the namaskaramudra gesture, which indicates prayer or greeting. In another hand is a strand of prayer beads. The last hand is empty, and may originally have held a lotus.

The first step involves computing the term frequency #(t_i, d_j) for each word in the text. Two words are used twice in this piece of text: “lotus” and “prayer”; all other words occur only once. Next, the document frequency #(t_i, D) for each word in the text is obtained.
The values used in this particular experiment are based on processing three million content pages of the English Wikipedia, the English language edition of the free online encyclopedia [58]. Next, the inverse document frequency (idf) value of each significant term in the text is computed using (3).
$$w_{idf} = \log \frac{\#D}{\#(t_i, D)} \qquad (3)$$
Table 1 shows that terms like ‘‘aureole”, ‘‘Bodhisattva” and ‘‘posture” have high idf values (5.081670046, 4.055517328 and 3.52349961), which means that their use in the collection is sporadic. Words like ‘‘last” and ‘‘another”, on the other hand, have low idf values as they are frequently used in many contexts. Next, the tf-idf values are calculated using (4).
$$w_{tf\text{-}idf}(t_i, d_j) = \#(t_i, d_j) \cdot w_{idf} \qquad (4)$$
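The two weighting steps in (3) and (4) can be sketched as follows. This is a minimal illustration rather than the tool's implementation: the base-10 logarithm is an assumption (the text does not state the base), the document counts passed to `idf` are hypothetical, and the idf values are copied from Table 1 instead of being recomputed from Wikipedia.

```python
import math
from collections import Counter


def idf(num_docs: int, doc_freq: int) -> float:
    """Eq. (3): log of the collection size over the term's document frequency.

    Base 10 is an assumption; the text does not state the base.
    """
    return math.log10(num_docs / doc_freq)


def tf_idf(term_freq: int, idf_weight: float) -> float:
    """Eq. (4): term frequency in the document times the idf weight."""
    return term_freq * idf_weight


# Hypothetical counts, used only to exercise Eq. (3): a term found in
# 3,000 of 3,000,000 pages gets an idf of log10(1000).
example_idf = idf(3_000_000, 3_000)

# A few tokens from the example text; "lotus" and "prayer" occur twice.
tokens = ["lotus", "prayer", "aureole", "lotus", "prayer", "crown"]
term_freq = Counter(tokens)

# idf values as reported in Table 1 (not recomputed here).
table1_idf = {
    "lotus": 3.298308688,
    "prayer": 2.813689576,
    "aureole": 5.081670046,
    "crown": 2.336756893,
}

weights = {t: tf_idf(term_freq[t], w) for t, w in table1_idf.items()}

# Terms whose tf-idf exceeds the threshold of 1.45 go on to concept linking.
THRESHOLD = 1.45
selected = sorted(t for t, w in weights.items() if w > THRESHOLD)
```

With these inputs, the doubled term frequency of “lotus” and “prayer” doubles their idf values, which reproduces the tf-idf entries for those two terms in Table 1.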
Then, each term with a tf-idf value above the threshold e = 1.45 is linked to a number of OntoRo concepts (in this example, only one ontology is used). Table 1 shows that some of the terms in the text are related to many concepts (e.g. the word “crown” has many meanings: it is related to 24 OntoRo concepts). Other terms with more specific connotations are linked to a single concept (“lotus” is linked to concept #658:remedy while “Bodhisattva” is related to concept #965:divineness). Most terms, however, are related to three or more concepts. The next step involves computing using (2) the weight w_ck(d_j) of each concept C_k. As Table 2 shows, there are many concepts with low weight, linked to only one word with rather low tf-idf. For example, concept #34:superiority with weight 0.09736487 is linked to the term “crown”, which has a tf-idf value of 2.336756893 and is related to 24 different concepts. Having even
R. Setchi et al. / Advanced Engineering Informatics 25 (2011) 131–146
Fig. 3. OpenCyc, WordNet and OntoRo: a comparative example.
two terms related to the same concept may not considerably increase its weight if their tf-idf values are low (see concept #371:humankind linked to the terms ‘‘figure” and ‘‘hand”). On the other hand, a term with a high tf-idf like ‘‘Bodhisattva” associated with one concept only, #965:divineness, substantially contributes to the relatively very high weight of that concept for the
document analysed. As expected, a combination of two or three specific words (‘‘meditation” and ‘‘prayer”; ‘‘crown”, ‘‘lotus” and ‘‘meditation”) produces high values for the associated concepts. Finally, the concepts are ranked according to their weight and those having values above a certain threshold are retained as tags for this document. In this particular example, the image shown in
Fig. 4. OntoRo’s format.
Fig. 5 is tagged with the strongest concepts #658:remedy, #965:divineness, #981:worship and #761:request, which adequately represent its essence. Furthermore, these four tags carry very rich semantics as each of these concepts is semantically represented through a great number of words and phrases, and in addition connected through OntoRo’s semantic links with other ontological concepts.
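The concept weighting and tagging steps can be sketched as below. The sketch assumes that formula (2) sums, over the terms linked to a concept, each term's tf-idf divided by the number of concepts that term is linked to; this reading is inferred from the values in Table 2, not quoted from the formula itself. Only six terms from Table 1 are included, and the tagging threshold of 2.0 is a hypothetical value, since the text does not state the one used.

```python
from collections import defaultdict

# term -> (tf-idf weight, linked OntoRo concept numbers); a subset of Table 1.
TERMS = {
    "bodhisattva": (4.055517328, [965]),
    "lotus": (6.59661738, [658]),
    "meditation": (3.253143485, [449, 455, 658, 979, 981]),
    "prayer": (5.62737915, [761, 981]),
    "aureole": (5.081670046, [250, 417, 866]),
    "crown": (2.336756893, [34, 38, 54, 213, 228, 236, 250, 279, 310, 547,
                            617, 646, 658, 725, 729, 733, 743, 751, 797,
                            844, 866, 876, 884, 962]),
}


def concept_weights(terms):
    """Spread each term's tf-idf evenly over its concepts and sum per concept."""
    weights = defaultdict(float)
    for tfidf, concepts in terms.values():
        share = tfidf / len(concepts)
        for concept in concepts:
            weights[concept] += share
    return dict(weights)


def tag(terms, threshold):
    """Rank concepts by weight and keep those above the threshold."""
    ranked = sorted(concept_weights(terms).items(),
                    key=lambda item: item[1], reverse=True)
    return [(concept, weight) for concept, weight in ranked if weight > threshold]


weights = concept_weights(TERMS)
# Hypothetical threshold; the text only says "a certain threshold".
tags = tag(TERMS, threshold=2.0)
```

Under these assumptions the retained tags are #658, #965, #981 and #761, in that order, which matches the four concepts named in the text. A monosemic term such as “lotus” passes its whole tf-idf to one concept, whereas “crown” contributes only one twenty-fourth of its weight to each of its concepts.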
The benefit of using a concept-based tagging of images is best demonstrated by comparing keyword-based and semantic information retrieval. Suppose a user (a creative designer who is used to abstract terms and looks for inspiration) defines his/her query as ‘‘perfection, purity, spirituality”. A keyword-based search will never return the page shown in Fig. 5 because none of the words in the query occur in the text. However, a semantic-based search
Fig. 5. An example page from the collection of the Allentown Art Museum (http://www.allentownartmuseum.org/collection/coll_indian-10-Bodhisattva.html).
Table 1
An example showing the concepts related to the most representative words in the page shown in Fig. 5.

Term t_i | Term frequency in document, #(t_i, d_j) | idf, w_idf | tf-idf, w_tf-idf | Number of related concepts | Related concepts, C_k(t_i)
Small | 1 | 1.458018026 | 1.458018026 | 12 | 33:smallness, 35:inferiority, 53:part, 102:fraction, 132:child, 163:weakness, 196:littleness, 198:contraction, 558:letter, 636:insufficiency, 639:unimportance, 922:contempt
Bronze | 1 | 2.374819886 | 2.374819886 | 3 | 43:mixture, 430:brownness, 554:sculpture
Figure | 1 | 2.150545486 | 2.150545486 | 12 | 85:number, 86:numeration, 233:outline, 243:form, 371:humankind, 445:appearance, 519:metaphor, 547:indication, 551:representation, 797:money, 809:price, 866:repute
Bodhisattva | 1 | 4.055517328 | 4.055517328 | 1 | 965:divineness
Crown | 1 | 2.336756893 | 2.336756893 | 24 | 34:superiority, 38:addition, 54:completeness, 213:summit, 228:dressing, 236:limit, 250:circularity, 279:impulse, 310:elevation, 547:indication, 617:intention, 646:perfection, 658:remedy, 725:completion, 729:trophy, 733:authority, 743:regalia, 751:commission, 797:money, 844:ornamentation, 866:repute, 876:celebration, 884:courtesy, 962:reward
Aureole | 1 | 5.081670046 | 5.081670046 | 3 | 250:circularity, 417:light, 866:repute
Lotus | 2 | 3.298308688 | 6.59661738 | 1 | 658:remedy
Posture | 1 | 3.52349961 | 3.52349961 | 6 | 8:circumstance, 186:situation, 243:form, 445:appearance, 688:conduct, 850:be ridiculous
Meditation | 1 | 3.253143485 | 3.253143485 | 5 | 449:thought, 455:attention, 658:remedy, 979:piety, 981:worship
Hands | 1 | 2.186123673 | 2.186123673 | 2 | 660:safety, 686:agent
Gesture | 1 | 3.291457704 | 3.291457704 | 18 | 20:imitation, 246:distortion, 265:motion, 279:impulse, 318:agitation, 445:appearance, 524:information, 547:indication, 557:language, 578:voicelessness, 579:speech, 676:action, 688:conduct, 737:command, 818:feeling, 850:be ridiculous, 878:insolence, 988:ritual
Prayer | 2 | 2.813689576 | 5.62737915 | 2 | 761:request, 981:worship
Greeting | 1 | 3.544285246 | 3.544285246 | 6 | 295:arrival, 583:allocution, 836:lamentation, 880:friendship, 882:sociality, 884:courtesy
Another | 1 | 1.451626378 | 1.451626378 | 3 | 15:difference, 38:addition, 65:sequence
Hand | 1 | 1.959505032 | 1.959505032 | 18 | 53:part, 74:assemblage, 117:chronometry, 203:length, 239:laterality, 305:passage, 371:humankind, 378:suffer pain, 467:counterevidence, 547:indication, 586:writing, 628:instrumentality, 676:action, 686:agent, 742:servant, 773:possession, 778:retention, 783:apportionment
Strand | 1 | 3.110058335 | 3.110058335 | 4 | 208:filament, 234:edge, 259:roughness, 344:land
Last | 1 | 1.464459952 | 1.464459952 | 10 | 1:existence, 23:prototype, 69:end, 108:time, 113:long duration, 125:past time, 144:permanence, 602:obstinacy, 715:resistance, 725:completion
Originally | 1 | 1.634768817 | 1.634768817 | 1 | 68:beginning
Held | 1 | 1.52243847 | 1.52243847 | 5 | 632:store, 660:safety, 773:possession, 778:retention, 976:Orthodoxy
would identify this page as relevant to the query due to a process called semantic expansion where the query is semantically enriched using step (iii) of the algorithm described in this paper. In the context of this particular example, the word ‘‘perfection” is linked to 17 OntoRo concepts (see Fig. 3); ‘‘purity” has 11 meanings while ‘‘spirituality” is related to three concepts only. The tf-idf values for the words ‘‘perfection”, ‘‘purity” and ‘‘spirituality” are 2.278121, 2.215722 and 2.2600020. After applying formula (2), the search query ‘‘perfection, purity, spirituality” is
tagged with concepts #933:virtue (its concept weight is 1.08877), #979:piety (concept weight 0.954763), #935:innocence and #950:purity (which both have concept weight 0.335436), and #965:divineness (0.268014). The concept weight for the highest ranked concept #933:virtue has been calculated in the following way:
$$w_{\#933} = 2.278121 \cdot \frac{1}{17} + 2.215723 \cdot \frac{1}{11} + 2.260002 \cdot \frac{1}{3} = 1.08877$$
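The semantic expansion of the query can be sketched in the same way. The link sets below contain only the word-to-concept links implied by the concept weights reported above; the remaining OntoRo links of each word are not listed in the text and are left out. The final matching rule (a page is a candidate when it shares at least one concept with the expanded query, scored by the summed query weights of the shared concepts) is an illustrative assumption, not the retrieval rule of the tool.

```python
from collections import defaultdict

# word -> (tf-idf, total number of linked concepts, {concept: times linked}).
# Only links implied by the reported weights are listed (an inference);
# "perfection" is linked twice to #965:divineness, as stated in the text.
QUERY = {
    "perfection": (2.278121, 17, {933: 1, 935: 1, 950: 1, 965: 2}),
    "purity": (2.215723, 11, {933: 1, 935: 1, 950: 1, 979: 1}),
    "spirituality": (2.260002, 3, {933: 1, 979: 1}),
}


def query_concept_weights(query):
    """Sum each word's tf-idf share over the concepts it is linked to."""
    weights = defaultdict(float)
    for tfidf, n_concepts, links in query.values():
        for concept, times in links.items():
            weights[concept] += times * tfidf / n_concepts
    return dict(weights)


query_weights = query_concept_weights(QUERY)

# Concept tags of the example page (Fig. 5), as given in the text.
PAGE_TAGS = {658, 965, 981, 761}

# Hypothetical matching rule: concepts shared by the query and the page.
shared = PAGE_TAGS & set(query_weights)
match_score = sum(query_weights[c] for c in shared)
```

Here the only shared concept is #965:divineness, so the page becomes retrievable even though none of the three query words appears in its text.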
Table 2
An example showing the weight of the concepts related to the page shown in Fig. 5.

Concept C_k | t_i, terms related to concept C_k | n, number of terms related to C_k | W_ck, weight of concept C_k
One term
34:superiority, 54:completeness, 213:summit, 228:dressing, 236:limit, 310:elevation, 617:intention, 646:perfection, 729:trophy, 733:authority, 743:regalia, 751:commission, 844:ornamentation, 876:celebration, 962:reward | Crown | 1 | 0.09736487
74:assemblage, 117:chronometry, 203:length, 239:laterality, 305:passage, 378:suffer pain, 467:counterevidence, 586:writing, 628:instrumentality, 742:servant, 783:apportionment | Hand | 1 | 0.10886139
33:smallness, 35:inferiority, 102:fraction, 132:child, 163:weakness, 196:littleness, 198:contraction, 558:letter, 636:insufficiency, 639:unimportance, 922:contempt | Small | 1 | 0.1215015
1:existence, 23:prototype, 69:end, 108:time, 113:long duration, 125:past time, 144:permanence, 602:obstinacy, 715:resistance | Last | 1 | 0.146446
85:number, 86:numeration, 233:outline, 519:metaphor, 551:representation, 809:price | Figure | 1 | 0.17921212
20:imitation, 246:distortion, 265:motion, 318:agitation, 524:information, 578:voicelessness, 579:speech, 557:language, 737:command, 818:feeling, 878:insolence, 988:ritual | Gesture | 1 | 0.18285876
371:humankind | Figure, hand | 2 | 0.28807351
632:store, 976:Orthodoxy | Held | 1 | 0.30448769
15:difference, 65:sequence | Another | 1 | 0.48387546
8:circumstance, 186:situation | Posture | 1 | 0.58724993
295:arrival, 583:allocution, 836:lamentation, 880:friendship, 882:sociality | Greeting | 1 | 0.59071421
449:thought, 455:attention, 979:piety | Meditation | 1 | 0.6506287
208:filament, 234:edge, 259:roughness, 344:land | Strand | 1 | 0.77751458
43:mixture, 430:brownness, 554:sculpture | Bronze | 1 | 0.79160663
68:beginning | Originally | 1 | 1.63476882
417:light | Aureole | 1 | 1.69389002
761:request | Prayer | 1 | 2.81368958
965:divineness | Bodhisattva | 1 | 4.05551733
Two terms
53:part | Small, hand | 2 | 0.23036289
725:completion | Crown, last | 2 | 0.24381087
797:money | Figure, crown | 2 | 0.27657699
279:impulse | Crown, gesture | 2 | 0.28022363
676:action | Gesture, hand | 2 | 0.29172015
38:addition | Crown, another | 2 | 0.58124033
884:courtesy | Crown, greeting | 2 | 0.68807908
243:form | Figure, posture | 2 | 0.76646206
688:conduct, 850:be ridiculous | Posture, gesture | 2 | 0.7701087
686:agent | Hands, hand | 2 | 1.20192323
660:safety | Hands, held | 2 | 1.39754953
773:possession, 778:retention | Hand, held | 2 | 1.39754953
250:circularity | Crown, aureole | 2 | 1.79125489
981:worship | Meditation, prayer | 2 | 3.46431827
Three terms
547:indication | Figure, crown, gesture, hand | 4 | 0.56829715
445:appearance | Figure, posture, gesture | 3 | 0.94932082
866:repute | Figure, crown, aureole | 3 | 1.97046701
658:remedy | Crown, lotus, meditation | 3 | 7.34461094
The semantic link in this particular example is the semantic relation between the word “perfection” and the concept #965:divineness. If the search query were just the word “perfection”, #965:divineness would have been the concept with the highest weight because the word “perfection” is linked twice to this concept (see Fig. 3, the word appears in paragraphs 1 and 2 of concept #965:divineness).

4.4. Industrial implementation and evaluation

The semantic-based algorithm described in Section 4.1 has been further integrated into a software tool developed within the EU-funded TRENDS project by a multidisciplinary team of designers, software engineers, and psychologists. The project aimed at developing a software tool that supports the inspirational stage of design by providing designers of concept cars with sources of inspiration. The tool was specifically tailored to the needs of Fiat, Italy and Stile Bertone, Italy, two of the industrial collaborators involved in the TRENDS project. The tool provides means for searching images using (i) design-specific elements such as shape, texture and colour; (ii) sectors of influence (e.g. “architecture”, “nature”, “toys” or “automotive”); (iii) keywords such as “boat” or “money”; and (iv) semantic adjectives, for instance “fresh”, “aggressive”, “luxury”, “comfort” or “soft”. The TRENDS software (Fig. 6) integrates all these in the search options, which are based on three methods of indexing: keywords, concepts, and low-level image features. The TRENDS software uses both OntoRo and the CTA ontology, which contains domain-specific vocabulary and semantic relations (value-semantic adjective-functional attribute). The TRENDS collection consists of 1,888,525 images retrieved from more than 3000 web sites indicated by the designers of concept cars as their sources of influence. The images represent 25 sectors.
Among them, design is the largest sector followed by automotive, advertisement, aerospace, fashion, architecture and other sectors (see TRENDS, Deliverable 2.8, 2007). Fig. 6 shows the TRENDS software in operation. The spheres represent personal and collaborative spaces with selected images, as well as spaces for viewing the images retrieved. In this particular example, the user has conducted three searches using the word “perfection”. The first search in all sectors (i.e. the whole collection) has retrieved 2247 images. One of these images is the one shown in Fig. 5. The second search, limited to one sector only, sector “car”, has produced 198 images. The third search has retrieved 641 images from four sectors: architecture, photography, nature and luxury. In addition to the word “perfection”, the last search uses an example image and a design-specific element (shape). The figure also includes a slideshow of the images retrieved in the third search. The second screenshot in the figure shows how nine of the retrieved images are used to produce colour, shape and texture palettes. The evaluation of OntoTag involved several stages, described in detail in an extensive report available at the TRENDS web site [34]. The first stage involved examination of the TRENDS collection and manual tagging of 500 pages. The experiments showed problems in the manual evaluations, mainly related to subjectivity and difficulties in defining what is considered relevant when looking for diverse information. The second stage aimed at assessing the operational capability of the ontology tagger. A small collection of web pages was tagged in evaluation mode and checked manually to see if the corresponding output is tagged correctly from a semantic point of view. The results showed that 7 out of 10 pages were tagged with highly relevant concepts. The aim of the next
evaluation was to assess whether the information search improves when using semantic queries, by comparing OntoTag with a standard indexing desktop piece of software (Copernic Desktop Search). Using the same web document collection and given the same set of queries, the experiment evaluated how many relevant documents were returned by OntoTag and Copernic. Ten typical concepts from the CTA ontology were selected; they include abstract concepts used by designers (semantic adjectives) as well as abstract concepts relating to emotions. All results were manually examined by two experts. Table 3 shows the concepts used in the queries (e.g. ‘‘original”, ‘‘luxury”, ‘‘passionate”, etc.) and the results produced when using Copernic and OntoTag. Precision was used as a measure of relevance (defined as the fraction of all documents retrieved that are relevant to the user’s query). The results in the table are shown as fractions; the numerator showing the number of all relevant documents retrieved, and the denominator indicating the number of all documents retrieved. The results in Table 3 show the precision achieved by the Copernic and OntoTag algorithms. In 7 cases out of the 10 queries conducted, OntoTag gives equally good or better results. Some of the cases demonstrate high precision: see ‘‘high tech”, ‘‘well-being”, ‘‘luxury”, etc. Moreover, the analysis of the results clearly shows that the high precision of OntoTag is a result of the semantic approach used. For example, the search for the keyword ‘‘aggressive” (Fig. 7, Table 3) does not produce any results with Copernic. In this case, OntoTag returns 3 applicable results out of all 4 results in total (the discarded one is considered ‘‘marginally” relevant). In this experiment, OntoTag considers the space shuttle, Porsche sport car, military personnel and vehicles relevant to the concept ‘‘aggressive”. 
These web pages do not contain the words “aggressive”, “aggression” etc., but the contextual paragraphs include statistically significant information to classify them as related to the concept “aggressive”. Note that in the same example (Fig. 7), OntoTag produces 4 results but only 3 of them are considered relevant on the basis that the image of the flying helicopter does not evoke such a connotation. However, analysis of the OntoTag results shows that although the picture may not be considered particularly aggressive in nature, the text on the page refers to military operations, which explains the tagging. On the other hand, although the text on the page containing the car is not explicitly related to any military themes, the expert considers the picture relevant to the query, an unexpected and rather encouraging result in the context of the TRENDS project. However, as shown in Table 3, there are cases when Copernic retrieves some pages (even though with very low precision) while OntoTag returns few or no results (see the query “original” in Table 3). This indicates that although semantic-based searches demonstrate clear advantages, they should be used in combination with keyword-based search. In addition, OntoTag was evaluated through user-centred studies, which validated the performance of the TRENDS software by its end users. Qualitative and quantitative methods were used at each stage. The system was evaluated according to the main aspects identified by both the HCI experts and the TRENDS end-users as critical for the exploitation of a computer-based inspirational support. These relate to:
• the availability of high-quality images,
• the performance of the search algorithms,
• the user-friendliness of the interface design, and
• the functionality of the system.
The evaluation protocol allowed the content of the database and the performance of the search algorithms to be assessed independently from the system interface. Twelve participants from the design departments of Fiat (6) and Stile Bertone (6) took part in these
Fig. 6. TRENDS environment.
experiments. First, each designer was given a design brief and asked to write down the keywords they would use when searching for inspirational images. The design brief was as follows: “The owner of this car will be retired, married, countryside dweller and a dog owner. The customer’s main requirement is that the car is economical and spacious. The car should be easy to get in and out, and easy to
Table 3
Evaluation results: OntoTag and Copernic compared.

$$\text{PRECISION} = \frac{\text{number of relevant documents}}{\text{all documents retrieved}}$$

Concept search queries | Copernic results | OntoTag results
Adaptable | 0/0 | 0/0
Aggressive | 0/2 | 3/4
High tech | 3/31 | 3/3
Original | 1/12 | 0/0
Luxury | 2/10 | 1/1
Passionate | 0/0 | 0/0
Pleasant | 1/1 | 3/3
Refreshing | 0/0 | 2/2
Soft | 3/4 | 1/1
Well-being | 0/6 | 1/1
load. It should be functional and smart”. Next, the individuals conducting the experiment formed search queries using words suggested by the designers (e.g. “smart”, “farm”, “eco-design”, “comfort”). Two different image search engines were used in this experiment: TRENDS and Google image search. The designers were then presented with two sets of 10 pages, each set retrieved using a different search engine. Each page contained 16 images, as shown in Fig. 8a (TRENDS, 2009). The designers were not told which set had been produced by which search engine. They were then asked to consider the design brief, scroll through the images and rate each image in terms of its inspirational value, usefulness, aesthetics, unusualness/serendipity, closeness to the idea they had in mind, and image quality. In addition, the designers were given the opportunity to conduct a random search by browsing images in the collection. Fig. 8a shows the evaluation sheet used by the designers during the tests. It is important to note that although the participants were asked to scroll through sets of 10 pages, only the first 6 pages contained images selected as inspirational by all 12 participants. Fig. 8b shows the mean for the inspirational value of the images included in page 1 and page 6. The TRENDS system seems to produce
relatively more inspirational images in the first page and less good images in the sixth page than Google. Overall, the TRENDS system was regarded as a useful tool. Finally, tests with the final prototype were conducted with eight car designers: six from Fiat and two from Stile Bertone who used the system as part of their normal working routine over a period of one month. They provided daily feedback on their experience when using the TRENDS system; in particular the ease of use of its interface, the quality of the database content and search algorithms, and the usefulness of the system functions. These designers used a range of available functions including text-based and content-based retrieval. They raised a number of points for consideration such as the quality of the images, difficulty in finding some elements of the interface, and the slow time of response. The results of the final tests indicate the potential of the TRENDS software for further commercial exploitation, and identify the outstanding issues which need to be addressed. End users considered the TRENDS functionality very useful in the design context. Overall the semantic search was accepted well.

5. Conclusions and future work

The semantic-based image retrieval tool tags images by first processing all significant words in the text around them, extracting all keywords and key phrases in it, ranking them according to their significance, and linking them to ontological concepts. It generates a set of concept numbers for each text, which is then used to retrieve information in a process called semantic expansion, where a keyword query is also processed semantically. The challenge addressed was not to rely on machine learning techniques and the availability of a tagged corpus. Instead, the algorithm uses a large lexical ontology and the polysemy of words to compute the weight of all concepts that might be relevant to a particular piece of text and the images next to it.
The semantic-based image retrieval tool developed has demonstrated good performance and scalability, and has been integrated
Fig. 7. OntoTag evaluation: semantic search for ‘‘aggressive”.
Fig. 8a. User-centred evaluation of the TRENDS tool for image retrieval: materials and evaluation sheet.
with keyword-based indexing and content retrieval algorithms in an industrial prototype. The concept-based search combined with content-based image retrieval and keyword-based search complements traditional methods by providing images with a degree of diversity and high inspirational value. An outstanding issue in the area of information retrieval, which needs more research, is the lack of well-established methodologies,
indexed corpora for comparative studies, and qualitative measures for evaluating semantic algorithms. The concept indexing algorithm is currently being used as a research tool in several ongoing research projects, which aim to better understand the potential of semantic technologies for developing applications in support of inspiration, creativity and associative thinking. More research is needed to explore the role
[Figure: two bar charts, one for TRENDS and one for GOOGLE. x-axis: page number (Page 1 to Page 6); y-axis: Total (Means), scale 1 to 7.]
Fig. 8b. User-centred evaluation of the TRENDS tool for image retrieval: results for inspirational value.
of serendipity, diversity and semantic associations on imagination, inspiration and creativity.

Acknowledgements

The authors wish to acknowledge the financial support of the European Community through their Framework programme. The authors are grateful to their industrial collaborators Stile Bertone, Italy, Fiat, Italy, Pertimm, France and Robotiker, Spain, and also wish to acknowledge the contribution of Mr. E. Fadzli who developed the web interface to OntoRo as part of his ongoing Ph.D. research.

References

[1] P. Andre, M.C. Schaefel, J. Teevan, S.T. Dumais, Discovery is Never by Chance: Designing for (Un)Serendipity, C&C’09, October 26–30 2009, Berkeley, USA, 2009.
[2] P.I. Ansburg, K. Hill, Creative and analytic thinkers differ in their use of attentional resources, Personality and Individual Differences 34 (2003) 1141–1152.
[3] V.R. Benjamins, J. Contreras, M. Blazquez, J.M. Dodero, A. Garcia, E. Navas, F. Hernandez, C. Wer, Cultural heritage and the semantic web, LNCS 3053 (2004) 433–444.
[4] C. Bouchard, C. Mougenot, J.F. Omhover, R. Setchi, A. Aoussat, Building a domain ontology for designers, towards a Kansei based ontology, in: Proceedings of the 3rd I*PROMS International Conference, 2–13 July, Cardiff, UK, 2007, pp. 587–592.
[5] T. Brown, Design thinking, Harvard Business Review (2008) 1–10.
[6] P. Cassidy, An investigation of the semantic relations in the Roget’s thesaurus: preliminary results, in: A. Gelbukh (Ed.), CICLing-2000: Conference on Intelligent Text Processing and Computational Linguistics, February 13–19, Mexico City, Mexico, 2000, pp. 181–204.
[7] Y.T. Chong, C.-H. Chen, K.F. Leong, Human-centric product conceptualization using a design space framework, Advanced Engineering Informatics 23 (2009) 149–156.
[8] J. Conesa, V.C. Storey, V. Sugumaran, Improving web-query processing through semantic knowledge, Data and Knowledge Engineering 66 (2008) 18–34.
[9] O. Corcho, Ontology based document annotation, trends and open research problems, International Journal of Metadata, Semantics and Ontologies 1 (1) (2006) 47–57.
[10] I.J. Cox, M.L. Miller, T.P. Minka, T.V. Papathomas, P.N. Yianilos, The Bayesian image retrieval system, PicHunter: theory, implementation, and psychophysical experiments, IEEE Transactions on Image Processing 9 (1) (2000) 20–37.
[11] R. Datta, D. Joshi, J. Li, J.Z. Wang, Image retrieval: ideas, influences, and trends of the new age, ACM Computing Surveys 40 (2) (2008) 65.
[12] S. Dill, N. Eiron, D. Gibson, D. Gruhl, R. Guha, A. Jhingran, K. Kanungo, S. McCurley, S. Rajagopalan, A. Tomkins, J.A. Tomlin, J.Y. Zien, A case for automated large-scale semantic annotation, Web Semantics: Science, Services and Agents on the World Wide Web 1 (1) (2003) 115–132.
[13] C.L. Dym, A.M. Agogino, O. Eris, D.D. Frey, L.J. Leifer, Engineering design thinking, teaching and learning, Journal of Engineering Education 94 (1) (2005) 103–120.
[14] C. Eckert, M.K. Stacey, Sources of inspiration: a language of design, Design Studies 21 (2000) 99–112.
[15] P.G.B. Enser, C.J. Sandom, J.S. Hare, P.H. Lewis, Facing the reality of semantic image retrieval, Journal of Documentation 63 (4) (2007) 465–481.
[16] M. Ferecatu, N. Boujemaa, M. Crucianu, Semantic interactive image retrieval combining visual and conceptual content description, ACM Multimedia Systems Journal 13 (5–6) (2008) 309–322.
[17] A. Fink, M. Benedek, R.H. Grabner, B. Staudt, A. Neubauer, Creativity meets neuroscience: experimental tasks for the neuroscientific study of creative thinking, Methods 42 (2007) 68–76.
[18] N. Ford, Information retrieval and creativity: towards support for the original thinker, Journal of Documentation 55 (5) (1999) 528–542.
[19] Gene, Ontology-Consortium, Creating the Gene Ontology Resource: Design and Implementation, Genome Research 11 (8) (2001) 1425–1433.
[20] A. Gliozzo, C. Strapparava, I. Dagan, Unsupervised and supervised exploitation of semantic domains in lexical disambiguation, Computer Speech and Language 18 (3) (2004) 275–299.
[21] P. Gomes, N. Seco, F.C. Pereira, P. Paiva, P. Carreiro, J.L. Ferreira, C. Bento, The importance of retrieval in creating design analogies, Knowledge-based Systems 19 (2006) 480–488.
[22] E. Hyvönen, A. Styrman, S. Saarela, Ontology-based image retrieval, in: E. Hyvönen, M. Klemettinen (Eds.), Proceedings of the XML Finland 2002 Conference, vol. 16, 2002, pp. 15–27.
[23] M. Jarmasz, S. Szpakowicz, Roget’s thesaurus: a lexical resource to treasure, in: Proceedings of the NAACL WordNet and Other Lexical Resources workshop, Pittsburgh, June, 2001, pp. 186–188.
[24] J. Köhler, S. Philippi, M. Specht, A. Rüegg, Ontology based text indexing and querying for the semantic web, Knowledge-based Systems 19 (8) (2006) 744–754.
[25] K. Kuroda, M. Hagiwara, An image retrieval system by impression words and specific objects names – IRIS, Neurocomputing 43 (2002) 259–276.
[26] I.-C. Lai, T.-W. Chang, A distributed linking system for supporting idea association during the conceptual design stage, Design Studies 27 (2006) 685–710.
[27] Y. Liu, D. Zhang, G. Lu, W.-Y. Ma, A survey of content-based image retrieval with high-level semantics, Pattern Recognition 40 (2007) 262–282.
[28] D. McDonagh, A. Bruseberg, C. Haslam, Visual product evaluation: exploring users’ emotional relationships with products, Applied Ergonomics 33 (3) (2002) 231–240.
[29] D. McDonagh, N. Goggin, J. Squier, Signs, symbols, and subjectivity: an alternative view of the visual, Computers and Composition 22 (1) (2005) 79–86.
[30] P. Mello, S. Storari, B. Valli, A knowledge-based system for fashion trend forecasting, LNAI 5027 (2008) 425–434.
[31] J. Morris, G. Hirst, Non-classical lexical semantic relations, workshop on computational lexical semantics, in: Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 2004, pp. 46–51.
[32] C. Mougenot, Towards Stimulating Creativity in Design: A Model of Informational Activities in Early Phases of Product Design Process, Ph.D. Thesis, LCPI, ENSAM, ENAM0042, 2008, p. 193.
[33] OntoRo. Available from: (accessed 26.06.09).
[34] OntoTag, Ontology Tagger, TRENDS Report, 12 June 2008. Available from: (accessed 25.03.09).
[35] OpenCYC. Available from: (accessed 25.06.09).
[36] A. Popescu, C. Millet, P.-A. Moellic, Ontology Driven Content Based Image Retrieval, CIVR’07, 9–11 July 2007, Amsterdam, The Netherlands, 2007.
[37] E. Davidson (Ed.), Roget’s Thesaurus of English Words and Phrases, Penguin, UK, 2003.
[38] G. Salton, C. Buckley, Improving retrieval performance by relevance feedback, Journal of the American Society for Information Science 41 (4) (1990) 288–297.
[39] R. Setchi, Q. Tang, C. Bouchard, Ontology-based concept indexing of images, LNAI 5711 (2009) 293–300.
[40] R. Setchi, Q. Tang, Concept indexing using ontology and supervised machine learning, Transactions on Engineering, Computing and Technology 19 (2007) 221–226.
[41] R. Setchi, Q. Tang, Semantic-based representation of content using concept indexing, in: Proceedings of the 3rd I*PROMS International Conference, 2–13 July 2007, Cardiff, UK, 2007, pp. 611–618.
[42] A.W.M. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based image retrieval at the end of the early years, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (12) (2000) 323–329.
[43] M. Song, I.-Y. Song, X. Hu, R.B. Allen, Integration of association rules and ontologies for semantic query expansion, Data and Knowledge Engineering 63 (2007) 63–75.
[44] S. Staab, A. Scherp, R. Arndt, R. Troncy, M. Grzegorzek, C. Saathoff, S. Schenk, L. Hardman, Semantic multimedia, LNCS 5224 (2008) 125–170.
[45] R.J. Sternberg, The nature of creativity, Creativity Research Journal 18 (1) (2006) 87–98.
[46] Q. Tang, Knowledge Management Using Machine Learning, Natural Language Processing and Ontology, Ph.D. Thesis, Trevithick Library, Cardiff University, UK, 2006.
[47] T. Taura, Y. Nagai, J. Morita, T. Takeuchi, A study on design creative process focused on concept combination types in comparison with linguistic interpretation process, in: International Conference on Engineering Design (ICED’07), 28–31 August 2007, Paris, France, 2007.
[48] E.G. Toms, Serendipitous information retrieval, in: Proceedings of the First DELOS Network of Excellence Workshop on Information Seeking, Searching and Querying in Digital Libraries, Zurich, Switzerland, European Research Consortium for Informatics and Mathematics, 2000.
[49] M. Tovey, Styling and design: intuition and analysis in industrial design, Design Studies 18 (1997) 5–31.
[50] M. Tovey, S. Portera, R. Newman, Sketching, concept development and automotive design, Design Studies 24 (2) (2003) 135–153.
[51] TRENDS. Available from: .
[52] C.-F. Tsai, A review of image retrieval methods for digital cultural heritage resources, Online Information Review 31 (2) (2007) 185–198.
[53] G. Vijaykumar, A. Chakrabarti, Understanding the knowledge needs of designers during design process in industry, Journal of Computing and Information Science in Engineering 8 (2008) 011004-1–011004-9.
[54] W. Wang, A. Zhang, Extracting semantic concepts from images: a decisive feature pattern mining approach, Multimedia Systems 11 (4) (2006) 352–366.
[55] A.A. Ward, S.J. McKenna, A. Buruma, P. Taylor, J. Han, Merging technology and users: applying image browsing to the fashion industry for design inspiration, in: Content-based Multimedia Indexing, London, 2008, pp. 288–295.
[56] S.J. Westerman, S. Kaur, C. Dukes, J. Blomfield, Creative industrial design and computer-based image retrieval: the role of aesthetics and affect, LNCS 4738 (2007) 618–629.
[57] S.J. Westerman, S. Kaur, C. Mougenot, L. Sourbe, C. Bouchard, The impact of computer-based support on product designers’ search for inspirational materials, in: Proceedings of the 3rd I*PROMS International Conference, 2–13 July 2007, Cardiff, UK, 2007, pp. 581–586.
[58] Wikipedia. Available from: (accessed 11.11.09).
[59] WordNet. Available from: (accessed 10.11.09).
[60] R. Zhang, Z. Zhang, M. Li, W.-Y. Ma, H.-J. Zhang, A probabilistic semantic model for image annotation and multi-modal image retrieval, Multimedia Systems 12 (1) (2006) 27–33.