Available online at www.sciencedirect.com
Int. J. Human-Computer Studies 71 (2013) 166–170 www.elsevier.com/locate/ijhcs
Information is not knowledge, knowledge is not wisdom, wisdom is not truth
V. Richard Benjamins
Telefonica Digital, Madrid, Spain
Received 25 September 2012; accepted 25 September 2012
Available online 16 October 2012
Abstract
In this contribution to the special issue of the International Journal of Human-Computer Studies on Knowledge Acquisition, I give a view of the evolution of concepts related to knowledge during the last 25 years and briefly look into the future. The concepts include knowledge acquisition, knowledge engineering, knowledge management, knowledge level, knowledge retrieval, knowledge modeling, knowledge protection, knowledge retention, knowledge deletion and knowledge privacy. This contribution is a reflection on the theme of Knowledge Acquisition, based on my experience since 1987, when I started working in this area.
© 2012 Elsevier Ltd. All rights reserved.
Keywords: Knowledge Acquisition; Innovation; Knowledge
“Information is not knowledge. Knowledge is not wisdom. Wisdom is not truth. Truth is not beauty. Beauty is not love. Love is not music. Music is the best.” (Frank Zappa, 1979)
1. Introduction
Frank Zappa wrote the above quote for his rock opera Joe’s Garage in 1979, almost a decade before the first Knowledge Acquisition Workshop in Banff. Had the quote started with “Data is not information”, he could have submitted it to that workshop. Rereading this quote while writing this note, I asked myself how many songs refer to knowledge (we know most songs are about love). Browsing a bit through iTunes, I found more than 10,000 songs with “I know” in the title and over 5000 with “knowledge” in the title. When I asked the question answering engine Wolfram Alpha¹ “How many songs have ‘I know’ in the title?”, the answer was 13, which is not really convincing. When I asked Google the same question, no result made sense.
E-mail address: [email protected]
¹ http://www.wolframalpha.com/
1071-5819/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.ijhcs.2012.10.005
Let’s now take a simplified look at the role of knowledge in our society, using Google Web hits as a proxy. We can assume that the more hits a term has in Google, the longer it has existed and/or the more popular it is. The Web and the Knowledge Acquisition community both started in the late eighties/early nineties. Typing “knowledge” into Google results in 1.3 billion hits. That seems quite a lot, but let’s see knowledge in relation to other important concepts in life, as illustrated in Fig. 1. Knowledge is a popular concept: more popular than politics and much more popular than crisis and Messi. Not surprisingly, concepts such as life/live, love, and sex are more popular than knowledge. Surprisingly, music is number 2 on this list with 9.5 billion hits (something Apple has understood very well). Let’s now zoom in on knowledge and look at concepts in the same space as knowledge acquisition, such as knowledge engineering, knowledge management, knowledge level, knowledge modeling, knowledge retrieval, knowledge (actually data) protection, knowledge retention, knowledge deletion and knowledge privacy. There are about 1.5 M hits in Google for “knowledge acquisition”, and Google Books knows of 38,600 books. Figs. 2–5 show the popularity of the other concepts relative to knowledge acquisition. They are split into two sets of figures because the two sets differ by orders of magnitude in hits. The concepts in Figs. 2 and 3 clearly stand out compared to those in Figs. 4 and 5. The similar patterns for Google Web hits and Google Books show some consistency regarding popularity. The main differences might be explained by the fact that Web hits are almost instantaneous and quickly reflect hot topics, whereas books take some time to appear. This probably explains the relatively high number of hits for data protection on the Web compared to the relatively low number in books. The same is probably true for data privacy. Knowledge Management and Data Protection each have 36 M (million) hits on the Web, and also several hundred thousand books. Then there are Knowledge Retrieval (the combination of Information Retrieval, Knowledge Retrieval and Question Answering) and Data Privacy, each with between 8 and 9 M hits. Curiously, the number of books on Data Privacy (73,000) is much smaller than the number of books on Information Retrieval, probably because Data Privacy is a much more recent concept. Moreover, Information Retrieval has been a popular topic in research and education for many years, which might explain its high number of books compared to the other terms. Of course, Data Protection and Data Privacy are related concepts: data needs to be protected for legal and regulatory reasons, and protection is a prerequisite for privacy. Privacy is a rather ambiguous concept and is much more recent, having gained enormous importance since the rise of social networks such as Facebook and Twitter. A last concept that stands out, with 3.5 M Web hits, is Data Retention (the sum of knowledge retention and data retention). There are “only” 63,000 books on this subject, which reflects its recency.
At a second level there are concepts like knowledge acquisition, knowledge engineering, knowledge level and data deletion (also including the term “the right to be forgotten”). My interpretation is that knowledge engineering, knowledge level and knowledge acquisition reflect activity in the scientific community (“our” community), but knowledge acquisition is also used in the context of education. What does stand out is the number of books referring to knowledge level. This is probably due to high productivity in our community in the past, but also to the term’s “normal” semantics, which refers to how much people know about something (e.g., “The female college freshman’s knowledge level of reproduction and contraception”). Data deletion and the right to be forgotten are hot terms nowadays, with the European Commission trying to protect consumers’ rights to privacy. From a pragmatic perspective, however, it is unclear how to enforce “the right to be forgotten” in the current mobile, socially networked digital world. Finally, the concepts knowledge updating (including non-monotonic reasoning) and knowledge modeling have a limited number of hits. There is, however, a fair number of books on those topics, which is probably due to activity in the scientific community. A high-level interpretation of all those numbers is that information societies and knowledge economies care about finding knowledge and managing it. Once the knowledge is there, they want to protect it, care about their privacy (which might be threatened by it) and worry about the retention of this knowledge/data when it sits in systems beyond their control (the Cloud!). In the following sections, I briefly discuss these knowledge concepts and their importance in the past, today and in the future.
Fig. 1. Number of Google Web hits (in billions) for some concepts important in life.
1.1. Knowledge acquisition (KA)
In the early days of Artificial Intelligence (AI), KA referred to the manual extraction of knowledge from experts, using techniques such as interviews, participative observation, think-aloud protocols, card sorting, laddering tools, etc. The main idea was that knowledge could be extracted, represented and turned into a conceptual or computational model, using for example the KADS methodology (Wielinga and Breuker, 1984). Another term often used was Knowledge Modeling. Later developments also tried to acquire knowledge from first principles, and still later from textbooks. Around 2000 there were several projects that aimed to enable domain experts to model their own knowledge using intelligent tools; HPKB (Cohen et al., 1998) and Project Halo² are examples of this. Nowadays KA still refers to the original idea, but also includes the extraction of knowledge from text and from Web sources. Opinion mining, or sentiment analysis, is a kind of knowledge acquisition: it acquires actionable knowledge about what people think and say about products, services, companies, people, etc. Another example that shows the importance of “modern” KA is that teachers have programs at their disposal to check whether students have copied their assignments from the Internet. Recently, a former PhD student of mine found a PhD thesis at a university in Mexico that was a complete copy of his own PhD thesis.
² http://www.projecthalo.com/
Fig. 2. Number of hits in Google for the most popular concepts related to knowledge (Knowledge management, Knowledge retrieval, Data protection, Data retention, Data privacy).
Fig. 3. Number of hits in Google Books (in thousands) for the most popular knowledge concepts (Knowledge management, Knowledge retrieval, Data protection, Data retention, Data privacy).
Fig. 4. Google Web hits for less popular knowledge concepts (Knowledge Acquisition, Knowledge Engineering, Knowledge Level, Knowledge Updating, Knowledge Modeling, Knowledge Deletion).
Fig. 5. Google Books hits for less popular knowledge concepts (Knowledge Acquisition, Knowledge Engineering, Knowledge Level, Knowledge Updating, Knowledge Modeling, Knowledge Deletion).
1.2. Knowledge engineering (KE)
KE is closely related to KA but gives it a more formal and methodological focus, like software engineering. People working in the field of knowledge modeling and KE are/were often referred to as knowledge engineers.
1.3. Knowledge management (KM)
KM is a very popular topic, mostly due to the management sciences, and was made especially popular by Peter Drucker³.
³ http://en.wikipedia.org/wiki/Peter_Drucker
In our community we have “piggybacked” on this, looking more at the technical aspects (acquiring organizational knowledge, modeling and storing it, and retrieving/distributing it). Knowledge management is still an important concept, but it is very broad and is used to cover (too) many things to be useful.
1.4. Knowledge level
The knowledge level was introduced by Allen Newell (Newell, 1982) as the level above the symbol level, that is, a description of knowledge expressed in a language more abstract than a programming language. The term has been quite relevant in our community, as we were trying to understand human reasoning and to design knowledge-based systems that could be implemented in different programming languages. As evidenced by the many hits and books, it was a very popular concept (in the 1990s). But I am not sure whether it is still in use today.
1.5. Knowledge retrieval
Today information and knowledge retrieval are almost synonymous with Google Search, and many researchers in our community have investigated ways to outperform Google, especially in the early days when Google, apart from its patented PageRank algorithm, mostly used keyword-based retrieval. Our community was very quick to take up the notion of the Semantic Web (Berners-Lee et al., 2001) to go beyond simple keyword retrieval and retrieve knowledge (semantics), or real answers to questions. Several semantic search engines have seen the light during the last 10 years, but none has made an impact even close to Google’s. Over the past 10 years, Google has incorporated a variety of additional technologies into its search engine (some of which are considered part of Semantic Web technology), and it is now able to give answers to some questions. Probably one of the best works so far on question answering is IBM’s Watson⁴, which is a champion at answering factual questions. Indeed, Watson won the popular Jeopardy quiz in the US.
Another relevant initiative is Wolfram Alpha, which is more concerned with answering non-factual questions, i.e., questions that involve some kind of reasoning to come up with an answer. Wolfram Alpha has generated some noise in the press, but it is still far from the wide take-up of Google, even though it is used as one of the underlying technologies in Bing and Siri. Apple’s Siri⁵ (of our “own” Tom Gruber) seems to base its question answering on a pragmatic mixture of intelligent fact retrieval and reasoning.
⁴ http://www.ibmwatson.com/
⁵ http://www.apple.com/ios/siri/
1.6. Knowledge updating, deletion, retention, privacy
Knowledge updating, deletion, retention, and privacy are related concepts, and the success of Google and of social networks (and Web 2.0 in general) has given these areas much more importance than they had 10 or 15 years ago. Google’s search engine crawls the whole Web for information, and Facebook receives billions of user-generated content items per month. When information grows this fast, updating knowledge becomes infeasible; you basically get the whole history of the information. Deletion and retention form another challenging problem in today’s information society. Many people and organizations have already suffered the consequences, as politicians, companies, royal families, and active Facebook users are faced with “Personal Embarrassing Information” (PEI), as opposed to Personally Identifiable Information (PII). And obviously, PEI and PII are at the heart of privacy in the digital world.
1.7. Data protection
I use the term data protection because knowledge protection usually refers to IPR protection, such as patents. As the amount of data, information and knowledge grows at an enormous speed, and more and more of it is stored “in the Cloud”, data protection becomes a key enabler for our modern society. Data, information and knowledge drive modern society but, by the same token, that same data, information and knowledge can also be attacked by people and organizations with dubious intentions.
2. Reflections
Operationalized through Google Web hits, we have seen that acquiring, managing, retrieving and protecting knowledge are key aspects of our modern knowledge economy and information society. I often wonder how much the thought leadership of our Knowledge Community 25 years ago has contributed to this, and whether it is credited for it. If we count the number of companies that use our intellectual property, it does not seem very high. Much of what has been going on in the Semantic Web originated in our community. But whether “our” semantics is going mainstream today is debatable and, in my opinion, an important discussion to have after 25 years of work.
I might, of course, have missed other relevant indicators of our influence.
2.1. The future?
Coming back to Frank Zappa’s quote, knowledge acquisition has only worked on the first part: (Data is not information.) Information is not knowledge. But the quote goes on: Knowledge is not wisdom. Is there any Artificial Intelligence work trying to capture, represent and reason about wisdom? And then: Wisdom is not truth. Who is working on truth acquisition and truth management (and I am not referring to logical truth maintenance systems, with 120,000 Web hits and 35,200 books)? Is this something for the future? And then the quote goes on: Truth is not beauty. Can Artificial Intelligence techniques recognize and reason about beauty? I can imagine image processing techniques automatically detecting beauty in images (photos, paintings, etc.), and there is published work on this topic (Gunes and Piccardi, 2006). Beauty is not love. Do we understand love well enough to represent and reason with it from an AI point of view? There are certainly many matchmaking websites that try to match partners to share their lives (love?). And finally: Love is not music. Music is the best. As I showed at the beginning of this note, music is the second most popular of the terms I looked at. It is therefore not surprising that music is also a popular topic for Artificial Intelligence researchers, as evidenced by 15 million Web hits and 824,000 books for the query “"Artificial Intelligence" music”.
Acknowledgment
I would like to thank Enrico Motta for his improvements on an earlier version of this note.
References
Berners-Lee, T., Hendler, J., Lassila, O., 2001. The semantic web. Scientific American, May 2001, 34–43.
Cohen, P., Schrag, R., Jones, E., Pease, A., Lin, A., Starr, B., Gunning, D., Burke, M., 1998. DARPA high-performance knowledge bases project. AI Magazine 19 (4).
Gunes, H., Piccardi, M., 2006. Assessing facial beauty through proportion analysis by image processing and supervised learning. International Journal of Human-Computer Studies 64 (12), 1184–1199.
Newell, A., 1982. The knowledge level. Artificial Intelligence 18 (1), 87–127.
Wielinga, B.J., Breuker, J.A., 1984. Interpretation of verbal data for knowledge acquisition. In: Proceedings of ECAI, pp. 3–12.