Available online at www.sciencedirect.com
ScienceDirect
www.elsevier.com/locate/procedia
Procedia Computer Science 126 (2018) 1387–1394
22nd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems
Conceptual Modeling supported by Text Analysis
Yasunobu Kino *
* 3-29-1 Otsuka, Bunkyo-ku, Tokyo, 112-0012, Japan
Abstract
Thinking of society as a system is important, and conceptual modeling is a helpful method for analyzing society as a system. Flowcharts, Entity-Relationship Diagrams, and other modeling techniques from software engineering are useful for analyzing and designing society as a system. Because human beings think about and understand the universe and society through natural language, use cases and narrative stories are effective ways to design society as a system. This research discusses the creation of a conceptual model supported by text analysis.
© 2018 The Authors. Published by Elsevier Ltd.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Selection and peer-review under responsibility of KES International.
Keywords: Text Mining, Social System, ER Diagram, Modeling Language, Grammar
1. Introduction
Modeling society as a system is important for creating innovations and inventions in today’s technological society. Flowcharts, Entity-Relationship Diagrams, and other diagramming techniques are useful for describing social matters as systems. To create such diagrams, we need information on the target matter, which is often explained or described in natural language. Researchers and engineers use natural language, tables, and figures when observing and understanding social phenomena, and documents written with them help us understand phenomena of the universe and society. Furthermore, descriptions in natural language, tables, and figures are useful for designing new social matters. In this research, we discuss how to describe a conceptual model supported by text analysis.
* Corresponding author. Tel.: +81-3-3942-6927; fax: +81-3-3942-6921. E-mail address: [email protected]
1877-0509 © 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/) Selection and peer-review under responsibility of KES International.
10.1016/j.procs.2018.08.090
Yasunobu Kino / Procedia Computer Science 126 (2018) 1387–1394 Author name / Procedia Computer Science 00 (2018) 000–000
2. Modeling techniques
There are many modeling techniques based on natural language, such as the KJ method and mind maps. In addition to these, this paper investigates modeling techniques that have been developed and used in software engineering.
To teach a job to a computer, we need to write program code that the computer can recognize. Human beings can understand natural languages such as English or Japanese, but computers cannot be instructed directly in natural language because natural language contains ambiguity. Computers are devices that perform logical operations on the binary digits 0 and 1; therefore, human beings have to instruct them logically and structurally, eliminating ambiguity entirely. However, the jobs we intend to teach computers are usually jobs that have been performed by human beings, so there are rarely clear instructional documents or manuals suitable for computers. Existing documents are written for humans, and their precision is limited enough that even humans sometimes make mistakes. Thus, the content of the instructions must be described logically and precisely, to the standard that a computer requires. The tasks will ultimately be taught to the computer, but before that, humans need to organize and express the tasks logically and then write the program code in a programming language. Various modeling techniques for organizing and designing such tasks have been developed since around the 1960s.
HIPO
Human beings and machines work every day, and whether a job is executed by machines or by humans, it can be described as a process. For example, an automotive manufacturer “makes cars”; here, “making” is an activity. We call this type of activity a process, and a process always has clear inputs and outputs.
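The input-process-output view of a job can be sketched as a small data structure. This is an illustrative sketch, not part of the original method: the class `Process`, the helper `is_chained`, and the car-assembly sub-process names are hypothetical. The `subprocesses` field and the chaining check anticipate the hierarchy and chaining that the HIPO discussion goes on to describe.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Process:
    """A job viewed as input -> process -> output (illustrative sketch)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    # Finer-grained processes; a non-empty list makes this a parent process.
    subprocesses: List["Process"] = field(default_factory=list)

    def leaves(self) -> List["Process"]:
        """Return the most detailed processes in the hierarchy, in order."""
        if not self.subprocesses:
            return [self]
        result: List["Process"] = []
        for sub in self.subprocesses:
            result.extend(sub.leaves())
        return result


def is_chained(steps: List[Process]) -> bool:
    """True if each step's outputs all appear among the next step's inputs."""
    return all(set(a.outputs) <= set(b.inputs) for a, b in zip(steps, steps[1:]))


# Hypothetical decomposition of car assembly into sub-processes.
assemble = Process(
    "assemble car",
    inputs=["body", "engine", "seats", "doors"],
    outputs=["car"],
    subprocesses=[
        Process("install engine", ["body", "engine"], ["body with engine"]),
        Process("attach seats", ["body with engine", "seats"], ["body with seats"]),
        Process("fix doors", ["body with seats", "doors"], ["car"]),
    ],
)

print([p.name for p in assemble.leaves()])  # ['install engine', 'attach seats', 'fix doors']
print(is_chained(assemble.leaves()))        # True
```

Grouping the three leaves under "assemble car" is what gives the structure its hierarchy; reversing the order of the steps breaks the chain because an output no longer feeds the next input.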
An automotive manufacturing process takes raw materials as inputs and produces products as outputs. Every activity in the world can be described as a set of "input - process - output". In SADT (Structured Analysis and Design Technique), in addition to the inputs and outputs, “control” is indicated by an arrow from above and “mechanism” by an arrow from below. A concept similar to process is “function”, but this paper does not distinguish between the two words. Each process is composed of a more detailed series of processes. For example, an automotive assembly process is composed of many sub-processes, such as installing the engine in the body, attaching seats, and fixing doors. In this way, a series of processes can be grouped and described as one higher-level process, which gives processes a hierarchical structure. A process structure with a hierarchy is called HIPO (Hierarchical Input-Process-Output). In many cases, the output of one process becomes the input of the next, so processes are chained; for this reason, a series of processes is often defined as a single process. To summarize, HIPO combines processes that have inputs and outputs with the concept of hierarchical structure. HIPO can also be said to carry a notion of time, because inputs and outputs exist in relation to the time axis.
Flowchart
Flowcharts describe the process aspect, arranged along a time axis. Each process has a clear input and output; here, process has a meaning similar to function. The first flowchart was introduced in 1921 by Frank Gilbreth in "Process Charts, First Steps in Finding the One Best Way". A flow is described using processes, decisions, and other information set along a time axis. A typical flowchart for software design does not include the concept of space or location, but a business process flowchart also describes who, or which organization, will execute each process.
A process is drawn as a rectangle, and a diamond represents a decision. Fig. 1 shows a simplified flowchart. In conclusion, a flowchart is built from the concepts of process and decision along a clear time axis. In addition, processes have a hierarchical structure, as described in HIPO.
Structured programming
From the 1960s, Dijkstra and other researchers proposed the concept of structured programming, in which program source code is written using only three patterns: sequence, selection, and repetition. This coding style improves the maintainability of source code. Structured programming implies that any procedure in our daily activities can be described using only these three patterns.
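The three patterns correspond to the flowchart elements of Fig. 1: a sequence of process boxes, a decision diamond for selection, and a decision that sends control back for repetition. The sketch below is illustrative only; the function name `count_long_answers` and its word-counting task are assumptions chosen to echo the survey data used later. It uses nothing but the three constructs, with no early exits from the loop.

```python
from typing import List


def count_long_answers(answers: List[str], min_words: int) -> int:
    """Count answers containing at least `min_words` words.

    Uses only sequence, selection, and repetition (no early exits).
    """
    # Sequence: these statements run one after another.
    count = 0
    index = 0
    # Repetition: the loop condition is the decision that sends control back.
    while index < len(answers):
        word_count = len(answers[index].split())
        # Selection: a Yes/No branch, like the decision diamond in a flowchart.
        if word_count >= min_words:
            count += 1
        index += 1
    return count


answers = ["English speaking ability.", "The ability to communicate well."]
print(count_long_answers(answers, 4))  # 1
```

The first answer has three words and the second has five, so only one answer passes the threshold of four.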
State diagram
A state diagram focuses on the states of an entity and on how that entity transitions from one state to another. A state is represented by a circle, and a transition is drawn as an arrow; the occurrence that triggers a transition is called an event. Fig. 2 shows the states and transitions of a simple telephone that has only the basic function of calling. Three states are assumed: off, standby, and calling. In the off state, the event of turning on the power causes a transition to the standby state. In the standby state, the event of making or receiving a call causes a transition to the calling state. In general, when an event occurs in a particular state, the status follows the arrow to the next state. The state diagram thus shows the possible states of a single entity, the phone, and how events move it between those states. Events have a concept of time, but they may occur randomly and need not be regular. In summary, a state diagram has a concept of time, but the time axis is less explicit than in a flowchart, and the whole timeline is expressed in compressed form in the diagram.
Entity-Relationship Diagram
The ERD (Entity-Relationship Diagram) was developed by Peter Chen in 1975. This diagram describes entities and the relationships among them. Fig. 3 shows a simplified diagram of the kind used to design relational databases. The class diagram of UML (Unified Modeling Language) can be regarded as an extension of the ERD.
Data Flow Diagram
With the growth of computer technologies, the size of data and the number of data items processed by computers have increased year by year. In addition, maintenance issues arose, such as computer systems duplicating the same data items and storing them in the same database. As a result, the Data Flow Diagram was introduced, which focuses on data-centered rather than process-centered design. There are several variations of Data Flow Diagrams.
In a flowchart, the process is the center of focus and data tends to be treated as subordinate, whereas a Data Flow Diagram centers on the flow of data. The Data Flow Diagram therefore has a data-centric concept as well as the concept of hierarchy. Regarding the time axis, a Data Flow Diagram is a static model in which time is expressed in compressed form.
Fig. 1. Simplified Flow chart (Start, Process, Decision with Yes/No branches, End)
Fig. 2. Simplified State diagram (states Off, Stand-by, and Calling; events Turn on, Turn off, Call, and Break)
Fig. 3. Simplified ERD (Entity, Relationship, Entity)
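The telephone state diagram of Fig. 2 can be sketched as a transition table keyed by (state, event). This is a minimal sketch under stated assumptions: the figure survives only as labels, so which arrow each label belongs to (for example, reading "Break" as ending a call) is inferred, and the "receive" event comes from the text rather than the figure. The function name `run` is hypothetical.

```python
from typing import Dict, Iterable, Tuple

# (current state, event) -> next state, inferred from the labels of Fig. 2.
# Assumption: "break" ends a call; "receive" comes from the text ("receiving a call").
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("off", "turn on"): "standby",
    ("standby", "call"): "calling",
    ("standby", "receive"): "calling",
    ("calling", "break"): "standby",
    ("standby", "turn off"): "off",
}


def run(events: Iterable[str], state: str = "off") -> str:
    """Follow one arrow per event and return the final state.

    An event with no arrow from the current state is rejected.
    """
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {state!r}")
        state = TRANSITIONS[key]
    return state


print(run(["turn on", "call", "break", "turn off"]))  # off
```

The table holds only states and events; the order in which events arrive is supplied at run time, which mirrors how the diagram compresses the time axis.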
Total image of modeling diagram
By examining the different modeling diagrams, we can identify several essential concepts that will be useful when executing text analysis. Fig. 4 shows these concepts. An entity is expressed as a noun, and a state belongs to an entity. A relationship is expressed as a verb. Other important concepts are space (the xyz axes) and the time axis. A process is a change of state, attribute, or relationship.
Fig. 4. Total image of modeling diagram (Entity (noun), Relationship (verb), State, Attribute, and the xyz + time axis)
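The concepts of Fig. 4 can be sketched as a few record types, with a process recorded as a time-stamped change of state, attribute, or relationship. This is an illustrative sketch, not the author's notation: the class and method names are hypothetical, and the space (xyz) axes are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """An entity, typically named by a noun, holding a state and attributes."""
    name: str
    state: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class Relationship:
    """A relationship between two entities, typically named by a verb."""
    subject: str
    verb: str
    obj: str


@dataclass
class Change:
    """One process: a change of state, attribute, or relationship at time t."""
    time: float
    kind: str  # "state", "attribute", or "relationship"
    target: str
    before: str
    after: str


@dataclass
class Model:
    entities: Dict[str, Entity] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)
    history: List[Change] = field(default_factory=list)  # ordered along the time axis

    def add(self, entity: Entity) -> None:
        self.entities[entity.name] = entity

    def set_state(self, name: str, new_state: str, time: float) -> None:
        entity = self.entities[name]
        self.history.append(Change(time, "state", name, entity.state, new_state))
        entity.state = new_state

    def set_attribute(self, name: str, key: str, value: str, time: float) -> None:
        entity = self.entities[name]
        old = entity.attributes.get(key, "")
        self.history.append(Change(time, "attribute", f"{name}.{key}", old, value))
        entity.attributes[key] = value

    def relate(self, subject: str, verb: str, obj: str, time: float) -> None:
        self.relationships.append(Relationship(subject, verb, obj))
        self.history.append(Change(time, "relationship", f"{subject} {verb} {obj}", "", "exists"))


m = Model()
m.add(Entity("student", state="studying"))
m.add(Entity("phone", state="off"))
m.relate("student", "use", "phone", time=1.0)
m.set_state("phone", "standby", time=2.0)
m.set_attribute("student", "English", "improving", time=3.0)
print([c.kind for c in m.history])  # ['relationship', 'state', 'attribute']
```

Entities and relationships form the static part; the `history` list is the dynamic part, which is why both appear in the same picture.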
3. The process of Conceptual Modeling
In this section, we discuss the process of conceptual modeling of society using text analysis. The conceptual modeling process of this research is shown in Fig. 5.
Fig. 5. Process of Conceptual Modeling (START; (1) Create original text from interviews, questionnaire surveys, or existing articles; (2) Create input text; (3) Execute text analysis: frequent keywords, co-occurrence network, reading the sentences; (4) Develop static model: connect each concept or actor as a relationship; (5) Develop dynamic model: arrange each process/function; END)
(1) The first step is to create the original text, which becomes the source of the input data for text analysis. Original text can be created, for example, through interviews, questionnaire surveys, or existing documents. In the case of interviews, the number of interviewees influences research quality; generally, for this type of research, an appropriate number of interviewees is 5 to 20. Interviews are recorded on a voice recorder with the interviewee's permission and then transcribed into text. Alternatively, free comments from questionnaire surveys or existing articles and documents can be used.
(2) Next, the input text is made from the original text, because the original text may contain inappropriate character codes (such as machine-dependent characters), inappropriate CRLF line-break codes, or missing periods. The original text is also corrected to a style that the morphological analysis tool can process; the required style depends on the tool.
(3) The third step is the execution of text analysis. Checking frequent keywords is useful for understanding the meaning of the documents. When checking frequent keywords, it is also important to check their parts of speech, in order to create a conceptual model in the following steps. Every language has many irregular uses of words, and it is difficult to formulate exact rules; however, the following patterns can be found:
(a) Nouns become entities or concepts of the model.
(b) Verbs become the key to relationships or to processes/functions of the model. Note that a Japanese “Sa-hen noun” (a noun that forms a verb with suru) acts as a verb in English.
(c) Adjectives indicate internal attributes of an entity or concept.
It is also crucial to find all actors among the frequent keywords, as this helps in creating a conceptual model. The co-occurrence network helps us understand concepts. Finding meaningful clusters in the co-occurrence network is essential, as a cluster may become a concept in the future model. The clusters found in the co-occurrence network are recorded in a table. Because the quality of the identified concepts is vital, it is also important to read the sentences in which the keywords appear.
(4) In the fourth step, a static model is developed. The static model in this conceptual modeling is based on the Entity-Relationship Diagram and the Class Diagram of UML (Unified Modeling Language), which are useful for designing information systems. This static conceptual model is a simplified version of the two, because our target is to create a conceptual model, not to write a computer program. If a real information system needed to be implemented, an Entity-Relationship Diagram or Class Diagram could be created from the static model. To create a static model, all actors have to be identified. Actors in this conceptual modeling include teams and organizations, not only individuals, and it is important to distinguish between an organization and an individual when drawing a static model.
(5) In the final step, a dynamic model is developed if needed. Events or processes are discovered and set along the time axis. A set of static and dynamic models forms a conceptual model.
4. Sample case for conceptual modeling
In this section, conceptual modeling is explained with a sample case.
4.1. Input data
In this sample case, free-text answers from a questionnaire survey [1] were used. The survey was conducted from April 24 to May 16, 2015. Its aim was to understand the opinions of Japanese high school students on how to become a person who can succeed globally.
The target students were those at advanced high schools designated by the government for a global leadership development program called SGH (Super Global High School). There were 1,911 valid answers from 73 high schools in Japan; 40.2 percent of the respondents were male and 59.8 percent female. In this analysis, we used the free-text answers to the last question: "What kind of skills/abilities would you need in order to succeed globally in the future? To acquire those skills, what kind of education would you like to receive?". A total of 1,756 valid answers were used in the text analysis. Some examples of answers are:
- English speaking ability.
- To be able to explain about my country's culture and to understand other countries' cultures and differences.
- I wish there would be a class where you can discuss international issues and make presentations.
- The ability to understand/respect each other's values/mindset. The ability to communicate well.
- The ability to detect issues in my own country and to be able to analyze and create solutions.
KH Coder [2] version 2.00e was used as the text analysis tool, and this study draws on earlier research [3,4,5,6] that used text analysis techniques. The analytical steps were as follows.
(1) Translation: The first step was to translate the original Japanese text into English. "The Honyaku Professional V15" was used as the translation tool, and in this sample case, translation was performed with this tool alone. Text produced solely by machine translation can be hard to comprehend, but we believe the quality was adequate for text analysis in this case, because keywords were identified from the sentences after translation.
(2) Extract Keywords: In the second step, we extracted keywords with KH Coder and the Stanford POS Tagger. 17 stop words, such as A, D, and E, were filtered out because they were meaningless in this analysis.
(3) Check Frequent Keywords: In the third step, we picked out important keywords and candidate keywords indicating actors in this theme. Details are described in section 4.2.
(4) Check Co-occurrence Network: In the fourth step, we identified concepts in the co-occurrence network. Details are described in section 4.3.
(5) Create Static Model: Finally, we created a model that connects the identified concepts and important keywords. Details are described in section 4.4.
4.2. Frequent Keywords
Table 1 shows the top 25 frequent keywords for each part of speech: Noun, Proper Noun, Adjective, Adverb, and Verb.
Table 1. Frequent Keywords (keyword and frequency)
Noun                Proper Noun         Adjective           Adverb              Verb
capability     788  English       1002  foreign        570  positively     234  think         1871
country        720  Communication  420  various        436  globally       138  require        824
language       544  Japan          132  important      376  just           116  like           533
power          532  Education       92  english        299  actually       115  speak          410
education      529  ALT             24  active         197  abroad          87  receive        382
culture        477  First           23  able           153  overseas        86  understand     314
person         453  SGH             13  japanese       144  firmly          81  study          282
opinion        440  Chinese         10  good           141  especially      40  make           278
skill          412  Japanese        10  linguistic     138  correctly       31  consider       276
lesson         406  United          10  necessary      123  clearly         30  know           248
people         405  Positiveness     9  overseas       121  mutually        28  carry          237
thing          398  Proficiency      9  global         109  exactly         27  use            207
order          375  Society          9  different       87  flexibly        27  play           195
communication  338  Testing          9  high            76  intelligibly    27  want           188
partner        286  States           8  present         66  deeply          25  say            183
opportunity    264  Educational      7  international   60  merely          21  talk           181
ability        262  Lesson           6  indispensable   58  usually         19  tell           169
world          248  Vocabulary       5  large           57  appropriately   18  learn          165
school         219  Christmas        4  educational     50  fully           16  come           149
command        210  Exchange         4  possible        48  smoothly        16  acquire        136
idea           180  Internet         4  better          42  nearly          15  increase       111
problem        177  TOEIC            4  certain         39  directly        14  perform        107
foreigner      176  Acquisition      3  interested      38  probably        14  hear           105
study          169  Asia             3  mutual          37  currently       13  regard         104
purpose        166  Britain          3  positive        36  fluently        13  accept         103
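Counting keywords per part of speech, as in Table 1, and attaching the candidate model roles of patterns (a) to (c) in Section 3 can be sketched as follows. This is a sketch under assumptions, not KH Coder's implementation: tokens are assumed to arrive already tagged and lemmatized as (lemma, tag) pairs using Penn Treebank tags (the tag set the Stanford POS Tagger emits for English), and the stop-word set and sample tokens are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def category(tag: str) -> Optional[str]:
    """Map a Penn Treebank tag to a Table 1 column (None = not counted)."""
    if tag.startswith("NNP"):
        return "Proper Noun"
    if tag.startswith("NN"):
        return "Noun"
    if tag.startswith("JJ"):
        return "Adjective"
    if tag.startswith("RB"):
        return "Adverb"
    if tag.startswith("VB"):
        return "Verb"
    return None


# Candidate model roles following patterns (a)-(c); adverbs have no rule.
ROLE_BY_CATEGORY = {
    "Noun": "entity or concept",
    "Proper Noun": "entity or concept",
    "Verb": "relationship or process/function",
    "Adjective": "attribute of an entity or concept",
}


def top_keywords(
    tokens: Iterable[Tuple[str, str]], stop_words: Set[str], n: int = 25
) -> Dict[str, List[Tuple[str, int]]]:
    """Count (lemma, tag) tokens per category and keep the n most frequent."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for lemma, tag in tokens:
        cat = category(tag)
        if cat is None or lemma.lower() in stop_words:
            continue
        counts[cat][lemma] += 1
    return {cat: counter.most_common(n) for cat, counter in counts.items()}


# Hypothetical pre-tagged, lemmatized tokens.
tokens = [("English", "NNP"), ("speak", "VB"), ("ability", "NN"),
          ("understand", "VB"), ("culture", "NN"), ("different", "JJ"),
          ("culture", "NNS"), ("A", "NNP")]
top = top_keywords(tokens, stop_words={"a"})
print(top["Noun"])  # [('culture', 2), ('ability', 1)]
for cat, words in top.items():
    print(cat, "->", ROLE_BY_CATEGORY.get(cat, "no rule"), [w for w, _ in words])
```

The stop word "A" is dropped before counting, and the plural "cultures" (tag NNS, already lemmatized) is counted together with the singular.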
We checked each keyword in the following steps:
- We picked out interesting keywords from the frequent keywords in Table 1. For nouns, the keywords were example, capability, language, culture, opinion, and so on.
- Next, we picked keywords that may be considered actors in this theme. For nouns, keywords such as person, people, partner, and foreigner were picked. Proper noun keywords were ALT (Assistant Language Teacher), Chinese, Japanese, Society, and Asia. Keywords that can be considered actors tend to be nouns and proper nouns.
- We then checked essential verbs that describe actions or relationships, for example, think, receive, understand, and know.
4.3. Co-occurrence Network
Next, we checked the co-occurrence network in Fig. 3. From this figure, we identified the following concepts and keywords.
(a) ability to make presentations: make, person
(b) ability to talk and hear: hear, talk
(c) ability to speak English: speak, ability, language, English
(d) communication skill: communication, skill, require
(e) understanding culture and understanding the partner's opinion: understand, culture, country, partner, opinion
Fig. 3. Co-occurrence Network (clusters (a) to (e) are marked on the network)
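A co-occurrence network of this kind can be sketched by linking two words whenever their Jaccard coefficient (answers containing both divided by answers containing either) reaches a threshold, then reading clusters off the network. This sketch does not claim to reproduce the settings behind Fig. 3: the threshold and the keyword sets are hypothetical, and connected components are used as a simple stand-in for the subgraph (community) detection that KH Coder offers.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def jaccard_edges(docs: List[Set[str]], min_jaccard: float) -> Dict[Edge, float]:
    """Link two words when |docs with both| / |docs with either| >= min_jaccard."""
    occurs: Dict[str, Set[int]] = {}
    for i, words in enumerate(docs):
        for word in words:
            occurs.setdefault(word, set()).add(i)
    edges: Dict[Edge, float] = {}
    for a, b in combinations(sorted(occurs), 2):
        both = len(occurs[a] & occurs[b])
        if both == 0:
            continue
        score = both / len(occurs[a] | occurs[b])
        if score >= min_jaccard:
            edges[(a, b)] = score
    return edges


def clusters(edges: Dict[Edge, float]) -> List[Set[str]]:
    """Connected components of the thresholded network."""
    adjacent: Dict[str, Set[str]] = {}
    for a, b in edges:
        adjacent.setdefault(a, set()).add(b)
        adjacent.setdefault(b, set()).add(a)
    seen: Set[str] = set()
    components: List[Set[str]] = []
    for start in sorted(adjacent):
        if start in seen:
            continue
        component: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacent[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical keyword sets, one per answer.
docs = [
    {"speak", "English", "ability"},
    {"speak", "English", "language"},
    {"understand", "culture", "partner", "opinion"},
    {"understand", "culture", "country"},
    {"hear", "talk"},
]
edges = jaccard_edges(docs, min_jaccard=0.5)
for group in clusters(edges):
    print(sorted(group))
```

Each printed group is only a candidate concept; as step (3) of Section 3 notes, the sentences containing the keywords still need to be read before a cluster is accepted.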
4.4. Conceptual Static Model
Based on the frequent keywords and the co-occurrence network, we created the model shown in Fig. 4. In Fig. 4, concepts (1) and (2) indicate actors, and concepts (3), (4), and (5) indicate skills or abilities that students would like to improve. The details of Fig. 4 are as follows:
(1) The key actor, as an individual (I) or an individual belonging to an organization, for example, Japanese or Society.
(2) Another type of actor: a partner or the partner's organization, for example, Chinese, Asia, and so on.
These actors were identified in section 4.2. In Table 1, the word ALT (Assistant Language Teacher) appears as an actor candidate. However, since an ALT acts as a teacher, it belongs to neither of the two organizations shown in Fig. 4, so it was not included in this model.
(3) English communication skill. This concept connects with (c) and (d) in section 4.3.
(4) Presentation skill. This concept connects with (a) in section 4.3.
(5) Hearing and understanding skill. This concept connects with (b) and (e) in section 4.3.
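The static model described above can be written down as plain data and checked mechanically: memberships must run from an individual to an organization (the distinction stressed in step (4) of Section 3), and every co-occurrence cluster from Section 4.3 should be accounted for by some concept. This is a hypothetical encoding: the organization names are placeholders, and linking every skill between the individual and the partner is an assumption, since the arrows of Fig. 4 cannot be recovered.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Actor:
    name: str
    kind: str  # "individual" or "organization"
    attributes: Tuple[str, ...] = ()


# Actors (1) and (2); the organization names are placeholders.
ACTORS: Dict[str, Actor] = {
    "I": Actor("I", "individual", ("culture", "opinion")),
    "own organization": Actor("own organization", "organization"),
    "partner": Actor("partner", "individual", ("culture", "opinion")),
    "partner's organization": Actor("partner's organization", "organization"),
}
BELONGS_TO: List[Tuple[str, str]] = [
    ("I", "own organization"),
    ("partner", "partner's organization"),
]

# Skill concepts (3)-(5) and the co-occurrence clusters of Section 4.3 they draw on.
SKILLS: Dict[str, Set[str]] = {
    "English communication": {"c", "d"},
    "presentation skill": {"a"},
    "hearing and understanding skill": {"b", "e"},
}
# Assumption: every skill relates the individual to the partner.
LINKS: List[Tuple[str, str, str]] = [("I", skill, "partner") for skill in SKILLS]


def membership_ok(actors: Dict[str, Actor], belongs_to: Iterable[Tuple[str, str]]) -> bool:
    """Each membership must run from an individual to an organization."""
    return all(
        actors[member].kind == "individual" and actors[group].kind == "organization"
        for member, group in belongs_to
    )


def uncovered_clusters(skills: Dict[str, Set[str]], all_clusters: Iterable[str]) -> Set[str]:
    """Clusters from the co-occurrence network that no concept accounts for."""
    covered: Set[str] = set().union(*skills.values())
    return set(all_clusters) - covered


print(membership_ok(ACTORS, BELONGS_TO))    # True
print(uncovered_clusters(SKILLS, "abcde"))  # set()
```

An empty result from `uncovered_clusters` means each of the clusters (a) to (e) is reflected in the model, which matches the mapping given in items (3) to (5).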
Fig. 4. Sample of Conceptual Model (actors (1) the individual (I) and (2) the partner, each shown with culture and opinion and within the organization they belong to, linked by (3) English communication, (4) presentation skill, and (5) hearing ability)
5. Conclusion
Conceptual models help humans understand phenomena such as social behavior. A well-crafted model deepens our understanding of phenomena, and models are also useful when we design new things. In this research, we created a static conceptual model supported by text analysis. In doing so, we drew on modeling techniques from software engineering. In this case, the original text was in Japanese, and we translated it into English by machine translation. Although English and Japanese belong to different language families, the text analysis method tended to overcome this difference for the purpose of creating a conceptual model. This research examined only one case; the conceptual modeling process should be tested with other cases and refined.
References
[1] Yasunobu Kino (2016) "Skills for Global Human Resources and Structure Recognized by High School Students." Oukan 10 (2): 116-123.
[2] Koichi Higuchi (2014) "Syakai Cyosa no tameno Keiryo Text Bunseki" [Quantitative Text Analysis for Social Research]. Nakanishiya syuppan.
[3] Daisuke Toyoshima, Yasunobu Kino, et al. (2016) "Extraction and Trend Analysis of the Word "Risk" in Journals and Proceedings." Journal of the Society of Project Management 18 (4): 20-25.
[4] Kiyoshi Sakamori, Kiyomi Miyoshi (2016) "Outlook of the program design for Project Management Professional School: Based on Text Mining Application." Bulletin of Advanced Institute of Industrial Technology (10): 79-84.
[5] Hiromi Asano, Koji Tanaka, Yoshikatsu Fujita, Kazuhiko Tsuda (2015) "Study on hiring decision: analyzing rejected applications by mining individual job placement data of Public Employment Security Offices." Procedia Computer Science 60: 1156-1163.
[6] Yasunobu Kino, Hiroshi Kuroki, Tomomi Machida, Norio Furuya, Kanako Takano (2017) "Text Analysis for Job Matching Quality Improvement." Procedia Computer Science 112: 1523-1530.