Conceptual Modeling supported by Text Analysis


Available online at www.sciencedirect.com

www.elsevier.com/locate/procedia

Procedia Computer Science 126 (2018) 1387–1394

22nd International Conference on Knowledge-Based and Intelligent Information & Engineering Systems

Yasunobu Kino *

* 3-29-1 Otsuka, Bunkyo-ku, Tokyo, 112-0012, Japan

Abstract

To think of society as a system is important. Conceptual modeling is a helpful method to analyze society as systems. Flow charts, Entity-Relationship Diagrams and other modeling techniques in software engineering are useful for analyzing and designing society as systems. As we human beings think about and understand the universe and society using natural language, use cases and narrative stories are effective ways to design society as systems. This research will discuss the creation of a conceptual model supported by the use of text analysis.

© 2018 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/). Selection and peer-review under responsibility of KES International.

Keywords: Text Mining, Social System, ER Diagram, Modeling Language, Grammar

1. Introduction

Modeling society as a system is important for creating innovations and inventions in today's technological society. Flowcharts, Entity-Relationship Diagrams, and other diagramming techniques are useful for describing social matters as systems. To create diagrams, we need information on the target matter, which is often explained or described in natural language. Researchers and engineers use natural languages, tables, and figures when observing and understanding social phenomena. Documents described using natural languages, tables, and figures help our understanding of phenomena of the universe and society. Furthermore, descriptions by natural language, tables, and figures are useful for designing new social matters. In this research, we will discuss how to describe a conceptual model supported by the use of text analysis.

* Corresponding author. Tel.: +81-3-3942-6927; fax: +81-3-3942-6921. E-mail address: [email protected]

1877-0509 © 2018 The Authors. Published by Elsevier Ltd. 10.1016/j.procs.2018.08.090


2. Modeling techniques

There are many modeling techniques that are created using natural languages, such as the KJ method and the Mind map. In addition to the techniques above, this paper will investigate modeling techniques that have been considered and used in software engineering.

In order to teach jobs to a computer, we need to develop program code that the computer can recognize. Human beings can understand natural languages like English or Japanese, but computers cannot be instructed directly in natural languages, since natural languages contain ambiguity. Computers are devices that perform logical operations using binary numbers of 0 and 1; therefore human beings have to give instructions logically and structurally, entirely eliminating ambiguity. However, the tasks we intend to teach computers are usually tasks executed by human beings, so there are often no clear instructional documents or manuals that are appropriate for teaching computers. Even humans sometimes make mistakes at the accuracy level of existing documents created for humans. Thus, it is necessary to describe the content of the instructions logically and precisely, to the standard of the computer's recognition. The tasks will ultimately be taught to the computer, but before that, humans need to write the program code in a programming language, and the tasks need to be organized and expressed logically. As methods of organizing and designing these tasks, various modeling techniques have been developed since around the 1960s.

HIPO

Human beings and machines work daily, and whether executed by machines or humans, their jobs can be described as processes. For example, an automotive manufacturer "makes cars", and in this case, the "making" is an activity. We call this type of activity a process, which always has clear inputs and outputs. An automotive manufacturing process is performed with inputs of raw materials and outputs of products. Every activity in the world can be described with a set of "input - process - output". In SADT (Structured Analysis and Design Technique), in addition to the inputs and outputs, "control" is indicated by an arrow from above, and "mechanism" by an arrow from below. A concept similar to process is "function", but this paper will not distinguish between the two words.

Each process is composed of a more detailed series of processes. For example, an automotive assembly process is composed of many sub-processes such as installing the engine in the body, attaching seats, and fixing doors. In this way, a series of processes can be gathered and described as one higher-level process, which gives processes a hierarchical structure. A process structure with a hierarchy is called HIPO (Hierarchical Input-Process-Output). In many cases, an output of one process will be an input of the next process, so the processes are chained. For this reason, a series of processes is often defined as a single process. To summarize, HIPO has the concepts of a process with inputs and outputs and of a hierarchical structure. It can also be said that HIPO has a concept of time, as inputs and outputs exist in relation to the time axis.

Flowchart

Flowcharts describe processes arranged along a time axis. Each process has a clear input and output. Here, the process has a similar meaning to function. The first flowchart was introduced in 1921 by Frank Gilbreth in "Process Charts, First Steps in Finding the One Best Way". A flow is described using processes, decisions, and other information set along a time axis. A typical flowchart for software design does not include the concept of space or location, but a business process flowchart includes information on who or which organization will execute each process. A process is drawn as a rectangle, and a diamond represents a decision. Fig.1 shows a simplified flowchart. In conclusion, a flowchart is developed using the concepts of process and decision and a clear time axis. In addition, processes have a hierarchical structure as described in HIPO.

Structured programming

In the 1960s, Dijkstra and other researchers proposed the concept of structured programming, which builds program source code from three patterns: sequence, selection, and repetition. This coding style improves the maintainability of source code. Structured programming implies that everything in our daily activities can be described using only these three patterns.
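As a minimal sketch of the three patterns, the Python function below combines sequence, selection, and repetition. The function name `classify_numbers` and its input are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def classify_numbers(values):
    """Count even and odd integers using only the three structured patterns."""
    # Sequence: statements are executed one after another.
    evens = 0
    odds = 0
    # Repetition: the loop body runs once for each input value.
    for value in values:
        # Selection: exactly one of the two branches is taken.
        if value % 2 == 0:
            evens += 1
        else:
            odds += 1
    return evens, odds


print(classify_numbers([1, 2, 3, 4, 5]))  # prints (2, 3)
```

The same three building blocks correspond to the straight flow, the decision diamond, and the loop-back arrow of a flowchart.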




State diagram

A State diagram is described by focusing on the state of an entity and how that entity transitions from one state to another. A state is represented by a circle, and a transition is drawn with an arrow. The trigger of a transition is called an event. Fig.2 expresses the states and transitions of a simple telephone with only the basic function of calling. Here, three states are assumed: off state, standby state, and calling state. During the off state, if there is an event of turning on the power, the phone transitions to the standby state. In the standby state, the event of making or receiving a phone call will transition the phone to the calling state. As described, when an event occurs in a particular state, the entity follows the arrow to the next state. The State diagram shows the possible states of a single entity, the phone, and how events move it between those states. Events have a concept of time, but events may occur at random times and need not be regular. In summary, the State diagram has a concept of time, but the time axis is not as explicit as in a flowchart, and the total time is expressed in compressed form in the diagram.

Entity-Relationship Diagram

ERD (Entity-Relationship Diagram) was developed by Peter Chen in 1975. This diagram is described using entities and the relationships between them. Fig.3 shows a simplified diagram of the kind used for designing a relational database. The class diagram of UML (Unified Modeling Language) is regarded as an expanded form of ERD.

Data Flow Diagram

With the growth and development of computer technologies, the data size and the number of data items processed by computers have increased year by year. In addition, there have been maintenance issues such as a computer system duplicating the same data items and storing them in the same database. As a result, the Data Flow Diagram was introduced, which focuses on data-centered design rather than process. There are variations of Data Flow Diagrams. In a flowchart, the process is the center of focus and data tends to be expressed as subordinate. A Data Flow Diagram, however, centers on data flow. The Data Flow Diagram has a data-centric concept and also the concept of hierarchy. Regarding the time axis, the Data Flow Diagram is a static model with time expressed in compressed form.

Fig. 1. Simplified Flow chart (elements: Start, Process, Decision with Yes/No branches, End)

Fig. 2. Simplified State diagram (states: Off, Stand-by, Calling; events: Turn on, Call, Break, Turn off)

Fig. 3. Simplified ERD (elements: Entity, Relationship, Entity)
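The telephone state diagram of Fig. 2 can be sketched as a transition table. This is only an illustration: the text describes the "Turn on" and "Call" events, while the source and target states assigned to "Break" and "Turn off" below are assumptions read from the figure labels. The names `TRANSITIONS`, `next_state`, and `run` are hypothetical.

```python
# Transition table for the simple telephone of Fig. 2.
# Keys are (current state, event); values are the next state.
TRANSITIONS = {
    ("off", "turn on"): "standby",
    ("standby", "call"): "calling",   # making or receiving a call
    ("calling", "break"): "standby",  # assumption: ending a call returns to standby
    ("standby", "turn off"): "off",   # assumption: power-off is allowed from standby
}


def next_state(state, event):
    """Return the state reached from `state` when `event` occurs.

    An event with no arrow from the current state leaves the state unchanged.
    """
    return TRANSITIONS.get((state, event), state)


def run(events, state="off"):
    """Apply a sequence of events and return the list of visited states."""
    visited = [state]
    for event in events:
        state = next_state(state, event)
        visited.append(state)
    return visited


print(run(["turn on", "call", "break", "turn off"]))
# prints ['off', 'standby', 'calling', 'standby', 'off']
```

Note that the table carries no timestamps: events may arrive at any time, which reflects the compressed time axis of a State diagram.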

Total image of modeling diagram

By examining different modeling diagrams, we can identify some essential concepts that will be useful when we execute text analysis. Fig.4 shows those crucial concepts. An Entity is expressed as a noun, and a State belongs to an entity. A Relationship is expressed as a verb. Other important concepts are the space (xyz axes) and the time axis. A Process is a change of state, attribute, or relationship.

[Figure elements: Entity (Noun), Relationship (Verb), State, Attribute, xyz + time axis]

Fig. 4. Total image of modeling diagram
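The correspondence in Fig. 4 between parts of speech and model elements can be sketched as a simple lookup. The noun and verb rows follow Fig. 4; the adjective row follows the pattern stated in Section 3. The names `POS_TO_ELEMENT` and `to_model_elements`, the lower-case tag labels, and the sample words are assumptions made for illustration.

```python
# Part of speech -> kind of model element it usually suggests.
POS_TO_ELEMENT = {
    "noun": "entity",
    "verb": "relationship",
    "adjective": "attribute",
}


def to_model_elements(tagged_words):
    """Group (word, part_of_speech) pairs by the model element they suggest."""
    elements = {}
    for word, pos in tagged_words:
        kind = POS_TO_ELEMENT.get(pos)
        if kind is None:
            continue  # other parts of speech are not mapped in this sketch
        elements.setdefault(kind, []).append(word)
    return elements


sample = [
    ("person", "noun"),
    ("understand", "verb"),
    ("culture", "noun"),
    ("foreign", "adjective"),
    ("positively", "adverb"),
]
print(to_model_elements(sample))
# prints {'entity': ['person', 'culture'], 'relationship': ['understand'], 'attribute': ['foreign']}
```

Such a lookup only proposes candidates; irregular word usage means each candidate still has to be checked against the sentences it came from.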


3. The process of Conceptual Modeling

In this section, we will discuss the process of conceptual modeling for society using text analysis. The process of conceptual modeling in this research is shown in Fig.5.

[Figure: START -> (1) Create original text (interview, questionnaire survey, existing articles) -> Original Text -> (2) Create input text -> Input Text -> (3) Execute Text Analysis (frequent keywords, co-occurrence network, read the sentence) -> (4) Develop Static Model (connect each concept or actor as a relationship) -> (5) Develop Dynamic Model (array each process/function) -> END]

Fig. 5. Process of Conceptual Modeling

(1) The first step is to create text, which will be the input data for text analysis. Text can be created, for example, from interviews, questionnaire surveys, or existing documents. In the case of interviews, the number of interviewees will influence research quality. Generally for this type of research, the appropriate number of interviewees is from 5 to 20. Interviews are recorded on a voice recorder with the permission of the interviewee. After the interview, the recordings are transcribed into text. In other cases, free comments from questionnaire surveys, articles, or documents can be used.

(2) Next, input data is made from the original text, because the original text may use inappropriate character codes such as machine-dependent characters, may contain inappropriate CRLF codes, or may lack periods. The original text is also corrected to a style that can be analyzed by morphological analysis tools. The required style depends on the tool.

(3) The third step is the execution of text analysis. Checking frequent keywords is useful for understanding the meaning of the document. When checking frequent keywords, it is also important to check parts of speech in order to create a conceptual model in the following steps. There are many irregular uses of words in a language and it is difficult to formulate exact rules; however, we can find the following patterns:
(a) Nouns will be an entity or concept of the model.
(b) Verbs will be a key of relationship or process/function of the model. Note that the Japanese "Sa-hen noun" (Sa-hen verb) acts as a verb in English.
(c) Adjectives will indicate an internal attribute of an entity or concept.
Also, it is crucial to find all actors among the frequent keywords, as this will help to create a conceptual model. A co-occurrence network will help with our understanding of concepts. Finding meaningful clusters in a co-occurrence network is essential, as a cluster may become a concept in the future model. The clusters in a co-occurrence network will be identified in a table. Additionally, the quality of the identified concepts is vital; therefore it is important to read the sentences in which the keywords appear.

(4) In the fourth step, a static model is developed. The static model of this conceptual modeling is based on the Entity-Relationship Diagram and the Class Diagram of UML (Unified Modeling Language), which are useful when we design an information system. This static conceptual model is a simplified version of the Entity-Relationship Diagram and Class Diagram, because our target is to create a conceptual model and not to write a computer program. If we want to implement a real information system, we can create an Entity-Relationship Diagram or Class Diagram from our static model. To create a static model, all actors have to be identified. Actors in this conceptual modeling include teams and organizations, not only individuals. It is important to distinguish between an organization and an individual when we draw a static model.

(5) In the final step, a dynamic model is developed if needed. Events or processes are discovered and set along the time axis. A set of static models and dynamic models becomes a conceptual model.

4. Sample case for conceptual modeling

In this section, conceptual modeling is explained with a sample case.

4.1. Input data

In this sample case, free text answers from questionnaire surveys[1] were used. The questionnaire survey was conducted from April 24 to May 16 in 2015. The aim of this survey was to understand the opinions of Japanese high school students on how to become a person who will succeed globally.
The target students of this survey were students of advanced educational high schools nominated by the government to enroll in a global leadership development program called SGH (Super Global High School). There were 1,911 effective answers from 73 high schools in Japan, with the students consisting of 40.2 percent male and 59.8 percent female. The free text answer analyzed here came from the last question of the survey. The question was, "What kind of skills/abilities would you need in order to succeed globally in the future? To acquire those skills, what kind of education would you like to receive?". 1,756 effective answers were used in this text analysis. The following are some examples of answers.
- English speaking ability.
- To be able to explain about my country's culture and to understand other countries' cultures and differences.
- I wish there would be a class where you can discuss international issues and make presentations.
- The ability to understand/respect each other's values/mindset. The ability to communicate well.
- The ability to detect issues in my own country and to be able to analyze and create solutions.

KH Coder[2] version 2.00e was used as the text analysis tool, and this study draws on prior research[3,4,5,6] using text analysis techniques. The following are the analytical steps in this analysis.

(1) Translation: The first step was to translate the original text into English. "The Honyaku Professional V15" was used as the translation tool, and in this sample case, the translation was executed only with this tool. It is difficult to comprehend translated text when relying solely on machine translation, but we believe that the quality in this case was passable for text analysis, because we identified keywords from sentences after translation.

(2) Extract Keywords: In the second step, we extracted keywords through "KH Coder" using the "Stanford POS Tagger". 17 stop keywords like A, D, and E, were filtered since those words were meaningless in this analysis.


(3) Check Frequent Keywords: In the third step, we picked up some important keywords and candidate keywords that indicated actors of this theme. Details are described in section 4.2.

(4) Check Co-occurrence Network: In the fourth step, we identified some concepts in the co-occurrence network. Details are described in section 4.3.

(5) Create Static Model: Finally, we created a model that connected the identified concepts and important keywords. Details are described in section 4.4.

4.2. Frequent Keywords

Table 1 shows the top 25 frequent keywords divided into the categories of Noun, Proper Noun, Adjective, Adverb, and Verb.

Table 1. Frequent Keywords

Noun          Count  Proper Noun   Count  Adjective     Count  Adverb        Count  Verb          Count
capability      788  English        1002  foreign         570  positively      234  think          1871
country         720  Communication   420  various         436  globally        138  require         824
language        544  Japan           132  important       376  just            116  like            533
power           532  Education        92  english         299  actually        115  speak           410
education       529  ALT              24  active          197  abroad           87  receive         382
culture         477  First            23  able            153  overseas         86  understand      314
person          453  SGH              13  japanese        144  firmly           81  study           282
opinion         440  Chinese          10  good            141  especially       40  make            278
skill           412  Japanese         10  linguistic      138  correctly        31  consider        276
lesson          406  United           10  necessary       123  clearly          30  know            248
people          405  Positiveness      9  overseas        121  mutually         28  carry           237
thing           398  Proficiency       9  global          109  exactly          27  use             207
order           375  Society           9  different        87  flexibly         27  play            195
communication   338  Testing           9  high             76  intelligibly     27  want            188
partner         286  States            8  present          66  deeply           25  say             183
opportunity     264  Educational       7  international    60  merely           21  talk            181
ability         262  Lesson            6  indispensable    58  usually          19  tell            169
world           248  Vocabulary        5  large            57  appropriately    18  learn           165
school          219  Christmas         4  educational      50  fully            16  come            149
command         210  Exchange          4  possible         48  smoothly         16  acquire         136
idea            180  Internet          4  better           42  nearly           15  increase        111
problem         177  TOEIC             4  certain          39  directly         14  perform         107
foreigner       176  Acquisition       3  interested       38  probably         14  hear            105
study           169  Asia              3  mutual           37  currently        13  regard          104
purpose         166  Britain           3  positive         36  fluently         13  accept          103
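Per-category counts like those in Table 1 can be sketched with the Python standard library. This is not how KH Coder or the Stanford POS Tagger work internally: the sketch assumes the text has already been tagged and arrives as (word, part of speech) pairs. The function name `frequent_keywords`, the tag labels, and the sample tokens are hypothetical.

```python
from collections import Counter, defaultdict


def frequent_keywords(tagged_tokens, stop_words=frozenset(), top_n=25):
    """Return the top_n most frequent words for each part of speech.

    `stop_words` is assumed to hold lower-case words. Case is otherwise
    preserved, since Table 1 lists "English" (proper noun) and "english"
    (adjective) separately.
    """
    counts = defaultdict(Counter)
    for word, pos in tagged_tokens:
        if word.lower() in stop_words:
            continue  # stop keywords are filtered before counting
        counts[pos][word] += 1
    return {pos: counter.most_common(top_n) for pos, counter in counts.items()}


tokens = [
    ("English", "ProperNoun"),
    ("capability", "Noun"),
    ("speak", "Verb"),
    ("English", "ProperNoun"),
    ("a", "Other"),
    ("culture", "Noun"),
    ("capability", "Noun"),
]
table = frequent_keywords(tokens, stop_words={"a"})
print(table["Noun"])  # prints [('capability', 2), ('culture', 1)]
```

Keeping the counts separated by part of speech is what later allows nouns to be screened as actor candidates and verbs as relationship candidates.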




We checked each keyword in the following steps:
- We picked up some interesting keywords from the frequent keywords in Table 1. For nouns, these were, for example, capability, language, culture, and opinion.
- Next, we picked keywords which may be considered as actors in this theme. For nouns, keywords such as person, people, partner, and foreigner were picked. Proper noun keywords were ALT (Assistant Language Teacher), Chinese, Japanese, Society, and Asia. Keywords that can be considered as actors tend to be nouns and proper nouns.
- We then checked essential verbs that describe actions or a relationship, for example, think, receive, understand, and know.

4.3. Co-occurrence Network

Next we checked the co-occurrence network in Fig.3. From this figure, we could identify the following concepts and keywords.
(a) ability to make presentation: make, person
(b) ability to talk and hear: hear, talk
(c) ability to speak English: speak, ability, language, English
(d) communication skill: communication, skill, require
(e) understand culture, and understand partner's opinion: understand, culture, country, partner, opinion

Fig. 3. Co-occurrence Network
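The edges of a co-occurrence network can be sketched as follows. The paper does not state how edge strength was computed; this sketch assumes the Jaccard coefficient over sentences, which is one common choice. The function name `cooccurrence_edges`, the threshold, and the sample sentences are hypothetical.

```python
from itertools import combinations


def cooccurrence_edges(sentences, min_jaccard=0.5):
    """Return {(word_a, word_b): jaccard} for word pairs that co-occur.

    `sentences` is a list of word lists. Two words co-occur when they
    appear in the same sentence.
    """
    # Map each word to the set of sentence indices in which it appears.
    occurrences = {}
    for index, words in enumerate(sentences):
        for word in set(words):
            occurrences.setdefault(word, set()).add(index)

    edges = {}
    for a, b in combinations(sorted(occurrences), 2):
        shared = occurrences[a] & occurrences[b]
        if not shared:
            continue
        # Jaccard coefficient: shared sentences / sentences containing either word.
        jaccard = len(shared) / len(occurrences[a] | occurrences[b])
        if jaccard >= min_jaccard:
            edges[(a, b)] = jaccard
    return edges


sentences = [
    ["speak", "English", "ability"],
    ["speak", "English"],
    ["understand", "culture"],
    ["understand", "culture", "opinion"],
]
print(cooccurrence_edges(sentences, min_jaccard=0.6))
# prints {('English', 'speak'): 1.0, ('culture', 'understand'): 1.0}
```

Groups of words joined by strong edges are the candidates for clusters such as (a) to (e); reading the underlying sentences remains necessary to decide whether a cluster is meaningful.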


4.4. Conceptual Static Model

Based on the frequent keywords and the co-occurrence network, we created the model seen in Fig.4. In Fig.4, concepts (1) and (2) indicate actors, and concepts (3), (4), and (5) indicate skills or abilities which students would like to improve. Details of Fig.4 are as follows:
(1) Important actor as an individual (I) or an individual belonging to an organization, for example, Japanese or Society.
(2) Another type of actor as a partner or the partner's organization, for example, Chinese, Asia, and so on. These actors were identified in section 4.2. In Table 1, the word ALT (Assistant Language Teacher) appears as an actor candidate. However, since an ALT acts as a teacher, it does not belong to either of the two organizations shown in Fig.4, and therefore it was not included in this model.
(3) English communication skill. This concept connects with (c) and (d) in section 4.3.
(4) Presentation skill. This concept connects with (a) in section 4.3.
(5) Hearing and understanding skill. This concept connects with (b) and (e) in section 4.3.
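The static model described in items (1) to (5) can be written down as plain data. This is a minimal sketch, not the author's notation: the class names `Actor` and `Skill`, their fields, and the choice to treat each skill as a relationship directed from the individual to the partner are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Actor:
    name: str
    organization_examples: list  # example organizations named in the text
    attributes: list = field(default_factory=list)


@dataclass
class Skill:
    name: str
    concepts: list  # labels of the co-occurrence concepts from section 4.3


# (1) and (2): the two actors, each with Culture and Opinion.
individual = Actor("Individual", ["Japanese", "Society"], ["Culture", "Opinion"])
partner = Actor("Partner", ["Chinese", "Asia"], ["Culture", "Opinion"])

# (3), (4), and (5): skills and the concepts they connect with.
skills = [
    Skill("English communication skill", ["c", "d"]),
    Skill("Presentation skill", ["a"]),
    Skill("Hearing and understanding skill", ["b", "e"]),
]

# Assumption: each skill links the individual to the partner.
relationships = [(individual.name, skill.name, partner.name) for skill in skills]

# Every concept (a) to (e) from section 4.3 is used by exactly one skill.
covered = sorted(label for skill in skills for label in skill.concepts)
print(covered)  # prints ['a', 'b', 'c', 'd', 'e']
```

Writing the model as data makes it easy to check that no identified concept was left out of the diagram.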

[Figure elements: (1) Individual (Culture, Opinion) within its organization; (2) Partner (Opinion, Culture) within its organization; (3) English Communication; (4) Presentation Skill; (5) Hearing ability]

Fig. 4. Sample of Conceptual Model

5. Conclusion

Conceptual models help humans to understand phenomena such as social behavior. When we create a beautiful model, the model will help our understanding of phenomena. Models will also be useful when we design new things. In this research, we created a static conceptual model with the support of text analysis. When we created the conceptual model, we referred to the modeling techniques of software engineering. This time, the original text was in Japanese, and we translated it into English through machine translation. Although English and Japanese come from different language families, the text analysis method appears to overcome this difference for the purpose of creating a conceptual model. This research samples only one case, and we should test other cases to refine the conceptual modeling process.

References

[1] Yasunobu Kino (2016) "Skills for Global Human Resources and Structure Recognized by High School Students." Oukan 10 (2): 116-123.
[2] Koichi Higuchi (2014) "Syakai Cyosa no tameno Keiryo Text Bunseki", Nakanishiya syuppan.
[3] Daisuke Toyoshima, Yasunobu Kino, et al. (2016) "Extraction and Trend Analysis of the Word "Risk" in Journals and Proceedings." Journal of the Society of Project Management 18 (4): 20-25.
[4] Kiyoshi Sakamori, Kiyomi Miyoshi (2016) "Outlook of the program design for Project Management Professional School: Based on Text Mining Application." Bulletin of Advanced Institute of Industrial Technology (10): 79-84.
[5] Hiromi Asano, Koji Tanaka, Yoshikatsu Fujita, Kazuhiko Tsuda (2015) "Study on hiring decision: analyzing rejected applications by mining individual job placement data of Public Employment Security Offices." Procedia Computer Science 60: 1156-1163.
[6] Yasunobu Kino, Hiroshi Kuroki, Tomomi Machida, Norio Furuya, Kanako Takano (2017) "Text Analysis for Job Matching Quality Improvement." Procedia Computer Science 112: 1523-1530.