The development of a data collection and analysis system based on social network users’ data


Available online at www.sciencedirect.com


www.elsevier.com/locate/procedia

ScienceDirect


Procedia Computer Science 156 (2019) 194–203

8th International Young Scientist Conference on Computational Science

Lada Rudikowa a,*, Oleg Myslivec a, Stanislav Sobolevsky b, Alexandra Nenko c, Ilia Savenkov c

a Yanka Kupala State University of Grodno, Belarus
b New York University, USA
c Saint-Petersburg National Research University of Information Technology, Mechanics and Optics (ITMO), Russia

Abstract

The general concept and implementation of a practice-oriented social network data storage and analysis system are discussed in this paper. The need for such a concept comes from the fact that at the moment little attention is paid to the internal structure of such systems. The proposed system organizes the process of collecting data, filling the database from third-party sources and analyzing the collected data. The main problem of gathering fragmented data from different sources is also addressed. Due to the variety of possible information formats, approaches for designing an efficient data warehouse for further analysis of social network users' data are discussed. The considered approach to the design and implementation of the data collection subsystem can be used for storing users' data given the current trends in the development of social networks. The problem of data confidentiality in such systems is also discussed. An example is provided for the implementation of the main parts of the system.

Keywords: social network data; OLTP-system; socio-economic migration; data model; general architecture; Boyer-Moore algorithm; client-server; analysis subsystem

* Corresponding author. Tel.: +37533 6882079. E-mail address: [email protected]


1877-0509 © 2019 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/) Peer-review under responsibility of the scientific committee of the 8th International Young Scientist Conference on Computational Science. 10.1016/j.procs.2019.08.195


1. Introduction and literature review

It is currently relevant to develop a general concept and implement a system for collecting and analyzing social community data on the Internet. Such a system can be viewed as a single data repository that allows the data of social network users to be analyzed according to given requirements [1, 2]. Research on and generalization of these subject areas is itself a relevant field [3, 4]. An increasing number of researchers use big data to evaluate numerous aspects of human activity, including human relationships in different social networks. For example, data sets of this kind are used for regional differentiation at various scales [5], land use classification [6], optimization of transportation [7] and transport studies [8], calculation of socio-economic indicators of urban neighborhoods [9], study of tourism behavior [10], human migration [11, 12], etc. A huge amount of information is accumulating in the world, and many methods, algorithms and methodologies for analyzing data have been developed [13, 14]. Sufficient software, technologies and methods for formalizing data arrays exist to structure the data for specific research directions.

In view of the above, the analysis of social networking data is a promising direction with a wide scope of application. Firstly, data from social user profiles is used in modeling the propagation of information waves in social networks. Secondly, campaigning and marketing companies of various kinds rely heavily on data from social network users, since such data makes it possible to tailor campaigns more effectively. Thirdly, the most common application of social network data is the identification of consumer preferences.
Although a huge amount of information accumulates around the world, these data are often fragmented and far from providing a general overview of social life. Publicly available data sources, social networks in our case, are not always easy to process because they lack a software interface or use specific data storage formats. These restrictions often prevent the available information from being used in the required analysis. Frequent changes in data formats and data provision policies remain a major problem. In addition, visualizing the stored data in a form accessible for exploration by a third party is no less important than the data analysis itself.

Another problem with information gathered from various social networks and Internet resources is recent legislation in the field of personal data processing. These laws directly affect every aspect of processing social network users' data. Leakage of personal data from social networks is the most common problem in personal data security [23]. Considering the new requirements for data processing, we should also keep in mind that many social network users do not want their data to be collected by third-party software applications or companies, and no one wants to learn that their data is stored insecurely. Thus, designing a complex system for social network data analysis requires novel approaches.

A general concept for building systems that analyze social network users' data is needed because research in this field is mostly devoted to the analysis itself, as if the data had been obtained by the researchers in advance [15, 16, 17, 18]. Some basic information about full-stack systems for data collection and analysis is presented by Alibaba Cloud [19], Netflix [20] and the Coca-Cola Company [21].
The published descriptions of these big data analysis systems mainly list the technologies, frameworks and software used. It is difficult to find information about the internal structure of such systems: how they process users' data, how effectively they store unstructured data from different sources, how they authenticate and authorize users, etc. All the systems considered receive information from various sources. The developed system focuses on efficient handling of data from various social networks, taking into account that social networks keep developing and that users may publish data in completely different formats in the future. One of the features of the developed system is an attempt to store all user data (both the formats available at the moment and those that may appear in the future) in the most general form. The lack of specific information about the internal structure of social data analysis systems calls for an in-depth study of all aspects of building systems of this kind.

This paper considers the development of a common approach to building a system for collecting and analyzing data exclusively from social networks. To process statistical data on social network users, a software package should be able to collect all the relevant social information, securely accumulate it in a database and provide forecasting models, e.g. predicting preferences of different users based on their social network activity or visualizing trends in the evolution of social culture in a certain context.

2. The main requirements for the system processing and visualizing social networks data

The proposed system is based on an information storage technology. It has to fulfil several requirements: an extensible complex domain, data integrity (for data that comes from various sources), persistence of the stored data over time with mandatory labels, high data stability, optimal solutions for data redundancy, modularity of individual system units, a flexible and extensible architecture, and strict security requirements for the stored data. Figure 1 shows a simplified scheme of the system for collecting and analyzing data from online communities.

Fig. 1. General implementation architecture

The basic concept of the proposed system for collecting and analyzing data from social communities on the Internet is based on data warehousing technology. The system is designed for a wide range of users, who will apply its functionality to solve various tasks. As the system is used, its capabilities will grow, which will affect its resource intensity. The principle of modularity should therefore be followed. At least four modules are assumed:

1. Module for receiving and storing information.
2. Data analysis module.
3. System administration module.
4. A module that provides users with an interface to work with the system.

3. The overall architecture for storage and processing social networks data

Figure 2 shows the architecture developed for a system for collecting and analyzing data from social networks. The proposed information system has a modular client-server architecture, where each module performs its own function and can be scaled without affecting the other modules. Collecting precise information about users is an important stage in the workflow; however, the design and development of such a subsystem raise many questions. An object-oriented approach and a structural methodology for building software systems are used to develop the modules of the system that ensure the collection of accurate data.

Fig. 2. Structure of information extraction and storage modules

4. Social media data usage

Data obtained from social networks are used in many study areas, from identifying consumer preferences to the analysis of social relations and competitive intelligence. Social network data can tell much more about the hidden preferences of an individual than the information she voluntarily shares in her profile. However, these data are poorly structured [9], which makes it difficult to process them correctly before they enter the storage system. To overcome this barrier, we use an ETL process: information is extracted from OLTP systems (databases), converted into the format of the stored data and loaded into the database [13]. Data is collected from different social networks, where it can be presented in many formats and with different information content; therefore, off-the-shelf software cannot be used, and developing one's own software solution is the preferred option.

When developing this subsystem, other limitations of social networks on the availability of data should be considered, namely privacy settings. If a user hides personal data, the subsystem receives it only partly, which affects the final results. There are two possible ways to address this problem. The first is direct analysis of the page with the user profile, which requires additional functionality and is extremely difficult to implement because of the growing number of user profiles in various social networks. The second is to use data collected over a certain period of time, which can show the dynamics of changes in user preferences. We have applied the second method because privacy restrictions do not allow manual page parsing and working with the API.

5. Centralized data storage from various social networks

As mentioned above, data in social networks is insufficiently structured. How can data from different social networks be stored in one database without affecting the efficiency of the entire system? Some social networks focus on messaging (Twitter), some on the publication of large texts by users (LiveJournal), but most social networks allow data of different formats to be aggregated (Vkontakte, Facebook). Centralized storage of data from social networks is an issue that we also tried to address during the development of the subsystem for collecting and storing social network data [14]. Figure 3 shows a conceptual database model that stores the data from various social networks efficiently. Our data collection and storage subsystem stores the following information about users: basic user information, the list of social networks in which the user is registered (table Connections), education information and places of work. An important factor is the ability to store all the communities related to the user in all social networks [13, 23]. The main table, Posts, stores all materials of different formats published by users (text, video, photos, etc.). The system can also track hashtags.

Fig. 3. Conceptual database model
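The ETL step and the common storage model described above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper's system uses MSSQL, while the sketch uses Python's built-in sqlite3 as a stand-in. Only the table names Connections and Posts come from the text; the Users and Hashtags tables, all column names, the two raw record layouts and the function names are assumptions made for illustration.

```python
import sqlite3

# Hypothetical raw records; each source network uses its own field names.
RAW = [
    {"network": "twitter", "user": "alice", "text": "Hello #grodno", "media": None},
    {"network": "vk", "owner": "bob", "body": "Trip photos #travel #minsk",
     "attachments": ["photo"]},
]

# Assumed simplified schema; only Connections and Posts are named in the paper.
SCHEMA = """
CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Connections (user_id INTEGER REFERENCES Users(id), network TEXT,
                          UNIQUE (user_id, network));
CREATE TABLE Posts (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES Users(id),
                    network TEXT, kind TEXT, content TEXT);
CREATE TABLE Hashtags (post_id INTEGER REFERENCES Posts(id), tag TEXT);
"""


def transform(raw):
    """Convert a source-specific record into one common, source-independent form."""
    if raw["network"] == "twitter":
        user, content, kind = raw["user"], raw["text"], "text"
    elif raw["network"] == "vk":
        user, content = raw["owner"], raw["body"]
        kind = raw["attachments"][0] if raw["attachments"] else "text"
    else:
        raise ValueError(f"no transformer for network {raw['network']!r}")
    # Naive hashtag extraction: whitespace-separated words starting with '#'.
    tags = [w[1:].lower() for w in content.split() if w.startswith("#") and len(w) > 1]
    return {"user": user, "network": raw["network"], "kind": kind,
            "content": content, "tags": tags}


def load(conn, rec):
    """Write one transformed record into the common tables."""
    conn.execute("INSERT OR IGNORE INTO Users (name) VALUES (?)", (rec["user"],))
    (user_id,) = conn.execute(
        "SELECT id FROM Users WHERE name = ?", (rec["user"],)).fetchone()
    conn.execute("INSERT OR IGNORE INTO Connections VALUES (?, ?)",
                 (user_id, rec["network"]))
    cur = conn.execute(
        "INSERT INTO Posts (user_id, network, kind, content) VALUES (?, ?, ?, ?)",
        (user_id, rec["network"], rec["kind"], rec["content"]))
    conn.executemany("INSERT INTO Hashtags VALUES (?, ?)",
                     [(cur.lastrowid, tag) for tag in rec["tags"]])


def run_etl(raw_records):
    """Extract (here: iterate), transform and load into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for raw in raw_records:
        load(conn, transform(raw))
    conn.commit()
    return conn


if __name__ == "__main__":
    db = run_etl(RAW)
    for row in db.execute("SELECT network, kind, content FROM Posts"):
        print(row)
```

Keeping all source-specific logic inside `transform` means that a new network or a changed format only requires a new branch there, while the storage tables stay unchanged. This is one possible way to approach the "most general form" of storage that the paper aims for.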


6. Data Depersonalization

According to personal data protection requirements, our system has to depersonalize personal data while keeping the data processable [22]. That is, the data after depersonalization must have a number of properties, which include:

– data completeness: information about a specific user is preserved;
– data structure: proper structural relationships between depersonalized data are maintained;
– data relevance: requests for personal data processing can be handled and answered in the same semantic form;
– data applicability: personal data processing tasks can be solved without de-referencing the entire amount of user data;
– data anonymity: data subjects cannot be uniquely identified from the depersonalized data without additional information.

However, a system for analyzing social communities is not compatible with complete depersonalization, because the client of the system needs to know specific information about the user. Only partial anonymity of the stored data can be achieved. Basic information about users (name, surname, date of birth, address) is to be encrypted, and unauthorized access to the database is to be prevented. In other words, the entire database should be encrypted separately. To prevent unauthorized access to the system on which the service is running, appropriate security policies should be used; they are not the subject of this article.

The data collected by the collection and storage subsystem will be used for the following tasks:

Audience segmentation. The proposed system can segment the audience according to various criteria, following the uses of social networking data described above. The system will thus be able to divide users according to their interests and offer them relevant communities. It can also make assumptions about which communities would interest a user based on the information she publishes.

Proper targeting. Promotion of specific content related to goods and services is more efficient if the target audience is analyzed beforehand. The problem is twofold: on the one hand, certain communities may not even realize that a product or a service they are interested in has appeared; on the other hand, many producers reach too small an audience and need means to present and disseminate information about their goods and services more effectively. The proposed system is able to define the possible online and offline target audiences that service providers should appeal to.

The system is also able to build and analyze graphs of interests within the communities. Each user will be able to get information about people with whom she may have common interests and about communities that may interest her. In addition, the system is useful for analytical centers interested in sociocultural, economic and political trends in cities or other regions. Moreover, the system can predict destructive communal behavior and be applied in prevention policies.

7. OLAP-cubes module

The data collected by the system is then used for constructing OLAP cubes. One of the cubes of the module (Fig. 4) is implemented on the basis of the star schema and allows user actions to be analyzed based on the entries on their pages [24, 25, 26]. Some of the information shown in the data collection model is not used in analyzing users. All functionality for working with the OLAP cube belongs to the information processing module, the second main module of the information system. All basic data operations take place in this module.
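To make the star-schema idea behind the OLAP-cubes module more concrete, the following minimal sketch builds one fact table surrounded by dimension tables and performs a roll-up over chosen dimensions. It is only an illustration: the actual cube layout is given in Fig. 4, and the dimensions used here (user city, date, kind of post), the table and column names, the sample rows and the helper `roll_up` are all assumptions. Python's built-in sqlite3 stands in for the MSSQL stack mentioned in the paper.

```python
import sqlite3

# Assumed star schema: one fact table referencing three dimension tables.
STAR_SCHEMA = """
CREATE TABLE dim_user (user_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
CREATE TABLE dim_kind (kind_id INTEGER PRIMARY KEY, kind TEXT);
CREATE TABLE fact_posts (
    user_id INTEGER REFERENCES dim_user(user_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    kind_id INTEGER REFERENCES dim_kind(kind_id),
    post_count INTEGER
);
"""

# Dimension attributes that may be used for grouping (whitelist, so that
# no caller-supplied text is ever interpolated into the SQL statement).
DIMENSIONS = {"city": "u.city", "month": "d.month", "kind": "k.kind"}


def build_cube():
    """Create the star schema in memory and fill it with made-up sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(STAR_SCHEMA)
    conn.executemany("INSERT INTO dim_user VALUES (?, ?)",
                     [(1, "Grodno"), (2, "Minsk"), (3, "Grodno")])
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                     [(1, "2019-01-30", "2019-01"), (2, "2019-02-01", "2019-02")])
    conn.executemany("INSERT INTO dim_kind VALUES (?, ?)",
                     [(1, "text"), (2, "photo")])
    conn.executemany("INSERT INTO fact_posts VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 3), (1, 2, 2, 1), (2, 1, 1, 2), (3, 2, 1, 4)])
    return conn


def roll_up(conn, *dims):
    """Sum the post_count measure, grouped by one or more named dimensions."""
    if not dims:
        raise ValueError("at least one dimension is required")
    cols = ", ".join(DIMENSIONS[d] for d in dims)  # KeyError for unknown names
    sql = (
        f"SELECT {cols}, SUM(f.post_count) "
        "FROM fact_posts f "
        "JOIN dim_user u ON u.user_id = f.user_id "
        "JOIN dim_date d ON d.date_id = f.date_id "
        "JOIN dim_kind k ON k.kind_id = f.kind_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    cube = build_cube()
    print(roll_up(cube, "city"))
    print(roll_up(cube, "city", "month"))
```

Grouping by fewer dimensions corresponds to rolling the cube up, and adding a dimension to the call corresponds to drilling down. A dedicated OLAP engine precomputes such aggregates, whereas this sketch recomputes them on every query.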


Fig. 4. OLAP-cube model

The data collection module and the data analysis module are implemented on the Microsoft .NET Framework in the C# language, with MSSQL as the database management system and various software tools (Entity Framework, Analysis Services, Azure). Because the Azure platform is used for data processing, system performance depends on the virtual machine purchased. The main limitation of the system is that the APIs of social networks do not allow data to be collected continuously. The number of possible requests per second varies across social networks, so the collection of initial data is quite a long procedure. Another problem is the "quality" of users' data: many users of social networks are not active enough for their data to be analyzed.

Although system performance is difficult to estimate without the administration module and the GUI module, some basic information about social network users' activity can already be obtained after data collection. At present, the system runs in real time and gathers the available user data. From the data already collected it was possible to identify, for example, the top 20 cities with interconnected users. Data collection started from Grodno (Belarus), and the collected users turn out to communicate mostly with Kiev, Minsk and Moscow (Figure 5). The system currently contains 10444 user records after one month of work.

Fault tolerance is achieved by monitoring data collection errors. If certain data cannot be obtained because of some problem, information about the issue is stored in a log file, and attempts to obtain the missing information block are then repeated periodically. Initially, the data of 181 users was not obtained due to various kinds of errors: 121 errors occurred while gathering general user profile data (the profile was deleted or temporarily blocked), 50 errors were caused by profile privacy constraints and 10 errors by API request timeouts (Internet connection problems). In total, 7616 failed requests were logged and processed, which is approximately 5.5% of the total number of requests. In the end, the missing data blocks of all 181 users were successfully obtained on retry.
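The log-and-retry behaviour described above can be sketched as follows. This is a simplified illustration and not the authors' C# code: the function names `collect` and `make_flaky_fetch`, the round-based retry (instead of a timer-driven one) and the in-memory failure log (instead of a log file) are assumptions.

```python
from collections import deque


def collect(user_ids, fetch, max_rounds=3):
    """Fetch data for every user; log failures and retry them in later rounds.

    Returns (collected data, failure log, ids still unresolved).
    """
    collected = {}
    failure_log = []          # stand-in for the log file mentioned in the text
    pending = deque(user_ids)
    for round_no in range(1, max_rounds + 1):
        if not pending:
            break
        retry_later = deque()
        while pending:
            uid = pending.popleft()
            try:
                collected[uid] = fetch(uid)
            except Exception as exc:  # broad on purpose: any error is logged and retried
                failure_log.append((round_no, uid, type(exc).__name__))
                retry_later.append(uid)
        pending = retry_later  # the next round processes only the failed ids
    return collected, failure_log, list(pending)


def make_flaky_fetch(fail_once, fail_always=()):
    """Build a hypothetical fetcher that simulates the error classes in the text."""
    already_failed = set()

    def fetch(uid):
        if uid in fail_always:
            raise PermissionError(f"profile {uid} is private")
        if uid in fail_once and uid not in already_failed:
            already_failed.add(uid)
            raise TimeoutError(f"request for user {uid} timed out")
        return {"id": uid}

    return fetch


if __name__ == "__main__":
    data, log, unresolved = collect([1, 2, 3], make_flaky_fetch({2}))
    print(sorted(data), log, unresolved)
```

A production collector would additionally wait between rounds and respect the per-network request limits mentioned in the text; both are omitted here to keep the sketch deterministic.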


Fig. 5. Top 20 interconnected cities (chart title: "Users amount by city"; vertical axis: number of users, 0 to 4000)

8. Conclusion

The development of a general concept and the implementation of a system for collecting and analyzing social network data is topical because social networks have become a major place for communication, self-representation, information search, trade, events, etc. The existence of many online communities makes it possible to conduct various studies on their content: analysis of the prospects for the use of certain resources, analysis of possible patterns in the available data arrays and of the influence of various factors on the development of these communities. The system will make it possible to collect extensive information on various online communities and their users, obtain the necessary analytic reports, carry out data processing and apply appropriate data mining methods and algorithms.

This development will be useful for state centers of general information processing in developing new policies in the field of management of social processes and intercultural communications and recommendations for the development of regional infrastructure. It is also expected that the system will be in demand among a wide range of people interested in studying and analyzing the hidden preferences of users of various social networks. It will be useful for users who want to find like-minded people on different issues throughout the media space, as well as for analytical and other centers interested in tracking social network trends by administrative division (regions, countries, cities, etc.). It will contribute to the identification of various sociocultural, economic and political trends of modern society, with the possibility of future adjustment of processes in these areas.

Data integrity and confidentiality are crucial for the considered system. Without proper information security, data leakage from such a system can have severe consequences; therefore, additional research in the field of secure data acquisition, storage and processing for such systems is required.


References

1. Rudikova, L.V. On the general architecture of a universal data storage and processing system of practice-oriented orientation / L.V. Rudikova // System Analysis and Applied Informatics. – Mn.: BNTU, 2017. – № 2. – P. 12-19.
2. Rudikova, L.V. On modeling data of subject-areas of practice-oriented orientation for a universal system of data warehousing and data processing / L.V. Rudikova, E.V. Zhavnerko // System Analysis and Applied Informatics. – Mn.: BNTU, 2017. – № 3. – P. 19-26.
3. Belyi, A. Global multi-layer network of human mobility / Alexander Belyi, Iva Bojic, Stanislav Sobolevsky, Izabela Sitko, Bartosz Hawelka, Lada Rudikova, Alexander Kurbatski, Carlo Ratti // International Journal of Geographical Information Science. – 2017. – Volume 31. – P. 1381-1402.
4. Belyi, A.B. Flickr service data and community structure of countries / A.B. Belyi, L.V. Rudikova, S.L. Sobolevsky, A.N. Kurbatski // International Congress on Computer Science: Information Systems and Technologies: materials of International Scientific Congress, Republic of Belarus, Minsk, 24 October. – 27 Nov. 2016. / BSU; editorial board: S.V. Ablameiko [et al.]. – Minsk, 2016. – P. 851–855.
5. Amini A. The impact of social segregation on human mobility in developing and industrialized regions / Amini A., Kung K., Kang C., Sobolevsky S., Ratti C. // EPJ Data Science. – 2014. – 3(1):6.
6. Pei T. A new insight into land use classification based on aggregated mobile phone data / Pei T., Sobolevsky S., Ratti C., Shaw S. L., Li T., Zhou C. // International Journal of Geographical Information Science. – 2014. – 28(9). – P. 1988-2007.
7. Santi P. Quantifying the benefits of vehicle pooling with shareability networks / Santi P., Resta G., Szell M., Sobolevsky S., Strogatz S.H., Ratti C. // Proceedings of the National Academy of Sciences. – 2014. – 111(37). – P. 13290-13294.
8. Kung K. Exploring universal patterns in human home/work commuting from mobile phone data / Kung K., Greco K., Sobolevsky S., Ratti C. // PLoS ONE. – 2014. – 9(6):e96180.
9. Hashemian B. Socioeconomic characterization of regions through the lens of individual financial transactions / Hashemian B., Massaro E., Bojic I., Arias J. M., Sobolevsky S., Ratti C. // PLoS ONE. – 2017. – 12(11):e0187031.
10. Bojic I. Scaling of foreign attractiveness for countries and states / Bojic I., Belyi A., Ratti C., Sobolevsky S. // Applied Geography. – 2016. – 73. – P. 47-52.
11. Sabou M. Visualizing Statistical Linked Knowledge Sources for Decision Support / Sabou M., Hubmann-Haidvogel A., Fischl D., Scharl A. // Semantic Web. – 2016. – 1. – P. 1-25.
12. Li Q. VisTravel: visualizing tourism network opinion from the user generated content / Li Q., Wu Y., Wang S., Lin M., Feng X., Wang H. // J. Vis. – 2016. – 19. – P. 489-502.
Rudikova L.V. (2014). On the development of a system for the support of laser rapid examination: monograph. LAP LAMBERT Academic Publishing, 134 p. (in Russian)
13. Data analysis of social networks: methods and applications [Electronic resource] / ISP RAS. – Access mode: http://www.ispras.ru/. – Access date: 09/18/2018.
14. ETL environment (extraction, transformation and loading) Rational Insight [Electronic resource]. – Access mode: [https://www.ibm.com/support/knowledgecenter/ ru/SSRL5J_1.1.1/com.ibm.rational.raer.overview.doc/topics/c_arch_etl_process.html]. – Access date: [14.05.2018].
15. Cao, Jinwei; Basoglu, Kamile Asli; Sheng, Hong; and Lowry, Paul Benjamin (2015). "A Systematic Review of Social Networks Research in Information Systems: Building a Foundation for Exciting Future Research," Communications of the Association for Information Systems: Vol. 36, Article 37.
16. Liao, C.-H., & Chen, M.-Y. (2018). Building social computing system in big data: From the perspective of social network analysis. Computers in Human Behavior. doi:10.1016/j.chb.2018.09.040
17. Chang, V. (2018). A proposed social network analysis platform for big data analytics. Technological Forecasting and Social Change, 130, 57–68. doi:10.1016/j.techfore.2017.11.002


18. McCarthy, D. D. P., D. D. Crandall, G. S. Whitelaw, Z. General, and L. J. S. Tsuji. 2011. A critical systems approach to social learning: building adaptive capacity in social, ecological, epistemological (SEE) systems. Ecology and Society 16(3): 18. http://dx.doi.org/10.5751/ES-04255-1603
19. Building a Social Recommendation System Based on Big Data / Alibaba Cloud [Electronic resource]. – Access mode: [https://www.alibabacloud.com/blog/building-a-social-recommendation-system-based-on-bigdata_593980?spm=a2c4.12011681.0.0]
20. Evolution of the Netflix Data Pipeline [Electronic resource]. – Access mode: https://medium.com/netflixtechblog/evolution-of-the-netflix-data-pipeline-da246ca36905
21. Jieren Liu, Xiaolin Li. (2016). Innovation business model of Big Data - Taking Coca-Cola as an example. 3rd International Conference on Management, Education Technology and Sports Science (METSS 2016). doi: https://doi.org/10.2991/metss-16.2016.24
22. GDPR – new rules for the processing of personal data in Europe for the international IT market [Electronic resource]. – Access mode: [https://habr.com/company/digitalrightscenter/blog/344064/]. – Access date: [14.05.2018].
23. Martorell, L.B., Nascimento, W.F., & Garrafa, V. (2015). Social networks, privacy, confidentiality and ethics: exposure of patient images on Facebook. Interface - Communication, Health, Education, 20 (56), 13-23. doi: 10.1590/1807-57622014.0902
24. Robert Wrembel, Christian Koncilia. Data warehouses and OLAP: concepts, architectures, and solutions. IRM Press, 2007. P. 1-26.
25. Codd, E.F. Providing OLAP (on-line analytical processing) to user-analysts: An IT mandate / E.F. Codd, S.B. Codd, C.T. Salley // Technical report. – 1993.
26. Rudikova, L.V. Database design: a study guide for students of higher education institutions in the specialties “Software Information Technologies”, “Economic Cybernetics”, “Applied Mathematics”, “Information systems and technology (in economics)” / L.V. Rudikova. – Minsk: ITC Ministry of Finance, 2009. – 352 p.