Enabling Next Generation Logistics and Planning for Smarter Societies


Procedia Computer Science 109C (2017) 1122–1127

International Workshop on Smart Cities Systems Engineering (SCE 2017)


Sugimiyanto Suma (1), Rashid Mehmood (2)*, Nasser Albugami (3), Iyad Katib (1), Aiiad Albeshri (1)

(1) Department of Computer Science, FCIT, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
(2) High Performance Computing Center, King Abdulaziz University, Jeddah, 21589, Saudi Arabia
(3) Department of Information Technology, FCIT, King Abdulaziz University, Jeddah, 21589, Saudi Arabia

Abstract

Social media has revolutionized our societies. It has made a fundamental impact on the way we work and live. More importantly, social media is gradually becoming a key pulse of smart societies by sensing the information about the people and their spatio-temporal experiences around the living spaces. Big data and computational intelligence technologies are helping us to manage and analyze large amounts of data generated by social media, such as Twitter, and make informed decisions about us and the living spaces. This paper reports our preliminary work on the use of social media for the detection of spatio-temporal events related to logistics and planning. Specifically, we use big data and AI platforms including Hadoop, Spark, and Tableau, to study Twitter data about London. Moreover, we use the Google Maps Geocoding API to locate the tweeters and perform additional analysis. We find and locate congestion around London. We also discover that, during a certain period, "job" and "hiring" were among the top three most-tweeted words, leading us to locate the source of the tweets, which happened to originate from around the Canary Wharf area, the UK's major financial center. The results presented in the paper have been obtained using 500,000 tweets.

© 2017 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the Conference Program Chairs.

Keywords: Smart Cities, Smart Societies, Big Data, High Performance Computing, Social Network Analysis, Logistics, Planning, Transportation

* Corresponding author. Tel.: +966-537-069-318. E-mail address: [email protected]

ISSN 1877-0509. doi:10.1016/j.procs.2017.05.440

1. Introduction

Social media has revolutionized our societies. It has made a fundamental impact on the way we work and live. More importantly, social media is gradually becoming a key pulse of smart societies by sensing the information about the people and their spatio-temporal experiences around the living spaces. Twitter is one of the most widely used social media platforms today. People tend to post about their lives, feelings, and experiences on Twitter, which generates a huge amount of data every day: around 500 million tweets per day [1]. This volume is an opportunity to explore the data and derive benefit from it. With the emergence of big data concepts and technologies, we can explore and analyze this data further to gain insight and support decisions.

Data is generated every day at high speed, in vast amounts, and in various types. Many sources produce this data, such as the internet of things (IoT) with its various installed sensors, microblogging, social media posts, transactional data, digital video and pictures, and many more. This is called big data, and it is stored on clusters of machines using distributed systems. Big data also means big challenges, big effort, and big results. Stakeholders agree on the 3Vs (Volume, Variety, Velocity) to define the characteristics of big data, although some researchers have recently proposed additional Vs. We use the term big data analytics for obtaining deeper and wider insight from data in support of good decisions. Big data is a lake of data that contains data of any type, so preparing it for further processing is challenging and time-consuming; once prepared, however, it lets us obtain insight more accurately and more quickly.

Big data and computational intelligence technologies are helping us to manage and analyze large amounts of data generated by social media, such as Twitter, and make informed decisions about us and the living spaces. This paper reports our preliminary work on the use of social media for the detection of spatio-temporal events related to logistics and planning. Specifically, we use big data and AI platforms including Hadoop, Spark, and Tableau, to study Twitter data about London.
Moreover, we use the Google Maps Geocoding API to locate the tweeters and perform additional analysis. We found and located congestion around London. We also discovered that, during a certain period, "job" and "hiring" were among the top three most-tweeted words, leading us to locate the source of the tweets, which happened to originate from around the Canary Wharf area, the UK's major financial center. The results presented in the paper have been obtained using 500,000 tweets.

This paper is organized as follows. Section 2 reviews previous work related to this topic. Section 3 describes the methodology and design used in this paper. Section 4 discusses the results and analysis of our research. Finally, we conclude in Section 5 with directions for future work.

2. Literature Review

Big data and social network analysis have been used in the past for smart cities research. Herrera-Quintero et al. have proposed the design and implementation of an intelligent transportation system (ITS) prototype which combines big data and the internet of things (IoT) to produce an ITS cloud service supporting transportation planning for Bus Rapid Transit (BRT) systems [2]. Mousheimish et al. have proposed a framework that connects critical business operations with contextual events [3]. This framework aims to predict possible infractions prior to their occurrence. Kolchyna et al. proposed a framework to detect Twitter events in order to predict spikes in sales, and evaluated it using a dataset comprising 150 million tweet records and sales data for 75 brands [4]. Qu et al. investigated a dynamic Production Logistics Synchronization (PLS) mechanism of a manufacturer which adopts a public PL service [5]. They reported that cloud manufacturing integrated with IoT infrastructure enables smart mechanisms for PLS control. Khan et al. developed and implemented a prototype cloud-based analytics service for big data management and analysis in smart cities [6].
Other relevant works on mobility in smart cities include smart emergency management systems [7,8], virtual reality based traffic event simulations [9], autonomic mobility systems [10,11], urban logistics [12–15], and location based services [16]. A review of smart logistics is provided by Schuh et al., highlighting its benefits for efficient transportation [17]. Lingli et al. have reviewed existing logistics platforms and given recommendations for smart city and transportation logistics platforms [18]. Alam et al. have presented a comparison of eight artificial intelligence algorithms that is useful for selecting the right algorithm for social network data analysis purposes [19].

This work is novel in its focus on the spatio-temporal analysis of events relevant to road traffic using big data technologies. Some similar works exist [20–22]; however, these works do not use big data technologies and their analyses are different. Moreover, our work is also looking into integrating high performance computing (HPC) with big data technologies.


3. Methodology and Design

Our system uses current big data sources and technologies: the Twitter streaming API as the data source, Hadoop for storage, Spark for analysis in cooperation with the Geocoding API to generate detailed spatial data for geotagged tweets, and Tableau for visualization. Fig. 1 shows how data flows through these components. In general, Hadoop is not limited to storage and Tableau is not limited to visualization; both can also be used for analysis. In our case, however, Hadoop is used only to store the crawled Twitter data (in HDFS), and Tableau is used only to present the location distribution in the form of a map.

Fig. 1. The System Pipeline

Fig. 1 gives a general description of how our system works; we describe it in more detail here.

First, we created a standalone program, which we call crawlerApp, that crawls Twitter data using the streaming API provided by Twitter. In crawlerApp, we embedded keywords related to what we are looking for, such as "London", "Traffic", etc. The application therefore acts as a real-time listener: whenever a new tweet matching those keywords arrives, the application collects it.

Second, the collected Twitter data is stored in HDFS on a cluster of machines, which maintains data availability and provides fault tolerance.

Third, we built a Spark application in the Java programming language, which we call the analyzer. Spark is an efficient choice for big data processing because it computes in memory, which improves speed, while replication for fault tolerance and data parallelization provide reliability for long-running jobs on big data. The analyzer takes the raw Twitter data from HDFS as its data source. The Spark application comprises several sub-processes:

(1) Data parsing. This process parses the raw Twitter data from JSON format into the separate predefined parameters that we require.

(2) Data pre-processing. Also known as data preparation, this is one of the challenges in the big data analytics pipeline [23]. Data pre-processing plays an essential role in the pipeline because it affects the quality of the analysis and leads to better results. It is a set of techniques applied to raw data before the analysis phase, since the collected data may be imperfect, redundant, or inconsistent, and may contain unnecessary parts. The tasks in data pre-processing include data transformation, integration, cleaning, and normalization. We faced these challenges when the Twitter data contained unnecessary parts such as URLs or illegal characters picked up during crawling. These must be cleaned from the data before it is processed for analysis. This step is also time-demanding when the raw data contains many exceptions that have to be taken into consideration. Another pre-processing task is transformation, that is, generating additional data from the current data when a required field does not exist, for example location details such as district, postal code, region, or any other administrative-level area needed for plotting on a map. These details can be generated from the latitude and longitude of a tweet by invoking the Google Geocoding API with the geocoordinates; the response, in JSON format, contains the details of the location point.

(3) Statistical analysis. The analyzer performs statistical analysis to make sense of the data and gain insight from it. This analysis includes counting and grouping into categories such as time, location, and topic of interest. The result is used in the last phase, visualization.

Finally, we summarized our findings for presentation, a step also known as storytelling. We used MS Excel and the Tableau platform to present our findings visually so that users and stakeholders can easily understand them and gain insight. We used the filled map component to show areas at a certain administrative level such as district, county, or postcode. In this case, we used postcode to represent and group the locations of tweets.
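Sub-processes (1) and (2) above can be illustrated with a minimal sketch. The authors' analyzer is a Java application on Spark; the version below is plain Python without Spark, for brevity only. The function names, the choice to treat anything outside printable ASCII as an "illegal character", and the sample inputs are assumptions made for illustration. The tweet fields (`created_at`, `text`, `coordinates`) follow the standard tweet JSON, and the `latlng` and `key` parameters and the `address_components` / `postal_code` response fields follow the Google Geocoding API; the request itself is not sent here.

```python
import json
import re
from urllib.parse import urlencode

URL_PATTERN = re.compile(r"https?://\S+")
# Assumption: "illegal characters" means anything outside printable ASCII.
NON_PRINTABLE = re.compile(r"[^\x20-\x7E]")


def parse_tweet(raw_line):
    """Parse one raw tweet (a JSON string) into the fields used downstream."""
    tweet = json.loads(raw_line)
    point = tweet.get("coordinates")  # GeoJSON point: [longitude, latitude]
    lon, lat = point["coordinates"] if point else (None, None)
    return {
        "created_at": tweet.get("created_at"),
        "text": tweet.get("text", ""),
        "lat": lat,
        "lon": lon,
    }


def clean_text(text):
    """Remove URLs and non-printable characters, then collapse whitespace."""
    text = URL_PATTERN.sub(" ", text)
    text = NON_PRINTABLE.sub(" ", text)
    return " ".join(text.split())


def reverse_geocode_url(lat, lon, api_key):
    """Build a reverse-geocoding request URL (no network call is made)."""
    query = urlencode({"latlng": f"{lat},{lon}", "key": api_key})
    return "https://maps.googleapis.com/maps/api/geocode/json?" + query


def extract_postcode(geocode_response):
    """Return the first postal_code component of a geocoding response, if any."""
    for result in geocode_response.get("results", []):
        for component in result.get("address_components", []):
            if "postal_code" in component.get("types", []):
                return component.get("long_name")
    return None


# Hand-written sample inputs for illustration, not collected data.
SAMPLE_TWEET = json.dumps({
    "created_at": "Wed Mar 01 15:04:05 +0000 2017",
    "text": "Heavy traffic near the bridge https://t.co/abc123 \u2026",
    "coordinates": {"type": "Point", "coordinates": [-0.0197, 51.5049]},
})
SAMPLE_GEOCODE_RESPONSE = {
    "status": "OK",
    "results": [{"address_components": [
        {"long_name": "London", "short_name": "London", "types": ["postal_town"]},
        {"long_name": "E14 5AB", "short_name": "E14 5AB", "types": ["postal_code"]},
    ]}],
}

tweet = parse_tweet(SAMPLE_TWEET)
print(clean_text(tweet["text"]))                  # Heavy traffic near the bridge
print(extract_postcode(SAMPLE_GEOCODE_RESPONSE))  # E14 5AB
```

Tweets without geotags yield `None` coordinates and would simply be skipped before the geocoding step, which matches the paper's remark that only geotagged tweets can be located.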


4. Results and Analysis

We gathered Twitter data about London traffic from 1st to 4th March 2017. We grouped it into hourly data, indexed from 0 to 71; in total, therefore, we collected 72 hours of tweets about traffic. As shown in Fig. 2, the number of tweets about London traffic varies over time. The highest number of tweets occurs at around the fifteenth hour. From this we infer that people were tweeting about traffic at that time because an event was ongoing and was causing traffic problems in London.

Fig. 2. Number of traffic-related tweets in London
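The hourly grouping behind Fig. 2 can be sketched as follows. This is an illustration in plain Python rather than the authors' Spark code. It assumes that hour 0 starts at 2017-03-01 00:00 UTC (the paper gives only the dates) and that timestamps use the standard tweet `created_at` format; `hour_index` and `tweets_per_hour` are hypothetical helper names.

```python
from collections import Counter
from datetime import datetime, timezone

TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"
# Assumption: the 72-hour window starts at midnight UTC on 1 March 2017.
WINDOW_START = datetime(2017, 3, 1, tzinfo=timezone.utc)
WINDOW_HOURS = 72


def hour_index(created_at):
    """Map a tweet timestamp to an hour index 0..71, or None if outside the window."""
    timestamp = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
    index = int((timestamp - WINDOW_START).total_seconds() // 3600)
    return index if 0 <= index < WINDOW_HOURS else None


def tweets_per_hour(created_at_values):
    """Count tweets per hour index; hours without tweets are reported as 0."""
    counts = Counter()
    for value in created_at_values:
        index = hour_index(value)
        if index is not None:
            counts[index] += 1
    return [counts.get(hour, 0) for hour in range(WINDOW_HOURS)]


# Hand-written timestamps for illustration only.
series = tweets_per_hour([
    "Wed Mar 01 15:04:05 +0000 2017",
    "Wed Mar 01 15:40:00 +0000 2017",
    "Fri Mar 03 23:59:59 +0000 2017",
])
print(series[15], series[71])  # 2 1
```

The resulting list of 72 counts is the kind of series that can be plotted directly against the hour index.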

The intensity of tweets about traffic in London is shown in Fig. 3(a). Intensity is indicated by color, increasing from green to red. Downtown has the highest number of tweets about traffic, as shown in red. We assume that a high number of tweets about traffic means there is more traffic in that area. The grey parts of the map are areas for which we did not obtain any geotagged tweets, so we cannot determine tweet locations there. A tweet contains geotag data only when the tweeter is willing to share it before posting.

In addition, we had previously analyzed tweets posted between 26th February and 2nd March 2017. The words "job" and "hiring" were among the top three most-tweeted words. Fig. 3(b) shows the location distribution of the tweets about job and hiring. Most of these tweets came from one particular area, shown in red in the figure. From this we assume that something related to jobs and hiring was happening in that area. The area is located within about 5 miles of Canary Wharf, which is the UK's major financial center.

Fig. 3. Intensity distribution of the number of tweets in some areas based on postcode: (a) tweet intensity about traffic; (b) tweet intensity about job and hiring
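The per-postcode intensity behind Fig. 3 amounts to counting tweets per area and scaling the counts. A minimal sketch follows, again in plain Python. Grouping by the outward code (the part of a UK postcode before the space) and normalizing by the maximum count are assumptions; the paper states only that postcodes were used to group tweet locations and that color runs from green to red with intensity. The function names and sample postcodes are illustrative.

```python
from collections import Counter


def outward_code(postcode):
    """Return the outward part of a space-separated UK postcode, e.g. 'E14 5AB' -> 'E14'."""
    parts = postcode.strip().upper().split()
    return parts[0] if parts else None


def tweets_per_area(postcodes):
    """Count tweets per outward code, skipping tweets with no resolved postcode."""
    counts = Counter()
    for postcode in postcodes:
        if not postcode:
            continue  # tweet was not geotagged or could not be geocoded
        area = outward_code(postcode)
        if area:
            counts[area] += 1
    return counts


def intensity(counts):
    """Scale counts to the range 0..1 so they can drive a green-to-red color scale."""
    peak = max(counts.values(), default=0)
    return {area: count / peak for area, count in counts.items()} if peak else {}


# Hand-written postcodes for illustration only.
counts = tweets_per_area(["E14 5AB", "e14 9sh", None, "SW1A 1AA", ""])
print(dict(counts))       # {'E14': 2, 'SW1A': 1}
print(intensity(counts))  # {'E14': 1.0, 'SW1A': 0.5}
```

Areas that never appear in the counts correspond to the grey regions of the map, for which no geotagged tweets were available.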


We collected more Twitter data about London from 1st March to 4th March 2017. We found that football-related terms such as the Premier League and Chelsea FC were among the most frequently mentioned words. As shown in Fig. 4, the number of tweets increases over time. From this increase, we assumed that something related to football was happening in London at that time. This was indeed the case: during that period the Premier League competition, which runs for several months, was under way, and Chelsea FC, a London club, was taking part in it.

Fig. 4. Number of tweets related to job and hiring in London
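Findings such as "job" and "hiring" or football terms being among the most-tweeted words rest on a word-frequency count over tweet text. The sketch below shows one simple way to compute such a ranking in plain Python. The tokenization rule, the small stopword list, and the function name `top_words` are assumptions for illustration; the paper does not describe how words were tokenized or filtered.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "the", "we", "are", "in", "for", "this", "now", "and", "to", "of"}
TOKEN_PATTERN = re.compile(r"[a-z']+")


def top_words(texts, n=3):
    """Return the n most frequent non-stopword tokens with their counts."""
    counts = Counter()
    for text in texts:
        for word in TOKEN_PATTERN.findall(text.lower()):
            if len(word) > 1 and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(n)


# Hand-written tweet texts for illustration only.
sample_texts = [
    "We are hiring: analyst job in Canary Wharf",
    "Hiring now, apply for this job",
    "Hiring engineers",
]
print(top_words(sample_texts, n=2))  # [('hiring', 3), ('job', 2)]
```

A count of this kind ignores context, which is consistent with the limitation the authors note about not performing contextual analysis of tweets.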

Our analysis does not include contextual analysis of the tweet data to confirm whether a tweet actually reports a traffic problem. Therefore, it is likely that we have overestimated the reported numbers. However, our goal here is to report the methodology and the use of social media as a data source.

5. Conclusion

We have presented our work on big data, using Twitter data about London as the case study. This paper reports our preliminary work on the use of social media for the detection of spatio-temporal events related to logistics and planning. Specifically, we use big data and AI platforms including Hadoop, Spark, and Tableau, to study Twitter data about London. Moreover, we use the Google Maps Geocoding API to locate the tweeters and perform additional analysis. We find and locate congestion around London. We also discover that, during a certain period, "job" and "hiring" were among the top three most-tweeted words, leading us to locate the source of the tweets, which happened to originate from around the Canary Wharf area, the UK's major financial center. The results presented in the paper have been obtained using 500,000 tweets.

We have encountered a number of challenges in our work, and these will be addressed in future work. In the analysis phase, we do not perform contextual analysis of the tweet data to confirm whether a tweet actually reports a traffic problem. Therefore, it is likely that we have overestimated the reported numbers. However, our goal here is to report the methodology and the use of social media as a data source. Future work in this research will focus on improving data management, the detail and quality of the analysis, and the scalability of our software. For data/information management and integration, we hope to benefit from our earlier work on sustainable enterprise information systems [24,25]. For enhancing scalability and computational intelligence, we plan to integrate HPC and big data technologies.

Acknowledgments

The experiments in this paper were performed on the Aziz supercomputer at King Abdulaziz University.


References

1. Internetlivestats.com. Twitter Usage Statistics. Available at: http://www.internetlivestats.com/twitter-statistics/#trend.
2. Luis Felipe Herrera-Quintero, Klaus Banse, Julian Vega-Alfonso, A. V.-S. Smart ITS Sensor for the transportation Planning using the IoT and Bigdata approaches to produce ITS cloud services. 3–9
3. Raef Mousheimish, Thia Taher, B. F. Towards Smart Logistics Process Management. New York 5692, 46-55–55 (1973).
4. Kolchyna, O., Treleaven, P. C. & Aste, T. A Framework for Twitter Events Detection, Differentiation and its Application for Retail Brands. (2016).
5. Qu, T. et al. IoT-based real-time production logistics synchronization system under smart cloud manufacturing. Int. J. Adv. Manuf. Technol. 84, 147–164 (2016).
6. Khan, Z., Anjum, A., Soomro, K. & Tahir, M. A. Towards cloud based big data analytics for smart future cities. J. Cloud Comput. 4, 1–11 (2015).
7. Alazawi, Z., Alani, O., Abdljabar, M. B., Altowaijri, S. & Mehmood, R. A Smart Disaster Management System for Future Cities. WiMobCity ’14. Int. Work. Wirel. Mob. Technol. Smart Cities 1–10 (2014). doi:10.1145/2633661.2633670
8. Alazawi, Z., Abdljabar, M. B., Altowaijri, S., Vegni, A. M. & Mehmood, R. ICDMS: An intelligent cloud based disaster management system for vehicular networks. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 7266 LNCS, (Springer, 2012).
9. Ayres, G. & Mehmood, R. On discovering road traffic information using virtual reality simulations. in 11th International Conference on Computer Modelling and Simulation, UKSim 2009 411–416 (2009). doi:10.1109/UKSIM.2009.14
10. Schlingensiepen, J. & Nemtanu, F. in Intelligent Transportation Systems – Problems and Perspectives (eds. Sladkowski, A. & Pamula, W.) 3–35 (Springer International Publishing, 2016). doi:10.1007/978-3-319-19150-8_1
11. Schlingensiepen, J. Framework for an Autonomic Transport System in Smart Cities. Cybern. … (2015).
12. Mehmood, R. & Graham, G. Big Data Logistics: A health-care Transport Capacity Sharing Model. Procedia Comput. Sci. 64, 1107–1114 (2015).
13. Mehmood, R. & Lu, J. A. Computational Markovian analysis of large systems. J. Manuf. Technol. Manag. 22, 804–817 (2011).
14. Mehmood, R., Meriton, R., Graham, G., Hennelly, P. & Kumar, M. Exploring the Influence of Big Data on City Transport Operations: a Markovian Approach. Int. J. Oper. Prod. Manag. Forthcomin, (2016).
15. Graham, G., Mehmood, R. & Coles, E. Exploring future cityscapes through urban logistics prototyping: a technical viewpoint. Supply Chain Manag. 20, 341–352 (2015).
16. Ayres, G. & Mehmood, R. LocPriS: A security and privacy preserving location based services development framework. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 6279 LNAI, (2010).
17. Schuh, G., Gottschalk, S., Höhne, T. & Attig, P. Further Potentials of Smart Logistics. 41st CIRP Conf. Manuf. Syst. 93–96 (2008). doi:10.1007/978-1-84800-267-8_18
18. Lingli, J. Smart city, smart transportation - Recommendations of the logistics platform construction. Proc. - 2015 Int. Conf. Intell. Transp. Big Data Smart City, ICITBS 2015 729–732 (2016). doi:10.1109/ICITBS.2015.184
19. Alam, F., Mehmood, R., Katib, I. & Albeshri, A. Analysis of Eight Data Mining Algorithms for Smarter Internet of Things (IoT). Procedia Comput. Sci. 98, 437–442 (2016).
20. D’Andrea, E., Ducange, P., Lazzerini, B. & Marcelloni, F. Real-Time Detection of Traffic From Twitter Stream Analysis. IEEE Trans. Intell. Transp. Syst. 1,
21. Hou Lei, K., Khadiwala, R. & Chen-Chuan Chang, K. TEDAS: a Twitter Based Event Detection and Analysis System.
22. Gutierrez, C., Figuerias, P., Oliveira, P., Costa, R. & Jardim-Goncalves, R. Twitter mining for traffic events detection. in 2015 Science and Information Conference (SAI) 371–378 (IEEE, 2015). doi:10.1109/SAI.2015.7237170
23. García, S., Ramírez-Gallego, S., Luengo, J., Benítez, J. M. & Herrera, F. Big data preprocessing: methods and prospects. Big Data Anal. 1, 9 (2016).
24. Ahmad, N. & Mehmood, R. Enterprise systems and performance of future city logistics. http://dx.doi.org/10.1080/09537287.2016.1147098 27, 500–513 (2016).
25. Ahmad, N. & Mehmood, R. Enterprise systems: Are we ready for future sustainable cities. Supply Chain Manag. 20, 264–283 (2015).