Personalized Interactive Storyboarding utilizing Content Based Multimedia Retrieval


12th IFAC Symposium on Analysis, Design, and Evaluation of Human-Machine Systems August 11-15, 2013. Las Vegas, NV, USA

Phani Kidambi*, S. Narayanan**

*College of Engineering & Computer Science, Wright State University, Dayton, OH 45435 USA (Tel: 937-775-5001; e-mail: [email protected])
**College of Engineering & Computer Science, Wright State University, Dayton, OH 45435 USA (Tel: 937-775-5001; e-mail: [email protected])

Abstract: An explosion of digital multimedia technologies that permit quick and easy uploading of any multimedia file to the web, coupled with the rapid advancement of hardware and software, has resulted in petabytes of multimedia data being available on the internet. In this paper, we provide a framework that rapidly integrates these data for personalized storyboarding utilizing Content Based Multimedia Retrieval (CBMR) techniques. A storyboard is a narrative of audio, text, video and images on a particular topic that are linked systematically. People differ in their ability to understand a story or topic according to their learning styles, coupled with experience, age, knowledge, gender, etc. Personalized storyboard multimedia learning helps the user dictate the learning process in a creative way through multiple means of representation, expression and engagement. This process involves active learning, both behavioral and cognitive. Learning is constructive, and information learned is remembered at a deeper level. The use of multimedia promotes meaningful learning that can be transferred or generalized to other situations. In this paper, various learning styles are discussed. The goal of the research was to design and implement a novel approach that integrates human reasoning with computerized algorithms for multimedia storyboarding for learning. By systematically coupling human reasoning and computerized processes, we attempt to minimize the barrier between the human’s cognitive model of what they are trying to storyboard and the computer’s understanding of the user’s task. We present our model, which utilizes a fusion of content based image retrieval, content based video retrieval, content based audio retrieval, and text based retrieval techniques with human computer interaction based relevance feedback to enhance the learning process through personalized multimedia storyboarding. To illustrate the advantages of this model in greater detail, we showcase its concepts using the game of cricket. We conclude the paper by discussing the future of this area.

Keywords: Human Computer Interaction, Content Based Multimedia Retrieval, Learning, Storyboarding

1. INTRODUCTION

A storyboard is a narrative of text, audio, images and video on a particular topic that are linked systematically. When a person has to learn something new, the person approaches the task in a similar fashion each time. Over time, the person develops a pattern of behaviour that is used for new learning; this pattern is known as a learning style. People differ in their ability to understand a story or topic according to their experience, age, gender, knowledge, culture, etc., and their learning styles differ accordingly. Learning styles are traditionally defined as characteristic cognitive, affective, and psychosocial behaviours that serve as relatively stable indicators of how learners perceive, interact with, and respond to the learning environment [Curry, 1981]. No single accepted learning methodology exists.



To measure learning styles, Kolb [1976] identified two learning activities:

Perception: Some people perceive information using concrete examples (touch, feel) while others perceive information abstractly (visualize).

Processing: Some people process information by active experimentation while others process it by reflective observation.

The four dimensions of Kolb’s model are illustrated in Figure 1.

978-3-902823-41-0/2013 © IFAC


10.3182/20130811-5-US-2037.00098


Figure 1: Kolb’s Learning Model (the Perception axis runs from Concrete Experience to Abstract Conceptualisation and the Processing axis from Active Experimentation to Reflective Observation; the quadrants are labelled Accommodating, Diverging, Converging and Assimilating)

A combination of Kolb’s dimensions forms the four learning styles: Diverging, Assimilating, Converging and Accommodating [Kolb, 1976; Kolb, 1985]. Other learning models, adapted from Kolb’s learning styles, include the Visual, Auditory, and Kinaesthetic styles [Reid, 1987; Felder & Silverman, 1988]. Visual learners learn best when they see pictures, flow charts, etc.; auditory learners learn best when they hear; and kinaesthetic learners learn best when they touch.

Personalized storyboarding utilizes Content Based Multimedia Retrieval (CBMR) and can cater to any learning style, whether Diverging, Assimilating, Converging, Accommodating, Visual or Auditory. Personalized storyboard multimedia learning helps the user dictate the learning process in a creative way through multiple means of representation, expression and engagement. This process involves active learning, both behavioural and cognitive. Learning is constructive, and information learned is remembered at a deeper level. The use of multimedia promotes meaningful learning that can be transferred or generalized to other situations.

In the 21st century, information is widely available on the internet. People rarely rely on books, newspapers, and libraries for information. Advances in multimedia technologies, accompanied by overwhelming use of the internet in the past few years, have resulted in an explosion of multimedia information (text, images, video, audio, etc.) being uploaded to the Web every day [Kidambi & Narayanan, 2008]. Most of this information is not readily accessible because it is not organized to allow efficient browsing, searching and retrieval [Kidambi, Fendley & Narayanan, 2011; Kidambi, Narayanan & Fendley, 2010]. In other words, the internet has no quality control and is like a vast uncataloged library. The current era faces the problem of multimedia overload. Multimedia overload can be defined as the condition in which the available multimedia information far outstrips the user’s ability to assimilate it, or in which the user is inundated with a deluge of information, some of which may be irrelevant to the user.

The goal of the research was to design and implement a novel approach that integrates human reasoning with computerized algorithms for personalized multimedia storyboarding. By systematically coupling human reasoning and computerized processes, we attempt to minimize the barrier between the human’s cognitive model of what they are trying to learn and the computer’s understanding of the user’s task. In Section 2 of this paper we discuss learning in this era; in Section 3, personalized storyboarding; in Section 4, traditional multimedia retrieval techniques; and in Section 5 we present our model, which utilizes a fusion of content based image retrieval, content based video retrieval, content based audio retrieval and text based retrieval techniques with human computer interaction based relevance feedback to enhance the personalized storyboard multimedia learning process, and explain it using the example of the game of cricket. We conclude the paper with our insights into the future of this field.

2. LEARNING IN THIS ERA

The human brain’s capacity to learn and establish new memories is one of its most fascinating properties. The way we learn differs from person to person. Esach [2007] proposed distinguishing three types of learning: formal, informal and non-formal. These terms have nothing to do with the formality of the learning; instead, they concern who controls the learning objectives and goals.

Formal learning refers to education or training provided by an institution in a structured way in terms of learning objectives, learning time and learning support. Formal learning is typically teacher led, evaluated and sequential [Esach, 2007]. In formal learning, books are the traditional source of information. Though books are extremely reliable sources of information, they are not interactive. Non-formal learning occurs in a planned but highly adaptable setting [Esach, 2007]. If the organization (other than the training department) sets the learning goals and objectives, the learning is referred to as non-formal [Hanley, 2008]. Informal learning is unstructured and applies to situations in life that come about spontaneously. Informal learning is voluntary and is generally reflected in what a person is reading, viewing and listening to, and also in his or her hobbies and social life [Maarschalk, 1988]. In the informal learning process, the learner is motivated intrinsically [Csikszentmihalya and Hermanson, 1995] and determines the path taken to acquire the desired knowledge, skills, or abilities.

Good & Brophy [1990] have defined incidental and intentional learning. In intentional learning, the goals and objectives of what to learn and how to learn are defined, while in incidental learning the person does not focus on specific goals but picks things up from the learning environment and, in the process, may make discoveries. While formal and intentional learning strategies use books as the major source of knowledge, non-formal, informal and incidental learning strategies use various multimedia technologies, the internet, social interactions and peer to peer interactions. In recent times, Khan Academy has been trying to revolutionize the way students learn. Khan Academy is a non-profit organization that aims to provide free online education through over 2800 video lectures, delivered using various multimedia technologies including video and animation [Thompson, 2011]. As the world moves towards using multimedia technologies in classrooms, we envision multimedia storyboarding that is much more interactive, in which each person dictates the pace and the medium through which they intend to learn.

3. PERSONALIZED MULTIMEDIA STORYBOARDING

A storyboard is a narrative of audio, text, video and images on a particular topic that are linked systematically. Storyboards are traditionally used in TV or radio news programming, where the product development time is short. Storyboarding plays an important role in media houses, where it provides everyone with a common point of reference to verify and validate the structural and content elements. The concept of storyboarding can be extended to learning. Generally, any topic that a person wants to learn involves one or more media. People differ in their ability to understand a story or topic according to their experience, age, knowledge, gender, etc. Personalized storyboard multimedia learning helps the user dictate the learning process in a creative way through multiple means of representation, expression and engagement. We have developed a template for storyboarding that aids learning. The content for the storyboard is generated using the Content Based Multimedia Retrieval (CBMR) techniques discussed in the next section of the paper. We believe that technology, when used effectively and for the right purpose, can enhance the learning process for students in this era of technological advancement.

4. MULTIMEDIA STORYBOARDING UTILIZING CBMR

Millions of multimedia files are uploaded to the World Wide Web every day by a wide range of users. According to 2011 internet statistics [Pingdom, 2011], there are 555 million websites, more than 1 trillion video playbacks on YouTube alone, and more than 10 trillion images on the web. Most of this information can be used for personalized storyboarding. In personalized multimedia storyboard learning, search plays a major role. Searching for relevant information in any form, including text, audio, image and video, is known as multimedia information retrieval. Current major commercial search engines use a process known as Annotation Based Multimedia Retrieval (ABMR) to execute search requests for multimedia information. The ABMR technique relies primarily on textual information associated with a multimedia file (labels or annotations attached to images, videos and audio) to complete the search and retrieval process.

Our previous benchmarking studies [Kidambi, Fendley & Narayanan, 2011; Kidambi, Narayanan & Fendley, 2010] suggest that, overall, commercial search engines continue to have significant difficulty executing multimedia retrieval tasks effectively. The Google search engine performed significantly better than Yahoo or MSN Live for every query type: unique, non-unique, unique with refiner and non-unique with refiner. The results also indicate that the precision of the search engines tended to drop as the number of retrievals increased. This performance reduction was noted across the board, that is, irrespective of search engine and query type. The performance of the search engines also dropped dramatically when the queries had refiners (unique or non-unique).

The other methodology for multimedia search is Content Based Multimedia Retrieval (CBMR), which searches for multimedia based on features such as colour, texture, shape and sound. CBMR is still at a nascent stage and has not been incorporated to its full capacity in commercial search engines. This research addresses the multimedia retrieval problem by systematically coupling the ABMR and CBMR algorithms and using human input wherever needed to reduce the semantic gap. The semantic gap is the gap between the low level multimedia features and the high level textual features (manually annotated using human perception) [Liu, Zhang, Lu, Ma, 2007]. The key research question addressed by this study is whether a human integrated approach leads to better multimedia retrieval for personalized storyboarding.

Human reasoning is an integral part of annotation based multimedia retrieval, and this is the fundamental difference between content based multimedia retrieval (CBMR) and annotation based multimedia retrieval (ABMR). While CBMR automatically extracts low level features such as colour, texture, shape and sound, ABMR uses high level features such as keywords, text descriptors and the semantics associated with them. In general there is no direct link between the low level visual/audio features and the high level text features. This gap between the low level multimedia features and the high level semantic concepts of the multimedia file is termed the multimedia semantic gap. The multimedia semantic gap can also be defined as the lack of coincidence between the information a computer algorithm can extract from the visual/audio data and the interpretation that a human gives the same data in a given situation.
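As a minimal sketch of what a low level CBMR feature can look like, the following Python example computes a coarse colour histogram for an image and compares two images by histogram intersection. These are common textbook choices and are an assumption here; the paper does not state which colour feature or similarity measure its framework uses. The function names, the bin count and the toy pixel lists are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def colour_histogram(pixels: Sequence[Pixel], bins: int = 4) -> List[float]:
    """Quantise each channel into `bins` levels and return a normalised histogram."""
    if not pixels:
        raise ValueError("an image needs at least one pixel")
    counts = Counter()
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins-1.
        rq, gq, bq = (r * bins) // 256, (g * bins) // 256, (b * bins) // 256
        counts[rq * bins * bins + gq * bins + bq] += 1
    return [counts[i] / len(pixels) for i in range(bins ** 3)]


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]: 1.0 for identical colour distributions, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical toy "images" given as flat pixel lists (assumption, not real data).
pitch = [(30, 160, 40)] * 90 + [(200, 200, 200)] * 10    # mostly green
outfield = [(20, 140, 50)] * 80 + [(90, 60, 30)] * 20    # mostly green
ball = [(200, 20, 20)] * 100                             # red

sim_green = histogram_intersection(colour_histogram(pitch), colour_histogram(outfield))
sim_red = histogram_intersection(colour_histogram(pitch), colour_histogram(ball))
print(sim_green, sim_red)
```

The two green images score high and the red one scores zero, yet nothing in the histogram says "cricket pitch" or "ball". That missing meaning is the semantic gap the text describes.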

This research responds to the limitation of the multimedia semantic gap by proposing to integrate the primitive multimedia features with text keywords, which can overcome one part of the semantic gap, and to bridge the other part, i.e. the distance between the object annotations and the high level reasoning, through human integration.

5. HUMAN INTEGRATED APPROACH

Figure 2 represents the information system of our Human Integrated Approach Content Based Multimedia Retrieval for Personalized Storyboarding Architecture. Specifically, this architecture aims to reduce the semantic gap between the ABMR and CBMR techniques in order to retrieve relevant results during the storyboarding process. The approach is showcased using a database on the game of cricket (www.cricinfo.com and related web sites). The game of cricket has thousands of multimedia files on the web, including images of batsmen, bowlers, fielders, cups, stadiums, etc. Even though it is a confined database, the multimedia files in this domain are diverse in nature. Some of the blocks of Figure 2 are explained here.

Figure 2: Human Integrated Approach Content Based Multimedia Retrieval for Personalized Storyboarding Architecture. Blocks shown: Web Crawler (Extract Related WebPages); Eliminate Noisy Information (Advertisements & Useless Links); Identify Informative Sections (multimedia & keywords); Eliminate Duplicate Multimedia Files; DATABASE (TEXT: Text; IAV: Image/Audio/Video + Annotations); Automatic Structured Template Generation from KeyWords; Automatic Structured Template Generation from Content Based Multimedia Algorithms; Domain Expert Input; Exhaustive Annotation (Computerized Algorithms Input + Human Input); Comprehensive Template from Annotations & Multimedia Features; Indexing, Iterative Searching, Similarity Ranking; Query Case; Personalized Storyboard.

At the database block, the database was split into two individual databases:

The text database consists of any text information present on a webpage.

The image/audio/video (IAV) database consists of all the images, videos and audio files, together with their annotations, if any.

The text database does not need any additional processing and is available for search and retrieval. Annotations for all the multimedia files in the IAV database are provided by Content Based Multimedia Retrieval algorithms and validated by a human domain expert in the game of cricket. The human can either add additional information or remove incorrect annotations to provide a comprehensive annotation for all the multimedia files. As previously stated, current ABMR and CBMR techniques are far from meeting our needs in terms of precision and recall. The performance of current commercial search engines can only be improved by a disciplined annotation approach by domain experts. With the number of multimedia files available on the internet growing exponentially, a human is incapable of annotating all these files in a systematic manner. Computer vision algorithms can potentially be used to alleviate the load on the human operator in annotating the images. These algorithms can annotate images with content dependent metadata but, as they are still in their infancy, they fail when annotations requiring content descriptive metadata are needed.

Clearly, there needs to be a more systematic way for the human to annotate these files in order to improve the performance of the search engines. Our previous studies, which used a semi-automated annotation framework that provides a systematic metadata template to mitigate the drawbacks of manual annotation, helped improve the precision of the search engines. The use of templates incorporates human expertise, capitalizing on the strengths of manual annotation that we have demonstrated with domain experts while avoiding the out-of-the-loop performance problems that occur when such a system is completely automated.

Using computer vision algorithms to fill in the content dependent metadata in the template was another aspect of our research. The human was brought into the loop to validate not only the metadata template but also the template created by the computer vision algorithms from the visual/audio features of the image, in order to create a highly comprehensive multimedia search database.

An example metadata template and visual cue template (using CBIR and CVIR) for the game of cricket are illustrated in Figure 3.
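The annotation step for the IAV database can be sketched as a small data structure plus a merge rule: labels proposed by the content based algorithms are combined with the domain expert's additions, and anything the expert rejects is dropped. This is an illustrative sketch only. The record layout, the `validate` function and the sample labels are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class MediaRecord:
    """One entry in the IAV database: a media file plus its annotations."""
    uri: str
    media_type: str  # "image", "audio" or "video"
    annotations: Set[str] = field(default_factory=set)


def validate(record: MediaRecord,
             algorithm_labels: Iterable[str],
             expert_added: Iterable[str],
             expert_removed: Iterable[str]) -> MediaRecord:
    """Return a new record whose annotations merge algorithm and expert input."""
    merged = record.annotations | set(algorithm_labels) | set(expert_added)
    merged -= set(expert_removed)  # the expert's rejection always wins
    return MediaRecord(record.uri, record.media_type, merged)


# Hypothetical example: the algorithm mislabels the sport, the expert corrects it.
raw = MediaRecord("img/001.jpg", "image", {"cricket"})
checked = validate(
    raw,
    algorithm_labels={"grass", "face", "football"},
    expert_added={"batsman", "test match"},
    expert_removed={"football"},
)
print(sorted(checked.annotations))
```

Returning a new record instead of mutating the original keeps the unvalidated crawl data available, which is one simple way to let an expert revisit a decision later.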

Content Based Image/Video Retrieval Template – Algorithm Matching

Template Feature          Content Feature
Perceptual Primitives     Color & Texture
Geometric Primitives      Shape
Contextual Abstraction    Color/Grayscale
Technical Abstraction     Face Recognition

Metadata Template – Feature Matching

Template Feature                  Metadata Feature
Specific Object Named Class       Nationality of the Players
Specific Object Named Instance    Name of the Players
Specific Location                 Location of the game
Specific Time                     Time of the game
Generic Activity                  Activity of the players
Specific Activity                 Game Format

Figure 3: Example CBIR/CVIR and Metadata Templates

The use of templates helps to reduce the semantic gap. With templates, visual/audio features can be interpreted as generic objects to which labels can be applied (by means of automatic annotation using visual/audio cue template generation); this covers the gap between level 1 and level 2 (Gap 1), as illustrated in Figure 4. However, it still does not fill the other half of the gap (Gap 2), i.e. the distance between the object annotations and the high level reasoning (the gap between level 2 and level 3 of image retrieval). To bridge the gap between level 2 and level 3, human reasoning is utilized. The templates help the human fill in the information needed for this domain, thereby creating the relevant annotations for the multimedia.

Figure 4: Bridging the Semantic Gap (Level 1: Retrieval by primitive features, labelled Current CBMR Systems; Level 2: Retrieval by degree of logical inference; Level 3: Retrieval by high level reasoning. Gap 1, between Levels 1 and 2, is bridged by the Content Based Multimedia Algorithms Template; Gap 2, between Levels 2 and 3, is bridged by Human Reasoning)

The performance of the search engines is enhanced by systematically coupling annotations generated from the content of the multimedia with annotations made from a human perspective. Our architecture aids the human by generating templates, which help the human annotate much more efficiently, and fills a gap in the existing literature by developing a framework that systematically reduces the semantic gap between the high level textual features and the low level image/video/audio features.

Storyboarding Process

When a person wants to learn about a topic of interest, they enter a keyword in the Search Query box, as illustrated in Figure 5. The storyboard search engine retrieves text, images, audio and video for the search topic. The person can go through any or all of the media to learn according to their learning style. The person can also choose to provide relevance feedback on the retrieved results.
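To make the search step concrete, here is a minimal sketch of how annotated items could be grouped into a storyboard by media type for a keyword query. It assumes a simple keyword-overlap score. The paper's indexing and similarity ranking are not specified, so the `Item` type, the scoring rule and the sample items are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, NamedTuple


class Item(NamedTuple):
    """An annotated item from the TEXT or IAV database (hypothetical layout)."""
    title: str
    media: str                # "text", "image", "audio" or "video"
    keywords: FrozenSet[str]  # annotations produced by templates and experts


def build_storyboard(query: str, items: List[Item]) -> Dict[str, List[str]]:
    """Group matching item titles by media type, best keyword overlap first."""
    terms = set(query.lower().split())
    hits = defaultdict(list)
    for item in items:
        overlap = len(terms & item.keywords)
        if overlap:
            hits[item.media].append((overlap, item.title))
    # Sort by descending overlap, then alphabetically for a stable order.
    return {
        media: [title for _, title in sorted(found, key=lambda h: (-h[0], h[1]))]
        for media, found in hits.items()
    }


catalogue = [
    Item("What is Cricket", "text", frozenset({"cricket", "game", "rules"})),
    Item("History of Cricket", "video", frozenset({"cricket", "history"})),
    Item("Game of Cricket", "video", frozenset({"cricket", "game"})),
    Item("Stadium photo", "image", frozenset({"cricket", "stadium"})),
    Item("Tennis basics", "text", frozenset({"tennis"})),
]

board = build_storyboard("cricket game", catalogue)
print(board)
```

Media types with no matching items simply do not appear in the result, which mirrors a storyboard panel being left empty.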

The content of the Text box in Figure 5 is filled with data from the TEXT database in our architecture, while the content of the Image, Audio and Video boxes is filled with data from the IAV database.

Figure 5: Personalized Storyboard using our Human Integrated Technique (a Search Query box containing “What is Cricket”; checkboxes for Text, Audio, Image and Video, all checked; Text, Images, Audio and Video panels, each with Read More or See More links and Relevance Feedback controls to accept (✓) or reject (✗) the results)
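The two interactive controls shown in Figure 5, relevance feedback and the media checkboxes, can be sketched as follows. The paper does not give the rule by which feedback changes the ranking, so the fixed boost below is purely an assumption for illustration, as are the function names and the sample scores.

```python
from typing import Dict, List, Set


def rerank(scores: Dict[str, float], feedback: Dict[str, bool],
           boost: float = 0.5) -> List[str]:
    """Re-order results after accept (True) or reject (False) feedback."""
    adjusted = {}
    for title, score in scores.items():
        if title in feedback:
            # Accepted results move up, rejected results move down.
            score += boost if feedback[title] else -boost
        adjusted[title] = score
    return sorted(adjusted, key=lambda t: (-adjusted[t], t))


def visible_panels(storyboard: Dict[str, List[str]],
                   checked: Set[str]) -> Dict[str, List[str]]:
    """Keep only the media panels whose checkbox is ticked."""
    return {media: items for media, items in storyboard.items() if media in checked}


# Hypothetical retrieval scores for the Video panel.
video_scores = {"Game of Cricket": 0.9, "History of Cricket": 0.7, "What is Cricket": 0.6}
video_ranked = rerank(video_scores, {"Game of Cricket": False, "What is Cricket": True})

storyboard = {
    "text": ["What is Cricket"],
    "audio": ["Commentary clip"],
    "image": ["Stadium photo"],
    "video": video_ranked,
}
shown = visible_panels(storyboard, checked={"text", "image", "video"})
print(video_ranked, sorted(shown))
```

Keeping the filter separate from the ranking means a learner can hide a medium without losing the feedback already given for it.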

If the person does not want one of the four media for learning, he or she can uncheck it on the storyboard. An example in which the person unchecks the audio is shown in Figure 6.

Figure 6: Example Customization of the Storyboard (the same storyboard for the query “What is Cricket” with the Audio checkbox cleared: the Audio panel is absent and the Text panel shows a longer passage)

To our knowledge, this architecture is the first of its kind. We believe that utilizing Content Based Multimedia Retrieval for personalized storyboarding has the potential to change the way people learn. The technology is not limited by people’s learning styles, as it caters to any learning style. Because the learning process is dictated by the user, the architecture is considerably more powerful.

6. CONCLUSION

This research is a work in progress. We aim to build a large repository of databases to enhance learning through personalized multimedia storyboards that cater to a person with any learning style. The research framework, which utilizes human integration for content based multimedia storyboarding, can be extended to multiple domains. The approach has the potential to enhance the learning process by allowing personalization and interactivity. Algorithm development in computer vision and audio processing is a growing field, and some of the algorithms used in this research for Content Based Multimedia Retrieval can be replaced with newer algorithms in the future. The architecture developed in this research is modular and hence readily supports algorithm enhancement. The modular structure also eases the use of a distributed computing architecture for faster processing of the images. The first initiative is to expand the data sets, particularly in the areas of medical images and surveillance images. New Content Based Multimedia Retrieval algorithms will be developed to replace the current ones in the framework in order to support these two areas.

7. REFERENCES

Csikszentmihalya, M., & Hermanson, K. (1995). Intrinsic motivation in museums: What makes visitors want to learn? Museum News, Vol. 74.

Curry, L. (1981). Learning preferences in continuing medical education. Canadian Med Assoc J., Vol. 124, pp. 535-536.

Esach, H. (2007). Bridging In-school and Out-of-school Learning: Formal, Non-Formal, and Informal Education. Journal of Science Education and Technology, Vol. 16(2).

Felder & Silverman (1988). Learning and Teaching Styles in Engineering Education. Engr. Education, Vol. 78(7), pp. 674-681.

Good, T. & Brophy, J. (1990). Educational Psychology: A realistic approach. New York: Holt, Rinehart, & Winston.

Hanley, M. (2008). Introduction to Non-formal Learning. ELearning Curve Blog. Retrieved January 30, 2012, http://michaelhanley.ie/elearningcurve/introductionto-non-formal-learning-2/2008/01/28/

Kidambi, P., Fendley, M., Narayanan, S. (2011). Framework for Improving Annotation-Based Image Retrieval Performance. Journal of Multimedia Processing and Technologies, Vol. 2(1), pp. 9-25.

Kidambi, P., Fendley, M., Narayanan, S. (2011). Performance of Annotation-Based Image Retrieval. Communications in Computer and Information Science, Springer, pp. 251-269, ISBN: 978-3-642-22184-2.

Kidambi, P., Narayanan, S., Fendley, R. (2010). Benchmarking Web Based Image Retrieval. 16th Americas Conference on Information Systems, Lima, Peru.

Kidambi, P., Narayanan, S. (2008). A Human Computer Integrated Approach for Content Based Image Retrieval. Proc. of 12th WSEAS International Conference on Computers, Heraklion, Greece.

Kolb, D.A. (1976). The Learning Styles Inventory: Technical Manual. Boston: McBer & Company.

Kolb, D.A. (1985). The Learning Styles Inventory and Technical Manual. Boston: McBer & Company.

Liu, Y., Zhang, D., Lu, G., Ma, W. (2007). A survey of content-based image retrieval with high-level semantics. Pattern Recognition, Vol. 40(1), pp. 262-282.

Maarschalk, J. (1988). Scientific literacy and informal science teaching. Journal of Research in Science Teaching, Vol. 25(2), pp. 135-146.

Pingdom (2011). Internet 2011 in Numbers. Blog. Retrieved January 30, 2012, http://royal.pingdom.com/2012/01/17/internet-2011in-numbers/

Reid, J.M. (1987). The Learning Style Preferences of ESL Students. TESOL Quarterly, Vol. 21(1), pp. 87-111.

Thompson (2011). How Khan academy is changing the rules of Education. Wired.