Creating video art with evolutionary algorithms


Computers & Graphics 31 (2007) 837–847
www.elsevier.com/locate/cag

Technology and Digital Art

Teresa Chambel a, Luís Correia a, Jônatas Manzolli b, Gonçalo Dias Miguel a, Nuno A.C. Henriques c, Nuno Correia c

a Faculty of Sciences, University of Lisbon, 1749-016 Lisbon, Portugal
b Unicamp, Campinas State University, Campinas, Brazil
c Faculty of Sciences and Technology, New University of Lisbon, 2829-516 Caparica, Portugal

Abstract

The boundaries of art are subjective, but the impetus for art is often associated with creativity, regarded with wonder and admiration throughout human history. Most interesting activities, and their products, are a result of creativity. The main goal of our approach is to explore new, creative ways of editing and producing videos using evolutionary algorithms. A creative evolutionary system makes use of evolutionary computation operators and properties; it is designed both to aid our own creative processes and to generate solutions to problems that traditionally required creative people to solve. Our system is able to generate new videos or to help a user in doing so. New video sequences are combined and selected based on their characteristics, represented as video annotations, either by defining criteria or by interactively performing selections in the evolving population of video clips, in forms that can reflect editing styles. With evolving video, the clips can be explored through emergent narratives and aesthetics in ways that may reveal or inspire creativity in digital art.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Video editing; Digital art; Creativity; Genetic algorithms; Evolutionary systems

Corresponding author: T. Chambel. Tel.: +351 217 500 611; fax: +351 217 500 084. E-mail address: [email protected].
doi:10.1016/j.cag.2007.08.004

1. Introduction

Digital art is developing and becoming more common, changing our perception of what art is and what it will become. Not only have traditional forms of art been transformed by digital techniques and media, but entirely new forms have emerged as recognized practices. Computers may enhance visual art through ease of rendering, capturing, editing and exploring multiple compositions, supporting the creative process. Artists express their creativity in ways intended to engage the audience's aesthetic sensibilities or to stimulate mind and spirit, sometimes in unconventional ways.

Creativity has been said to consist largely of re-arranging what we know in order to find out what we do not know [1]. Often implied in the notion of creativity is also the presence of inspiration, cognitive leaps, originality and appropriateness. Some have emphasized an element of chance in the creative process: one must endeavour to come up with many ideas, and then discard the useless ones. In both nature and computer science, evolution provides novelty, showing us new and subtly different solutions in every generation. In the presence of many unexpected and unlikely solutions, users are forced to react, change their minds, and explore new possibilities [2]. In this way, evolutionary systems can enhance the creativity of people. Evolutionary systems may also provide automatic solutions, although it may be harder to automate the judgement of more subjective properties, like aesthetic preferences.

The main goal of our approach is to find new and creative ways of editing and producing videos, using evolutionary algorithms based on selection criteria determined by the user. This can be done either by defining criteria to be applied automatically or by interactively performing selections in the evolving population of video clips. Video segments are previously annotated with metadata, which characterizes video properties. Genetic operators use this metadata and video editing techniques in the evolutionary process.

The work presented in this paper builds on previous work [3,4], as detailed in Sections 5 and 6. The evolutionary model has become more flexible, the system has increased functionality, and the use of creative video editing is further explored as a process to support the production of digital video art. The next section discusses the role, main defining properties and challenges of video in the context of digital art. Section 3 introduces concepts of creativity and how it can be supported by evolutionary approaches. Section 4 overviews related work. Section 5 describes the proposed evolutionary model for video editing. Next, the MovieGene prototype and video editing examples are presented. The paper concludes with a discussion of the approach, results and future work perspectives.

2. Video in digital art

There are different definitions of art, but most of them refer to an intention of stimulating human senses, mind or spirit through the conscious use of skill and creative imagination [5,6]. Main attributes of art include the impact it has on people, the number of people who can relate to it, the degree of their appreciation and the effect or influence it has or had in the past. Paradoxically, an object may be characterized by the intentions of its creator, regardless of its apparent purpose, as fine art or merely as craft. Art was primarily concerned with concepts of truth and beauty, but since the early 1900s, with modernism, it has been associated with the use of the methods of a discipline to criticize the discipline itself. Modern artists had new ideas about the nature, materials and functions of art, often moving towards abstraction. Postmodern art, from the mid-1900s, appears after, or in contradiction to, some aspect of modernism. Installation art, conceptual art, intermedia and multimedia, particularly involving video, are described as postmodern.
Digital technology has had a major impact on the production and experience of art during the past two decades. It has been used either as the means to produce traditional forms in new ways or as a medium to create new types of art [7]. Digital art is art created in digital form, but the term is often reserved for art that has been non-trivially modified by a computing process, in a wide variety of approaches, contexts and experiences including: interactivity, non-linearity, multimedia, virtual and augmented reality, telepresence, net communications, databases and 3D visualization. Media artists are using an increasingly complex set of digital tools and devices, and much of this work has considerable implications for the way we experience art, challenging assumptions about the nature of art itself [8].

One of the main concerns, when computer-mediated art was first shown publicly in the mid-1960s, was "but is this art?" [9]. Next came concerns about the issue of originality and the role of technology in the art-making process; then the issue of the separation of the creative act from the processes involved in realizing expression ("should artists control every aspect of the creation of their work?"); and the concern of protecting property rights. These issues are frequently debated by artists, critics and viewers, and become even more interesting as interactivity-related concepts are involved. When viewers become participants in the art-making process, the answers to these questions become more complex and even more pertinent to our understanding of digital expression. The distinction between viewer and creator is evolving. Prince [9] argues that time and history will answer these questions and add some more, and also that society, the art world, expression and the means to realize expression will change, but that the role of the computer in the new literacy is certain.

Video art relies on moving images comprising video and audio, and made its first moves during the 1960s and 70s [10–12]. Although related, it is not the same as experimental cinema, and usually it is not even considered film. It does not necessarily rely on the conventions underlying theatrical cinema concerning the use of actors, dialogue, narrative or plot, or on many of the other conventions that define entertainment films. Video art has a broader goal, spanning from exploring the boundaries of the medium itself to confronting the viewer's expectations of video as shaped by conventional cinema. One of the most influential early video artists was Nam June Paik [13], who started to use portable cameras and TV sets in some artistic works. The accessibility provided by instant playback and by video editing capabilities made video more appealing than film to artists.
Many of the early video artists were involved in conceptual art, performance and experimental film; others were interested in the formal qualities of video and employed video synthesizers to create abstract works. For instance, Bill Viola, one of the most notable video artists, used video in a quite different way from film. His work, including the usage of ultra slow motion, questions time and how it is commonly used, e.g. in commercial TV. Recent video art works include entirely digitally rendered environments and video that responds to viewers' movements or other properties of the environment. Video can be presented on a single screen, or in an installation involving an environment with several pieces of video presented separately or in combination with traditional media like sculpture, at the crossroads of other disciplines such as architecture, design and digital art. Artists such as Matthew Barney have explored these boundaries, in the filmic structures that are used and their relations with physical spaces, while presenting metaphoric universes. Traditional filmic structures are employed to convey the desired meaning when necessary, but most of the work has a multilayered texture that defies linear narrative conventions.


3. Creativity and evolution

Most of the things that are interesting, important and human are results of creativity [14]. However, creative ideas vanish unless there is a receptive audience to record and implement them, and the assessment of competent outsiders is a reliable way to decide whether the claims of a self-styled creative person are valid. Creativity is thus a systemic rather than an individual phenomenon, depending on the interaction between a person's thoughts and a sociocultural context. Creativity results from the interaction of a system composed of three elements: a culture that contains symbolic rules; a person who brings novelty into the symbolic domain; and a field of experts who recognize and validate the innovation.

Creativity is the cultural equivalent of the process of genetic change that results in biological evolution [14], where random variations take place in the chemistry of our chromosomes, below the threshold of consciousness. Most new traits do not improve survival chances and may disappear after a few generations. But a few do otherwise, and these are the ones that account for biological evolution. In cultural evolution, there are no mechanisms equivalent to genes and chromosomes. A new idea or creation is not automatically passed on to the next generation; each child has to learn it again from the start. The cultural equivalent of a gene is the meme, a unit of information that we must learn if culture is to continue. It is these memes that creative people change, and if enough people see the change as an improvement, it will become part of the culture [14].

A creative evolutionary system makes use of some aspects of evolutionary computation and is designed to: (1) aid our own creative processes and (2) generate solutions to problems that traditionally required creative people to solve, so that such systems might also appear to act creatively.
Such a system may find innovative and novel solutions, or combine different ideas to make something new. In both nature and computer science, evolution provides us with novelty, showing us new and subtly different solutions in every generation [2]. It is then important to evaluate and select these novel solutions according to some established criteria. This process can be performed automatically if the criteria are easy to specify and compute; otherwise, human judgment is required. Human interaction and judgment contribute to the evolutionary process through interactive or collaborative algorithms. The advantages of this interactivity include: good search ability, by guiding or forcing new alternatives; a wide range of different solutions explored; and the ability to evolve solutions for which there is no clear fitness function, due to highly variable or subjective objectives. On the other hand, arguments in favor of automatic evolutionary processes include: speed, since automatic evaluation is faster; consistency in the application of criteria; and coverage, since it is easier to evaluate large populations through many generations.


These approaches can be used as alternatives or simultaneously. Evolutionary systems are not intended to replace people, but to allow them to explore a wider variety of solutions, increasing their productivity and creativity. The combination of evolution and human judgment enables a richer and more diverse kind of creativity. It is the very unpredictability of evolution that helps to stimulate the creativity of people.

4. Related work

There has been some work on creative and artistic evolutionary systems, mainly in design, images, painting and music. Video has hardly been addressed in this context. First forays into explorative and generative evolutionary design were made by architects [15]. Design criteria are usually very complex, and our preferences and requirements change. People have different tastes, varying over time, influenced by factors such as fashion. This is particularly true in fields like architecture, where designs are revised and modified several times until clients are satisfied. Explorative evolution can create novel floor plans that satisfy many constraints and objectives. It can also learn to create buildings in the style of well-known architects [16,17]. Sims explored evolution in computer graphics [18], proposing artificial evolution for the production of 3D plant structures and animation to create forms, textures and motions not bound by a fixed space of possible results. In Ref. [19], an artistic experiment is described in which evolutionary methods are used to increase and control the recognizability of Picasso's woman portraits. GenJam [20] is an evolutionary system for Jazz music, an improviser in the authors' words. Vox Populi [21] also explores evolution in music composition, where the user can change the music's evolution through a graphical user interface.
This system was further developed into ArTbitrating JaVox [22], an environment for artistic production in the visual and sound domains, upgrading aesthetic judgment through interactive evolutionary computation techniques. As a case study, Kandinsky-like objects based on geometric shapes are created and evolved. Relationships between visual features of images and sound are also explored.

Although not from an evolutionary perspective, there are artistic projects proposing new ways of making films. Soft Cinema [23] follows an interesting approach, selecting which video clips and animations will play, in what order, and narrated by which voice, in new screens generated at the beginning of each segment of a story. Its database contains a collection of short movies in different styles. A story is typically divided into a number of sequential parts, as short movies. The concept of rhetorical patterns was introduced in Ref. [24] to support an approach to the adaptive composition of video documentaries, based on rules for dynamic selection, sequencing and composition of video shots. More recently, real-time audiovisual performances have emerged under the designations of Live Cinema and VideoJockeying (VJing) [25,26]. In these approaches, there is human intervention in choosing the techniques to apply in montage and visual effects, and real-time editing is a mandatory aspect.

In an evolutionary approach, interactive genetic algorithms were recently used by Yoo and Cho [27] to retrieve video scenes based on emotions. They focus on scene retrieval, not on creating new video sequences or on exploring artistic aspects.

Other approaches to sequencing adopt Markov chains. A Markov chain takes a sequence of events and analyzes the tendency of one event to be followed by another. Based on this stochastic process, it is possible to generate a new sequence of events that will look similar to the original. In the visual domain, a compact descriptor of video content, based on modeling the temporal behavior of image features using coupled Markov chains, was proposed by Sanchez et al. [28]. They showed that complex high-level visual content can be characterized using very simple low-level features, such as motion and color. A similar approach by Chang et al. [29] uses image features and domain knowledge to define a system capable of detecting highlights in a baseball game video. In both cases, the Markov models are used only to support detection and classification of image sequences. In the musical domain, however, Jones [30] studied the development of generative musical grammars based on Markov chain transition matrices. This approach is particularly interesting because he showed that it is possible to develop a generative approach to control high-level music structures using Markov transition tables.

5. Evolutionary video editing model

To evolve the population of video clips, we need to define a representation and the genetic operators to be used. It is very important to have an efficient representation that still enables evolution to explore a wide range of alternative solutions.
From results previously obtained [4], we envisage two main ways of exploring evolutionary video. One is to manipulate genetic material at the video signal level; the other is to use high-level semantic descriptions that can induce the production, by evolution, of some sort of structure or story. This reflects different description levels of the videos: lexical, syntactic and semantic. Next, we describe the evolutionary model used. For further generic information on evolutionary algorithms, please refer to Ref. [32].
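As a point of reference, the model described in this section instantiates the standard generational scheme of evolutionary algorithms. The sketch below is illustrative Python, not the MovieGene implementation; the fitness function and gene pool are assumed to be supplied (Sections 5.1 to 5.3 define the ones actually used), and individuals are lists of genes:

```python
import random

def tournament(pop, fitness, k=2):
    """Tournament selection of size k (size two in this model)."""
    return max(random.sample(pop, k), key=fitness)

def crossover(a, b):
    """One-point crossover for parents of possibly different lengths:
    the point is chosen within the shorter parent and reused in the other."""
    if min(len(a), len(b)) < 2:
        return a[:], b[:]
    point = random.randint(1, min(len(a), len(b)) - 1)
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(ind, gene_pool):
    """Simplest mutation variant: replace one gene by a random one."""
    ind = ind[:]
    ind[random.randrange(len(ind))] = random.choice(gene_pool)
    return ind

def evolve(pop, fitness, gene_pool, generations=10,
           p_cross=0.8, p_mut=0.1, elite_frac=0.2):
    """Generational loop with tournament selection, crossover,
    mutation and elitism."""
    for _ in range(generations):
        # Elitism: the top fraction passes unchanged to the next generation.
        elite_n = max(1, int(elite_frac * len(pop)))
        nxt = sorted(pop, key=fitness, reverse=True)[:elite_n]
        while len(nxt) < len(pop):
            a = tournament(pop, fitness)
            b = tournament(pop, fitness)
            if random.random() < p_cross:
                a, _ = crossover(a, b)
            if random.random() < p_mut:
                a = mutate(a, gene_pool)
            nxt.append(a)
        pop = nxt
    return max(pop, key=fitness)
```

Because the elite is copied forward each generation, the best fitness in the population never decreases, which is the disruption-avoidance property discussed in Section 5.2.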

5.1. Representation

The genotype of individuals in the population is a coding of a video clip. The phenotype is created, or expressed, from the information in the genotype only when the video clip is to be exhibited, and consists in rendering or playing the sequence of video segments that compose it.

Each gene in the chromosome is composed of a reference to a pre-shot video segment and additional information, as annotations (see Fig. 1). Annotation addresses both high-level semantic information, usually defined by humans, and low-level audio-visual information, such as color histograms, that can be extracted automatically through video parsing and analysis. For annotation, we adopted a set of the most relevant descriptors from the MPEG-7 standard [31]. A segment's annotation has five descriptors: its duration, color histogram, shooting distance (as close, medium or long), keywords and free text. Duration and color histograms are typically obtained by automatic processing of the video segment, but the remaining annotations are usually entered by humans. The MPEG-7 descriptors used for each annotation, represented in XML format in the gene, are: MediaDuration of MediaTime type, for segment duration; ShotDistance of CameraMotionType, for shooting distance; ScalableColor of GoFGoPColorType, for the color histogram; and two text-type descriptors, KeywordAnnotation and FreeTextAnnotation, for keywords and free text, respectively.

At the syntactic level, chromosomes have a structure defined by the order of their genes, reflecting the order of the video segments in the individual. Different individuals can further be combined in a higher-level structure to compose longer videos and convey more complex messages, for example in storytelling.

The initial population is generated by composing chromosomes with randomly chosen video segments, or according to specific rules. One such rule is that the segments in each chromosome keep the order they had in the original video, if they are taken from the same video. Another rule is to satisfy specific criteria for each segment, defining a structure for the chromosomes. In addition, one can specify whether segment repetition is allowed in the same chromosome. Other rules can be defined. The length of the chromosomes has to be pre-defined, either as a fixed number or as a randomly selected value between a minimum and a maximum number of genes; in the latter case, the system works with variable-length chromosomes.

Fig. 1. Example of an individual and its genome, a chromosome structure with three genes. This is the last individual represented in Fig. 5.
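To make this representation concrete, it can be sketched as a shared library of annotated segments, with each chromosome holding only indexes into that library and the phenotype produced at playback time. The names, field values and clip URIs below are hypothetical, not the actual MovieGene code:

```python
from dataclasses import dataclass, field

@dataclass
class Gene:
    """A pre-shot video segment reference plus MPEG-7-style annotations."""
    segment_uri: str          # reference to the video segment file
    duration: float           # seconds (MediaDuration)
    shot_distance: str        # "close" | "medium" | "long" (ShotDistance)
    color_histogram: list     # histogram bins (ScalableColor)
    keywords: list = field(default_factory=list)   # KeywordAnnotation
    free_text: str = ""                            # FreeTextAnnotation

# The library of all genes: the domain of the alleles.
library = [
    Gene("clip_a.mpg", 4.0, "close",  [0.7, 0.2, 0.1], ["singer"]),
    Gene("clip_b.mpg", 2.5, "long",   [0.1, 0.1, 0.8], ["sea", "turtle"]),
    Gene("clip_c.mpg", 3.0, "medium", [0.3, 0.4, 0.3], ["garden"]),
]

# A chromosome is just an ordered sequence of indexes into the library.
chromosome = [2, 0, 1]

def phenotype(chromosome, library):
    """Expressed only when the clip is exhibited: the ordered play-list
    of segment references to be rendered."""
    return [library[i].segment_uri for i in chromosome]

# phenotype(chromosome, library) -> ["clip_c.mpg", "clip_a.mpg", "clip_b.mpg"]
```

Keeping only indexes in the chromosome means the evolutionary loop never touches video data, which is the efficiency point made below.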


It should be noted that, since the genes correspond to pre-shot video segments (and their respective annotations), we maintain a library of all the genes, which constitutes the domain of the alleles. In fact, the chromosome representation is just a sequence of indexes pointing to the video segments in the library; the normal operation of the evolutionary algorithm does not need more than that. Only at the end, to play the video clips, do we need to generate the phenotype.

There is a balance to be struck regarding the evolutionary algorithm. Applying genetic operators directly to video involves significant processing time and, consequently, a slow evolutionary process. An alternative approach is to code the parameters of the operators in the genome and have a process of phenotype development apply them to produce the actual video. On the other hand, this alternative requires changing the coding whenever an operator produces a major structural modification. When the user is in the loop, on-line operator processing seems to be the best choice: it makes do with a simpler coding, and the processing times may be acceptable at the human scale.

5.2. Genetic operators

In this model, the genetic operators used are the conventional selection, crossover and mutation. Selection can be used in a conventional way, through tournaments of size two [32]. Mutation and crossover need to be described for this particular representation.

Mutation can assume different forms: some promote diversity, through exploratory approaches; others promote precision, through selection. On the one hand, it can delete one gene, add a randomly chosen gene at a random position of the chromosome, or replace a gene by a randomly chosen one; on the other hand, it can select a gene with high fitness, or a gene that respects the original chromosome structure, if the initial population was structured. Again, gene repetition in the individuals can be allowed or disallowed. Mutation could also manipulate a gene. This manipulation modifies the video features of the segment, e.g. by changing its color, taking a cut of the segment or concatenating two segments. Using a mutation variant that may change the length of a chromosome carries the need to verify that the chromosome does not exceed the maximum, or fall below the minimum, admissible length. If that happens, the last gene addition, or deletion, is retracted and another mutation is chosen. Notice that these several options of the mutation operator can be chosen probabilistically. By defining different probability distributions over the possible mutation options, we may create an editing "style", which may be important for artistic purposes of video clip editing.

The crossover operator has to take into account the fact that the two parents may have different lengths. To handle this, the crossover point just has to be chosen within the length of the shortest parent, and then the same point is used in the other parent.

Elitism is also configurable in this model. The user may define the percentage of most fit individuals that go directly to the next generation. This allows maintenance of the best solutions, thereby eliminating possible disruptions in the evolutionary process.

Specific genetic operators can be developed for the manipulation of video signals and segments. The crossover operator can work with crossover points falling inside video segments, instead of between video segments. Reversed video or varying color palettes, quantization or decreased sampling rates, for instance, can introduce dramatic effects in video production. The genetic operator to perform such modifications is mutation, and therefore several other variations of this operator are going to be developed.

5.3. Fitness function

Evaluation of each individual is done by a fitness function which only takes into account the annotations associated with each video segment in the genome. This allows faster processing, since the lengthy video data need not be processed to compute fitness. The fitness function is a weighted sum of partial fitnesses regarding each of the annotations in a gene:

f = \sum_{i=1}^{G} w_i f_i,    (1)

where G is the number of genes in a chromosome, f_i is the fitness component corresponding to a specific descriptor in the annotations, and w_i is the corresponding weight. Each partial fitness function computes a distance between the respective descriptor value and a goal value defined by the user. The partial fitness value is maximal when the descriptor value equals the respective goal, and decreases with increasing distance. In this way, the user may induce the evolutionary process to produce video clips with particular intended features (e.g. forest scenes, or a histogram with a strong blue component). For text-type components, we use a simple language-processing procedure to compute word similarity between the chromosome and the goal. For shooting distance, a categorical component that is nevertheless ordered, we compute the distance to the goal category. For the color histogram, we average the differences between the histogram bins in the chromosome and the goal. Finally, the duration component is computed as a simple difference. All components are normalized for incorporation in Eq. (1).

Other fitness functions can be defined. In particular, we have also developed interactive fitness, in which the user may select the individuals that produce the next generation. This possibility is useful because it may be difficult to express artistic criteria mathematically, in a fitness function. Interactive selection is a way to let the user have direct control over the creative process: the computer is used to search for possibilities, and the user's role is to serve as the selection operator.

At the syntactic and semantic levels of specification, we may consider the intended story development of the video resulting from MovieGene. We must therefore be able to express a coherent connection of video segments, in order to produce a composition with a beginning, development and ending. We are developing a generative grammar based on Markov chain transition matrices to produce video, in a way comparable to the musical composition method explored by Jones [30]. To our knowledge, this is the first proposal to use Markov models to specify and generate video sequences. To this end, a model based on a set of Markov matrices may form a grammar to control the concatenation of the segments composing the video in different editing or artistic styles. At the moment, the information about the video segments is a set of properties, defined as annotations. The Markov matrices can be obtained from samples of annotated videos or from a statistical analysis of a user creating a video.

5.4. New features

The second version of MovieGene [4] allowed more flexibility than the initial one [3] by having independent genes, instead of genes with a fixed position in the chromosome sequence. In the new model, a gene could be used in any position of the chromosome. Also, chromosomes could have variable length, and the initial population could be created in three different ways. This increased the solutions available for the creation of new videos, in the initial population and through mutation in the evolutionary process.
In the present work, the model is further refined with a stronger emphasis on the classification and structuring of video, reflected in the video annotations and in the individuals' structure and composition; mutation was enriched with new operations; video sequencing and editing style are being enriched with the exploration of techniques based on Markov chains; and the use of creative video editing is further explored as a process to support the production of digital video art, illustrated by new examples. The next section highlights the new features developed in the current version of the MovieGene system and presents the new examples.
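Two of the mechanisms described in Section 5 lend themselves to a compact illustration: the weighted-sum fitness of Eq. (1), built from normalized per-descriptor distances to a user-defined goal, and the probability distribution over mutation variants that can define an editing "style". The sketch below simplifies the descriptors to a scalar (duration) and a keyword case; the function names, variant labels and probabilities are hypothetical, not the MovieGene implementation:

```python
import random

def duration_fitness(value, goal, scale=10.0):
    """Normalized partial fitness: maximal (1.0) when the descriptor
    equals the goal, decreasing with distance."""
    return max(0.0, 1.0 - abs(value - goal) / scale)

def keyword_fitness(keywords, goal_keywords):
    """Fraction of the goal keywords found in the gene's annotations."""
    if not goal_keywords:
        return 1.0
    return len(set(keywords) & set(goal_keywords)) / len(goal_keywords)

def fitness(genes, goal, weights):
    """Eq. (1): a weighted sum of partial fitnesses over the genes of a
    chromosome (genes here are plain annotation dictionaries)."""
    f = 0.0
    for g in genes:
        f += weights["duration"] * duration_fitness(g["duration"], goal["duration"])
        f += weights["keywords"] * keyword_fitness(g["keywords"], goal["keywords"])
    return f

# An editing "style": a probability distribution over mutation variants.
STYLE = {"delete_gene": 0.1, "add_gene": 0.2, "replace_gene": 0.6, "cut_segment": 0.1}

def pick_mutation(style):
    """Choose a mutation variant according to the style distribution."""
    variants, probs = zip(*style.items())
    return random.choices(variants, weights=probs, k=1)[0]
```

A chromosome whose genes match the goal annotations scores the sum of the weights; one far from the goal scores near zero, which is what drives selection toward the intended features.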

6.1. System architecture MovieGene [3] is a client–server application with three major components (Fig. 2), providing a way to store, process and interact with the data: (1) Interface: the user interface provides interactive access to the evolving videos being edited. It is Web based, but can be used as a stand alone client application running in any Java aware platform (2) Application: the core application accesses the video MPEG-7 library containing the video descriptors in MPEG-7 format coded in XML files; the Java Media Framework providing low-level methods for media object access and editing and the ECJ, an evolutionary Java based library for the genetic algorithms module (3) Repository: contains the original and produced video documents, and respective media descriptors. 6.2. Evolutionary video editing Video editing is performed in a few stages, comprising the creation of the initial population, evolution of the population and video composition: (1) Creation of the initial population In this stage, the user selects criteria to generate the initial population, from the pool of video segments. This is the phase where the structural characteristics of the individuals are defined. At the moment, three criteria are used: random selection, not restricted; ordered selection, where video segments must respect a chronological order in relation to the original video they where selected from and structured selection, where more elaborate syntactic criteria can be defined for each segment. Some cinematic rules can be explored in this phase to set the common ground for the intended structure in individuals. In most

6. Movie gene The MovieGene system was developed to explore creative video editing based on evolutionary computation. It implements the conceptual model defined in the previous section and it is built in a way that allows a flexible extension of new features and its access through a Webbased interface. This section presents its architecture, the evolutionary video editing process and some examples.

Fig. 2. MovieGene architecture.

ARTICLE IN PRESS T. Chambel et al. / Computers & Graphics 31 (2007) 837–847

843

Fig. 3. MovieGene evolutionary interaction view and example from Quinta da Regaleira.

situations, this structure will be maintained (e.g. overture, development and closure), although some kinds of mutation might allow for gene order exchange. Another parameter defines whether repetition is allowed among the segments of the same individual. The other parameters include: population size, the number of individuals in the population; and individual size, the number of segments in each individual (three segments in the current implementation).

(2) Evolving the population

The user may introduce the intended characteristics for the final video (the goal) based on the video segment properties. This defines the fitness function. In the current version, two new features were included in this phase: the user may specify different weights for each of the properties in the fitness function, making some properties more relevant than others in the evaluation; and the goal may be specified 'by example', by selecting a video segment and having its properties defined as the goal for the evolving population. This way, the population will evolve towards similar videos. The user may also configure other genetic parameters that influence how the population evolves along generations: crossover probability, mutation probability (gene replacement in the current version) and elitism percentage. These parameters define how likely it is for an individual to be crossed over, mutated or selected for elitism. Another evolution option influences how mutation works, ranging from random selection (diversity) to the selection of a segment closer to the goal (precision). The number of generations may also be pre-defined. When parameters are set, the evolutionary loop begins. Fig. 3 depicts the main interface window for the evolutionary interaction. When the Go button is pressed, the evolving population is presented after each generation. In Fig. 3, we see a population of five individuals, with three segments each. The calculated fitness value is presented on top of each individual. The user may play/pause each individual, providing a preview for better examination and choice. The user can eliminate (X button) one or more individuals in each population, influencing the next generations and thus the evolutionary process. In the current version, the user can also select one individual as the best (B button in the examples of Figs. 4 and 5), halting the evolution and presenting this individual as the final generation. When the Best button (next to the Go button) is pressed, the evolutionary process is performed without user intervention, and the final best solution obtained with the pre-defined criteria and parameters is presented, just like in Fig. 3, but with the final solution as a single individual. At any time, the process can also be restarted.

(3) Video composition

The aim of this option is to produce larger and more complex videos in a more structured way, at their higher syntactic level. Scenes might be obtained with the process described above, and further collected and combined in this composition view. Cinematographic rules may assist in the definition and evaluation of criteria at this level, influencing the selection and refinement of individual scenes. Smaller or simpler videos can have all their creation and evolution done based on the


Fig. 4. MovieGene example. Portuguese and Brazilian music: singers in close-ups.

Fig. 5. MovieGene example. Turtles in blue.

process described so far, at the individual syntactic level, but this assisted composition can be of benefit in more complex scenarios.
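The evolutionary loop of stage 2 (crossover, mutation as gene replacement, and elitism over a small population of segment sequences) can be sketched as follows. This is an illustrative sketch under assumed conventions, not MovieGene's actual code: the function name `evolve`, the parent-selection scheme and the parameter defaults are all assumptions.

```python
import random

def evolve(population, fitness, segments, *, p_cross=0.8, p_mut=0.1, elite=0.25):
    """One generation: keep the elite, then fill with offspring produced
    by crossover and mutation, as in a standard genetic algorithm."""
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = max(1, int(elite * len(population)))
    next_gen = ranked[:n_elite]                       # elitism: best survive unchanged
    while len(next_gen) < len(population):
        # pick two parents from the fitter half of the population
        a, b = random.sample(ranked[:len(ranked) // 2 + 1], 2)
        child = list(a)
        if random.random() < p_cross:                 # one-point crossover
            point = random.randrange(1, len(a))
            child = a[:point] + b[point:]
        if random.random() < p_mut:                   # mutation: replace one gene (segment)
            child[random.randrange(len(child))] = random.choice(segments)
        next_gen.append(child)
    return next_gen
```

With an elitism of 25% over five individuals, one individual survives each generation unchanged, which is why the best fitness never decreases between generations.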

At the current stage, there is a fixed number of sequential scenes, and each one, in turn, is generated or refined with the evolutionary process described. This process is being made more sophisticated through the adoption of Markov chains, defining processes that assist the creation of video sequences in different narrative or artistic styles. The resulting video can be played at any time, in this interaction, as a sequence of the video scenes, and it can also be assembled and stored in a file to be played independently. Stages 1 and 3 were introduced in the second version of MovieGene [4]. Also since this version, individuals can be viewed in a more flexible way showing all their segments, in stage 2. Mutations may also involve changing individual segments, in accordance with the extended model. Due to this increased focus on the structural definition of the initial population and on the composition stage, this extended system is sometimes referred to as MovieComposer.
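The Markov chain approach mentioned above can be illustrated with a small sketch: a transition table over shot categories generates an ordered sequence of categories in a given "style", which could then guide the choice of annotated segments. The categories and probabilities below are invented for illustration only; they are not taken from the MovieGene implementation.

```python
import random

# Hypothetical transition table for one narrative "style": each shot
# category maps to its likely successors with transition probabilities.
STYLE = {
    "establishing": {"wide": 0.6, "close-up": 0.4},
    "wide":         {"close-up": 0.7, "wide": 0.2, "establishing": 0.1},
    "close-up":     {"wide": 0.5, "close-up": 0.3, "establishing": 0.2},
}

def generate_sequence(style, start, length):
    """Walk the Markov chain to propose an ordered list of shot categories."""
    seq = [start]
    for _ in range(length - 1):
        successors = style[seq[-1]]
        seq.append(random.choices(list(successors),
                                  weights=list(successors.values()))[0])
    return seq
```

Different styles would simply be different transition tables, so a documentary-like or a music-video-like rhythm of shots can be encoded declaratively.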

6.3. Examples

In this section, some examples are presented, stressing different aspects of MovieGene and the evolutionary process. In the example presented in Fig. 3, we generated the population using random selection of three segments per individual, in a population of five individuals with no repetition allowed, based on 21 short video segments from Quinta da Regaleira, Sintra, Portugal. These include mainly shots of landscapes within a forest, monuments and water courses. Selection criteria allowed us to explore different combinations presenting the main monuments in harmonious sequences concerning, for example, camera distances and color. In spite of the limited number of segments, the search space in this scenario amounts to approximately 9000 individuals. A broader objective in this line includes the use of videos from Brazil and Portugal to explore their architectonic, historical and cultural relations. These relations can be explored and discovered through emergent narratives and aesthetics in ways that may inspire creativity and learning about these countries, their culture, their similarities and contrasts. The setting of the initial population defines the main structure of the videos (e.g. alternate between countries), while genetic operators and criteria, combined with user interaction, influence the flow of exploration. As an example, in Fig. 4, the initial population, of four individuals with three segments each, was created with a structured selection where the first segment is about music, the second about Portugal and the third about Brazil. This structure is maintained along generations. The goal was defined as "music" and "singer" for the high-level semantic keywords, and the value "short" (close-up) for the low-level camera distance parameter, in a pool of Portuguese and Brazilian performances. Thus, we promoted videos with close-up shots of the musicians. As a result, we are presented with different individuals (videos) about music, Portugal


and Brazil in sequence. Note that in some individuals, the third segment is about Brazil but not music, e.g. the last individual in the second generation presented. This is compliant with the chromosome structure, but not aligned with the goal, resulting in a lower fitness value. Mutations were made by randomly replacing a gene segment, maintaining the individuals' structure. A value of 25% was used for elitism, which prevented losing the best individuals to crossover and mutation between some of the consecutive generations. Choosing to watch all the generations (Go option), we are presented with several performances in the search for musical performances featuring singers in close-ups. The best individual obtained in 30 generations (a number also configurable by the user) is presented at the end of Fig. 4, after 5 of the 30 generations, selected for this figure due to space restrictions. This is the best-fit individual in the last generation. MovieGene will tend to present music and singer segments in best-fit individuals, allowing the user to compare different styles and cultural traits in both countries. With the music–Portugal–Brazil structure, one can also try to guess which country the first video is from, having the opportunity to check the answer if it later appears in the second or third position. From Portugal, Mariza, The Gift, Carlos do Carmo, Clã and Maria João were some of the singers and bands selected. Although also present in the pool, videos about folk music and academic groups, for example, were not selected in the 30 generations with this goal defined. From Brazil, Caetano Veloso and Tânia Maria were the most presented singer videos, but some samba videos were also selected occasionally, because there are many of these in the pool and they satisfy the music goal, and sometimes the close-up goal, although leading to lower fitness values than the singers' close-ups.
The final video has a high fitness, and could be used as a good individual according to the specified goal. However, the search in itself can have entertaining, cultural and artistic value. In the last example, in Fig. 5, the goal includes turtle as the semantic keyword and the dominant color blue at the lower level, in a pool with several video segments featuring animals in the real world, in animation and in ads. As in the previous examples, fitness is defined as a similarity measure to this goal. The initial population of three individuals was created by random selection. No repetition was allowed for the genes, mutation was set to favour precision over diversity, and, through elitism, the best individual was again sometimes kept between generations. Consecutive generations of individuals explore different combinations of segments with animals in several colors and styles, in a search for videos with dominant color blue and featuring turtles. In this example, the best individual after 30 generations has the highest fitness value concerning these parameters. But again, while approaching the intended goal, the search process provides us with combinations of different individuals with different properties, often of aesthetic and narrative value. And by


interactively changing the goals or genetic parameters, the user can influence the search or explore totally different combinations and results.
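The similarity-based fitness used in these examples, matching annotated segment properties against a goal with user-defined weights, might be sketched as follows. The property names, annotation structure and weights here are hypothetical stand-ins for MovieGene's MPEG-7-style descriptors, not its actual data model.

```python
def segment_score(props, goal, weights):
    """Weighted fraction of goal properties matched by a segment's annotations.
    `props` maps property names to sets of annotated values."""
    total = sum(weights.get(k, 1.0) for k in goal)
    matched = sum(weights.get(k, 1.0)
                  for k, wanted in goal.items() if wanted in props.get(k, ()))
    return matched / total if total else 0.0

def individual_fitness(individual, goal, weights):
    """Fitness of a candidate video: mean score of its segments against the goal."""
    return sum(segment_score(s, goal, weights) for s in individual) / len(individual)
```

A goal such as `{"keyword": "turtle", "dominant_color": "blue"}` with a higher weight on `"keyword"` would then favour turtle segments over merely blue ones, mirroring the behaviour described for the Fig. 5 example.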

7. Conclusions and future work

A new paradigm for evolutionary creative video editing was proposed and developed. The MovieGene model and prototype were extended to include more flexible and structural options, increasing the available solutions for the creation of new videos, in the initial population and throughout the evolutionary process. Although it may perform automatic video editing, MovieGene aims to empower the user as a film editor, supporting creative editing by proposing innovative evolutionary combinations the user may subjectively select from, in the process of arriving at more satisfactory or artistic solutions. Results obtained so far show the technical viability of the model as a creative video editing approach, although it has been tested only in simple scenarios. Interaction with this new prototype reflects a higher flexibility and richness of solutions. By exploring several possibilities based on the user's intentions and options, in aesthetic and narrative dimensions, MovieGene supports creative and artistic video editing. It can be used by the author in the production of a final piece of video, or set up so that the viewer participates in the co-creation of the artistic video product, through the interactive definition of criteria and selections that influence the video evolution. The evolutionary creative process may also be regarded as a piece of art in itself, stimulating the viewer's senses and mind as the possibilities are explored, in harmony or contrast. The underlying cinematic rules are also often unconventional and exploratory, characteristic of video art. MovieGene provides means to experiment with the creation of new forms of video art where the viewer can have an active role.

Future directions include experiments with richer video spaces and refinements in the selection and composition stages. These include exploring video properties, fitness functions and mutations that allow for richer editing capabilities and styles at the aesthetic and narrative levels, for example with the further development of the sequencing at the syntactic level with the Markov chains approach. The definition of higher-level cinematographic rules and properties may also help the user to think in more abstract and structured ways, unaware of low-level video descriptors or genetic algorithm terminology. For example, one could specify a theme and duration for a video in the style of a particular film director or artist, and have the system evolve a population of videos in this direction. Different forms of interaction can also be explored to allow the selection or changing of some of the evolutionary parameters in unconventional ways, for example by scribbling, drumming or rhythmic dance, in artistic performances. The adoption of standard video descriptions and annotations, as set forth in this work, allows for a wider search space, as more videos become available on the internet. This will enrich the creative exploration of video production and repurposing at a larger scale.

References

[1] Kneller GF. Art and science of creativity. International Thomson Publishing; 1965.
[2] Bentley P, Corne D. An introduction to creative evolutionary systems. In: Bentley P, Corne D, editors. Creative evolutionary systems. Morgan Kaufmann; 2002.
[3] Henriques N, Correia N, Manzolli J, Correia L, Chambel T. MovieGene: evolutionary video production based on genetic algorithms and cinematic properties. EvoMUSART'2006, 4th European workshop on evolutionary music and art, Budapest, Hungary, April 10–12. In: Applications of evolutionary computing: EvoWorkshops 2006, Lecture Notes in Computer Science, vol. 3907. Berlin/Heidelberg: Springer; 2006. p. 707–11.
[4] Chambel T, Miguel GD, Correia L, Henriques NAC, Correia N, Manzolli J. Creative video editing through evolutionary algorithms. In: Proceedings of Artech 2006—third international conference on digital art and electronic, Pontevedra, Galiza, Spain, 17–18 November, 2006. p. 42–6.
[5] Hatcher E, editor. Art as culture: an introduction to the anthropology of art. Bergin & Garvey; 1999.
[6] Carroll N. Theories of art today. University of Wisconsin Press; 2000.
[7] Paul C. Digital art. London: Thames & Hudson; 2003.
[8] Digital Aesthetic 2. Does the digital have the potential to change our perception of art? Conference/Exhibition/Website, March–September, 2007, http://www.digitalaesthetic.org.uk/.
[9] Prince PD. Digital art: the new literacy, a personal view of the evolution of art issues. Computer Graphics, November 1995.
[10] Meigh-Andrews C. A history of video art: the development of form and function. Berg Publishers; 2006.
[11] Hall D, Fifer SJ, editors. Illuminating video: an essential guide to video art. Aperture; 2005.
[12] The ART-VIDEO.ORG Association, www.art-video.org.
[13] Nam June Paik Official Website, http://www.paikstudios.com/.
[14] Csikszentmihalyi M. Creativity: flow and the psychology of discovery and invention. New York: HarperCollins; 1997.
[15] Frazer J. An evolutionary architecture. London: Architectural Association; 1995.
[16] Schnier T, Gero JS. Learning genetic representations as an alternative to hand-coded shape grammars. In: Gero J, Sudweeks F, editors. Artificial intelligence in design '96. Dordrecht: Kluwer; 1996. p. 39–57.
[17] von Buelow P. Using evolutionary algorithms to aid designers of architectural structures. In: Bentley P, Corne D, editors. Creative evolutionary systems. Morgan Kaufmann; 2002.
[18] Sims K. Artificial evolution for computer graphics. In: Computer Graphics, ACM SIGGRAPH '91, 1991. p. 319–28.
[19] Soddu C. Recognizability of the idea: the evolutionary process of Argenia. In: Bentley P, Corne D, editors. Creative evolutionary systems. Los Altos, CA: Morgan Kaufmann; 2002.
[20] Biles JA. GenJam: evolution of a jazz improviser. In: Bentley P, Corne D, editors. Creative evolutionary systems. Los Altos, CA: Morgan Kaufmann; 2002.
[21] Moroni A, Manzolli J, von Zuben F, Gudwin R. Vox Populi: evolutionary computation for music evolution. In: Bentley P, Corne D, editors. Creative evolutionary systems. Los Altos, CA: Morgan Kaufmann; 2002.
[22] Moroni A, Manzolli J, von Zuben F. ArTbitrating JaVox: evolution applied to visual and sound composition. In: Proceedings of SIACG'2006—Ibero-American symposium in computer graphics, Santiago de Compostela, 5–7 July 2006.
[23] Manovich L. Soft Cinema, http://www.manovich.net.
[24] Rocchi C, Zancanaro M. Rhetorical patterns for adaptive documentaries. In: Proceedings of AH'2004, international conference on adaptive hypermedia, 2004.
[25] Jaeger T. Live cinema unraveled, 2005. Online book available at: http://www.vj-book.com/.
[26] Makela M. Live cinema: language and elements. MA in New Media, Media Lab, Helsinki University of Art and Design, April 2006.
[27] Yoo H-W, Cho S-B. Video scene retrieval with interactive genetic algorithm. Multimedia Tools and Applications 2007;32(3).
[28] Sanchez JM, Binefa X, Kender JR. Coupled Markov chains for video contents characterization. In: Proceedings of ICPR'02—16th international conference on pattern recognition, vol. 2, 2002. p. 461.
[29] Chang P, Han M, Gong Y. Extract highlights from baseball game video with hidden Markov models. In: Proceedings of the international conference on image processing 2002, vol. 1, 2002. p. I-609–12.
[30] Jones K. Compositional applications of stochastic processes. Computer Music Journal 1981;5(2):45–61.
[31] ISO: MPEG-7 Overview (version 9), http://www.chiariglione.org/MPEG/standards/mpeg-7/mpeg-7.htm, 2004.
[32] Mitchell M. An introduction to genetic algorithms. Cambridge, MA: MIT Press; 1996.