Knowledge-Based Systems 13 (2000) 459–470
www.elsevier.com/locate/knosys
Composition Analyzer: support tool for composition analysis on painting masterpieces ☆

S. Tanaka a,*, J. Kurumizawa a, S. Inokuchi b,1, Y. Iwadate a

a ATR Media Integration and Communications Research Labs., 2-2 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0288, Japan
b Department of Systems and Human Science, Osaka University, 1-3 Machikaneyama-cho, Toyonaka City, Osaka 560-8531, Japan
Abstract

In this paper, we propose a tool for extracting compositional information from pictures, called the Composition Analyzer. This tool extracts such compositional information as the sizes, shapes, proportions, and locations of figures by two processes. More specifically, it first segments a picture into figures and a ground by a figure extraction method we developed. It then extracts the above compositional information from the figures based on the Dynamic Symmetry principle. The extracted compositional information is used to refine the picture and, as such, facilitates the production of multimedia by non-professionals. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Composition; Paintings; Attractive region
1. Introduction

Most problems faced by non-professional multimedia authors in creating good titles are not technological in nature, but rather stem from a lack of expertise or knowledge about multimedia design. Most commercial authoring tools have few functions to support such users in achieving their goals [1]. Consequently, non-professional authors suffer not only from problems in understanding tool functions, but also in deciding design details. We believe that the multimedia elements of professional products, such as color combinations, textures, compositions, and lighting effects, encompass many of the professional techniques and much of the expert knowledge developed throughout the history of art. Consequently, providing these elements, with appropriate tools, to non-professional authors can guide them towards creating better products [2]. On this assumption, we have been developing a creative learning environment to help authors with the production of better images (see Fig. 1) [2]. In previous research, Nakakoji et al. developed a knowledge-based color critiquing support system, eMMaC, which critiques the use of color in a title and suggests appropriate

* Corresponding author. Tel.: +81-774-95-1465; fax: +81-774-95-1408. E-mail address:
[email protected] (S. Tanaka).
1 Tel.: +81-6-6850; fax: +81-6-6850-6371.
☆ Derived from "Composition Analyzer: Computer Supported Composition Analysis on Masterpieces", published in the Proceedings of the Third Conference on Creativity and Cognition, Loughborough, UK, October 10–13, 1999, pp. 68–75. Reproduced with permission from ACM © 1999.
color usage [1]. This system utilizes theories and guidelines on human color perception, cultural associations of color, and appropriate color combinations, which have been studied in visual communication design, to construct a rule base. Our system, in contrast, is an example-based system rather than a rule-based system. In this paper, we present one of the tools of the above environment, the "Composition Analyzer", which extracts compositional information from pictures, such as the shapes, proportions, and locations of figures. This paper is organized as follows: Section 2 presents an overview of the Composition Analyzer, Section 3 describes the composition analysis processes of this tool, and Section 4 introduces a system that utilizes the compositional information extracted by the Composition Analyzer to refine a picture.

Fig. 1. Cyber Atelier.

2. Composition analyzer

Composition involves many aspects; however, it can roughly be said that composition is a plan for arranging objects in a picture with a good balance [3–6]. Any picture emotionally affects its viewers differently depending on how the objects in the picture are composed. The composition can create not only emotional effects but also rhythm or dynamics in the picture. It is therefore important to determine where objects should be located, and also what sizes or shapes these objects should have [6]. Throughout the history of art, the golden section has been used to make the most beautiful and ideal proportions of
architectures or art work [5]. Such proportions had already been used in the ancient Egyptian civilization. In the Middle Ages, they were called "Divina Proportione", and people thought them blessed by God. These beautiful proportions have had a significant influence on art work. Many authors of masterpieces have used these proportions to compose what they wanted to express in their paintings. In particular, the Dutch Masters of the 17th century, such as Rembrandt and Vermeer, maintained this traditional method in creating their art work [5]. For example, "Lady Seated at the Virginals" and "The Love Letter" by Vermeer are typical examples of the golden section. In the modern era, Seurat, Cezanne, Dali, Picasso, and Mondrian composed their paintings based on this idea [5]. Therefore, by using the idea of the golden section, it is possible to analyze relationships among objects in a picture. The Composition Analyzer extracts compositional information from a picture based on the idea of the golden section. It first segments a picture into figures and a ground by a figure extraction method. It then extracts the sizes, shapes, locations, and proportions of the extracted figures based on the idea of the golden section. In the next section, the above composition analysis process is described in detail.
3. Composition analysis process

3.1. Figure-ground segmentation

In a picture, there are regions that can be recognized as the figures and regions that can be recognized as the ground [7]. We assume that when a viewer looks at a picture, he/she first looks over the whole picture and recognizes the figures in it, and then moves his/her attention according to the level of attractiveness of those figures. Therefore, it can be said that two processes are performed in evaluating the level of attractiveness in the visual system. For instance, in Fig. 2, it is very easy to tell which the most attractive object is. In this case, "X" would be the most attractive, because it is completely different from the other objects. But before this could be concluded, i.e. "X" being the most attractive object, the viewer would first have had to segment the picture into figures ("X" and "L") and the ground (the white area). He/she would then have had to evaluate the attractiveness of each figure and, finally, to select "X" as the most attractive object.

Fig. 2. An example picture of a search problem.

To confirm whether or not the above assumption is valid, we carried out experiments tracking the eye movements of 10 people while they were looking at pictures. We also carried out experiments in which the subjects discriminated each of the same pictures into figures and a ground. An eye-tracking camera system was used to track the eye movements. For the latter experiments, we showed the subjects the segmentation results for each picture, and then asked them to select the regions that were parts of the figures. The Edge Flow model was used for the segmentation [12]. From the results, we found that the subjects mostly paid attention to those regions they had selected as figures in the second experiments. The subjects first recognized the figures of the picture. Then, they evaluated the level of attractiveness of each figure based on physical stimuli, the meaning of the figure, or their interest. Finally, they moved their attention according to their evaluation results. Fig. 3 shows an example of a result from the experiments. As a result of the above experiments, we confirmed that our assumption is valid.

Fig. 3. An example of a result from the experiments.
In fact, it has been found that the V4 cortex in the human visual system plays an important role in figure-ground segmentation [8]. The V4 cortex is sensitive to many kinds of information, in both the spatial domain and the spectral domain, relevant to object recognition [8]. In the spatial domain, many V4 cells exhibit selectivity for length, width, orientation, direction of motion, and spatial frequency. In the spectral domain, they are tuned to wavelength [8]. In particular, it has been found that most V4 cells respond best to a receptive field stimulus when there is a spectral difference between that stimulus and its surroundings [8]. From these findings, we conclude that one of the contributions of the V4 cortex to visual processing is figure-ground segmentation. In the V4 cortex, no semantic information is processed; consequently, no attractiveness evaluation based on the meanings of scenes or the viewer's interests is performed at this stage. Accordingly, it is possible to segment a picture into figures and a ground based only on physical features, such as those of the spectral domain (color) and the spatial domain (texture) processed by the V4 cortex. From the above considerations, we use the color contrast and texture contrast of regions for figure-ground segmentation.

3.1.1. Contrast parameter definition

Two types of contrast can be considered for picture regions. One is local contrast, i.e. the difference between a region and its surroundings. The other is global contrast, i.e. the difference between a region and the whole picture. Here, we use both. In addition to these two types of contrast, focus is another important factor in the enhancement of contrast [6]. A focused region is more attractive than a blurred region. Furthermore, the contour of a focused region is sharp, whereas the contour of a blurred region is not. From the above considerations, the following parameters are used for figure-ground segmentation.
1. Local color contrast:

f_{i,1} = \frac{ColorDif_i - \min_k ColorDif_k}{\max_j ColorDif_j - \min_k ColorDif_k}   (1)

ColorDif_i = w_i \frac{1}{n_i} \sum_{j=1}^{n_i} RgnColDif_{i,j}   (2)

RgnColDif_{i,j} = \sqrt{(Lt_i - Ls_j)^2 + (at_i - as_j)^2 + (bt_i - bs_j)^2}   (3)

w_i = \frac{1}{|e_i - 2|} \, \frac{tl_i}{l_i}   (4)

where f_{i,1}: the local color contrast of region i; RgnColDif_{i,j}: the color difference between region i and neighboring region j, which touches region i; w_i: the penalty coefficient of region i; e_i: the Euler number of the mask image of region i; tl_i: the length of the border line between region i and its neighboring regions; l_i: the length of the contour of region i; Lt_i, at_i, bt_i: the color value of region i in the L*a*b* color space; Ls_j, as_j, bs_j: the color value of neighboring region j in the L*a*b* color space; n_i: the number of neighboring regions of region i.

2. Local texture contrast:

f_{i,2} = \frac{TexDif_i - \min_k TexDif_k}{\max_j TexDif_j - \min_k TexDif_k}   (5)

TexDif_i = w_i \frac{1}{n_i} \sum_{j=1}^{n_i} RgnTexDif_{i,j}   (6)

RgnTexDif_{i,j} = \sqrt{\sum_{k=1}^{n_f} (Tt_{i,k} - Ts_{j,k})^2}   (7)

where f_{i,2}: the local texture contrast of region i; RgnTexDif_{i,j}: the Euclidean distance between the texture feature vectors of region i and neighboring region j; Tt_{i,k}: the texture feature vector of region i; Ts_{j,k}: the texture feature vector of neighboring region j; n_f: the number of elements in the texture feature vector.

3. Global color contrast:

f_{i,3} = \frac{GRgnColDif_i - \min_k GRgnColDif_k}{\max_j GRgnColDif_j - \min_k GRgnColDif_k}   (8)

GRgnColDif_i = w_i \sqrt{(Lt_i - L_{av})^2 + (at_i - a_{av})^2 + (bt_i - b_{av})^2}   (9)

where f_{i,3}: the global color contrast of region i; L_{av}, a_{av}, b_{av}: the average color value of the picture.

4. Global texture contrast:

f_{i,4} = \frac{GRgnTexDif_i - \min_k GRgnTexDif_k}{\max_j GRgnTexDif_j - \min_k GRgnTexDif_k}   (10)

GRgnTexDif_i = w_i \sqrt{\sum_{k=1}^{n_f} (Tt_{i,k} - T_{av,k})^2}   (11)

where f_{i,4}: the global texture contrast of region i; T_{av}: the average texture feature vector of the picture.

5. Sharpness of contour:

f_{i,5} = \frac{Focus_i - \min_k Focus_k}{\max_j Focus_j - \min_k Focus_k}   (12)

Focus_i = w_i \frac{1}{n_i} \sum_{j=1}^{n_i} |\nabla Rc_{i,j}(x, y)|   (13)

|\nabla Rc_{i,j}(x, y)| = \sqrt{Rcx_{i,j}^2(x, y) + Rcy_{i,j}^2(x, y)}   (14)

where f_{i,5}: the sharpness of the contour of region i; Focus_i: the average edge magnitude of the contour of region i; |\nabla Rc_{i,j}(x, y)|: the edge magnitude of pixel j on the contour of region i; Rcx: the gradient in the x direction; Rcy: the gradient in the y direction.

For the color difference calculation, we use the CIE L*a*b* color space, because color differences in this color space generally correspond to the human visual sense [7]. For the texture features, a multi-resolution representation based on Gabor filters is used. Gabor features have been used to characterize the underlying texture information in given regions [10,11]. Because Gabor filters are not orthogonal to each other, however, redundant information exists in the filtered images. In order to reduce such redundancy, the filter parameters are chosen by using the algorithm presented in [9]. This algorithm ensures that the half-peak magnitudes of the filter responses in the frequency spectrum touch each other, as shown in Fig. 4. To represent the texture features, we use 24 filters consisting of four scales and six orientations.

Fig. 4. Response of a Gabor filter bank.
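To make the filter-bank construction concrete, the sketch below computes such a texture feature vector: 24 mean Gabor magnitude responses (four scales by six orientations) over a region mask. It is only a sketch under our own assumptions: a single-channel gray image, hand-rolled kernels, and an illustrative octave spacing rather than the optimized parameters of [9].

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(freq, theta, sigma):
    # Complex Gabor kernel: isotropic Gaussian envelope times a
    # complex sinusoid running along orientation theta.
    half = max(int(3 * sigma), 1)
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) * np.exp(2j * np.pi * freq * xr)

def texture_features(gray, region_mask, n_scales=4, n_orients=6):
    # 24-dimensional vector Tt_i: mean response magnitude of each
    # filter over the region (4 scales x 6 orientations, as in the text).
    feats = []
    for s in range(n_scales):
        freq = 0.25 / (2 ** s)        # illustrative octave spacing
        sigma = 0.56 / freq           # envelope width tied to frequency
        for o in range(n_orients):
            kern = gabor_kernel(freq, theta=o * np.pi / n_orients, sigma=sigma)
            resp = np.abs(fftconvolve(gray, kern, mode="same"))
            feats.append(resp[region_mask].mean())
    return np.array(feats)
```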
A penalty coefficient is employed based on the following characteristics of figures.

1. A closed or surrounded region is apt to be regarded as a figure [6].
2. It is the figure, not the ground, that is seen as having a contour [6].

Concerning item (1), an Euler number is calculated for each mask image of a region. This number is obtained by subtracting the number of holes in the objects from the number of objects in the image [15]. For instance, if an image has one object and the object has two holes, then the Euler number of the image is −1. Based on this characteristic, the penalty coefficient for item (1) (the left factor of Eq. (4)) is calculated such that the greater the number of holes in a region, the smaller its contrast value will be. Concerning item (2), there are regions that touch the edge(s) of a picture, and such regions are very likely to be part of the ground. To represent whether or not a region is completely surrounded by other regions, the length of the tangent line to the surrounding regions is measured (excluding the inside of the region; see Fig. 5). This length is then divided by the length of the contour of the region. By multiplying every parameter by this value (the right factor of Eq. (4)), the contrast value of a region is forced to be small if the region touches the edge(s) of the picture.

Fig. 5. Definition of the contour and the tangent line of a region.
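As a concrete reading of Eqs. (1)–(4), the sketch below computes the local color contrast for a list of segmented regions, including the penalty coefficient just described. The region record layout (mean L*a*b* color, neighbor indices, Euler number, border and contour lengths) is a hypothetical structure of our own; only the formulas follow the text.

```python
import numpy as np

def penalty(region):
    # Eq. (4): more holes (Euler number further from 1) and more contact
    # with the picture edges (smaller border-to-contour ratio) both
    # shrink the coefficient.
    return (1.0 / abs(region["euler"] - 2)) * (region["tl"] / region["l"])

def local_color_contrast(regions):
    # Eqs. (2)-(3): penalty-weighted mean Lab distance to the
    # touching neighbors of each region.
    raw = []
    for r in regions:
        diffs = [np.linalg.norm(np.asarray(r["lab"]) - np.asarray(regions[j]["lab"]))
                 for j in r["neighbors"]]
        raw.append(penalty(r) * np.mean(diffs))
    raw = np.asarray(raw)
    rng = raw.max() - raw.min()
    # Eq. (1): min-max normalization over all regions in the picture.
    return (raw - raw.min()) / rng if rng > 0 else np.zeros_like(raw)
```

The four remaining parameters follow the same min-max normalization pattern, differing only in the raw quantity being normalized.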
3.1.2. Discrimination function

In order to analyze how people achieve figure-ground segmentation, we collected data on the figure regions and ground regions of 100 pictures by performing subjective experiments with 15 people. In the experiments, we showed each complete original picture on one CRT (2048 × 2048 resolution) and its segmented regions on another CRT, and asked the subjects whether each segmented region was a part of the figure or not. The Edge Flow model was used for the segmentation [12]. This model utilizes a predictive coding model to identify the direction of change in color and texture at each image location on a given scale, and constructs an edge flow vector. By iteratively propagating the edge flow, boundaries are detected at those image locations where two opposite directions of flow meet in the stable state [12]. Then, the area surrounded by the boundaries is extracted as a region. This method can segment a picture into regions whose colors and textures are homogeneous. The method therefore suits our purpose, i.e. figure-ground segmentation based on the color and texture contrasts of regions. In our experiments, optimized parameters for the Edge Flow model were chosen for each picture during the segmentation.

After the experiments were completed, the values of the defined contrast parameters (the number of data sets was 1162) were measured for each region. Then, principal component analysis (PCA for short) was carried out on the data to detect the characteristics of the figure regions and the ground regions. Fig. 6 shows the results of the PCA. From the PCA results, we found one possibility for discriminating the figure regions from the ground regions.

Fig. 6. Result of PCA.

As a result of the experiments, we constructed a discrimination function by applying the Maximum Likelihood method to the PCA results. Let X be a feature vector for a region consisting of the principal component values, and let c_1 and c_2 represent the category of the figure region and the category of the ground region, respectively. If and only if the following condition is satisfied, X is a figure region:

p(X|c_1) P(c_1) \ge p(X|c_2) P(c_2)   (15)

where P(c_i): the occurrence probability of category c_i; p(X|c_1): the probability that X is a part of the figure; p(X|c_2): the probability that X is a part of the ground.

Note that P(c_i) is an unknown variable that varies from picture to picture. There are pictures that have more figure regions than ground regions, and vice versa. Moreover, there are pictures in which the number of figure regions and the number of ground regions are the same. Therefore, let us assume that P(c_1) is equal to P(c_2). In fact, P(c_1) and P(c_2) were 0.48 and 0.52, respectively, in the experiments. On this assumption, Eq. (15) becomes:

p(X|c_1) \ge p(X|c_2)   (16)

Assuming that p(X|c_i) can be represented as a K-dimensional normal distribution, p(X|c_i) becomes:

p(X|c_i) = \frac{1}{(2\pi)^{K/2} |V_i|^{1/2}} \exp\left( -\frac{1}{2} (X - \bar{X}_i)' V_i^{-1} (X - \bar{X}_i) \right)   (17)

where V_i: the covariance matrix of the principal component values of the figure or the ground data; \bar{X}_i: the average principal component values of the figure or the ground data.

By substituting Eq. (17) into Eq. (15) and taking the logarithm of both sides, the following formula results:

\log \frac{|V_1|}{|V_2|} + (X - \bar{X}_1)' V_1^{-1} (X - \bar{X}_1) - (X - \bar{X}_2)' V_2^{-1} (X - \bar{X}_2) \le 0   (18)

Here, we define our figure-ground discrimination function by using Eq. (18):

Dis(X) = \log \frac{|V_1|}{|V_2|} + (X - \bar{X}_1)' V_1^{-1} (X - \bar{X}_1) - (X - \bar{X}_2)' V_2^{-1} (X - \bar{X}_2)   (19)
If Dis(X) ≤ 0, then X is a part of the figure.

3.1.3. Figure extraction method

A figure extraction method is proposed here. The method has the following process.

1. Segment a picture into plural regions by the Edge Flow model, with the optimum parameters for the picture.
2. Measure the contrast parameters of the regions.
3. Transform the contrast parameters of the regions to the principal component space.
4. Select those regions whose evaluation values of the discrimination function are less than or equal to zero.

Here, we use every principal component (five principal components) for the discrimination.
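In code, steps 3 and 4 reduce to a projection and a quadratic form per region. The sketch below evaluates Eq. (19) directly; all names are our own, and we gloss over any standardization applied to the contrast parameters before the projection.

```python
import numpy as np

def dis(x, mean1, cov1, mean2, cov2):
    # Eq. (19): Dis(X) <= 0 classifies the region as figure.
    d1, d2 = x - mean1, x - mean2
    return (np.log(np.linalg.det(cov1) / np.linalg.det(cov2))
            + d1 @ np.linalg.inv(cov1) @ d1
            - d2 @ np.linalg.inv(cov2) @ d2)

def select_figure_regions(params, loadings, mean1, cov1, mean2, cov2):
    # params: one row of five contrast parameters per region;
    # loadings: the 5x5 principal component matrix X' given below.
    pcs = params @ loadings
    return [i for i, x in enumerate(pcs)
            if dis(x, mean1, cov1, mean2, cov2) <= 0]
```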
From the results of the experiments, the following parameters for the discrimination function were obtained (the columns of X' are the principal components x1 to x5):

     |  0.46   0.46   0.51  -0.51   0.22 |
     |  0.52  -0.29   0.48   0.63  -0.13 |
X' = |  0.35   0.68  -0.49   0.35  -0.22 |
     |  0.45  -0.39  -0.27  -0.46  -0.60 |
     |  0.43  -0.30  -0.44  -0.01   0.73 |

X̄1 = (-0.66, -0.05, 0.13, 0.01, 0.0)'

X̄2 = (0.61, 0.03, -0.09, -0.01, 0.0)'

     |  2.71  -0.02  -0.21  -0.09   0.03 |
     | -0.02   1.35   0.0    0.04   0.01 |
V1 = | -0.21   0.0    0.53  -0.01   0.03 |
     | -0.09   0.04  -0.01   0.41   0.0  |
     |  0.03   0.01   0.03   0.0    0.19 |

     |  1.66   0.07   0.01   0.05  -0.02 |
     |  0.07   0.94   0.01  -0.03  -0.01 |
V2 = |  0.01   0.01   0.56   0.01  -0.02 |
     |  0.05  -0.03   0.01   0.43   0.0  |
     | -0.02  -0.01  -0.02   0.0    0.20 |

Fig. 7 shows examples of results obtained with this method. We carried out experiments with this method in order to ascertain how precisely it can extract the figure regions that viewers recognize. In these experiments, the method extracted the figure regions selected by human subjects with an accuracy of 80% [15].

Fig. 7. Figure extraction results.

3.2. Composition analysis

As mentioned in the previous section, the golden section has been used to create the most beautiful and ideal proportions in architecture and art work [5]. In addition, many famous painters have used the idea of the golden section to compose the objects in their paintings. Therefore, by using the idea of the golden section, it is possible to analyze relationships among objects in a picture. A proportion is considered to be an attribute of a form, because ratio formulas are often used to determine visual order or visual balance [14]. In the ancient Greek civilization, architecture and sculptures were built based on the following proportions to create visual balance [5]:

1. 1:1
2. 1:√2
3. 1:√3
4. 1:√4
5. 1:√5
6. 1:1.618 (the golden proportion)
A rectangle whose proportion is one of the first five proportions above is called a root rectangle. A rectangle whose proportion is the same as (6) is called a golden rectangle [5]. Jay Hambidge, a professor at Yale University, found that these rectangles can be constructed from squares. According to the Dynamic Symmetry principle he proposed, these rectangles can be subdivided into smaller, similar rectangles that have parallel or perpendicular diagonals. The aspect ratio of a canvas, in fact, almost always matches one of the above proportions. There are three standard types of canvas currently in use: Figure, Paysage, and Marine. Paysage is a √2 rectangle, Marine is a golden rectangle, and Figure is a rectangle made by combining two golden rectangles at their long edges [5]. Therefore, applying the Dynamic Symmetry principle to a canvas makes it possible to determine not only the size of an object but also its location so as to maintain a good visual balance. Consequently, it becomes possible to extract compositional information from a picture by applying the principle to the extracted figures.

3.2.1. Composition extraction method

A rectangle is subdivided according to the Dynamic Symmetry principle as follows. First, a diagonal line is drawn across the rectangle. Second, another line is constructed perpendicular to this diagonal. This line intersects the long side of the rectangle and divides it into two smaller, similar rectangles. The same procedure can then be performed on the smaller rectangles (see Fig. 8).

Fig. 8. Subdivision process based on the Dynamic Symmetry principle.

As mentioned in the previous section, the Figure type canvas is a rectangle made by combining two golden rectangles at their long edges [5]. Therefore, the Figure type canvas must be split in half at the beginning in order to apply the Dynamic Symmetry principle. Moreover, the center of a picture is also important; therefore, the center lines are drawn at the beginning as well. Here, we propose the following composition extraction method.

Procedure:
  Detect the canvas type
  Draw the center lines
  IF the type is Figure
    Subdivide the picture into two golden rectangles
    Make the two rectangles the targets
  ELSE
    Make the whole picture the target
  END IF
  WHILE it is not stable
    FOR each target rectangle in the picture
      Divide the rectangle into smaller rectangles
      FOR each of the smaller rectangles
        IF the rectangle is too small
          Ignore the rectangle
        ELSE IF the rectangle is occupied by the figure at the rate of the threshold
          Ignore the rectangle
        ELSE IF the rectangle is occupied by the ground at the rate of the threshold
          Ignore the rectangle
        ELSE
          Make the rectangle a target
        END IF
      END FOR
    END FOR
  END WHILE
  Extract the regions that are constructed by base lines and occupied by the figure at the rate of the threshold

Compositional information, consisting of base lines and an abstracted figure, has now been extracted from the picture. Fig. 9 shows an example of a result. With this result, we can explore how the figure was drawn. In the example of Fig. 9, it can be seen that the body line of Venus was determined based on a base line of the golden section, and that the location of each part of the body was also determined by golden section points. By using this tool, the user can learn how professional painters maintain the visual balance of a picture.

Fig. 9. Result of the composition extraction method.
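The geometric core of the procedure, cutting a rectangle at the foot of the perpendicular dropped onto its diagonal so that one of the parts is similar to the whole, can be sketched as follows. This is a simplified reading of the pseudocode above: the "stable" loop becomes a depth cutoff, the figure/ground occupancy tests are delegated to a caller-supplied function, and all names and thresholds are our own.

```python
def subdivide(rect, figure_fraction, lines, min_side=20, lo=0.1, hi=0.9, depth=0):
    # rect: (x, y, w, h). For w >= h, the perpendicular dropped from a
    # corner onto the diagonal meets the long side at distance h*h/w,
    # splitting off a rectangle similar to the original.
    x, y, w, h = rect
    if min(w, h) < min_side or depth > 6:
        return                              # rectangle too small: ignore
    frac = figure_fraction(rect)
    if frac >= hi or frac <= lo:
        return                              # nearly all figure or all ground
    if w >= h:
        cut = h * h / w
        lines.append(("v", x + cut))        # vertical base line
        parts = [(x, y, cut, h), (x + cut, y, w - cut, h)]
    else:
        cut = w * w / h
        lines.append(("h", y + cut))        # horizontal base line
        parts = [(x, y, w, cut), (x, y + cut, w, h - cut)]
    for part in parts:
        subdivide(part, figure_fraction, lines, min_side, lo, hi, depth + 1)
```

For a golden rectangle, the first cut of this construction leaves a square beside a smaller golden rectangle, which is exactly the classical case Hambidge describes.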
3.2.2. Experiments on extracting compositional information from masterpieces

In order to ascertain how applicable the proposed method is to existing paintings, we performed experiments extracting the compositions of 100 paintings with the method. We collected paintings from the beginning of the Renaissance to the Modern era, all of which were available on CD-ROM [16]. In the experiments, we asked professional painters to judge whether or not the method could extract proper golden section points and base lines able to explain how the subjects within the paintings were drawn. The results of the experiments are shown in Table 1.
Table 1
Results of the experiments

Era                        Century   Success rate
Beginning of Renaissance   15th      0.25
Mid-Renaissance            16th      0.80
Baroque                    17th      0.70
Romanticism                18th      0.13
Impressionism              19th      0.76
Modern                     20th      0.75
Fig. 10. Image re-composer.
From the results in Table 1, the average of the success rates is 56.5%, and the success rates differ depending on the era. To understand these results, we investigated the history of painting. In fact, the golden section was not a popular technique for determining the composition of paintings at the beginning of the Renaissance, and it was used only by painters in Northern Europe [13]. The golden section then gradually spread throughout Europe, and in the mid-Renaissance period it became a common technique among painters and architects, especially those who designed churches or painted altar pieces. In the Baroque era, the Dutch masters in particular used this traditional technique to compose their paintings. In the eras of Romanticism and Realism, artists were not concerned with composition at all; they simply composed their works without considering it. With Impressionism, artists came to realize the importance of composition again, and the golden section was revived. This movement is still going on. Considering the above, it stands to reason that we obtained the results shown in Table 1. What is more interesting, however, is that this engineering approach can provide quantitative evidence to support theories in art. Although the average success rate is 56.5%, the method is still useful for collecting a sufficient amount of compositional information, considering the sheer number of existing paintings.

4. Image re-composer

Here, we introduce an application system that uses the compositional information extracted by the Composition Analyzer. The system is called "Image Re-Composer"; it is a post-production tool that decomposes a picture and regenerates it as a new, improved picture according to compositional information extracted from masterpieces (see Fig. 10).
Since the tool allows the user to generate different pictures depending on the specified compositions, the user can experiment with a variety of good compositions for an original picture. As a result, the user can learn how masters have maintained the visual balance of pictures. For the image recomposition, the system asks the user to input three pictures: an original picture, a ground picture, and a guide picture. The ground picture is the picture onto which the system recomposes figures of the original picture. Both the original picture and the ground picture are input into the system using an image scanner or by specifying them as files. The guide picture is a picture with compositional information, and provides recomposition guidelines for the system. An image database is available for guide picture retrieval. When searching for pictures, the user can specify an author's name, a picture's name, a type of picture (portrait, scenery, group, etc.), or how many or what kinds of objects there are. After the user inputs the original picture, the system tries to extract figure objects from the picture by the figure extraction method. Then, the user chooses the objects which are to be recomposed by the system. Image Re-Composer finally recomposes the selected objects according to the compositional information of the guide picture. The following sections explain the above processes in detail.

4.1. Object selection

After the user inputs a picture having the desired objects to recompose, Image Re-Composer tries to extract those objects by the figure extraction method. However, because the extraction method will not always give perfect results, the user is asked by the system to correct the results when necessary. The system shows the extraction results with figure regions in color and ground regions in gray. If the user is not satisfied with the results, he/she can correct any of them by selecting or de-selecting regions that have been mis-discriminated by the system. The system then asks the user to discriminate each object that he/she wants to recompose. The user can discriminate each object by selecting the multiple regions comprising the object. When the user extracts an object from the input picture, the object appears in an object browser of the Image Re-Composer control panel, as shown in Fig. 11. This tool registers all of the objects that the user has previously extracted, and it allows the user to specify objects extracted from different pictures in order to recompose them within one picture. Similar browsers are also available for the ground picture and the guide picture. If the user selects two objects, the system gives the objects IDs in order to establish the correspondence between the objects in the guide picture and the user-selected objects (see the left side of Fig. 11). Currently, the system allows the user to select at most two objects at the same time, because
we have experimentally found that it is better to recompose three or more objects as a group of objects rather than to recompose them individually.

Fig. 11. Image Re-Composer control panel.
4.2. Guide picture search

When the user retrieves a guide picture from the image database, the user can specify as keywords what kinds of objects and how many objects he/she wants to recompose. The system retrieves pictures that match the specified keywords, and then matches the shapes of the user-selected objects against those of the retrieved objects. Finally, the system shows the user the guide pictures whose objects are as similar to the user-selected objects as possible. This function is provided to keep the user, as far as possible, from specifying a bad combination. For example, when the user tries to recompose a standing figure, it is not good for the system to recommend the composition of a sitting figure, because the result would obviously be bad. For the shape matching, we employ P-type Fourier descriptors [17] to represent the shapes, and measure the similarity as the Euclidean distance between vectors whose elements are the above Fourier descriptors.

4.3. Image re-composition

For the image recomposition, the system adjusts the sizes and locations of the selected objects according to the specified guide picture, and composes the objects onto the specified ground picture. For the size adjustment, the system calculates a scale coefficient for each object as follows. Let x be the number of pixels of a selected object, m the ratio between the size of an object within the guide picture and the size of the whole guide picture, n the number of pixels of the ground picture, and s the scale coefficient for the selected object:

s = m × n / x   (20)

Then, the system scales the object with the calculated coefficient. For the location adjustment, we calculate a new location with the following equations. Let w_g be the width of the guide picture, h_g the height of the guide picture, (x_g, y_g) the center of gravity of an object in the guide picture, w_b the width of the ground picture, h_b the height of the ground picture, and (x_f, y_f) the new center of gravity of the user-specified object:

x_f = w_b × x_g / w_g   (21)

y_f = h_b × y_g / h_g   (22)
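Reading Eqs. (20)–(22) this way, placing one extracted object onto the ground picture might look like the sketch below. We interpret the scale coefficient s as an area ratio (so the linear zoom factor is √s), since m and n are pixel counts; that interpretation, the nearest-neighbour resize, and all names are our own assumptions, and bounds checking is omitted.

```python
import numpy as np

def nn_resize(img, zoom):
    # Nearest-neighbour resize; enough for a sketch.
    h, w = img.shape[:2]
    rows = (np.arange(int(h * zoom)) / zoom).astype(int)
    cols = (np.arange(int(w * zoom)) / zoom).astype(int)
    return img[rows][:, cols]

def recompose(obj, mask, ground, guide_wh, guide_cog, m_ratio):
    # obj: object pixels; mask: same-size boolean mask of the object.
    hb, wb = ground.shape[:2]
    s = m_ratio * (hb * wb) / mask.sum()    # Eq. (20): s = m * n / x
    zoom = np.sqrt(s)                       # our reading: area -> linear
    obj_s, mask_s = nn_resize(obj, zoom), nn_resize(mask, zoom)
    wg, hg = guide_wh
    xg, yg = guide_cog
    xf = wb * xg / wg                       # Eq. (21)
    yf = hb * yg / hg                       # Eq. (22)
    out = ground.copy()
    h2, w2 = mask_s.shape
    top, left = int(yf - h2 / 2), int(xf - w2 / 2)
    window = out[top:top + h2, left:left + w2]
    window[mask_s] = obj_s[mask_s]          # paste the masked pixels
    return out
```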
4.4. Image re-composed results

Fig. 12 shows some examples of results recomposed by the system. As shown in Fig. 12, the user can experiment with a variety of compositions on the same objects. The system is therefore useful for non-professionals learning how to maintain the visual balance within a picture.

Fig. 12. Re-composed results.

5. Conclusion

In this paper, we described a tool for extracting the compositional information of pictures. The compositional information is extracted from a picture by two processes: figure extraction, and composition analysis on the extracted figures using the Dynamic Symmetry principle. The result of the analysis is used in an application system, Image Re-Composer, which is a tool for refining a picture according to compositional information extracted from masterpieces. This tool can also help non-professionals learn how to maintain visual balance, since they are able to explore a variety of compositions of professional works. Furthermore, the tool can be used by researchers in art societies to analyze paintings. However, the compositional information extracted by the Composition Analyzer is static information. There also exists a dynamic composition that represents the context of a picture [4]. This composition is known as the "leading eye". In this case, the artist usually leads the attention of viewers from the main object in his/her work to various points in the picture, or leads the attention of viewers from a sub-object to the main object, by controlling the level of attraction of the sub-objects [4]. In order to extract this information, it is necessary to evaluate the level of attractiveness of regions properly. Future work will therefore involve such attractiveness evaluation.
References

[1] K. Nakakoji, B.N. Reeves, A. Aoki, H. Suzuki, K. Mizushima, eMMaC: knowledge-based color critiquing support for novice multimedia authors, Proc. ACM Multimedia '95, 1995, pp. 467–476.
[2] A. Plante, S. Tanaka, S. Inoue, M-Motion: a creative and learning environment facilitating the communication of emotions, Proc. CGIM '98, 1998, pp. 77–80.
[3] D.A. Dondis, A Primer of Visual Literacy, The MIT Press, Cambridge, MA, 1974.
[4] Shikaku Design Kenkyusho Corporation, Essence of Composition, 1995 (in Japanese).
[5] R. Yanagi, Golden Section, Bijyuthu Shuppan Sha, 1998 (in Japanese).
[6] R.D. Zakia, Perception and Imaging, Focal Press, 1997.
[7] T. Oyama, S. Imai, T. Wake, Handbook of Sensation and Perception, Seishin Shobo, 1996 (in Japanese).
[8] R. Desimone, S.J. Schein, J. Moran, L.G. Ungerleider, Contour, color and shape analysis beyond the striate cortex, Vision Research 25 (1985) 441–452.
[9] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Transactions on Pattern Analysis and Machine Intelligence 18 (8) (1996) 837–842.
[10] D. Dunn, W.E. Higgins, Optimal Gabor filters for texture segmentation, IEEE Transactions on Image Processing 4 (7) (1995) 947–964.
[11] A.K. Jain, F. Farrokhnia, Unsupervised texture segmentation using Gabor filters, Pattern Recognition 24 (12) (1991) 1167–1186.
[12] W.Y. Ma, B.S. Manjunath, Edge flow: a framework of boundary detection and image segmentation, Proc. CVPR '97, 1997, pp. 744–749.
[13] T. Kanbayashi, K. Shioe, K. Shimamoto, Handbuch der Kunstwissenschaft, Keisou Shobou, 1997 (in Japanese).
[14] C. Wallschlaeger, C. Busic-Snyder, Basic Visual Concepts and Principles, McGraw-Hill, New York, 1992.
[15] S. Tanaka, Y. Iwadate, S. Inokuchi, A figure extraction method based on the color and texture contrasts of regions, Proc. ICIAP '99, 1999, pp. 12–17.
[16] Planet Art, A Gallery of Masters, 1997.
[17] Y. Uesaka, A new Fourier descriptor applicable to open curves, IEICE Transactions J67-A (3) (1983) 166–173.