Information Processing Letters 59 (1996) 241-244
A framework to animate string algorithms ¹
Ricardo A. Baeza-Yates *, Luis O. Fuentes
Depto. de Ciencias de la Computación, Universidad de Chile, Casilla 2777, Santiago, Chile
Received 18 July 1995; revised 19 July 1996. Communicated by G.R. Andrews
Abstract
We present an algorithm animation system, Xaa, which is tailored to string algorithms and, more generally, to algorithms whose input is a one- or two-dimensional sequence of symbols. The system provides data generation tools and statistics on the algorithm's behavior.
Keywords: Algorithms; Algorithm animation; String searching; Visualization
1. Introduction
Visualization of a dynamic process facilitates its understanding. There are several general-purpose algorithm animation systems [1,9,2], the best known of which is BALSA [1]. Although BALSA is very powerful, defining a new visualization is not trivial, and the learning effort for the system is high. One solution is to trade functionality for simplicity by automatically generating the animation code; this approach was used in VCC [4]. Another solution, which is the one taken in this paper, is to tailor the animation primitives to a specific class of algorithms. This simplifies the animation of an algorithm and makes it possible to include functionality common to the selected class. Here we study string algorithms, and we include powerful data generation and behavior analysis tools.
* Corresponding author. Mailing address: Dept. LSI, Univ. Politècnica de Catalunya, Pau Gargallo 5, 08028 Barcelona, Spain. Email: [email protected].
¹ This work was partially supported by Grant 1950622 of Fondecyt.
We choose string algorithms for several reasons. First, few of the current animation systems use string problems as examples. (The main exceptions are one animation done in Zeus [2] and the SALSA system [8], which is built on top of Tango [9].²) Second, string searching algorithms are excellent examples of interesting and practical algorithms that are not trivial to understand (for example, the Knuth-Morris-Pratt and Boyer-Moore algorithms). Third, because many of these algorithms are difficult to understand, visualizing their behavior can suggest improvements.
² This system was developed independently of, and concurrently with, ours.
The events of interest for an animation vary from one algorithm to another. For example, for sorting algorithms on an array we may be interested in visualizing comparisons between array elements and data interchanges. For string problems, we use the class of searching algorithms based on character comparisons, because of their importance and number. Interesting events in this case are character comparisons, shifts
0020-0190/96/$12.00 Copyright © 1996 Published by Elsevier Science B.V. All rights reserved. PII S0020-0190(96)00117-2
    IdPatt = XaaCreateString( Patt, 1, m );
    XaaDrawString( IdPatt );
    IdText = XaaCreateString( Text, 1, n );
    XaaDrawString( IdText );
    for ( i = 1; i <= n - m + 1; i++ ) {
        XaaMoveString( IdPatt, 0, i - 1 );
        for ( j = i, k = 1;
              XaaShowCompare( IdText, 1, j, IdPatt, 1, k ), Text[ j ] == Patt[ k ];
              j++, k++ )
            if ( k == m ) {
                found++;
                printf( "The beginning of pattern was found at Text[%d]\n", j - m + 1 );
                XaaShowMatch( IdText, 1, j - m + 1, 1, m );
                break;
            }
    }
    XaaDestroyString( IdText );
    XaaDestroyString( IdPatt );

Fig. 1. Animation code example.
of the pattern over the text, pattern occurrences, etc. With Xaa it is possible to study and compare several string searching algorithms, and to explore how they can be improved. A preliminary version of this paper was presented in [3].
2. Animation model
We have chosen a very simple but general model for any string algorithm based on comparisons. We follow the approach developed in Tango [9] for general algorithms: we have a set of graphical objects which have specific animation paths on the screen. In our case, there is only one type of graphical object, the string, which can be one- or two-dimensional. Strings can be created, destroyed and displayed. The events animated are string movements (for example, shifting a pattern), highlighting of substrings (temporary or permanent), and comparisons. Every event has its own visualization, as seen later. For example, every comparison increases the gray level of the positions involved. The overall animation library is very small, but powerful.
The basic functions to manipulate strings are:
XaaCreateString - creates a new string
XaaDestroyString - destroys a string
XaaDrawString - displays a string
The event animation functions are:
XaaMoveString - moves a string, leaving a trace of past positions
XaaMarkArea, XaaUnMarkArea - highlight or remove the highlight from a working area of an array
XaaShowCompare - shows the comparison between two characters (in the same string or in different strings)
XaaShowMatch - permanently highlights a substring (for example, an occurrence of a pattern)
Fig. 1 shows the main part of the animation code for the brute-force (naive) searching algorithm in the one-dimensional case, that is, the algorithm that tries all possible positions of the pattern in the text.
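To make the event model more concrete, the following is a minimal sketch of how comparison events could be recorded without any graphics. It is not part of Xaa: the names CompareTrace, traced_equal and naive_search_traced are hypothetical, the sketch uses 0-based indices (unlike the 1-based code of Fig. 1), and it only counts how often each text position takes part in a comparison, which is the quantity that the gray levels are described as encoding. The caller is assumed to supply a zeroed counter array with one entry per text character.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical event recorder (not an Xaa type): one counter per text position.
   A higher count would correspond to a darker gray level on screen. */
typedef struct {
    unsigned *hits;       /* hits[j] = comparisons that involved text[j] */
    size_t n;             /* number of entries in hits (text length) */
    unsigned long total;  /* total number of character comparisons */
} CompareTrace;

/* Stand-in for a comparison event: records it and returns the result. */
int traced_equal(CompareTrace *t, const char *text, size_t j,
                 const char *patt, size_t k)
{
    assert(j < t->n);
    t->hits[j]++;
    t->total++;
    return text[j] == patt[k];
}

/* Brute-force search: tries every alignment of patt in text and
   returns the number of occurrences found. */
size_t naive_search_traced(const char *text, const char *patt, CompareTrace *t)
{
    size_t n = strlen(text), m = strlen(patt), found = 0;
    if (m == 0 || m > n)
        return 0;
    for (size_t i = 0; i + m <= n; i++) {
        size_t k = 0;
        /* Compare left to right until a mismatch or the pattern is exhausted. */
        while (k < m && traced_equal(t, text, i + k, patt, k))
            k++;
        if (k == m)
            found++;
    }
    return found;
}
```

After a run, the hits array gives a per-position comparison profile of the text, and total gives the figure that a statistics tool would aggregate over several trials.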
3. The animation environment
The graphical user interface was developed using the X Window System. It provides a set of tools to set up and select the algorithm to be animated. The animation is done in a main window which allows full scrolling. The input data can be a one-dimensional or a two-dimensional string. Two-dimensional strings are used for searching two-dimensional patterns in a two-dimensional text (for example, searching on a bit-mapped screen). The following subsections describe each of the tools available to the user.
3.1. Data generation
Xaa provides a string editor, shown in Fig. 2. This editor allows one to generate several types of string sequences interactively. Given a user-defined alphabet, it is possible to generate random sequences, all the permutations of a given length, or Fibonacci sequences. For the random case, it is possible to fix a percentage of occurrences of the pattern in the text. These sequences can be composed or edited by the user until the desired input data is obtained. The generated data can also be read from or saved to a file.

Fig. 2. Bidimensional data editor.

3.2. Animation tools
After the input has been set up, we can choose which algorithm will be animated. The animation can be stopped, started, traced, or cancelled at any time. The animation window has vertical and horizontal scrolling, and the algorithm can be visualized either continuously (with a variable animation delay) or step by step (per event). An example of animating a classical string searching algorithm (Knuth-Morris-Pratt) is given in Fig. 3.

3.3. Statistics
This tool provides a simple analysis of the behavior of an algorithm. Over a sequence of executions, the number of comparisons used in each trial is analyzed, obtaining the best case, the worst case, the average case, and the variance, as well as the number of occurrences found. For each case the associated patterns are also shown (see Fig. 3).

3.4. Currently available algorithms
For one-dimensional text we have the following algorithms: Brute-Force or Naive, Knuth-Morris-Pratt, Boyer-Moore, Boyer-Moore-Horspool, Boyer-Moore-Horspool-Sunday, and Boyer-Moore with k mismatches. For two-dimensional text we have: Brute-Force or Naive, Bird (or Baker), and Baeza-Yates/Regnier. We refer the reader to [7] for details about most of these algorithms. By just modifying one small file which contains the procedure calls to all the algorithms, it is possible to include another animated algorithm into the system.
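The quantities reported by the statistics tool can be illustrated with a short sketch. This is an assumption about how such a summary might be computed, not the code used in Xaa: the type TrialStats and the function summarize_trials are hypothetical names, the input is simply an array with the number of comparisons of each trial, and the variance is taken as the population variance (the text does not say which definition the system uses).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of the comparison counts of a sequence of trials. */
typedef struct {
    unsigned long best;   /* fewest comparisons in any trial */
    unsigned long worst;  /* most comparisons in any trial */
    size_t best_trial;    /* index of the first trial achieving best */
    size_t worst_trial;   /* index of the first trial achieving worst */
    double mean;          /* average number of comparisons */
    double variance;      /* population variance (an assumption) */
} TrialStats;

TrialStats summarize_trials(const unsigned long *comparisons, size_t trials)
{
    assert(comparisons != NULL && trials > 0);
    TrialStats s = { comparisons[0], comparisons[0], 0, 0, 0.0, 0.0 };
    double sum = 0.0;

    /* First pass: extremes and the sum needed for the mean. */
    for (size_t i = 0; i < trials; i++) {
        if (comparisons[i] < s.best) {
            s.best = comparisons[i];
            s.best_trial = i;
        }
        if (comparisons[i] > s.worst) {
            s.worst = comparisons[i];
            s.worst_trial = i;
        }
        sum += (double)comparisons[i];
    }
    s.mean = sum / (double)trials;

    /* Second pass: mean squared deviation from the mean. */
    double squares = 0.0;
    for (size_t i = 0; i < trials; i++) {
        double d = (double)comparisons[i] - s.mean;
        squares += d * d;
    }
    s.variance = squares / (double)trials;
    return s;
}
```

Keeping the index of the best and worst trial is what would allow the associated patterns to be shown next to each case.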
3.5. Algorithm example
We present here the animation of the most complex algorithm currently handled by the system: a two-dimensional text searching algorithm due to Baeza-Yates and Regnier [5]. We show just one animation screen. In the example of Fig. 4 we can see the current comparison (arrows), an occurrence previously found (square with solid border), and the shifts of the pattern (dashed line). The gray level of every character depends on the number of times that it was used in a comparison (white means no comparisons, the first gray level one comparison, and so on).
4. Concluding remarks
The system can be very easily extended in several ways, and the source code is freely available [3]. New
algorithms can be included into the system without difficulty. If it is a string searching algorithm, the animation is relatively simple. Nevertheless, it is possible to animate other algorithms that use similar data structures or similar animation events (for example, algorithms that handle arrays and/or matrices). It should also be possible to extend the system to searching and sorting algorithms over numbers. The SALSA system [8] animates the Boyer-Moore, Karp-Rabin, Aho-Corasick and approximate dynamic programming algorithms, which are not all based on character comparisons. Our system differs from SALSA because it is generic and it captures, with very few events and animation operators, the semantics of the most relevant parts of many string algorithms based on comparisons. Therefore, Xaa is extensible and more powerful. In fact, this system has recently been extended at the University of Marne-la-Vallée in France to include two improved Boyer-Moore type algorithms for string searching [6].

Fig. 3. Statistics.
Fig. 4. Baeza-Yates/Regnier 2D searching algorithm.

References
[1] M.H. Brown, Exploring algorithms using Balsa-II, IEEE Comput. 21 (5) (1988) 14-36.
[2] M.H. Brown, Zeus: A system for algorithm animation and multi-view editing, in: IEEE Workshop on Visual Languages (1991).
[3] R. Baeza-Yates and L. Fuentes, Xaa: A framework to animate text algorithms, in: Proc. 19th Latinamerican Conf. on Informatics, Buenos Aires, Argentina (1993) 15-22; Software is available at ftp://sunsite.dcc.uchile.cl/pub/users/rbaeza/xaa
[4] R. Baeza-Yates, L. Jara and G. Quezada, VCC: Automatic animation of C programs, in: Proc. COMPUGRAPHICS'92, Lisboa, Portugal (1992) 389-397.
[5] R. Baeza-Yates and M. Regnier, Fast two dimensional pattern matching, Inform. Process. Lett. 45 (1993) 51-57.
[6] M. Crochemore, Private communication, 1996.
[7] G.H. Gonnet and R. Baeza-Yates, Handbook of Algorithms and Data Structures - In Pascal and C (Addison-Wesley, Wokingham, UK, 2nd ed., 1991).
[8] E. Sutinen and J. Tarhio, String matching animator Salsa, in: Proc. 3rd Symp. on Programming Languages and Software Tools, Kääriku, Estonia (1993) 120-129.
[9] J.T. Stasko, Tango: A framework and system for algorithm animation, IEEE Comput. 23 (9) (1990) 27-39.