North-Holland Microprocessing and Microprogramming 23 (1988) 273 - 278
A SORTING PROCESSOR FOR MICROCOMPUTERS
Emil Jovanov, Tihomir Aleksić, Zoran Stojkov, Dušan Starčević
* Institute "MIHAJLO PUPIN", Computer Systems Department, Volgina 15, 11000 Beograd, Yugoslavia
** Electrotechnical Faculty, Bulevar Revolucije 73, 11000 Beograd, Yugoslavia
File or database processing in business-oriented microcomputers requires frequent data sorting. Given the modest system resources of microcomputers, software sorting procedures may not be efficient enough for some applications. An alternative approach is to achieve fast sorting by means of specialized hardware. Known hardware sorters are not economically justified for low-cost microcomputers. As an efficient and cost-effective solution, we propose a new concept of a sorting processor for microcomputers, the SPR. The SPR is dedicated to accelerating sorting and sort-based database operations. The sorting method is radix sort with a radix of 64K. The paper presents the basic algorithm and hardware organization of the SPR. A detailed functional description of the sorting operation is given. Using the SPR, a sorting algorithm achieves complexity of order O(N), with a specific sorting speed within a few microseconds per 16-bit word. The described hardware is convenient for VLSI implementation.
1. INTRODUCTION
It has been estimated that business computers spend almost a quarter of their time sorting. Recent developments in database technology have made sorting all the more important. The best software procedures take time proportional to N*logN, where N is the number of data items to be sorted, and are not efficient enough for some applications such as relational database systems. Fast sorting is particularly important for database operations, especially for "join" operations. The importance of sorting led to the development of hardware sorters [1,2,3,7] with a sorting time proportional to N.

While developing a relational database system for microcomputers, we did not find a cost-effective solution among existing hardware sorters. Consequently, we developed a new Sorting Processor, the SPR, starting from the ideas in [4,5,6]. It is aimed at low-cost microcomputers as a general purpose sort-accelerator.

The basic method for hardware sorting is radix sort with a radix of 64K. The use of memory to perform sorting has already been proposed as address calculation sort [8]. The idea was to calculate the address of a key in a sorted list using the value of the key. In the proposed method [4] the address is the value of the key itself, eliminating the need for post-sorting. However, this has not been widely exploited for the following reasons:
a) RAMs have been relatively expensive for a long time.
b) Specific sorting speed may be poor if the number of data items to be sorted is small.
c) The collision problem (repeated sorted digits).
Reason a) is no longer a concern because of the low price of RAMs. Problem b) is solved by using Address Marked Memory (AMM) [4,9]. It is used to "mark" written cells of a general purpose RAM and to read only the "marked" addresses in a predefined order. That kind of access is achieved using a special device, the memory scanner. Finally, the collision problem c) is solved by a convenient method of address chaining using Address Chain RAM (ACM).

The SPR is realized as a special purpose board residing on a microcomputer system bus. The SPR accepts data words (snacks) to be sorted and generates an ordered list of tuple identifiers (TIDs). Elements in that list are either the final result or the head of a list containing the TIDs with equal key value. So, sorting is performed in a snack-by-snack, most significant digit first, manner.

We would like to emphasize that the AMM in the SPR makes use of an original concept of memory scanner. In addition to the preceding parallel organization of the memory scanner [4,7,9], our research contributed a Serial Memory Scanner (SMS). Besides fast sorting, the SMS also made possible the filtering features of the SPR and an easy change of the sorted order. It introduces new tradeoffs and may be a better solution for microcomputers. This paper presents the hardware concept of the SPR and the execution of a sort operation. The realized hardware acceleration of sort-based database operations is not covered by this article.

*This work was supported by the Science Foundation of Republic Serbia, under Grant RZNS 7614/I

2. ARCHITECTURE OF THE SPR
The SPR is realized as an intelligent sort-accelerator on the "MULTIBUS" system board. It executes primitive sorting sub-operations. The hardware organization is shown in Fig. 1.

The SPR consists of the following:
1) AMM - The Address Marked Memory is the most important part of the SPR. Functionally it consists of:
   - Sort Memory (SM), which performs address calculation. It is a standard RAM memory containing sorted TIDs or heads of unsorted lists.
   - Index Memory (IM), which contains "mark" bits associated with every cell of SM. A hierarchical structure is employed to accelerate associative access to the AMM.
   - Scanner, which makes use of the index memory IDX to "mark" written cells of SM and later, in the scanning phase, to generate only marked addresses.
2) ACM - Address Chain Memory contains unsorted lists of TIDs.
3) CONTROL UNIT generates all the control signals for the SPR. It is a PAL-based hard-wired control unit. The execution of an SPR operation is controlled through the command register.
4) The register file is a set of special purpose registers. The most important are:
   - RKey, key register, accepts the snack of the key to be sorted
   - RId, identifier register, contains the current tuple identifier produced by the CPU or the SPR
   - Status, result status register
5) ACCESS COUNTER counts accesses to the RKey register.
6) INTERFACE connects the SPR resources to the system.

Fig. 1. The SPR hardware organization (blocks shown: control unit, register file, access counter, local bus, AMM with scanner, address chain memory, interface, system bus)

The main memory areas are SM and ACM. The capacity of both areas is 64K words of 17 bits. The memory word is divided into two fields:
- data, a 16-bit field containing the TID
- EOC, the end of chain field, which denotes the last member in a list of TIDs with equal key value.
We can also consider the "mark" bit from IDX as a field of the SM word.

The SPR is seen as a set of I/O ports. It executes primitive sorting sub-operations in parallel with the CPU, starting immediately after the access to the data registers. That parallelism makes it possible to trade price for speed of the SPR according to the speed of the CPU.
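The mark-and-scan behaviour of the AMM can be illustrated in software. The Python sketch below is only an illustration under assumptions that are not taken from the SPR design: a two-level index with 256 cells per group is assumed, and the names MarkIndex, GROUP, mark and scan are hypothetical. It shows how a hierarchical index lets a scanner skip whole unmarked regions and produce only the marked addresses, in either order.

```python
class MarkIndex:
    """Software model of a hierarchical mark index over 2**16 SM addresses."""

    GROUP = 256  # assumed group size; the real IDX hierarchy is not specified

    def __init__(self, size=1 << 16):
        self.size = size
        self.leaf = [False] * size  # one "mark" bit per SM cell
        # Upper level: one summary bit per group of GROUP cells.
        self.top = [False] * ((size + self.GROUP - 1) // self.GROUP)

    def mark(self, addr):
        """Record that the SM cell at addr has been written."""
        self.leaf[addr] = True
        self.top[addr // self.GROUP] = True

    def scan(self, descending=False):
        """Yield marked addresses in order, skipping groups with no marks."""
        groups = range(len(self.top))
        for g in (reversed(groups) if descending else groups):
            if not self.top[g]:
                continue  # the whole group is unmarked
            cells = range(g * self.GROUP, min((g + 1) * self.GROUP, self.size))
            for a in (reversed(cells) if descending else cells):
                if self.leaf[a]:
                    yield a


idx = MarkIndex()
for addr in (0xFFFF, 1, 0, 0xFFFE, 2):
    idx.mark(addr)
print(list(idx.scan()))                 # [0, 1, 2, 65534, 65535]
print(list(idx.scan(descending=True)))  # [65535, 65534, 2, 1, 0]
```

With only five marked cells, the ascending scan inspects two groups of 256 cells instead of all 64K addresses, which is the effect the hierarchical structure is meant to achieve.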
3. SORTING ALGORITHM
Consider N keys residing in operating memory. Every key has m words (snacks) and an associated unique TID. According to the relative position of the key, the TID value is in the range [0,N-1]. Assume i to be the current value of the TID and Di the value of the snack being sorted.

Sorting is performed snack-by-snack in at most m sorting passes, starting from the most significant digit. A one-snack sort is performed using the value of the snack to address the cell of SM in which the TID is to be written. When there are duplicate snacks, a chain of TIDs with equal key values is formed. The head of the chain is in SM and all the other members are in ACM; we call it the equivalence class or E-class.

The SPR makes use of an original and elegant method of chaining members of an E-class. Instead of maintaining pointers [7], we use the TID value as a pointer to the next class member. The tail of the chain has the EOC bit set.

The state diagram of the SPR is shown in Fig. 2. In the "RESET" phase the index memory is cleared and the SPR is set to the initial state. Afterwards, the first digit sort is performed in the "STARTING LOADING" phase. In this state the TID is regularly incremented for every data word given to the SPR. So, the implemented ACCESS COUNTER supplies the current TID value, eliminating the need for writing succeeding TIDs. The basic algorithms are given below.

Fig. 2. The state diagram of the SPR

Algorithm 1: One snack sort
{* Algorithm executed by the SPR *}
IF SM[Di].mark=1 THEN
BEGIN
  {* The address is marked, write previous head of the address chain into ACM *}
  ACM[i].data:=SM[Di].data;
  ACM[i].EOC:=SM[Di].EOC;
  {* and write i as new head of address chain *}
  SM[Di].data:=i;
  SM[Di].EOC:=0 {* it is not the end of address chain *}
END
ELSE
BEGIN
  {* no duplicates till now, TID is tail of E-class *}
  SM[Di].data:=i;
  SM[Di].EOC:=1;
  SM[Di].mark:=1 {* "marking" *}
END;

Algorithm 2: STARTING_LOADING
FOR i:=0 TO N-1 DO
BEGIN
  RKey:=Di; {* Output to SPR register *}
  One_snack_sort {* Performed by the SPR *}
END;

An ordered list of TIDs accompanied by a status word is generated during the "SCAN" phase. The result is generated by scanning the marked addresses of SM. Therefore, the entries represent either the final result or the head of an E-class. The E-class resides in ACM and is to be sorted using the next snack of the key. A particular class is chosen by writing its head into the RId register of the SPR, and it is sorted in a "NON STARTING LOADING" phase. During E-class loading the SPR generates the TID of the next member of the E-class. The CPU accepts the TID and writes the current snack of its key. The last member has the EOC bit set. After loading, the scanned result is inserted into the intermediate result obtained in the previous sorting pass.
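The first sorting pass can be traced in software. The following Python sketch models Algorithms 1 and 2 and the scan of marked addresses; it is a simplified illustration, not the SPR implementation. Placing EOC in bit 16 of the 17-bit word, the function names starting_loading and scan, and the sample snack values are all assumptions made for the example.

```python
SIZE = 1 << 16   # 64K words in SM and in ACM
EOC = 1 << 16    # assumed position of the EOC field in the 17-bit word


def starting_loading(snacks):
    """Model of Algorithm 2: feed one snack per key; TID i comes from a counter."""
    sm = [0] * SIZE        # Sort Memory words (data | EOC)
    mark = [False] * SIZE  # "mark" bits of the index memory
    acm = [0] * SIZE       # Address Chain Memory words (data | EOC)
    for i, d in enumerate(snacks):
        # Model of Algorithm 1 (one snack sort) for snack value d and TID i.
        if mark[d]:
            acm[i] = sm[d]     # previous head (data and EOC) goes to ACM[i]
            sm[d] = i          # i becomes the new head, EOC = 0
        else:
            sm[d] = i | EOC    # first TID at this address is the tail
            mark[d] = True
    return sm, mark, acm


def scan(sm, mark):
    """Return (TID, EOC) for every marked SM address, in ascending order."""
    return [(sm[a] & 0xFFFF, bool(sm[a] & EOC)) for a in range(SIZE) if mark[a]]


# Hypothetical first snacks of seven keys; TIDs 1, 3 and 5 collide on value 1.
first_snacks = [0x0002, 0x0001, 0xFFFF, 0x0001, 0xFFFE, 0x0001, 0x0000]
sm, mark, acm = starting_loading(first_snacks)
print(scan(sm, mark))
# [(6, True), (5, False), (0, True), (4, True), (2, True)]
print(acm[5] & 0xFFFF, acm[3] & 0xFFFF, bool(acm[3] & EOC))  # 3 1 True
```

An entry scanned with EOC set is a final result, while an entry with EOC clear is the head of an E-class whose remaining members are reached through ACM.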
Fig. 3, panel a) Double word data to be sorted residing in OM (columns: TID, KEY 1st word, KEY 2nd word)
Fig. 3, panel b) The content of SM and ACM after "STARTING LOADING" (columns: relative address; SM data, EOC, mark; ACM data, EOC)
Fig. 3, panel c) The content of SM after "NON STARTING LOADING" (columns: relative address; SM data, EOC, mark)
Fig. 3. The sorting example

Algorithm 3: NONSTARTING_LOADING
i:=head_of_address_chain; {* set head of address chain *}
REPEAT {* Sort current snack of i-th key *}
  RKey:=Di; {* output snack of current key *}
  next_member:=ACM[i].data;
  last_member:=ACM[i].EOC=1; {* end of chain? *}
  One_snack_sort; {* performed by the SPR in parallel *}
  i:=next_member; {* input next TID *}
UNTIL last_member;

The CPU repeats the "NON STARTING LOADING" and "SCAN" phases for every E-class until a unique sort order is determined or all m words are sorted.

Example: Sorting of the seven double word keys of Fig. 3a is considered. Initially, all the "mark" bits are cleared. The first snack sorting is performed according to Algorithm 1. The content of the SM and ACM upon completion of that phase is shown in Fig. 3b. During the "SCANNING" phase the intermediate result 6,5,0,4,2 is obtained. Tuple 5 is the head of an E-class that also contains tuples 3 and 1, written in the ACM. Since tuple 5 is the head of the list, the TID of the next member (3) is found at address 5 of ACM. Further, at address 3 resides TID 1, the tail of the address chain. During "NON STARTING LOADING" the E-class is sorted. The content of SM after sorting the second word is given in Fig. 3c. The sorted class (3,5,1) is inserted into the intermediate result in place of its head, giving the final sorted list 6,3,5,1,0,4,2.
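The complete procedure, including the re-sorting of E-classes, can be followed with a small software model. The Python sketch below is an illustration under several assumptions, not a description of the SPR hardware: the key values are hypothetical (chosen only to reproduce the TID sequences quoted in the example), the mark bits are assumed to be cleared before each E-class is loaded, the EOC flag is read as travelling with the TID it accompanies, and the CPU is assumed to keep each scanned list while a class is re-sorted. All names (SPRModel, spr_sort, and so on) are invented for the sketch.

```python
SIZE = 1 << 16  # 64K cells in SM and in ACM


class SPRModel:
    """Hypothetical software model of the SPR memories (not the real hardware)."""

    def __init__(self):
        self.sm = [(0, False)] * SIZE   # SM words as (data, EOC)
        self.acm = [(0, False)] * SIZE  # ACM words as (data, EOC)
        self.mark = [False] * SIZE

    def reset(self):
        self.mark = [False] * SIZE      # clear the index memory

    def one_snack_sort(self, d, i):
        """Algorithm 1: place TID i at address d, chaining on collision."""
        if self.mark[d]:
            self.acm[i] = self.sm[d]    # previous head moves into ACM
            self.sm[d] = (i, False)     # i becomes the new head
        else:
            self.sm[d] = (i, True)      # first TID at d is the tail
            self.mark[d] = True

    def scan(self):
        """Return (TID, EOC) for every marked SM address, ascending."""
        return [self.sm[a] for a in range(SIZE) if self.mark[a]]

    def chain(self, head):
        """Yield all members of the E-class starting at head."""
        i, last = head, False
        while True:
            yield i
            if last:
                return
            i, last = self.acm[i]       # EOC read with a TID marks it as the tail

    def load_class(self, head, snack_of):
        """Re-sort one E-class on its next snack (in the spirit of Algorithm 3)."""
        i, last = head, False
        while True:
            nxt = self.acm[i]           # read the link before ACM[i] is overwritten
            self.one_snack_sort(snack_of(i), i)
            if last:
                return
            i, last = nxt


def spr_sort(keys):
    """Return TIDs ordered by key; keys are equal-length tuples of 16-bit snacks."""
    spr = SPRModel()
    m = len(keys[0]) if keys else 0
    for tid, key in enumerate(keys):    # STARTING LOADING (Algorithm 2)
        spr.one_snack_sort(key[0], tid)
    return _resolve(spr, keys, spr.scan(), 1, m)


def _resolve(spr, keys, scanned, depth, m):
    out = []
    for tid, eoc in scanned:
        if eoc:
            out.append(tid)             # unique so far: final position
        elif depth == m:
            out.extend(spr.chain(tid))  # keys equal in all m snacks
        else:
            spr.reset()
            spr.load_class(tid, lambda i: keys[i][depth])
            out.extend(_resolve(spr, keys, spr.scan(), depth + 1, m))
    return out


# Hypothetical two-snack keys; TIDs 1, 3 and 5 share the first snack.
keys = [(0x0002, 0), (0x0001, 5), (0xFFFF, 0), (0x0001, 0),
        (0xFFFE, 0), (0x0001, 2), (0x0000, 0)]
print(spr_sort(keys))  # [6, 3, 5, 1, 0, 4, 2]
```

Re-sorting a class only overwrites ACM cells addressed by members of that class, so the chains of the other E-classes found in the same scan stay intact while one class is being processed.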
4. DESIGN PARAMETER CONSIDERATIONS
The SPR executes sorting operations in parallel with the CPU and is to be tuned to match the processor speed. Therefore, the most limiting factor is the CPU speed. In the loading phase the critical time is the CPU time required to fetch the next snack of the key, and in the scanning phase it is the time to read the result and evaluate its status. The SPR itself is tuned by setting the period of the system clock, Tclk. It depends on the access time of the memories used, Tma, and is approximately Tclk=0.5Tma. The "STARTING LOADING", "NON STARTING LOADING" and "SCANNING" execution times are 7Tclk, 12Tclk and Tsearch+3Tclk, respectively. The search time (Tsearch) depends upon the access time of IDX and is, in the worst case, less than 10 µs.

The number of data items that can be sorted using the SPR is equal to the ACM capacity. The realized version of the SPR sorts up to 128K keys (records). For an increased number of keys, a larger ACM and a wider memory word of SM are needed.
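The timing relations above can be evaluated for a concrete memory. The Python sketch below is a worked illustration only: the 200 ns memory access time is an assumed value, not a figure from the realized SPR, and the search time is set to the stated 10 µs worst-case bound. The function name spr_phase_times_ns is hypothetical.

```python
def spr_phase_times_ns(t_ma_ns, t_search_ns):
    """Per-operation SPR times in ns, from Tclk = 0.5 * Tma (approximate)."""
    t_clk = 0.5 * t_ma_ns
    return {
        "t_clk": t_clk,
        "starting_loading": 7 * t_clk,       # 7 Tclk per loaded snack
        "non_starting_loading": 12 * t_clk,  # 12 Tclk per loaded snack
        "scanning": t_search_ns + 3 * t_clk, # Tsearch + 3 Tclk per result
    }


# Assumed Tma = 200 ns; Tsearch taken at its 10 us upper bound.
times = spr_phase_times_ns(t_ma_ns=200, t_search_ns=10_000)
for phase, ns in times.items():
    print(f"{phase}: {ns / 1000:.2f} us")
```

Under these assumed values each loading operation takes on the order of one microsecond, so the CPU time needed to fetch the next snack remains the limiting factor, as stated above.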
Fig. 4. Future developments of the SPR: a) The Intelligent SPR; b) SPR as a sort co-processor
5. FUTURE DEVELOPMENTS
The described realization of the SPR was used to verify the sorting algorithm and to perform some measurements. We now plan further improvements of the SPR, targeting the two devices shown in Fig. 4. Increased intelligence will aim at an independent device, the Intelligent SPR (ISPR). The ISPR will increase system throughput by executing sorting operations in parallel with the CPU. For some applications it can be useful to integrate a DMA controller on board. The other device is a sort co-processor. A simpler version of the SPR can be attached to the processor board. It will act as a sort accelerator on the local processor bus. The host board is either a general purpose processor board or an intelligent disk controller.
6. CONCLUSION
We have described the concept and implementation of the SPR. The SPR is implemented as a general purpose sort accelerator achieving sorting with complexity of order O(N). As the SPR prototype is currently being tested, final results will be available shortly. They will show how to find the optimal organization for a particular kind of system. Preliminary results confirmed the expected performance and sort acceleration of systems using the SPR. We also plan a VLSI or ASIC realization of the control unit and the memory scanner.
REFERENCES
[1] Leilich, H.O. and Missikoff, M., Editors, "Database Machines", Springer Verlag, Berlin, Germany, 1983.
[2] Hsiao, D.K., "Advanced Database Machine Architecture", Prentice Hall, Englewood Cliffs, New Jersey, 1983.
[3] Sood, A.K., Quadeshi, A.H., Editors, "Database Machines, Modern Trends and Application", Springer Verlag, Berlin, Germany, 1986.
[4] Aleksić, T., University of Belgrade Internal Report, P-576/86.
[5] Aleksić, T., Popović, M., University of Belgrade Internal Report, P-577/86.
[6] Aleksić, T., Velašević, D., Pejović, M., University of Belgrade Internal Report, P-579/86.
[7] Raschid, L., Fei, T., Lam, H., Stanley, Y.W., "A Special Function Unit for Sorting and Sort-Based Database Operations", IEEE Trans. on Comp., Vol. C-35, No 12, December 1986, pp. 1071-1077.
[8] Isaac, E.J., Singleton, R.C., Journal of the ACM, 1956, pp. 169-174.
[9] Vučetić, J., "Address Marked Memories", M.S. Thesis, University of Belgrade, 1985.