MS160 ultra fast non-numeric coprocessor


Application note

Eirik Stephansen

The MS160 represents a new technique for performing information retrieval and pattern recognition. Until recently, almost all searching of large amounts of data has been based on indexing. Indexes offer fast access to indexed data, but make advanced queries slow, or impossible. The MS160 offers a new and fast way to deal with information retrieval without indexes.

Keywords: information retrieval, coprocessor, no indexing

Microway.MRT, Research and Development Department, Trondheim, Norway

WHAT IS MS160?

The MS160 (Figure 1) is a massively parallel, fully customized VLSI chip, capable of searching through data at 160 MB s⁻¹. The 64 bit data bus is organized in groups of 8 bits. Each of these groups may be routed to one or more of eight windows (Figure 1). Each window is capable of searching for a word with a length between 1 and 32 bytes. The word may be an exact word (like 'Norway'), a range (like '[60-65]') or any combination (like 'Norway 19[60-65]'). In addition, each window is capable of remembering a hit for a predefined period of time (number of records shifted). When the window's match conditions are satisfied, it signals a window match to the windows match logic (Figure 1). The windows match logic is programmed to signal a hit (or interrupt) when a predefined combination of windows has a match.

THE WINDOWS

Each of the eight windows contains a 32 byte shift register with PEs* (Figure 2). Each register element has two comparators, checking for a match with an upper and a lower bound. Each bound is individually programmable, which makes it possible to match the data within any continuous interval in the byte range. When both the upper and the lower bound of an element are satisfied, the element has a match. For items larger than one byte, subsequent bytes can be combined into data fields. Data fields may consist of up to eight bytes for any interval test and up to 32 bytes† when testing for equality. The length of a data field is defined with a field separation mask. Figure 3 shows an example window configuration (searching in a telephone directory).

*Processing elements
†By combining windows this may even be extended to 256 bytes
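The bound comparison described above can be mimicked in software. The following is a minimal sketch in Python, not a description of the chip's internals: the names `Field`, `exact` and `search` are hypothetical, and comparing a multi-byte field as an unsigned big-endian number is an assumption about how an interval test over a data field behaves. The limits of 32 bytes per window and eight bytes per interval field are taken from the text.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_WINDOW_BYTES = 32    # window length limit given in the text
MAX_INTERVAL_BYTES = 8   # longest field that may use an interval test


@dataclass(frozen=True)
class Field:
    """A data field with inclusive lower and upper bounds of equal length."""
    lower: bytes
    upper: bytes

    def __post_init__(self) -> None:
        if len(self.lower) != len(self.upper) or not self.lower:
            raise ValueError("bounds must be non-empty and equally long")
        if self.lower != self.upper and len(self.lower) > MAX_INTERVAL_BYTES:
            raise ValueError("interval fields are limited to eight bytes")

    def matches(self, chunk: bytes) -> bool:
        # Equal-length byte strings compare like big-endian unsigned numbers
        # (assumed model of the interval test).
        return self.lower <= chunk <= self.upper


def exact(text: str) -> list[Field]:
    """One single-byte field per character, with lower bound == upper bound."""
    return [Field(bytes([b]), bytes([b])) for b in text.encode("ascii")]


def search(fields: list[Field], stream: bytes) -> list[int]:
    """Return every offset at which all fields of the window match."""
    width = sum(len(f.lower) for f in fields)
    if width > MAX_WINDOW_BYTES:
        raise ValueError("a single window holds at most 32 bytes")
    hits = []
    for start in range(len(stream) - width + 1):
        pos = start
        for field in fields:
            end = pos + len(field.lower)
            if not field.matches(stream[pos:end]):
                break
            pos = end
        else:
            hits.append(start)
    return hits


# 'Norway 19[60-65]': nine exact bytes followed by a two-byte interval field.
NORWAY_1960_65 = exact("Norway 19") + [Field(b"60", b"65")]
print(search(NORWAY_1960_65, b"Norway 1959, Norway 1963, Norway 1966"))  # prints [13]
```

The software loop tests one offset at a time; the point of the hardware is that all register elements compare in parallel as the data is shifted through the window.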

Figure 1 The MS160

0141-9331/93/090561-04 © 1993 Butterworth-Heinemann Ltd
Microprocessors and Microsystems Volume 17 Number 9 November 1993


The example configuration in Figure 3 searches for a word whose value is [AA-GE], followed by a field whose value is [142000-160000].

Each window can be set up to remember a match for a programmable length of time (distance in data space). This latency allows context sensitive searching by letting the user put various constraints on the ordering of, and the relative distance between, objects. The feature is designed as a counter that is set to a user programmable value upon a match. The counter is set on every match and then decremented for each record* shifted. As long as the counter is active, the window has a match (window match).

*The chip has a feature that allows a hit to occur only when the data stream aligns with a record. The record length is programmable from 1 to 256. With a record length of 1 (the most common value) a window may have a match for every shift.

Figure 2 A window

THE WINDOWS MATCH LOGIC

Each window reports its match to the windows match logic (Figure 1). The combination of matches from the windows is used as an 8 bit address into a 256 bit user programmable RAM (the hit mask). Depending upon the value at the given address in the hit mask (0 = no hit, 1 = hit), the chip generates a hit.

DATA ROUTING

The chip receives data through a 64 bit data path. The path is organized in streams of 8 bits. Each of these eight streams may be routed to any window (see Figure 5). The data router is built up of three levels of muxes. The first and second levels make it possible to route any bit stream to any window. The third level is used to chain two or more windows into super windows (64-256 bytes).

MS160 MODES

The chip has several programmable modes. The first is the REPORT/COUNT selection. When the chip is in REPORT mode it stores the address of the hit each time a hit occurs. In COUNT mode the chip counts the number of hits.

The second mode selection is STOP/CONTINUE. In STOP mode, the chip stops the data flow until an acknowledgement has been given. In CONTINUE mode the chip stores the hit address internally but continues to search. If new hits occur before the old one is read out, the new hits are lost.

The last mode selection is FLANK/HIT. In FLANK mode the chip generates only one hit for a run of subsequent hits; otherwise (in HIT mode) it generates one hit for every hit. Figure 4 illustrates the two modes.

HOST INTERFACE

The host interface is based on an 8 bit bidirectional port, used for both read and write cycles.
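The hit mask described under THE WINDOWS MATCH LOGIC and the FLANK/HIT selection described under MS160 MODES are both user programmable, so it may help to see them as a small software model. The following Python sketch is illustrative only: the function names are hypothetical, the assignment of window i to address bit i is an assumption (the text does not give the bit order), and FLANK mode is modelled as reporting only the first hit of a run of consecutive hits.

```python
from __future__ import annotations

from typing import Callable, Iterable

WINDOWS = 8  # eight windows give an 8 bit address into the hit mask


def build_hit_mask(rule: Callable[[int], bool]) -> list[int]:
    """Fill the 256 entry, 1 bit wide hit mask: entry a is 1 when rule(a) holds."""
    return [1 if rule(a) else 0 for a in range(1 << WINDOWS)]


def match_address(window_match: Iterable[bool]) -> int:
    """Pack the window match signals into an address.

    Assumption: window i drives address bit i.
    """
    return sum(1 << i for i, matched in enumerate(window_match) if matched)


def report_hits(mask: list[int], addresses: Iterable[int], flank: bool = False) -> list[int]:
    """Return the steps at which a hit is reported.

    HIT mode reports every step whose mask entry is 1; FLANK mode (as modelled
    here) reports only the first step of each run of consecutive hits.
    """
    reported = []
    previous = 0
    for step, addr in enumerate(addresses):
        current = mask[addr]
        if current and not (flank and previous):
            reported.append(step)
        previous = current
    return reported


# Hit when window 0 matches together with at least one of windows 1 to 3.
MASK = build_hit_mask(lambda a: bool(a & 0b0001) and bool(a & 0b1110))
TRACE = [0b0001, 0b0011, 0b0011, 0b0010, 0b1001]  # one address per step

print(report_hits(MASK, TRACE))              # HIT mode: prints [1, 2, 4]
print(report_hits(MASK, TRACE, flank=True))  # FLANK mode: prints [1, 4]
```

Because the mask is a plain lookup table, any Boolean combination of the eight window matches costs the same single lookup per step.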

Figure 3 An example window configuration

Read and write cycles can be carried out both asynchronously and synchronously. The total configuration consists of 828 bytes, making a complete reconfiguration possible in around 100 µs*. In addition, it is possible to make minor changes to the configuration by addressing through the internal address register of the MS160, which makes reconfiguration even faster. An example host configuration is shown in Figure 6.

*Limited by the chip write access time

Figure 4 Flank/hit mode. In non-flank (HIT) mode there is one hit for every clock tick while the search criteria are satisfied; in FLANK mode there is only one hit.

Figure 5 The data routing network (inputs D(63,56) down to D(7,0))

MS160 APPLICATIONS

One may ask for which areas the MS160 is potentially well suited. The answer is simple: use the MS160 wherever there are large amounts of data to be searched, complex searches, or data flowing at high speeds. The following gives an example of each of the three areas.

Data flow at high speeds

Imagine a stream of data (30 bits wide at a speed of 10 MHz). One wishes to make sure that the signal, which might be a digitally sampled analogue signal, does not contain a high peak for some given frequency components. We design a system as shown in Figure 7. In this application the MS160 is capable of examining several frequency components for unwanted values. The examination is carried out in real time.

Large amount of data

The second application demonstrates searching in a large amount of data, either raw text or structured information. Imagine a bank information system. The system has about two million customers and the bank's financial position depends upon it having a minimum number of customers with at least $1 million in their account. These customers should also be long term customers. The question to ask the system is:

• How many customers have at least $1 million in their account and have been customers for at least three years?

Today's systems with indexes make even this simple query a little tricky. With the MS160 this kind of question is easily solved by high speed parallel processing. To give an estimate of the time consumed by the MS160, imagine that each customer record is 150 bytes long. This gives a total of 300 MB in the database. Since this question requires the use of two windows, the MS160 uses four parallel data streams (giving a throughput of 80 MB s⁻¹). The total processing time will be about 3.8 s (using no indexes at all).

Complex search

As mentioned previously, the chip has eight windows. Each of these windows can remember a hit for a given period of time. This makes the MS160 ideal for doing complex searches (logical combinations of several words). An example is:

• Search for any article containing at least one of the following words: 'chip', 'circuit', 'integrated' in addition to the word 'MS160'.

Since the MS160 uses four windows to map this search, the total search speed is 40 MB s⁻¹ (40 MB is equivalent to 20 000 A4 pages or eight times the complete works of Shakespeare). Another example:

• In a customer database, look up every customer located in the north of Norway (ZIP > 7000) with at least 10 employees and with monthly sales above $100 000.

If each customer record consists of 200 characters and the total number of customers is 100 000, the MS160 will do this search in about 0.5 s. For the interested reader: how long will a relational database (with indexes) take for the same query?

THE MS160 SEARCH ENGINE
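The search times quoted in the application examples above can be reproduced with a short calculation. The Python sketch below is a back-of-envelope model, not a specification: it assumes that the eight 8 bit streams are shared evenly between the windows in use (rounded down), which agrees with the quoted figures of 80 MB s⁻¹ for two windows and 40 MB s⁻¹ for four, and it assumes that the three-condition customer query occupies three windows. The function names are hypothetical.

```python
FULL_RATE_MB_S = 160   # throughput of the full 64 bit data path (from the text)
BYTE_STREAMS = 8       # the path is organized in eight 8 bit streams


def throughput_mb_s(windows_used: int) -> int:
    """Estimated throughput in MB/s when `windows_used` windows share the path."""
    if not 1 <= windows_used <= BYTE_STREAMS:
        raise ValueError("the chip has eight windows")
    # Assumption: the streams are divided evenly between the windows in use.
    parallel_streams = BYTE_STREAMS // windows_used
    return parallel_streams * FULL_RATE_MB_S // BYTE_STREAMS


def search_time_s(records: int, record_bytes: int, windows_used: int) -> float:
    """Time in seconds to stream the whole database past the chip once."""
    megabytes = records * record_bytes / 1_000_000
    return megabytes / throughput_mb_s(windows_used)


# Bank example: two million records of 150 bytes, two windows.
print(search_time_s(2_000_000, 150, 2))  # prints 3.75

# Customer example: 100 000 records of 200 bytes, three windows (assumed).
print(search_time_s(100_000, 200, 3))    # prints 0.5
```

The estimate depends only on the database size and the number of windows in use, not on the selectivity of the query, since every record is streamed past the chip exactly once.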

The MS160 search engine is a PC board for the ISA bus. The board holds both the MS160 coprocessor and a large amount of dedicated memory (up to 320 MB of standard low cost DRAM). Figure 8 gives an overview of the system.

Figure 6 Typical application

MS160 SEARCH ENGINE APPLICATION

The MS160 search engine is a new product on the market. However, several applications have already been developed:

Text retrieval

The MS160 search engine is supplied with the IRIS (Information Retrieval and Identification System). This application allows the user to search for almost any type of information in any type of data. The most obvious usage, though, is searching for text.

Biochemical databases

Biochemical databases consist of huge amounts of character-based information. The users of such databases need to find certain combinations of character sequences. An application supporting this kind of query has been developed.

Telephone directory

Often when there is a question about a telephone number, the user is not able to give an exact specification of the query. The user may, for example, know that the person he or she is searching for lives in a street called '21st' and has a last name ending in 'undsen'. The MS160 is ideal for this kind of fuzzy information retrieval.

The MS160 search engine is supplied with a complete toolkit to ease programming. The toolkit should make it easy for any C++ programmer to develop applications. All of today's applications are based on this toolkit. In addition to the applications mentioned, Microway.MRT is developing a client-server toolkit for developers.

More information

More information may be requested from Microway.MRT, Strømsveien 74, N-2010 Strømmen, Norway. Fax +47 63 80 12 12.


Figure 7 Real-time application (signal source, FFT*, MS160)

*Fast Fourier transformation

Figure 8 The MS160 search engine (PC interface, data transfer control logic, 64 bit data path, results)