Computer Physics Communications 61 (1990) 246–256 North-Holland
Transformation of weather forecasts from textual to cartographic form

Stephan M. Kerpedjiev
Institute of Mathematics, Acad. G. Bonchev St., bl. 8, Sofia 1113, Bulgaria
This paper analyzes weather forecasts as a particular class of specialized texts. A data model of that class is created, which serves as a basis for the automatic transformation of weather forecasts from textual to cartographic form. The proposed method consists of data extraction, data translation and map generation; these stages are discussed in detail. A program system implementing the method has been developed, and an experiment designed to assess the validity of the method is described. Some suggestions for future development of this work are made.
0010-4655/90/$03.50 © 1990 Elsevier Science Publishers B.V. (North-Holland)

1. Introduction

Weather conditions are an important factor in decision making in various spheres such as agriculture, air-traffic control, etc. In some cases the decision has to be taken under time pressure, with incomplete and uncertain information, or in other extreme situations. To compensate at least partially for the lack of time and to facilitate perception, the weather forecast, usually created in textual form, is illustrated by a weather map.

The transformation of weather forecasts from one form to another is a common operation performed by weathermen. The most usual forms of weather forecast representation are the textual and cartographic ones; in some cases tables are used as well. The map either supplements the text of the forecast with additional information or simply repeats its contents. In the latter case the map could be constructed by analyzing the forecast contents and synthesizing an adequate map representation (fig. 1). In this paper the problems arising from the automatic transformation of weather forecast texts into weather maps are discussed.

We consider the different forms of weather representation as different languages. Thus the textual representation of a weather forecast is obtained by using a certain natural language, or more precisely a certain subset of the natural language characterized by specific lexis, syntax and semantics. On the other hand, weather maps conform to the rules of the cartographic language, which has a particular alphabet (a set of characters, symbols and pictograms), syntax and semantics. From this point of view, the transformation of weather forecasts from textual to cartographic form is a specific sort of translation from one language to another. Hence, before creating a system for transforming weather forecast texts into weather maps, we should study both languages and construct a mapping between the means of expression used in them. That mapping is implemented by a common internal representation of the weather forecasts (cf. fig. 1). The internal representation contains the most essential information in the forecasts, identified by their conceptual analysis.

[Fig. 1: weather forecast text → analysis → internal representation → synthesis → weather map; the whole path is labelled "transformation".]
Fig. 1. Analyzing the forecast contents and synthesizing an adequate map representation.

The rest of the paper is organized as follows. Related papers are considered in section 2. An overview of the text material is given in section 3. The conceptual analysis of weather forecasts is performed in section 4. The problems arising from the automatic analysis of the texts are considered in section 5. The generation of weather maps is described in section 6. Finally, section 7 discusses the results of an experiment designed to assess the quality of the proposed technique and outlines intentions for future work.
2. Related work

The METEO system [9] automatically translates weather forecasts from English to French. The system was developed in Canada in the mid-seventies, and in 1977 it was completely integrated into the national weather information network of the Canadian Meteorological Centre. The main difference between METEO and the present work is that METEO is a typical specialized machine-translation system and hence produces natural language texts, in contrast to the weather maps produced by the system described in this paper.

An application system for the automated construction of weather maps is presented in ref. [6]. This system can be classified as a weather map editor. The weather forecaster constructs the map by manually picking the symbols one by one from a menu and placing them onto the map at positions he or she has chosen. The maps constructed in advance are used by the weatherman when reporting the weather forecast during a television broadcast. The main differences between the system in ref. [6] and the present project are shown in table 1.

Table 1
Comparison of the weather map editor with the present project

The weather map editor | The present project
The weather map construction is only partially automated | The weather map construction is fully automatic
The information in the weather map is intended to supplement the weather forecast being reported by the weatherman | The contents of the weather map are identical with the contents of the weather forecast

In a broad sense this paper concerns the relationship between the verbal and pictorial forms of information representation. Ref. [1], like some previous studies, deals with the opposite problem, namely the description of images by means of natural language text. In ref. [1] this approach is being tested in the SOCCER system, which reports in real time on short sequences of video recordings of soccer games. An experimental system for the construction of end-user interfaces that supports a close relationship between graphic objects and natural language expressions is GRAFLOG [5]. A brief comparison between GRAFLOG and the present project is made in table 2.

Table 2
Comparison of GRAFLOG with the present project

GRAFLOG | The present project
The texts being analyzed are simple expressions of the type "this is a student" | The texts are really existing specialized texts with complex phrases and a certain text structure
The images consist of stylized graphic objects which obtain their meaning in the course of interaction with the user through natural language | The graphic objects are stylized weather symbols with semantics defined in advance
The graphic objects and the natural language expressions coexist, supplementing each other | The graphic image is obtained through analysis of the weather forecast text and has the same contents
Both transformations, "image → text" and "text → image", are addressed | Only the transformation "text → image" is addressed

In ref. [8] a system is described that reads stories (fables) and creates scenarios for computer animation. The system parses each sentence of the story, extracts the assertions occurring in that sentence and matches them with the assertions from the neighboring sentences. As a result of the matching, the system discovers hidden actions and interpolates or extrapolates some of the actions. In this way a scenario is obtained that conforms to the laws of the physical world. Although the ideas behind the story animation system and the present work are very similar, they differ essentially in the techniques of text analysis. Besides, the story animation system produces scenarios only and does not create any images.
3. Text material

The considerations in this paper are based on the weather forecasts published in the Bulgarian daily Rabotnichesko delo. Here is a sample weather forecast (the numbers in parentheses are used for referencing the different phrases):

"(1) Today mostly sunny weather. (2) In North Bulgaria rather cloudy with (3) some rainfall in the afternoon. (4) At the Black sea coast and in Dobroudja the wind will be rising from the East. (5) Maximal temperature 15–20 degrees. (6) In Sofia about 15. (7) At the Black sea coast 20–22. (8) Outlook for Tuesday and Wednesday: (9) Sunny weather in the whole country. (10) On the first day in North Bulgaria still overcast. (12) No rains. (13) The temperature will be rising and (14) on the second day will be 20–25 degrees."

The English text of the weather forecast follows the Bulgarian text as closely as possible so that the features used can be demonstrated.

The forecast texts are fairly uniform and can serve as a model of specialized texts. The notion of specialized text was introduced in ref. [2] by specifying its features as follows: specialized texts are a human-oriented form of data representation; they do not satisfy any strictly defined (formal) rules but rather follow some traditions evolved in the subject domain; nevertheless, the strong specialization of the texts makes it possible to formalize a large part of them.

Each forecast contains information (specific data items) for a period of three days: a more detailed forecast for the first day and a rough forecast for the second and third days. This information consists mainly of assertions about particular weather items (cloudiness, rain- and snowfalls, wind, temperature, frost and so on) referring to certain regions of the territory of Bulgaria and to certain periods. Each assertion is expressed through a phrase in the text. The subphrases expressing the period and the region of the assertion are called time and region expressions, respectively. Three types of assertions can be distinguished:
(a) weather assertion: states what weather is expected for a particular region and period (e.g. phrases (1) and (2) specify the degree of cloudiness);
(b) process assertion: states how the weather will change during a certain period in a particular region (e.g. phrase (4) declares that the current intensity of the wind will change to a higher degree);
(c) administrative assertion: does not refer to a particular weather item but rather has an organizational function and ensures proper understanding of the other assertions (e.g. phrase (8) means that the following assertions refer to Tuesday and Wednesday).
4. Conceptual analysis

4.1. Data model

The data model of a specialized text class should be created through conceptual analysis of the texts. Such a conceptual analysis was performed for the weather forecasts. The resulting data model is decomposed into three submodels: the weather, time and territory submodels. One very important aspect of information is missing from the data model presented here: the degree of certainty of each assertion of the forecast. This information is being examined at present. The data model can also be considered as an internal language of the weather forecasts.

Table 3
The weather submodel

Weather item | Attribute | Values | Changes
cloudiness | degree | sunny, broken cloudiness, overcast, variable cloudiness, rainclouds | clearing, breaking, thickening
precipitation | type | rain, sleet, snow, hail | starting, stopping
 | intensity | slight, occasional, moderate, drizzle, heavy, shower |
wind | direction | east, south-east, south, south-west, west, north-west, north, north-east |
 | intensity | light, moderate, high | rising, falling
temperature | maximal, minimal, average | qualitative: hot, warm, cool, cold, frosty; quantitative (Celsius scale): integer numbers | rising, falling
mist | thickness | mist, fog | breaking, thickening
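Table 3, together with the incompatibility relations of the weather submodel described in section 4.2, can be encoded quite directly. The following Python fragment is a minimal sketch under our own assumptions, not the author's program: the names WEATHER_MODEL, is_valid and incompatible are hypothetical, the quantitative temperature values are omitted, and the split of qualitative temperatures into "higher" (hot, warm) and "lower" (cold, frosty) is our guess.

```python
# Hypothetical encoding of table 3: weather item -> attribute -> admissible values.
# Quantitative temperatures (integer degrees Celsius) are left out for brevity.
QUALITATIVE_TEMPERATURE = {"hot", "warm", "cool", "cold", "frosty"}

WEATHER_MODEL = {
    "cloudiness": {
        "degree": {"sunny", "broken cloudiness", "overcast",
                   "variable cloudiness", "rainclouds"},
    },
    "precipitation": {
        "type": {"rain", "sleet", "snow", "hail"},
        "intensity": {"slight", "occasional", "moderate", "drizzle",
                      "heavy", "shower"},
    },
    "wind": {
        "direction": {"east", "south-east", "south", "south-west",
                      "west", "north-west", "north", "north-east"},
        "intensity": {"light", "moderate", "high"},
    },
    "temperature": {
        "maximal": QUALITATIVE_TEMPERATURE,
        "minimal": QUALITATIVE_TEMPERATURE,
        "average": QUALITATIVE_TEMPERATURE,
    },
    "mist": {"thickness": {"mist", "fog"}},
}

# Assumption: which qualitative values count as "higher" and "lower" temperatures.
HIGH_TEMPERATURE = {"hot", "warm"}
LOW_TEMPERATURE = {"cold", "frosty"}


def is_valid(item, attribute, value):
    """True if `value` is an admissible value of item.attribute."""
    return value in WEATHER_MODEL.get(item, {}).get(attribute, set())


def incompatible(x, y):
    """Check the incompatibility relations for two (item, attribute, value)
    triples that refer to the same place and time."""
    for (i1, a1, v1), (i2, a2, v2) in ((x, y), (y, x)):  # symmetric relation
        # Sunny weather excludes any sort of precipitation.
        if (i1, a1, v1) == ("cloudiness", "degree", "sunny") and i2 == "precipitation":
            return True
        # Wind and mist are incompatible (as stated in the text).
        if i1 == "wind" and i2 == "mist":
            return True
        # Higher temperatures exclude snow/sleet, lower ones exclude rain/hail.
        if i1 == "temperature" and (i2, a2) == ("precipitation", "type"):
            if v1 in HIGH_TEMPERATURE and v2 in {"snow", "sleet"}:
                return True
            if v1 in LOW_TEMPERATURE and v2 in {"rain", "hail"}:
                return True
    return False
```

Keeping the relation symmetric inside incompatible means callers need not care about the order in which two assertions appear in the text.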
4.2. Weather submodel

The most important weather items, their attributes with their possible values, and their changes are presented in table 3. Along with the definition of attributes and values, the weather submodel includes the relations that exist between some of the weather items. For instance, sunny weather and any sort of precipitation cannot occur at the same place and time. This kind of relation, called incompatibility, also holds between the wind and mist weather items. A more elaborate version of this relation exists between the temperature and precipitation weather items: higher temperature values are incompatible with snow or sleet, while lower values are incompatible with rain or hail. These relations are used essentially for recognizing and handling inconsistent assertions.

As seen from table 3, attributes with qualitative values prevail over quantitative attributes. The qualitative nature of the weather submodel makes its structure resemble the system model descriptions in ref. [3], which are used as the basis for qualitative reasoning about physical systems.

4.3. Time submodel

Each assertion of the weather forecast refers to a certain period of time. It could be a whole day ("today", "Monday", "the second day"), a part of the day ("the morning", "the evening"), a couple of days ("Monday and Tuesday") or a couple of parts of the day ("about and after noon"). For the sake of conceptual simplicity, only a single day or part of the day can be a unit in the internal representation of the assertions. Hence each assertion that refers to a couple of days or parts of the day (such assertions are called compound) should be decomposed into two assertions for the same weather item and region but referring to the different days or parts of the day. The time model used in the forecasts from Rabotnichesko delo is represented in table 4. For example, the assertion corresponding to phrase (1) in the sample weather forecast refers to the cell marked by a cross (x), while phrase (3) refers to the cell marked by a circle (o).

Table 4
The time model

Day | Whole day | Morning | Noon | Afternoon | Evening
Today | x | | | o |
Tomorrow | | | | |
After tomorrow | | | | |

4.4. Territory submodel

The territory of Bulgaria is represented by a layered directed graph (a sample graph containing all the regions mentioned in this paper is presented in fig. 2). The nodes correspond to the different regions. Only regions that are important from the meteorological point of view, and hence often mentioned in the weather forecasts, are included in the territory graph. The nodes can be named, i.e. have proper names such as "The Rhodopes" or "Sofia", or unnamed, i.e. obtained from the named ones by applying the operations union, intersection and difference. The arcs represent the relation of inclusion: a node x is a successor of another node y if x is a part of y. For example, the Black sea coast is a part of East Bulgaria. If a node x is a successor of two or more regions, then it is a part of their intersection. For example, the Sofia field is a part of both West Bulgaria and the plains of Bulgaria. Hence the expression "the plains in West Bulgaria" refers in particular to the node corresponding to the Sofia field region. The nodes of the graph are grouped in several layers in such a way that the successor of any node is always in a lower layer than its predecessor.

This representation supports simple reasoning techniques such as the one just described and essentially facilitates the translation of region expressions. These might be either simple or compound. A simple region expression is either the proper name of a region (as "the Black sea coast") or an intersection of two regions specified by their names (as "the mountains of South Bulgaria"). In the former case the value of the region expression is the node corresponding to the named region. In the latter case it is the node of the highest level that is a successor of both regions mentioned in the expression. A compound expression is a sequence of simple expressions connected either by commas or by the conjunction "and" (consider, for example, phrase (4)). Their translation is discussed in the following section.

[Fig. 2: layered territory graph. Recoverable nodes: Bulgaria; North, South, East and West Bulgaria; North-East Bulgaria; South-West Bulgaria; the Dobroudja; the Black sea coast; the North Black sea coast; Varna; the plains of Bulgaria; the plains of West Bulgaria; the Sofia field; Sofia; the Thracian lowlands; the mountains of Bulgaria; the mountains of South Bulgaria; the mountains of South-West Bulgaria; Rila; Pirin; the Rhodopes.]
Fig. 2. Bulgarian regions used in the project.

4.5. Internal representation

The weather forecast is internally represented as a list of assertions. The internal representation of a weather assertion is a triple (w, t, r), while that of a process assertion is a quadruple (a, p, t, r). In this notation w stands for weather item, a for attribute, p for weather process, t for time period and r for territory region. Administrative assertions are not stored in the internal representation. The element w is a variant structure that depends on the kind of weather item. The structure contains several fields corresponding to the different attributes (e.g. the structure of the precipitation weather item consists of two fields corresponding to the type and intensity attributes). Each field contains a value extracted from the corresponding phrase of the forecast; some of the fields may be undefined. The elements a and p indicate the name of the attribute that is changing and the direction of change, respectively. The element t points to the corresponding cell of table 4. The element r points to the corresponding node of the territory graph. For example, the internal representations of assertions (1) and (4) in the sample weather forecast should be

(CLOUDINESS.DEGREE = SUNNY, TIME = [TODAY, WHOLE DAY], REGION = BULGARIA)

and

(ATTRIBUTE = WIND.INTENSITY, PROCESS = RISING FROM EAST, TIME = [TODAY, WHOLE DAY], REGION = BLACK SEA COAST and DOBROUDJA),

respectively. The order of the assertions in the list follows the order of their occurrence in the text.

5. Text analysis

The analysis of weather forecasts is performed according to the scheme described in ref. [2], using the technique from ref. [4]. In brief, this scheme consists of two major stages: data extraction and data translation. Data extraction means marking those substrings in the text that represent the data items being extracted. It is performed by analyzers (transition networks resembling to a certain extent the well-known augmented transition networks). Data translation means transforming the substrings marked during the data extraction stage into an internal representation of the data items. In the weather forecast case the phrases corresponding to the assertions are extracted first. Then their translation is performed. It consists of:
(1) extracting the data items of each assertion;
(2) translating each data item into its internal form according to the data model described in the previous section;
(3) decomposing the compound assertions, if any;
(4) completing the elliptical constructions, if any;
(5) resolving the inconsistent assertions, if any.
Finally, after the text has been analyzed, the weather map is generated (this process is described in section 6.2).

5.1. Decomposition of compound assertions

A compound assertion refers to multiple periods and/or multiple regions, i.e. it contains a compound expression (time, region or both). Consider the following assertion:

"On Tuesday and Wednesday sunny weather is expected in South Bulgaria and at the Black sea coast."

It contains compound time and region expressions. Therefore it is decomposed into the following four simple assertions:

Sunny weather in South Bulgaria on Tuesday.
Sunny weather at the Black sea coast on Tuesday.
Sunny weather in South Bulgaria on Wednesday.
Sunny weather at the Black sea coast on Wednesday.
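The translation of region expressions (section 4.4) and the decomposition of compound assertions can be sketched together. In the following hypothetical Python fragment the arcs of the territory graph are an illustrative simplification of fig. 2, chosen only to reproduce the Sofia field example; a node is treated as a successor of itself, so an intersection resolves to the highest-level node contained in both regions, and a compound assertion is split into the Cartesian product of its periods and regions. Day names stand in for the rows of table 4, and all function names are ours.

```python
from itertools import product

# Illustrative fragment of the territory graph (arcs = inclusion): each region
# maps to the regions that are directly part of it. The arcs are an assumption.
TERRITORY = {
    "Bulgaria": ["North Bulgaria", "South Bulgaria", "East Bulgaria",
                 "West Bulgaria", "The plains of Bulgaria"],
    "North Bulgaria": ["The Dobroudja"],
    "South Bulgaria": ["The Thracian lowlands"],
    "East Bulgaria": ["The Black sea coast", "The Dobroudja"],
    "West Bulgaria": ["The Sofia field"],
    "The plains of Bulgaria": ["The Sofia field", "The Thracian lowlands"],
    "The Dobroudja": [],
    "The Thracian lowlands": [],
    "The Black sea coast": [],
    "The Sofia field": ["Sofia"],
    "Sofia": [],
}


def layers(graph, root="Bulgaria"):
    """Layer number = longest path from the root (layer 0 is the top), so a
    successor always lies in a lower layer than its predecessor."""
    layer = {root: 0}
    changed = True
    while changed:  # terminates because the graph is acyclic
        changed = False
        for node, parts in graph.items():
            if node in layer:
                for part in parts:
                    if layer.get(part, -1) < layer[node] + 1:
                        layer[part] = layer[node] + 1
                        changed = True
    return layer


def parts_of(graph, node):
    """The node itself and every region that is (transitively) part of it."""
    seen, stack = {node}, [node]
    while stack:
        for part in graph[stack.pop()]:
            if part not in seen:
                seen.add(part)
                stack.append(part)
    return seen


def intersect(graph, layer, a, b):
    """Simple region expression combining regions a and b: the common
    successor in the highest layer (smallest layer number), or None."""
    common = parts_of(graph, a) & parts_of(graph, b)
    return min(common, key=lambda n: (layer[n], n)) if common else None


def decompose(weather, times, regions):
    """Split a compound assertion into simple (w, t, r) triples."""
    return [(weather, t, r) for t, r in product(times, regions)]
```

With these definitions, intersect(TERRITORY, layers(TERRITORY), "West Bulgaria", "The plains of Bulgaria") returns "The Sofia field", and decompose applied to the example of section 5.1 yields the four simple assertions in the order listed there.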
5.2. Ellipsis resolution

Certain expressions are elliptical (incomplete) in the sense that they lack explicit information about either the period or the region they refer to. For instance, phrase (1) of the sample weather forecast lacks the region, phrase (2) lacks the period, and phrase (5) lacks both. Nevertheless, the reader easily complements these phrases with the right information. To do this automatically, the rules given below are used. For each of the two components, time and region, we introduce two backgrounds, called the short-term and the long-term background. A background corresponds to a certain data item which, under given conditions, complements the elliptical assertion.

1. The short-term background is created when an assertion that is complete (with respect to the corresponding component) occurs, and it is valid until the next complete assertion occurs. For example, phrase (2) creates a short-term region background with the value "North Bulgaria". The short-term background complements an elliptical assertion when:
(a) the latter follows the assertion creating the background in the same sentence (cf. the complement of phrase (3)); or
(b) the incomplete assertion contains a demonstrative or relative pronoun for time or place ("then", "there", "when", "where").

2. The long-term backgrounds are initialized with the values "today" for time and "Bulgaria" for region. They change their values in the following cases:
(a) When the administrative assertion "Outlook for x and y" occurs (x and y are any two successive days of the week). Then the time background is assigned the value "x and y", while the region background is assigned the value "Bulgaria".
(b) When the following pattern of text structure occurs: […] <…ture referring to a specific region X>. The last assertion creates a long-term region background with the new value X. This rule takes into account the ordering of the assertions in the weather forecasts from Rabotnichesko delo.
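The background rules can be sketched as a single pass over the phrases in text order. This hypothetical Python fragment covers rules 1 and 2(a) together with the fallback to the long-term background; rule 2(b) is omitted because its pattern is only partly legible. The Phrase class (with a sentence index and a pronoun flag) is our own representation, and leaving the short-term backgrounds untouched when an outlook occurs is an assumption the text does not settle.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Phrase:
    sentence: int                   # index of the sentence containing the phrase
    time: Optional[str] = None      # None: the phrase is elliptical in time
    region: Optional[str] = None    # None: the phrase is elliptical in region
    pronoun: bool = False           # contains "then", "there", "when" or "where"
    outlook: Optional[str] = None   # administrative "Outlook for x and y" -> "x and y"


def resolve_ellipsis(phrases: List[Phrase]) -> List[Tuple[str, str]]:
    """Return a (time, region) pair for every non-administrative phrase."""
    long_term = {"time": "today", "region": "Bulgaria"}   # rule 2: initial values
    short_term = {}  # component -> (value, sentence of the assertion that created it)
    resolved = []
    for p in phrases:
        if p.outlook is not None:                          # rule 2(a)
            long_term = {"time": p.outlook, "region": "Bulgaria"}
            continue  # assumption: short-term backgrounds are left unchanged
        filled = {}
        for component in ("time", "region"):
            value = getattr(p, component)
            if value is not None:
                # A complete assertion creates a new short-term background (rule 1).
                short_term[component] = (value, p.sentence)
                filled[component] = value
            elif component in short_term and (
                    short_term[component][1] == p.sentence or p.pronoun):
                filled[component] = short_term[component][0]  # rule 1(a) or 1(b)
            else:
                filled[component] = long_term[component]      # long-term fallback
        resolved.append((filled["time"], filled["region"]))
    return resolved
```

Fed with phrases (1) to (5) of the sample forecast, this sketch completes (1) with "Bulgaria", (2) with "today", (3) with "North Bulgaria" (same sentence as (2)), and (5) with both "today" and "Bulgaria".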
The long-term backgrounds are always used where the short-term backgrounds are not applicable.

5.3. Handling of inconsistencies

Consider each weather forecast as a set of functions defined on the XY-plane shown in fig. 3 (for the sake of simplicity, consider the territory regions as linear intervals on the Y-axis). Each function defines the value of a concrete weather attribute characterizing the corresponding region of the XY-plane. These functions are evidently determined by the assertions of the forecast. From this point of view we can formally denote the weather assertions by expressions of the type $f(a, A)$, where $a$ is the attribute, $A$ is the region of the XY-plane and $f$ is the value of the weather attribute. Two assertions $f'(a, A)$ and $f''(b, B)$ are recognized to be inconsistent if $A \cap B \neq \emptyset$ and either $a = b$ and $f' \neq f''$, or $f'$ and $f''$ are incompatible values of different attributes (cf. the weather submodel in section 4.2).

For instance, consider phrases (1) and (2). Both concern the attribute "degree of cloudiness". Assertion 1 determines "sunny" weather for the rectangle OAEC, while assertion 2 determines "rather cloudy" weather for the rectangle OADB. As a result, the value of the attribute "degree of cloudiness" for the intersection OADB remains ambiguous. Such inconsistencies are resolved according to the following rules:

1. Given two inconsistent assertions $f'(a, A)$ and $f''(b, B)$ such that $A \subset B$, replace them by the consistent assertions $f'(a, A)$ and $f''(b, B \setminus A)$.
2. Given two inconsistent assertions $f'(a, A)$ and $f''(b, B)$ such that $A \cap B \neq \emptyset$, $A \setminus B \neq \emptyset$, $B \setminus A \neq \emptyset$, and the phrase of $f'(a, A)$ follows the phrase of $f''(b, B)$ in the text, replace them by the consistent assertions $f'(a, A)$ and $f''(b, B \setminus A)$.

These rules implement the following heuristics: the more specific the information, the higher its priority (rule 1); and the more recently the information is given, the higher its priority (rule 2).

Fig. 3. The XY-plane used to define the functions (axis labels in the original figure: "f (weather attribute)" and "Bulgaria").
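A discrete version of rules 1 and 2 is easy to sketch if the XY-plane is divided into cells (for example, parts of the day times elementary regions), so that each region A becomes a set of cells and the subset, intersection and difference operations become set operations. This discretization, the Assertion class and the optional incompatible predicate are assumptions of this hypothetical Python sketch; the case of two equal regions, which the rules do not cover, is resolved here in favour of the more recent assertion.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Assertion:
    attribute: str   # e.g. "cloudiness.degree"
    value: str       # f, the attribute value
    cells: set       # the region A of the XY-plane as a set of (time, place) cells


def inconsistent(x, y, incompatible=lambda x, y: False):
    """Overlapping regions and either different values of the same attribute
    or incompatible values of different attributes."""
    if not (x.cells & y.cells):
        return False
    if x.attribute == y.attribute:
        return x.value != y.value
    return incompatible(x, y)


def resolve(assertions, incompatible=lambda x, y: False):
    """`assertions` is in text order, so the second of each pair is more recent.
    The regions (cells) are shrunk in place."""
    for earlier, later in combinations(assertions, 2):
        if not inconsistent(earlier, later, incompatible):
            continue
        if earlier.cells < later.cells:
            # Rule 1: the earlier assertion is more specific and keeps its region.
            later.cells -= earlier.cells
        else:
            # Rule 1 (the later one is more specific) or rule 2 (it is more recent).
            earlier.cells -= later.cells
    return assertions
```

For phrases (1) and (2), with Bulgaria split into a northern and a southern cell per part of the day, the "sunny" assertion keeps only the southern cells and the "rather cloudy" assertion keeps the northern ones.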
6. Weather map generation

6.1. General considerations

The weather map represents the meteorological conditions in a static visual form. It may contain information both about weather items (such as cloudiness, rainfall, wind, temperature) and their change, and about measurable meteorological quantities (such as air pressure) or phenomena (such as cyclones and anticyclones) that cannot be directly sensed by human beings but are used by the weather forecaster. These two types of information are usually represented on the map by three different types of graphic images: lines (for representing baric fields); special stylized symbols and pictograms (for representing certain weather items such as sunny or cloudy weather, rainfall, wind, mist); and characters (mainly digits representing numerical values of temperature, pressure or some other quantity). The forecasts from Rabotnichesko delo concern only weather items, and therefore the maps being constructed contain only special symbols and characters.

A sample map corresponding to the part of the weather forecast from section 3 that refers to the afternoon of the first day is presented in fig. 4. It contains the symbols for "sunny weather", "rather cloudy", "rainfall" and "wind from the East", as well as the numerical values of the maximal temperatures. The positions of the symbols depend on the regions they refer to. Some larger regions (such as North Bulgaria or the Black sea coast) require more than one symbol for the same weather event. The region representation on the map is quite schematic: usually only state borders are drawn, and in some cases the names of regions or cities are written as well. In fig. 4 only the border of Bulgaria is drawn.

[Fig. 4: outline map of Bulgaria with weather symbols; the maximal temperatures 15 and 22 are legible.]
Fig. 4. Sample weather map.

Time is the most difficult dimension to represent on the map. This can be done in several ways: by changing or moving some symbols on the map in correspondence with the weather change (a simplified form of animation); by preparing different maps for the most important periods of the forecast; or by a symbolic representation of weather processes, with or without explicit time notation (cf. the representation of process assertions described below). The first and third ways of representing time emphasize the development of the weather change process.
6.2. Map generation

After the text analysis has been accomplished, a weather map is generated. The map generation is performed by scanning the list of assertions concerning the corresponding period and transforming each assertion into one or more identical weather symbols on the map. The choice of the symbol is made according to the attribute values of the weather item concerned in the current assertion. Each process assertion is represented symbolically by one or more sequences of three special symbols each. The first symbol denotes the initial state of the corresponding weather item, the second one is always a double arrow meaning that a change is being represented, and the third one denotes the final state of the weather process (e.g. fig. 5 represents the process of clearing). The different symbols are constructed in advance, in raster form and all of the same size, with a specialized symbol editor.

Fig. 5. Example of a weather change on a map.

The positions of the symbols on the map are determined using the following technique. The map is divided into a number of rectangular spots, each one having the size of one weather symbol. A set of spots is assigned to each weather item. The sets of the different weather items must not intersect; thus each spot is reserved for no more than one weather item. A set of spots is assigned to each region of the territory model as well. Then the spots obtained by intersecting the sets of the weather item and of the region of the current assertion are selected for placing the weather symbol. In this way, provided the spot disposition is carefully designed, the arrangement of the weather symbols becomes quite simple and produces maps of acceptable quality.

One weather forecast is represented by several maps corresponding to different periods. The exact number of maps depends on the differences between the assertions referring to these periods. It varies from 1 (for the whole period of three days) to 12 (one for each of the 4 parts of each of the three days). Usually a separate map is generated for the whole first day (or for each of its 4 parts) and a common map for the second and third days. The symbolic representation of process assertions makes it possible to reduce the number of weather maps for the whole period to a minimum. Note that two weather assertions $(w', t', r')$ and $(w'', t'', r'')$ such that $w'.a \neq w''.a$, $r' \cap r'' \neq \emptyset$ and $t''$ follows $t'$ in fact imply a new process assertion $(a, p, t, r)$, where $p$ is a function of $w'.a$ and $w''.a$, $t$ is the minimal time period that includes both $t'$ and $t''$, and $r = r' \cap r''$ can be found through the algorithm described in section 4. In this way the differences in the weather conditions during the period under consideration can be represented through process assertions relating to a longer period, thus avoiding the need to divide the weather forecast into many maps and yielding a more compact cartographic representation.
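The spot technique reduces symbol placement to set intersection. In the following hypothetical Python sketch the 6 by 4 grid, the checkerboard-like assignment of spots to weather items and the rectangular spot sets of the regions are invented for illustration; the paper states only that the item sets must be disjoint and that the spots of the item and of the region are intersected.

```python
# Hypothetical 6 x 4 grid of spots, each the size of one weather symbol;
# column 0 is the west edge and row 0 the north edge of the map.
COLS, ROWS = 6, 4
ALL_SPOTS = {(c, r) for c in range(COLS) for r in range(ROWS)}

# Each weather item gets its own set of spots; the sets must not intersect.
ITEM_SPOTS = {
    "cloudiness":    {(c, r) for c, r in ALL_SPOTS if c % 2 == 0 and r % 2 == 0},
    "precipitation": {(c, r) for c, r in ALL_SPOTS if c % 2 == 1 and r % 2 == 0},
    "wind":          {(c, r) for c, r in ALL_SPOTS if c % 2 == 0 and r % 2 == 1},
    "temperature":   {(c, r) for c, r in ALL_SPOTS if c % 2 == 1 and r % 2 == 1},
}

# Each region of the territory model gets a (possibly overlapping) set of spots.
REGION_SPOTS = {
    "Bulgaria": ALL_SPOTS,
    "North Bulgaria": {(c, r) for c, r in ALL_SPOTS if r < 2},
    "South Bulgaria": {(c, r) for c, r in ALL_SPOTS if r >= 2},
    "West Bulgaria": {(c, r) for c, r in ALL_SPOTS if c < 3},
    "East Bulgaria": {(c, r) for c, r in ALL_SPOTS if c >= 3},
    "The Black sea coast": {(c, r) for c, r in ALL_SPOTS if c >= 4},
}


def check_disjoint(item_spots):
    """Verify that no spot is reserved for more than one weather item."""
    seen = set()
    for spots in item_spots.values():
        if seen & spots:
            return False
        seen |= spots
    return True


def place(item, region):
    """Spots where the symbol for `item` in `region` is drawn, sorted."""
    return sorted(ITEM_SPOTS[item] & REGION_SPOTS[region])


def layout(assertions):
    """Map each selected spot to a symbol name, given (item, symbol, region) triples."""
    return {spot: symbol
            for item, symbol, region in assertions
            for spot in place(item, region)}
```

Here place("cloudiness", "North Bulgaria") yields three spots, which is why larger regions receive more than one symbol for the same weather event. Because the item sets are disjoint, symbols of different weather items never collide; overlapping assertions of the same item with different values would point to an inconsistency left unresolved by section 5.3.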
7. Experimental results, discussion and future work

7.1. Experimental results

An initial experiment was carried out to assess the validity of the considerations in this paper. For this purpose a program system that automatically transforms weather forecast texts into weather maps using the technique described was implemented. Five groups of 20 original weather forecasts each were analyzed by the system, and the corresponding maps were generated.
The assertions obtained in the internal representation are called extracted assertions. In addition, the same corpus of weather forecasts was analyzed by the author using the same data model; the assertions obtained in this way are called relevant assertions. The sets of extracted (E) and relevant (R) assertions of each weather forecast were compared, and a new set of extracted relevant assertions, $C = E \cap R$, was formed. By analogy with the information retrieval field [7], the validity of the proposed method was evaluated through two quantities, recall (r) and precision (p), defined by the ratios

$$r = \frac{|C|}{|R|}, \qquad p = \frac{|C|}{|E|}.$$

The results obtained in terms of recall and precision are shown in table 5. It is important to note the reasons for the extraction of irrelevant (incorrect) assertions as well as for the inability of the system to extract relevant assertions. The main reasons can be classified as follows:
(a) deviation of certain assertions from the accepted data model, i.e. the corresponding assertions concern weather items, attributes, values, changes, territory regions or time periods missing from the data model;
(b) deviation of certain phrases from their formal description, i.e. the corresponding phrases contain words missing from the dictionary or sequences of lexemes that do not match the corresponding analyzer;
(c) spelling errors in the weather forecast texts, which in turn lead to deviations from the formal descriptions of the corresponding phrases.
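The two ratios translate directly into code. This small Python function is a sketch: it assumes assertions are hashable values (for example (w, t, r) tuples), and returning 1.0 for an empty denominator is our own convention, not the paper's.

```python
def recall_precision(extracted, relevant):
    """Recall r = |C|/|R| and precision p = |C|/|E|, where C = E & R."""
    extracted, relevant = set(extracted), set(relevant)
    correct = extracted & relevant
    # Convention assumed here: an empty denominator yields 1.0.
    recall = len(correct) / len(relevant) if relevant else 1.0
    precision = len(correct) / len(extracted) if extracted else 1.0
    return recall, precision
```

For instance, with four extracted assertions of which three are among five relevant ones, r = 3/5 = 0.6 and p = 3/4 = 0.75.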
7.2. Future work The internal representation of the weather forecasts defined in section 4.5 is rather universal. It was designed to capture as much information from the original texts as possible, yet being sufficiently formalized to allow easy automatic processing. In this paper it was shown how the internal representation could be used for weather map generation. However, a number of other types of processing
are feasible. For instance, one could formulate the inverse problem: given the internal representation of a weather forecast, retell it in a form satisfying certain requirements. These could be:
- a specific ordering of the assertions in the new text (by time, region or weather item);
- an abstract of the weather forecast in a single sentence;
- a retelling of only the part of the weather forecast that refers to a particular region or period.
Another problem is the automatic comparison of two forecasts produced by different meteorologists. Some of these problems require additional techniques for reasoning within the data model.
The conceptual analysis of the weather forecasts performed in this paper also makes possible further experiments evaluating how well people perceive the contents of a weather forecast when it is presented in different forms (such as text, speech, map or combinations of these). Again, viewing the weather forecast as a set of assertions would allow the degree of perception to be evaluated in terms of recall and precision, just as in the experiment described earlier in this section.
The considerations in this paper are based on texts from one source only. Weather forecasts from other sources, including other Bulgarian and foreign newspapers as well as weekly and monthly forecasts, will be explored, and the method proposed here will be tried on them. All these new classes of weather forecasts require modifications of varying extent to the formal description of the text elements: the dictionary, the phrase syntax and the text structure. Some of the classes also require partial modification of the data model (e.g. forecasts from foreign newspapers need an entirely different territory submodel, and weekly and monthly forecasts need an extended time submodel).
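The reordering, restriction and comparison tasks mentioned in section 7.2 become simple list and set operations once a forecast is held as a collection of assertions. The Python sketch below assumes a hypothetical `Assertion` record with four fields and an integer period index; these names, and the helper functions `reorder`, `restrict` and `compare`, are illustrative and do not reproduce the data model of section 4.5. Single-sentence abstracts and reasoning over the data model are not attempted.

```python
from dataclasses import dataclass


# Hypothetical assertion record; field names and the integer period
# index are illustrative assumptions, not the paper's data model.
@dataclass(frozen=True)
class Assertion:
    item: str    # weather item, e.g. "rain"
    value: str   # attribute value, e.g. "light"
    region: str  # territory region, e.g. "east"
    period: int  # time period index, e.g. 0 = night, 1 = day


# Sort keys: the primary field first, remaining fields break ties.
_KEYS = {
    "period": lambda a: (a.period, a.region, a.item),
    "region": lambda a: (a.region, a.period, a.item),
    "item": lambda a: (a.item, a.period, a.region),
}


def reorder(assertions, by="period"):
    """Order assertions by time, region or weather item."""
    return sorted(assertions, key=_KEYS[by])


def restrict(assertions, region=None, period=None):
    """Keep only assertions about one region and/or one period."""
    return [a for a in assertions
            if (region is None or a.region == region)
            and (period is None or a.period == period)]


def compare(forecast_a, forecast_b):
    """Return (shared, only in a, only in b) assertion sets."""
    a, b = set(forecast_a), set(forecast_b)
    return a & b, a - b, b - a


# Two illustrative forecasts (invented data).
f1 = [Assertion("rain", "light", "east", 1),
      Assertion("wind", "moderate", "north", 0),
      Assertion("temperature", "10-15", "east", 0)]
f2 = [Assertion("rain", "light", "east", 1),
      Assertion("wind", "strong", "north", 0)]

print([a.item for a in reorder(f1, "period")])  # ['temperature', 'wind', 'rain']
print([a.item for a in restrict(f1, region="east")])  # ['rain', 'temperature']
```

Because the record is frozen (and therefore hashable), comparing two forecasts reduces to set intersection and difference; the same view underlies the recall and precision measures of section 7.1.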
Table 5
Results in terms of recall (r) and precision (p)
Group   r      p
1       0.93   0.97
2       0.82   0.94
3       0.90   0.92
4       0.85   0.90
5       0.90   0.85
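As a rough summary of table 5, the unweighted mean of each column can be computed. Equal weights seem reasonable because every group contains 20 forecasts, but the paper does not say how the group-level values were obtained, so this average is only indicative. The names `TABLE_5` and `macro_average` are illustrative.

```python
# Table 5 values: group -> (recall r, precision p).
TABLE_5 = {
    1: (0.93, 0.97),
    2: (0.82, 0.94),
    3: (0.90, 0.92),
    4: (0.85, 0.90),
    5: (0.90, 0.85),
}


def macro_average(table):
    """Unweighted mean of r and p over the groups."""
    n = len(table)
    mean_r = sum(r for r, _ in table.values()) / n
    mean_p = sum(p for _, p in table.values()) / n
    return mean_r, mean_p


mean_r, mean_p = macro_average(TABLE_5)
print(f"mean recall = {mean_r:.3f}, mean precision = {mean_p:.3f}")
# prints: mean recall = 0.880, mean precision = 0.916
```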
8. Conclusion
The transformation of weather forecasts from textual to cartographic form was considered for the forecasts published in the Bulgarian daily newspaper Rabotnichesko delo. The automatic transformation was implemented as a composition of three subprocesses, namely data extraction, data translation and map generation. Special emphasis was given to the features of weather forecasts considered as a particular specialized text class. These features were used extensively in the data model definition and the formal phrase description, and for overcoming phenomena such as ellipsis and inconsistency. The system that automatically transforms weather forecast texts into weather maps was implemented on an IBM PC/AT microcomputer with an EGA monitor, using the system for specialized text analysis described in ref. [4]. Map generation was performed by a cartographic system under development at the Software Department of the Institute of Mathematics. At present the system is being adopted by the Centre of Hydrology and Meteorology at the Bulgarian Academy of Sciences.
Acknowledgements
I greatly appreciate the useful discussions with Peter Barney on the general problems of transforming information from one form to another. I am very much obliged to Ivan Bosov, whose assistance in implementing the map generation procedure was essential. I would also like to thank all my colleagues for reading the draft and giving me valuable comments. This work was partially supported by the Ministry of Culture, Science and Education under contract no. 607.
References
[1] E. Andre, G. Herzog and Th. Rist, in: Proc. European Conf. on Artificial Intelligence, ECAI 88, Munich, 1–5 August 1988, p. 449.
[2] P. Barney and S. Kerpedjiev, SERDICA Bulg. Math. Publ. 13 (1987) 137.
[3] B. Bredeweg and B.J. Wielinga, in: Proc. European Conf. on Artificial Intelligence, ECAI 88, Munich, 1–5 August 1988, p. 195.
[4] S. Kerpedjiev, SERDICA Bulg. Math. Publ. 13 (1987) 239.
[5] L.A. Peneda, E. Klein and J. Lee, Comput. Graph. Forum 7 (1988) 97.
[6] I. Pissanov, A. Deiysky and D. Ivanov, in: Proc. PERSCOMP 2nd Nat. Conf. Personal Computers, Sofia, 21–24 April 1987, vol. 1 (Bulgarian Academy of Sciences, 1987) p. 500 (in Bulgarian).
[7] G. Salton, in: Proc. ACM Conf. Research and Development in Information Retrieval, Pisa, 8–10 September 1986 (ACM, New York, 1986) p. 1.
[8] H. Shimazu, Y. Takashima and M. Tomono, in: Proc. COLING 88, Budapest, 22–26 August 1988 (John von Neumann Soc. for Computing Sciences, Budapest, 1988) p. 620.
[9] B. Thouin, in: Practical Experience of Machine Translation, ed. V. Lawson (North-Holland, Amsterdam, 1982) p. 39.