Fuzzy Sets and Systems 160 (2009) 2241 – 2252 www.elsevier.com/locate/fss
Managing uncertainty in location-based queries
G. Bordogna (a,∗), M. Pagani (a), G. Pasi (b), G. Psaila (c)
(a) CNR-IDPA c/o POINT, via Pasubio, 5-I-24044 Dalmine (BG), Italy
(b) Università di Milano Bicocca, DISCO, via Bicocca degli Arcimboldi 8, 20126 Milano, Italy
(c) Università di Bergamo, Facoltà di Ingegneria, Viale Marconi 5, Dalmine (BG), Italy
Available online 26 February 2009
Abstract Location-based queries (LBQ) are becoming more and more useful in location-based services (LBSs) such as those provided through mobile phones, personal digital assistants (PDAs), and laptops. They are context aware since they support the access to information by taking into account the spatial context of the user when submitting the query, and the spatial location of the searched information (instances). Generally, the key-selection condition is a constraint on the distance of the instances from the user location. One deficiency of current approaches in evaluating LBQs is the fact that they do not manage the uncertainty that often characterizes the knowledge of either the user position or the searched instances or both of them, thus they do not produce query answers with estimates of their possible validity. In the paper, after analyzing the processes involved in a LBS that may generate uncertainty, a model for representing and evaluating LBQs affected by uncertainty is proposed, in which uncertainty and imprecision can affect both location information and the spatial condition, i.e., the query scope. Distinct situations of uncertainty in LBQs are analyzed and for each of them a two-step evaluation procedure is proposed based on a fixed-cost filter phase and on a refinement phase that produces ranked results reflecting an estimate of their validity. © 2009 Elsevier B.V. All rights reserved. Keywords: Location-based service; Uncertain location-based query; Possibility distribution; Soft spatial constraint
1. Introduction
The rapidly growing market of mobile devices, such as mobile phones, personal digital assistants (PDAs) and laptops, has greatly influenced the way people communicate, allowing them to access the Internet from everywhere. At the same time, the diffusion of positioning technologies such as the global positioning system (GPS), radio frequency identification systems, and global systems for mobile communications has given further impetus to the design and production of location-based services (LBSs) [1–4]. This has stimulated research interest in more efficient and effective ways to evaluate location-based queries (LBQs) that constrain the spatio-temporal context of the searched information (either resources in remote repositories or web pages on the Internet, hereafter named instances) to the user’s location. In fact, LBQs are context-aware queries, since their basic selection condition is a spatial constraint on the distance of the instances from the user location. One deficiency of current approaches is that they do not account for the fact that the mobility of users and, in some applications, also the varying position of the resources cannot be detected precisely [5,6].
∗ Corresponding author.
E-mail address: [email protected] (G. Bordogna).
0165-0114/$ - see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.fss.2009.02.016
G. Bordogna et al. / Fuzzy Sets and Systems 160 (2009) 2241 – 2252
Location uncertainty can derive from several causes, such as measurement errors or limited resolution of the positioning systems and characteristics of the communication network underlying the LBS. In some cases, uncertainty can be introduced on purpose to mask the exact location for preserving the privacy of the user [7–9]. In a moving environment, typically the instances positions are tracked at time-stamps so that their precise locations are known only at the time of the update [5]. Taking into account the uncertainty of the locations at the time the query results are received by the user is a requirement of several applications. For example, a robot moving around a room must identify “the obstacles on its route”, such as other moving objects crossing its path, in order to avoid possible collisions. To provide a safe answer, the system should know the precise locations of the objects when the query results are received, which are generally ill-known at the time the query is issued. Since in many applications this is not feasible, an improvement with respect to current approaches is to rank the answers by giving an estimate of their validity. In other situations, uncertainty may affect only the user or only the instances. An example is the case of geographic information retrieval, in which an outdoor user submits a LBQ to a search engine from a mobile device (such as “retrieve web-pages reporting advices on good restaurants at close distance from my location”), and wants to get an answer in order to take a decision on subsequent actions to do right away (what restaurant to choose). Such decisions involve the user position and surroundings, i.e., the geographic information is used as a key context in constraining information retrieval [10]. Generic web pages can hardly be geo-referenced with precision by using heuristic content-based analysis, thus it is necessary to deal with imprecision of their geo-reference when computing their ranking scores. 
Therefore, the need to model uncertain locations in evaluating LBQs applies in all the cases in which the resolution of location information about the user and/or the instances can alter the validity of query results. Another situation in which it is necessary to manage location uncertainty is when the query has an imprecise scope, i.e., it is formulated with an imprecise spatial condition (a soft spatial constraint) such as “find the taxi cabs that are within a walking distance of 10 min”. In the literature, location uncertainty in data has been modelled by means of probability distributions on the spatial domain [6], and the research in this respect mainly focuses on efficiency issues [5,6,11,12]. In our proposal, we model the representation and evaluation mechanism of uncertain LBQs by taking ideas from flexible querying in fuzzy databases [13–18]. Specifically, we are assuming an underlying fuzzy geographic information system (GIS) extended to manage fuzzy regions, i.e., regions with broad boundaries [19–21]. In this paper, by uncertain LBQs we intend soft range queries against possibly ill-defined location information admitting degrees of satisfaction. A range query in LBS specifies a spatial selection condition that consists in a circle centered at the user location: only the instances located within the circle satisfy the query. With the terms soft range queries we mean LBQs specifying an imprecise scope expressed by a linguistic term such as close, i.e., a soft spatial constraint admitting degrees of satisfaction [22]. While location uncertainty is represented by means of a possibility distribution defined on the bi-dimensional spatial domain [13], the soft spatial constraint is defined in an absolute way as a fuzzy relation on the bi-dimensional spatial domain: e.g., close is defined with a membership function that decreases with the distance from the coordinates’ origin. 
The relative soft spatial constraint with respect to the uncertain user location is derived during query processing by computing the generalized Minkowski sum [22]. The degree of satisfaction of the query is then computed as the fuzzy inclusion degree of the instances’ locations in the fuzzy set representing the relative soft spatial constraint, and is interpreted as an estimate of the validity of the retrieved instances with respect to the query [23,24].
In Section 2, the processes involved in an LBS that may generate uncertainty are analyzed. In Section 3, related approaches to manage uncertainty in LBQs are described, and the motivation of our proposal is discussed. In Section 4, LBQs are classified into distinct types, depending on the presence of uncertainty; in its subsections the procedures proposed to evaluate each type of uncertain LBQ are described. The conclusions summarize the main achievements.
2. LBSs and related uncertainties
Two definitions of LBSs have been provided in the literature: the first defines them as “information services, accessible with mobile devices through the mobile network, with the ability to make use of the location of the mobile device” [25]. The second, due to the international Open Geospatial Consortium [26], defines LBSs as “wireless-IP services, which use geographic information to serve mobile users”. More generally, we can say that LBSs are context-aware services that provide information: in fact, the location is the most commonly considered element of context. It allows information, services and resources (more generally,
database instances) to be localized. A user’s location can be absolute, when it is described by a pair of geo-referenced coordinates, or relative, for example a room inside a building. Another piece of contextual information used by LBSs is the user’s orientation. By exploiting orientation information, the service can determine what is in front of, behind and to either side of the user (e.g., which paintings and sculptures when visiting a museum). Such location information can be affected by uncertainty due to the peculiarity of the information itself. For instance, a visitor of a national park could be interested in places where mushrooms grew in recent years, or where he/she can find the next campfire location [27]. To provide this information, context-sensing methods can be used to track the position and collect the history of actions of previous visitors. However, these methods raise user security and privacy concerns. To address privacy issues, the user should always be informed about the information that is collected about his/her actions, and he/she should be given the option to turn context-based service features on or off [2,7]. A more sophisticated way to solve this problem could be to blur, i.e., obfuscate, his/her position information [8]. Another situation in which the location is known with uncertainty is the case of location-based searches on the Internet (e.g., searches for restaurants or theaters in a specific location). For example, to search for the best restaurants suggested by a local newspaper and close to Milano center, one could submit a request in which close may have several distinct meanings depending on the user context: e.g., “within a walking distance of 10 min”, “within a drive distance of 10 min”, etc. To answer this query, one should be able to identify web pages reporting suggestions of restaurants close to Milano center which are authored by local (not foreign) newspapers.
In this context close defines the scope of the search. IP location services can retrieve the location of IP providers but not the actual geographic location of the servers where the web pages reside. Besides this, it is even harder to exactly localize the geographic areas which are semantically associated with the content of a web page, by applying heuristic methods such as those based on the use of gazetteers [10].
The spatial scope is another dimension of LBSs that may be affected by imprecision. The spatial scope is the scale of the spatial conditions that constrain the positions of the instances with respect to that of the user. Three types of spatial scopes have been identified [28].
• A microscale when one asks for the exact positions of the resources; the microscale is defined with respect to the exact position of the user who is central in this definition.
• A mesoscale when one looks for something reachable. The mesoscale is defined with respect to a well-defined environment (possibly circular) of which the user is the (possibly imprecise) center point.
• A macroscale when the user needs an overview, e.g., “all restaurants in North of Italy”. The macroscale is a bigger environment around the user restricted by the (possibly imprecise) borders of a region.
Further, LBSs may have distinct objectives: basically one can ask to know her/his location with respect to something else. Users may search for anything (resources, objects, events, information, etc.); they may want to be guided to a place (navigating); they may ask for identifying events at a certain location (checking). The objective of a search may require a more or less precise answer. Also the requested information content may have distinct characteristics with respect to the spatio-temporal dimension: it may be static over a period of time, such as yellow pages.
It may change while the user is moving around (topical information) (e.g., traffic information, weather forecasts, arrival/departure trains, moving robots); in such a case the information checked previously may no longer be valid since its position in time has varied [3]. In this respect, location information has great importance when representing the uncertainty of topical information, e.g., to estimate the states of traffic on a road, or the possibility of weather changes in an area.
2.1. LBQ processing
Generally, an LBS is activated by an LBQ; in order to run the service several infrastructural elements are needed that may introduce uncertainty in the process.
• A mobile device to request the needed information, such as a PDA, a smart phone, or a laptop. When such devices are missing the user could manually communicate her/his location directly in the query.
• A communication network that transfers the user data and query from the mobile device to the service provider and then the requested information back to the user.
• A positioning system that uses the GPS or the mobile communication network, e.g., WLAN stations. If the positioning service uses a network to obtain the user location, it can generate uncertainty depending on both the network range and topology, which vary depending on the type of the network [2].
• A service provider that is responsible for processing the LBQ. Several services can be considered, e.g., specialized services for searching “Italian Renaissance paintings” in a museum, or more general services such as search engines on the Internet.
• A data and content provider that provides the contents.
Let us consider the example of a microscale request consisting in searching for a typical restaurant at a walking distance of 10 min.
1. The user expresses her/his need by selecting the appropriate function (a query) on her/his mobile device.
2. The location of the user’s mobile device is obtained from the positioning system.
3. Afterwards, the mobile client sends the information request (which contains the content to search and the position) via the communication network to a gateway.
4. The gateway exchanges messages between the mobile communication network and the Internet. It also stores information about the mobile device that has asked for the information.
5. The application server reads the request and activates the appropriate service, in this case a geographic-search engine on the Internet.
6. The service analyzes the search criteria (“typical restaurant”) and the user position and finds the requested information. Having now all the information, the service generates a spatial buffer around the user’s location depending on the query scope to get some restaurants (e.g., “at a walking distance of 10 min”).
After calculating a list of close restaurants, the result is sent back to the user via Internet, gateway and mobile network. The restaurants are then presented to the user either as a text list or drawn on a map.
Notice that the selection of the items in the buffer and their ranks are determined solely based on their distance from the user location (both assumed as precisely known) and do not reflect the uncertainty involved in the whole process. Moreover, the calculation of the user position is based on the known positions of the base stations, the cells or the satellites (in the case of GPS). When using cells, the user’s position is known only within a circle around a base station, and thus the radius depends on the cell size. When using an antenna that detects the direction of the signal, the angle of arrival of the signal from the mobile device can be detected only approximately, within an angle of 90°, 120° or 180°. Thus cell-based positioning can be very inaccurate, delivering a coarse position (from 100 m to several kilometers). Currently, the positioning technology and its accuracy influence the applicability of LBSs. For applications requiring accurate knowledge of the user location, such as driving directions, the GPS is generally used, which reaches an accuracy of a few meters. WLANs are better suited for information services with fine granularity, e.g., navigation in a museum, while WWANs are more suited for large scale services [2,29]. The ability of an LBS to represent and manage the uncertainty in the localization of both the user position and the searched instances, and in the evaluation of LBQs, could greatly improve the applicability of LBSs even when coarse positioning techniques and diverse query scopes are used.
3. Related works
The research on LBQs has focused mainly on efficiency issues, such as the investigation of new ways of indexing and caching spatial data to support the processing of LBQs including point queries, window queries, nearest-neighbor (NN) search, and k nearest-neighbor search [30–33]. Another issue is the management of LBQs in a distributed way so as to achieve efficiency while allowing complex queries based on the use of location-dependent operators [34,35].
Another direction is the study of LBQs involving some uncertainty in location data. This is also the focus of our proposal. Uncertainty on the user location can derive from several causes as outlined in the previous section, or it can be introduced on purpose to mask the exact location for preserving the privacy of the user [7–9]. In this respect, the amount of uncertainty required to meet both privacy and the requirement on the service quality has been studied [5]. Generally, location uncertainty has been modelled by means of probability distributions on the spatial domain, basically bi-dimensional Gaussian functions or uniform distributions within a window or neighborhood [6]. The research on this topic faced the evaluation of probabilistic queries, such as probabilistic range queries [36]. Probabilistic queries
evaluate uncertain location information and provide plausible answers in the form of probabilities. Uncertain LBQs have also been studied in the case of moving objects [5,6,12]. To our knowledge, no one has yet considered that the locations of both the user and the objects in the spatial database, as well as the spatial selection condition itself (the query scope), can be imprecise, as in the query “find the closest taxi cab to my path”. In our proposal we consider these three situations alone and in any combination with one another, and model them by taking ideas from flexible querying in fuzzy databases [13–18]. Location uncertainty is represented by means of possibility distributions on the bi-dimensional spatial domain. Possibility distributions are easier to define than probability distributions, owing to the simpler normalization condition they require: it is much easier to define a bi-dimensional distribution with the sole constraint that one element be fully possible (with possibility equal to 1) than to define a bi-dimensional probability distribution whose integral equals 1. For example, when a cell network underlies the positioning service, one just knows that all points inside a radius depending on the cell size are fully possible positions, and this knowledge can be represented by a possibility distribution. In the probabilistic framework, this is not possible without making arbitrary assumptions (i.e., assumptions not supported by the available knowledge). The query scope is represented by a soft spatial constraint defined in an absolute way on the bi-dimensional spatial domain relative to the coordinates’ origin (0, 0). One can exploit the power of fuzzy sets to define any kind of soft spatial constraint such as “close to (0, 0)”, “far from (0, 0)”, etc. From the absolute constraint, e.g.
“close to (0, 0)”, the soft spatial constraint relative to the user’s location, e.g., “close to u”, is generated by dynamically computing the fuzzy Minkowski sum [22]. This way, we derive a relative soft spatial constraint that the instances’ locations, possibly uncertain, must satisfy to some extent in order to be retrieved. Similarly to what happens in fuzzy databases, a matching function between fuzzy sets is defined to compute the retrieval status value (RSV) of the instances, which in our context is a fuzzy inclusion function between spatial distributions [23,14]. The RSV can be interpreted as the estimate of the validity of the item with respect to the query. The implementation of the model can rely on a GIS extended to manage spatial objects with fuzzy boundaries, i.e., capable of representing regions with broad boundaries and of evaluating degrees of satisfaction of the topological relationships between fuzzy regions [21].
4. Uncertain LBQs
In this section we classify LBQs with respect to the presence of uncertainty affecting the user location, the instances’ locations, and the query scope, and define their formal representations and evaluation functions. For the sake of simplicity, in Table 1 we restrict our analysis to range queries and to their soft versions, and define a mechanism to compute degrees of satisfaction of the instances that takes the uncertainty into account. We consider the linguistic expression close as an example of a soft range condition. Notice that close could be replaced by any other linguistic term defining a soft constraint on the distance from a position, such as “very close”, “not too far”, “far”, “within a walking distance of 10 min”, etc. Such soft range conditions can be regarded as variants of the specification of an NN search condition.
In a crisp range query (type 1 in Table 1) such as “find instances located at a maximum distance r from my location”, both the location data and the range condition are precisely represented by their coordinates $(x, y)$ on the spatial domain and by the circle $C_{(x_u,y_u)}(r)$ of radius $r$ that is implicitly assumed to be centered at the user location $(x_u, y_u)$. In this case, the instances whose coordinates $(x_i, y_i)$ fall within the circle $C_{(x_u,y_u)}(r)$ are selected as fully satisfying the range condition. Generally, in spatial databases a two-step procedure is adopted to optimize the time needed for this evaluation [37]. In the first filter phase, the cost of which is constant independently of the geometry, the instances included in the bounding box (BB) of the circle $C_{(x_u,y_u)}(r)$ are identified as candidates to satisfy the range query:
\[ \text{Select candidate } i \mid (x_u - r \le x_i \le x_u + r) \wedge (y_u - r \le y_i \le y_u + r) \]
The selected candidates are successively checked to verify whether their coordinates fall within the geometry of $C_{(x_u,y_u)}(r)$ and are selected if they pass this final refinement phase:
\[ \text{Select } i \mid \{(x_i, y_i)\} \subseteq C_{(x_u,y_u)}(r) \quad \text{if } (x_i - x_u)^2 + (y_i - y_u)^2 \le r^2 \]
In the case of an NN search, a satisfaction degree for each selected instance is also computed based on the inverse of its distance from the circle’s center (user location).
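The two-step evaluation of a type 1 query can be illustrated with a minimal Python sketch. The function name `range_query` and the representation of instances as `(name, x, y)` tuples are assumptions made for this illustration only; a real spatial database would run the filter phase against a spatial index rather than a list.

```python
def range_query(user, instances, r):
    """Two-step crisp range query: bounding-box filter, then circle refinement.

    user      -- (xu, yu), the precise user location
    instances -- iterable of (name, x, y) tuples (hypothetical layout)
    r         -- radius of the range condition
    """
    xu, yu = user
    # Filter phase: keep the instances inside the bounding box of the circle.
    candidates = [
        (name, x, y)
        for name, x, y in instances
        if xu - r <= x <= xu + r and yu - r <= y <= yu + r
    ]
    # Refinement phase: exact test against the geometry of the circle.
    return [
        name
        for name, x, y in candidates
        if (x - xu) ** 2 + (y - yu) ** 2 <= r ** 2
    ]


instances = [("a", 1.0, 1.0), ("b", 2.9, 2.9), ("c", 5.0, 0.0)]
result = range_query((0.0, 0.0), instances, 3.0)
```

In this toy data set, instance "c" is discarded by the filter phase, while "b" passes the bounding-box test but is rejected during refinement because it lies in a corner of the box outside the circle.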
Table 1
Types of uncertain LBQ.
Query type   User location          Instances location     Spatial condition
1            $(x_u, y_u)$           $(x_i, y_i)$           $r$
2            around$(x_u, y_u)$     $(x_i, y_i)$           $r$
3            $(x_u, y_u)$           around$(x_i, y_i)$     $r$
4            around$(x_u, y_u)$     around$(x_i, y_i)$     $r$
5            $(x_u, y_u)$           $(x_i, y_i)$           close
6            around$(x_u, y_u)$     $(x_i, y_i)$           close
7            $(x_u, y_u)$           around$(x_i, y_i)$     close
8            around$(x_u, y_u)$     around$(x_i, y_i)$     close
In the following subsections we analyze the other types of range queries classified in Table 1 on the basis of their uncertainty. We describe how, by representing and managing the uncertainty at distinct levels, it is possible to rank the instances depending on their degrees of validity in satisfying the range query.
4.1. LBQs with uncertain user location
In this type of LBQ (type 2 in Table 1), the user location is affected by uncertainty, around$(x_u, y_u)$, while the instances’ locations and the selection condition are precise. This is for example the case of a moving robot looking for some stable resources within a maximum distance from its current position, that is around$(x_u, y_u)$. Evaluating this kind of query implies having a representation of the uncertain user location around$(x_u, y_u)$, which we represent by means of a possibility distribution $\pi_u : X \times Y \to [0, 1]$ on the bi-dimensional spatial domain $X \times Y$. The shape of $\pi_u$ depends on both the specific application and its needed accuracy, and on the technologies supporting the LBS, as discussed in Section 2; some examples of definitions of the uncertain user location data are hereafter described.
4.2. Examples of definition of the uncertain user location $\pi_u$
In the case of uncertainty introduced on purpose to mask the user position, $\pi_u$ can be defined as a uniform distribution within a circle $C_{(x_u,y_u)}(\rho)$ centered at $(x_u, y_u)$ of radius $\rho$, with membership function:
\[ \pi_u(x, y) = \begin{cases} 1 & \forall x, y \text{ with } (x - x_u)^2 + (y - y_u)^2 < \rho^2 \\ 0 & \text{otherwise} \end{cases} \]
In the case of a moving user such as a robot, $\pi_u$ can be derived based on a data-driven approach by monitoring the robot and determining its speed and direction. Given two subsequent precise locations of the robot, $(x_0, y_0)$ and $(x_1, y_1)$ at times $t_0$ and $t_1$, respectively, we can build $\pi_u$ by considering the robot’s speed and the most possible position $(x_t, y_t)$ at the answer time $t$, computed as
\[ x_t = x_1 + \frac{x_1 - x_0}{t_1 - t_0}(t - t_1) \quad \text{and} \quad y_t = y_1 + \frac{y_1 - y_0}{t_1 - t_0}(t - t_1) \]
then we derive
\[ \pi_u(x, y) = \begin{cases} e^{-(1/2)\,k\,((x - x_t)^2 + (y - y_t)^2)} & \text{for } (x - x_t)^2 + (y - y_t)^2 < (x_1 - x_t)^2 + (y_1 - y_t)^2 \\ 0 & \text{otherwise} \end{cases} \tag{1} \]
in which $k$ is a constant defining the width of the distribution. In this case $\pi_u$ decreases smoothly with the distance from the most possible location $(x_t, y_t)$ and becomes null outside the circle centered in $(x_t, y_t)$ with radius $\sqrt{(x_1 - x_t)^2 + (y_1 - y_t)^2}$. More sophisticated definitions of the possible location of a user can exploit other information, such as topographic information [27] or the road network on which the user is moving, generating a network-like possibility distribution.
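As a hedged illustration of formula (1), the following Python sketch extrapolates the most possible position linearly from two tracked locations and returns the truncated exponential possibility distribution around it. The function name `moving_user_possibility` and the default value of `k` are assumptions introduced for this example.

```python
from math import exp


def moving_user_possibility(p0, t0, p1, t1, t, k=1.0):
    """Build the possibility distribution of formula (1) for a moving user.

    p0, p1 -- precise locations (x0, y0) and (x1, y1) observed at times t0 < t1
    t      -- answer time at which the location must be estimated
    k      -- constant controlling the width of the distribution (assumed value)
    Returns the most possible position (xt, yt) and a function pi_u(x, y).
    """
    (x0, y0), (x1, y1) = p0, p1
    # Linear extrapolation of the last observed displacement.
    xt = x1 + (x1 - x0) / (t1 - t0) * (t - t1)
    yt = y1 + (y1 - y0) / (t1 - t0) * (t - t1)
    # Squared radius of the support: distance from the last known position.
    radius_sq = (x1 - xt) ** 2 + (y1 - yt) ** 2

    def pi_u(x, y):
        d_sq = (x - xt) ** 2 + (y - yt) ** 2
        return exp(-0.5 * k * d_sq) if d_sq < radius_sq else 0.0

    return (xt, yt), pi_u


center, pi_u = moving_user_possibility((0.0, 0.0), 0.0, (1.0, 0.0), 1.0, 3.0, k=0.5)
```

With these inputs the robot moves one unit per time step along the x axis, so the most possible position at t = 3 is (3, 0), where the possibility equals 1; the possibility decays with distance and is zero at and beyond the last observed location (1, 0).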
By taking into account the underlying road network, and by modelling location uncertainty, more effective results can be obtained. Consider as an example the case in which the user is driving on a highway and is looking for resources that can be reached quickly from a nearby exit of the highway. In this situation, the delay between the query submission time and the answer time, which may last only a few seconds, can severely affect the validity of the results: in fact, when the user receives the answer from the service she/he may have passed an exit of the highway, thus making some resources no longer easily reachable from her/his current position. Furthermore, a user may submit an LBQ that is answered quickly by the service, while she/he may browse the results after some time. In the meantime, her/his actual location may have changed substantially, and a re-ranking of the results based on her/his possible current location may be necessary.
In the case in which an antenna can detect the approximate direction $\theta$ of the user who is walking or driving (e.g., $\theta$ can assume the imprecise values [0°, 90°], [90°, 180°], [180°, 270°], [270°, 360°]), and the user location is imprecisely known at time $t_0$ within a circle $C_{(x_0,y_0)}(\rho)$ centered in $(x_0, y_0)$ with radius $\rho$, we can infer her/his possible position $\pi_u$ at time $t$, when she/he receives the query results, by knowing her/his speed $v$:
\[ \pi_u(x, y) = \begin{cases} 1 & \text{if } (x, y) \in C_{(\bar{x},\bar{y})}(\rho) \\ 0 & \text{otherwise} \end{cases} \]
in which
\[ \bar{x} = x_0 + v \cos(\theta)\,(t - t_0) \quad \text{and} \quad \bar{y} = y_0 + v \sin(\theta)\,(t - t_0) \]
Once we have the user’s location representation $\pi_u$ and the precise range condition $r$, we derive a representation of the region delimiting candidate instances that satisfy the query to some extent. We assume that $r$ defines a circle $C_{(0,0)}(r)$ of radius $r$ centered in the coordinates’ origin (for brevity, $C_{(0,0)}(r)$ is indicated by $r$ in the remainder of the paper).
First we compute the set $S$ of the spatial domain that represents the possible locations of the user:
\[ S = \mathrm{support}(\pi_u) = \{(x, y) \mid \pi_u(x, y) > 0\} \]
Then, we derive the region delimiting the candidate instances by applying the Minkowski sum $\oplus$ of the set $S$ with the spatial range condition $C_{(0,0)}(r) = r$.
4.2.1. Definition of the Minkowski sum
We recall that the Minkowski sum $\oplus$ of two polygons $S$ and $Z$ on the Euclidean spatial domain is defined as follows [22]:
\[ S \oplus Z = \{ s + z \mid s \in S \text{ and } z \in Z \} \tag{2} \]
in which $s$ and $z$ are points on the spatial domain. The Minkowski sum is defined as the union of all the translations of $Z$ by a point $s$ located in $S$. For type 2 queries (see Table 1) the Minkowski sum $\mathrm{support}(\pi_u) \oplus C_{(0,0)}(r) = \mathrm{support}(\pi_u) \oplus r$ can be interpreted as the union of all the range queries obtained by considering all possible positions of the user, who is located somewhere inside $\mathrm{support}(\pi_u)$; this corresponds to the union of all the circles of radius $r$ centered somewhere in $\mathrm{support}(\pi_u)$. It can be regarded as defining the spatial constraint relative to the user location. Clearly, only the instances whose location is within the region $S \oplus Z = \mathrm{support}(\pi_u) \oplus r$ satisfy the query. Then, the evaluation of a type 2 query corresponds to selecting the instances whose locations satisfy the topological relationship of inclusion in $\mathrm{support}(\pi_u) \oplus r$. This can be performed in two steps, starting with the filter phase, which selects as candidates the instances in the BB of $\mathrm{support}(\pi_u) \oplus r$ (hereafter indicated by $\mathrm{BB}(\mathrm{support}(\pi_u) \oplus r)$):
\[ \text{Select candidate } i \mid (x_i, y_i) \in \mathrm{BB}(\mathrm{support}(\pi_u) \oplus r) \tag{3} \]
with
\[ \mathrm{BB}(g) = \left\{ (x, y) \,\middle|\, \min_x(g(x, y)) \le x \le \max_x(g(x, y)) \wedge \min_y(g(x, y)) \le y \le \max_y(g(x, y)) \right\} \]
Fig. 1. Fuzzy Minkowski sum in a one-dimensional domain.
After that, the refinement phase evaluates each candidate instance $(x_i, y_i)$ for its satisfaction of the relative spatial constraint:
\[ \text{Select } i \mid \{(x_i, y_i)\} \subseteq (\mathrm{support}(\pi_u) \oplus r) \tag{4} \]
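As an illustration of formulas (2)–(4), the following Python sketch evaluates a crisp type 2 query on an integer grid. It assumes, for simplicity, that the support of the user location and the circular range condition are both given as sets of grid cells; the helper names (`disk_offsets`, `minkowski_sum`, `bounding_box`, `type2_crisp_query`) are hypothetical.

```python
def disk_offsets(r):
    """Grid offsets (dx, dy) inside a circle of integer radius r centered at the origin."""
    return {
        (dx, dy)
        for dx in range(-r, r + 1)
        for dy in range(-r, r + 1)
        if dx * dx + dy * dy <= r * r
    }


def minkowski_sum(S, Z):
    """Crisp Minkowski sum of two sets of grid cells, as in formula (2)."""
    return {(sx + zx, sy + zy) for sx, sy in S for zx, zy in Z}


def bounding_box(region):
    """Return (xmin, xmax, ymin, ymax) of a non-empty set of grid cells."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return min(xs), max(xs), min(ys), max(ys)


def type2_crisp_query(support_u, r, instances):
    """Filter on the bounding box (3), then refine by inclusion in the region (4)."""
    region = minkowski_sum(support_u, disk_offsets(r))
    xmin, xmax, ymin, ymax = bounding_box(region)
    candidates = [p for p in instances if xmin <= p[0] <= xmax and ymin <= p[1] <= ymax]
    return [p for p in candidates if p in region]


support_u = {(0, 0), (1, 0)}
selected = type2_crisp_query(support_u, 2, [(3, 0), (3, 2), (-2, 0), (5, 5)])
```

Here the instance (3, 2) lies inside the bounding box but outside both circles, so it is removed only in the refinement phase, while (5, 5) is already excluded by the filter.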
To compute degrees of satisfaction for the instances, we apply the generalized fuzzy Minkowski sum $\oplus_F$ that we defined in [22], hereafter recalled.
4.2.2. Definition of the generalized fuzzy Minkowski sum
Given two fuzzy sets $S$ and $Z$ defined on a spatial domain $X$, the generalized fuzzy Minkowski sum $S \oplus_F Z$ is defined as the fuzzy union (max) of all the translations of $Z$ by every element $s$ belonging to some extent to the fuzzy set $S$:
\[ S \oplus_F Z = \{ (\mu_{S \oplus_F Z}(r), r) \mid r = s + z \text{ and } s \in S,\ z \in Z \} \]
with
\[ \mu_{S \oplus_F Z}(r) = \max_{\forall s \in S,\ \forall z \in Z \mid s + z = r} \left( \min\left( \mu_S(s), \max\left( \mu_S(r), \mu_Z(z) \right) \right) \right) \tag{5} \]
Example: To illustrate the results of the fuzzy Minkowski sum application, let us consider a simple example in a one-dimensional spatial domain $X = [-8, 8]$ (see Fig. 1). Given the uncertain location $S$ and the query scope $Z$, with $Z$ defined as a soft spatial constraint in an absolute way with respect to the origin (0), we compute the fuzzy Minkowski sum $R = S \oplus_F Z$:
S := {0.5/2, 1/3, 1/4, 0.7/5}
Z := {0.2/−2, 0.8/−1, 1/0, 0.8/1, 0.2/2}
R = S ⊕_F Z = {0.2/0, 0.5/1, 0.8/2, 1/3, 1/4, 0.8/5, 0.7/6, 0.2/7}
$R$ defines the soft spatial constraint relative to the uncertain location $S$. $R$ is then used to constrain the instances’ locations. It can be proved that:
• $S \oplus_F Z \supseteq S$, i.e., $\mu_{S \oplus_F Z}(x) \ge \mu_S(x)$, $\forall x \in X$: the fuzzy set $R$, representing the result of the generalized fuzzy Minkowski sum $R = S \oplus_F Z$, includes $S$;
• $\mathrm{support}(S \oplus_F Z) \equiv \mathrm{support}(S) \oplus \mathrm{support}(Z)$, i.e., the support of the generalized fuzzy Minkowski sum $S \oplus_F Z$ is the crisp Minkowski sum between the supports of the fuzzy sets $S$ and $Z$;
• when $S$ and $Z$ are classic sets, $\mu_{S \oplus_F Z}(x) = 1$, $\forall x \in S \oplus_F Z$, and $\mu_{S \oplus_F Z}(x) = 0$ otherwise; i.e., $S \oplus_F Z$ reduces to the crisp Minkowski sum.
By considering a GIS in which spatial data can be represented in a discrete form by a grid, i.e., the tessellation of the spatial domain into regular cells, formula (5) can be computed in a finite number of steps by first computing the crisp Minkowski sum $\mathrm{support}(S) \oplus Z$ and then by computing the membership degrees $\mu_{S \oplus_F Z}(r)$ for each cell $r \in \mathrm{support}(S) \oplus Z$ based on formula (5).
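The one-dimensional example can be reproduced with a short Python sketch of formula (5) on a discrete domain. Representing each fuzzy set as a dictionary from grid coordinate to membership degree, with absent coordinates having degree 0, is an assumption made for this illustration, as is the function name `fuzzy_minkowski_sum`.

```python
def fuzzy_minkowski_sum(S, Z):
    """Generalized fuzzy Minkowski sum of formula (5) on a discrete 1-D domain.

    S, Z -- dicts mapping a grid coordinate to its membership degree
            (coordinates that are absent are taken to have degree 0).
    """
    R = {}
    for s, mu_s in S.items():
        for z, mu_z in Z.items():
            r = s + z
            # Contribution of the pair (s, z): min(mu_S(s), max(mu_S(r), mu_Z(z))).
            degree = min(mu_s, max(S.get(r, 0.0), mu_z))
            # Fuzzy union (max) over all pairs with s + z = r.
            R[r] = max(R.get(r, 0.0), degree)
    return R


S = {2: 0.5, 3: 1.0, 4: 1.0, 5: 0.7}
Z = {-2: 0.2, -1: 0.8, 0: 1.0, 1: 0.8, 2: 0.2}
R = fuzzy_minkowski_sum(S, Z)
```

The nested loop visits only pairs with non-zero membership, so the keys of `R` are exactly the crisp Minkowski sum of the two supports, and the resulting degrees coincide with the set R listed in the example.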
In the context of the evaluation of type 2 queries with uncertain user location $S = \pi_u$ and precise range condition $Z = r$, we define $\mu_Z(z) = 1$ for $\sqrt{x_z^2 + y_z^2} \le r$, while $\mu_Z(z) = 0$ otherwise. In this case, the generalized fuzzy Minkowski sum $\pi_u \oplus_F r$ identifies the union of all the translations of the circle of radius $r$ by any point $s = (x_s, y_s)$ belonging to some extent to the user location distribution $\pi_u$. It can be interpreted as the fuzzy union of all the range queries obtained by considering all possible positions of the user. The instances whose locations $(x_i, y_i)$ are included in the support of the fuzzy set $\pi_u \oplus_F r$ satisfy the query.

The evaluation of type 2 queries is performed in the two steps previously presented for type 1 queries (filter phase plus refinement phase). In the refinement phase, for the candidate instances we estimate their degrees of satisfying the LBQ based on the fuzzy inclusion of their precise locations $(x_i, y_i)$ in the fuzzy set $\pi_u \oplus_F r$ as follows:

$$\mathrm{degree}(\{(x_i, y_i)\} \subseteq_F (\pi_u \oplus_F r)) = \mu_{\pi_u \oplus_F r}(x_i, y_i) \qquad (6)$$
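A possible grid-based evaluation of a type 2 query is sketched below. It relies on an observation that follows from formula (5) when the range condition is crisp: since the membership of $Z$ is 1 everywhere on the disc, the inner term $\min(\mu_S(s), \max(\mu_S(r), 1))$ equals $\mu_S(s)$, so the membership of a cell is the largest user-location possibility found within distance $r$ of it. The unit cell size, the dictionary representation and all function and instance names are assumptions made for this illustration.

```python
import math


def disc_offsets(radius):
    """Cell offsets (dx, dy) whose Euclidean norm does not exceed radius."""
    k = math.floor(radius)
    return [(dx, dy)
            for dx in range(-k, k + 1)
            for dy in range(-k, k + 1)
            if dx * dx + dy * dy <= radius * radius]


def range_constraint(pi_u, radius):
    """Fuzzy Minkowski sum of the user distribution pi_u and a crisp disc.

    pi_u maps grid cells (x, y) to possibility degrees (assumed layout).
    """
    offsets = disc_offsets(radius)
    constraint = {}
    for (sx, sy), degree in pi_u.items():
        for dx, dy in offsets:
            cell = (sx + dx, sy + dy)
            # Fuzzy union (max) of the disc translated to every possible user cell.
            constraint[cell] = max(constraint.get(cell, 0.0), degree)
    return constraint


def rank_instances(pi_u, radius, instances):
    """Refinement by formula (6): read the membership at each precise location."""
    constraint = range_constraint(pi_u, radius)
    scored = [(name, constraint.get(cell, 0.0)) for name, cell in instances.items()]
    return sorted((p for p in scored if p[1] > 0.0), key=lambda p: p[1], reverse=True)


# Hypothetical data: three possible user cells and four instances.
pi_u = {(0, 0): 1.0, (1, 0): 0.6, (2, 0): 0.3}
instances = {"a": (0, 1), "b": (2, 0), "c": (3, 0), "d": (5, 5)}
ranking = rank_instances(pi_u, 1, instances)
print(ranking)
```

Instance `d` lies outside the support of the constraint and is therefore not returned, while the remaining instances are ordered by decreasing degree.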
These degrees of fuzzy inclusion make it possible to rank the instances with respect to the uncertain range query.

4.3. LBQs with uncertain instances' locations

This type of LBQ (type 3 queries in Table 1) is dual with respect to the previous one. In this case, the instances' locations are ill-known while the user position is precise. For example, this is the case of moving objects, such as taxicabs, with the user being located at a taxi station. To evaluate queries of this kind, we first represent the instances' locations by means of possibility distributions on the spatial domain, around$(x_i, y_i) = \pi_i$. As in the previous case, we can adopt a data-driven approach to generate $\pi_i$ by exploiting collected information on previous positions of the instances. The evaluation can then follow one of two alternative procedures.

Procedure A: With this procedure, we first build the crisp Minkowski sum $R = (x_u, y_u) \oplus r$. In this case, this is simply the circle of radius $r$ centered in $(x_u, y_u)$. To compute the degrees of satisfaction of the LBQ by the $N$ instances, we apply the two-step procedure. The filter phase consists in evaluating, for each instance $i$, the intersection between the BB of $\mathrm{support}(\pi_i)$ and the BB of $(x_u, y_u) \oplus r$:

$$\text{Select Candidate } i \mid BB(\mathrm{support}(\pi_i)) \cap BB((x_u, y_u) \oplus r) \ne \emptyset \qquad (7)$$
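The filter phase of formula (7) only needs axis-aligned bounding boxes. The sketch below is one way to express it, assuming that each $\pi_i$ is stored as a dictionary from grid cells to possibility degrees; the helper names and the taxi data are hypothetical.

```python
def bounding_box(cells):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a collection of cells."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_intersect(a, b):
    """True when two closed boxes (xmin, ymin, xmax, ymax) share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def filter_candidates(instances, user, radius):
    """Keep the instances whose support BB meets the BB of the query circle."""
    xu, yu = user
    query_box = (xu - radius, yu - radius, xu + radius, yu + radius)
    return [name for name, pi_i in instances.items()
            if boxes_intersect(bounding_box(pi_i.keys()), query_box)]


# Hypothetical uncertain taxi locations; the user stands at (0, 0) with r = 2.
taxis = {
    "taxi_a": {(1, 1): 1.0, (2, 1): 0.5},
    "taxi_b": {(3, 3): 1.0, (4, 3): 0.4},
    "taxi_c": {(2, 2): 0.3, (3, 2): 1.0},
}
candidates = filter_candidates(taxis, (0, 0), 2)
print(candidates)
```

In this data set `taxi_c` passes the filter although none of its cells lies inside the circle: the bounding-box test is deliberately conservative, and such candidates are expected to receive a degree of 0 in the refinement phase.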
In the refinement phase, for each candidate instance $i$ we evaluate the degree of fuzzy inclusion of its uncertain location $\pi_i$ in the fuzzy set $(x_u, y_u) \oplus_F r$, defined with membership function $\mu_{(x_u, y_u) \oplus_F r}(x, y) = 1$, $\forall (x, y) \in (x_u, y_u) \oplus r$, and 0 otherwise:

$$\mathrm{degree}(\pi_i \subseteq_F ((x_u, y_u) \oplus_F r))$$

To compute this degree, we recall the definition of fuzzy inclusion.

4.3.1. Definition of fuzzy inclusion between fuzzy sets

One way to define the fuzzy inclusion between two fuzzy sets $A$ and $B$ on a bi-dimensional spatial domain $X \times Y$ is based on the cardinality Card of the fuzzy sets:

$$\mathrm{degree}(A \subseteq_F B) = \frac{\mathrm{Card}(A \cap_F B)}{\mathrm{Card}(A)} = \frac{\iint_{X \times Y} \min(\mu_A(x, y), \mu_B(x, y))\, dx\, dy}{\iint_{X \times Y} \mu_A(x, y)\, dx\, dy} \qquad (8)$$

where $\cap_F$ is the intersection of fuzzy sets. Given that we assume a discrete representation of the spatial domain, i.e., a tessellation of the spatial domain $X \times Y$ into a finite number of regular cells, the fuzzy sets $A$ and $B$ are represented as finite sets of pixels with distinct membership values; thus their cardinality Card can be computed as the sum of their membership values. Notice that formula (8) reduces to formula (6) in the particular case in which the fuzzy set $A$ is a single point of the spatial domain, as in the case of type 2 queries. In the case of type 3 queries, formula (8) reduces to the following:

$$\mathrm{degree}(\pi_i \subseteq_F ((x_u, y_u) \oplus_F r)) = \frac{\iint_{(x_u, y_u) \oplus r} \pi_i(x, y)\, dx\, dy}{\iint_{\mathrm{support}(\pi_i)} \pi_i(x, y)\, dx\, dy} \qquad (9)$$
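On a tessellated domain both integrals become sums over cells, so formulas (8) and (9) can be sketched as follows. The dictionary representation of the fuzzy sets, the unit cell size and the function names are assumptions made for this example.

```python
def fuzzy_inclusion(A, B):
    """Discrete form of formula (8): Card(A ∩_F B) / Card(A).

    A and B are dicts {cell: membership degree}; absent cells have degree 0.
    """
    card_a = sum(A.values())
    if card_a == 0:
        raise ValueError("A must have a non-empty support")
    card_intersection = sum(min(mu, B.get(cell, 0.0)) for cell, mu in A.items())
    return card_intersection / card_a


def inclusion_in_circle(pi_i, user, radius):
    """Discrete form of formula (9): B is the crisp circle around the user,
    so min(pi_i, mu_B) is pi_i inside the circle and 0 outside."""
    xu, yu = user
    inside = sum(mu for (x, y), mu in pi_i.items()
                 if (x - xu) ** 2 + (y - yu) ** 2 <= radius ** 2)
    return inside / sum(pi_i.values())


# Hypothetical uncertain instance location; user at (0, 0), radius 2.
pi_i = {(1, 0): 1.0, (2, 0): 0.5, (3, 0): 0.5}
degree = inclusion_in_circle(pi_i, (0, 0), 2)
print(degree)

# The same value is obtained from formula (8) with the circle stored as a crisp set.
circle = {(x, y): 1.0 for x in range(-2, 3) for y in range(-2, 3) if x * x + y * y <= 4}
print(fuzzy_inclusion(pi_i, circle))

# When A is a single point, formula (8) returns the membership of that point in B.
print(fuzzy_inclusion({(2, 0): 1.0}, {(2, 0): 0.6}))
```

The last call illustrates the reduction to formula (6) mentioned above: the inclusion degree of a single precise location is the membership value of that location in $B$.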
By considering a tessellation of the spatial domain into regular cells, formula (9) reduces to the discrete case, i.e., to computing the ratio between the sum of the values $\pi_i(x, y)$ with $(x, y)$ in $C_{(x_u, y_u),(r)}$ and the sum of all the values $\pi_i(x, y)$.

To summarize: procedure A computes the crisp Minkowski sum just once, based on formula (2) (i.e., $C_{(x_u, y_u),(r)}$, the circle of radius $r$ centered in $(x_u, y_u)$), and then evaluates at most $N$ fuzzy inclusions between fuzzy sets by applying formula (9), with $N$ the number of instances that satisfy the filter phase.

Procedure B: This is an alternative procedure for evaluating LBQs of type 3. By adopting this procedure, a type 3 query is regarded as $N$ queries of type 2 ($N$ being the number of instances selected by the filter phase), each query being defined in such a way that the user location is exchanged with the location of an instance.

The decision on which procedure (A or B) to adopt for evaluating type 3 queries depends on efficiency considerations. By adopting procedure B we incur the cost of $N$ fuzzy Minkowski sums $\pi_i \oplus_F C_{(x_u, y_u),(r)}$ (based on definition (5)) between the uncertain instance location $\pi_i$ and the circle $C_{(x_u, y_u),(r)}$ (which is obtained as $C_{(x_u, y_u),(r)} = (x_u, y_u) \oplus r$), plus the computation of $N$ inclusion degrees based on formula (6). By adopting procedure A we have to compute $N$ intersections between $\mathrm{support}(\pi_i)$ and the crisp Minkowski sum defined by $(x_u, y_u) \oplus r$ and then, for the $n \le N$ instances $i$ with a non-null intersection, we have to compute their fuzzy inclusion degrees (based on formula (9)) in the crisp Minkowski sum $(x_u, y_u) \oplus r$. It can be proved that procedure A is the cheaper one since, even in the worst case in which $n = N$, a fuzzy Minkowski sum $\pi_i \oplus_F r$ based on (5) is more expensive than a fuzzy inclusion degree based on (9).

4.4. LBQs with uncertain user's and instances' locations

This situation is the one in which both the user's and the instances' locations are ill-known (query type 4 in Table 1), such as in the example of the moving robot with moving obstacles. In this case, by applying procedure A we have the increased complexity of computing a fuzzy Minkowski sum $\pi_u \oplus_F r$ based on (5). Further, to rank the instances, since their locations $\pi_i$ are also uncertain, we have to evaluate $n \le N$ fuzzy inclusions $\pi_i \subseteq_F (\pi_u \oplus_F r)$, where $n$ is the number of instances for which $\mathrm{support}(\pi_i)$ has a non-null intersection with $\mathrm{support}(\pi_u \oplus_F r)$. In this case, to compute the $n$ fuzzy inclusion degrees we have to apply formula (8), which is more complex than formula (9) since we have to compute the fuzzy intersection of two fuzzy sets on the bi-dimensional domain. On the other hand, in this situation it does not make sense to apply procedure B, since the user's location $\pi_u$ is also uncertain.

4.5. LBQs with soft query scope and uncertain locations

We now generalize the cases by considering soft LBQs specifying a soft spatial condition by means of a linguistic term such as close. In the context of fuzzy databases, soft conditions are defined as soft constraints on the domains of attributes. We retain this representation and define a soft spatial constraint on the spatial domain $X \times Y$ in an absolute way with respect to the coordinate origin (0, 0). The membership function $\mu_{close}(x, y) \in [0, 1]$, $\forall (x, y) \in X \times Y$, of close is defined so as to decrease with the distance from (0, 0). For example, it can be defined as a bi-dimensional bell-shaped surface centered in (0, 0) (Fig. 2 depicts a trapezoidal membership function for close on a one-dimensional domain). When evaluating the query, the soft spatial constraint relative to either the user's or the instances' locations is generated.
To evaluate queries of this kind, we first compute the fuzzy Minkowski sum, as defined in formula (5), of the user location $location_u$ (which can be either precise, for type 5 and type 7 queries, or imprecise, $\pi_u$, for type 6 and type 8 queries) and the fuzzy set $close$ representing the soft query scope: $close_u := location_u \oplus_F close$. $close_u$ represents the actual soft constraint relative to the user location that must be satisfied to some extent by the location of the $i$-th instance, $location_i$. The two-step procedure can be adopted, with the filter phase selecting the candidate instances that are then passed to the refinement phase. The filter phase selects the candidates when there is a non-null intersection between the BBs of the supports of $close_u$ and $location_i$:

$$\text{Select Candidate } i \mid BB(\mathrm{support}(\pi_i)) \cap BB(\mathrm{support}(close_u)) \ne \emptyset$$
[Figure 2: plot of $\mu_{close}$ (vertical axis, from 0 to 1) against $X$ (horizontal axis, from −0.1 to 0.1).]

Fig. 2. Soft range condition close on a one-dimensional domain X.
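A trapezoidal membership function such as the one in Fig. 2 can be sketched as follows. The half-widths `core` and `support` are hypothetical values chosen only to fit the axis range of the figure; they are not taken from the paper. The second function covers the case of a precise user location: with $location_u$ reduced to a single point of membership 1 and $\mu_{close}(0) = 1$, formula (5) yields the function close translated to the user position.

```python
def trapezoid_close(x, core=0.05, support=0.1):
    """Trapezoidal membership of 'close' on a one-dimensional domain.

    Membership is 1 for |x| <= core and decreases linearly to 0 at
    |x| = support (both half-widths are hypothetical).
    """
    distance = abs(x)
    if distance <= core:
        return 1.0
    if distance >= support:
        return 0.0
    return (support - distance) / (support - core)


def close_relative_to(user_x):
    """Soft constraint relative to a precise user location: 'close' translated to user_x."""
    return lambda x: trapezoid_close(x - user_x)


# Soft constraint for a user located at x = 2.0 (hypothetical position).
close_u = close_relative_to(2.0)
for x in (2.0, 2.03, 2.2):
    print(x, close_u(x))
```

For an uncertain user location the translation is replaced by the generalized fuzzy Minkowski sum of formula (5) applied to the discretized membership values.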
The refinement step computes the degrees of fuzzy inclusion of each candidate instance in the fuzzy set $close_u$. The complexity of this step differs between type 5 and type 6 queries on the one hand and type 7 and type 8 queries on the other. For type 5 and type 6 queries, since the instances' locations $(x_i, y_i)$ are precise, we just take, as the satisfaction degree of an instance $i$, the value $\mu_{location_u \oplus_F close}(x_i, y_i)$ computed by formula (6). In the case in which $location_i = \pi_i$, as in type 7 and type 8 queries, we have to compute the fuzzy inclusion degree $\pi_i \subseteq_F (location_u \oplus_F close)$ by applying definition (8), since $close_u$ is itself a fuzzy set.

5. Conclusions

In this paper, a model for evaluating uncertain LBQs within a possibilistic framework is proposed. The approaches based on probability distributions have addressed only some cases in which uncertainty affects either the user location or the instance locations, but never both at the same time, and not in conjunction with a soft spatial condition specified in the query. Our proposal has the advantage, with respect to the probabilistic approach, of representing the ill-defined knowledge on user and instance locations without the need to make unsupported assumptions on their distributions. The evaluation procedure for each type of uncertain LBQ is composed of two phases: a first filter phase, characterized by a fixed low cost, used to identify the candidate instances possibly satisfying the query, and a more complex refinement phase, used to compute the degrees of satisfaction of the query by the candidate instances. The fuzzy Minkowski sum is applied to generate the soft spatial constraint relative to the possibly uncertain user location. The fuzzy inclusion is used to compute a degree of satisfaction of uncertain LBQs.
The availability of satisfaction degrees of an LBQ, reflecting the uncertainty of the knowledge available on user and instance locations, greatly improves on the current practice of LBSs, which does not estimate the validity of the retrieved instances. The implementation of the proposed model largely depends on the underlying GIS, which must be able to manage spatial objects with broad boundaries and compute degrees of fuzzy inclusion between them [19]. In this paper we considered a tessellation representation of location data, and analyzed the computation of the fuzzy inclusion degree based on this representation. Further extensions of the model could concern the evaluation of other fuzzy topological relationships between the uncertain location data, such as degrees of overlapping.

References

[1] J. Lee, D. Xu, B. Zheng, Data management in location-dependent information services, IEEE Pervasive Computing 1(3), 2002.
[2] S. Steiniger, M. Neun, A. Edwardes, Foundations of location based services - cartouche project, http://www.e-cartouche.ch/.
[3] J.H. Schiller, A. Voisard, Location-based Services, Morgan Kaufmann Publishers, Los Altos, CA, 2004.
[4] J. Warrior, E. McHenry, K. McGee, Know where you are, Spectrum 40 (7) (2003) 20–25.
[5] R. Cheng, D. Kalashnikov, S. Prabhakar, Evaluating probabilistic queries on imprecise data, in: Proc. of ACM SIGMOD, 2003.
[6] D. Pforser, C. Jensen, Capturing the uncertainty of moving-objects representations, in: Proc. of SSDBM, 1999.
[7] A. Beresford, F. Stajano, Location privacy in pervasive computing, IEEE Pervasive Computing 2 (2003) 46–55.
[8] E. Damiani, S.S. De Capitani di Vimercati, S. Paraboschi, P. Samarati, Computing range queries on obfuscated data, in: Proc. of the Inform. Processing and Management of Uncertainty in Knowledge-Based Systems (IMPU'04), 2004.
[9] M. Gruteser, D. Grunwald, Anonymous usage of location-based services through spatial and temporal cloaking, in: Proc. 1st Internat. Conf. on Mobile Systems, Applications and Services, 2003.
[10] Z. Li, C. Wang, X. Xie, X. Wang, W. Ma, Indexing implicit locations for geographical information retrieval, in: Proc. of the GIR'06, 2006, pp. 15–29.
[11] R. Cheng, Y. Xia, S. Prabhakar, R. Shah, J. Vitter, Efficient indexing methods for probabilistic threshold queries over uncertain data, in: Proc. Int. Conf. on Very Large DataBases (VLDB'04), 2004.
[12] Y. Tao, D. Papadias, Q. Shen, Continuous NN search, in: Proc. Internat. Conf. on Very Large DataBases (VLDB'02), 2002.
[13] P. Bosc, H. Prade, An introduction to the fuzzy set and possibility theory-based treatment of flexible queries and uncertain and imprecise databases, in: A. Motro, P. Smets (Eds.), Uncertainty Management in Information Systems, Kluwer, Dordrecht, 1997, pp. 285–324.
[14] P. Bosc, B. Buckles, F. Petry, O. Pivert, Fuzzy databases, in: J. Bezdek, D. Dubois, H. Prade (Eds.), Fuzzy Sets in Approximate Reasoning and Information Systems, Kluwer Academic Publisher, Dordrecht, 1999, pp. 403–468.
[15] R. De Caluwe, Fuzzy and Uncertain Object-Oriented Databases, Concepts and Models, World Scientific, Singapore, 1997.
[16] D. Dubois, H. Prade, Tolerant fuzzy pattern matching: an introduction, in: P. Bosc, J. Kacprzyk (Eds.), Fuzziness in Database Management Systems, Physica Verlag, Berlin, 1994.
[17] F.E. Petry, Fuzzy Databases, Kluwer, Dordrecht, 1996.
[18] M. Vila, J. Cubero, J. Medina, O. Pons, Conceptual approach for dealing with imprecision and uncertainty in object-based data models, International Journal of Intelligent Systems 11 (10) (1996) 791–806.
[19] R. De Caluwe, G. De Tre, G. Bordogna, Spatio-temporal Databases: Flexible Querying and Reasoning, Springer, Berlin, 2004.
[20] F. Petry, V. Robinson, M. Cobb, Fuzzy Modelling with Spatial Information for Geographic Problems, Springer, Berlin, 2005.
[21] P.A. Burrough, A.U. Frank, Geographic Objects with Indeterminate Boundaries, Taylor & Francis, New York, 1996.
[22] G. Bordogna, M. Pagani, G. Pasi, G. Psaila, Flexible location-based spatial queries, in: O. Castillo, P. Melin, O. Montiel Ross, R. Sepulveda Cruz, W. Pedrycz, J. Kacprzyk (Eds.), Theoretical Advances and Applications of Fuzzy Logic and Soft Computing, Advances in Soft Computing, Vol. 42, Springer, Berlin, 2007, pp. 36–45.
[23] G. Bordogna, G. Pasi, A semantics for soft selection conditions in fuzzy databases based on a fuzzy inclusion, in: Proc. of IFSA'03, 2003.
[24] P. Bosc, O. Pivert, Formal framework for representation-based querying of databases containing ill-known data, in: Proc. of the IIA'99 SOCO'99, 1999.
[25] K. Virrantaus, J. Markkula, A. Garmash, Y. Terziyan, Developing GIS-supported location-based services, in: Developing GIS-Supported Location-Based Services, Kyoto, Japan, 2001, pp. 423–432.
[26] Open Geospatial Consortium (OGC), Open location services 1.1, 2005.
[27] A. Nivala, L.T. Sarjakoski, Need for context-aware topographic maps in mobile devices, in: Proceedings of ScanGIS 2003, 2003, pp. 15–29.
[28] T. Reichenbacher, Mobile cartography - adaptive visualisation of geographic information on mobile devices, Ph.D. Thesis, Munchen Technical University, 2004.
[29] P. Krishnamurthy, K. Pahlavan, Wireless communications, in: Telegeoinformatics, CRC Press, Boca Raton, 2004, pp. 111–142.
[30] H. Hu, J. Xu, W. Wong, B. Zheng, D. Lee, L. Lee, Proactive caching for spatial queries in mobile environments, in: Proc. 21st Internat. Conf. on Data Engineering (ICDE'05), 2005.
[31] J. Xu, W. Lee, The d-tree: an index structure for planar point queries in location-based wireless services, IEEE Transactions on Knowledge and Data Engineering 16 (12) (2004) 01–16.
[32] J. Xu, B. Zheng, W. Lee, D. Lee, Energy efficient index for querying location-dependent data in mobile broadcast environment, in: Proc. 9th Internat. Conf. on Data Engineering (ICDE'03), Bangalore, India, 2003, pp. 239–249.
[33] B. Zheng, J. Xu, W. Lee, D. Lee, Energy-conserving air indexes for nn search, in: Proc. 9th Internat. Conf. on Extending DataBase Technologies (EDBT'04), Heraklion, Greece, 2004, pp. 48–66.
[34] L. Liu, Mobieyes: distributed processing of continuously moving queries on moving objects in a mobile system, in: Proc. Internat. Conf. on Extending DataBase Technologies (EDBT'04), 2004, pp. 67–87.
[35] S. Illiarri, E. Mena, A. Illarramendi, Location-dependent queries in mobile context: distributed processing using mobile agents, IEEE Transactions on Mobile Computing 5 (8) (2006) 1029–1043.
[36] O. Wolfson, A. Prasad Sistla, S. Chamberlain, Y. Yesha, Updating and querying databases that track mobile units, Distributed and Parallel Databases Journal, special issue on Mobile Data Management and Applications 7 (3) (1999) 257–287.
[37] P. Rigaux, M. Scholl, A. Voisard, Spatial Databases: With Application to GIS, Morgan and Kaufmann, Los Altos, CA, 2001.