ScienceDirect Procedia Computer Science 142 (2018) 278–285
The 4th International Conference on Arabic Computational Linguistics (ACLing 2018), November 17-19 2018, Dubai, United Arab Emirates
Categorial Analysis of Agreement Asymmetries in Arabic
Ismaïl Biskri a, Adel Jebali b *
a LAMIA, Département de Mathématiques et Informatique, Université du Québec à Trois-Rivières, Trois-Rivières G9A5H7, Canada
b Département d’études françaises, Concordia University, Montréal H3G1M9, Canada
Abstract
Agreement asymmetries are the most debated issue in Arabic linguistics. Even though the facts suggest a unified treatment based on the properties of agreement, most researchers in this field do not take into account the essential difference between grammatical agreement and anaphoric agreement. We propose such a distinction to explain these asymmetries, and we propose an analysis that we implement in the Applicative Combinatory Categorial Grammar (ACCG) framework.
© 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/) Peer-review under responsibility of the scientific committee of the 4th International Conference on Arabic Computational Linguistics.
Keywords: Applicative Combinatory Categorial Grammar; Combinatory Logic; Agreement Asymmetries in Arabic.
1. Agreement asymmetries
The facts concerning agreement asymmetries in Arabic can be split into two categories: order asymmetries and categorical asymmetries.
* Corresponding author. Tel.: +1-819-376-5011#3837. E-mail address: [email protected]
1877-0509 © 2018 The Authors. Published by Elsevier B.V. 10.1016/j.procs.2018.10.497
1.1. Order Asymmetries
These asymmetries can be observed in sentences where the first component is a nominative noun phrase or a verb. In the first case, as shown in example (1), the agreement is said to be rich, as the subject marker –uu (3rd person plural masculine, or 3PM) agrees in number with the preverbal noun phrase al-awlaadu (boys).
(1)  al-  awlaad –u            xaraj -uu
     the- boys   –NOMINATIVE   left  -3PM
     “The boys, they left.”
However, when the first component of the sentence is a verb, as shown in (2), agreement in number is not allowed, hence the singular subject marker –a (3rd person singular masculine, or 3SM) in (2a) and the ungrammaticality of (2b), even though the postverbal noun phrase is the same as in (1).
(2)  a.  xaraj –a     al-  awlaad –u
         left  -3SM   the- boys   –NOMINATIVE
         “The boys left.”
     b.  *xaraj –uu    al-  awlaad –u
         left   -3PM   the- boys   -NOMINATIVE
As seen in these examples, the singular form of the subject marker in (2a) is in a disagreement relation with the plural form of the noun phrase with which it is supposed to agree in all morphosyntactic features (person, number and gender). This is not, however, the only agreement asymmetry in Arabic.
1.2. Categorical Asymmetries
The asymmetries shown in Section 1.1 are only observed when the component placed preverbally or postverbally is a noun phrase. When this component is an independent pronoun, the agreement patterns turn out to be different. The examples in (3) show this particular behavior.
(3)  a.  hum   xaraj –uu
         they  left  -3PM
         “Them, they left.”
     b.  xaraj –uu    hum
         left  -3PM   they
         “They left, them.”
When the preverbal nominative is an independent pronoun, as seen in (3a), the agreement is rich, as is the case when this component is a noun phrase in (1). However, when this pronoun is postverbal, the agreement is also rich (3b), unlike in example (2a), where there is no agreement in number. Why do noun phrases and independent pronouns behave differently? Why does the placement of the noun phrase (preverbal/postverbal) have these effects on agreement? These questions will be addressed in the next section.
2. Our hypotheses
The solution we propose for these agreement asymmetries is based on three complementary hypotheses:
1st hypothesis: Contrary to what is assumed by some authors in the generative framework of Chomsky, there is no genuine SVO configuration in Arabic. The preverbal noun phrase (or independent pronoun) is a topic and not a subject. This means that the only subject/verb agreement relation in Arabic is observed in the VSO configurations. We prove that this hypothesis is true when we apply the independently motivated tests proposed by Li and Thompson [12] to make the distinction between subjects and topics. In Arabic, topics and verbs agree by means of the incorporated pronoun attached to the latter and redundantly expressing the morphosyntactic features of the former.
2nd hypothesis: Agreement in the so-called SVO configurations is anaphoric, hence between a noun phrase (or an independent pronoun) and an incorporated pronoun. The subject marker in this configuration is an incorporated pronoun anaphorically bound to the preverbal component. That’s why the pronoun agrees in all morphosyntactic features (gender, number and person) with the noun phrase [4] [9].
3rd hypothesis: Agreement in the VSO configurations is a grammatical one. This is the only context in which there is an agreement relation between subjects and verbs in Arabic. In this particular configuration, the subject marker is proven to be an agreement marker and not an incorporated pronoun. The grammatical agreement is also shown to be poor (hence only in gender) in this context.
Regarding the last hypothesis, we have to explain why the grammatical agreement shown in the VSO configurations is poor. We think that this is somewhat normal in Arabic because it is seen elsewhere in this language: in the agreement relations between nouns and adjectives, for example (see [11] for more details). It is also seen in the agreement relations between numbers and counted nouns.
The question yet to be answered is why there is a difference between noun phrases and independent pronouns. Why do independent pronouns trigger a rich agreement even when they appear in a postverbal position? We think that the answer is straightforward. In Arabic, independent pronouns are always left or right dislocated. They appear in the sentence periphery and are always topics (again according to the tests proposed in the literature). The relation between the independent pronoun and the subject marker in a sentence such as (3b) is, from this perspective, anaphoric. This sentence has to be read as if there were an opposition between two elements. This kind of structure is also observed in coordination:
(4)  xaraj –uu    hum   wa   akhawaat –u    -hum
     left  -3PM   them  and  sisters  -NOM  -their
     "They left, them and their sisters."
In the next section, we show how this particular linguistic analysis is to be integrated into a categorial grammar, the ACCG model.
3. Categorial Grammar Analysis
According to Desclés [7] and Shaumyan [15], languages have multiple levels of representation. We distinguish, in particular, the level of observable statements and the level of predicative representations underlying such statements. It is at the level of observable expressions that the syntagmatic organization (word order, morphology, etc.) is considered. At the level of predicative representations, statements of the language take shape in a formal applicative language in which units of language apply to each other through the operation of application. More generally, this level is used to establish a universal formulation of the interaction of grammatical categories of language independently of their morphosyntactic features. Obviously, in each language system, the predicative representations of syntactically correct statements are encoded by specific strategies and procedures.
Our goal here is twofold: to define and implement a strategy and give algorithms, firstly to validate the morphosyntactic correctness of statements, and secondly to build their predicative representation. The model chosen is the Applicative Combinatory Categorial Grammar (ACCG) [2] [3]. ACCG establishes a canonical relationship between the extended categorial combinatory rules and the combinators of combinatory logic. This model, like the other categorial models [1] [8] [13] [14] [16], falls under a paradigm for the analysis of languages which allows a complete abstraction of the grammatical structure from its linear representation and, also, a complete abstraction of the grammar from its lexicon. It conceptualizes languages as organized systems of linguistic units (words, morphemes, lexemes, etc.), some of which function as operators whereas others function as operands. It makes it possible to explicitly implement a logical-linguistic process which translates, in an effective way, observable texts of a natural language into a formal ideographic language of combinatory logic with universal properties. The expressions of this ideographic language, whatever the translated language, have a normal form which is useful mainly to clarify the conditions of correctness of the sentences, but also to extract “paraphrastic classes”.
Categorial Grammars conceptualize languages as a linear organization of linguistic units that function as operators or operands [7] [16]. Each lexical token is assigned a syntactic category. Syntactic categories are oriented types built from basic types and from two constructive operators ‘/’ and ‘\’:
(i) Basic types (N for noun, S for sentence, N* for noun phrase, etc.).
(ii) If X and Y are oriented types, then X/Y (the type of an operator whose typed operand Y is positioned on the right) and X\Y (the type of an operator whose typed operand Y is positioned on the left) are oriented types.
X/Y and X\Y are functional oriented types. A linguistic unit 'u' with the type X/Y (respectively X\Y) is considered as an operator (or function) whose typed operand Y is positioned on the right (respectively on the left) of the operator. The syntactic analysis is implemented by a mathematical model embodied in an inferential calculus of type simplification.
Combinatory logic was developed to analyze Russell's paradox and the concept of substitution.
Like the lambda-calculus of Church, combinatory logic is currently used by computer scientists to analyze the semantic properties of high-level programming languages. The principal difference between the two logics lies in the fact that combinatory logic is a variable-free logic. Combinatory logic uses abstract operators called combinators to make it possible to construct more complex operators starting from more elementary operators. Each combinator is introduced or eliminated by a β-reduction. For illustration, we present the β-reduction rules of B, Φ and C* (there are other combinators; the reader may consult [5] [10]):
((B U1 U2) U3) → (U1 (U2 U3))
(((Φ U1 U2) U3) U4) → (U1 (U2 U4) (U3 U4))
((C* U1) U2) → (U2 U1)
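As a minimal sketch, the three reduction rules above can be mimicked with curried Python functions. The names `B`, `PHI` (for the four-argument combinator) and `C_STAR`, as well as the toy operands, are illustrative assumptions and not part of the ACCG model itself.

```python
# Curried Python counterparts of the three combinators (illustrative names).

def B(u1):
    # ((B U1 U2) U3) reduces to (U1 (U2 U3))
    return lambda u2: lambda u3: u1(u2(u3))

def PHI(u1):
    # (((PHI U1 U2) U3) U4) reduces to (U1 (U2 U4) (U3 U4))
    return lambda u2: lambda u3: lambda u4: u1(u2(u4))(u3(u4))

def C_STAR(u1):
    # ((C* U1) U2) reduces to (U2 U1)
    return lambda u2: u2(u1)

# Toy operands, chosen only to make each reduction observable.
double = lambda n: 2 * n
succ = lambda n: n + 1
add = lambda a: lambda b: a + b

b_result = B(double)(succ)(5)            # same as double(succ(5))
phi_result = PHI(add)(double)(succ)(5)   # same as add(double(5))(succ(5))
c_star_result = C_STAR(5)(succ)          # same as succ(5)
```

Each combinator only rearranges how its arguments are applied to one another, which is why no variables are needed in the combinatory expressions themselves.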
U1, U2, U3, U4 are typed applicative expressions which function either as operators or as operands. According to the Church-Rosser Theorem, these rules establish a relationship, which is independent of the meaning of the arguments, between an expression with combinators and a single expression (if it exists) without combinators, called the normal form, that is equivalent to the first (from a certain point of view). In the ACCG model, normal forms represent the functional semantic interpretation, or what we called earlier the predicative representation.
Let us now give the rules of the ACCG model. The premise of each rule is the concatenation of linguistic units, whereas the consequence of each rule is the application of one linguistic unit to another one, with the possible introduction of one combinator.
Application rules:
[X/Y : u1] - [Y : u2]
--------------------- >
[X : (u1 u2)]

[Y : u1] - [X\Y : u2]
--------------------- <
[X : (u2 u1)]
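The two application rules can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `Fn`, `Unit`, `forward_apply` and `backward_apply` are hypothetical, oriented types are encoded as nested tuples, and applicative expressions are encoded as pairs (operator, operand).

```python
from typing import NamedTuple, Optional, Union

class Fn(NamedTuple):
    """Functional oriented type: X/Y or X\\Y in the paper's notation."""
    result: "Cat"
    slash: str   # "/" : operand on the right, "\\" : operand on the left
    arg: "Cat"

Cat = Union[str, Fn]   # a basic type is a plain string such as "S" or "N*"

class Unit(NamedTuple):
    cat: Cat
    term: object       # applicative expression: a string or a nested pair

def forward_apply(left: Unit, right: Unit) -> Optional[Unit]:
    # [X/Y : u1] - [Y : u2]  gives  [X : (u1 u2)]
    c = left.cat
    if isinstance(c, Fn) and c.slash == "/" and c.arg == right.cat:
        return Unit(c.result, (left.term, right.term))
    return None

def backward_apply(left: Unit, right: Unit) -> Optional[Unit]:
    # [Y : u1] - [X\Y : u2]  gives  [X : (u2 u1)]
    c = right.cat
    if isinstance(c, Fn) and c.slash == "\\" and c.arg == left.cat:
        return Unit(c.result, (right.term, left.term))
    return None

fwd = forward_apply(Unit(Fn("X", "/", "Y"), "u1"), Unit("Y", "u2"))
bwd = backward_apply(Unit("Y", "u1"), Unit(Fn("X", "\\", "Y"), "u2"))
```

In both cases the operator ends up first in the resulting pair, whatever its position in the concatenation, which mirrors the move from the observable order to the applicative order.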
Type raising rules:
[X : u]
------------------ >T
[Y/(Y\X) : (C* u)]

[X : u]
------------------

[X : u]
------------------ >Tx
[Y/(Y/X) : (C* u)]

[X : u]
------------------

Functional composition rules:
[X/Y : u1] - [Y/Z : u2]
----------------------- >B
[X/Z : (B u1 u2)]

[Y\Z : u1] - [X\Y : u2]
-----------------------

[X/Y : u1] - [Y\Z : u2]
----------------------- >Bx
[X\Z : (B u1 u2)]

[Y/Z : u1] - [X\Y : u2]
-----------------------
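The rules >T and >B can be sketched in the same spirit. The helper names `raise_forward` and `compose_forward` and the tuple encoding of terms are assumptions made for illustration; only the two forward rules whose consequences are stated in full above are covered.

```python
from typing import NamedTuple

class Fn(NamedTuple):
    """Functional oriented type, e.g. Fn("X", "/", "Y") for X/Y."""
    result: object
    slash: str   # "/" : operand on the right, "\\" : operand on the left
    arg: object

def raise_forward(x, u, y):
    # >T : [X : u]  gives  [Y/(Y\X) : (C* u)]
    return Fn(y, "/", Fn(y, "\\", x)), ("C*", u)

def compose_forward(left, right):
    # >B : [X/Y : u1] - [Y/Z : u2]  gives  [X/Z : (B u1 u2)]
    (c1, u1), (c2, u2) = left, right
    if (isinstance(c1, Fn) and isinstance(c2, Fn)
            and c1.slash == "/" and c2.slash == "/"
            and c1.arg == c2.result):
        return Fn(c1.result, "/", c2.arg), ("B", u1, u2)
    return None   # the two units cannot be composed by >B

raised = raise_forward("X", "u", "Y")
composed = compose_forward((Fn("X", "/", "Y"), "u1"), (Fn("Y", "/", "Z"), "u2"))
```

Unlike the application rules, these rules introduce a combinator (C* or B) into the applicative expression; it is eliminated later when the expression is reduced to its normal form.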
We think that categorial grammars, applicative systems, combinators and their reduction process are effective, fertile and sufficiently powerful concepts to precisely formulate assumptions on the nature of the markers in Arabic. Each analysis of an Arabic sentence is reduced to its normal form, which makes it possible to analyze its functional semantic interpretation and eventually to create paraphrastic classes. We think that this approach presents an alternative more in agreement with the empirical facts and especially with the requirements of formalization.
Let us now analyze the examples given above and show how we can give an account, by means of ACCG, of the agreement asymmetries in Arabic. The major challenge is, of course, the addition of morphological information on gender and number to categories. Category N*3PM assigned to al-awlaad-u (example 5) typifies a noun phrase of 3rd person plural masculine. N*3PF assigned to akhawaat-u–hum (example 11) typifies a noun phrase of 3rd person plural feminine. N*3M typifies a noun phrase of 3rd person masculine whose number is deliberately left unspecified. This category is used as part of the complex category (S/N*3M)\(S/N*) assigned to the marker –a (example 6). This marker is considered as an operator whose operand is the verb xaraj. The result of the application of the marker –a to the verb xaraj is an operator whose operand can be either singular or plural.
(5)
al-awlaad-u     xaraj     -uu
-----------     -----     ----------------
N*3PM           S/N*      (S\N*3PM)\(S/N*)
                --------------------------<
                S\N*3PM : (-uu xaraj)
------------------------------------------<
S : ((-uu xaraj) al-awlaad-u)
As claimed above, the marker -uu in the sentence (5) is an incorporated pronoun that agrees in all morphosyntactic features (gender, number and person) with the preverbal noun phrase al-awlaad-u. We establish this agreement through the assignment of types N*3PM and (S\N*3PM)\(S/N*) respectively to al-awlaad-u and –uu. In fact, -uu is considered as an operator that applies to the verb xaraj in order to construct an operator (-uu xaraj) of type S\N*3PM whose operand is a noun phrase (of the 3rd person plural masculine) on its left.
(6)
xaraj     -a                  al-awlaad-u
-----     ---------------     -----------
S/N*      (S/N*3M)\(S/N*)     N*3PM
-------------------------<
S/N*3M : (-a xaraj)
----------------------------------------->
S : ((-a xaraj) al-awlaad-u)
In sentence (6), the marker –a agrees only in gender with the subject positioned on the right of the verb. -a is considered as an operator that applies to the verb xaraj in order to construct an operator (-a xaraj) of type S/N*3M whose operand is a noun phrase (the subject of the 3rd person masculine) on its right. This noun phrase can be either plural or singular.
(7)
*xaraj    -uu                  al-awlaad-u
------    ----------------     -----------
S/N*      (S\N*3PM)\(S/N*)     N*3PM
---------------------------<
S\N*3PM : (-uu xaraj)
------------------------------------------
Analysis fails
The sentence (7) is ungrammatical. Its categorial analysis fails when one tries to apply (-uu xaraj) to the subject al-awlaad-u. In fact, (-uu xaraj) is the result of the application of the marker –uu to the verb xaraj. As in the sentence (5), -uu is still considered as an incorporated pronoun that agrees in all morphosyntactic features (gender, number and person) with a preverbal noun phrase. But in the case of this sentence, the noun phrase (the subject) al-awlaad-u is postverbal. That is why the analysis of the sentence fails.
(8)
hum      xaraj     -uu
----     -----     ---------------
P3PM     S/N*      (S\P3PM)\(S/N*)
         -------------------------<
         S\P3PM : (-uu xaraj)
----------------------------------<
S : ((-uu xaraj) hum)
(9)
xaraj     -uu                 hum
-----     ---------------     ----
S/N*      (S/P3PM)\(S/N*)     P3PM
-------------------------<
S/P3PM : (-uu xaraj)
---------------------------------->
S : ((-uu xaraj) hum)
When the preverbal nominative (in this case hum) is an independent pronoun, as in (8), the agreement with the marker –uu is rich. This agreement is implemented by assigning types P3PM and (S\P3PM)\(S/N*) respectively to hum and -uu. The agreement is also rich in (9), when the pronoun hum is postverbal. However, the type (S/P3PM)\(S/N*) assigned to -uu in this case yields a complex operator (-uu xaraj) whose operand is positioned on its right, not on its left as in (8).
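The analyses (5) to (9) can be replayed with a small checker. The sketch below rests on several assumptions that are not stated in the paper: the helper names (`Fn`, `fits`, `combine`, `analyse`) are hypothetical, the lexicon only lists the types used in derivations (5) to (9), the pronoun type is written P3PM throughout, and an underspecified type such as N*3M is taken to accept any noun phrase type that carries at least its features (so N*3M accepts N*3PM).

```python
import re
from typing import NamedTuple

class Fn(NamedTuple):
    result: object
    slash: str   # "/" : operand on the right, "\\" : operand on the left
    arg: object

def fits(expected, actual) -> bool:
    """Assumed matching: same base type, expected features included in actual."""
    if isinstance(expected, Fn) or isinstance(actual, Fn):
        return expected == actual
    e = re.fullmatch(r"(N\*|P|S)(.*)", expected)
    a = re.fullmatch(r"(N\*|P|S)(.*)", actual)
    return e.group(1) == a.group(1) and set(e.group(2)) <= set(a.group(2))

VERB = Fn("S", "/", "N*")
LEXICON = {
    "al-awlaad-u": ["N*3PM"],
    "hum": ["P3PM"],
    "-uu": [Fn(Fn("S", "\\", "N*3PM"), "\\", VERB),   # used in (5) and (7)
            Fn(Fn("S", "\\", "P3PM"), "\\", VERB),    # used in (8)
            Fn(Fn("S", "/", "P3PM"), "\\", VERB)],    # used in (9)
    "-a": [Fn(Fn("S", "/", "N*3M"), "\\", VERB)],     # used in (6)
}

def combine(left, right):
    """Forward or backward application on (category, term) pairs."""
    (lc, lt), (rc, rt) = left, right
    if isinstance(lc, Fn) and lc.slash == "/" and fits(lc.arg, rc):
        return lc.result, (lt, rt)
    if isinstance(rc, Fn) and rc.slash == "\\" and fits(rc.arg, lc):
        return rc.result, (rt, lt)
    return None

def analyse(words):
    """Return the applicative term of the first successful analysis, else None."""
    verb_at = words.index("xaraj")
    marker = words[verb_at + 1]
    nominal = (words[:verb_at] + words[verb_at + 2:])[0]
    for marker_cat in LEXICON[marker]:
        verbal = combine((VERB, "xaraj"), (marker_cat, marker))
        if verbal is None:
            continue
        for cat in LEXICON[nominal]:
            pair = (cat, nominal)
            result = combine(pair, verbal) if verb_at > 0 else combine(verbal, pair)
            if result is not None and result[0] == "S":
                return result[1]
    return None

sentence_5 = analyse(["al-awlaad-u", "xaraj", "-uu"])
sentence_7 = analyse(["xaraj", "-uu", "al-awlaad-u"])
```

Under these assumptions, `sentence_5` is the nested pair for ((-uu xaraj) al-awlaad-u), while `sentence_7` is `None` because no type of -uu in this lexicon builds an operator that takes a noun phrase on its right.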
For sentences with coordination, (10) and (11), the same implementation rules apply. Thus, when the subject (in the first member of the coordination) is a noun phrase which occurs after the verb, the agreement is poor (just in gender), whereas when it takes the form of an independent pronoun, the agreement is rich (in gender and number). We express this by assigning types (S/N*3M)\(S/N*) and (S/P3PM)\(S/N*) respectively to -a and -uu. In the first case the agreement takes into account only the gender, whereas in the second case it takes into account both the gender and the number. Finally, the categorial types assigned to -uu allow the parser to distinguish cases of rich agreement from cases of poor agreement. Let us give the analysis of the sentence (10):
(10)
Xaraj     -a                  al-awlaad-u     wa          al-banaat-u
-----     ---------------     -----------     -------     -----------
S/N*      (S/N*3M)\(S/N*)     N*3PM           (X\X)/X     N*3PF

1. ((wa (C* al-banaat-u) (B (C* al-awlaad-u) –a)) Xaraj)
2. (( (C* al-banaat-u) (B (C* al-awlaad-u) –a)) Xaraj)
3. ( ((C* al-banaat-u) Xaraj) ((B (C* al-awlaad-u) –a) Xaraj))
4. ( (Xaraj al-banaat-u) ((B (C* al-awlaad-u) –a) Xaraj))
5. ( (Xaraj al-banaat-u) ((C* al-awlaad-u) (–a Xaraj)))
6. ( (Xaraj al-banaat-u) ((–a Xaraj) al-awlaad-u))
Let us give the analysis of the sentence (11):
(11)
xaraj     -uu                 hum      wa          akhawaat-u–hum
-----     ---------------     ----     -------     --------------
S/N*      (S/P3PM)\(S/N*)     P3PM     (X\X)/X     N*3PF
The steps of the construction of the functional semantic interpretation are:
1. ((wa (C* akhawaat-u–hum) (B (C* hum) –uu)) xaraj)
2. (( (C* akhawaat-u–hum) (B (C* hum) –uu)) xaraj)
3. ( ((C* akhawaat-u–hum) xaraj) ((B (C* hum) –uu) xaraj))
4. ( (xaraj akhawaat-u–hum) ((B (C* hum) –uu) xaraj))
5. ( (xaraj akhawaat-u–hum) ((C* hum) (–uu xaraj)))
6. ( (xaraj akhawaat-u–hum) ((–uu xaraj) hum))
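The reduction to normal form can be sketched as a small symbolic rewriter over applicative terms. This is an illustration under explicit assumptions: terms are encoded as nested pairs (operator, operand), `app` and `normalize` are hypothetical helper names, `PHI` stands for the four-argument combinator of Section 3, and `AND` is a placeholder for the coordinator, whose exact symbol is not assumed here.

```python
def app(f, *args):
    """Left-associative application: app(f, a, b) builds ((f a) b)."""
    for a in args:
        f = (f, a)
    return f

ARITY = {"B": 3, "PHI": 4, "C*": 2}

def normalize(term):
    """Eliminate B, PHI and C* by repeatedly applying their reduction rules."""
    # Unwind the application spine: ((h a1) a2) becomes h with [a1, a2].
    head, args = term, []
    while isinstance(head, tuple):
        head, arg = head
        args.insert(0, arg)
    n = ARITY.get(head)
    if n is not None and len(args) >= n:
        u, rest = args[:n], args[n:]
        if head == "B":        # ((B U1 U2) U3) gives (U1 (U2 U3))
            contracted = app(u[0], app(u[1], u[2]))
        elif head == "PHI":    # (((PHI U1 U2) U3) U4) gives (U1 (U2 U4) (U3 U4))
            contracted = app(u[0], app(u[1], u[3]), app(u[2], u[3]))
        else:                  # ((C* U1) U2) gives (U2 U1)
            contracted = app(u[1], u[0])
        return normalize(app(contracted, *rest))
    return app(head, *[normalize(a) for a in args])

# Term in the spirit of step 2 for sentence (11), with assumed PHI and AND.
start = app("PHI", "AND",
            app("C*", "akhawaat-u-hum"),
            app("B", app("C*", "hum"), "-uu"),
            "xaraj")
normal = normalize(start)
```

With this encoding, `normal` contains no combinator: the coordinator placeholder is applied to (xaraj akhawaat-u-hum) and to ((-uu xaraj) hum), which corresponds to the shape of step 6 above.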
To conclude, when the agent responsible for the verbal action is a noun phrase, the categorial type (S\N*3PM)\(S/N*) is assigned to the suffix -uu (for plural) attached to the verb. This type validates the sentence only if the noun phrase is preverbal. The categorial types (S\N*3M)\(S/N*) or (S/N*3M)\(S/N*) are assigned to the suffix –a (for singular), depending on whether the noun phrase is preverbal or postverbal. When the agent responsible for the verbal action is represented by an independent pronoun, the categorial types (S\P3PM)\(S/N*) or (S/P3PM)\(S/N*) are assigned to the suffix –uu (for plural) attached to the verb, depending on whether the pronoun is preverbal or postverbal.
4. Conclusion
We have shown how our linguistic analysis is to be integrated into the ACCG framework to better understand the agreement asymmetries in Arabic. These asymmetries can be resolved by stipulating three hypotheses. The ACCG framework is shown to be strong enough to account for phenomena at the interface between morphology and syntax. We still have to go further to see whether this can be done for other structures and for other varieties of Arabic (i.e. modern Arabic dialects), where agreement patterns are different from those presented in this paper.
References
[1] Baldridge, Jason, and Geert-Jan Kruijff (2003). Multi-Modal Combinatory Categorial Grammar. In EACL 2003 Proceedings.
[2] Biskri, Ismaïl (2008). A typed logic for the categorial annotation of coordination in Arabic. In Proceedings of the 2008 Florida Artificial Intelligence Society. AAAI Press.
[3] Biskri, Ismaïl, and Jean-Pierre Desclés (2006). Coordination de Catégories Différentes en Français. Faits de Langue No 28.
[4] Bresnan, Joan, and Sam A. Mchombo (1987). Topic, Pronoun and Agreement in Chichewa. Language, vol. 63, no 4, pp. 741–782.
[5] Curry, Haskell B., and Robert Feys (1958). Combinatory Logic Vol. I. Amsterdam: North-Holland.
[6] Desclés, Jean-Pierre, Guibert, G., and Benoit Sauzay (2018). Logique combinatoire et λ-calcul : des logiques d’opérateurs. Éditions Cépaduès.
[7] Desclés, Jean-Pierre (1996). Cognitive and Applicative Grammar: an Overview. C. Martin Vide, ed. Lenguajes Naturales y Lenguajes Formales XII, 29-60.
[8] Dowty, David (2000). The Dual Analysis of Adjuncts/Complements in Categorial Grammar. In Linguistics 17.
[9] Fassi Fehri, Abdelkader (1993). Issues in the Structure of Arabic Clauses and Words. Springer.
[10] Hindley, J. Roger, and Jonathan P. Seldin (2008). Lambda-calculus and Combinators, an Introduction. Cambridge University Press.
[11] Jebali, Adel (2009). La modélisation des marqueurs d’Arguments de l’arabe standard dans le cadre des grammaires à base de contraintes. PhD dissertation. Université du Québec à Montréal, Canada.
[12] Li, Charles N., and Sandra A. Thompson (1976). Subject and topic: A new typology of language. Subject and Topic, ed. by Charles N. Li, 458-89. New York: Academic Press.
[13] Moortgat, Michael (1997). Categorial Type Logics. In Johan Van Benthem and Alice Ter Meulen, eds., Handbook of Logic and Language, 93-177. Amsterdam: North Holland.
[14] Morrill, Glyn (1994). Type-Logical Grammar. Dordrecht: Kluwer.
[15] Shaumyan, Sebastian K. (1998). Two Paradigms of Linguistics: The Semiotic Versus Non-Semiotic Paradigm. In Web Journal of Formal, Computational and Cognitive Linguistics.
[16] Steedman, Mark (2000). The Syntactic Process. MIT Press/Bradford Books.