Dependency as Modality, Parsing as Permutation

Author: Konstantinos Kogkalidis
LOT Number: 648
ISBN: 978-94-6093-429-2
Pages: 239
Year: 2023
1st promotor: prof. dr. Michael Moortgat
Price: €36.00
Download this book as a free Open Access fulltext PDF

This thesis presents a novel approach to the processing and representation of natural language syntax and semantics, combining symbolic and neural techniques.

The symbolic core is powered by a linear type system that uses modalities to capture dependency structures on top of function-argument relations, enabling a more flexible and expressive way of representing grammatical utterances.
The practical applications of this approach are showcased through a computational study of Dutch, using a set of tools and resources developed specifically for this purpose. These include a large proofbank, i.e., a collection of sentences paired with tectogrammatic theorems and their corresponding programs, supported by an extensive type lexicon, which assigns types to almost one million lexical tokens in their linguistic context. Parsing is handled by a combination of static type-checking, a state-of-the-art supertagger based on heterogeneous graph convolutions, and a massively parallel proof search component formulated as a neural bijection learner.

Overall, this thesis demonstrates the power of an integrated neurosymbolic approach to natural language processing, one that combines the best of both worlds: the symbolic representation of meaning and the statistical power of modern neural networks.
