This thesis presents a novel approach to the processing and representation of natural language syntax and semantics, combining symbolic and neural techniques.
The symbolic core is powered by a linear type system that uses modalities to capture dependency structures on top of function-argument relations, yielding a representation of grammatical utterances that is more flexible and expressive than plain function-argument typings.
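To make the idea concrete, here is a minimal sketch of what such dependency-decorated types could look like, assuming a diamond modality for dependency-marked arguments and a box modality for adjuncts. The Dutch words leest ("reads") and snel ("quickly"), the dependency labels su, obj, and mod, and the atomic types NP and S are illustrative placeholders, not assignments quoted from the lexicon itself:

```latex
% Illustrative type assignments (hypothetical words and labels).
% A transitive verb selects its object and its subject, each request
% marked by a dependency-labelled diamond; an adverbial adjunct
% carries a box marking its modifier status.
% (\Diamond and \multimap require amssymb.)
\[
  \textit{leest} : \Diamond^{\mathit{obj}}\,\mathrm{NP}
    \multimap \Diamond^{\mathit{su}}\,\mathrm{NP}
    \multimap \mathrm{S}
\]
\[
  \textit{snel} : \Box^{\mathit{mod}}\,(\mathrm{S} \multimap \mathrm{S})
\]
```

Read left to right, the verb first consumes an object-marked noun phrase, then a subject-marked one, producing a sentence; the box on the adverb records that it modifies rather than completes the sentence.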
The practical applications of this approach are showcased through the computational study of Dutch, using a set of tools and resources developed specifically for this purpose. These include a large proofbank, i.e., a collection of sentences paired with tectogrammatic theorems and their corresponding programs, supported by an extensive type lexicon that provides in-context type assignments to almost one million lexical tokens. Parsing is handled by a combination of static type-checking, a state-of-the-art supertagger based on heterogeneous graph convolutions, and a massively parallel proof search component formulated as a neural bijection learner.
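As a rough sketch of the proof-search idea: if building a proof reduces to pairing each negative occurrence of an atomic type with exactly one positive occurrence, a neural model can score all candidate pairs jointly and be pushed towards a bijection with Sinkhorn normalisation. The snippet below is a minimal PyTorch sketch of that general recipe; the tensor shapes, the temperature tau, the iteration count, and the greedy decode are assumptions made for illustration, not the thesis's actual implementation.

```python
import torch

def sinkhorn(scores: torch.Tensor, iters: int = 10, tau: float = 0.05) -> torch.Tensor:
    """Normalise a square score matrix towards a doubly stochastic
    (soft permutation) matrix by alternating row/column steps in log space."""
    log_p = scores / tau
    for _ in range(iters):
        log_p = log_p - log_p.logsumexp(dim=1, keepdim=True)  # normalise rows
        log_p = log_p - log_p.logsumexp(dim=0, keepdim=True)  # normalise columns
    return log_p.exp()

# Hypothetical setup: n negative and n positive occurrences of one
# atomic type (say NP), each with a d-dimensional contextual embedding.
n, d = 4, 16
neg = torch.randn(n, d)
pos = torch.randn(n, d)

scores = neg @ pos.t()          # pairwise matching scores, shape (n, n)
soft_perm = sinkhorn(scores)    # differentiable, so trainable end to end
hard = soft_perm.argmax(dim=1)  # greedy decode; a Hungarian assignment
                                # would guarantee a genuine bijection
print(hard)  # index of the positive atom matched to each negative one
```

Because every atomic type contributes its own independent matching problem, the search parallelises naturally, which is what makes the massively parallel formulation possible.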
Overall, this thesis demonstrates the power of an integrated neurosymbolic approach to natural language processing, one that combines the best of both worlds: the symbolic representation of meaning and the statistical power of modern neural networks.