I do research in natural language processing.
I was an early employee at Aylien -- we built NLP as a service before it was cool. Aylien was acquired by Quantexa in early 2023, and I had the opportunity to build a new team focused on applied NLP. The Quantexa NLP (QNLP) team works on a mix of product-focused prototyping and innovation, as well as open-ended exploratory research.
For a complete list of publications, please see my Google Scholar Profile.
Please connect on LinkedIn; I check it regularly.
GLiREL -- Generalist Model for Zero-Shot Relation Extraction
We introduce GLiREL (Generalist Lightweight model for zero-shot Relation Extraction), an efficient architecture and training paradigm for zero-shot relation classification. Inspired by recent advancements in zero-shot named entity recognition, this work presents an approach to efficiently and accurately predict zero-shot relationship labels between multiple entities in a single forward pass. Experiments using the FewRel and WikiZSL benchmarks demonstrate that our approach achieves state-of-the-art results on the zero-shot relation classification task. In addition, we contribute a protocol for synthetically generating datasets with diverse relation labels.
Code
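The single-forward-pass idea can be illustrated with a toy score computation: embed each candidate relation label and each entity pair, then score every combination with one matrix multiply. This is a minimal sketch of the scoring pattern, not the GLiREL implementation; the random tensors stand in for encoder outputs.

```python
# Illustrative sketch (not the GLiREL implementation): score every entity pair
# against every candidate relation label in one forward pass. The tensors here
# are placeholders; GLiREL jointly encodes the labels and the input text.
import torch

torch.manual_seed(0)

num_pairs, num_labels, dim = 4, 3, 64   # e.g. 4 entity pairs, 3 candidate relations

# Placeholder embeddings standing in for encoder outputs:
pair_reprs = torch.randn(num_pairs, dim)    # one vector per (head, tail) entity pair
label_reprs = torch.randn(num_labels, dim)  # one vector per relation label, e.g.
                                            # ["founded", "headquartered in", "spouse of"]

# A single matrix multiply scores all (pair, label) combinations at once,
# which is what makes classifying many pairs in one pass cheap.
scores = pair_reprs @ label_reprs.T          # shape: (num_pairs, num_labels)
probs = torch.sigmoid(scores)                # independent per-label probabilities

# Pick a label per pair, with a threshold standing in for "no relation".
best = probs.argmax(dim=-1)
confident = probs.max(dim=-1).values > 0.5
for i, (label_idx, keep) in enumerate(zip(best, confident)):
    print(f"pair {i}: label {label_idx.item() if keep else 'NO_RELATION'}")
```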
KGValidator: A Framework for Automatic Validation of Knowledge Graph Construction
This study explores the use of Large Language Models (LLMs) for automatic evaluation of knowledge graph (KG) completion models. Historically, validating information in KGs has been a challenging task, requiring large-scale human annotation at prohibitive cost. With the emergence of general-purpose generative AI and LLMs, it is now plausible that human-in-the-loop validation could be replaced by a generative agent. We introduce a framework for ensuring consistency and validity when using generative models to validate knowledge graphs.
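As a rough illustration of replacing human-in-the-loop validation with a generative agent, here is a minimal LLM-as-judge sketch. The prompt wording, the `validate_triple` helper, and the `llm` callable are illustrative assumptions, not the KGValidator API.

```python
# Minimal sketch of LLM-as-judge triple validation (prompt format and the
# `llm` callable are illustrative assumptions, not the KGValidator API).
import json
from typing import Callable

PROMPT = """You are validating a knowledge graph triple.
Context: {context}
Triple: ({head}, {relation}, {tail})
Answer with JSON: {{"valid": true/false, "reason": "<short explanation>"}}"""

def validate_triple(llm: Callable[[str], str], head: str, relation: str,
                    tail: str, context: str) -> dict:
    """Ask a generative model whether a triple is supported by the context."""
    prompt = PROMPT.format(context=context, head=head, relation=relation, tail=tail)
    raw = llm(prompt)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Guard against malformed generations: treat unparseable output as invalid.
        return {"valid": False, "reason": "unparseable model output"}

# Usage with a stub model (swap in a real LLM client in practice):
fake_llm = lambda prompt: '{"valid": true, "reason": "stated in context"}'
print(validate_triple(fake_llm, "Aylien", "acquired_by", "Quantexa",
                      "Aylien was acquired by Quantexa in early 2023."))
```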
News Signals: An NLP Library for Text and Time Series
News Signals is an open-source Python library for building and using datasets where inputs are clusters of textual data, and outputs are sequences of real values representing one or more time series signals. The news-signals library supports diverse data science and NLP problem settings related to the prediction of time series behaviour using textual data feeds. For example, in the news domain, inputs are document clusters corresponding to daily news articles about a particular entity, and targets are explicitly associated real-valued time series: the volume of news about a particular person or company, or the number of pageviews of specific Wikimedia pages. Despite many industry and research use cases for this class of problem settings, to the best of our knowledge, News Signals is the only open-source library designed specifically to facilitate data science and research settings with natural language inputs and time series targets.
Code
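To make the problem setting concrete, here is a hypothetical data layout in plain pandas (not the news-signals API) pairing daily article clusters for an entity with an aligned real-valued target.

```python
# Illustrative data layout for the "text clusters in, time series out" setting
# (hypothetical structures, not the news-signals API).
import pandas as pd

# One cluster of texts per day for an entity, aligned with a real-valued target.
dates = pd.date_range("2023-01-01", periods=3, freq="D")
article_clusters = pd.Series(
    [["Quantexa raises funding", "Quantexa expands team"],
     ["Quantexa launches product"],
     []],                      # days with no news are empty clusters
    index=dates, name="articles",
)
pageviews = pd.Series([1200, 3400, 900], index=dates, name="pageviews")

signal = pd.DataFrame({"articles": article_clusters, "pageviews": pageviews})

# A typical task in this setting: predict tomorrow's target from today's texts.
signal["target_next_day"] = signal["pageviews"].shift(-1)
print(signal)
```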
Efficient Unsupervised Sentence Compression by Fine-tuning Transformers with Reinforcement Learning
We design loss functions for unsupervised text compression that use auxiliary signals for compression quality, such as PLM-derived fluency and consistency with the source input. The models outperform existing approaches that use discrete search, and they are also very efficient at inference time due to a policy-based reinforcement learning training setup, which distills the ensemble of training signals into a single classification decision.
Code
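A minimal sketch of the policy-gradient setup: sample a binary keep/drop mask over the tokens, score it with a reward, and apply REINFORCE. The reward stub below stands in for the PLM-derived fluency and source-consistency signals.

```python
# Sketch of the policy-gradient setup (simplified; the reward stub stands in
# for PLM-derived fluency and source-consistency scorers).
import torch

torch.manual_seed(0)
num_tokens = 8
logits = torch.randn(num_tokens, requires_grad=True)  # stand-in for model outputs

def reward(mask: torch.Tensor) -> float:
    # Placeholder for fluency(compression) + consistency(compression, source);
    # here we just reward keeping roughly half the tokens.
    return -abs(mask.float().mean().item() - 0.5)

# Sample a keep/drop decision per token from the policy.
probs = torch.sigmoid(logits)
dist = torch.distributions.Bernoulli(probs=probs)
mask = dist.sample()

# REINFORCE: push up the log-probability of actions in proportion to reward.
loss = -reward(mask) * dist.log_prob(mask).sum()
loss.backward()
print("sampled mask:", mask.tolist(), "reward:", reward(mask))
```

At inference time no search or sampling is needed: the trained policy reduces to a single pass of per-token classification, which is where the efficiency comes from.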
DynE: Dynamic Ensemble Decoding for Multi-Document Summarization
We propose a novel ensembling method for multi-input sequence-to-sequence models. This simple method leverages large pre-trained single-document summarization models and achieves SOTA performance on two multi-document summarization benchmarks.
Code
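The core decoding step can be sketched in a few lines: at each timestep, run the single-document model on each input document conditioned on the shared output prefix, average the next-token distributions, and extend the prefix. This sketch uses greedy decoding and a stub scorer in place of a pre-trained summarizer.

```python
# Sketch of dynamic ensemble decoding: at each step, average the per-document
# next-token log-probabilities of a single-document model (greedy for brevity;
# the stub `log_probs` stands in for a pre-trained summarizer's decoder).
import torch

vocab_size, eos = 100, 0
docs = ["doc one text", "doc two text", "doc three text"]

def log_probs(doc: str, prefix: list[int]) -> torch.Tensor:
    # Placeholder: a real implementation conditions a seq2seq decoder on `doc`.
    g = torch.Generator().manual_seed(hash((doc, len(prefix))) % (2**31))
    return torch.log_softmax(torch.randn(vocab_size, generator=g), dim=-1)

prefix: list[int] = []
for _ in range(10):
    # Ensemble step: mean of per-document distributions over the shared prefix.
    avg = torch.stack([log_probs(d, prefix) for d in docs]).mean(dim=0)
    token = int(avg.argmax())
    if token == eos:
        break
    prefix.append(token)
print("generated token ids:", prefix)
```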
A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal
We present a new large-scale dataset for training and evaluating multi-document summarization systems in realistic settings.
Code
Evaluating the Supervised and Zero-shot Performance of Multi-lingual Translation Models
We evaluate several methods of decoder parameter sharing for multilingual translation models, and present a means of evaluating performance on language pairs where no references are available. Ours was the only single-model multilingual system at WMT 2019.
Deep Interactive Text Prediction and Quality Estimation in Translation Interfaces (Ph.D. Thesis)
In my Ph.D. thesis, I created novel Computer-Aided Translation interface components and developed new models for Machine Translation Quality Estimation and interactive Machine Translation.
Lexically Constrained Decoding for Sequence Generation Using Grid Beam Search
Grid Beam Search (GBS) extends the beam search algorithm to allow the inclusion of pre-specified lexical constraints. The algorithm can be used with any model that generates sequences token by token. Lexical constraints take the form of phrases or words that must be present in the output sequence. This is a very general way to incorporate auxiliary knowledge into a model's output without requiring any modification of the parameters or training data.
Code
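A simplified sketch of the grid (restricted to single-token constraints, with a stub scorer): hypotheses are binned by how many constraints they have covered, and at each step a hypothesis can either generate freely, staying in its bin, or place an unmet constraint token, moving up one bin. Only hypotheses that have covered every constraint count as complete.

```python
# Simplified Grid Beam Search sketch: single-token constraints only, with a
# stub scorer standing in for any token-by-token model p(y_t | y_<t).
import torch

vocab_size, beam_size, max_len = 50, 2, 6
constraints = [7, 19]  # token ids that must appear in the output

def step_log_probs(prefix: tuple) -> torch.Tensor:
    # Placeholder for the model's next-token distribution given the prefix.
    g = torch.Generator().manual_seed(len(prefix) + sum(prefix))
    return torch.log_softmax(torch.randn(vocab_size, generator=g), dim=-1)

# grid[c] holds the best hypotheses that have covered c constraints so far.
grid = {0: [((), 0.0, frozenset())]}  # (tokens, log score, met constraints)
for _ in range(max_len):
    new_grid: dict = {}
    for c, hyps in grid.items():
        for tokens, score, met in hyps:
            lp = step_log_probs(tokens)
            # 1) "generate": extend with the model's best unconstrained token.
            t = int(lp.argmax())
            new_grid.setdefault(c, []).append((tokens + (t,), score + lp[t].item(), met))
            # 2) "place a constraint": force an unmet constraint token.
            for ct in constraints:
                if ct not in met:
                    new_grid.setdefault(c + 1, []).append(
                        (tokens + (ct,), score + lp[ct].item(), met | {ct}))
    # Keep only the top beam_size hypotheses in each grid cell.
    grid = {c: sorted(h, key=lambda x: -x[1])[:beam_size] for c, h in new_grid.items()}

best = max(grid[len(constraints)], key=lambda x: x[1])  # all constraints covered
print("output tokens:", best[0], "score:", round(best[1], 3))
```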
Ensembling Factored Neural Machine Translation Models for Automatic Post-editing and Quality Estimation
Factored NMT models (models with multiple input representations) are applied to both Automatic Post-Editing and Word-Level Translation Quality Estimation by tuning for each task's evaluation metric, and they achieve good performance on both tasks.
Code
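The "multiple input representations" part can be sketched as a factored embedding layer: each source position carries several parallel factors (e.g. token, POS tag, aligned MT token), whose embeddings are concatenated before entering the encoder. The factor choices and dimensions below are illustrative, not the paper's configuration.

```python
# Sketch of a factored input representation: one embedding table per factor,
# concatenated per position (illustrative factors and sizes).
import torch
import torch.nn as nn

class FactoredEmbedding(nn.Module):
    def __init__(self, vocab_sizes: list, dims: list):
        super().__init__()
        self.tables = nn.ModuleList(
            nn.Embedding(v, d) for v, d in zip(vocab_sizes, dims))

    def forward(self, factors: torch.Tensor) -> torch.Tensor:
        # factors: (batch, seq_len, num_factors) of ids, one column per factor
        embedded = [table(factors[..., i]) for i, table in enumerate(self.tables)]
        return torch.cat(embedded, dim=-1)  # (batch, seq_len, sum(dims))

# Toy usage: 3 factors (word, POS tag, aligned token) for a batch of 2 sentences.
emb = FactoredEmbedding(vocab_sizes=[1000, 20, 1000], dims=[64, 8, 64])
ids = torch.randint(0, 20, (2, 5, 3))  # ids must be < each vocab size; 20 is safe
print(emb(ids).shape)  # torch.Size([2, 5, 136])
```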