Tutorial: ACL-23: Indirectly Supervised Natural Language Processing


Wenpeng Yin, Muhao Chen, Ben Zhou, Qiang Ning, Kai-Wei Chang and Dan Roth.

Date and Time

July 9, 2023, 9:30am ET.

Goal of Tutorial:

This tutorial targets researchers and practitioners who are interested in ML technologies for NLP that learn from indirect supervision. In particular, we will present a diverse thread of indirect supervision studies that try to answer the following questions: (i) when and how can we provide supervision for a target task T if all we have is data that corresponds to a "related" task T'? (ii) humans do not rely on exhaustive supervision; they learn from occasional feedback and from incidental signals arising from various sources; how can we effectively incorporate such supervision into machine learning? (iii) how can we leverage multi-modal supervision to help NLP? To this end, we will discuss several lines of research that address these challenges, including (i) indirect supervision from T' that handles T with outputs spanning from a moderate size to an open space, (ii) the use of sparsely occurring and incidental signals, such as partial labels, noisy labels, knowledge-based constraints, and cross-domain or cross-task annotations---all having statistical associations with the task, (iii) principled ways to measure and understand why these incidental signals can contribute to our target tasks, and (iv) indirect supervision from vision-language signals. We will conclude the tutorial by outlining directions for further investigation.


Conventional approaches to NLP rely on large volumes of task-specific labeled examples. This does not work in scenarios where tasks are too complicated or costly to annotate, or where the system must handle a new task immediately. It is increasingly assumed that, because pretrained language models (PLMs) use self-supervision, no further supervision is needed. While this may be true for encoder-only models (e.g., BERT), it does not hold for decoder models, which nowadays rely on vast amounts of supervision and reinforcement learning signals. It is therefore still desirable to gather supervision that already exists in related tasks or that is cheap to obtain, which we term "indirect supervision" in this tutorial.

Recently, a growing body of work has studied indirect supervision for a wide range of NLP tasks. For example, prior studies leveraged the rich annotations of a source task (natural language inference or summarization, respectively) to address poorly-annotated target tasks. To make better use of natural texts, other work has proposed exploring incidental supervision, e.g., phonetic similarity and similar temporal distributions for named entity transliteration, to help downstream tasks. Such incidental supervision typically consists of weak signals that exist in the data and the environment independently of the task at hand, and it is hard to encode with PLMs. Furthermore, when accessing supervision from pure text is challenging, researchers have turned to other modalities for indirect supervision.

This tutorial presents a comprehensive introduction to these lines of frontier research on indirectly supervised NLP. In particular, it tries to answer the following questions: (i) Which source tasks can be most readily adapted to solve various target tasks, and what constraints apply? (ii) What are the limitations of pretrained language models in discovering supervision from natural texts, and how can we alleviate them with incidental signals? (iii) Are there theoretical measures that can indicate the benefit of incidental signals for a given downstream task? (iv) How can we bridge the gap between modalities if we want image/video knowledge to guide NLP? Given the importance of these questions, we believe a timely tutorial is needed to comprehensively summarize the new frontiers in indirectly supervised NLP research and to point out the emerging challenges that deserve further investigation. Participants will learn about recent trends and emerging challenges in this topic, representative tools and learning resources for obtaining ready-to-use models, and how related technologies benefit end-user NLP applications.

Tutorial Outline

(Automatic Summary)

Background and Motivation [20 min]

We will begin by motivating this topic with a selection of real-world applications and the emerging challenges of NLP with limited end-task annotations.

Indirect Supervision from NLU Tasks [30 min + 5 min QA]

We begin with indirect supervision from source tasks that efficiently handle target tasks with moderately sized output spaces. In many zero/few-shot text classification tasks, such as topic classification, entity typing, and relation identification, the main challenge is enabling systems to understand label semantics. Instead of converting labels into indices as conventional supervised classifiers do, we introduce NLI-based approaches that consider both input and label semantics. Specifically, we discuss various works that treat different topics, stances, entity types, event types, relations, and question-answer scenarios as hypotheses and use pretrained NLI systems for classification tasks with a given set of labels. Furthermore, we present extractive question answering (Ex-QA) based supervision for downstream tasks. The advantage of Ex-QA-based indirect supervision over NLI-based supervision is its ability to handle sequence tagging and span detection tasks, while NLI-based approaches mainly focus on classification.
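To make the NLI reformulation concrete, here is a minimal sketch of casting classification as entailment: the input text becomes the premise and each candidate label is verbalized into a hypothesis. The template and the entailment scorer below are illustrative assumptions; a real system would call a pretrained NLI model rather than the toy word-overlap stand-in used here.

```python
# Sketch: zero-shot topic classification via an NLI reformulation.
# The premise is the input text; each candidate label is verbalized into a
# hypothesis, and the label whose hypothesis is most "entailed" wins.

def verbalize(label: str) -> str:
    # Hypothetical template; real systems tune such templates per task.
    return f"This text is about {label}."

def entailment_score(premise: str, hypothesis: str) -> float:
    # Placeholder scorer: fraction of hypothesis content words found in the
    # premise. A real pipeline would use a pretrained NLI model instead.
    stop = {"this", "text", "is", "about", "the", "a"}
    words = [w.strip(".").lower() for w in hypothesis.split()]
    content = [w for w in words if w not in stop]
    hits = sum(w in premise.lower() for w in content)
    return hits / max(len(content), 1)

def classify(text: str, labels: list[str]) -> str:
    # Pick the label whose verbalized hypothesis scores highest.
    return max(labels, key=lambda lab: entailment_score(text, verbalize(lab)))

print(classify("The striker scored twice in the final match.",
               ["sports match", "politics", "cooking"]))
```

The key design point is that labels enter the model as natural language, so their semantics are shared with the input representation rather than reduced to opaque indices.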

Indirect Supervision from NLG and IR [30 min + 5 min QA]

We will introduce methodologies that utilize indirect supervision signals from natural language generation (NLG) and information retrieval tasks to tackle low-resource discriminative tasks more effectively. By formulating discriminative tasks as generation tasks, we can guide PLMs to leverage the semantics of decision labels more efficiently. This approach typically results in a sequence-to-sequence generation process that produces a verbalization of the decision label based on the input sequence. Instead of predicting classification logits, these models represent the class as a concise structure and use controlled decoding for generation. This allows for cross-task signal transfer from high-resource NLG tasks and captures a semantically rich representation of the discriminative task's original decision space. We will also introduce methods that reformulate tasks as retrieval tasks, enabling the use of a dense retrieval model's inductive bias to handle discriminative tasks with large decision spaces, such as entity linking and fine-grained typing.
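As a sketch of the retrieval reformulation, the snippet below frames entity linking as nearest-neighbor search in a shared embedding space. The entity catalog and the bag-of-words "encoder" are hand-made stand-ins for a trained dense bi-encoder, kept only so the example stays self-contained.

```python
# Sketch: entity linking framed as dense retrieval.
# A mention and every entity description are embedded into a shared space;
# the entity with the highest cosine similarity to the mention wins.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "encoder": a word-count vector (real systems use a learned encoder).
    return Counter(text.lower().split())

def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[w] * v[w] for w in u)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def link(mention: str, entities: dict[str, str]) -> str:
    # Retrieve the catalog entity whose description is nearest to the mention.
    q = embed(mention)
    return max(entities, key=lambda e: cosine(q, embed(entities[e])))

catalog = {  # hypothetical two-entry entity catalog
    "Paris_(city)": "capital city of France on the Seine",
    "Paris_Hilton": "American media personality and businesswoman",
}
print(link("the French capital city", catalog))
```

Because the decision space is a set of retrievable descriptions rather than a fixed softmax layer, the same model scales to very large and open label sets.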

Incidental Supervision from Natural Text [25 min + 5 min QA]

We recognize that natural texts contain numerous incidental signals that can be used for various downstream tasks with minimal human effort. Although PLMs can provide these signals for a wide range of tasks, they lack control over the types of knowledge they contain. Therefore, we introduce incidental relations found in text spans, such as keywords and linguistic patterns, which can offer supervision for tasks like relation extraction, temporal reasoning, and affordance reasoning. Additionally, global information like publication dates, titles, and authors can establish relations that assist with complex tasks. By designing and collecting these linguistic patterns, we inject human knowledge into the process, creating diverse automatic supervision for many tasks that PLMs cannot find on their own.
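The idea of harvesting incidental signals with hand-designed linguistic patterns can be sketched as follows. The two patterns and the relation names are illustrative assumptions, not patterns from the tutorial; the point is that a few such rules yield noisy labels from raw text at essentially no annotation cost.

```python
# Sketch: weak supervision from linguistic patterns in raw text.
# Each regex turns a surface pattern into a noisy (head, relation, tail)
# triple, injecting human knowledge that a PLM may not surface on its own.
import re

PATTERNS = [  # hypothetical patterns for two example relations
    (re.compile(r"(\w[\w ]*?) was born in ([\w ]+)"), "born_in"),
    (re.compile(r"(\w[\w ]*?) is the capital of ([\w ]+)"), "capital_of"),
]

def weak_label(sentence: str) -> list[tuple[str, str, str]]:
    """Return (head, relation, tail) triples matched by any pattern."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(sentence):
            triples.append((m.group(1).strip(), relation, m.group(2).strip()))
    return triples

print(weak_label("Marie Curie was born in Warsaw."))
```

In practice such pattern-derived labels are noisy, so they are typically combined, filtered, or denoised before training a downstream model.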

Theoretical Analysis of Incidental Supervision [25 min + 5 min QA]

Having presented real-world applications of incidental signals, we now discuss the challenges of measuring their benefits and understanding their impact on the learning problem. We introduce existing efforts that cover a unified theoretical framework for multi-class classification and a PAC-Bayesian-motivated informativeness measure, PABI. We share studies demonstrating PABI's effectiveness in quantifying the value added by incidental signals to sequence tagging tasks. Lastly, we identify gaps in this research area and suggest future research directions.

Indirect Supervision from Multi-modalities [25 min + 5 min QA]

In the previous section, we discussed leveraging indirect supervision from text data. Now, we will explore methods that use indirect supervision in multimodal data for cross-modality tasks, focusing on vision-language tasks. We will introduce methods that align visual and text tokens based on image caption data, which can be applied to various text, image, and mixed modality tasks. Additionally, we will present approaches that use only indirect supervision from object recognition models to learn text-image alignment from unaligned language and vision data. Finally, we will delve into methods for learning to ground language elements to image regions without explicit supervision.
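A common ingredient of the text-image alignment methods above is a contrastive objective that pulls paired image and caption embeddings together while pushing mismatched pairs in a batch apart. The sketch below implements a generic symmetric InfoNCE loss on tiny hand-made vectors; the vectors and the temperature value are placeholders for real encoder outputs, and this is a generic formulation rather than any specific method covered in the tutorial.

```python
# Sketch: contrastive image-text alignment (CLIP-style objective).
# Matched (image, caption) pairs should score higher than all mismatched
# pairs in the batch, in both the image->text and text->image directions.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def contrastive_loss(img_vecs, txt_vecs, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of aligned (image, text) pairs."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    n = len(img_vecs)
    loss = 0.0
    for axis in (0, 1):  # 0: image->text, 1: text->image
        for i in range(n):
            logits = [(dot(img_vecs[i], txt_vecs[j]) if axis == 0
                       else dot(img_vecs[j], txt_vecs[i])) / temperature
                      for j in range(n)]
            loss -= math.log(softmax(logits)[i])  # correct partner is index i
    return loss / (2 * n)

# Perfectly aligned pairs yield a lower loss than shuffled ones.
aligned = [[1.0, 0.0], [0.0, 1.0]]
print(contrastive_loss(aligned, aligned) <
      contrastive_loss(aligned, aligned[::-1]))
```

Minimizing this loss is what lets caption-level supervision stand in for explicit region-level or token-level alignment labels.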

Future Research Directions [15 min]

We explore indirect supervision as a solution for NLP tasks with limited labeled data. We will conclude by discussing challenges and potential research topics, including explaining model predictions with indirect supervision, incorporating incidental signals expressing human knowledge, and using task instructions as supervision.


  • Tutorial slides