
Doctoral Dissertation Research: Discourse relation annotation in speech databases

Project: Research

Project Details

Description

When speakers engage in conversation, they produce utterances that are meaningfully connected to what they and others have said, and these connections make the conversation or discourse coherent. For example, speakers may elaborate on what they just said, provide an explanation, or connect multiple events through narration. These connections are known as rhetorical or discourse relations between utterances, and they have been studied within multiple theories and frameworks. Each framework has proposed its own set of names for these relations and its own way of classifying them, but no study has yet investigated which categories people actually perceive. These relations are also difficult to identify, both for humans and for computational systems. In this doctoral dissertation project, the researchers investigate which and how many relations English speakers perceive, without presupposing any predefined list of discourse relations. In addition to training a doctoral student, the project collects data that will improve our understanding of how speakers perceive and categorize discourse relations; these data can be used to improve computational systems such as dialogue agents and writing assistants.

This project approaches the study of discourse relations from a semantic categorization perspective, in which categories of discourse relations are inferred from the similarity perceived by English speakers. The investigators test these categories by annotating spontaneous conversational data. The dissertation project consists of two studies: a categorization task, which includes an analysis of empirically derived categories of discourse relations in comparison with existing taxonomies, and a speech database study based on the annotation of discourse relations in a subset of the Switchboard database of telephone conversations.
Together, these studies develop an empirically grounded taxonomy of discourse relations suitable for spoken conversational data, compare how English speakers conceptualize discourse relations with existing taxonomies, and build a dataset of discourse relations in spontaneous conversations. The dataset is used to analyze variation and individual differences among speakers, and it can also serve as input or training data for computational systems and other applications. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
Status: Finished
Effective start/end date: 03/01/24 to 02/28/25

Funding

  • National Science Foundation: $19,320.00
