Fitting a Square Peg into a Round Hole: Creating a UniMorph dataset of Kanien'kéha Verbs

  • National Research Council of Canada

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Scopus citations

Abstract

This paper describes efforts to annotate a dataset of verbs in the Iroquoian language Kanien'kéha (a.k.a. Mohawk) using the UniMorph schema (Batsuren et al., 2022a). The dataset is based on the output of a symbolic model: a hand-built verb conjugator. The morphological constituents of each verb are automatically annotated with UniMorph tags. Overall, the process was smooth, but some central features of the language did not fall neatly into the schema, which resulted in a large number of custom tags and a somewhat ad hoc mapping process. We think the same difficulties are likely to arise for other Iroquoian languages and perhaps for other North American language families. This paper describes our decision-making process with respect to Kanien'kéha and reports preliminary results of morphological induction experiments using the dataset.

Original language: English
Title of host publication: ComputEL 2024 - 7th Workshop on the Use of Computational Methods in the Study of Endangered Languages, Proceedings of the Workshop
Editors: Sarah Moeller, Godfred Agyapong, Antti Arppe, Aditi Chaudhary, Shruti Rijhwani, Christopher Cox, Ryan Henke, Alexis Palmer, Daisy Rosenblum, Lane Schwartz
Publisher: Association for Computational Linguistics (ACL)
Pages: 39-51
Number of pages: 13
ISBN (Electronic): 9798891760868
State: Published - 2024
Event: 7th Workshop on the Use of Computational Methods in the Study of Endangered Languages, ComputEL 2024 - Hybrid, St. Julian's, Malta
Duration: Mar 21 2024 - Mar 22 2024
