PI: Helen Armstrong
Co-PI: Dr. Matthew Peterson
Research Assistants: Rebecca Planchart & Kweku Baidoo

The NC State Laboratory for Analytic Sciences Team.

RESEARCH QUESTION. How can interactive visualizations of confidence scores enable language analysts to more effectively calibrate trust in speaker model outputs?

RESEARCH OBJECTIVE: Reveal potential innovations in visualization and interface design that will increase the likelihood of language analysts efficiently validating speaker model outputs, especially from a human-machine trust calibration perspective. 

OBJECTIVE 1: Explore and evaluate potential visualizations and UX patterns for signifying confidence and uncertainty in speaker model outputs. 

OBJECTIVE 2: Create three different visual prototypes — in this case, mockups that provide explicit visual specifications for implementation — representing three possible solutions to this problem space. These visual prototypes should be structured so that usability testing might be efficiently conducted by the IC at the conclusion of the project. 

In this five-month project, our team dug into UX/UI AI explainability challenges related to speech recognition and speaker diarization. Language analysts currently work with percentage scores that express the confidence and uncertainty of their speaker model’s output.

Analysts struggle to interpret and validate these numerical scores, for which existing software platforms provide little context. The scores and the surrounding user interface fail to provide context-sensitive layers of explainability, leading analysts to misunderstand and miscalibrate their trust in the model’s outputs.

Our team explored alternative visualization methods, situating the results within current language analyst workflows. We worked with LAS-side developers and potential users to create testable prototypes that demonstrate these alternative visualizations in use, revealing additional explainability features that might bolster existing software capabilities.

This material is based upon work done, in whole or in part, in coordination with the Department of Defense (DoD). Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the DoD and/or any agency or entity of the United States Government.

Resulting Prototype Scenario Videos

Square Digit Scenario Video

Arc Gauge Scenario Video

Bar Fill Scenario Video

Research Outcomes

Key Insight One: Analyst Assumptions About Confidence Scores. Most analysts believe that confidence scores consistently and logically measure the probability that the identified speaker is their target. However, analyst experience demonstrates that confidence scores do not carry equal weight.
Key Insight Two: Analyst Truth Marking Behavior. Analysts often rely on truth marking to validate speaker IDs, but analyst truth-marking behavior varies.
Key Insight Three: Delayed Feedback Frustration. Analysts grow frustrated when their truth marking is not immediately reflected in model outputs.
Clear signaling can make visual scanning far more efficient. Image of a block of gray letters with some letters highlighted in red.
Abstract concepts such as confidence lack a direct correlate in the real world. Icon: a drawing of a mouse. Index: a picture of a mousetrap. Symbol: the word “mouse.”
Depth of Engagement Framework. Chart with Shallow at one end of the spectrum and Deep at the other; Notice appears at the shallow end, with Read, Probe, and Inspect moving toward the deep end.
Depth of Engagement in Visual Elements and Interface Patterns. Chart listing the elements and patterns that fall at these different levels of engagement: notice, read, probe, and inspect.
Confidence Indicators. Three different visual systems for indicating confidence.
System 1: Square Digit. Dark blue and light blue background squares with black numerals inside, ranging from -1 to 9; a check mark indicates “10,” or human verified.
System 2: Arc Gauge. A five-tier range. 0% (unidentified speaker): an unfilled half circle. 20-45% (Fair): a slightly filled half circle. 45-80% (Good): a half-filled half circle. 80-99% (Excellent): an almost filled half circle. 100% (analyst verified): a completely filled half circle.
System 3: Bar Fill. A bar filled to different levels to indicate confidence. 0% (unidentified speaker): the bar is unfilled. 20-45% (Fair): slightly filled. 45-80% (Good): half filled. 90-99% (Excellent): almost filled. 100% (analyst verified): completely filled.
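To make the threshold bands above concrete, the sketch below maps a raw confidence percentage onto the qualitative levels used by the Arc Gauge and Bar Fill indicators. This is a minimal TypeScript illustration only; the band boundaries, the function name toConfidenceLevel, and the analystVerified flag are assumptions drawn from the descriptions above, not part of the prototype specification.

    // Qualitative levels used by the Arc Gauge and Bar Fill indicators.
    type ConfidenceLevel =
      | "Unidentified"       // 0%: no speaker match
      | "Fair"               // roughly 20-45%
      | "Good"               // roughly 45-80%
      | "Excellent"          // roughly 80-99%
      | "Analyst Verified";  // 100%: human truth-marked

    // Map a model confidence score (0-100) to an indicator level.
    // Band boundaries are assumptions based on the ranges described above.
    function toConfidenceLevel(score: number, analystVerified = false): ConfidenceLevel {
      if (analystVerified) return "Analyst Verified";
      if (score <= 0) return "Unidentified";
      if (score < 45) return "Fair";
      if (score < 80) return "Good";
      return "Excellent";
    }

    // Example: a 62% match would render as a half-filled gauge ("Good").
    console.log(toConfidenceLevel(62)); // "Good"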
Image of the visual confidence threshold setting feature: a sliding spectrum.
Image of the certainty description via click over confidence indicator feature.
Image of the Depth of Engagement framework in action.
Image of the truth marking history accessible on hover via “i” symbol feature.
Image of the feature where hovering indicates the last person to truth mark and the time to model update.
Image of the thumbs up/thumbs down feature in comparison to a three-point scale.
Image of the five alternative speaker match feature.
Image of the top speaker choice editing feature.
Impact: Reimagined confidence indicators can help analysts appropriately calibrate trust in AI, thus supporting successful human-machine teams.