Co-Data Workshop @ CHI 2026

Accepted Papers

The following 20 papers have been accepted to the Co-Data workshop, exploring human–LLM collaboration across diverse data-intensive contexts.

AstraClean-RAG: Supporting Human–LLM Collaboration in Tabular Data Cleaning through Dual-Source Retrieval
Bingxiang Chen, Toni Taipalus · Tampere University, Finland
Abstract

Data cleaning is a critical component of data preparation, particularly for domain-specific tables where conventions and interpretations play an important role. While rule-based approaches provide transparency, they are difficult to maintain as data and practices evolve, and large language models (LLMs) can produce inconsistent or unreliable suggestions when used in isolation. In collaborative data settings, cleaning decisions often need to be revisited, compared and justified, as similar but not identical cases recur over time. We present AstraClean-RAG, a human–LLM collaborative framework for tabular data cleaning that uses Retrieval-Augmented Generation (RAG) to incorporate external evidence into repair suggestion generation. The framework retrieves information from two sources: domain-specific rules and historical logs of human-verified corrections. When retrieved evidence from different sources diverges, the system surfaces the conflicting information and produces neutral summaries to support human review, rather than automatically resolving the conflict. An exploratory evaluation on benchmark datasets shows that dual-source retrieval improves consistency and supports human-in-the-loop workflows, suggesting that LLMs can generate more grounded repairs while leaving ultimate judgment to human reviewers.
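A minimal sketch of what dual-source retrieval with conflict surfacing could look like in practice; the function and field names below are illustrative assumptions, not the authors' implementation:

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    source: str      # "domain_rules" or "repair_history"
    cell: str        # table cell the evidence applies to
    suggestion: str  # proposed repair value
    rationale: str   # why the source supports this repair

def retrieve_evidence(cell, rule_index, history_index, k=3):
    """Retrieve top-k evidence items from both sources for one cell."""
    rules = rule_index.search(cell, k)      # hypothetical retriever API
    history = history_index.search(cell, k)
    return rules + history

def propose_repair(cell, evidence):
    """Return a repair suggestion, or surface a conflict for human review."""
    suggestions = {e.suggestion for e in evidence}
    if len(suggestions) > 1:
        # Sources diverge: do not auto-resolve; hand both sides to the reviewer.
        return {"status": "conflict",
                "options": [(e.source, e.suggestion, e.rationale) for e in evidence]}
    return {"status": "suggestion", "value": suggestions.pop()}
```

The design point mirrored here is that divergent evidence is packaged for the reviewer rather than resolved by a tie-breaking heuristic.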

When Vibes Are Not Enough To Work With Data: Literacy and Skill Gaps In Vibe Coding
Besjon Cifliku · Center for Advanced Internet Studies (CAIS), Germany
Abstract

Using AI as a peer for coding and software development has been the subject of growing debate within the scientific community, especially regarding the agency, autonomy, and creativity of software developers. In this paper, I examine Human-AI collaboration from the perspective of non-technical journalists who could use AI to build journalistic tools. I consider vibe coding a collaborative tool that could shape users' data workflows, empowering them to develop their own tools for handling data. However, in this position paper, I argue that vibe coding, although portrayed as requiring no technical expertise, in fact demands a substantial level of computational literacy. I elaborate on the main issues HCI should target to effectively enable Human-AI collaboration that democratizes software creation and data processing.

Empowering Users in Graph Rule Mining via Large Language Models
Francesco Cambria, Francesco Invernici, Andrea Colombo, Anna Bernasconi
Abstract

In the era of interconnected data, graphs have emerged as an effective abstraction for modeling complex systems, especially with the rise of Property Graphs, which offer an intuitive and scalable way of navigating otherwise opaque structures. In this context, graph mining techniques have been developed for testing complex graph-based rules, such as the MINE GRAPH RULE operator, which, however, requires users to have prior expertise in both graph theory and formal query languages. In this work, we propose to bridge the gap between users and the graph-association rule-mining process by showing how Large Language Models (LLMs) can be easily prompted to formulate, refine, and interpret complex relational rules, directly producing MINE GRAPH RULE queries.
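A hedged sketch of the prompting step described above; the template and helper are our own illustration, and the MINE GRAPH RULE syntax itself is left to the model, since the operator's grammar is not reproduced here:

```python
PROMPT_TEMPLATE = """You are a graph-mining assistant.
Graph schema: {schema}
Analyst goal: {goal}
Write a MINE GRAPH RULE query that tests this rule, then explain it in plain language."""

def build_prompt(schema: str, goal: str) -> str:
    """Fill the template so a non-expert's goal becomes a query-generation request."""
    return PROMPT_TEMPLATE.format(schema=schema, goal=goal)

# Example usage with a hypothetical property-graph schema:
print(build_prompt("(:Customer)-[:BOUGHT]->(:Product)",
                   "find products often bought together"))
```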

Curation Cards: Making Data Curation Decisions Visible
Shreya Chappidi, Andra V. Krauze, Jatinder Singh · University of Cambridge; National Cancer Institute, NIH
Abstract

Early-stage decision-making during the algorithm development lifecycle, including problem formulation and data labeling, critically shapes downstream outcomes such as algorithmic performance and user adoption. Human-LLM collaborations bring multiple new dynamics to this early-stage decision-making as users can more explicitly shape data annotation and labeling practices. First, users may not understand how design choices made during data curation could alter or shape their intended problem formulations. Second, the non-deterministic nature of LLMs means that small (and often implicit) choices during data curation, such as prompt style or phrasing, can meaningfully influence the distribution of resulting data labels. To support early-stage decision-making on how to operationalize problem formulations into LLM-supported data curation practices, we propose curation cards—a structured documentation approach that captures the purpose and goals of data curation, data input and model specifications, labeling schema and concept definitions, system prompts, reference sources, prompt text and non-semantic details, pipeline design, and error analysis.
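As a speculative illustration, a curation card could be represented as a structured record whose fields mirror the list above; the schema below is an assumption, not the authors' template:

```python
from dataclasses import dataclass, field

@dataclass
class CurationCard:
    """Illustrative schema paraphrasing the fields named in the abstract."""
    purpose: str                   # goals of the data curation effort
    data_input: str                # dataset and model specifications
    labeling_schema: dict          # label names mapped to concept definitions
    system_prompt: str             # full system prompt used for labeling
    reference_sources: list = field(default_factory=list)
    prompt_text: str = ""          # user-facing prompt, verbatim
    non_semantic_details: dict = field(default_factory=dict)  # e.g., model version, temperature
    pipeline_design: str = ""      # how prompts, retries, and aggregation are wired
    error_analysis: str = ""       # known failure modes and their frequencies
```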

Towards A New Taxonomy to Design and Validate Human-AI Collaborative (HAIC) Workflows for Open Text Analysis
Madison Elliott, Yongwei Yang, Seth Margolis · Google, USA
Abstract

Open text analysis consistently faces challenges regarding subjectivity and scalability. While Large Language Models (LLMs) offer transformative potential, the field lacks structured frameworks for selecting specific modeling approaches and standardized validation methods. We propose a novel taxonomy for Human-AI Collaborative (HAIC) workflows, categorizing text analysis into three distinct scenarios: Low-Precision Descriptive Insights, High-Precision Thematic Sizing, and Granular Issue Flagging. By decomposing the analysis pipeline into seven core modular tasks, we define optimal distributions of labor between human expertise and machine scale, ranging from human-led architectural roles to AI-driven automation. Furthermore, we advocate for an argument-based evaluation approach to address LLM non-determinism, emphasizing the need for continuous calibration and uncertainty quantification. Our work provides a prototypical model for designing human-in-the-loop (HITL) agentic workflows, offering a roadmap for researchers to balance speed and rigor in evolving AI-driven research environments.

Understanding LLM Usage Behavior of Online Survey Participants
Ruijie Sophia Huang, Joey Li, Monica P. Van, Nayeli Suseth Bravo, Jonathan Q. Li, Matthew L. Lee · Toyota Research Institute, USA
Abstract

Recent studies report that many online participants use tools such as ChatGPT when completing surveys, especially for open-ended questions, raising concerns about whether responses reflect their own experiences and opinions. To better understand this, we instrumented an online survey with a custom LLM chatbot displayed side-by-side with the survey questions to observe how people collaborate with the LLM while answering. Our preliminary findings suggest that LLM use is shaped by question characteristics, influencing patterns such as the number of interaction turns and whether participants adopt or discard LLM-generated content, such that final responses alone do not fully capture how answers are produced. Based on these results, we suggest implications for designing survey experiences and data processing pipelines that account for hybrid human–LLM responses.

An LLM-Assisted Toolkit for Inspectable Multimodal Emotion Data Annotation
Zheyuan Kuang, Weiwei Jiang, Nicholas Koemel, Matthew Ahmadi, Emmanuel Stamatakis, Benjamin Tag, Anusha Withana, Zhanna Sarsenbayeva · University of Sydney; Monash University; UNSW, Australia
Abstract

Multimodal Emotion Recognition (MER) increasingly depends on fine-grained, evidence-grounded annotations, yet inspection and label construction are hard to scale when cues are dynamic and misaligned across modalities. We present an LLM-assisted toolkit that supports multimodal emotion data annotation through an inspectable, event-centered workflow. The toolkit preprocesses and aligns heterogeneous recordings, visualizes all modalities on an interactive shared timeline, and renders structured signals as video tracks for cross-modal consistency checks. It then detects candidate events and packages synchronized keyframes and time windows as event packets with traceable pointers to the source data. Finally, the toolkit integrates an LLM with modality-specific tools and prompt templates to draft structured annotations for analyst verification and editing. We demonstrate the workflow on multimodal VR emotion recordings with representative examples.
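A rough sketch of what an event packet and the LLM drafting step might look like; all names, including the llm.complete client, are assumptions rather than the toolkit's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class EventPacket:
    """Hypothetical event packet bundling synchronized evidence for one candidate event."""
    event_id: str
    t_start: float  # window start, seconds on the shared timeline
    t_end: float    # window end, seconds
    keyframes: list = field(default_factory=list)        # paths to extracted video frames
    signal_windows: dict = field(default_factory=dict)   # modality name -> signal slice
    source_pointers: dict = field(default_factory=dict)  # modality name -> offsets in raw data

def draft_annotation(packet: EventPacket, llm, prompt_template: str) -> dict:
    """Ask the LLM to draft a structured annotation; an analyst verifies and edits it."""
    prompt = prompt_template.format(start=packet.t_start, end=packet.t_end,
                                    modalities=", ".join(packet.signal_windows))
    return {"event_id": packet.event_id,
            "draft": llm.complete(prompt),  # hypothetical LLM client
            "verified": False}
```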

Redistributing Interdependence in Organizational Knowledge Work: Lessons from Deploying an LLM Mediator in Research Labs
Sangwook Lee, Sang Won Lee · Virginia Tech, USA
Abstract

Large Language Models (LLMs) are increasingly embedded in collaborative workflows, yet their structural effects on the relationships between human stakeholders in data-intensive knowledge work remain underexplored. In this position paper, we reinterpret findings from a month-long field deployment of an LLM-based chatbot that mediates organizational memory across four university research labs (N=21) through the lens of Interdependence Theory. Our analysis reveals two key dynamics. First, the LLM redistributes the dependence structure between students and lab directors: while students gain autonomous access to institutional knowledge, directors lose visibility into knowledge gaps, shifting bilateral dependence toward unilateral dependence. This redistribution is moderated by organizational culture, amplifying mutual responsiveness in psychologically safe environments while dampening it where students default to private interaction. Second, the system fails to support the transformation of motivation needed for collaborative knowledge stewardship. We derive design implications including privacy-preserving awareness mechanisms, graduated contribution pathways, and designing for the LLM's dual role as a boundary object across stakeholder groups.

Meaning-Making at Scale: Structuring Productive Human-AI Interdependence in Qualitative Inquiry
Feng Zhou, Jacqueline Meijer-Irons, Ambar Murillo
Abstract

While Large Language Models (LLMs) can process vast datasets at scale, they often struggle to match the nuanced, context-dependent depth of human analysis—a tension that suggests that maximizing automation is at odds with the interpretive nature of qualitative inquiry. We argue that effective Human-AI collaboration is not an automation problem, but an interdependence problem. This paper proposes a formal framework to structure human-AI interdependence to resolve the dilemma between broad scale and deep analysis by dynamically selecting an appropriate Level of Automation (LoA) for each analytical phase. Grounded in Interdependence Theory and an industry case study, we present three core design principles: 1) capability asymmetry, which defines roles based on human and AI strengths; 2) dynamic calibration, which adaptively structures workflows by weighing total risk (the effects of error and its likelihood) against validation cost; and 3) bi-directional validation, which maintains analytic integrity through mutual auditing loops. These principles demonstrate how to leverage AI as a powerful partner while preserving the researcher's irreplaceable role in the interpretive process of meaning-making.
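Read literally, the dynamic calibration principle implies a simple decision rule; the sketch below is a loose paraphrase with invented thresholds, not the paper's formal framework:

```python
def choose_level_of_automation(error_severity: float, error_likelihood: float,
                               validation_cost: float) -> str:
    """Pick an illustrative Level of Automation by weighing total risk
    (severity x likelihood of error) against the cost of human validation."""
    total_risk = error_severity * error_likelihood
    if total_risk > validation_cost:
        return "human-led"        # risk dominates: keep the human in charge
    if total_risk > 0.5 * validation_cost:
        return "human-validated"  # AI drafts, human audits every output
    return "ai-driven"            # low-risk phase: automate, spot-check only
```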

From Tool to Collaborator: Designing LLM-Mediated Policy Analysis for Multidisciplinary Data Work Across Africa
Mboh Bless Pearl N · Carnegie Mellon University Africa, Rwanda
Abstract

Collaborative policy analysis across African data protection frameworks presents a uniquely challenging instance of multidisciplinary data work: 54 countries, five or more official languages, and regulatory maturity that ranges from comprehensive legislation to no enacted law. We present Policy Analyzer, a deployed Retrieval-Augmented Generation (RAG) platform that positions large language models (LLMs) as active collaborators, rather than passive retrieval tools, in this complex analytical landscape. Through the design and deployment of a 10-stage query processing pipeline with eight specialized LLM agents, we identify five reusable human–LLM collaboration patterns: layered decision-making, multilingual legal-semantic mediation, trust-aware delegation, adaptive query processing, and multitenant data isolation. We ground these patterns in established HCI frameworks—Interdependence Theory, Distributed Cognition, and Common Ground—and reflect on implications for designing collaborative data systems for the Global South.

Swimming in Logs: Streamlining Triage with LLM-Supported Log Analysis
David J. Murphy, Jude Yew, Angelica Rosenzweig Castillo, Noah Kersey · Google LLC
Abstract

Debugging failures using system logs is a critical, yet friction-filled process due to log volume and complexity. In this case study, we present a collaborative human-AI system designed to streamline log triage by leveraging Large Language Models (LLMs). Our novel approach focuses on automatically identifying, sanitizing, and analyzing the differential log lines between passing and failing test runs, effectively mitigating the signal-to-noise problem and managing the LLM's context budget. Through an iterative design and evaluation process (RITE), we demonstrate that this Human-LLM collaboration significantly accelerates fault diagnosis, improving task success rates. Furthermore, users report substantial time savings (15–60 minutes per use) by enabling the AI to propose causal chains and productive next steps. Our work suggests that effective human-LLM collaboration in data-intensive tasks is achieved not through full automation, but by co-designing workflows where human-led filtering reduces the AI's search space, allowing the LLM to provide semantic guidance for complex system failures.
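The core move, isolating the differential log lines between passing and failing runs, can be approximated with a normalized set difference; a minimal sketch, with the sanitization rules assumed for illustration:

```python
import re

def normalize(line: str) -> str:
    """Sanitize volatile tokens (timestamps, hex addresses, long numeric ids)
    so that equivalent lines from different runs compare equal."""
    line = re.sub(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}\S*", "<TS>", line)
    line = re.sub(r"0x[0-9a-fA-F]+", "<ADDR>", line)
    line = re.sub(r"\b\d{4,}\b", "<NUM>", line)
    return line.strip()

def differential_lines(passing_log: str, failing_log: str) -> list:
    """Return lines present only in the failing run, preserving order."""
    seen = {normalize(l) for l in passing_log.splitlines()}
    return [l for l in failing_log.splitlines() if normalize(l) not in seen]
```

Shrinking the prompt to these differential lines is what keeps the analysis within the LLM's context budget.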

From Queries to Conversations: Pragmatic and Interdependent Human–LLM Collaboration in Data Work
Vidya Setlur · Tableau Research, USA
Abstract

Data-intensive workflows are rarely solitary endeavors. Analysts, domain experts, and data engineers collaborate to collect, clean, integrate, annotate, and query data, often negotiating meaning, intent, and responsibility through conversation. As Large Language Models (LLMs) are increasingly embedded in data tools, their role is typically framed as responding to isolated queries or automating discrete steps. We argue that this framing underutilizes LLMs' potential and obscures important challenges in collaborative data work. In this position paper, we propose treating LLMs as pragmatic collaborators rather than tools—participants that engage in ongoing conversations, mediate semantic differences, and adapt their behavior based on interdependence among stakeholders. Grounded in Gricean Maxims and Interdependence Theory, we examine how LLMs can support intent refinement, handle vagueness, and maintain discourse coherence across multi-party data workflows. We focus on semantic misalignment and goal misalignment as recurring sources of breakdown, and identify failure cases where violations of pragmatic norms undermine trust and shared outcomes.

Designing Human–LLM Collaboration by Preserving a Human Locus of Agency in Qualitative Analysis
Crystal Silver, Alex Fawzi, Diane Morrow, Stefano De Paoli · Abertay University, UK
Abstract

This case study examines how qualitative analysis with large language models (LLMs) can move beyond simple query–response interaction toward staged, role-differentiated human–LLM collaboration in time-constrained, multi-stakeholder research contexts. Following post hoc validation against prior human-led reflexive thematic analysis, a human-LLM thematic analysis workflow was deployed across two large, multinational digital research infrastructure projects. Implemented through a bespoke open-source graphical application, it decomposes reflexive thematic analysis into bounded stages, exposing prompts, outputs, and model parameters such as temperature and top-p for inspection and control. Pattern processing and candidate synthesis are allocated to the LLM, while interpretive judgement and analytic progression remain with the research team. The primary design challenge was legitimacy friction around "AI-generated" artefacts, particularly personas and scenarios that represent interview participants. We introduced a stakeholder evaluation stage in which partners selected between paired artefacts curated by researchers, extending stakeholder agency over artefact endorsement while retaining research team interpretive authority.
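As an illustration of the kind of parameter exposure described, each bounded stage might be recorded with its exact prompt and sampling settings; the record type and llm.complete client here are assumptions, not the application's code:

```python
from dataclasses import dataclass

@dataclass
class StageRun:
    """Record of one bounded analysis stage, kept inspectable for the team."""
    stage: str          # e.g., "familiarization", "candidate themes"
    prompt: str         # exact prompt sent to the model
    temperature: float  # sampling temperature used
    top_p: float        # nucleus sampling parameter used
    output: str         # raw model output, before human review

def run_stage(llm, stage: str, prompt: str, temperature=0.3, top_p=0.9) -> StageRun:
    out = llm.complete(prompt, temperature=temperature, top_p=top_p)  # hypothetical client
    return StageRun(stage, prompt, temperature, top_p, out)
```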

Exploring Role-Sensitive AI Mediation in Collaborative Data Work: A Wizard-of-Oz Pilot Study
Anmol Singhal
Abstract

Collaborative software requirements elicitation is often hindered by poor coordination and difficulty articulating and refining ideas within teams. In recent years, Large Language Models (LLMs) have increasingly been introduced into collaborative work settings in software development, yet little is known about how AI assistance interacts with naturally emerging team role dynamics. To address this gap, we conducted an exploratory Wizard-of-Oz pilot study that examines how different AI intervention styles influence small-group collaboration during software requirements elicitation discussions. In three-person groups, participants received simulated AI cues designed to provide (1) divergent support that encouraged idea expansion, (2) convergent support that facilitated refinement and narrowing, or (3) Theory-of-Mind (ToM)–oriented support that prompted perspective-taking. We qualitatively coded discussion transcripts to identify emergent collaborative roles and examine how these roles evolved in response to AI interventions.

LLMs and Agentic AI as Value-Adding Collaborators in Multi-Stakeholder, Heterogeneous Data Workflows: Practical Insights from the FinTech Industry
Wojtek Buczynski, Jingkun (Charly) Zhu · University of Cambridge; King's College London; Fast Audit AI; FirmView AI
Abstract

Financial services have been one of the most data-intensive industries in the world for decades. Data—its accuracy, precision, completeness and timeliness, sometimes measured in milliseconds—can make the difference between profit and loss. This paradigm has now started to shift, with Large Language Models (LLMs)—increasingly working as orchestrators for (multi)agentic AI—gradually becoming capable of searching, analysing and synthesizing qualitative and quantitative data from various heterogeneous sources to a standard close to that of a highly educated human analyst. However, this shift is not without trade-offs: LLMs lack the context and common sense needed to make judgement calls, and they inherently confabulate (hallucinate)—for this reason alone, human oversight is essential. Our paper discusses how humans and LLMs can collaborate in data-intensive workflows, using equity research as the case study and presenting a concrete, auditability-first technical blueprint for regulated human–LLM collaboration.

When Language Models Help Patients Understand Clinical Documentation
Rolande Umuhoza · Minnesota State University, USA
Abstract

Hospital discharge notes are written for clinical precision but are expected to guide patients after hospitalization. Large Language Models show potential for translating these notes into accessible language, yet they frequently introduce unsupported statements and omit critical medical information. This study evaluates the effectiveness of an oracle-guided feedback loop in improving the faithfulness of patient-facing explanations of discharge documentation. Using 30 high-quality synthetic discharge summaries, we measured the rate of unsupported statements using a Natural Language Inference framework centered on the RoBERTa-large-MNLI model. Our results show that oracle-guided feedback significantly reduced the unsupported rate from 0.7608 to 0.5588 (p = 0.00077). The medium-to-large effect size (Cohen's d = 0.6978) suggests that structured feedback is a robust mechanism for enhancing the reliability of clinical NLP tools for patient use. This work frames discharge notes as shared data artifacts and studies how structured feedback enables LLMs to act as reliable mediators between clinician documentation and patient understanding.
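For readers unfamiliar with the setup, an entailment check of this kind can be run with the off-the-shelf roberta-large-mnli checkpoint; this minimal sketch is not the authors' pipeline, and the 0.5 threshold is an arbitrary placeholder:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def is_unsupported(source_note: str, simplified_sentence: str,
                   threshold: float = 0.5) -> bool:
    """Treat the discharge note as premise and the patient-facing sentence as
    hypothesis; flag the sentence if entailment probability falls below threshold."""
    inputs = tokenizer(source_note, simplified_sentence,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(model(**inputs).logits, dim=-1)[0]
    entail_id = model.config.label2id["ENTAILMENT"]
    return probs[entail_id].item() < threshold
```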

Untangling Coordination in Conversational Form Filling: A Position on Interdependence and Mediation
Timothy Veigel
Abstract

Form filling is a foundational mechanism for collaborative data work and an inherently asynchronous coordination practice. Errors and ambiguities are therefore often discovered only during reuse, when repair is costly or infeasible. Conversational assistants mitigate some of these issues by eliciting missing information and clarifying intent, but in doing so, they introduce a mediation layer that interprets input, controls interaction flow, and commits structured values on behalf of downstream users. Rather than removing asynchrony or risk, conversational form filling redistributes asymmetric dependence across contributors, assistants, and downstream roles. We propose a positioned lens for analyzing conversational form filling as collaborative data work, grounded in Interdependence Theory and Coordination Theory. From this lens, we derive a diagnostic evaluation decomposition: record commitment fidelity, dependency-respecting coordination, and contributor experience. We argue that many UX-reported issues in conversational data collection reflect upstream breakdowns that remain conflated in existing evaluations.

Beyond One User and One Model: Rethinking LLMs as Mediators in Collaborative Sensemaking and Ideation
Pengqi Wang, Emily Kuang · York University, Canada
Abstract

As LLMs enter qualitative data workflows, they are predominantly designed for individual researchers rather than collaborative teams—risking disruption to the collective negotiation essential to sensemaking, driving premature consensus, and potentially erasing marginalized voices. In this position paper, we argue that LLMs must be reconceived as active team mediators. Grounded in empirical studies of human-LLM collaborative analysis, we propose three design directions: quote-level traceability to preserve accountability of human and LLM contributions, controllable delegation boundaries to maintain researcher agency across workflow stages, and structured dissent mechanisms to surface interpretive conflicts that teams might otherwise suppress. We ground these directions in accessibility research as a domain where methodological rigor and representational equity are obligatory.

BeingDB: Git-Versioned Facts Queried by Language Models
John Moore · The National Archives, UK
Abstract

Structured knowledge queries such as "Who created this?" and "Where was it shown?" require reliable, joinable facts. BeingDB treats facts like source code: stored as Prolog-style predicates in Git, versioned through pull requests, and compiled into a query-optimised snapshot served by a read-only API. Subject matter experts contribute facts using familiar Git workflows, bypassing database expertise barriers. LLMs handle natural language reasoning and translation, but cannot modify the authoritative fact base, creating a clear boundary that prevents hallucinations from altering curated facts. We present the system design and a planned deployment for digital humanities that will gather qualitative and quantitative metrics on user trust and query success. We describe how architectural separation—with facts as human-curated source and LLMs as query mediators—could enable trustworthy, accessible knowledge bases.
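As a rough illustration of the facts-as-source-code idea (our sketch, not BeingDB's actual format or API), Prolog-style predicates can live in a plain text file under Git review and be compiled into an in-memory index served read-only:

```python
import re
from collections import defaultdict

# Example fact file content, as it might be reviewed in a pull request:
FACTS = """
created(poster_1914, artist_smith).
shown(poster_1914, exhibition_london_1925).
created(map_1802, cartographer_jones).
"""

def compile_snapshot(text: str) -> dict:
    """Parse predicate(arg1, arg2). lines into a read-only predicate index."""
    index = defaultdict(list)
    for pred, args in re.findall(r"(\w+)\(([^)]*)\)\.", text):
        index[pred].append(tuple(a.strip() for a in args.split(",")))
    return dict(index)

def query(index: dict, pred: str, subject: str) -> list:
    """Answer who/where questions by matching the first argument."""
    return [args[1] for args in index.get(pred, []) if args[0] == subject]

snapshot = compile_snapshot(FACTS)
print(query(snapshot, "created", "poster_1914"))  # -> ['artist_smith']
```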

What Should LLMs Do Here? Toward Transitional Collaboration in a Data Design Practice
Arganka Yahya · Eindhoven University of Technology; Eindhoven AI Systems Institute; Canon Production Printing, The Netherlands
Abstract

Collaborative data-design practices in high-technology industrial settings are inherently heterogeneous, varying in nature as interdisciplinary teams move from strategically selecting data sources to collaboratively constructing meaning from integrated datasets. This position paper grounds this observation in two industrial studies situated along a data-enabled design trajectory within a data design practice spectrum: one concerned with ideating on sensor configurations at the divergence phase, the other with transdisciplinary sensemaking of behavioral and engine data at the emergence phase. By reflecting on these studies, we propose that the collaborative role of LLMs should be similarly transitional, shifting in form and dynamics as engagement with data and artifacts evolves along this trajectory. We identify opportunities and tensions in both divergence and emergence, and close with questions about how LLMs might support these phases and enable transitions between them.