Insights into LLM-Based Conversational Search: A Study of Tetun-Speaking Users’ Search Behavior

Published in International ACM SIGIR Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR), 18 July 2025, Padova, Italy

Advancements in large language model (LLM)-based conversational assistants have transformed search experiences into more natural and context-aware dialogues that resemble human conversation. However, limited access to interaction log data hinders a deeper understanding of their real-world usage. To address this gap, we analyzed 16,952 prompt logs from 904 unique users of Labadain Chat, an LLM-based conversational assistant designed for Tetun speakers, to uncover patterns in user search behavior, engagement, and intent.

Our findings show that a plurality of users (29.87%) spent between one and five minutes per session, and the system averaged 43 unique daily users. The majority (93.97%) submitted multiple prompts per session, and the average session duration was 16.9 minutes. Most users (95.22%) were based in Timor-Leste, and education and science (28.75%) and health (28.00%) were the most searched topics.

We compared our findings with a study on Google Bard logs in English, revealing similar search characteristics, including engagement duration, command-based instructions, and requests for specific assistance. Furthermore, a comparison with two conventional search engines suggests that LLM-based conversational systems have influenced user search behavior on traditional platforms, reflecting a broader trend toward command-driven queries. These insights contribute to a deeper understanding of how user search behavior evolves, particularly within low-resource language communities. To support future research, we publicly release LabadainLog-17k+, a dataset of over 17,000 real-world user search logs in Tetun, offering a unique resource for investigating conversational search in this language.

Keywords: Prompt log analysis, Tetun, Low-resource languages, Conversational search, Large language models, Dataset.