Insights into LLM-Based Conversational Search: A Study of Tetun-Speaking Users’ Search Behavior

Published in the 11th ACM SIGIR / the 15th International Conference on Innovative Concepts and Theories in Information Retrieval (ICTIR), Padua, Italy, 18 July, 2025

Advancements in large language model (LLM)-based conversational assistants have transformed search experiences into more natural and context-aware dialogues that resemble human conversation. However, limited access to interaction log data hinders a deeper understanding of their real-world usage. To address this gap, we analyzed 16,952 prompt logs from 904 unique users of Labadain Chat, an LLM-based conversational assistant designed for Tetun speakers, to uncover patterns in user search behavior, engagement, and intent.

Our findings show that the largest group of users (29.87%) spent between one and five minutes per session, and that the assistant averaged 43 unique daily users. The majority (93.97%) submitted multiple prompts per session, and the average session duration was 16.9 minutes. Most users (95.22%) were based in Timor-Leste, and education and science (28.75%) and health (28.00%) were the most searched topics.

A comparison with a study of English-language Google Bard logs revealed similar search characteristics, including engagement duration, command-based instructions, and requests for specific assistance. Furthermore, a comparison with two conventional search engines suggests that LLM-based conversational systems have influenced user search behavior on traditional platforms, reflecting a broader trend toward command-driven queries. These insights contribute to a deeper understanding of how user search behavior evolves, particularly within low-resource language communities. To support future research, we publicly release LabadainLog-17k+, a dataset of over 17,000 real-world user search logs in Tetun, offering a unique resource for investigating conversational search in this language.

Keywords: Prompt log analysis, Tetun, Low-resource languages, Conversational search, Large language models, Dataset.



Gabriel de Jesus
