Talks and presentations

AI for Tetun: Building Timor-Leste’s Inclusive Digital Future

November 21, 2025

Keynote Speaker at the TLNOG2 Conference, F-FDTL Auditorio Room, Fatuhada, Dili, Timor-Leste

This talk explores how artificial intelligence can support Tetun, a low-resource language and one of Timor-Leste's official languages, by enabling inclusive digital access through language technologies, datasets, and information retrieval systems.

Download Slides

Labadain: The Foundation of Tetun Language Technology

November 20, 2025

Invited Talk at the DEI–FECT–UNTL National Seminar, FECT Auditorium Room, Hera, Dili, Timor-Leste

This talk introduces Labadain as the foundation of Tetun language technology, showcasing how datasets, tools, and AI systems enable inclusive digital access for Tetun speakers.

Download Slides

Conference Proceedings Talk on the Labadain Crawler Pipeline for Low-Resource Languages (LRLs)

May 22, 2024

Conference proceedings talk at the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Torino, Italy

This conference proceedings talk presents the Labadain Crawler, a web-based data collection pipeline designed for low-resource languages, detailing its architecture, its language processing components, and its application to building a high-quality Tetun text corpus.

Download Slides

Conference Proceedings Talk on Labadain-30k+ Dataset Construction

May 20, 2024

Conference proceedings talk at the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages at LREC-COLING 2024, Torino, Italy

This conference proceedings talk presents the construction of Labadain-30k+, a manually audited Tetun text dataset, outlining the data collection pipeline, the quality control process, and key insights from content analysis, with the aim of supporting NLP and information retrieval research in a low-resource language.

Download Slides