Conference Proceedings Talk on Labadain-30k+ Dataset Construction

Date:

This conference proceedings talk presents the construction of Labadain-30k+, a manually audited Tetun text dataset. It outlines the data collection pipeline, the quality control process, and key insights from content analysis, with the aim of supporting NLP and information retrieval research in Tetun, a low-resource language.