Conference Proceedings Talk on the Labadain Crawler Pipeline for Low-Resource Languages (LRLs)

Date:

This conference proceedings talk presents the Labadain Crawler, a web-based data collection pipeline designed for low-resource languages. It details the pipeline's architecture and language processing components, and describes its application to building a high-quality Tetun text corpus.