Karen Hao reports on a groundbreaking initiative by Te Hiku Media in New Zealand, led by Peter-Lucas Jones and Keoni Mahelona. To counter both the decline of the Māori language (te reo Māori) and the extractive practices of Big Tech, the pair purchased their own hardware to train natural language processing models.
The article frames this struggle as a defense against “data colonialism.” In Mahelona’s words, “Data is the last frontier of colonization.” Rather than allowing large corporations to harvest the community’s language data and sell it back to that same community, Te Hiku established strict data sovereignty protocols based on kaitiakitanga (guardianship). They crowdsourced thousands of hours of speech from the community to build accurate speech recognition tools, showing that advanced AI development isn’t the exclusive domain of Silicon Valley and can be an act of cultural preservation.