South African scientists use news headlines to power AI — Quartz


If an AI researcher wants to build a natural language processing model in English, there’s no shortage of data to train her algorithms.

With a click, she could have 1.8 million articles from the New York Times archives, carefully tagged by topic. She might throw in 800,000 stories from the Reuters archives, or 30 million words of text from the Wall Street Journal. Of course, she could also just use the state-of-the-art GPT-3 language model, which cut its teeth on more than 290 billion English words scraped from around the web.

But if she wants to build a model that will work for Setswana…




About Author: ACM
This information is 3rd party content that is added to ACM strictly for non-commercial informational purposes.