Can semantic search increase the efficiency of human rights defenders in building a shared database on restrictions of digital rights?
The experiment will test the potential for human rights organisations to interact with machine-generated intelligence in their work. The grantee will look into whether machine learning can increase the effectiveness and efficiency of human rights defenders in curating large collections of human rights documents, and enable them to make better use of data generated by collective intelligence. HURIDOCS will test algorithms to improve semantic search of human rights documents in a digital rights database covering 20 countries in the Arab League. Semantic search aims to improve search accuracy by understanding the searcher's intent and the contextual meaning of terms to generate more relevant results. This means that a search will return not only documents containing the exact search term, but also documents from related fields that might use different terminology to talk about the same subject.
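The difference between keyword matching and semantic matching described above can be illustrated with a minimal sketch. It is not the method used in the experiment: the three-dimensional vectors, the sample documents and the function names below are all hypothetical, hand-picked for illustration, whereas a real system would take its vectors from a trained model.

```python
import math

# Hypothetical toy "embeddings", hand-picked for illustration only.
# A real system would obtain these vectors from a trained model.
TOY_VECTORS = {
    "censorship": [0.9, 0.1, 0.0],
    "blocking": [0.8, 0.2, 0.1],
    "website": [0.6, 0.3, 0.2],
    "harvest": [0.0, 0.1, 0.9],
}

# Hypothetical sample documents.
DOCS = [
    "report on wheat harvest",
    "authorities ordered blocking of news website",
]


def embed(text):
    """Average the toy vectors of the known words in the text."""
    known = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not known:
        return [0.0, 0.0, 0.0]
    return [sum(dim) / len(known) for dim in zip(*known)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def keyword_search(query, docs):
    """Return documents that share at least one exact word with the query."""
    terms = set(query.lower().split())
    return [d for d in docs if terms & set(d.lower().split())]


def semantic_search(query, docs):
    """Rank documents by vector similarity to the query, best match first."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)


if __name__ == "__main__":
    print(keyword_search("censorship", DOCS))
    print(semantic_search("censorship", DOCS))
```

In this toy setup, neither document contains the word "censorship", so the keyword search finds nothing, while the semantic ranking places the document about website blocking first because its vector lies close to the query's.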
The researchers found that one algorithm, FT_HR fasttext, outperformed the others in keyword searches. USE, an algorithm trained on large collections of documents, performed best at finding similar sentences. The sample size for the second part of the experiment was too small to draw meaningful conclusions.
Making informed decisions in today’s information-saturated world requires not only the ability to access information, but also the ability to filter it based on its relevance to your needs. This is true across many different sectors, including human rights. From legislation and legal cases to progress reports and diplomatic commitments, there exists an abundance of human rights information that can potentially support efforts to protect people’s fundamental dignity and freedoms. For this information to be meaningful, however, human rights defenders need to be able to find it quickly and efficiently. Human rights defenders are a diverse group, working across multiple languages, sets of terminology and areas of professional expertise. This makes it challenging to join different databases, and therefore to access the knowledge in them, which can hinder collaboration between different groups who all pursue the same goal.