Supporting the crowdsourced documentation of human rights violations with AI
The Syrian Archive project was launched in 2014 to preserve footage being shared online to document events taking place in Syria during the war. The team developed open-source tools to perform daily automated scraping of citizen-generated video content on social media. Combining this data with a smaller subset of active submissions from citizens and journalists, they assembled a vast archive – 3,314,265 sources of digital content contributions as of January 2020 – and standardised the data into a format that could be accessed by a global community of human rights practitioners. Video timestamps and geolocations were verified manually. However, video content is challenging to search because each video contains many frames, and only some of them may contain objects of interest to the user. To make the resource more useful for researchers, it needed to be searchable by the various types of content relevant to conflict zones.
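One common way to tame the many-frames problem is to run detection only on a sampled subset of frames rather than every frame. The sketch below is a minimal illustration of that idea, not the project's actual pipeline; the function name and the one-frame-per-second default are our own assumptions.

```python
def sample_frame_indices(total_frames: int, fps: float, interval_s: float = 1.0) -> list[int]:
    """Pick roughly one frame per `interval_s` seconds, so that object
    detection runs on a manageable subset instead of every frame."""
    step = max(1, round(fps * interval_s))  # frames to skip between samples
    return list(range(0, total_frames, step))

# A 60-second clip at 30 fps (1,800 frames), sampled once per second:
indices = sample_frame_indices(total_frames=1800, fps=30.0, interval_s=1.0)
print(len(indices))  # 60 candidate frames instead of 1800
```

Sampling trades recall for speed: a shorter interval catches briefly visible objects at higher compute cost.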
The team set out to create a workflow and a visual search engine specialised to support research and activism in conflict zones. Working with the technology partner VFRAME, they focused on the challenge of identifying and indexing cluster and cargo munitions (which are banned under international treaty) in individual video frames, using a neural-network computer vision approach. Similar computer vision tools already existed, but they were all privately owned (many for military purposes) or too generic for use in this specialised context. Initially the team used a labour-intensive manual annotation approach to generate hundreds of labelled data objects to train the model, with the help of students and researchers at the Berkeley Human Rights Center. This effort was not sufficient to train the data-hungry computer vision algorithm, so the team coupled their machine-learning pipeline to 3D modelling software that simulated new photorealistic images of cluster and cargo munitions under a wide variety of photographic conditions. The interaction between researchers and the algorithm is continuous: as researchers add resources to the database and tag relevant video frames with annotations of the munitions they contain, this data is fed back into training the algorithm, ensuring ongoing refinement of its performance.
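The step of blending scarce manual annotations with abundant synthetic renders can be sketched as a simple dataset-assembly function. This is a hypothetical illustration under stated assumptions – the record fields, the function name, and the 80% synthetic cap are ours, not the project's:

```python
import random

def build_training_set(real: list[dict], synthetic: list[dict],
                       synthetic_ratio: float = 0.8, seed: int = 0) -> list[dict]:
    """Combine manually annotated frames with photorealistic synthetic
    renders. `synthetic_ratio` caps the share of synthetic examples so the
    model still sees real footage in every training epoch."""
    rng = random.Random(seed)
    # Number of synthetic examples that makes them `synthetic_ratio` of the total.
    n_synth = int(len(real) * synthetic_ratio / (1 - synthetic_ratio))
    picked = rng.sample(synthetic, min(n_synth, len(synthetic)))
    dataset = real + picked
    rng.shuffle(dataset)
    return dataset

# Hypothetical example: 100 hand-labelled frames, 1,000 synthetic renders.
real = [{"img": f"real_{i}.jpg", "source": "manual"} for i in range(100)]
synthetic = [{"img": f"render_{i}.png", "source": "synthetic"} for i in range(1000)]
dataset = build_training_set(real, synthetic)
print(len(dataset))  # 500: all 100 real frames plus 400 synthetic renders
```

As researchers tag new frames, the `real` list grows and the dataset is rebuilt, mirroring the continuous feedback loop described above.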
The AI model has been integrated into the visual search engine, where it automatically indexes images and video frames relevant to search queries. The use of AI in this project has been designed to accelerate and amplify the existing efforts of researchers and human rights investigators. The team has started working to transfer their approach to documented footage from the conflict in Yemen, demonstrating its wider applicability. Although still in the early stages of deployment, the approach has the potential to radically transform human rights investigations.
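At its simplest, indexing detections for search means mapping each detected label to the frames where it appears. The sketch below shows that idea with an inverted index; the field names, labels, and confidence threshold are illustrative assumptions, not the search engine's real schema:

```python
from collections import defaultdict

def build_index(detections: list[dict]) -> dict:
    """Map each detected label to (video, frame, score) locations, so a
    query such as 'cluster_munition' resolves to concrete video frames."""
    index = defaultdict(list)
    for d in detections:
        index[d["label"]].append((d["video_id"], d["frame"], d["score"]))
    return index

def search(index: dict, label: str, min_score: float = 0.5) -> list[tuple]:
    """Return frame hits for `label` above `min_score`, best-scoring first."""
    hits = [h for h in index.get(label, []) if h[2] >= min_score]
    return sorted(hits, key=lambda h: h[2], reverse=True)

# Hypothetical detector output for two videos:
detections = [
    {"video_id": "v1", "frame": 120, "label": "cluster_munition", "score": 0.91},
    {"video_id": "v1", "frame": 450, "label": "cluster_munition", "score": 0.42},
    {"video_id": "v2", "frame": 30,  "label": "cluster_munition", "score": 0.77},
]
idx = build_index(detections)
print(search(idx, "cluster_munition"))
# [('v1', 120, 0.91), ('v2', 30, 0.77)]
```

The low-confidence hit is filtered out, which is why a researcher's corrections to borderline frames are valuable training feedback.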