Publications

TIGQA: An Expert-Annotated Question-Answering Dataset in Tigrinya

Published in LREC-COLING 2024, 2024

The absence of explicitly tailored, accessible annotated datasets for educational purposes presents a notable obstacle for NLP tasks in languages with limited resources. This study initially explores the feasibility of using machine translation (MT) to convert an existing dataset into a Tigrinya dataset in SQuAD format. As a result, we present TIGQA, an expert-annotated dataset containing 2,685 question-answer pairs covering 122 diverse topics such as climate, water, and traffic. These pairs are drawn from 537 context paragraphs in publicly accessible Tigrinya and Biology books. Through comprehensive analyses, we demonstrate that the TIGQA dataset demands skills beyond simple word matching, requiring both single-sentence and multiple-sentence inference abilities. We conduct experiments using state-of-the-art machine reading comprehension (MRC) methods, marking the first exploration of such models on TIGQA. Additionally, we estimate human performance on the dataset and juxtapose it with the results obtained from pre-trained models. The notable disparity between human performance and the best model performance underscores the potential for future enhancements to TIGQA through continued research. Our dataset is freely accessible via the provided link to encourage the research community to address the challenges of Tigrinya MRC.

Recommended citation: Hailay Kidu Teklehaymanot, Dren Fazlija, Niloy Ganguly, Gourab K. Patro, and Wolfgang Nejdl (2024). "TIGQA: An Expert-Annotated Question-Answering Dataset in Tigrinya" In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. https://arxiv.org/abs/2404.17194

How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples

Ongoing Work

Published in AAAI 2024 Spring Symposium on User-Aligned Assessment of Adaptive AI Systems, 2024

In the image domain, adversarial examples represent maliciously perturbed images that look benign to humans but greatly mislead state-of-the-art ML models. Previously, researchers ensured the imperceptibility of their altered data points by restricting perturbations via ℓp norms. However, recent publications claim that creating natural-looking adversarial examples without such restrictions is also possible. With much more freedom to instill malicious information into data, these unrestricted adversarial examples allow attackers to operate outside the expected threat models. Yet, when surveying existing image-based methods, we noticed a lack of human evaluations of the proposed image modifications. To analyze the imperceptibility of these attacks, we propose SCOOTER – an evaluation framework for unrestricted image-based attacks containing guidelines, standardized questions, and a ready-to-use web app for annotating unrestricted adversarial images.

Recommended citation: Dren Fazlija, Arkadij Orlov, Johanna Schrader, Monty-Maximilian Zühlke, Michael Rohs, Daniel Kudenko (2024). "How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples" AAAI 2024 Spring Symposium on User-Aligned Assessment of Adaptive AI Systems. https://aair-lab.github.io/aia2024/papers/fazlija_aia24.pdf

Reporting on Real-World Datasets and Packages for Causal AI Research

Non-Archival

Published in AICPM 2023, 2023

Causal reasoning has garnered much attention in the AI research community, resulting in an influx of causality-based AI methods in recent years. We believe that this sudden rise of Causal AI has led to many publications that primarily evaluate their proposed algorithms in specifically designed experimental setups. Hence, comparisons between different causal methods, as well as with existing state-of-the-art non-causal approaches, become increasingly difficult. To make Causal AI more accessible and to facilitate comparisons to non-causal methods, we analyze the use of real-world datasets and existing causal inference tools within relevant publications. Furthermore, we support our hypothesis by outlining well-established tools for benchmarking different trustworthiness aspects of AI models (interpretability, fairness, robustness, privacy, and safety), as well as healthcare tools, and by showing that these systems are not prevalent in the respective Causal AI publications.

Recommended citation: Dren Fazlija (2023). "Reporting on Real-World Datasets and Packages for Causal AI Research" In Artificial Intelligence, Causality and Personalised Medicine Symposium 2023.

A Review of the Role of Causality in Developing Trustworthy AI Systems

Currently under review at ACM CSUR

Published in arXiv, 2023

State-of-the-art AI models largely lack an understanding of the cause-effect relationships that govern human understanding of the real world. Consequently, these models do not generalize to unseen data, often produce unfair results, and are difficult to interpret. This has led to efforts to improve the trustworthiness of AI models. Recently, causal modeling and inference methods have emerged as powerful tools for addressing these shortcomings. This review aims to provide the reader with an overview of causal methods that have been developed to improve the trustworthiness of AI models. We hope that our contribution will motivate future research on causality-based solutions for trustworthy AI.

Recommended citation: Niloy Ganguly, Dren Fazlija, Maryam Badar, Marco Fisichella, Sandipan Sikdar, Johanna Schrader, Jonas Wallat, Koustav Rudra, Manolis Koubarakis, Gourab K. Patro, Wadhah Zai El Amri, and Wolfgang Nejdl (2023). "A Review of the Role of Causality in Developing Trustworthy AI Systems" arXiv:2302.06975. https://arxiv.org/abs/2302.06975