Semantic Description of Explainable Machine Learning Workflows

MASTER Assignment

Type: Master M-CS

Period: Nov 2020 - Jul 2021

Student: Inoue Nakagawa, P. (Patricia, Student M-CS)

Date final project: July 8, 2021

Supervisors:

Abstract:

Machine learning algorithms have been extensively explored in many domains due to their success in learning and performing autonomous tasks. However, the best performing algorithms usually have high complexity, which makes it difficult for users to understand how and why they achieve their results; because of this, they are often considered black boxes. Understanding machine learning models is important not only to identify problems and make changes but also to increase trust in them. This trust can only be achieved by ensuring that the algorithms act as expected, without relying on biases or erroneous values in the data, and that they avoid ethical issues, such as producing stereotyped, prejudiced, or otherwise wrong conclusions. In this scenario, Explainable Machine Learning comprises methods and techniques that play a fundamental role in enabling users to better understand the functioning and results of machine learning. Semantic Web Technologies provide semantically interpretable tools that allow reasoning over knowledge resources; for this reason, they have been applied to make machine learning explainable. In this context, the contribution of this work is the development of an ontology that represents explainable machine learning experiments, allowing data scientists and developers to have a holistic view and a better understanding of both the machine learning process and the explanation process. We developed the ontology by reusing an already existing domain-specific ontology (ML-SCHEMA) and grounding it in the Unified Foundational Ontology (UFO), aiming at interoperability. The proposed ontology is structured in three modules: (1) the general module, which represents the general machine learning process; (2) the specific module, which specializes the machine learning process for supervised classification; and (3) the explanation module, which represents the explanation process. The ontology was evaluated using a case study in the scenario of the COVID-19 disease, in which we trained a Support Vector Machine to predict the mortality of patients infected with COVID-19 and applied existing explanation methods to generate explanations from the trained model. The case study was used to populate the ontology with instances; thereafter, we queried the populated ontology to verify that the retrieved information corresponds to the expected outputs and that the ontology fulfills its intended purpose.
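
To make the described workflow concrete, the sketch below walks through the same steps on synthetic data: training a Support Vector Machine, deriving a simple model-agnostic explanation (permutation importance stands in here for the explanation methods applied in the thesis), populating a small RDF graph with instances of ontology classes, and querying it with SPARQL to retrieve the explanation attached to the trained model. This is a minimal illustration, not the thesis implementation: the namespace http://example.org/xai-onto# and the class and property names (TrainingRun, Model, Explanation, produces, explains, featureImportance) are hypothetical placeholders, not the actual vocabulary of the proposed ontology.

```python
# Minimal sketch of the train -> explain -> populate -> query workflow.
# Synthetic data replaces the COVID-19 patient data; permutation importance
# replaces the explanation methods used in the thesis; all ontology IRIs
# below are illustrative placeholders.
from rdflib import Graph, Literal, Namespace, RDF
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# 1. Train a Support Vector Machine on a stand-in classification task.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = SVC(kernel="rbf").fit(X_train, y_train)

# 2. Generate a model-agnostic explanation: the mean drop in accuracy when
#    each feature is permuted on the held-out set.
imp = permutation_importance(model, X_test, y_test, n_repeats=10,
                             random_state=0)

# 3. Populate an RDF graph with instances: a training run, the model it
#    produced, and the explanation generated from that model.
ONT = Namespace("http://example.org/xai-onto#")  # hypothetical namespace
g = Graph()
g.bind("ont", ONT)
g.add((ONT.run1, RDF.type, ONT.TrainingRun))
g.add((ONT.run1, ONT.produces, ONT.svmModel1))
g.add((ONT.svmModel1, RDF.type, ONT.Model))
g.add((ONT.explanation1, RDF.type, ONT.Explanation))
g.add((ONT.explanation1, ONT.explains, ONT.svmModel1))
for i, score in enumerate(imp.importances_mean):
    g.add((ONT.explanation1, ONT.featureImportance,
           Literal(f"feature_{i}: {score:.3f}")))

# 4. Query the populated graph: which explanation describes the model
#    produced by run1, and what does it contain?
query = """
PREFIX ont: <http://example.org/xai-onto#>
SELECT ?explanation ?detail WHERE {
    ont:run1 ont:produces ?model .
    ?explanation ont:explains ?model ;
                 ont:featureImportance ?detail .
}
"""
for row in g.query(query):
    print(row.explanation, row.detail)
```

Checking that such queries return the expected instances mirrors the evaluation step described above: the populated graph acts as a machine-readable record of the experiment, so retrieving the explanation through the model and run it is linked to confirms that the ontology captures the workflow as intended.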