Data Explainability Through Linguistic Expression of Extracted Knowledge
Marie-Jeanne Lesot
CIS
IEEE Members: Free
Non-members: Free
Length: 00:59:27
Marie-Jeanne Lesot, Lab of Paris 6 (LIP6), Paris, France.
Abstract: Data science techniques are now pervasively used to extract regularities from available data for tasks such as prediction, characterisation or structuring. A current challenge is to improve the legibility of the obtained results, so as to allow a data expert to better understand the content of the data. One way to address this challenge is to present the results in natural language, offering linguistic expressions that may be easier for the user to interpret. The choice of such a formulation in turn has an impact on the machine learning techniques to be applied to the data.
The talk will illustrate these questions for numerical data and for time series, discussing respectively the extraction of gradual itemsets, which linguistically express knowledge about feature covariations, and the extraction of periodicity-related linguistic summaries, which use the specific quantifier "regularly". In both cases, as well as for enriched contextual variants, the question is to define the associated semantics precisely and to design efficient extraction algorithms. The talk will also discuss the issue of measuring the relevance of the linguistic terms used to express the summaries, both with respect to the data structure, in the case of linguistic variables, and with respect to the cognitive interpretation, in the case of approximate numerical expressions.