research-article

Data Quality and Explainable AI

Authors:
Leopoldo Bertossi

Univ. Adolfo Ibáñez, Santiago, Chile and RelationalAI Inc., Toronto, Canada

Floris Geerts

University of Antwerp, Antwerp, Belgium

Journal of Data and Information Quality, Volume 12, Issue 2, Article No. 11, pp. 1–9. https://doi.org/10.1145/3386687

Published: 3 May 2020

Abstract

In this work, we provide some insights and develop some ideas, with few technical details, about the role of explanations in Data Quality in the context of data-based machine learning (ML) models. In this direction, there are, as expected, roles for causality and for explainable artificial intelligence. The latter area sheds light not only on the models, but also on the data that support model construction. There is also room for defining, identifying, and explaining errors in data, in particular in ML, and for suggesting repair actions. More generally, explanations can be used as a basis for defining dirty data in the context of ML, and for measuring or quantifying them. We think of dirtiness as relative to the ML task at hand, e.g., classification.
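The abstract proposes using explanations to quantify issues in the data relative to a classification task. One family of score-based explanations appearing in the references builds on the Shapley value (Roth 1988; Lundberg and Lee 2017). The following is a minimal sketch under stated assumptions, not the authors' method: the loan classifier, its feature names, and the four-row population are hypothetical, and the value of a coalition S of features is assumed to be the classifier's average output when the features in S keep the entity's values while all other features are taken from each population row. Exact enumeration of coalitions is exponential in the number of features, so this only suits small examples.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, List, Sequence, Tuple

# A record maps (hypothetical) feature names to discrete values.
Row = Dict[str, int]
Classifier = Callable[[Row], int]


def coalition_value(clf: Classifier, entity: Row, population: List[Row],
                    fixed: Tuple[str, ...]) -> float:
    """Average classifier output when the features in `fixed` take the
    entity's values and every other feature keeps the population row's value."""
    total = 0.0
    for row in population:
        x = dict(row)
        for f in fixed:
            x[f] = entity[f]
        total += clf(x)
    return total / len(population)


def shapley_scores(clf: Classifier, entity: Row, population: List[Row],
                   features: Sequence[str]) -> Dict[str, float]:
    """Exact Shapley value of each feature, by enumerating all coalitions."""
    n = len(features)
    scores: Dict[str, float] = {}
    for i in features:
        others = [f for f in features if f != i]
        phi = 0.0
        for k in range(n):  # k = size of the coalition S that excludes i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                # Marginal contribution of feature i when added to S.
                gain = (coalition_value(clf, entity, population, s + (i,))
                        - coalition_value(clf, entity, population, s))
                phi += weight * gain
        scores[i] = phi
    return scores


# Hypothetical classifier: approve (1) only with high income and no debt.
def loan_clf(x: Row) -> int:
    return int(x["income"] == 1 and x["debt"] == 0)


applicant: Row = {"income": 1, "debt": 0, "age": 1}
population: List[Row] = [
    {"income": 1, "debt": 0, "age": 0},
    {"income": 0, "debt": 0, "age": 1},
    {"income": 1, "debt": 1, "age": 1},
    {"income": 0, "debt": 1, "age": 0},
]

scores = shapley_scores(loan_clf, applicant, population, ["income", "debt", "age"])
print(scores)
```

With this toy data, income and debt each receive a score of 0.375, age receives 0 because the classifier ignores it, and the scores add up to 1 − 0.25, the gap between the applicant's outcome and the population's average outcome. In the spirit of the abstract, a feature value with a surprisingly large score is one possible signal that the value deserves a closer look; the sketch itself does not decide whether a value is dirty.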

References

Z. Bahmani, L. Bertossi, and N. Vasiloglou. 2017. ERBlox: Combining matching dependencies with machine learning for entity resolution. International Journal of Approximate Reasoning 83 (2017), 118–141.
C. Batini and M. Scannapieco. 2016. Data Quality: Concepts, Methodologies and Techniques. Second edition, Springer.
L. Bertossi and M. Milani. 2018. Ontological multidimensional data models and contextual data quality. Journal of Data and Information Quality 9, 3 (2018), 14:1–14:36.
L. Bertossi, F. Rizzolo, and J. Lei. 2011. Data quality is context dependent. In Proc. of the Workshop on Enabling Real-Time Business Intelligence (BIRTE) Collocated with the International Conference on Very Large Data Bases (VLDB). Springer LNBIP 84, 52–67.
L. Bertossi and B. Salimi. 2017. From causes for database queries to repairs and model-based diagnosis and back. Theory of Computing Systems 61, 1 (2017), 191–232.
L. Bertossi and B. Salimi. 2017. Causes for query answers from databases: Datalog abduction, view-updates, and integrity constraints. International Journal of Approximate Reasoning 90 (2017), 226–252.
L. Bertossi, S. Kolahi, and L. Lakshmanan. 2013. Data cleaning and query answering with matching dependencies and matching functions. Theory of Computing Systems 52, 3 (2013), 441–482.
L. Bertossi, J. Li, M. Schleich, D. Suciu, and Z. Vagena. [n.d.]. Experimenting with score-based explanations for classification outcomes. Forthcoming.
D. Calvanese, M. Ortiz, M. Simkus, and G. Stefanoni. 2013. Reasoning about explanations for negative query answers in DL-lite. Journal of Artificial Intelligence Research 48 (2013), 635–669.
D. Calvanese, D. Lanti, A. Ozaki, R. Peñaloza, and G. Xiao. 2019. Enriching ontology-based data access with provenance. In Proc. IJCAI.
A. Chalamalla, I. F. Ilyas, M. Ouzzani, and P. Papotti. 2017. Descriptive and prescriptive data cleaning. In Proc. SIGMOD.
C. Chen, K. Lin, C. Rudin, Y. Shaposhnik, S. Wang, and T. Wang. [n.d.]. An interpretable model with globally consistent explanations for credit risk. In Proc. NIPS 2018 Workshop on Challenges and Opportunities for AI in Financial Services: the Impact of Fairness, Explainability, Accuracy, and Privacy.
H. Chockler and J. Y. Halpern. 2004. Responsibility and blame: A structural-model approach. Journal of Artificial Intelligence Research 22 (2004), 93–115.
F. Croce and M. Lenzerini. 2018. A framework for explaining query answers in DL-lite. In Proc. EKAW.
A. Datta, S. Sen, and Y. Zick. 2016. Algorithmic transparency via quantitative input influence: Theory and experiments with learning systems. In IEEE Symposium on Security and Privacy.
U. Draisbach, P. Christen, and F. Naumann. 2019. Transforming pairwise duplicates to entity clusters for high-quality duplicate detection. Journal of Data and Information Quality 12, 1 (2019), 3:1–3:30.
J. Du, K. Wang, and Y. Shen. 2014. A tractable approach to ABox abduction over description logic ontologies. In Proc. AAAI.
P. Dubey and L. S. Shapley. 1979. Mathematical properties of the Banzhaf power index. Mathematics of Operations Research 4, 2 (1979), 99–131.
W. Fan and F. Geerts. 2012. Foundations of Data Quality Management. Morgan & Claypool.
W. Fan, H. Gao, X. Ji, J. Li, and S. Ma. 2009. Dynamic constraints for record matching. The International Journal on Very Large Data Bases (VLDBJ) 20, 4 (2009), 495–520.
J. Halpern and J. Pearl. 2005. Causes and explanations: A structural-model approach: Part 1. British Journal for the Philosophy of Science 56 (2005), 843–887.
A. Heidari, J. McGrath, I. F. Ilyas, and Th. Rekatsinas. 2019. HoloDetect: Few-shot learning for error detection. In Proc. SIGMOD.
L. Jiang, A. Borgida, and J. Mylopoulos. 2008. Towards a compositional semantic account of data quality attributes. In Proc. International Conference on Conceptual Modeling (ER). 55–68.
M. A. Khamis, H. Q. Ngo, X. Nguyen, D. Olteanu, and M. Schleich. 2018. AC/DC: In-database learning thunderstruck. In Proc. DEEM.
P. Kouki, J. Pujara, C. Marcum, L. Koehly, and L. Getoor. 2019. Collective entity resolution in multi-relational familial networks. Knowledge and Information Systems 61, 3 (2019), 1547–1581.
B. Kimelfeld and C. Ré. 2017. A relational framework for classifier engineering. In Proc. PODS.
J. Kleinberg, J. Ludwig, S. Mullainathan, and A. Rambachan. 2018. Algorithmic fairness. AEA Papers and Proceedings 108 (2018), 22–27.
J. Krishnan, M. J. Franklin, K. Goldberg, J. Wang, and E. Wu. 2017. BoostClean: Automated error detection and repair for machine learning. arXiv:1711.01299 (2017).
E. Livshits, L. Bertossi, B. Kimelfeld, and M. Sebag. 2020. The Shapley value of tuples in query answering. In Proc. ICDT. arXiv:1904.08679.
S. Lundberg and S.-I. Lee. 2017. A unified approach to interpreting model predictions. In Proc. NIPS.
A. Meliou, W. Gatterbauer, K. F. Moore, and D. Suciu. 2010. The complexity of causality and responsibility for query answers and non-answers. In Proc. VLDB.
J. Pearl. 2009. Causality: Models, Reasoning and Inference. Cambridge Univ. Press, 2nd ed.
J. Rammelaere and F. Geerts. 2018. Explaining repaired data with CFDs. In Proc. VLDB.
A. Roth (ed.). 1988. The Shapley Value: Essays in Honor of Lloyd S. Shapley. Cambridge University Press.
C. Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206–215. arXiv:1811.10154.
P. Saleiro, B. Kuester, A. Stevens, A. Anisfeld, L. Hinkson, J. London, and R. Ghani. 2018. Aequitas: A bias and fairness audit toolkit. CoRR abs/1811.05577 (2018).
B. Salimi, L. Bertossi, D. Suciu, and G. Van den Broeck. 2016. Quantifying causal effects on query answering in databases. In Proc. TaPP.
B. Salimi, J. Gehrke, and D. Suciu. 2018. Bias in OLAP queries: Detection, explanation, and removal. In Proc. SIGMOD. 1021–1035.
B. Salimi, B. Howe, and D. Suciu. 2019. Data management for causal algorithmic fairness. IEEE Data Engineering Bulletin 42, 3 (2019), 24–35.
D. Suciu, D. Olteanu, C. Ré, and C. Koch. 2011. Probabilistic Databases. Synthesis Lectures on Data Management, Morgan & Claypool Publishers.

Index Terms

- Information systems


Published in
Journal of Data and Information Quality, Volume 12, Issue 2
Special Issue on Quality Assessment of Knowledge Graphs and On the Horizon
June 2020
105 pages
ISSN: 1936-1955
EISSN: 1936-1963
DOI: 10.1145/3397186
Editor: Tiziana Catarci, Sapienza University of Rome, Rome, Italy
Copyright © 2020 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 3 May 2020
- Received: 1 March 2020
- Accepted: 1 March 2020

Author Tags
Machine learning
bias
causes
fairness
Qualifiers
- research-article
- Research
- Refereed
Article Metrics
- Total Citations: 24
- Total Downloads: 1,519
- Downloads (Last 12 months): 263
- Downloads (Last 6 weeks): 36