Abstract
In this work, we provide some insights and develop some ideas, with few technical details, about the role of explanations in data quality for data-based machine learning (ML) models. In this direction, there are, as expected, roles for causality and for explainable artificial intelligence. The latter area sheds light not only on the models, but also on the data that support model construction. There is also room for defining, identifying, and explaining errors in data, in particular in ML, and for suggesting repair actions. More generally, explanations can be used as a basis for defining dirty data in the context of ML, and for measuring or quantifying dirtiness. We regard dirtiness as relative to the ML task at hand, e.g., classification.
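The task-relative notion of dirtiness suggested above can be illustrated with a minimal sketch (not from the paper; the classifier, attribute names, and scoring rule are all hypothetical): a feature value is a candidate error, relative to a classification task, to the extent that intervening on it flips the classifier's label, in the spirit of causal responsibility and score-based explanations.

```python
# Hypothetical sketch: quantify how much a single attribute value matters
# to a classification outcome by counterfactual intervention.

def classify(record):
    # Toy stand-in classifier (an assumption, not the paper's model):
    # approve a loan when income exceeds debt.
    return "approve" if record["income"] - record["debt"] > 0 else "reject"

def counterfactual_score(record, attribute, alternatives):
    """Fraction of alternative values for `attribute` that flip the label.

    A crude score-based explanation: the higher the score, the more the
    current value drives the outcome, and the more it deserves scrutiny
    (e.g., as potentially dirty relative to this classification task).
    """
    original = classify(record)
    flips = sum(
        1
        for v in alternatives
        if classify(dict(record, **{attribute: v})) != original
    )
    return flips / len(alternatives)

record = {"income": 50, "debt": 60}  # classified as "reject"
score = counterfactual_score(record, "income", [40, 55, 70, 90])
# 70 and 90 flip the label to "approve", so score == 0.5
```

The same scheme extends from single values to tuples or attribute sets, and averaging over interventions connects it to Shapley-style responsibility scores.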
Index Terms
- Data Quality and Explainable AI