Abstract
Database-as-a-service offerings enable users to quickly create and deploy complex data processing pipelines. In practice, these pipelines often exhibit significant overlap of computation due to redundant execution of certain sub-queries. It is challenging for developers and database administrators to manually detect overlap across queries since they may be distributed across teams, organization roles, and geographic locations. Thus, we require automated cloud-scale tools for identifying equivalent queries to minimize computation overlap.
State-of-the-art algebraic approaches to automated verification of query equivalence suffer from two limitations. First, they are unable to model the semantics of widely-used SQL features, such as complex query predicates and three-valued logic. Second, they have a computationally intensive verification procedure. These limitations restrict their efficacy and efficiency in cloud-scale database-as-a-service offerings.
This paper makes the case for an alternate approach to determining query equivalence based on symbolic representation. The key idea is to effectively transform a wide range of SQL queries into first order logic formulae and then use satisfiability modulo theories to efficiently verify their equivalence. We have implemented this symbolic representation-based approach in EQUITAS. Our evaluation shows that EQUITAS proves the semantic equivalence of a larger set of query pairs compared to algebraic approaches and reduces the verification time by 27X. We also demonstrate that on a set of 17,461 real-world SQL queries, it automatically identifies redundant execution across 11% of the queries. Our symbolic-representation based technique is currently deployed on Alibaba's MaxCompute database-as-a-service platform.
- Alibaba MaxCompute. https://www.alibabacloud.com/product/maxcompute.Google Scholar
- Ant Financial Services Group. https://www.antfin.com/.Google Scholar
- Apache Calcite project. http://calcite.apache.org/.Google Scholar
- Apache Drill project. http://drill.apache.org/.Google Scholar
- Apache Flink project. http://flink.apache.org/.Google Scholar
- Apache Hive project. http://hive.apache.org/.Google Scholar
- Apache Kylin project. http://kylin.apache.org/.Google Scholar
- Apache Phoenix project. http://phoenix.apache.org/.Google Scholar
- Azure Data Lake. https://azure.microsoft.com/en-us/solutions/data-lake/.Google Scholar
- Cosette: An automated SQL solver. https://github.com/uwdb/Cosette.Google Scholar
- Google BigQuery. https://cloud.google.com/bigquery/.Google Scholar
- Z3prover: Z3 theorem prover. https://github.com/Z3Prover/z3.Google Scholar
- S. Abdul Khalek, B. Elkarablieh, Y. O. Laleye, and S. Khurshid. Query-aware test generation using a relational constraint solver. In ASE, 09 2008. Google ScholarDigital Library
- S. Abiteboul, R. Hull, and V. Vianu. Foundations of databases: the logical level. Addison-Wesley Longman Publishing Co., Inc., 1995. Google ScholarDigital Library
- J. Albert. Algebraic properties of bag data types. In VLDB, 1991. Google ScholarDigital Library
- J. Albert, Y. Ioannidis, and R. Ramakrishnan. Conjunctive query equivalence of keyed relational schemas. In PODS, 1997. Google ScholarDigital Library
- J. Albert, Y. Ioannidis, and R. Ramakrishnan. Equivalence of keyed relational schemas by conjunctive queries. In Journal of Computer and System Sciences, volume 58, 1999. Google ScholarDigital Library
- B.A.Trakhtenbrot. Impossibility of an algorithm for the decision problem in finite classes. In Journal of Symbolic Logic, 1950.Google Scholar
- A. Bouajjani, C. Drăgoi, C. Enea, A. Rezine, and M. Sighireanu. Invariant synthesis for programs manipulating lists with unbounded data. In CAV, 2010". Google ScholarDigital Library
- A. Bouajjani, C. Drăgoi, C. Enea, and M. Sighireanu. On inter-procedural analysis of programs with lists and data. In PLDI, 2011. Google ScholarDigital Library
- D. Calvanese, G. D. Giacomo, and M. Lenzerini. Conjunctive query containment and answering under description logic constraints. In TOCL, 2008. Google ScholarDigital Library
- A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, 1977. Google ScholarDigital Library
- S. Chaudhuri and M. Y. Vardi. Optimization of real conjunctive queries. In PODS, 1993. Google ScholarDigital Library
- C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. In ICDT, 1997. Google ScholarDigital Library
- S. Chu, D. Li, C. Wang, A. Cheung, and D. Suciu. Demonstration of the Cosette automated sql prover. In SIGMOD, 2017. Google ScholarDigital Library
- S. Chu, B. Murphy, J. Roesch, A. Cheung, and D. Suciu. Axiomatic foundations and algorithms for deciding semantic equivalences of sql queries. PVLDB, 11(11):1482--1495, 2018. Google ScholarDigital Library
- S. Chu, K. Weitz, A. Cheung, and D. Suciu. HoTTSQL: proving query rewrites with univalent sql semantics. In PLDI, 2017. Google ScholarDigital Library
- S. Cohen, W. Nutt, and Y. Sagiv. Containment of aggregate queries. In Lecture Notes in Computer Science, 2002. Google ScholarDigital Library
- S. Cohen, W. Nutt, and A. Serebrenik. Rewriting aggregate queries using views. In PODS, 1999. Google ScholarDigital Library
- M. Davis, G. Logemann, and D. W. Loveland. A machine program for theorem-proving. Commun. ACM, 1962. Google ScholarDigital Library
- G. De Giacomo, D. Calvanese, and M. Lenzerini. On the decidability of query containment under constraints. In PODS, 12 1999. Google ScholarDigital Library
- L. M. de Moura and N. Bjørner. Z3: an efficient SMT solver. In TACAS, 2008. Google ScholarDigital Library
- A. Deutsch, L. Popa, and V. Tannen. Physical data independence, constraints, and optimization with universal plans. In VLDB, 03 2002. Google ScholarDigital Library
- B. Dutertre. Yices 2.2. In CAV, 2014. Google ScholarDigital Library
- S. Grossman, S. Cohen, S. Itzhaky, N. Rinetzky, and M. Sagiv. Verifying equivalence of spark programs. In CAV, 2017.Google Scholar
- I. Horrocks, U. Sattler, S. Tessaris, and S. Tobies. How to decide query containment under constraints using a description logic. In LPAR, 2000. Google ScholarDigital Library
- Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. In TODS, 1995. Google ScholarDigital Library
- S. Itzhaky, T. Kotek, N. Rinetzky, M. Sagiv4, O. Tamir5, H. Veith, and F. Zuleger. On the automated verification of web applications with embedded sql. In ICDT, 2017.Google Scholar
- T. S. Jayram, P. G. Kolaitis, and E. Vee. The containment problem for real conjunctive queries with inequalities. In PODS, 2006. Google ScholarDigital Library
- A. Jindal, K. Karanasos, S. Rao, and H. Patel. Selecting subexpressions to materialize at datacenter scale. PVLDB, 11(7):800--812, 2018. Google ScholarDigital Library
- M. Kawaguchi, P. Rondon, and R. Jhala. Type-based data structure verification. In PLDI, 2009. Google ScholarDigital Library
- P. G. Kolaitis and M. Y. Vardi. Conjunctive-query containment and constraint satisfaction. In PODS, 1998. Google ScholarDigital Library
- Y. V. Matiyasevich. Hilbert's tenth problem and paradigms of computation. In CiE, 2005. Google ScholarDigital Library
- M. Negri, G. Pelagatti, and L. Sbattella. Formal semantics of sql queries. In ACM Trans. Database Syst., 1991. Google ScholarDigital Library
- G. Nelson and D. C. Oppen. Simplification by cooperating decision procedures. ACM Trans. Program. Lang. Syst., 1979. Google ScholarDigital Library
- L. Popa, A. Deutsch, A. Sahuguet, and V. Tannen. A chase too far? In SIGMOD, 2002. Google ScholarDigital Library
- C. Rubinson. Nulls, three-valued logic, and ambiguity in sql: Critiquing date's critique. SIGMOD, 2007. Google ScholarDigital Library
- G. Rull, P. Bernstein, I. Santos, Y. Katsis, S. Melnik, and E. Teniente. Query containment in entity sql. In SIGMOD, 2013. Google ScholarDigital Library
- Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. In J. ACM, 1980. Google ScholarDigital Library
- M. Schlaipfer, K. Rajan, A. Lal, and M. Samak. Optimizing big-data queries using program synthesis. In SOSP, 2017. Google ScholarDigital Library
- C. Shumo, W. Chenglong, W. Konstantin, and C. Alvin. Cosette: An automated SQL prover. In CIDR, 2017.Google Scholar
- V. Tannen and L. Popa. An equational chase for path-conjunctive queries, constraints, and views. In ICDT, 1999. Google ScholarDigital Library
- A. Tarski. A decision method for elementary algebra and geometry. In Quantifier Elimination and Cylindrical Algebraic Decomposition, 1951.Google Scholar
- M. Veanes, P. Grigorenko, P. de Halleux, and N. Tillmann. Symbolic query exploration. In FormaliSE, 2009. Google ScholarDigital Library
- M. Veanes, N. Tillmann, and J. de Halleux. Qex: Symbolic sql query explorer. In LPAR, 2010. Google ScholarDigital Library
- Y. Wang, I. Dillig, S. K. Lahiri, and W. Cook. Verifying equivalence of database-driven applications. In PACMPL, 2017. Google ScholarDigital Library
Index Terms
- Automated verification of query equivalence using satisfiability modulo theories
Recommendations
Verification modulo theories
AbstractIn this paper, we consider the problem of model checking fair transition systems expressed symbolically in the framework of Satisfiability Modulo Theories. This problem, referred to as Verification Modulo Theories, is tackled by combining two key ...
Pulling Conjunctive Query Equivalence out of the Bag
CIKM '14: Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge ManagementWe present LECQTER, a tool for generating a 'perfect example' database, called exemplar, for a given conjunctive query. Indeed, exemplars separate the given query from any non-equivalent query. Therefore, LECQTER reduces the query equivalence problem to ...
Query evaluation using overlapping views: completeness and efficiency
SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of dataWe study the problem of finding efficient equivalent view-based rewritings of relational queries, focusing on query optimization using materialized views under the assumption that base relations cannot contain duplicate tuples. A lot of work in the ...
Comments