skip to main content
research-article

Automated verification of query equivalence using satisfiability modulo theories

Published:01 July 2019Publication History
Skip Abstract Section

Abstract

Database-as-a-service offerings enable users to quickly create and deploy complex data processing pipelines. In practice, these pipelines often exhibit significant overlap of computation due to redundant execution of certain sub-queries. It is challenging for developers and database administrators to manually detect overlap across queries since they may be distributed across teams, organization roles, and geographic locations. Thus, we require automated cloud-scale tools for identifying equivalent queries to minimize computation overlap.

State-of-the-art algebraic approaches to automated verification of query equivalence suffer from two limitations. First, they are unable to model the semantics of widely-used SQL features, such as complex query predicates and three-valued logic. Second, they have a computationally intensive verification procedure. These limitations restrict their efficacy and efficiency in cloud-scale database-as-a-service offerings.

This paper makes the case for an alternate approach to determining query equivalence based on symbolic representation. The key idea is to effectively transform a wide range of SQL queries into first order logic formulae and then use satisfiability modulo theories to efficiently verify their equivalence. We have implemented this symbolic representation-based approach in EQUITAS. Our evaluation shows that EQUITAS proves the semantic equivalence of a larger set of query pairs compared to algebraic approaches and reduces the verification time by 27X. We also demonstrate that on a set of 17,461 real-world SQL queries, it automatically identifies redundant execution across 11% of the queries. Our symbolic-representation based technique is currently deployed on Alibaba's MaxCompute database-as-a-service platform.

References

  1. Alibaba MaxCompute. https://www.alibabacloud.com/product/maxcompute.Google ScholarGoogle Scholar
  2. Ant Financial Services Group. https://www.antfin.com/.Google ScholarGoogle Scholar
  3. Apache Calcite project. http://calcite.apache.org/.Google ScholarGoogle Scholar
  4. Apache Drill project. http://drill.apache.org/.Google ScholarGoogle Scholar
  5. Apache Flink project. http://flink.apache.org/.Google ScholarGoogle Scholar
  6. Apache Hive project. http://hive.apache.org/.Google ScholarGoogle Scholar
  7. Apache Kylin project. http://kylin.apache.org/.Google ScholarGoogle Scholar
  8. Apache Phoenix project. http://phoenix.apache.org/.Google ScholarGoogle Scholar
  9. Azure Data Lake. https://azure.microsoft.com/en-us/solutions/data-lake/.Google ScholarGoogle Scholar
  10. Cosette: An automated SQL solver. https://github.com/uwdb/Cosette.Google ScholarGoogle Scholar
  11. Google BigQuery. https://cloud.google.com/bigquery/.Google ScholarGoogle Scholar
  12. Z3prover: Z3 theorem prover. https://github.com/Z3Prover/z3.Google ScholarGoogle Scholar
  13. S. Abdul Khalek, B. Elkarablieh, Y. O. Laleye, and S. Khurshid. Query-aware test generation using a relational constraint solver. In ASE, 09 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. S. Abiteboul, R. Hull, and V. Vianu. Foundations of databases: the logical level. Addison-Wesley Longman Publishing Co., Inc., 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Albert. Algebraic properties of bag data types. In VLDB, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Albert, Y. Ioannidis, and R. Ramakrishnan. Conjunctive query equivalence of keyed relational schemas. In PODS, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Albert, Y. Ioannidis, and R. Ramakrishnan. Equivalence of keyed relational schemas by conjunctive queries. In Journal of Computer and System Sciences, volume 58, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. B.A.Trakhtenbrot. Impossibility of an algorithm for the decision problem in finite classes. In Journal of Symbolic Logic, 1950.Google ScholarGoogle Scholar
  19. A. Bouajjani, C. Drăgoi, C. Enea, A. Rezine, and M. Sighireanu. Invariant synthesis for programs manipulating lists with unbounded data. In CAV, 2010". Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. A. Bouajjani, C. Drăgoi, C. Enea, and M. Sighireanu. On inter-procedural analysis of programs with lists and data. In PLDI, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. D. Calvanese, G. D. Giacomo, and M. Lenzerini. Conjunctive query containment and answering under description logic constraints. In TOCL, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In STOC, 1977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. S. Chaudhuri and M. Y. Vardi. Optimization of real conjunctive queries. In PODS, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. In ICDT, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. S. Chu, D. Li, C. Wang, A. Cheung, and D. Suciu. Demonstration of the Cosette automated sql prover. In SIGMOD, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. S. Chu, B. Murphy, J. Roesch, A. Cheung, and D. Suciu. Axiomatic foundations and algorithms for deciding semantic equivalences of sql queries. PVLDB, 11(11):1482--1495, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. S. Chu, K. Weitz, A. Cheung, and D. Suciu. HoTTSQL: proving query rewrites with univalent sql semantics. In PLDI, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. S. Cohen, W. Nutt, and Y. Sagiv. Containment of aggregate queries. In Lecture Notes in Computer Science, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. S. Cohen, W. Nutt, and A. Serebrenik. Rewriting aggregate queries using views. In PODS, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. M. Davis, G. Logemann, and D. W. Loveland. A machine program for theorem-proving. Commun. ACM, 1962. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. G. De Giacomo, D. Calvanese, and M. Lenzerini. On the decidability of query containment under constraints. In PODS, 12 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. L. M. de Moura and N. Bjørner. Z3: an efficient SMT solver. In TACAS, 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. A. Deutsch, L. Popa, and V. Tannen. Physical data independence, constraints, and optimization with universal plans. In VLDB, 03 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. B. Dutertre. Yices 2.2. In CAV, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. S. Grossman, S. Cohen, S. Itzhaky, N. Rinetzky, and M. Sagiv. Verifying equivalence of spark programs. In CAV, 2017.Google ScholarGoogle Scholar
  36. I. Horrocks, U. Sattler, S. Tessaris, and S. Tobies. How to decide query containment under constraints using a description logic. In LPAR, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Y. E. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations as sets. In TODS, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. S. Itzhaky, T. Kotek, N. Rinetzky, M. Sagiv4, O. Tamir5, H. Veith, and F. Zuleger. On the automated verification of web applications with embedded sql. In ICDT, 2017.Google ScholarGoogle Scholar
  39. T. S. Jayram, P. G. Kolaitis, and E. Vee. The containment problem for real conjunctive queries with inequalities. In PODS, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. A. Jindal, K. Karanasos, S. Rao, and H. Patel. Selecting subexpressions to materialize at datacenter scale. PVLDB, 11(7):800--812, 2018. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. M. Kawaguchi, P. Rondon, and R. Jhala. Type-based data structure verification. In PLDI, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. P. G. Kolaitis and M. Y. Vardi. Conjunctive-query containment and constraint satisfaction. In PODS, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Y. V. Matiyasevich. Hilbert's tenth problem and paradigms of computation. In CiE, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. Negri, G. Pelagatti, and L. Sbattella. Formal semantics of sql queries. In ACM Trans. Database Syst., 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. G. Nelson and D. C. Oppen. Simplification by cooperating decision procedures. ACM Trans. Program. Lang. Syst., 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. L. Popa, A. Deutsch, A. Sahuguet, and V. Tannen. A chase too far? In SIGMOD, 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. C. Rubinson. Nulls, three-valued logic, and ambiguity in sql: Critiquing date's critique. SIGMOD, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. G. Rull, P. Bernstein, I. Santos, Y. Katsis, S. Melnik, and E. Teniente. Query containment in entity sql. In SIGMOD, 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Y. Sagiv and M. Yannakakis. Equivalences among relational expressions with the union and difference operators. In J. ACM, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. M. Schlaipfer, K. Rajan, A. Lal, and M. Samak. Optimizing big-data queries using program synthesis. In SOSP, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. C. Shumo, W. Chenglong, W. Konstantin, and C. Alvin. Cosette: An automated SQL prover. In CIDR, 2017.Google ScholarGoogle Scholar
  52. V. Tannen and L. Popa. An equational chase for path-conjunctive queries, constraints, and views. In ICDT, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. A. Tarski. A decision method for elementary algebra and geometry. In Quantifier Elimination and Cylindrical Algebraic Decomposition, 1951.Google ScholarGoogle Scholar
  54. M. Veanes, P. Grigorenko, P. de Halleux, and N. Tillmann. Symbolic query exploration. In FormaliSE, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. M. Veanes, N. Tillmann, and J. de Halleux. Qex: Symbolic sql query explorer. In LPAR, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Y. Wang, I. Dillig, S. K. Lahiri, and W. Cook. Verifying equivalence of database-driven applications. In PACMPL, 2017. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Automated verification of query equivalence using satisfiability modulo theories
      Index terms have been assigned to the content through auto-classification.

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image Proceedings of the VLDB Endowment
        Proceedings of the VLDB Endowment  Volume 12, Issue 11
        July 2019
        543 pages

        Publisher

        VLDB Endowment

        Publication History

        • Published: 1 July 2019
        Published in pvldb Volume 12, Issue 11

        Qualifiers

        • research-article

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader