skip to main content
article
Free Access

An optimal algorithm for approximate nearest neighbor searching fixed dimensions

Authors Info & Claims
Published:01 November 1998Publication History
Skip Abstract Section

Abstract

Consider a set of S of n data points in real d-dimensional space, Rd, where distances are measured using any Minkowski metric. In nearest neighbor searching, we preprocess S into a data structure, so that given any query point q∈ Rd, is the closest point of S to q can be reported quickly. Given any positive real ϵ, data point p is a (1 +ϵ)-approximate nearest neighbor of q if its distance from q is within a factor of (1 + ϵ) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in Rd in O(dn log n) time and O(dn) space, so that given a query point q ∈ Rd, and ϵ > 0, a (1 + ϵ)-approximate nearest neighbor of q can be computed in O(cd, ϵ log n) time, where cd,ϵd ⌈1 + 6d/ϵ⌉d is a factor depending only on dimension and ϵ. In general, we show that given an integer k ≥ 1, (1 + ϵ)-approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.

References

  1. AGARWAL, P. K., AND MATOUSEK, J. 1993. Ray shooting and parametric search. SlAM J. Comput. 22, 4, 794-806. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. ARYA, S., AND MOUNT, D.M. 1993a. Algorithms for fast vector quantization. In Proceedings of the DCC'93: Data Compression Conference. J. A. Storer and M. Cohn, eds. IEEE Press, New York, pp. 381-390.Google ScholarGoogle ScholarCross RefCross Ref
  3. ARYA, S., AND MOUNT, D.M. 1993b. Approximate nearest neighbor queries in fixed dimensions. In Proceedings of the 4th A CM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 271-280. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. ARYA, S., AND MOUNT, D. M. 1995. Approximate range searching. In Proceedings of the llth Annual ACM Symposium on Computational Geometry (Vancouver, B.C., Canada, June 5-7). ACM, New York, pp. 172-181. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. ARYA, S., MOUNT, D., AND NARAYAN, O. 1995. Accounting for boundary effects in nearest neighbor searching. In Proceedings of the llth Annual ACM Symposium on Computational Geometry (Vancouver, B.C., Canada, June 5-7). ACM, New York, pp. 336-344. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. ARYA, S., MOUNT, D. M., NETANYAHU, N., SILVERMAN, R., AND WU, A. Y. 1994. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. In Proceedings of the 5th ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 573-582. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. BEI, C.-D., AND GRAY, R. M. 1985. An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE Trans. Commun. 33, 1132-1133.Google ScholarGoogle ScholarCross RefCross Ref
  8. BENTLEY, J. L., WEIDE, B. W., AND YAO, A.C. 1980. Optimal expected-time algorithms for closest point problems. A CM Trans. Math. Sofiw. 6, 4, 563-580. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. BERCHTOLD, S., B()HM, C., KEIM, D. A., AND KRIEGEL, H.-P. 1997. A cost model for nearest neighbor search in high-dimensional data space. In Proceedings of the 16th Annual ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems (Tucson, Az., May 12-14). ACM, New York, pp. 78-86. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. BERCHTOLD, S., KEIM, D. A., AND KRIEGEL, H.-P. 1996. The X-tree: An index structure for high dimensional data. In Proceedings of the 22nd VLDB Conference. pp. 28-39. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. BERN, M. 1993. Approximate closest-point queries in high dimensions. Inf. Proc. Lett. 45, 95-99. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. BERN, M., EPPSTEIN, D., AND TENS, S.-H. 1993. Parallel construction of quadtrees and quality triangulations. In Proceedings of the 3rd Workshop Algorithms Data Structures. Lecture Notes in Computer Science, vol. 709. Springer-Verlag, New York, pp. 188-199. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. BESPAMYATNIKH, S.N. 1995. An optimal algorithm for closest pair maintenance. In Proceedings of the llth Annual ACM Symposium on Computational Geometry (Vancouver, B.C., Canada, June 5-7). ACM, New York, pp. 152-161. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. CALLAHAN, P. B., AND KOSARAJU, S.R. 1992. A decomposition of multi-dimensional point-sets with applications to k-nearest-neighbors and n-body potentional fields. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing (Vancouver, B.C., Canada, May 4-6). ACM, New York, pp. 546-556. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. CALLAHAN, P. B., AND KOSARAJU, S.U. 1995. Algorithms for dynamic closest pair and n-body potential fields. In Proceedings of the 6th Annual A CM-SIAM Symposium on Discrete Algorithms (San Francisco, Calif., Jan. 22-24). ACM, New York, pp. 263-272. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. CHAN, T. 1997. Approximate nearest neighbor queries revisited. In Proceedings of the 13th Annual ACM Symposium on Computational Geometry (Nice, France, June 4-6). ACM, New York, pp. 352-358. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. CHAZELLE, B. 1982. A theorem on polygon cutting with applications. In Proceedings of the 23rd Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 339-349.Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. CLARKSON, K.L. 1983. Fast algorithms for the all nearest neighbors problem. In Proceedings of the 24th Annual IEEE Symposium on the Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 226-232.Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. CLARKSON, K.L. 1988. A randomized algorithm for closest-point queries. SIAM J. Comput. 17, 4, 830-847. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. CLARKSON, K.L. 1994. An algorithm for approximate closest-point queries. In Proceedings of the lOth Annual ACM Symposium on Computational Geometry (Stony Brook, N.Y., June 6-8). ACM, New York, pp. 160-164. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. CLEARY, J.G. 1979. Analysis of an algorithm for finding nearest neighbors in Euclidean space. ACM Trans. Math. Sofiw. 5, 2 (June), 183-192. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R.L. 1990. Introduction to Algorithms. MIT Press, Cambridge, Mass. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. COST, S., AND SALZBERG, S. 1993. Aweighted nearest neighbor algorithm for learning with symbolic features. Mach. Learn. 10, 57-78. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. COVER, T. M., AND HART, P.E. 1967. Nearest neighbor pattern classification. IEEE Trans. Info. Theory 13, 57-67.Google ScholarGoogle ScholarCross RefCross Ref
  25. DE BERG, M., VAN KREVELD, M., OVERMARS, M., AND SCHWARZKOPF, O. 1997. Computational Geometry: Algorithms and Applications. Springer-Verlag, New York. Google ScholarGoogle ScholarCross RefCross Ref
  26. DEERWESTER, S., DUMALS, S. T., FURNAS, G. W., LANDAUER, T. K., AND HARSHMAN, R. 1990. Indexing by latend semantic analysis. J. Amer. Soc. Inf. Sci. 41,391-407.Google ScholarGoogle ScholarCross RefCross Ref
  27. DEVaOYE, L., AND WAGNEa, T.J. 1982. Nearest neighbor methods in discrimination. In Handbook of Statistics, vol. 2. P. R. Krishnaiah and L. N. Kanal, eds. North-Holland, Amsterdam, The Netherlands.Google ScholarGoogle Scholar
  28. DUDA, R. O., AND HAST, P.E. 1978. Pattern Classification and Scene Analysis. Wiley, New York.Google ScholarGoogle Scholar
  29. EDELSBaUNNEa, H. 1987. Algorithms in Combinatorial Geometry, vol. 10 of EATCS Monographs on Theoretical Computer Science. Springer-Verlag, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. FARVARDIN, N., AND MODESTINO, J.W. 1985. Rate-distortion performance of DPCM schemes for sutoregressive sources. IEEE Trans. Inf. Theory 31, 3 (May), 402-418.Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. FAYYAD, U. M., PIATETSKY-SHAPIRO, G., SMYTH, P., AND UTHURUSAMY, R. 1996. Advances in Knowledge Discovery and Data Mining. AAAI Press/MIT Press, Cambridge, Mass. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. FEDER, T., AND GREENE, D. 1988. Optimal algorithms for approximate clustering. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing (Chicago, Ill., May 2-4). ACM, New York, pp. 434-444. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. FLICKNER, M., SAWHNEY, H., NIBLACK, W., ASHLEY, J., HUANG, Q., DOM, B., GORKANI, M., HAFNER, J., LEE, D., PETKOVIC, D., STEELE, D., AND YANKER, P. 1995. Query by image and video content: The QBIC system. IEEE Computer 28, 23-32. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. FREDERICKSON, G.N. 1985. Data structures for on-line updating of minimum spanning trees, with applications. SlAM J. Comput. 14, 781-798.Google ScholarGoogle ScholarCross RefCross Ref
  35. FREDERICKSON, G.N. 1993. A data structure for dynamically maintaining rooted trees. In Proceedings of the 4th Annual A CM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 175-194. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. FREDMAN, M. L., AND TARJAN, R.E. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3 (July), 596-615. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. FRIEDMAN, J. H., BENTLEY, J. L., AND FINKEL, R.A. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3, 3 (Sept.), 209-226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. FRIEDMAN, J. H., BASKETT, F., AND SHUSTEK, L. J. 1975. An algorithm for finding nearest neighbors. IEEE Trans. Comput. C-24, 10, 1000-1006.Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. GALPERIN, I., AND RIVEST, R. L. 1993. Scapegoat trees. In Proceedings of the 4th ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 165-174. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. GERSHO, A., AND GRAY, R. M. 1991. Vector Quantization and Signal Compression. Kluwer Academic, Boston, Mass. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. GUAN, L., AND KAMEL, M. 1992. Equal-average hyperplane partitioning method for vector quantization of image data. Patt. Recog. Lett. 13, 693-699. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. INDYK, P., AND MOTWANI, R. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 30th Annual A CM Symposium on Theory of Computing (Dallas, Tex., May 23-26). ACM, New York, pp. 604-613. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. KLEINBERG, J. M. 1997. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (El Paso, Tex., May 4-6). ACM, New York, pp. 599-608. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. KUSHILEVITZ, E., OSTROVSKY, R., AND RABANI, Y. 1998. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the 30th Annual ACM Symposium on the Theory of Computing (Dallas, Tex., May 23-26). ACM, New York, pp. 614-623. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. LEE, C.-H., AND CHEN, L.-H. 1994. Fast closest codeword search algorithm for vector quantisation. IEEE Proc. Vis. Image Sig. Proc. 141, 143-148.Google ScholarGoogle ScholarCross RefCross Ref
  46. LIN, K. I., JAGDISH, H. g., AND FALOUTSOS, C. 1994. The TV-tree: An index structure for high dimensional data. VLDB J. 3, 4, 517-542. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. MEISER, S. 1993. Point location on arrangements of hyperplanes. Inf. Comput. 106, 2, 286-303. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. MOUNT, D. M., NETANYAHU, N., SILVERMAN, R., AND WU, A. Y. 1995. Chromatic nearest neighbor searching: A query sensitive approach. In Proceedings of the 7th Canadian Conference on Computer Geometry. (Quebec City, Que., Canada, Aug. 10-13). pp. 261-266.Google ScholarGoogle Scholar
  49. PREPARATA, F. P., AND SHAMOS, M.I. 1985. Computational Geometry: An Introduction. Springer- Verlag, New York. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. RIVEST, R.L. 1974. On the optimality of Elais's algorithm for performing best-match searches. In Information Processing. North-Holland Publishing Company, Amsterdam, The Netherlands, pp. 678-681.Google ScholarGoogle Scholar
  51. ROUSSOPOULOS, N., KELLEY, S., AND VINCENT, F. 1995. Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD Conference on Management of Data (San Jose, Calif., May 23-25). ACM, New York, pp. 71-79. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. SAMET, n. 1990. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, Mass. Google ScholarGoogle ScholarDigital LibraryDigital Library
  53. SCHWARZ, C., SMID, M., AND SNOEYINK, J. 1994. An optimal algorithm for the on-line closest-pair problem. Algorithmica 12, 18-29.Google ScholarGoogle ScholarCross RefCross Ref
  54. SLEATOR, D. D., AND TARJAN, R.E. 1983. A data structure for dynamic trees. J. Comput. Syst. Sci. 26, 362-391. Google ScholarGoogle ScholarDigital LibraryDigital Library
  55. SPROULL, R.L. 1991. Refinements to nearest-neighbor searching. Algorithmica 6, 579-589.Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. VAIDYA, P. M. 1989. An O(n log n) algorithm for the all-nearest-neighbors problem. Disc. Comput. Geom. 4, 101-115.Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. WHITE, D. A., AND JAIN, R. 1996. Similarity indexing with the SS-tree. In Proceedings of the 12th IEEE International Conference on Data Engineering. IEEE Computer Society Press, Los Alamitos, Calif., pp. 516 -523. Google ScholarGoogle ScholarDigital LibraryDigital Library
  58. {ref65}YAO, A. C., AND YAO, F.F. 1985. A general approach to d-dimensional queries. In Proceedings of the 17th Annual ACM Symposium on Theory of Computing (Providence, R.I., May 6-8). ACM, New York, pp. 163-168. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. An optimal algorithm for approximate nearest neighbor searching fixed dimensions

              Recommendations

              Comments

              Login options

              Check if you have access through your login credentials or your institution to get full access on this article.

              Sign in

              Full Access

              • Published in

                cover image Journal of the ACM
                Journal of the ACM  Volume 45, Issue 6
                Nov. 1998
                185 pages
                ISSN:0004-5411
                EISSN:1557-735X
                DOI:10.1145/293347
                Issue’s Table of Contents

                Copyright © 1998 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Publication History

                • Published: 1 November 1998
                Published in jacm Volume 45, Issue 6

                Permissions

                Request permissions about this article.

                Request Permissions

                Check for updates

                Qualifiers

                • article

              PDF Format

              View or Download as a PDF file.

              PDF

              eReader

              View online with eReader.

              eReader