Abstract
Consider a set of S of n data points in real d-dimensional space, Rd, where distances are measured using any Minkowski metric. In nearest neighbor searching, we preprocess S into a data structure, so that given any query point q∈ Rd, is the closest point of S to q can be reported quickly. Given any positive real ϵ, data point p is a (1 +ϵ)-approximate nearest neighbor of q if its distance from q is within a factor of (1 + ϵ) of the distance to the true nearest neighbor. We show that it is possible to preprocess a set of n points in Rd in O(dn log n) time and O(dn) space, so that given a query point q ∈ Rd, and ϵ > 0, a (1 + ϵ)-approximate nearest neighbor of q can be computed in O(cd, ϵ log n) time, where cd,ϵ≤d ⌈1 + 6d/ϵ⌉d is a factor depending only on dimension and ϵ. In general, we show that given an integer k ≥ 1, (1 + ϵ)-approximations to the k nearest neighbors of q can be computed in additional O(kd log n) time.
- AGARWAL, P. K., AND MATOUSEK, J. 1993. Ray shooting and parametric search. SlAM J. Comput. 22, 4, 794-806. Google ScholarDigital Library
- ARYA, S., AND MOUNT, D.M. 1993a. Algorithms for fast vector quantization. In Proceedings of the DCC'93: Data Compression Conference. J. A. Storer and M. Cohn, eds. IEEE Press, New York, pp. 381-390.Google ScholarCross Ref
- ARYA, S., AND MOUNT, D.M. 1993b. Approximate nearest neighbor queries in fixed dimensions. In Proceedings of the 4th A CM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 271-280. Google ScholarDigital Library
- ARYA, S., AND MOUNT, D. M. 1995. Approximate range searching. In Proceedings of the llth Annual ACM Symposium on Computational Geometry (Vancouver, B.C., Canada, June 5-7). ACM, New York, pp. 172-181. Google ScholarDigital Library
- ARYA, S., MOUNT, D., AND NARAYAN, O. 1995. Accounting for boundary effects in nearest neighbor searching. In Proceedings of the llth Annual ACM Symposium on Computational Geometry (Vancouver, B.C., Canada, June 5-7). ACM, New York, pp. 336-344. Google ScholarDigital Library
- ARYA, S., MOUNT, D. M., NETANYAHU, N., SILVERMAN, R., AND WU, A. Y. 1994. An optimal algorithm for approximate nearest neighbor searching in fixed dimensions. In Proceedings of the 5th ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 573-582. Google ScholarDigital Library
- BEI, C.-D., AND GRAY, R. M. 1985. An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE Trans. Commun. 33, 1132-1133.Google ScholarCross Ref
- BENTLEY, J. L., WEIDE, B. W., AND YAO, A.C. 1980. Optimal expected-time algorithms for closest point problems. A CM Trans. Math. Sofiw. 6, 4, 563-580. Google ScholarDigital Library
- BERCHTOLD, S., B()HM, C., KEIM, D. A., AND KRIEGEL, H.-P. 1997. A cost model for nearest neighbor search in high-dimensional data space. In Proceedings of the 16th Annual ACM SIGACT- SIGMOD-SIGART Symposium on Principles of Database Systems (Tucson, Az., May 12-14). ACM, New York, pp. 78-86. Google ScholarDigital Library
- BERCHTOLD, S., KEIM, D. A., AND KRIEGEL, H.-P. 1996. The X-tree: An index structure for high dimensional data. In Proceedings of the 22nd VLDB Conference. pp. 28-39. Google ScholarDigital Library
- BERN, M. 1993. Approximate closest-point queries in high dimensions. Inf. Proc. Lett. 45, 95-99. Google ScholarDigital Library
- BERN, M., EPPSTEIN, D., AND TENS, S.-H. 1993. Parallel construction of quadtrees and quality triangulations. In Proceedings of the 3rd Workshop Algorithms Data Structures. Lecture Notes in Computer Science, vol. 709. Springer-Verlag, New York, pp. 188-199. Google ScholarDigital Library
- BESPAMYATNIKH, S.N. 1995. An optimal algorithm for closest pair maintenance. In Proceedings of the llth Annual ACM Symposium on Computational Geometry (Vancouver, B.C., Canada, June 5-7). ACM, New York, pp. 152-161. Google ScholarDigital Library
- CALLAHAN, P. B., AND KOSARAJU, S.R. 1992. A decomposition of multi-dimensional point-sets with applications to k-nearest-neighbors and n-body potentional fields. In Proceedings of the 24th Annual ACM Symposium on the Theory of Computing (Vancouver, B.C., Canada, May 4-6). ACM, New York, pp. 546-556. Google ScholarDigital Library
- CALLAHAN, P. B., AND KOSARAJU, S.U. 1995. Algorithms for dynamic closest pair and n-body potential fields. In Proceedings of the 6th Annual A CM-SIAM Symposium on Discrete Algorithms (San Francisco, Calif., Jan. 22-24). ACM, New York, pp. 263-272. Google ScholarDigital Library
- CHAN, T. 1997. Approximate nearest neighbor queries revisited. In Proceedings of the 13th Annual ACM Symposium on Computational Geometry (Nice, France, June 4-6). ACM, New York, pp. 352-358. Google ScholarDigital Library
- CHAZELLE, B. 1982. A theorem on polygon cutting with applications. In Proceedings of the 23rd Annual IEEE Symposium on Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 339-349.Google ScholarDigital Library
- CLARKSON, K.L. 1983. Fast algorithms for the all nearest neighbors problem. In Proceedings of the 24th Annual IEEE Symposium on the Foundations of Computer Science. IEEE Computer Society Press, Los Alamitos, Calif., pp. 226-232.Google ScholarDigital Library
- CLARKSON, K.L. 1988. A randomized algorithm for closest-point queries. SIAM J. Comput. 17, 4, 830-847. Google ScholarDigital Library
- CLARKSON, K.L. 1994. An algorithm for approximate closest-point queries. In Proceedings of the lOth Annual ACM Symposium on Computational Geometry (Stony Brook, N.Y., June 6-8). ACM, New York, pp. 160-164. Google ScholarDigital Library
- CLEARY, J.G. 1979. Analysis of an algorithm for finding nearest neighbors in Euclidean space. ACM Trans. Math. Sofiw. 5, 2 (June), 183-192. Google ScholarDigital Library
- CORMEN, T. H., LEISERSON, C. E., AND RIVEST, R.L. 1990. Introduction to Algorithms. MIT Press, Cambridge, Mass. Google ScholarDigital Library
- COST, S., AND SALZBERG, S. 1993. Aweighted nearest neighbor algorithm for learning with symbolic features. Mach. Learn. 10, 57-78. Google ScholarDigital Library
- COVER, T. M., AND HART, P.E. 1967. Nearest neighbor pattern classification. IEEE Trans. Info. Theory 13, 57-67.Google ScholarCross Ref
- DE BERG, M., VAN KREVELD, M., OVERMARS, M., AND SCHWARZKOPF, O. 1997. Computational Geometry: Algorithms and Applications. Springer-Verlag, New York. Google ScholarCross Ref
- DEERWESTER, S., DUMALS, S. T., FURNAS, G. W., LANDAUER, T. K., AND HARSHMAN, R. 1990. Indexing by latend semantic analysis. J. Amer. Soc. Inf. Sci. 41,391-407.Google ScholarCross Ref
- DEVaOYE, L., AND WAGNEa, T.J. 1982. Nearest neighbor methods in discrimination. In Handbook of Statistics, vol. 2. P. R. Krishnaiah and L. N. Kanal, eds. North-Holland, Amsterdam, The Netherlands.Google Scholar
- DUDA, R. O., AND HAST, P.E. 1978. Pattern Classification and Scene Analysis. Wiley, New York.Google Scholar
- EDELSBaUNNEa, H. 1987. Algorithms in Combinatorial Geometry, vol. 10 of EATCS Monographs on Theoretical Computer Science. Springer-Verlag, New York. Google ScholarDigital Library
- FARVARDIN, N., AND MODESTINO, J.W. 1985. Rate-distortion performance of DPCM schemes for sutoregressive sources. IEEE Trans. Inf. Theory 31, 3 (May), 402-418.Google ScholarDigital Library
- FAYYAD, U. M., PIATETSKY-SHAPIRO, G., SMYTH, P., AND UTHURUSAMY, R. 1996. Advances in Knowledge Discovery and Data Mining. AAAI Press/MIT Press, Cambridge, Mass. Google ScholarDigital Library
- FEDER, T., AND GREENE, D. 1988. Optimal algorithms for approximate clustering. In Proceedings of the 20th Annual ACM Symposium on Theory of Computing (Chicago, Ill., May 2-4). ACM, New York, pp. 434-444. Google ScholarDigital Library
- FLICKNER, M., SAWHNEY, H., NIBLACK, W., ASHLEY, J., HUANG, Q., DOM, B., GORKANI, M., HAFNER, J., LEE, D., PETKOVIC, D., STEELE, D., AND YANKER, P. 1995. Query by image and video content: The QBIC system. IEEE Computer 28, 23-32. Google ScholarDigital Library
- FREDERICKSON, G.N. 1985. Data structures for on-line updating of minimum spanning trees, with applications. SlAM J. Comput. 14, 781-798.Google ScholarCross Ref
- FREDERICKSON, G.N. 1993. A data structure for dynamically maintaining rooted trees. In Proceedings of the 4th Annual A CM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 175-194. Google ScholarDigital Library
- FREDMAN, M. L., AND TARJAN, R.E. 1987. Fibonacci heaps and their uses in improved network optimization algorithms. J. ACM 34, 3 (July), 596-615. Google ScholarDigital Library
- FRIEDMAN, J. H., BENTLEY, J. L., AND FINKEL, R.A. 1977. An algorithm for finding best matches in logarithmic expected time. ACM Trans. Math. Softw. 3, 3 (Sept.), 209-226. Google ScholarDigital Library
- FRIEDMAN, J. H., BASKETT, F., AND SHUSTEK, L. J. 1975. An algorithm for finding nearest neighbors. IEEE Trans. Comput. C-24, 10, 1000-1006.Google ScholarDigital Library
- GALPERIN, I., AND RIVEST, R. L. 1993. Scapegoat trees. In Proceedings of the 4th ACM-SIAM Symposium on Discrete Algorithms. ACM, New York, pp. 165-174. Google ScholarDigital Library
- GERSHO, A., AND GRAY, R. M. 1991. Vector Quantization and Signal Compression. Kluwer Academic, Boston, Mass. Google ScholarDigital Library
- GUAN, L., AND KAMEL, M. 1992. Equal-average hyperplane partitioning method for vector quantization of image data. Patt. Recog. Lett. 13, 693-699. Google ScholarDigital Library
- INDYK, P., AND MOTWANI, R. 1998. Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of the 30th Annual A CM Symposium on Theory of Computing (Dallas, Tex., May 23-26). ACM, New York, pp. 604-613. Google ScholarDigital Library
- KLEINBERG, J. M. 1997. Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the 29th Annual ACM Symposium on Theory of Computing (El Paso, Tex., May 4-6). ACM, New York, pp. 599-608. Google ScholarDigital Library
- KUSHILEVITZ, E., OSTROVSKY, R., AND RABANI, Y. 1998. Efficient search for approximate nearest neighbor in high dimensional spaces. In Proceedings of the 30th Annual ACM Symposium on the Theory of Computing (Dallas, Tex., May 23-26). ACM, New York, pp. 614-623. Google ScholarDigital Library
- LEE, C.-H., AND CHEN, L.-H. 1994. Fast closest codeword search algorithm for vector quantisation. IEEE Proc. Vis. Image Sig. Proc. 141, 143-148.Google ScholarCross Ref
- LIN, K. I., JAGDISH, H. g., AND FALOUTSOS, C. 1994. The TV-tree: An index structure for high dimensional data. VLDB J. 3, 4, 517-542. Google ScholarDigital Library
- MEISER, S. 1993. Point location on arrangements of hyperplanes. Inf. Comput. 106, 2, 286-303. Google ScholarDigital Library
- MOUNT, D. M., NETANYAHU, N., SILVERMAN, R., AND WU, A. Y. 1995. Chromatic nearest neighbor searching: A query sensitive approach. In Proceedings of the 7th Canadian Conference on Computer Geometry. (Quebec City, Que., Canada, Aug. 10-13). pp. 261-266.Google Scholar
- PREPARATA, F. P., AND SHAMOS, M.I. 1985. Computational Geometry: An Introduction. Springer- Verlag, New York. Google ScholarDigital Library
- RIVEST, R.L. 1974. On the optimality of Elais's algorithm for performing best-match searches. In Information Processing. North-Holland Publishing Company, Amsterdam, The Netherlands, pp. 678-681.Google Scholar
- ROUSSOPOULOS, N., KELLEY, S., AND VINCENT, F. 1995. Nearest neighbor queries. In Proceedings of the 1995 ACM SIGMOD Conference on Management of Data (San Jose, Calif., May 23-25). ACM, New York, pp. 71-79. Google ScholarDigital Library
- SAMET, n. 1990. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, Mass. Google ScholarDigital Library
- SCHWARZ, C., SMID, M., AND SNOEYINK, J. 1994. An optimal algorithm for the on-line closest-pair problem. Algorithmica 12, 18-29.Google ScholarCross Ref
- SLEATOR, D. D., AND TARJAN, R.E. 1983. A data structure for dynamic trees. J. Comput. Syst. Sci. 26, 362-391. Google ScholarDigital Library
- SPROULL, R.L. 1991. Refinements to nearest-neighbor searching. Algorithmica 6, 579-589.Google ScholarDigital Library
- VAIDYA, P. M. 1989. An O(n log n) algorithm for the all-nearest-neighbors problem. Disc. Comput. Geom. 4, 101-115.Google ScholarDigital Library
- WHITE, D. A., AND JAIN, R. 1996. Similarity indexing with the SS-tree. In Proceedings of the 12th IEEE International Conference on Data Engineering. IEEE Computer Society Press, Los Alamitos, Calif., pp. 516 -523. Google ScholarDigital Library
- {ref65}YAO, A. C., AND YAO, F.F. 1985. A general approach to d-dimensional queries. In Proceedings of the 17th Annual ACM Symposium on Theory of Computing (Providence, R.I., May 6-8). ACM, New York, pp. 163-168. Google ScholarDigital Library
Index Terms
- An optimal algorithm for approximate nearest neighbor searching fixed dimensions
Recommendations
Expected-Case Complexity of Approximate Nearest Neighbor Searching
Most research in algorithms for geometric query problems has focused on their worst-case performance. However, when information on the query distribution is available, the alternative paradigm of designing and analyzing algorithms from the perspective ...
Space-time tradeoffs for approximate nearest neighbor searching
Nearest neighbor searching is the problem of preprocessing a set of n point points in d-dimensional space so that, given any query point q, it is possible to report the closest point to q rapidly. In approximate nearest neighbor searching, a parameter ε ...
Comments