Skip to main content

Designing Algorithms for Machine Learning and Data Mining

  • Chapter
  • First Online:
A Guided Tour of Artificial Intelligence Research

Abstract

Designing Machine Learning algorithms implies to answer three main questions: First, what is the space \(\mathscr {H}\) of hypotheses or models of the data that the algorithm considers? Second, what is the inductive criterion used to assess the merit of a hypothesis given the data? Third, given the space \({\mathscr {H}}\) and the inductive criterion, how is the exploration of \({\mathscr {H}}\) carried on in order to find a as good as possible hypothesis? Any learning algorithm can be analyzed along these three questions. This chapter focusses primarily on unsupervised learning, on one hand, and supervised learning, on the other hand. For each, the foremost problems are described as well as the main existing approaches. In particular, the interplay between the structure that can be endowed over the hypothesis space and the optimisation techniques that can in consequence be used is underlined. We cover especially the major existing methods for clustering: prototype-based, generative-based, density-based, spectral based, hierarchical, and conceptual and visit the validation techniques available. For supervised learning, the generative and discriminative approaches are contrasted and a wide variety of linear methods in which we include the Support Vector Machines and Boosting are presented. Multi-Layer neural networks and deep learning methods are discussed. Some additional methods are illustrated, and we describe other learning problems including semi-supervised learning, active learning, online learning, transfer learning, learning to rank, learning recommendations, and identifying causal relationships. We conclude this survey by suggesting new directions for research.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 149.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 199.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    \(S(m,k) = \frac{1}{k!} \sum _{j=0}^k (-1)^{k-j} (k_j) j^m\).

  2. 2.

    If P is a property and C is a concept, a property is characteristic if \(C \rightarrow P\) and discriminant if \(P \rightarrow C\).

  3. 3.

    http://archive.ics.uci.edu/ml/datasets/zoo.

  4. 4.

    A clause is a disjunction of literals, assimilated in this definition to a set of literals.

  5. 5.

    A reflexive and transitive relation.

References

  • Aggarwal CC (2015) Data mining: the textbook. Springer Publishing Company Incorporated, Berlin

    MATH  Google Scholar 

  • Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. Very large data bases (VLDB-94). Santiage, Chile, pp 487–499

    Google Scholar 

  • Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-of-squares clustering. Math Program 131(1–2):195–220

    Article  MathSciNet  MATH  Google Scholar 

  • Amarel S (1968) On representations of problems of reasoning about actions. Mach Intell 3(3):131–171

    MATH  Google Scholar 

  • Ankerst M, Breunig MM, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: SIGMOD 1999, proceedings ACM SIGMOD international conference on management of data, June 1–3, 1999, Philadelphia, Pennsylvania, USA, pp 49–60. https://doi.org/10.1145/304182.304187

  • Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7–9, 2007, pp 1027–1035. http://dl.acm.org/citation.cfm?id=1283383.1283494

  • Bastide Y, Pasquier N, Taouil R, Stumme G, Lakhal L (2000) Mining minimal non-redundant association rules using frequent closed itemsets. In: Computational logic, pp 972–986

    Google Scholar 

  • Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, Norwell

    Book  MATH  Google Scholar 

  • Bishop CM (2006) Pattern recognition and machine learning. Springer, Secaucus

    MATH  Google Scholar 

  • Breiman L (2001) Random forests. Mach Learn 45(1):5–32

    Article  MATH  Google Scholar 

  • Breiman L, Friedman J, Olshen R, Stone CJ (1984) Classification and regression trees. Wadsworth and Brooks/Cole Advanced Books and Software

    Google Scholar 

  • Brusco M, Stahl S (2005) Branch-and-bound applications in combinatorial data analysis (Statistics and computing), 1st edn. Springer, Berlin

    MATH  Google Scholar 

  • Busygin S, Prokopyev OA, Pardalos PM (2008) Biclustering in data mining. Comput OR 35:2964–2987

    Article  MathSciNet  MATH  Google Scholar 

  • Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  • Chapelle O, Scholkopf B, Zien A (2009) Semi-supervised learning (chapelle O, et al eds; 2006). IEEE Trans Neural Netw 20(3):542

    Google Scholar 

  • Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297

    MATH  Google Scholar 

  • Dao T, Duong K, Vrain C (2017) Constrained clustering by constraint programming. Artif Intell 244:70–94. https://doi.org/10.1016/j.artint.2015.05.006

    Article  MathSciNet  MATH  Google Scholar 

  • de la Higuera C (2010) Grammatical inference: learning automata and grammars. Cambridge University Press, Cambridge

    Book  MATH  Google Scholar 

  • Dhillon IS, Guan Y, Kulis B (2004) Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, Washington, USA, August 22–25, 2004, pp 551–556. https://doi.org/10.1145/1014052.1014118

  • Ding CHQ, He X (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM international conference on data mining, SDM 2005, Newport Beach, CA, USA, April 21–23, 2005, pp 606–610, https://doi.org/10.1137/1.9781611972757.70

  • du Merle O, Hansen P, Jaumard B, Mladenovic N (1999) An interior point algorithm for minimum sum-of-squares clustering. SIAM J Sci Comput 21(4):1485–1505

    Article  MathSciNet  MATH  Google Scholar 

  • Dunn JC (1973) A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57. https://doi.org/10.1080/01969727308546046

    Article  MathSciNet  MATH  Google Scholar 

  • Dzeroski S, Lavrac N (eds) (2001) Relational data mining. Springer, Berlin

    MATH  Google Scholar 

  • Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD-96), Portland, Oregon, USA, pp 226–231. http://www.aaai.org/Library/KDD/1996/kdd96-037.php

  • Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172. https://doi.org/10.1007/BF00114265

    Article  Google Scholar 

  • Forgy E (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classification. Biometrics 21(3):768–769

    Google Scholar 

  • Fürnkranz J, Gamberger D, Lavrac N (2012) Foundations of rule learning. Springer, Berlin

    Book  MATH  Google Scholar 

  • Gama J (2010) Knowledge discovery from data streams. Chapman & Hall

    Google Scholar 

  • Ganter B, Wille R, Franke C (1998) Formal concept analysis: mathematical foundations. Springer, Berlin

    Google Scholar 

  • Ganter B, Stumme G, Wille R (eds) (2005) Formal concept analysis: foundations and applications. Springer, Berlin

    MATH  Google Scholar 

  • Getoor L, Taskar B (eds) (2007) An introduction to statistical relational learning. MIT Press, Cambridge

    Google Scholar 

  • Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27, Curran Associates, Inc., pp 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

  • Getoor L, Taskar B (eds) (2007) An introduction to statistical relational learning. MIT Press

    Google Scholar 

  • Halkidi M, Batistakis Y, Vazirgiannis M (2002) Clustering validity checking methods: part ii. SIGMOD Rec 31(3):19–27. https://doi.org/10.1145/601858.601862

    Article  MATH  Google Scholar 

  • Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. SIGMOD Rec 29(2):1–12. https://doi.org/10.1145/335191.335372

    Article  Google Scholar 

  • Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87

    Article  MathSciNet  Google Scholar 

  • Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco

    MATH  Google Scholar 

  • Hansen P, Delattre M (1978) Complete-link cluster analysis by graph coloring. J Am Stat Assoc 73:397–403

    Article  MATH  Google Scholar 

  • Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79(1–3):191–215

    MathSciNet  MATH  Google Scholar 

  • Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer series in statistics. Springer, Berlin

    Book  MATH  Google Scholar 

  • Hawkins D (1980) Identification of outliers. Monographs on applied probability and statistics. Chapman and Hall. https://books.google.fr/books?id=fb0OAAAAQAAJ

  • Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79(8):2554–2558

    Article  MathSciNet  MATH  Google Scholar 

  • Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall

    Google Scholar 

  • Jannach D, Resnick P, Tuzhilin A, Zanker M (2016) Recommender systems-: beyond matrix completion. Commun ACM 59(11):94–102

    Article  Google Scholar 

  • Japkowicz N (2011) Evaluating learning algorithms: a classification perspective. Cambridge University Press

    Google Scholar 

  • Johnson S (1967) Hierarchical clustering schemes. Psychometrika 32(3):241–254

    Article  MATH  Google Scholar 

  • Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York

    Book  MATH  Google Scholar 

  • Klein G, Aronson JE (1991) Optimal clustering: a model and method. Nav Res Logist 38(3):447–461

    Article  MATH  Google Scholar 

  • Kohonen T (ed) (1997) Self-organizing maps. Springer, New York Inc, Secaucus

    MATH  Google Scholar 

  • Koller D, Friedman N (2009) Probabilistic graphical models. Principles and techniques. MIP Press

    Google Scholar 

  • Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268

    MathSciNet  MATH  Google Scholar 

  • Lance GN, Williams WTA (1967) A general theory of classificatory sorting strategies: 1. Hierarchical systems 9

    Google Scholar 

  • Lachiche N, Vrain C (eds) (2018) Inductive logic programming - 27th international conference, ILP 2017, Orléans, France, September 4–6, 2017, Revised selected papers, Lecture notes in computer science, vol 10759. Springer. https://doi.org/10.1007/978-3-319-78090-0

  • Lavrac N, Dzeroski S (1994) Inductive logic programming - techniques and applications. Ellis Horwood series in artificial intelligence. Ellis Horwood

    Google Scholar 

  • Le Cun Y (1986) Learning process in an asymmetric threshold network. Disordered systems and biological organization. Springer, Berlin, pp 233–240

    Chapter  Google Scholar 

  • Le Cun Y, Boser BE, Denker JS, Henderson D, Howard RE, Hubbard WE, Jackel LD (1990) Handwritten digit recognition with a back-propagation network. In: Advances in neural information processing systems, pp 396–404

    Google Scholar 

  • Le Cun Y, Bengio Y et al (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10):1995

    Google Scholar 

  • Le Cun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539

  • Levesque HJ, Brachman RJ (1987) Expressiveness and tractability in knowledge representation and reasoning. Comput Intell 3(1):78–93

    Article  MATH  Google Scholar 

  • Li H (2011) A short introduction to learning to rank. IEICE Trans Inf Syst 94(10):1854–1862

    Article  Google Scholar 

  • Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the 2001 IEEE international conference on data mining, 29 November–2 December 2001, San Jose, California, USA, pp 369–376. https://doi.org/10.1109/ICDM.2001.989541

  • Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining, AAAI Press, KDD’98, pp 80–86. http://dl.acm.org/citation.cfm?id=3000292.3000305

  • Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–136. https://doi.org/10.1109/TIT.1982.1056489

    Article  MathSciNet  MATH  Google Scholar 

  • Lopez-Paz D, Nishihara R, Chintala S, Schölkopf B, Bottou L (2016) Discovering causal signals in images. arXiv:160508179

  • Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 1:24–45. https://doi.org/10.1109/TCBB.2004.2, www.doi.ieeecomputersociety.org/10.1109/TCBB.2004.2

  • Michalski RS (1980) Knowledge acquisition through conceptual clustering: a theoretical framework and an algorithm for partitioning data into conjunctive concepts. Int J Policy Anal Inf Syst 4:219–244

    MathSciNet  Google Scholar 

  • Michalski RS, Stepp RE (1983) Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Trans Pattern Anal Mach Intell 5(4):396–410. https://doi.org/10.1109/TPAMI.1983.4767409

    Article  Google Scholar 

  • Miclet L (1990) Grammatical inference. In: Bunke H, Sanfeliu A (eds) Syntactic and structural pattern recognition theory and applications. World Scientific, Singapore

    Google Scholar 

  • Minsky ML, Papert S (1988) Perceptrons, expanded ed. MIT Press, Cambridge, vol 15, pp 767, 776

    Google Scholar 

  • Mitchell T (1982) Generalization as search. Artif Intell J 18:203–226

    Article  MathSciNet  Google Scholar 

  • Mitchell T (1997) Machine learning. McGraw-Hill

    Google Scholar 

  • Muggleton S (1995) Inverse entailment and progol. New Gener Comput 13(3&4):245–286

    Article  Google Scholar 

  • Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems 14 [neural information processing systems: natural and synthetic, NIPS 2001, December 3–8, 2001, Vancouver, British Columbia, Canada], pp 849–856. http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm

  • Ng RT, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: VLDB’94, proceedings of 20th international conference on very large data bases, September 12–15, 1994, Santiago de Chile, Chile, pp 144–155. http://www.vldb.org/conf/1994/P144.PDF

  • Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. In: The 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, New York, NY, USA - August 24–27, 2014, pp 977–986. https://doi.org/10.1145/2623330.2623726

  • Olshausen BA, Field DJ (1996) Natural image statistics and efficient coding. Netw Comput Neural Syst 7(2):333–339

    Article  Google Scholar 

  • Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359

    Article  Google Scholar 

  • Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36:3336–3341

    Article  Google Scholar 

  • Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann

    Google Scholar 

  • Pearl J (2009) Causal inference in statistics: an overview. Statist Surv 3:96–146. https://doi.org/10.1214/09-SS057

    Article  MathSciNet  MATH  Google Scholar 

  • Peters J, Janzing D, Schölkopf B (2017) Elements of causal inference: foundations and learning algorithms. MIT Press

    Google Scholar 

  • Plotkin G (1970) A note on inductive generalization. In: Machine intelligence, vol 5. Edinburgh University Press, pp 153–163

    Google Scholar 

  • Qiu Q, Patel VM, Turaga P, Chellappa R (2012) Domain adaptive dictionary learning, pp 631–645

    Google Scholar 

  • Quinlan J (1993) C4.5: programs for machine learning. Morgan Kauffman

    Google Scholar 

  • Quinlan JR (1996) Learning first-order definitions of functions. CoRR. arXiv:cs.AI/9610102

  • Raedt LD (2008) Logical and relational learning. Springer, Berlin

    Book  MATH  Google Scholar 

  • Raedt LD, Frasconi P, Kersting K, Muggleton S (eds) (2008) Probabilistic inductive logic programming - theory and applications. Lecture notes in computer science, vol 4911. Springer, Berlin

    Google Scholar 

  • Rao M (1969) Cluster analysis and mathematical programming 79:30

    Google Scholar 

  • Rubinstein R, Bruckstein AM, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045–1057

    Article  Google Scholar 

  • Rumelhart DE, McClelland JL, Group PR et al (1987) Parallel distributed processing, vol 1. MIT Press, Cambridge

    Google Scholar 

  • Saitta L, Giordana A, Cornuéjols A (2011) Phase transitions in machine learning. Cambridge University Press

    Google Scholar 

  • Schölkhopf B, Smola A (2002) Learning with kernels. MIT Press

    Google Scholar 

  • Shapire R, Freund Y (2012) Boosting: foundations and algorithms. MIT Press

    Google Scholar 

  • Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press

    Google Scholar 

  • Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489

    Article  Google Scholar 

  • Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354

    Article  Google Scholar 

  • Suraj Z (2004) An introduction to rough sets theory and its applications: a tutorial. In: ICENCO’2004, Cairo, Egypt

    Google Scholar 

  • Sutton C, McCallum A (2012) An introduction to conditional random fields. Found Trends Mach Learn 4(4):267–373. https://doi.org/10.1561/2200000013

    Article  MATH  Google Scholar 

  • Tosic I, Frossard P (2011) Dictionary learning. IEEE Signal Process Mag 28(2):27–38

    Article  MATH  Google Scholar 

  • Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: FIMI ’04, proceedings of the IEEE ICDM workshop on frequent itemset mining implementations, Brighton, UK, November 1, 2004. http://ceur-ws.org/Vol-126/uno.pdf

  • van der Laag PR, Nienhuys-Cheng SH (1998) Completeness and properness of refinement operators in inductive logic programming. J Log Program 34(3):201–225. https://doi.org/10.1016/S0743-1066(97)00077-0, http://www.sciencedirect.com/science/article/pii/S0743106697000770

  • Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin

    Book  MATH  Google Scholar 

  • Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. IJPRAI 25(3):337–372. https://doi.org/10.1142/S0218001411008683

    Article  MathSciNet  Google Scholar 

  • von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416. https://doi.org/10.1007/s11222-007-9033-z

    Article  MathSciNet  Google Scholar 

  • Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the 17th international conference on machine learning, pp 1103–1110

    Google Scholar 

  • Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244. https://doi.org/10.1080/01621459.1963.10500845, http://www.tandfonline.com/doi/abs/10.1080/01621459.1963.10500845

  • Zaki MJ (2000) Generating non-redundant association rules. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, Boston, MA, USA, August 20–23, KDD, pp 34–43

    Google Scholar 

  • Zelezný F, Lavrac N (2006) Propositionalization-based relational subgroup discovery with rsd. Mach Learn 62(1–2):33–63

    Article  Google Scholar 

  • Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2016) Understanding deep learning requires rethinking generalization. arXiv:161103530

  • Zhou ZH (2012) Ensemble methods: foundations and algorithms. CRC Press

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Antoine Cornuéjols .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Cornuéjols, A., Vrain, C. (2020). Designing Algorithms for Machine Learning and Data Mining. In: Marquis, P., Papini, O., Prade, H. (eds) A Guided Tour of Artificial Intelligence Research. Springer, Cham. https://doi.org/10.1007/978-3-030-06167-8_12

Download citation

Publish with us

Policies and ethics