Abstract
Designing Machine Learning algorithms implies to answer three main questions: First, what is the space \(\mathscr {H}\) of hypotheses or models of the data that the algorithm considers? Second, what is the inductive criterion used to assess the merit of a hypothesis given the data? Third, given the space \({\mathscr {H}}\) and the inductive criterion, how is the exploration of \({\mathscr {H}}\) carried on in order to find a as good as possible hypothesis? Any learning algorithm can be analyzed along these three questions. This chapter focusses primarily on unsupervised learning, on one hand, and supervised learning, on the other hand. For each, the foremost problems are described as well as the main existing approaches. In particular, the interplay between the structure that can be endowed over the hypothesis space and the optimisation techniques that can in consequence be used is underlined. We cover especially the major existing methods for clustering: prototype-based, generative-based, density-based, spectral based, hierarchical, and conceptual and visit the validation techniques available. For supervised learning, the generative and discriminative approaches are contrasted and a wide variety of linear methods in which we include the Support Vector Machines and Boosting are presented. Multi-Layer neural networks and deep learning methods are discussed. Some additional methods are illustrated, and we describe other learning problems including semi-supervised learning, active learning, online learning, transfer learning, learning to rank, learning recommendations, and identifying causal relationships. We conclude this survey by suggesting new directions for research.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
\(S(m,k) = \frac{1}{k!} \sum _{j=0}^k (-1)^{k-j} (k_j) j^m\).
- 2.
If P is a property and C is a concept, a property is characteristic if \(C \rightarrow P\) and discriminant if \(P \rightarrow C\).
- 3.
- 4.
A clause is a disjunction of literals, assimilated in this definition to a set of literals.
- 5.
A reflexive and transitive relation.
References
Aggarwal CC (2015) Data mining: the textbook. Springer Publishing Company Incorporated, Berlin
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. Very large data bases (VLDB-94). Santiage, Chile, pp 487–499
Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-of-squares clustering. Math Program 131(1–2):195–220
Amarel S (1968) On representations of problems of reasoning about actions. Mach Intell 3(3):131–171
Ankerst M, Breunig MM, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: SIGMOD 1999, proceedings ACM SIGMOD international conference on management of data, June 1–3, 1999, Philadelphia, Pennsylvania, USA, pp 49–60. https://doi.org/10.1145/304182.304187
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7–9, 2007, pp 1027–1035. http://dl.acm.org/citation.cfm?id=1283383.1283494
Bastide Y, Pasquier N, Taouil R, Stumme G, Lakhal L (2000) Mining minimal non-redundant association rules using frequent closed itemsets. In: Computational logic, pp 972–986
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, Norwell
Bishop CM (2006) Pattern recognition and machine learning. Springer, Secaucus
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Breiman L, Friedman J, Olshen R, Stone CJ (1984) Classification and regression trees. Wadsworth and Brooks/Cole Advanced Books and Software
Brusco M, Stahl S (2005) Branch-and-bound applications in combinatorial data analysis (Statistics and computing), 1st edn. Springer, Berlin
Busygin S, Prokopyev OA, Pardalos PM (2008) Biclustering in data mining. Comput OR 35:2964–2987
Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games. Cambridge University Press, Cambridge
Chapelle O, Scholkopf B, Zien A (2009) Semi-supervised learning (chapelle O, et al eds; 2006). IEEE Trans Neural Netw 20(3):542
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Dao T, Duong K, Vrain C (2017) Constrained clustering by constraint programming. Artif Intell 244:70–94. https://doi.org/10.1016/j.artint.2015.05.006
de la Higuera C (2010) Grammatical inference: learning automata and grammars. Cambridge University Press, Cambridge
Dhillon IS, Guan Y, Kulis B (2004) Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, Washington, USA, August 22–25, 2004, pp 551–556. https://doi.org/10.1145/1014052.1014118
Ding CHQ, He X (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM international conference on data mining, SDM 2005, Newport Beach, CA, USA, April 21–23, 2005, pp 606–610, https://doi.org/10.1137/1.9781611972757.70
du Merle O, Hansen P, Jaumard B, Mladenovic N (1999) An interior point algorithm for minimum sum-of-squares clustering. SIAM J Sci Comput 21(4):1485–1505
Dunn JC (1973) A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57. https://doi.org/10.1080/01969727308546046
Dzeroski S, Lavrac N (eds) (2001) Relational data mining. Springer, Berlin
Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD-96), Portland, Oregon, USA, pp 226–231. http://www.aaai.org/Library/KDD/1996/kdd96-037.php
Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172. https://doi.org/10.1007/BF00114265
Forgy E (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classification. Biometrics 21(3):768–769
Fürnkranz J, Gamberger D, Lavrac N (2012) Foundations of rule learning. Springer, Berlin
Gama J (2010) Knowledge discovery from data streams. Chapman & Hall
Ganter B, Wille R, Franke C (1998) Formal concept analysis: mathematical foundations. Springer, Berlin
Ganter B, Stumme G, Wille R (eds) (2005) Formal concept analysis: foundations and applications. Springer, Berlin
Getoor L, Taskar B (eds) (2007) An introduction to statistical relational learning. MIT Press, Cambridge
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27, Curran Associates, Inc., pp 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Getoor L, Taskar B (eds) (2007) An introduction to statistical relational learning. MIT Press
Halkidi M, Batistakis Y, Vazirgiannis M (2002) Clustering validity checking methods: part ii. SIGMOD Rec 31(3):19–27. https://doi.org/10.1145/601858.601862
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. SIGMOD Rec 29(2):1–12. https://doi.org/10.1145/335191.335372
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco
Hansen P, Delattre M (1978) Complete-link cluster analysis by graph coloring. J Am Stat Assoc 73:397–403
Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79(1–3):191–215
Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer series in statistics. Springer, Berlin
Hawkins D (1980) Identification of outliers. Monographs on applied probability and statistics. Chapman and Hall. https://books.google.fr/books?id=fb0OAAAAQAAJ
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79(8):2554–2558
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall
Jannach D, Resnick P, Tuzhilin A, Zanker M (2016) Recommender systems-: beyond matrix completion. Commun ACM 59(11):94–102
Japkowicz N (2011) Evaluating learning algorithms: a classification perspective. Cambridge University Press
Johnson S (1967) Hierarchical clustering schemes. Psychometrika 32(3):241–254
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
Klein G, Aronson JE (1991) Optimal clustering: a model and method. Nav Res Logist 38(3):447–461
Kohonen T (ed) (1997) Self-organizing maps. Springer, New York Inc, Secaucus
Koller D, Friedman N (2009) Probabilistic graphical models. Principles and techniques. MIP Press
Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268
Lance GN, Williams WTA (1967) A general theory of classificatory sorting strategies: 1. Hierarchical systems 9
Lachiche N, Vrain C (eds) (2018) Inductive logic programming - 27th international conference, ILP 2017, Orléans, France, September 4–6, 2017, Revised selected papers, Lecture notes in computer science, vol 10759. Springer. https://doi.org/10.1007/978-3-319-78090-0
Lavrac N, Dzeroski S (1994) Inductive logic programming - techniques and applications. Ellis Horwood series in artificial intelligence. Ellis Horwood
Le Cun Y (1986) Learning process in an asymmetric threshold network. Disordered systems and biological organization. Springer, Berlin, pp 233–240
Le Cun Y, Boser BE, Denker JS, Henderson D, Howard RE, Hubbard WE, Jackel LD (1990) Handwritten digit recognition with a back-propagation network. In: Advances in neural information processing systems, pp 396–404
Le Cun Y, Bengio Y et al (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10):1995
Le Cun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539
Levesque HJ, Brachman RJ (1987) Expressiveness and tractability in knowledge representation and reasoning. Comput Intell 3(1):78–93
Li H (2011) A short introduction to learning to rank. IEICE Trans Inf Syst 94(10):1854–1862
Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the 2001 IEEE international conference on data mining, 29 November–2 December 2001, San Jose, California, USA, pp 369–376. https://doi.org/10.1109/ICDM.2001.989541
Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining, AAAI Press, KDD’98, pp 80–86. http://dl.acm.org/citation.cfm?id=3000292.3000305
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–136. https://doi.org/10.1109/TIT.1982.1056489
Lopez-Paz D, Nishihara R, Chintala S, Schölkopf B, Bottou L (2016) Discovering causal signals in images. arXiv:160508179
Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 1:24–45. https://doi.org/10.1109/TCBB.2004.2, www.doi.ieeecomputersociety.org/10.1109/TCBB.2004.2
Michalski RS (1980) Knowledge acquisition through conceptual clustering: a theoretical framework and an algorithm for partitioning data into conjunctive concepts. Int J Policy Anal Inf Syst 4:219–244
Michalski RS, Stepp RE (1983) Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Trans Pattern Anal Mach Intell 5(4):396–410. https://doi.org/10.1109/TPAMI.1983.4767409
Miclet L (1990) Grammatical inference. In: Bunke H, Sanfeliu A (eds) Syntactic and structural pattern recognition theory and applications. World Scientific, Singapore
Minsky ML, Papert S (1988) Perceptrons, expanded ed. MIT Press, Cambridge, vol 15, pp 767, 776
Mitchell T (1982) Generalization as search. Artif Intell J 18:203–226
Mitchell T (1997) Machine learning. McGraw-Hill
Muggleton S (1995) Inverse entailment and progol. New Gener Comput 13(3&4):245–286
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems 14 [neural information processing systems: natural and synthetic, NIPS 2001, December 3–8, 2001, Vancouver, British Columbia, Canada], pp 849–856. http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm
Ng RT, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: VLDB’94, proceedings of 20th international conference on very large data bases, September 12–15, 1994, Santiago de Chile, Chile, pp 144–155. http://www.vldb.org/conf/1994/P144.PDF
Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. In: The 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, New York, NY, USA - August 24–27, 2014, pp 977–986. https://doi.org/10.1145/2623330.2623726
Olshausen BA, Field DJ (1996) Natural image statistics and efficient coding. Netw Comput Neural Syst 7(2):333–339
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36:3336–3341
Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann
Pearl J (2009) Causal inference in statistics: an overview. Statist Surv 3:96–146. https://doi.org/10.1214/09-SS057
Peters J, Janzing D, Schölkopf B (2017) Elements of causal inference: foundations and learning algorithms. MIT Press
Plotkin G (1970) A note on inductive generalization. In: Machine intelligence, vol 5. Edinburgh University Press, pp 153–163
Qiu Q, Patel VM, Turaga P, Chellappa R (2012) Domain adaptive dictionary learning, pp 631–645
Quinlan J (1993) C4.5: programs for machine learning. Morgan Kauffman
Quinlan JR (1996) Learning first-order definitions of functions. CoRR. arXiv:cs.AI/9610102
Raedt LD (2008) Logical and relational learning. Springer, Berlin
Raedt LD, Frasconi P, Kersting K, Muggleton S (eds) (2008) Probabilistic inductive logic programming - theory and applications. Lecture notes in computer science, vol 4911. Springer, Berlin
Rao M (1969) Cluster analysis and mathematical programming 79:30
Rubinstein R, Bruckstein AM, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045–1057
Rumelhart DE, McClelland JL, Group PR et al (1987) Parallel distributed processing, vol 1. MIT Press, Cambridge
Saitta L, Giordana A, Cornuéjols A (2011) Phase transitions in machine learning. Cambridge University Press
Schölkhopf B, Smola A (2002) Learning with kernels. MIT Press
Shapire R, Freund Y (2012) Boosting: foundations and algorithms. MIT Press
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354
Suraj Z (2004) An introduction to rough sets theory and its applications: a tutorial. In: ICENCO’2004, Cairo, Egypt
Sutton C, McCallum A (2012) An introduction to conditional random fields. Found Trends Mach Learn 4(4):267–373. https://doi.org/10.1561/2200000013
Tosic I, Frossard P (2011) Dictionary learning. IEEE Signal Process Mag 28(2):27–38
Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: FIMI ’04, proceedings of the IEEE ICDM workshop on frequent itemset mining implementations, Brighton, UK, November 1, 2004. http://ceur-ws.org/Vol-126/uno.pdf
van der Laag PR, Nienhuys-Cheng SH (1998) Completeness and properness of refinement operators in inductive logic programming. J Log Program 34(3):201–225. https://doi.org/10.1016/S0743-1066(97)00077-0, http://www.sciencedirect.com/science/article/pii/S0743106697000770
Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin
Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. IJPRAI 25(3):337–372. https://doi.org/10.1142/S0218001411008683
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416. https://doi.org/10.1007/s11222-007-9033-z
Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the 17th international conference on machine learning, pp 1103–1110
Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244. https://doi.org/10.1080/01621459.1963.10500845, http://www.tandfonline.com/doi/abs/10.1080/01621459.1963.10500845
Zaki MJ (2000) Generating non-redundant association rules. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, Boston, MA, USA, August 20–23, KDD, pp 34–43
Zelezný F, Lavrac N (2006) Propositionalization-based relational subgroup discovery with rsd. Mach Learn 62(1–2):33–63
Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2016) Understanding deep learning requires rethinking generalization. arXiv:161103530
Zhou ZH (2012) Ensemble methods: foundations and algorithms. CRC Press
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Cornuéjols, A., Vrain, C. (2020). Designing Algorithms for Machine Learning and Data Mining. In: Marquis, P., Papini, O., Prade, H. (eds) A Guided Tour of Artificial Intelligence Research. Springer, Cham. https://doi.org/10.1007/978-3-030-06167-8_12
Download citation
DOI: https://doi.org/10.1007/978-3-030-06167-8_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06166-1
Online ISBN: 978-3-030-06167-8
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)