Designing Algorithms for Machine Learning and Data Mining

Cornuéjols, Antoine; Vrain, Christel

doi:10.1007/978-3-030-06167-8_12

Antoine Cornuéjols⁴ &
Christel Vrain⁵

947 Accesses
1 Citations

Abstract

Designing Machine Learning algorithms implies to answer three main questions: First, what is the space \(\mathscr {H}\) of hypotheses or models of the data that the algorithm considers? Second, what is the inductive criterion used to assess the merit of a hypothesis given the data? Third, given the space \({\mathscr {H}}\) and the inductive criterion, how is the exploration of \({\mathscr {H}}\) carried on in order to find a as good as possible hypothesis? Any learning algorithm can be analyzed along these three questions. This chapter focusses primarily on unsupervised learning, on one hand, and supervised learning, on the other hand. For each, the foremost problems are described as well as the main existing approaches. In particular, the interplay between the structure that can be endowed over the hypothesis space and the optimisation techniques that can in consequence be used is underlined. We cover especially the major existing methods for clustering: prototype-based, generative-based, density-based, spectral based, hierarchical, and conceptual and visit the validation techniques available. For supervised learning, the generative and discriminative approaches are contrasted and a wide variety of linear methods in which we include the Support Vector Machines and Boosting are presented. Multi-Layer neural networks and deep learning methods are discussed. Some additional methods are illustrated, and we describe other learning problems including semi-supervised learning, active learning, online learning, transfer learning, learning to rank, learning recommendations, and identifying causal relationships. We conclude this survey by suggesting new directions for research.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 149.00; Price excludes VAT (USA)

Hardcover Book: USD 199.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
\(S(m,k) = \frac{1}{k!} \sum _{j=0}^k (-1)^{k-j} (k_j) j^m\).
2.
If P is a property and C is a concept, a property is characteristic if \(C \rightarrow P\) and discriminant if \(P \rightarrow C\).
3.
http://archive.ics.uci.edu/ml/datasets/zoo.
4.
A clause is a disjunction of literals, assimilated in this definition to a set of literals.
5.
A reflexive and transitive relation.

References

Aggarwal CC (2015) Data mining: the textbook. Springer Publishing Company Incorporated, Berlin
MATH Google Scholar
Agrawal R, Srikant R (1994) Fast algorithms for mining association rules in large databases. Very large data bases (VLDB-94). Santiage, Chile, pp 487–499
Google Scholar
Aloise D, Hansen P, Liberti L (2012) An improved column generation algorithm for minimum sum-of-squares clustering. Math Program 131(1–2):195–220
Article MathSciNet MATH Google Scholar
Amarel S (1968) On representations of problems of reasoning about actions. Mach Intell 3(3):131–171
MATH Google Scholar
Ankerst M, Breunig MM, Kriegel H, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: SIGMOD 1999, proceedings ACM SIGMOD international conference on management of data, June 1–3, 1999, Philadelphia, Pennsylvania, USA, pp 49–60. https://doi.org/10.1145/304182.304187
Arthur D, Vassilvitskii S (2007) k-means++: the advantages of careful seeding. In: Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms, SODA 2007, New Orleans, Louisiana, USA, January 7–9, 2007, pp 1027–1035. http://dl.acm.org/citation.cfm?id=1283383.1283494
Bastide Y, Pasquier N, Taouil R, Stumme G, Lakhal L (2000) Mining minimal non-redundant association rules using frequent closed itemsets. In: Computational logic, pp 972–986
Google Scholar
Bezdek JC (1981) Pattern recognition with fuzzy objective function algorithms. Kluwer Academic Publishers, Norwell
Book MATH Google Scholar
Bishop CM (2006) Pattern recognition and machine learning. Springer, Secaucus
MATH Google Scholar
Breiman L (2001) Random forests. Mach Learn 45(1):5–32
Article MATH Google Scholar
Breiman L, Friedman J, Olshen R, Stone CJ (1984) Classification and regression trees. Wadsworth and Brooks/Cole Advanced Books and Software
Google Scholar
Brusco M, Stahl S (2005) Branch-and-bound applications in combinatorial data analysis (Statistics and computing), 1st edn. Springer, Berlin
MATH Google Scholar
Busygin S, Prokopyev OA, Pardalos PM (2008) Biclustering in data mining. Comput OR 35:2964–2987
Article MathSciNet MATH Google Scholar
Cesa-Bianchi N, Lugosi G (2006) Prediction, learning, and games. Cambridge University Press, Cambridge
Book MATH Google Scholar
Chapelle O, Scholkopf B, Zien A (2009) Semi-supervised learning (chapelle O, et al eds; 2006). IEEE Trans Neural Netw 20(3):542
Google Scholar
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
MATH Google Scholar
Dao T, Duong K, Vrain C (2017) Constrained clustering by constraint programming. Artif Intell 244:70–94. https://doi.org/10.1016/j.artint.2015.05.006
Article MathSciNet MATH Google Scholar
de la Higuera C (2010) Grammatical inference: learning automata and grammars. Cambridge University Press, Cambridge
Book MATH Google Scholar
Dhillon IS, Guan Y, Kulis B (2004) Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, Washington, USA, August 22–25, 2004, pp 551–556. https://doi.org/10.1145/1014052.1014118
Ding CHQ, He X (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. In: Proceedings of the 2005 SIAM international conference on data mining, SDM 2005, Newport Beach, CA, USA, April 21–23, 2005, pp 606–610, https://doi.org/10.1137/1.9781611972757.70
du Merle O, Hansen P, Jaumard B, Mladenovic N (1999) An interior point algorithm for minimum sum-of-squares clustering. SIAM J Sci Comput 21(4):1485–1505
Article MathSciNet MATH Google Scholar
Dunn JC (1973) A fuzzy relative of the isodata process and its use in detecting compact well-separated clusters. J Cybern 3(3):32–57. https://doi.org/10.1080/01969727308546046
Article MathSciNet MATH Google Scholar
Dzeroski S, Lavrac N (eds) (2001) Relational data mining. Springer, Berlin
MATH Google Scholar
Ester M, Kriegel H, Sander J, Xu X (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD-96), Portland, Oregon, USA, pp 226–231. http://www.aaai.org/Library/KDD/1996/kdd96-037.php
Fisher DH (1987) Knowledge acquisition via incremental conceptual clustering. Mach Learn 2(2):139–172. https://doi.org/10.1007/BF00114265
Article Google Scholar
Forgy E (1965) Cluster analysis of multivariate data: efficiency versus interpretability of classification. Biometrics 21(3):768–769
Google Scholar
Fürnkranz J, Gamberger D, Lavrac N (2012) Foundations of rule learning. Springer, Berlin
Book MATH Google Scholar
Gama J (2010) Knowledge discovery from data streams. Chapman & Hall
Google Scholar
Ganter B, Wille R, Franke C (1998) Formal concept analysis: mathematical foundations. Springer, Berlin
Google Scholar
Ganter B, Stumme G, Wille R (eds) (2005) Formal concept analysis: foundations and applications. Springer, Berlin
MATH Google Scholar
Getoor L, Taskar B (eds) (2007) An introduction to statistical relational learning. MIT Press, Cambridge
Google Scholar
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) Generative adversarial nets. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems 27, Curran Associates, Inc., pp 2672–2680. http://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf
Getoor L, Taskar B (eds) (2007) An introduction to statistical relational learning. MIT Press
Google Scholar
Halkidi M, Batistakis Y, Vazirgiannis M (2002) Clustering validity checking methods: part ii. SIGMOD Rec 31(3):19–27. https://doi.org/10.1145/601858.601862
Article MATH Google Scholar
Han J, Pei J, Yin Y (2000) Mining frequent patterns without candidate generation. SIGMOD Rec 29(2):1–12. https://doi.org/10.1145/335191.335372
Article Google Scholar
Han J, Pei J, Yin Y, Mao R (2004) Mining frequent patterns without candidate generation: a frequent-pattern tree approach. Data Min Knowl Discov 8(1):53–87
Article MathSciNet Google Scholar
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco
MATH Google Scholar
Hansen P, Delattre M (1978) Complete-link cluster analysis by graph coloring. J Am Stat Assoc 73:397–403
Article MATH Google Scholar
Hansen P, Jaumard B (1997) Cluster analysis and mathematical programming. Math Program 79(1–3):191–215
MathSciNet MATH Google Scholar
Hastie T, Tibshirani R, Friedman JH (2009) The elements of statistical learning: data mining, inference, and prediction, 2nd edn. Springer series in statistics. Springer, Berlin
Book MATH Google Scholar
Hawkins D (1980) Identification of outliers. Monographs on applied probability and statistics. Chapman and Hall. https://books.google.fr/books?id=fb0OAAAAQAAJ
Hopfield JJ (1982) Neural networks and physical systems with emergent collective computational abilities. Proc Natl Acad Sci 79(8):2554–2558
Article MathSciNet MATH Google Scholar
Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall
Google Scholar
Jannach D, Resnick P, Tuzhilin A, Zanker M (2016) Recommender systems-: beyond matrix completion. Commun ACM 59(11):94–102
Article Google Scholar
Japkowicz N (2011) Evaluating learning algorithms: a classification perspective. Cambridge University Press
Google Scholar
Johnson S (1967) Hierarchical clustering schemes. Psychometrika 32(3):241–254
Article MATH Google Scholar
Kaufman L, Rousseeuw PJ (1990) Finding groups in data: an introduction to cluster analysis. Wiley, New York
Book MATH Google Scholar
Klein G, Aronson JE (1991) Optimal clustering: a model and method. Nav Res Logist 38(3):447–461
Article MATH Google Scholar
Kohonen T (ed) (1997) Self-organizing maps. Springer, New York Inc, Secaucus
MATH Google Scholar
Koller D, Friedman N (2009) Probabilistic graphical models. Principles and techniques. MIP Press
Google Scholar
Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 31:249–268
MathSciNet MATH Google Scholar
Lance GN, Williams WTA (1967) A general theory of classificatory sorting strategies: 1. Hierarchical systems 9
Google Scholar
Lachiche N, Vrain C (eds) (2018) Inductive logic programming - 27th international conference, ILP 2017, Orléans, France, September 4–6, 2017, Revised selected papers, Lecture notes in computer science, vol 10759. Springer. https://doi.org/10.1007/978-3-319-78090-0
Lavrac N, Dzeroski S (1994) Inductive logic programming - techniques and applications. Ellis Horwood series in artificial intelligence. Ellis Horwood
Google Scholar
Le Cun Y (1986) Learning process in an asymmetric threshold network. Disordered systems and biological organization. Springer, Berlin, pp 233–240
Chapter Google Scholar
Le Cun Y, Boser BE, Denker JS, Henderson D, Howard RE, Hubbard WE, Jackel LD (1990) Handwritten digit recognition with a back-propagation network. In: Advances in neural information processing systems, pp 396–404
Google Scholar
Le Cun Y, Bengio Y et al (1995) Convolutional networks for images, speech, and time series. The handbook of brain theory and neural networks 3361(10):1995
Google Scholar
Le Cun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539
Levesque HJ, Brachman RJ (1987) Expressiveness and tractability in knowledge representation and reasoning. Comput Intell 3(1):78–93
Article MATH Google Scholar
Li H (2011) A short introduction to learning to rank. IEICE Trans Inf Syst 94(10):1854–1862
Article Google Scholar
Li W, Han J, Pei J (2001) CMAR: accurate and efficient classification based on multiple class-association rules. In: Proceedings of the 2001 IEEE international conference on data mining, 29 November–2 December 2001, San Jose, California, USA, pp 369–376. https://doi.org/10.1109/ICDM.2001.989541
Liu B, Hsu W, Ma Y (1998) Integrating classification and association rule mining. In: Proceedings of the fourth international conference on knowledge discovery and data mining, AAAI Press, KDD’98, pp 80–86. http://dl.acm.org/citation.cfm?id=3000292.3000305
Lloyd SP (1982) Least squares quantization in PCM. IEEE Trans Inf Theory 28(2):129–136. https://doi.org/10.1109/TIT.1982.1056489
Article MathSciNet MATH Google Scholar
Lopez-Paz D, Nishihara R, Chintala S, Schölkopf B, Bottou L (2016) Discovering causal signals in images. arXiv:160508179
Madeira SC, Oliveira AL (2004) Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Trans Comput Biol Bioinform 1:24–45. https://doi.org/10.1109/TCBB.2004.2, www.doi.ieeecomputersociety.org/10.1109/TCBB.2004.2
Michalski RS (1980) Knowledge acquisition through conceptual clustering: a theoretical framework and an algorithm for partitioning data into conjunctive concepts. Int J Policy Anal Inf Syst 4:219–244
MathSciNet Google Scholar
Michalski RS, Stepp RE (1983) Automated construction of classifications: conceptual clustering versus numerical taxonomy. IEEE Trans Pattern Anal Mach Intell 5(4):396–410. https://doi.org/10.1109/TPAMI.1983.4767409
Article Google Scholar
Miclet L (1990) Grammatical inference. In: Bunke H, Sanfeliu A (eds) Syntactic and structural pattern recognition theory and applications. World Scientific, Singapore
Google Scholar
Minsky ML, Papert S (1988) Perceptrons, expanded ed. MIT Press, Cambridge, vol 15, pp 767, 776
Google Scholar
Mitchell T (1982) Generalization as search. Artif Intell J 18:203–226
Article MathSciNet Google Scholar
Mitchell T (1997) Machine learning. McGraw-Hill
Google Scholar
Muggleton S (1995) Inverse entailment and progol. New Gener Comput 13(3&4):245–286
Article Google Scholar
Ng AY, Jordan MI, Weiss Y (2001) On spectral clustering: analysis and an algorithm. In: Advances in neural information processing systems 14 [neural information processing systems: natural and synthetic, NIPS 2001, December 3–8, 2001, Vancouver, British Columbia, Canada], pp 849–856. http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm
Ng RT, Han J (1994) Efficient and effective clustering methods for spatial data mining. In: VLDB’94, proceedings of 20th international conference on very large data bases, September 12–15, 1994, Santiago de Chile, Chile, pp 144–155. http://www.vldb.org/conf/1994/P144.PDF
Nie F, Wang X, Huang H (2014) Clustering and projected clustering with adaptive neighbors. In: The 20th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’14, New York, NY, USA - August 24–27, 2014, pp 977–986. https://doi.org/10.1145/2623330.2623726
Olshausen BA, Field DJ (1996) Natural image statistics and efficient coding. Netw Comput Neural Syst 7(2):333–339
Article Google Scholar
Pan SJ, Yang Q (2010) A survey on transfer learning. IEEE Trans Knowl Data Eng 22(10):1345–1359
Article Google Scholar
Park HS, Jun CH (2009) A simple and fast algorithm for k-medoids clustering. Expert Syst Appl 36:3336–3341
Article Google Scholar
Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann
Google Scholar
Pearl J (2009) Causal inference in statistics: an overview. Statist Surv 3:96–146. https://doi.org/10.1214/09-SS057
Article MathSciNet MATH Google Scholar
Peters J, Janzing D, Schölkopf B (2017) Elements of causal inference: foundations and learning algorithms. MIT Press
Google Scholar
Plotkin G (1970) A note on inductive generalization. In: Machine intelligence, vol 5. Edinburgh University Press, pp 153–163
Google Scholar
Qiu Q, Patel VM, Turaga P, Chellappa R (2012) Domain adaptive dictionary learning, pp 631–645
Google Scholar
Quinlan J (1993) C4.5: programs for machine learning. Morgan Kauffman
Google Scholar
Quinlan JR (1996) Learning first-order definitions of functions. CoRR. arXiv:cs.AI/9610102
Raedt LD (2008) Logical and relational learning. Springer, Berlin
Book MATH Google Scholar
Raedt LD, Frasconi P, Kersting K, Muggleton S (eds) (2008) Probabilistic inductive logic programming - theory and applications. Lecture notes in computer science, vol 4911. Springer, Berlin
Google Scholar
Rao M (1969) Cluster analysis and mathematical programming 79:30
Google Scholar
Rubinstein R, Bruckstein AM, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045–1057
Article Google Scholar
Rumelhart DE, McClelland JL, Group PR et al (1987) Parallel distributed processing, vol 1. MIT Press, Cambridge
Google Scholar
Saitta L, Giordana A, Cornuéjols A (2011) Phase transitions in machine learning. Cambridge University Press
Google Scholar
Schölkhopf B, Smola A (2002) Learning with kernels. MIT Press
Google Scholar
Shapire R, Freund Y (2012) Boosting: foundations and algorithms. MIT Press
Google Scholar
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press
Google Scholar
Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M et al (2016) Mastering the game of go with deep neural networks and tree search. Nature 529(7587):484–489
Article Google Scholar
Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354
Article Google Scholar
Suraj Z (2004) An introduction to rough sets theory and its applications: a tutorial. In: ICENCO’2004, Cairo, Egypt
Google Scholar
Sutton C, McCallum A (2012) An introduction to conditional random fields. Found Trends Mach Learn 4(4):267–373. https://doi.org/10.1561/2200000013
Article MATH Google Scholar
Tosic I, Frossard P (2011) Dictionary learning. IEEE Signal Process Mag 28(2):27–38
Article MATH Google Scholar
Uno T, Kiyomi M, Arimura H (2004) LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets. In: FIMI ’04, proceedings of the IEEE ICDM workshop on frequent itemset mining implementations, Brighton, UK, November 1, 2004. http://ceur-ws.org/Vol-126/uno.pdf
van der Laag PR, Nienhuys-Cheng SH (1998) Completeness and properness of refinement operators in inductive logic programming. J Log Program 34(3):201–225. https://doi.org/10.1016/S0743-1066(97)00077-0, http://www.sciencedirect.com/science/article/pii/S0743106697000770
Vapnik V (1995) The nature of statistical learning theory. Springer, Berlin
Book MATH Google Scholar
Vega-Pons S, Ruiz-Shulcloper J (2011) A survey of clustering ensemble algorithms. IJPRAI 25(3):337–372. https://doi.org/10.1142/S0218001411008683
Article MathSciNet Google Scholar
von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17(4):395–416. https://doi.org/10.1007/s11222-007-9033-z
Article MathSciNet Google Scholar
Wagstaff K, Cardie C (2000) Clustering with instance-level constraints. In: Proceedings of the 17th international conference on machine learning, pp 1103–1110
Google Scholar
Ward JH (1963) Hierarchical grouping to optimize an objective function. J Am Stat Assoc 58(301):236–244. https://doi.org/10.1080/01621459.1963.10500845, http://www.tandfonline.com/doi/abs/10.1080/01621459.1963.10500845
Zaki MJ (2000) Generating non-redundant association rules. In: Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, Boston, MA, USA, August 20–23, KDD, pp 34–43
Google Scholar
Zelezný F, Lavrac N (2006) Propositionalization-based relational subgroup discovery with rsd. Mach Learn 62(1–2):33–63
Article Google Scholar
Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2016) Understanding deep learning requires rethinking generalization. arXiv:161103530
Zhou ZH (2012) Ensemble methods: foundations and algorithms. CRC Press
Google Scholar

Download references

Author information

Authors and Affiliations

UMR MIA-Paris, AgroParisTech, INRA, Université Paris-Saclay, 75005, Paris, France
Antoine Cornuéjols
LIFO, EA 4022, University of Orléans, 45067, Orleans, France
Christel Vrain

Authors

Antoine Cornuéjols
View author publications
You can also search for this author in PubMed Google Scholar
Christel Vrain
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Antoine Cornuéjols .

Editor information

Editors and Affiliations

CRIL-CNRS, Université d’Artois and Institut Universitaire de France, Lens, France
Pierre Marquis
Aix Marseille Université, Université de Toulon, CNRS, LIS, Marseille, France
Odile Papini
IRIT, CNRS and Université Paul Sabatier, Toulouse, France
Henri Prade

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Cornuéjols, A., Vrain, C. (2020). Designing Algorithms for Machine Learning and Data Mining. In: Marquis, P., Papini, O., Prade, H. (eds) A Guided Tour of Artificial Intelligence Research. Springer, Cham. https://doi.org/10.1007/978-3-030-06167-8_12

Download citation

DOI: https://doi.org/10.1007/978-3-030-06167-8_12
Published: 08 May 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-06166-1
Online ISBN: 978-3-030-06167-8
eBook Packages: Intelligent Technologies and RoboticsIntelligent Technologies and Robotics (R0)

Publish with us

Policies and ethics