The training examples come from some generally unknown probability distribution (considered representative of the space of occurrences), and the learner has to build a general model of this space that enables it to produce sufficiently accurate predictions in new cases. Such models are usually used to capture complex relationships between inputs and outputs, to find patterns in data, or to capture the statistical structure of an unknown joint probability distribution over observed variables.

Reinforcement learning: training data (in the form of rewards and punishments) is given only as feedback to the program's actions in a dynamic environment, such as driving a vehicle or playing a game against an opponent.
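As a rough illustration of learning from reward feedback alone, the following sketch runs an epsilon-greedy learner on a two-action toy problem. The environment (`reward_for`), the function name `learn_from_feedback`, and the parameter values are all assumptions made for this example; a real dynamic environment such as a game would also have state, which is omitted here.

```python
import random

def reward_for(action):
    """Hypothetical environment: action 1 is rewarded, action 0 is punished."""
    return 1.0 if action == 1 else -1.0

def learn_from_feedback(steps=200, epsilon=0.1, seed=0):
    """Estimate the average reward of each action from feedback only."""
    rng = random.Random(seed)
    n_actions = 2
    estimates = [0.0] * n_actions  # running average reward per action
    counts = [0] * n_actions       # how often each action was tried
    for step in range(steps):
        if step < n_actions:
            action = step                      # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore a random action
        else:
            # exploit the action that currently looks best
            action = max(range(n_actions), key=lambda a: estimates[a])
        reward = reward_for(action)            # the only training signal
        counts[action] += 1
        # incremental update of the running average
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates

estimates = learn_from_feedback()
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(estimates, best_action)
```

The learner is never told which action is correct. It only observes the reward that follows each action it takes and adjusts its estimates accordingly.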

In classification, inputs are divided into two or more classes, and the learner must produce a model that assigns unseen inputs to one or more (multi-label classification) of these classes.
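A minimal sketch of single-label classification, assuming a nearest-centroid rule chosen only for brevity: the model stores the mean point of each class and assigns an unseen input to the class with the closest mean. The data, the class names, and the helper names `fit_centroids` and `predict` are hypothetical.

```python
import math
from collections import defaultdict

def fit_centroids(points, labels):
    """Build the model: one mean point (centroid) per class."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(coords) / len(members) for coords in zip(*members))
        for label, members in grouped.items()
    }

def predict(centroids, point):
    """Assign an unseen input to the class with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(point, centroids[label]))

# Hypothetical labeled training examples
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
train_labels = ["small", "small", "large", "large"]

centroids = fit_centroids(train_points, train_labels)
print(predict(centroids, (1.2, 1.4)))
print(predict(centroids, (5.8, 5.0)))
```

A multi-label variant would return every class whose centroid lies within some distance threshold rather than only the nearest one.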

Evaluated with respect to known knowledge, an uninformed (unsupervised) method will easily be outperformed by supervised methods, while in a typical knowledge discovery and data mining (KDD) task, supervised methods cannot be used because training data is unavailable.

Unsupervised learning can be a goal in itself (discovering hidden patterns in data) or a means towards an end (feature learning).

In computational learning theory, a computation is considered feasible if it can be done in polynomial time. Positive results show that a certain class of functions can be learned in polynomial time; negative results show that certain classes cannot.

Association rule learning is a method for discovering interesting relations between variables in large databases.
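To make the idea of association rule learning more concrete, the sketch below scores one candidate rule on a tiny transaction list using support and confidence, two commonly used measures of how "interesting" a rule is. The text above does not name these measures, so treat them, the example transactions, and the helper names as assumptions.

```python
def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the fraction that also
    contain the consequent."""
    both = count(antecedent | consequent, transactions)
    return both / count(antecedent, transactions)

# Hypothetical market-basket data
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

rule_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule "bread implies butter" holds in three of the four transactions that contain bread. A full rule learner would enumerate many candidate itemsets and keep only those above chosen support and confidence thresholds.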

In clustering, unlike in classification, the groups are not known beforehand, making this typically an unsupervised task.

Much of the confusion between the machine learning and data mining research communities (which often have separate conferences and separate journals, ECML PKDD being a major exception) comes from the basic assumptions they work with: in machine learning, performance is usually evaluated with respect to the ability to reproduce known knowledge, while in knowledge discovery and data mining (KDD) the key task is the discovery of previously unknown knowledge.

For the best performance in the context of generalization, the complexity of the hypothesis should match the complexity of the function underlying the data. If the hypothesis is less complex than the function, then the model has underfit the data. But if the hypothesis is too complex, then the model is subject to overfitting and generalization will be poorer. Because training sets are finite and the future is uncertain, learning theory usually does not yield guarantees of the performance of algorithms.
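The relationship between hypothesis complexity and generalization can be sketched on toy data. The example below assumes a linear underlying function (y = 2x) with hand-picked noise, and compares three hypothetical models: a constant (too simple), a least-squares line (matching complexity), and a nearest-neighbour "memorizer" that reproduces every training point exactly (too complex). All names and data are illustrative assumptions.

```python
def mse(model, xs, ys):
    """Mean squared error of a model on the given examples."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_constant(xs, ys):
    """Too simple: always predicts the mean training output."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_line(xs, ys):
    """Matches the underlying function: ordinary least-squares line."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_memorizer(xs, ys):
    """Too complex: returns the output of the nearest training input,
    so it reproduces the training noise exactly."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Training data: y = 2x with alternating +1 / -1 noise (hand-picked)
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.0, 1.0, 5.0, 5.0, 9.0, 9.0]
# New cases drawn from the noise-free underlying function
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [2 * x for x in test_x]

fitters = {"constant": fit_constant, "line": fit_line, "memorizer": fit_memorizer}
train_error, test_error = {}, {}
for name, fit in fitters.items():
    model = fit(train_x, train_y)
    train_error[name] = mse(model, train_x, train_y)
    test_error[name] = mse(model, test_x, test_y)
    print(name, round(train_error[name], 3), round(test_error[name], 3))
```

On this data the memorizer reaches zero training error but does worse on the new cases than the line, while the constant model does poorly on both. This is only one hand-built example and says nothing about guarantees in general.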

