Quote from ivanbaj:
This is what I wanted to say:
You may be confusing the term 'curve fitting' with the term 'over fitting'.
Imagine I'm given a basket of fruit and I wish to predict whether each piece is an apple or an orange. Looking at our sample set, we notice that all the oranges are orange, so we make a rule for our prediction: if(color==orange) then fruit = orange else fruit = apple; This works perfectly with the sample set we are given. We would say the grammar used to describe our prediction is small, which is good. Had we used an algorithm to analyse the color/prediction space, we would say we had been curve fitting to that solution space.
So how good is this fitted curve, in the solution space of color, at predicting apples vs oranges? It turns out it is very good. Even though it is possible to find an apple which is orange or an orange which is not orange, generally that's not the case. So even though we are fit very closely to the curve we see in our solution space, it is still an extremely good predictor.
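To make that concrete, here is a small Python sketch of the color rule. The fruit lists are made up purely for illustration (they are not real data), and the names predict_by_color and accuracy are my own. The unseen basket deliberately includes one green orange, to show that the rule survives an occasional exception.

```python
# Hypothetical (color, fruit) pairs, invented for illustration only.
sample = [
    ("orange", "orange"), ("red", "apple"), ("green", "apple"),
    ("orange", "orange"), ("red", "apple"),
]

# Hypothetical unseen basket; the third item is a green orange,
# which the rule will get wrong.
unseen = [
    ("orange", "orange"), ("green", "apple"), ("green", "orange"),
    ("red", "apple"), ("orange", "orange"), ("yellow", "apple"),
    ("red", "apple"), ("orange", "orange"), ("green", "apple"),
    ("orange", "orange"),
]


def predict_by_color(color):
    """The one-rule grammar: orange color means orange, anything else apple."""
    return "orange" if color == "orange" else "apple"


def accuracy(predict, data):
    """Fraction of (feature, fruit) pairs the predictor labels correctly."""
    hits = sum(1 for feature, fruit in data if predict(feature) == fruit)
    return hits / len(data)


sample_accuracy = accuracy(predict_by_color, sample)
unseen_accuracy = accuracy(predict_by_color, unseen)
print("sample:", sample_accuracy, "unseen:", unseen_accuracy)
```

One rule, perfect on the sample, and still right on nine of the ten unseen fruit in this toy basket.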
Now imagine that instead of using color, we use something like sample order as our predictive measure. We find in our sample set that two apples in a row are always followed by an orange, and we keep adding rules to our predictive grammar until we predict our sample set perfectly. This requires a much larger grammar, which makes it less likely to be correct. We also see that this is not a good predictor in real life; it is highly 'over fit' to our sample set.
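A rough Python sketch of the order-based predictor, under my own assumptions: the two sequences are invented for illustration, and "keep adding to the grammar" is modelled as lengthening the context (how many previous fruit we look at) until every position in the sample is predicted without contradiction. The names learn_order_rules and order_accuracy are hypothetical.

```python
def to_fruit(letters):
    """Expand a compact 'A'/'O' string into a list of fruit names."""
    return ["apple" if c == "A" else "orange" for c in letters]


# Hypothetical orderings, invented for illustration only.
sample = to_fruit("AAOAOOAAOA")
unseen = to_fruit("OAAOOOAAAO")


def learn_order_rules(sequence):
    """Lengthen the context until the rules never contradict the sample."""
    for k in range(1, len(sequence)):
        rules = {}
        consistent = True
        for i in range(k, len(sequence)):
            context = tuple(sequence[i - k:i])
            # setdefault stores the first outcome seen for this context;
            # a different later outcome means k is too short.
            if rules.setdefault(context, sequence[i]) != sequence[i]:
                consistent = False
                break
        if consistent:
            return k, rules
    return len(sequence), {}


def order_accuracy(k, rules, sequence):
    """Fraction of positions predicted correctly; unknown contexts count as misses."""
    hits = 0
    for i in range(k, len(sequence)):
        if rules.get(tuple(sequence[i - k:i])) == sequence[i]:
            hits += 1
    return hits / (len(sequence) - k)


k, rules = learn_order_rules(sample)
sample_accuracy = order_accuracy(k, rules, sample)
unseen_accuracy = order_accuracy(k, rules, unseen)
print("context length:", k, "rules:", len(rules))
print("sample:", sample_accuracy, "unseen:", unseen_accuracy)
```

On this toy sample the grammar grows to six rules over a three-fruit context just to reproduce ten fruit, and it does far worse on the unseen ordering than on the sample it memorised.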
The moral of the story: fitting to curves you see in your solution space can be an extremely good way of creating a predictor, as long as the curve you fit to represents something that actually has predictive value.
That is the difference between curve fitting and over fitting.