Mo Daoud
1 min read · Dec 8, 2020


Hey Brian, I'm glad you found the article useful. I think we're talking about the same thing. The scikit-learn page you posted says it is "common practice when performing a (supervised) machine learning experiment to hold out part of the available data as a test set X_test, y_test." That basically means setting aside a portion of your dataset to validate your model, instead of just training it and pushing it straight to production. If you see low accuracy on the test set, you refine the hyperparameters and re-train. That's the general concept behind cross-validation, and it's the answer we're usually looking for.
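Just to make the hold-out idea concrete, here's a minimal sketch with scikit-learn (the iris dataset and logistic regression are placeholders I picked for illustration, not anything from the article):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy dataset just for illustration
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Check accuracy on the held-out data before deploying;
# if it's low, tune the hyperparameters and re-train
print(accuracy_score(y_test, model.predict(X_test)))
```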

I believe what you're referring to is k-fold cross-validation, which we can think of as a more specific, enhanced form of cross-validation.

In k-fold cross-validation, you split the input data into k subsets of data (also known as folds). You train an ML model on all but one (k-1) of the subsets, and then evaluate the model on the subset that was not used for training. This process is repeated k times, with a different subset reserved for evaluation (and excluded from training) each time.
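A quick sketch of what that looks like in scikit-learn (again, the dataset and model are just assumptions for the example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# 5 folds: each iteration trains on 4 folds and evaluates on the remaining one
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)

print(scores)         # one accuracy score per fold
print(scores.mean())  # average across the k evaluations
```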

See this AWS ML guide: https://docs.aws.amazon.com/machine-learning/latest/dg/cross-validation.html

Thanks for your comment, and I'm happy to discuss further.

