Mo Daoud
1 min read · Dec 8, 2020


Hey Brian, I'm glad you found the article useful. I think we're talking about the same thing. The scikit-learn page you posted says it is "common practice when performing a (supervised) machine learning experiment to hold out part of the available data as a test set X_test, y_test." That basically means setting aside a portion of your dataset to validate your model, instead of just training it and pushing it straight to production. If you see low accuracy on the test set, you refine the hyperparameters and re-train. That's the general concept behind cross-validation, and it's the answer we're usually looking for.
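Just to make the hold-out idea concrete, here's a minimal sketch with scikit-learn (the iris dataset and logistic regression are placeholders I picked for illustration, not anything from the article):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Toy dataset just for illustration
X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Check accuracy on the held-out data before deploying;
# if it's low, tune the hyperparameters and re-train
print(accuracy_score(y_test, model.predict(X_test)))
```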

I believe what you're referring to is k-fold cross-validation, which we can think of as a more specific, enhanced form of cross-validation.

In k-fold cross-validation, you split the input data into k subsets of data (also known as folds). You train an ML model on all but one (k-1) of the subsets, and then evaluate the model on the subset that was not used for training. This process is repeated k times, with a different subset reserved for evaluation (and excluded from training) each time.
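A quick sketch of what that looks like in scikit-learn (again, the dataset and model are just assumptions for the example):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# 5 folds: each iteration trains on 4 folds and evaluates on the remaining one
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kf)

print(scores)         # one accuracy score per fold
print(scores.mean())  # average across the k evaluations
```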

See this AWS ML guide: https://docs.aws.amazon.com/machine-learning/latest/dg/cross-validation.html

Thanks for your comment, and I'm happy to discuss further.

