Solving data science problems requires systematic thinking and a structured approach. Here are some of the key concepts and ideas to apply when solving a data science problem:
Step 1 | Identify Type of Problem
Typical problems include classification, regression, recommendation systems, and reinforcement learning. Before getting started, you should have a clear idea of the type of problem you are going to solve.
Step 2 | Basic Understanding of the Training Set
The training set may have too few samples, or it may be very large. It may or may not contain the features or information needed to solve the problem at hand. It may also be imbalanced, meaning examples of one class are far more or far less common than those of another.
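A quick first pass with pandas surfaces most of these issues. The sketch below is a minimal example assuming a hypothetical train.csv file with a label target column:

```python
import pandas as pd

# Hypothetical training file and target column; adjust to your dataset.
df = pd.read_csv("train.csv")

print(df.shape)                                   # samples vs. features
print(df.isna().sum())                            # missing values per column
print(df["label"].value_counts(normalize=True))   # class balance (imbalance check)
```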
Step 3 | Accuracy
We have to weigh the accuracy an algorithm can achieve against the risks involved when choosing it: some applications tolerate an approximate answer, while others demand the best accuracy achievable.
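One practical way to gauge accuracy before committing to an algorithm is cross-validation. A minimal sketch with scikit-learn, using a bundled toy dataset so it is self-contained:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# 5-fold cross-validated accuracy: the mean estimates performance,
# the standard deviation hints at how stable that estimate is.
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print(f"accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```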
Step 4 | Training Time
Depending on the scenario, we may or may not have the luxury of long training cycles, so an algorithm's training time can be a deciding factor.
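Measuring a training cycle directly is straightforward. A minimal sketch, assuming scikit-learn estimators and a synthetic dataset so the comparison is self-contained:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=20_000, n_features=50, random_state=0)

# Time a full fit for a cheap model and a more expensive one.
for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=300)):
    start = time.perf_counter()
    model.fit(X, y)
    print(f"{type(model).__name__}: {time.perf_counter() - start:.2f}s")
```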
Step 5 | Linearity
Some algorithms can only solve a problem well if it is linearly separable. If that's not the case, they can give poor accuracy.
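The classic two-moons dataset illustrates this. A minimal sketch comparing a linear SVM with a kernelized one on data that is not linearly separable:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Two interleaving half-circles: no straight line separates the classes.
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

linear = SVC(kernel="linear")   # restricted to a straight decision boundary
rbf = SVC(kernel="rbf")         # can bend the boundary around the moons
print("linear:", cross_val_score(linear, X, y, cv=5).mean())
print("rbf:   ", cross_val_score(rbf, X, y, cv=5).mean())
```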
Step 6 | Number of Parameters
Parameters affect an algorithm's behavior, such as its error tolerance or number of iterations. Typically, algorithms with large numbers of parameters require the most trial and error (and time) to find a good combination.
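In scikit-learn, every estimator exposes its tunable parameters through get_params(), which gives a quick sense of how large the search space is. A minimal sketch:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

# Fewer knobs usually means less trial and error to tune.
print(len(LogisticRegression().get_params()))             # relatively few parameters
print(len(GradientBoostingClassifier().get_params()))     # considerably more
print(sorted(GradientBoostingClassifier().get_params()))  # the parameter names
```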
Step 7 | Optimize Hyperparameters
There are three common options for optimizing hyperparameters: grid search, random search, and Bayesian optimization.
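Grid search and random search are built into scikit-learn; Bayesian optimization requires a third-party library (for example scikit-optimize or Optuna) and is omitted here. A minimal sketch of the first two:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
params = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}

# Grid search tries every combination; random search samples a subset.
grid = GridSearchCV(SVC(), params, cv=5).fit(X, y)
rand = RandomizedSearchCV(SVC(), params, n_iter=5, cv=5, random_state=0).fit(X, y)
print(grid.best_params_)
print(rand.best_params_)
```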
Step 8 | Number of Features
The number of features in some datasets can be very large compared to the number of data points. This is often the case with genetic or textual data. A large number of features can bog down some learning algorithms, making training time infeasibly long. Some algorithms, such as Support Vector Machines, are particularly well suited to this case.
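Text data makes the point concretely: TF-IDF turns a modest corpus into tens of thousands of sparse features, and a linear SVM still trains quickly. A minimal sketch (note that fetch_20newsgroups downloads the corpus on first use):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Two newsgroups, vectorized into a very high-dimensional sparse matrix.
data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos"])
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
print(cross_val_score(clf, data.data, data.target, cv=3).mean())
```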
Step 9 | Feature Engineering
Feature engineering is about creating new input features from your existing ones.
In general, you can think of data cleaning as a process of subtraction and feature engineering as a process of addition. Deep learning algorithms can deduce features implicitly and therefore may not require the creation of new features, whereas classical machine learning algorithms may benefit from additional derived features.
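A minimal sketch of derived features in pandas, using a small hypothetical transactions table; the three new columns are classic examples of interaction, date decomposition, and domain-knowledge flags:

```python
import pandas as pd

# Hypothetical transactions table used only for illustration.
df = pd.DataFrame({
    "price": [100.0, 250.0, 80.0],
    "quantity": [2, 1, 5],
    "signup_date": pd.to_datetime(["2021-01-10", "2020-06-01", "2021-03-15"]),
})

df["total"] = df["price"] * df["quantity"]         # interaction feature
df["signup_month"] = df["signup_date"].dt.month    # date decomposition
df["is_bulk"] = (df["quantity"] >= 3).astype(int)  # domain-knowledge flag
print(df)
```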
Step 10 | Tree Ensembles
In many cases, no single algorithm gives us the desired results. Ensembles are machine learning methods that combine the predictions of multiple separate models.
While bagging and boosting are both ensemble methods, they approach the problem from opposite directions: bagging uses complex base models and tries to “smooth out” their predictions, while boosting uses simple base models and tries to “boost” their aggregate complexity.
Ensembling is a general term, but when the base models are decision trees, they have special names: random forests and boosted trees!
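A minimal sketch of both, using scikit-learn's implementations and a bundled toy dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Random forest: bagging of deep trees. Boosted trees: boosting of shallow ones.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
boosted = GradientBoostingClassifier(random_state=0)
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
print("boosted trees:", cross_val_score(boosted, X, y, cv=5).mean())
```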