Preet Parmar - Preet Parmar's Blog

Grouping, Aggregating, and Ordering our Data Frame in PySpark

May 10, 2022

Grouping, aggregating, and ordering are among the most commonly used operations in PySpark. With the help of this article, I will try to explain how we …

Random Forest: a versatile ML algorithm for classification and regression

May 10, 2022 (updated June 2, 2022)

A Random Forest is an ensemble of Decision Trees, generally trained via the bagging method (or sometimes pasting), typically with max_samples set to the size …

Bagging and Pasting: Ensemble Learning using Scikit-Learn

March 31, 2022 (updated June 2, 2022)

One way to get a diverse set of classifiers for ensemble learning is to use very different training algorithms. Another approach is to use the …

Decision Tree: powerful algorithm capable of fitting complex datasets

March 31, 2022 (updated June 2, 2022)

Decision Trees are versatile Machine Learning algorithms that can perform both classification and regression tasks, even multioutput tasks. The goal is to create a model …

Gradient Descent: a generic algorithm for finding the optimal solution

March 17, 2022 (updated June 2, 2022)

Gradient Descent is a generic algorithm capable of finding the optimal solutions to a wide range of problems. The general idea is to tweak the …

Display Visuals Based on Selection

March 16, 2022

In this article, I will go over a specific problem and its solution. Problem: Sometimes, we only need to show a visual when certain filters …

Handling customers with same name

March 16, 2022

More often than not we face a situation where we have customer data with duplicate names. The differentiating column is something like a customer code. …