AI & Analytics

Introduction to Machine Learning in PySpark

Aimed at:

Data scientists who want to learn how to apply machine learning to big data using PySpark

Delivery method:

Interactive classroom training combining theory with practical exercises


Required preparation:

Basic understanding of linear regression and basic programming knowledge (not necessarily in Python)

Bring a laptop with X2Go installed

Duration:

3-4 hours

Skill level:

Preferred group size:

Approximately 10 participants per trainer

Course description

This training offers a general introduction to basic machine learning concepts, using logistic regression in PySpark as the running example. It covers the difference between linear and logistic regression, the algorithm underlying logistic regression, the bias-variance trade-off, and regularization. Participants then work through a Zeppelin notebook in which they apply these concepts to predict whether patent litigation cases end in a trial or a settlement.
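As a rough illustration of the algorithm and of regularization (not the course notebook itself), the sketch below fits a logistic regression by batch gradient descent on the mean log-loss, with an optional L2 penalty that shrinks the weights toward zero. It is plain Python so it runs without Spark; the function names, learning rate, epoch count and toy data are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function: maps a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(X, y, reg_param=0.0, lr=0.1, epochs=2000):
    """Hypothetical helper: batch gradient descent on the mean log-loss plus an
    L2 penalty (reg_param / 2) * ||w||^2. The bias term is not penalized."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Difference between the predicted probability and the 0/1 label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The reg_param * wj term is the gradient of the L2 penalty
        w = [wj - lr * (gj + reg_param * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(X, w, b, threshold=0.5):
    """Classify each row as 1 if its predicted probability reaches the threshold."""
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold else 0
        for xi in X
    ]


# Toy, hypothetical data: one feature, two classes
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]

w_plain, b_plain = fit_logistic_regression(X, y, reg_param=0.0)
w_reg, b_reg = fit_logistic_regression(X, y, reg_param=0.1)

print(predict(X, w_plain, b_plain))  # [0, 0, 1, 1]
print(w_reg[0] < w_plain[0])  # True: the penalty keeps the weight smaller
```

In PySpark, the corresponding estimator is `pyspark.ml.classification.LogisticRegression`, where `regParam` sets the penalty strength and `elasticNetParam=0.0` selects a pure L2 penalty; Spark's optimizer and default feature standardization differ from this simple sketch, so the fitted weights will not match exactly.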

Learning objective

Upon completion of this training, participants will be able to apply the machine learning concepts covered, such as logistic regression and regularization, to big data using PySpark.