AI & Analytics
Introduction to Machine Learning in PySpark
Aimed at:
Data scientists who want to learn how to apply machine learning on big data using PySpark
Delivery method:
Interactive classroom training combining theory with practical exercises
Prerequisites:
Basic understanding of linear regression, knowledge of programming basics (not necessarily in Python)
Required preparation:
Bring a laptop with X2Go installed
Duration:
3-4 hours
Skill level:
Foundation
Preferred group size:
Approximately 10 participants per trainer
Course description
This training introduces basic machine learning concepts through logistic regression in PySpark. It covers the difference between linear and logistic regression, the algorithm underlying logistic regression, the bias-variance trade-off, and regularization. Participants then work through a Zeppelin notebook in which they apply these concepts to predict whether patent litigation cases end in trial or settlement.
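As a rough illustration of the topics listed above, the following sketch fits a one-feature logistic regression by batch gradient descent, with and without an L2 penalty. It is not taken from the course notebook: the toy data, the function names (`sigmoid`, `fit_logistic`) and the hyperparameter values are all assumptions made for this example, and plain Python is used instead of PySpark so that it runs without a cluster. In PySpark, the comparable model is `pyspark.ml.classification.LogisticRegression`, where regularization strength is set through `regParam`.

```python
import math


def sigmoid(z):
    """Map a real number to a probability in (0, 1), in a numerically stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, reg=0.0, lr=0.1, steps=2000):
    """Minimize mean log-loss + (reg / 2) * w**2 by batch gradient descent.

    Returns the fitted weight w and intercept b. The intercept is not penalized.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every example under the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n + reg * w
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up toy data: one feature, label 1 = "trial", label 0 = "settlement".
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w_plain, b_plain = fit_logistic(xs, ys, reg=0.0)  # no regularization
w_reg, b_reg = fit_logistic(xs, ys, reg=1.0)      # L2 penalty shrinks the weight

print("unregularized weight:", w_plain)
print("regularized weight:  ", w_reg)
print("P(trial | x = 2.0), unregularized:", sigmoid(w_plain * 2.0 + b_plain))
```

Unlike linear regression, the model outputs a probability between 0 and 1 rather than an unbounded value. The penalized fit ends up with a smaller weight, which is the mechanism regularization uses to trade a little bias for lower variance.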
Learning objective
Upon completion of this training, participants will be able to apply machine learning in a big data setting, in particular by fitting and regularizing a logistic regression model in PySpark.