© 2019 by the PA CONSULTING GROUP


Data Driven Organisation website

AI & Analytics

Introduction to Machine Learning in PySpark

Aimed at:

Data scientists who want to learn how to apply machine learning on big data using PySpark

Delivery method:

Interactive classroom training combining theory with practical exercises

Prerequisites:

Basic understanding of linear regression, knowledge of programming basics (not necessarily in Python)

Required preparation:

Bring a laptop with X2Go installed

Duration:

3-4 hours

Skill level:

Foundation

Preferred group size:

+/-10 participants per trainer

Course description

This training introduces basic machine learning concepts through logistic regression in PySpark. It covers the difference between linear and logistic regression, the algorithm underlying logistic regression, the bias-variance trade-off, and regularization. Participants then work through a Zeppelin notebook in which they apply these concepts to predict trial versus settlement outcomes in patent litigation.
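To give a flavour of the concepts listed above, the following is a minimal sketch in plain Python, not the course notebook and not PySpark code. It fits a one-feature logistic regression by batch gradient descent, with an optional L2 penalty on the weight. The toy data, the function names (`sigmoid`, `fit_logistic`) and the parameter values (`l2`, `lr`, `steps`) are all assumptions made up for illustration.

```python
import math


def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    # Unlike linear regression, the output is squashed into (0, 1),
    # so it can be read as a probability.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, l2=0.0, lr=0.1, steps=2000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent.

    l2 is the strength of the L2 penalty on w; the intercept b is
    left unpenalized. All default values are illustrative assumptions.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the score
            grad_w += err * x
            grad_b += err
        # Average log-loss gradient plus the gradient of (l2 / 2) * w**2.
        w -= lr * (grad_w / n + l2 * w)
        b -= lr * (grad_b / n)
    return w, b


# Hypothetical toy data: larger x makes class 1 more likely,
# with two "noisy" points so the classes are not perfectly separable.
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w_plain, b_plain = fit_logistic(xs, ys, l2=0.0)  # no regularization
w_reg, b_reg = fit_logistic(xs, ys, l2=1.0)      # L2-regularized

print("unregularized weight:", w_plain)
print("regularized weight:  ", w_reg)
```

The regularized fit ends up with a smaller weight than the unregularized one, which is the bias-variance trade-off in miniature: the penalty accepts some bias in exchange for a less extreme, lower-variance model. In PySpark, the corresponding knobs on `pyspark.ml.classification.LogisticRegression` are `regParam` (penalty strength) and `elasticNetParam` (0 for a pure L2 penalty).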

Learning objective

Upon completion of this training, participants will be able to apply machine learning in a big data setting.