Data Science Foundations with Python
In a deeply connected world, data can help us to inform and empower. In this course, you will learn to use the Python programming language to gain insight from your information.
Data is the residue of every action that takes place in a company, with customers, and in the marketplace. It is created when customers buy products, users interact with services, and colleagues collaborate.
In an increasingly connected world, our ability to capture and leverage data has increased exponentially. We can track interactions, transactions, and encounters in real time; but data in the wrong hands is useless, if not dangerous. In the right hands, data can drive new insights and powerfully informed decisions. When combined with advances in artificial intelligence and machine learning, data can be transformational.
This course introduces fundamental techniques and technologies from data science, predictive analytics, and machine learning that can help you get a handle on the modern information flood. Using the Python programming language, you will:
 Learn analytics skills that will enable you to evaluate, query, and visualize data using open-source tools: NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, and Apache Spark.
 Leverage strategies to create data-driven questions that can provide scientific or business value
 Use methods for assembling data from multiple sources and preparing powerful machine learning (ML) models
 Be exposed to common machine learning techniques used to solve supervised and unsupervised problems
 Gain hands-on experience with techniques for deploying models as part of larger systems
Target Audience
 Software engineers who are seeking to understand analytics and extend their skills.
 Data scientists and analysts who wish to work with data in Python.
 Recent college graduates and graduate students with experience in a data discipline seeking to use Python for data exploration, visualization, analysis, or machine learning.
Prerequisites
Participants should have a working knowledge of Python and be familiar with core statistical concepts (variance, correlation, etc.). Beyond that baseline, the course is designed for participants at all levels of Python and data science experience.
Objectives
 Understand how Python fits into the data science ecosystem, and how tools such as NumPy, Pandas, Matplotlib, Seaborn, scikit-learn, and Apache Spark support data analysis and machine learning.
 Learn strategies that help formulate data-driven questions that provide scientific or business value.
 Learn how to analyze data with Jupyter and Python data tools: gather, filter, transform, explore, and visualize data.
 Gain hands-on experience creating machine learning models and using tools to assess their accuracy and performance.
Course Details
The data fundamentals course can be taught in one-, three-, four-, or five-day variations.
 The one-day version focuses on working with data in Pandas, with an introduction to machine learning techniques.
 The four-day course includes data fundamentals and classical machine learning.
 The five-day course includes data fundamentals, classical machine learning, and a full day of hands-on case studies that further explore the techniques.
 The three-day version omits formal coverage of Pandas (day 1) and excludes the case studies (day 5).

Outline

Day 1: Data Fundamentals

Day 2: Machine Learning Fundamentals

Day 3: Regression and Forecasting

Day 4: Unsupervised Learning

Day 5: Case Studies
Day 1: Data Fundamentals
Session objectives:
 Review the fundamental syntax and structure of Python
 Learn about Python libraries for working with and visualizing data
Modules:
 Introducing Python
 Practical Data Science
 Data Fundamentals
Day 2: Machine Learning Fundamentals
Session objectives:
 Introduce the core components of machine learning
 Demonstrate the practical steps required to prepare data and create models
 Show how classification algorithms can be used to predict outcomes of interest
 Discuss techniques for assessing a model's performance and accuracy
Modules:
 What is Machine Learning
 Machine Learning Algorithms
 Classification
Day 3: Regression and Forecasting
Session objectives:
 Show how regression can be used to estimate continuous targets
 Discuss time series and the unique considerations in their modeling
Modules:
 Regression
 Forecasting and Time Series
Day 4: Unsupervised Learning
Session objectives:
 Explore Principal Components Analysis (PCA) and other forms of dimensionality reduction
 Show how clustering and other unsupervised algorithms can be used to gain insight into data
Modules:
 Dimensionality Reduction
 Principal Components Analysis (PCA)
 Clustering
Day 5: Case Studies
Session objectives:
 Show how machine learning can be applied to solve difficult data science challenges
 Show how tools such as deep learning can be used to work with unstructured data
Modules:
 Case Study: Natural Language Processing
 Deep Learning
Introducing Python
Introduce the Python programming language, its syntax, and core libraries that are used for working with data.
 Python Modules: Toolboxes
 Importing modules
 Listing methods
 Creating modules
 Python Syntax and Structure
 Core programming language structure
 Functions
 Object-oriented programming
 Comprehensions and other syntactic niceties
 Python Data Science Libraries
 NumPy
 NumPy Arrays
 SciPy
 Pandas
 Python Dev Tools, Analytic Environments, and REPLs
 IPython
 Jupyter
 Jupyter Operation Modes
 Anaconda
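As a rough illustration of the Introducing Python topics above (importing and listing modules, functions, comprehensions, NumPy arrays, and Pandas), here is a minimal sketch. The helper `min_max_scale` and the sample values are hypothetical and were chosen only for illustration.

```python
# Importing modules: each library is a "toolbox" of reusable code
import numpy as np
import pandas as pd

# Listing the public names a module exposes
numpy_names = [name for name in dir(np) if not name.startswith("_")]

# A list comprehension: squares of the even numbers from 0 to 8
squares = [n * n for n in range(10) if n % 2 == 0]

# A small function (hypothetical helper, for illustration only)
def min_max_scale(values):
    """Rescale a NumPy array so its values fall between 0 and 1."""
    return (values - values.min()) / (values.max() - values.min())

# NumPy arrays support vectorized, element-wise arithmetic
arr = np.array(squares)
scaled = min_max_scale(arr)

# A pandas DataFrame adds labeled columns on top of NumPy arrays
df = pd.DataFrame({"n": range(0, 10, 2), "square": squares, "scaled": scaled})
print(df)
```

Running the same cells in IPython or Jupyter lets you inspect `df` interactively instead of printing it.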
Practical Data Science
Describe how the use of data is changing, and the emergence of the “Data Scientist”: “a programmer who knows more statistics than a software engineer and more programming than a statistician.”
 How is data being used in innovative ways to ask new and interesting questions?
 What is Data Science?
 Data Science, Machine Learning, AI: What is the difference?
 Case Study: Applied Data Science at Google
 Case Study: Predictive Models in Advertising
 Case Study: Recommender Systems in E-Commerce
 Data Analytics Lifecycle
 Discovery
 Harvesting
 Priming
 Exploratory Data Analysis
 Model Planning
 Model Building
 Validation
 Production Rollout
Data Fundamentals
Aggregating, repairing, normalizing, exploring, and visualizing data.
 Working with data in Python
 Importing data from external sources
 Dealing with missing data
 Dropping columns
 Interpolating missing data in Pandas
 Replacing data
 Scaling/normalizing data
 Exploratory Data Analysis and Visualization: Pandas, Matplotlib, and Plotly
 Transformation, validation, and interpretation
 Getting started with Matplotlib and Seaborn
 Plotting Windows and Figures
 Distributions and variance:
 Show how to represent a distribution in pictures (histograms and related charts) and numbers (summaries)
 Introduce outliers and describe the effect they might have on a distribution
 Variance: measuring the spread of a distribution
 Modeling distributions: normal, lognormal, and Pareto distributions
 Lab: Visualizing and Summarizing Distributions
 Analyzing Relationships
 Show how Pandas can be used to assess relationships among variables
 Visualizing relationships: scatterplots and beyond
 Measuring relationships: correlation and covariance
 Testing relationships: is it meaningful?
 Classical hypothesis testing: means, correlation, and proportions
 Demonstration: Analyzing Relationships
 Lab: More Relationship Analysis
 Data Grouping and Aggregation in Python
 Data aggregation and grouping
 pandas.core.groupby.SeriesGroupBy
 Grouping multiple columns
 Pivot Tables
 Cross-Tabulation
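The following is a minimal Pandas sketch of several Data Fundamentals topics: interpolating missing data, min-max scaling, summarizing a distribution, correlation and covariance, grouping, pivot tables, and cross-tabulation. The sales DataFrame and its column names are invented for illustration, and min-max scaling is only one of several normalization choices.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records (invented for illustration) with one missing value
df = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West", "East"],
    "channel": ["Web", "Store", "Web", "Web", "Store", "Web"],
    "units": [10.0, np.nan, 7.0, 9.0, 4.0, 12.0],
    "price": [2.5, 3.0, 2.0, 2.2, 3.5, 2.4],
})

# Missing data: linear interpolation fills the NaN from its neighbors (10 and 7 -> 8.5)
df["units"] = df["units"].interpolate()

# Scaling/normalizing: min-max scale price into [0, 1]
price_range = df["price"].max() - df["price"].min()
df["price_scaled"] = (df["price"] - df["price"].min()) / price_range

# Distributions: numeric summary and sample variance
summary = df["units"].describe()
variance = df["units"].var()

# Relationships: correlation and covariance between two columns
corr = df["units"].corr(df["price"])
cov = df["units"].cov(df["price"])

# Grouping and aggregation: total units per region
units_by_region = df.groupby("region")["units"].sum()

# Pivot table: mean price for each region/channel pair
pivot = df.pivot_table(index="region", columns="channel", values="price", aggfunc="mean")

# Cross-tabulation: number of records for each region/channel pair
counts = pd.crosstab(df["region"], df["channel"])

print(summary, units_by_region, pivot, counts, sep="\n\n")
```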
What is Machine Learning
 The Machines are Coming: Machine Learning and Artificial Intelligence
 What are machine learning and artificial intelligence?
 What are some ML techniques and how can they be used to solve business problems?
 Supervised versus unsupervised learning: what are the differences?
 Terminology and definitions
 Features and observations
 Labels
 Continuous and categorical features
 Practical Machine Learning
 Data preparation
 Model training
 Model validation and assessment
 scikit-learn: Estimators, Models, and Predictors
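To make the features/labels terminology and the scikit-learn estimator workflow (prepare, `fit`, `predict`, assess) concrete, here is a minimal sketch. It uses the Iris dataset bundled with scikit-learn and a k-nearest-neighbors classifier. Both are illustrative choices, and any estimator exposes the same methods.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Features (X): one row per observation; labels (y): the species of each flower
X, y = load_iris(return_X_y=True)

# Data preparation: hold out 25% of the observations for validation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Model training: every scikit-learn estimator exposes fit()
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Prediction and assessment: predict() returns labels, score() returns accuracy
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```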
Machine Learning Algorithms
Introduce common machine learning algorithms and explore their use.
 Classification and Regression
 How do you build machine learning models to “make guesses” and “put things into buckets”?
 Classification
 Regression
 Clustering and Principal Components Analysis
 Time Series
Classification
Building, tuning, and assessing classification models
 Classification Overview
 What is classification?
 When is it used?
 Creating Classification Models
 Logistic Regression
 Decision Tree Classifier
 K-Nearest Neighbors
 Gaussian Naive Bayes
 Support Vector Machines
 Random Forest
 Assessing Classification Models
 ROC Visualizations
 Confusion Matrices
 Precision Recall Curves
 Imbalanced Distributions
 Optimizing Classification Models
 Tuning Hyperparameters
 Answering the "Do I have enough data?" question
 Explaining Classification Model Results
 General Model Interpretation
 Linear Models
 Tree Models
 Model Interpretation using SHAP
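Here is a minimal sketch of building, assessing, and tuning classifiers with scikit-learn, using the bundled breast cancer dataset. It fits a logistic regression and computes a confusion matrix, precision, recall, and ROC AUC (the number a ROC curve summarizes). It then runs a small cross-validated grid search over a random forest. The dataset, split, and parameter grid are assumptions for illustration, not tuned recommendations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Logistic regression on standardized features
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# Assessing the model
cm = confusion_matrix(y_test, y_pred)  # rows: true class, columns: predicted class
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)
print(cm, f"precision={precision:.3f} recall={recall:.3f} AUC={auc:.3f}", sep="\n")

# Tuning hyperparameters: 3-fold cross-validated grid search over a random forest
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
```

Keeping the scaler inside the pipeline means it is fit only on training data, which avoids leaking test-set statistics into the model.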
Regression
Building, tuning, and assessing regression models.
 Regression Overview
 What is regression?
 When is it used?
 Creating Regression Models
 Linear and Polynomial Regression
 Regression Trees and Random Forests
 Extreme Gradient Boosting
 Assessing Regression Models
 R², MAE, and MSE
 Residuals Plots
 Prediction Error Plots
 Optimizing Regression Parameters
 Explaining Regression Model Results
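A minimal sketch of creating and assessing regression models follows. It compares linear, polynomial, and random forest regressors on the same data using R², MAE, and MSE, and it keeps the residuals a residuals plot would display. The synthetic quadratic data is an assumption for illustration. Gradient boosting libraries such as XGBoost are omitted to keep the example dependency-free.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (illustrative assumption): a quadratic trend plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + rng.normal(0, 0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "linear": LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores, residuals = {}, {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    residuals[name] = y_test - pred  # the values plotted in a residuals plot
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }
    print(name, {k: round(v, 3) for k, v in scores[name].items()})
```

Because the true relationship is quadratic, the plain linear model should show curved, patterned residuals. The degree-2 model's residuals should look like noise.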
Forecasting and Time Series
Dimensionality Reduction
 Dimensionality Reduction Primer
 What is dimensionality reduction?
 What problems does it solve?
 When should it be used?
 Principal Components Analysis (PCA)
 What is PCA?
 How and why does it work?
 What are the results and what do they mean?
 Working with PCA in scikit-learn
 Using PCA to Visualize and Understand Data
 How many components is optimal?
 Interpreting components
 Biplots
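Here is a minimal sketch of PCA in scikit-learn on the bundled Iris data. It standardizes the features, picks the number of components that explain at least 95% of the variance, and computes loadings for interpretation. The 95% threshold is an assumption; other cutoffs or a scree plot are equally common. Scaling components by the square root of their explained variance is one common convention for the arrows in a biplot.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_iris(return_X_y=True)

# Standardize so each feature contributes on the same scale
X_std = StandardScaler().fit_transform(X)

# Fit all components, then project the observations onto them
pca = PCA()
projected = pca.fit_transform(X_std)

# How many components? Keep enough to explain at least 95% of the variance
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.argmax(cumulative >= 0.95) + 1)

# Interpreting components: loadings show how each original feature contributes
loadings = pca.components_[:2].T * np.sqrt(pca.explained_variance_[:2])

print("Explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
print("Components kept:", n_components)
print("Loadings (features x PC1/PC2):", np.round(loadings, 2), sep="\n")
```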
Clustering
 Clustering: Letting the Computer Tell Us About Differences
 K-Means
 Hierarchical Clustering
 Assessing Cluster Results
 Elbow Diagrams
 Visualizing Cluster Impacts on Data
 Silhouette Plots
 Data Exploration of Clusters
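The sketch below shows how cluster results can be assessed. It records K-Means inertia for an elbow diagram, scores each choice of k with the average silhouette coefficient, and then fits hierarchical (agglomerative) clustering with the chosen k. The synthetic blob data and the rule "pick the k with the highest silhouette" are illustrative assumptions, not the only way to choose k.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data (illustrative assumption): 300 points around 4 centers
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)

inertias = {}     # within-cluster sum of squares, plotted in an elbow diagram
silhouettes = {}  # mean silhouette coefficient, from -1 (poor) to 1 (well separated)
for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)
    labels = km.fit_predict(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, labels)

best_k = max(silhouettes, key=silhouettes.get)

# Hierarchical clustering with the same number of clusters, for comparison
agg_labels = AgglomerativeClustering(n_clusters=best_k).fit_predict(X)

print("Inertia by k:", {k: round(v, 1) for k, v in inertias.items()})
print("Silhouette by k:", {k: round(v, 3) for k, v in silhouettes.items()})
print("Chosen k:", best_k)
```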
Case Study: Machine Learning and Natural Language Processing
Show how machine learning techniques can be applied alongside feature engineering to solve complex problems.
 Introduce Natural Language Processing (NLP) and the core constructs used to work with human language.
 Explore computational models of human language that can be used for classification and clustering.
 Show how keyword extraction using NLP and data normalization can be used to locate patients who have a specific condition or disease.
Deep Learning
Introduce neural networks and their basic function.
 What is a deep neural network? How are they different from other types of machine learning techniques?
 What are the mathematical techniques behind neural networks? How do they work?
 How do we teach networks to “learn”?
 What are some of the applications for these types of tools in healthcare, finance, and advertising?
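To illustrate how a network "learns," here is a minimal NumPy sketch of a one-hidden-layer network trained with backpropagation and full-batch gradient descent on XOR, a problem no linear model can solve. The layer size, learning rate, and step count are arbitrary assumptions. Practical work would use a framework such as PyTorch or TensorFlow and an optimizer like Adam.

```python
import numpy as np

rng = np.random.default_rng(0)

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# One hidden layer of 8 tanh units and a single sigmoid output unit
W1 = rng.normal(0, 1, size=(2, 8))
b1 = np.zeros((1, 8))
W2 = rng.normal(0, 1, size=(8, 1))
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 0.5
losses = []
for _ in range(5000):
    # Forward pass
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    losses.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))  # cross-entropy

    # Backward pass (chain rule); for sigmoid + cross-entropy, dL/dz = p - y
    dz2 = (p - y) / len(X)
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)  # tanh'(a) = 1 - tanh(a)^2
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)

    # Gradient descent: step each parameter against its gradient
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2

# Final forward pass with the trained weights
final = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
predictions = (final > 0.5).astype(int).ravel()
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}; predictions: {predictions}")
```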