Python Analytics

Course Duration : 70 hrs + Case Study
20,000

About Data Analysis with Python

Python is a general-purpose programming language that’s powerful, easy to learn and fast to code.
It is used for many different applications. Over time, the huge community around this open source language has created a number of tools specifically for data science. As a result, analyzing data with Python has never been easier, and it is rapidly becoming the language of choice for data scientists. Python code can be written like a traditional program, to execute an entire series of instructions at once; it can also be executed line by line or block by block, making it well suited to working with data interactively.


Course Overview

In this practical course you’ll learn everything you need to get started using Python for data analysis. You will start with basic arithmetic, variables and data structures, move on to loading data from different sources, rearranging and aggregating it, and finally create your own visualizations based on real data.
The course focuses on the following topics:

  • Python Language Fundamentals
  • Python Data Science Tools (NumPy)
  • Pandas Series & DataFrame
  • Plotting & Visualization (matplotlib)

 


What we offer

Training under the guidance of a Data Scientist with 20+ years of experience, a post-graduate degree from IIT, a PhD from Boston University, and 40+ research papers on Data Science.
After training, an internship with our Development Partners (Ideal Analytics/ArcVision) on real-time/live project work.
Case studies on real industry data
Classroom training with flexible timing
Customized/On-demand training
Unlimited access to exclusive Study Materials on Cloud

Installation

  • Python via Anaconda
  • Jupyter Notebook

 

Chapter 1: Python Language Fundamentals

Topic:1

  • Variables, Data Types and Basic Syntax
  • Operators and Expressions
  • Control Flow & Loops
  • Importing and Using Packages

(Assignment 1)

Topic:2

  • Data Structures (lists, tuples, sets, dictionaries)
  • Functions, Objects and Classes

(Assignment 2)

Topic:3

  • Strings and Dates
  • File Handling
  • Iterators, Generators and Decorators
  • Modules

 

Chapter 2: The Python Data Science Toolbox

Topic:1

  • Usage of NumPy for scientific computing with Python.
  • Usage of NumPy as an efficient multi-dimensional container of generic data.
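
As a small illustration of the NumPy topics above, the sketch below stores a table of numbers in a two-dimensional array and works on it without explicit loops. The temperature values and variable names are made up for this example and are not part of the course material.

```python
import numpy as np

# Hypothetical data: temperatures (in Celsius) for 2 cities over 3 days.
temps = np.array([[21.0, 23.5, 19.0],
                  [30.0, 31.5, 29.0]])

print(temps.shape)  # (number of rows, number of columns)
print(temps.dtype)  # a single element type shared by the whole array

# Vectorized arithmetic is applied to every element at once.
temps_f = temps * 9 / 5 + 32

# Aggregation along an axis: axis=1 gives one mean per row (per city).
city_means = temps.mean(axis=1)
print(city_means)
```

The same pattern (build an array, apply element-wise operations, aggregate along an axis) carries over to arrays with more dimensions.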

(Assignment 1)

Topic:2

  • Data manipulation and analysis through Pandas, a software library
  • Pandas data structures and operations for manipulating numerical tables and time series.

(Assignment 2)

Topic:3

  • Pandas idioms
  • Indexing and Missing Data
  • Merging, Grouping and Pivot Tables
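
The operations listed in this topic can be sketched on two tiny tables. The column names and values below are invented for illustration, and filling a missing amount with 0 is only one possible way to treat missing data.

```python
import pandas as pd

# Hypothetical order and customer tables.
orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "category": ["shirts", "shoes", "shirts", "shoes"],
    "amount": [20.0, 50.0, None, 40.0],
})
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "city": ["Kolkata", "Delhi", "Kolkata"],
})

# Missing data: replace the missing amount with 0 (an assumption for this sketch).
orders["amount"] = orders["amount"].fillna(0.0)

# Merging: attach each customer's city to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Grouping: total sales per city.
sales_by_city = merged.groupby("city")["amount"].sum()

# Pivot table: cities as rows, product categories as columns.
pivot = merged.pivot_table(index="city", columns="category",
                           values="amount", aggfunc="sum", fill_value=0)
print(sales_by_city)
print(pivot)
```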

(Assignment 3)

Topic:4

  • Plotting & Visualization (matplotlib): Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms.

(Assignment 4)

 

Chapter 3: Advanced Packages

  • SciPy: SciPy is a Python-based ecosystem of open-source software for mathematics, science, engineering and technical computing. SciPy builds on the NumPy array object.
  • Scikit-Learn: Scikit-learn is a machine learning library for the Python programming language. It features various classification, regression and clustering algorithms.
  • Statsmodels: Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.

 

Chapter 4: Descriptive Statistics

Topic:1

  • Central Tendency: Identify situations in which the center of a distribution is valuable, different ways the center of a distribution can be calculated for symmetric and asymmetric distributions, handling outliers
  • Measures of Variability: How the spread of random variables can be meaningfully summarized in the context of the distribution shape, handling bias in estimators
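
To make the two bullets above concrete, here is a minimal sketch using only Python's standard `statistics` module. The income figures are hypothetical; the point is how a single outlier moves the mean but not the median, and how the variance estimate depends on dividing by n or by n - 1.

```python
import statistics

# Hypothetical sample with one extreme value (an outlier).
incomes = [30, 32, 35, 36, 38, 40, 400]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # unaffected by how large the outlier is

# Spread: pvariance divides by n, variance divides by n - 1.
# Dividing by n tends to underestimate the spread when the data is a sample
# from a larger population, which is why n - 1 is used for samples.
biased_var = statistics.pvariance(incomes)
unbiased_var = statistics.variance(incomes)

print(mean_income, median_income)
print(biased_var, unbiased_var)
```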

(Assignment 1)

Topic:2

  • Distributions of Random Variables: Computing and summarizing distribution of discrete and continuous random variables, parametric fitting.
  • Correlations: Auto and cross-correlations between random variables, correlation matrix, correlation function, relationship with statistical dependence and causation
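
The correlation bullet above can be illustrated with a short pure-Python sketch. The function names and data are made up for this example, and the autocorrelation shown is a simple estimate (the correlation between a series and a shifted copy of itself) rather than the only possible definition. The last example shows that zero correlation does not mean the variables are independent.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def autocorrelation(series, lag):
    """Correlation of a series with itself shifted by `lag` steps (lag >= 1)."""
    return pearson(series[:-lag], series[lag:])

# Cross-correlation between two hypothetical variables (perfectly linear).
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]
r_cross = pearson(ad_spend, sales)

# y depends completely on x (y = x squared), yet the correlation is zero.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
r_dependent = pearson(x, y)

print(r_cross, r_dependent)
```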

(Assignment 2)

 

Chapter 5: Supervised Learning

Topic:1

  • Classification: Introduction to frequentist and Bayesian algorithms such as linear discriminant, k nearest neighbors, decision tree, naïve Bayes classifier.
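
As one example of the classifiers named above, the following is a minimal k nearest neighbors sketch in plain Python. The function name, points and labels are hypothetical; in practice a library such as scikit-learn would normally be used instead.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-feature training data with two classes.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (2.0, 2.0)))
print(knn_predict(points, labels, (8.0, 9.0)))
```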

(Assignment 1)

Topic:2

  • Regression: Introduction to linear and logistic regression extending to generalized linear models, clarifying the notions of principal components and diagnostics

(Assignment 2)

 

Chapter 6: Unsupervised Learning

  • Clustering: Hierarchical and centroid based clustering (k-means), distance and similarity metrics, the problem of high dimensions, information theoretic methods

We offer various case studies based on different industries. You can choose the case study most applicable to you.

Case Study 1: Regression Analysis

How do you assess whether you are paying the correct price when buying a property?
Pricing is a very important function for any business. The correct price can make the difference between profit and loss. In this case study we will take the example of property pricing to gain a deeper understanding of regression analysis.

Step – 1: Data Preparation
A. Checking the outlier
B. Checking Missing Values and how to treat them.
C. Basic bivariate and univariate analysis i.e. checking correlations, how the variables are distributed.
Step – 2: Principal Component Analysis
Step – 3: Traditional Regression Analysis with variable selection
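
The idea behind the regression step can be sketched with a one-variable least-squares fit. This is much simpler than the multi-variable regression with variable selection covered in the case study, and the areas, prices and function names below are invented for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical past sales: floor area and sale price (arbitrary units).
area = [50, 60, 80, 100]
price = [110, 130, 170, 210]

intercept, slope = fit_line(area, price)

# Compare an asking price with what the fitted line predicts for that area.
asking_price = 200
predicted_price = intercept + slope * 90
overpriced_by = asking_price - predicted_price
print(predicted_price, overpriced_by)
```

A positive difference suggests the asking price is above what the fitted relationship predicts, which is the kind of question the case study sets out to answer.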

 

Case Study 2: Marketing Analytics

As a key decision and strategy maker at an online retail store that specializes in apparel and clothing, how can you identify opportunities to improve PnL by establishing an analytics practice? Background of behavioural analytics: human brains follow involuntary patterns (people behave like other similar people around them), and detecting these patterns is precisely the idea behind marketing analytics.

Step – 1: EDA – Exploratory Data Analysis
A. Exploring different patterns, e.g. the distribution of customers across the number of product categories purchased by each customer.
B. Why are customers buying different product categories?
C. Categorization of customers based on the number of product categories they purchased.
D. Which category contributes the highest sales?
Step – 2: Association Analysis
E. Support/Confidence/Lift – Apriori concept
F. Market Basket Analysis
Step – 3: Customer Segmentation
A. Classification/Clustering
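
The support, confidence and lift measures from Step 2 can be sketched in a few lines of Python. The baskets and function names are hypothetical, and this only computes the measures for a given rule; it does not implement the Apriori candidate search itself.

```python
# Hypothetical shopping baskets, one set of product categories per order.
baskets = [
    {"shirt", "jeans"},
    {"shirt", "jeans", "belt"},
    {"shirt", "belt"},
    {"jeans"},
    {"shirt", "jeans"},
]

def support(itemset, baskets):
    """Share of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

def lift(antecedent, consequent, baskets):
    """Confidence relative to how common the consequent is overall."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)

print(support({"shirt", "jeans"}, baskets))
print(confidence({"shirt"}, {"jeans"}, baskets))
print(lift({"shirt"}, {"jeans"}, baskets))
```

A lift above 1 suggests the two items are bought together more often than their individual popularity would predict; a lift below 1 suggests the opposite.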

 

Case Study 3: Score Card Modelling

Given the ongoing turmoil in credit markets, a critical re-assessment of credit risk modelling approaches is needed more than ever. This modelling approach generates a probability of default score for each customer on the basis of a collection of independent variables (which may differ as per business requirements). The score can then be used for predictive modelling, MIS reporting etc.

Step – 1: EDA – Exploratory Data Analysis
A. Data import and basic data sanity check.
B. Exploring different patterns i.e. distribution of data
C. Variables (categorical & numerical) selection approaches.
D. Training and validation data creation.
Step – 2: Model Preparation
E. Creating indicator variables
F. Apply stepwise regression
Step – 3: Validation of model
G. Check for multicollinearity (using correlation matrix, VIF)
H. Generate Score using logistic regression.
I. KS calculation
J. Coefficient validation, coefficient stability and score stability.
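
The KS calculation in step I can be sketched as follows, assuming the common scorecard definition: the largest gap between the cumulative share of defaulters and the cumulative share of non-defaulters when customers are ranked by score. The function name and the scores are made up, and the sketch assumes the data contains at least one defaulter and one non-defaulter.

```python
def ks_statistic(scores, defaulted):
    """Largest gap between cumulative shares of defaulters and non-defaulters,
    walking through customers from the highest score to the lowest."""
    total_bad = sum(defaulted)
    total_good = len(defaulted) - total_bad
    ranked = sorted(zip(scores, defaulted), reverse=True)
    cum_bad = cum_good = 0
    ks = 0.0
    for i, (score, is_bad) in enumerate(ranked):
        if is_bad:
            cum_bad += 1
        else:
            cum_good += 1
        # Only measure the gap once all customers with the same score are counted.
        last_of_tie = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if last_of_tie:
            ks = max(ks, abs(cum_bad / total_bad - cum_good / total_good))
    return ks

# Hypothetical predicted default probabilities and observed outcomes (1 = default).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
defaulted = [1, 1, 0, 1, 0, 0, 0, 0]

print(ks_statistic(scores, defaulted))
```

A value near 1 means the score separates defaulters from non-defaulters almost perfectly, while a value near 0 means it barely separates them at all.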

 

Case Study 4: Web Scraping & Text Analysis

The rapid growth of the World Wide Web over the past two decades has tremendously changed the way we share, collect, and publish data. Firms, public institutions, and private users provide every imaginable type of information, and new channels of communication generate vast amounts of data on human behavior. Web scraping is the process of extracting data from websites; text analysis algorithms are then applied to analyze this data. Examples include Twitter analysis and Google data analysis.

Step – 1: Setup connection
A. Create an API key for a developer account.
B. Run API request to fetch data.
Step – 2: Data Extraction
C. Save API requested data into Excel/CSV.
D. Data analysis and sanity check (dealing with missing data)
Step – 3: Text mining
E. Apply different algorithms, such as sentiment analysis.

Arnab Majumdar

In-house Faculty/Consultant (SAS, Python)

Arnab Majumdar, Data Scientist Consultant, is a physicist, researcher and educator. He completed his Integrated M.Sc. in Physics from the Indian Institute of Technology, Kanpur before moving to Boston, USA, where he did his Ph.D. and post-doctoral work at Boston University. During his seventeen years in academic research, primarily in the domain of statistical physics, Econophysics and Biomedical engineering, he has published over forty research papers in international peer-reviewed journals including Nature, Physical Review Letters and Proceedings of the National Academy of Sciences USA.

Anindya Kundu

In-house Faculty/Consultant (SAS, Python, R, SPSS)

Anindya Kundu has worked in qualitative and quantitative analytics consultancy for more than half a decade. He has been involved in both analytics-as-a-service and analytics product development projects. Anindya is a data-obsessed person who loves generating insights from large quantities of data: cleaning, processing and harnessing data to uncover hidden truths. He uses SAS, R, SPSS and Python to analyze data and to automate tasks. He is involved in analytics innovation, specializing in product development for population health management, health economics, insurance and mortgage, healthcare analytics, and transportation and supply chain management. He was also extensively involved in the functional development of a CRM (Customer retention module) application tool for one of Fortune's Best 100 Companies. He received his post graduate certification from IIM Ranchi.