About Data Analysis with Python
Python is a general-purpose programming language that’s powerful, easy to learn, and fast to code.
It is used for many different applications. Over time, the huge community around this open-source language has created a number of tools specifically for data science. As a result, analyzing data with Python has never been easier, and it is rapidly becoming the language of choice for data scientists. Python code can be written like a traditional program, to execute an entire series of instructions at once; it can also be executed line by line or block by block, making it perfect for working with data interactively.
In this practical course you’ll learn everything you need to get started using Python for data analysis. You will start with basic arithmetic, variables, and data structures, move on to loading data from different sources and rearranging and aggregating it, and finally you will create your own stunning visualizations based on real data.
The course focuses on the following topics:
- Python Language Fundamentals
- Python Data Science Tools (NumPy)
- Pandas Series & DataFrame
- Plotting & Visualization (matplotlib)
What we offer
Case-study/project-based training.
Flexible class timings and continuous support.
Unlimited access to exclusive study materials on the cloud.
Placement assistance after training.
State-of-the-art labs with the latest infrastructure:
- Python via Anaconda
- Jupyter Notebook
Chapter 1: Python Language Fundamentals
- Variables, Data Types and Basic Syntax
- Operators and Expressions
- Control Flow & Loops
- Importing and Using Packages
- Data Structures (lists, tuples, sets, dictionaries)
- Functions, Objects and Classes
- Strings and Dates
- File Handling
- Iterators, Generators and Decorators
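The fundamentals above can be illustrated with a short sketch (the data here is made up for illustration) that touches variables, control flow, data structures, functions, and a generator:

```python
def describe(scores):
    """Return a dict summarizing a list of numeric scores."""
    total = 0
    for s in scores:          # control flow: a for loop
        total += s
    return {
        "count": len(scores),                      # dict: key/value pairs
        "mean": total / len(scores),
        "top3": sorted(scores, reverse=True)[:3],  # list sorting and slicing
    }

def evens(limit):
    """A generator that lazily yields even numbers below `limit`."""
    for n in range(limit):
        if n % 2 == 0:
            yield n

summary = describe([70, 85, 92, 61, 78])
print(summary["mean"])        # 77.2
print(list(evens(7)))         # [0, 2, 4, 6]
```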
Chapter 2: The Python Data Science Toolbox
- Usage of NumPy for scientific computing with Python.
- Usage of NumPy as an efficient multi-dimensional container of generic data.
- Data manipulation and analysis with Pandas, a data-analysis library built on top of NumPy
- Pandas data structures and operations for manipulating numerical tables and time series.
- Pandas idioms
- Indexing and Missing Data
- Merging, Grouping and Pivot Tables
- Plotting & Visualization (matplotlib): Matplotlib is a Python 2D plotting library which produces publication-quality figures in a variety of hardcopy formats and interactive environments across platforms.
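A minimal sketch of the toolbox, assuming NumPy and Pandas are installed (the sales figures are hypothetical):

```python
import numpy as np
import pandas as pd

# NumPy: an efficient multi-dimensional container of generic data
a = np.arange(6).reshape(2, 3)      # 2x3 array: [[0, 1, 2], [3, 4, 5]]
print(a.sum(axis=0))                # column sums: [3 5 7]

# Pandas: labeled tables and split-apply-combine aggregation
df = pd.DataFrame({
    "city": ["Pune", "Pune", "Delhi"],
    "sales": [100, 150, 200],
})
print(df.groupby("city")["sales"].sum())
```

A call such as `df.plot()` would render the same table with matplotlib, which is covered in the visualization topic.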
Chapter 3: Advanced Packages
- SciPy: SciPy is a Python-based ecosystem of open-source software for mathematics, science, engineering and technical computing. SciPy builds on the NumPy array object.
- Scikit-Learn: Scikit-learn is a machine-learning library for the Python programming language. It features various classification, regression and clustering algorithms.
- Statsmodels: a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.
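As a small taste of these packages, here is a SciPy sketch (assuming SciPy is installed) fitting a line to noise-free data:

```python
from scipy import stats

# Fit a straight line to data that follows y = 2x + 1 exactly
x = [0, 1, 2, 3, 4]
y = [1, 3, 5, 7, 9]
result = stats.linregress(x, y)
print(result.slope, result.intercept)   # ~2.0, ~1.0
```

Statsmodels and scikit-learn offer richer model diagnostics and estimators on top of the same array foundations.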
Chapter 4: Descriptive Statistics
- Central Tendency: Identify situations in which the center of a distribution is informative, different ways the center can be calculated for symmetric and asymmetric distributions, and handling outliers
- Measures of Variability: How the spread of random variables can be meaningfully summarized in the context of the distribution shape, handling bias in estimators
- Distributions of Random Variables: Computing and summarizing distribution of discrete and continuous random variables, parametric fitting.
- Correlations: Auto and cross-correlations between random variables, correlation matrix, correlation function, relationship with statistical dependence and causation
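These descriptive measures can be computed with Python's standard library alone; a sketch using made-up data (the correlation is derived from first principles to show the formula):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)       # central tendency: 5
median = statistics.median(data)   # robust to outliers: 4.5
spread = statistics.pstdev(data)   # population standard deviation: 2.0

def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

print(mean, median, spread)
print(pearson([1, 2, 3], [2, 4, 6]))   # 1.0: a perfect linear relationship
```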
Chapter 5: Supervised Learning
- Classification: Introduction to frequentist and Bayesian algorithms such as linear discriminant, k nearest neighbors, decision tree, naïve Bayes classifier.
- Regression: Introduction to linear and logistic regression extending to generalized linear models, clarifying the notions of principal components and diagnostics
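To make the classification idea concrete, here is a toy k-nearest-neighbours classifier written from scratch on hypothetical 2-D points (a real project would use scikit-learn's implementation):

```python
def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    train: list of (features, label) pairs; query: a feature tuple.
    """
    def dist(a, b):
        # Euclidean distance between two feature tuples
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    neighbours = sorted(train, key=lambda row: dist(row[0], query))[:k]
    labels = [label for _, label in neighbours]
    return max(set(labels), key=labels.count)   # majority vote

train = [((1, 1), "A"), ((1, 2), "A"),
         ((8, 8), "B"), ((9, 8), "B"), ((8, 9), "B")]
print(knn_predict(train, (2, 1)))   # "A" — closest to the first cluster
print(knn_predict(train, (8, 9)))   # "B"
```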
Chapter 6: Unsupervised Learning
- Clustering: Hierarchical and centroid based clustering (k-means), distance and similarity metrics, the problem of high dimensions, information theoretic methods
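A minimal k-means sketch in NumPy shows the centroid-based idea (naive initialization with the first k points; assumes no cluster goes empty, which holds for this well-separated toy data):

```python
import numpy as np

def kmeans(points, k, iters=10):
    """Assign each point to its nearest centroid, then recompute each
    centroid as the mean of its cluster; repeat for `iters` rounds."""
    centroids = points[:k].astype(float).copy()   # naive init: first k points
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # distance from every point to every centroid, shape (n, k)
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(k)])
    return labels

pts = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
print(kmeans(pts, 2))   # [0 0 0 1 1 1] — the two obvious clusters
```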
We have various case studies based on different industries. You can choose the case study as applicable for you.
Case Study 1: Regression Analysis
How do you assess whether you are paying the correct price when buying a property?
Price is a critical factor for any business; the right price can make the difference between profit and loss. In this case study we take the example of property pricing to gain a deeper understanding of regression analysis.
Step – 1: Data Preparation
A. Checking for outliers
B. Checking for missing values and deciding how to treat them.
C. Basic univariate and bivariate analysis, i.e. checking correlations and how the variables are distributed.
Step – 2: Principal Component Analysis
Step – 3: Traditional Regression Analysis with variable selection
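The data-preparation step can be sketched with Pandas on a hypothetical property dataset (column names and values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "area_sqft": [1000, 1200, 1100, 950, 20000],   # 20000 is a likely outlier
    "price": [50, 60, None, 48, 55],               # one missing price
})

# A. Flag outliers with the 1.5 * IQR rule
q1, q3 = df["area_sqft"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["area_sqft"] < q1 - 1.5 * iqr) |
              (df["area_sqft"] > q3 + 1.5 * iqr)]

# B. Treat missing values, e.g. by median imputation
df["price"] = df["price"].fillna(df["price"].median())

# C. Univariate and bivariate checks
print(df["area_sqft"].describe())
print(df[["area_sqft", "price"]].corr())
```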
Case Study 2: Marketing Analytics
As a key decision and strategy maker at an online retail store specializing in apparel and clothing, how can you identify opportunities to improve P&L by establishing an analytics practice? Background on behavioural analytics: human brains follow involuntary patterns (people behave like other similar people around them), and detecting these patterns is precisely the idea behind marketing analytics.
Step – 1: EDA – Exploratory Data Analysis
A. Exploring different patterns, i.e. the distribution of customers across the number of product categories purchased by each customer.
B. Why do customers buy different product categories?
C. Categorization of customers based on the number of product categories they purchased.
D. Which category contributes the highest sales?
Step – 2: Association Analysis
E. Support/Confidence/Lift – Apriori concept
F. Market Basket Analysis
Step – 3: Customer Segmentation
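The support/confidence/lift measures behind Apriori and market-basket analysis can be computed from first principles; a sketch over hypothetical shopping baskets:

```python
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

def support(itemset):
    """Fraction of baskets containing every item in `itemset`."""
    return sum(itemset <= b for b in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Of baskets with `antecedent`, what fraction also have `consequent`?"""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support (>1: positive association)."""
    return confidence(antecedent, consequent) / support(consequent)

print(support({"bread", "butter"}))        # 0.5
print(confidence({"bread"}, {"butter"}))   # 2/3
print(lift({"bread"}, {"butter"}))         # (2/3) / 0.5 = 4/3
```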
Case Study 3: Scorecard Modelling
Given the ongoing turmoil in credit markets, a critical reassessment of credit-risk modelling approaches is more needed than ever. This modelling approach generates a probability-of-default score for each customer on the basis of a collection of independent variables (which may differ as per business requirements). The score can then be used for predictive modelling, MIS reporting, etc.
Step – 1: EDA – Exploratory Data Analysis
A. Data import and basic data sanity check.
B. Exploring different patterns i.e. distribution of data
C. Variables (categorical & numerical) selection approaches.
D. Training and validation data creation.
Step – 2: Model Preparation
E. Creating indicator variables
F. Apply stepwise regression
Step – 3: Model Validation
G. Check for multicollinearity (using the correlation matrix, VIF)
H. Generate scores using logistic regression.
I. KS calculation
J. Coefficient validation, coefficient stability and score stability.
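The KS calculation in step I can be sketched in pure Python: it is the maximum gap between the cumulative score distributions of "good" and "bad" customers (the scores below are hypothetical):

```python
good = [0.9, 0.8, 0.75, 0.7, 0.6]   # scores of customers who repaid
bad = [0.4, 0.35, 0.3, 0.2, 0.1]    # scores of customers who defaulted

def ks_statistic(good, bad):
    """Max absolute gap between the two empirical CDFs over all cut-offs."""
    cuts = sorted(set(good) | set(bad))

    def cdf(sample, t):
        return sum(s <= t for s in sample) / len(sample)

    return max(abs(cdf(good, t) - cdf(bad, t)) for t in cuts)

print(ks_statistic(good, bad))   # 1.0 — perfectly separated scores
```

A KS near 1 means the score separates goods from bads well; real scorecards typically land well below that.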
Case Study 4: Web Scraping & Text Analysis
The rapid growth of the World Wide Web over the past two decades has tremendously changed the way we share, collect, and publish data. Firms, public institutions, and private users provide every imaginable type of information, and new channels of communication generate vast amounts of data on human behavior. Web scraping is the process of extracting data from websites; text-analysis algorithms can then be applied to the extracted data, e.g. Twitter analysis or Google data analysis.
Step – 1: Setup connection
A. Create an API key under a developer account.
B. Run API request to fetch data.
Step – 2: Data Extraction
C. Save the API response data to Excel/CSV.
D. Data analysis and sanity checks (dealing with missing data)
Step – 3: Text mining
E. Apply different algorithms, e.g. sentiment analysis.
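The sentiment-analysis step can be illustrated with a toy lexicon-based scorer (illustrative only; a real project would use a library such as NLTK's VADER or TextBlob, and the word lists here are invented):

```python
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def sentiment(text):
    """Score text by counting positive vs negative lexicon words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great product"))     # positive
print(sentiment("terrible service really bad"))   # negative
```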
In-house Faculty/Consultant (SAS, Python, R, SPSS)