Week 1: Python Programming for Data Science
Environment & Foundations: Setting up Anaconda, Jupyter Notebooks, and VS Code; managing virtual environments and dependencies.
Data Structures Deep-Dive: Efficient use of Lists, Dictionaries, and Sets; understanding Big O notation for data manipulation.
Vectorized Computing with NumPy: N-dimensional arrays, broadcasting rules, and mathematical operations for high-performance computing.
Data Manipulation with Pandas: Series and DataFrames; indexing, slicing, and filtering; handling missing data and time-series analysis.
Functional Programming: Lambda functions, map/filter/reduce, and list comprehensions for cleaner, more “Pythonic” data pipelines.
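As a small illustration of the functional programming topic above, the sketch below computes the same result twice: once with map/filter/reduce and once with a generator expression. The `readings` list and its cleaning rule (drop missing and negative values) are made up for this example and are not part of the course material.

```python
from functools import reduce

# Hypothetical sensor readings; None marks a missing value.
readings = [3.5, None, 4.0, -1.0, 5.5]

# Functional style: keep valid readings, square them, then sum.
valid = filter(lambda x: x is not None and x >= 0, readings)
squared = map(lambda x: x * x, valid)
total_functional = reduce(lambda acc, x: acc + x, squared, 0.0)

# Comprehension style: the same pipeline as one generator expression.
total_comprehension = sum(x * x for x in readings if x is not None and x >= 0)

print(total_functional, total_comprehension)
```

Both versions evaluate lazily and give the same total; the comprehension form is usually considered the more readable of the two.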
Week 2: Data Visualization and Exploratory Data Analysis (EDA)
Statistical Plotting with Matplotlib: The Artist layer vs. the scripting (pyplot) layer; customizing axes, labels, and subplots for publication-quality figures.
High-level Viz with Seaborn: Heatmaps, pair plots, and violin plots to identify underlying distributions and feature correlations.
Interactive Dashboards: Introduction to Plotly and Dash for creating dynamic, web-based data visualizations.
The EDA Framework: Identifying outliers, skewness, and multicollinearity; feature engineering and transformation (scaling, encoding).
Storytelling with Data: Best practices for visual communication and extracting “actionable insights” from raw datasets.
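Outlier identification, listed under the EDA Framework above, is often introduced with the interquartile range (IQR) rule. The following is a minimal sketch using only the standard library; the function name `iqr_outliers`, the conventional multiplier of 1.5, and the `daily_orders` sample are assumptions made for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample with one obviously extreme value.
daily_orders = [10, 12, 11, 13, 12, 11, 14, 95]
outliers = iqr_outliers(daily_orders)
print(outliers)
```

In a pandas workflow the same fences would typically be computed with `Series.quantile`; the rule itself does not change.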
Week 3: Machine Learning Modeling I (Supervised Learning)
Regression Analysis: Linear and Polynomial regression; loss functions (MSE, MAE); and regularization techniques (Lasso/Ridge).
Classification Fundamentals: Logistic Regression, Support Vector Machines (SVM), and K-Nearest Neighbors (KNN).
Tree-based Models: Decision Trees, Random Forests, and Gradient Boosting (XGBoost/LightGBM) for complex non-linear relationships.
Model Evaluation: Using scikit-learn for cross-validation, confusion matrices, ROC curves and AUC, and F1-score optimization.
Pipeline Automation: Creating end-to-end ML pipelines for automated preprocessing and hyperparameter tuning (GridSearchCV).
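The pipeline and tuning topics above can be sketched with scikit-learn as follows. This is one possible arrangement, not the course's own solution: the synthetic dataset, the step names `scale` and `clf`, and the grid of `C` values are all assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (stand-in for a real dataset).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and model in one estimator, so scaling is refit per CV fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Step-name prefix "clf__" routes the parameter to the classifier step.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

test_f1 = f1_score(y_test, search.predict(X_test))
print(search.best_params_, round(test_f1, 3))
```

Keeping the scaler inside the pipeline means it is fitted only on the training portion of each fold, which avoids leaking information from the validation data.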
Week 4: Machine Learning Modeling II (Unsupervised & Advanced)
Clustering Algorithms: K-Means clustering, Elbow method for $K$ selection, and Hierarchical/DBSCAN for spatial data.
Dimensionality Reduction: Principal Component Analysis (PCA) and t-SNE for visualizing high-dimensional data and noise reduction.
Association Rule Learning: Market Basket Analysis using Apriori and Eclat algorithms to find hidden patterns in transactions.
Deployment & Model Ops: Saving models with Pickle/Joblib; basics of Flask/FastAPI for model serving; monitoring for “model drift.”
Capstone Project: End-to-end data science workflow, from scraping or loading a real-world dataset to deploying a predictive model with documented insights.
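For the Market Basket Analysis topic listed above, the two quantities that association rules are built on, support and confidence, can be computed directly. The sketch below is not an implementation of Apriori or Eclat, only of the metrics those algorithms search over; the `transactions` data and the function names are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Among transactions with the antecedent, the share that also has the consequent."""
    antecedent = set(antecedent)
    both = antecedent | set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    return n_both / n_antecedent if n_antecedent else 0.0

# Hypothetical shopping baskets.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter", "eggs"},
]

pair_support = support({"bread", "butter"}, transactions)
rule_confidence = confidence({"bread"}, {"butter"}, transactions)
print(pair_support, rule_confidence)
```

Here bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets containing bread also contain butter.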








