About

Headshot of Ali Abouelazm

Junior Data Engineer shaping end-to-end data ecosystems: sourcing, cleansing, modeling, and delivering insights that accelerate experimentation and product decisions.

Blend software engineering, statistics, and ML/AI to transform messy datasets into reliable products: feature stores, experimentation platforms, forecasting services, and data apps used daily by partners.

Collaborate with product managers, data scientists, analytics engineers, and operations teams to frame ambiguous questions, design measurable roadmaps, and ship scalable solutions.

Obsessed with hardening data quality, documenting domain knowledge, and communicating the "so what" behind every model to drive action for both technical and business leaders.

Currently automating ingestion pipelines and evaluation dashboards for ML/AI teams.

  • Based in Sugar Land, TX
  • Open to internships in Data Science & ML/AI

Projects

Causal Marketing Impact

Python | DoubleML | Streamlit | Docker | GitHub Actions

An end-to-end application that uses Double Machine Learning (DML) to estimate the causal impact of marketing spend, controlling for 20+ confounding variables to reduce selection bias. Engineered a production MLOps pipeline with Docker and GitHub Actions to deploy an interactive Streamlit dashboard, automating model validation and reducing manual reporting time by 90%.
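
The partialling-out idea behind DML can be sketched in a few lines. This is an illustration only, not the project's code and not the DoubleML API: it assumes a single confounder, swaps the ML nuisance learners for a one-variable linear fit, and runs on synthetic data. All function and variable names are hypothetical.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def dml_effect(conf, treat, outcome, n_folds=2):
    """Cross-fitted residual-on-residual estimate of the effect of treat on outcome."""
    n = len(conf)
    res_t, res_y = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        # Nuisance models are fitted on the training folds only (cross-fitting).
        a_t, b_t = fit_line([conf[i] for i in train], [treat[i] for i in train])
        a_y, b_y = fit_line([conf[i] for i in train], [outcome[i] for i in train])
        for i in test:
            res_t[i] = treat[i] - (a_t + b_t * conf[i])
            res_y[i] = outcome[i] - (a_y + b_y * conf[i])
    # Regress outcome residuals on treatment residuals (no intercept).
    return sum(t * y for t, y in zip(res_t, res_y)) / sum(t * t for t in res_t)

def simulate(n=4000, true_effect=2.0, seed=0):
    """Synthetic data: the confounder drives both spend and outcome."""
    rng = random.Random(seed)
    conf = [rng.gauss(0, 1) for _ in range(n)]
    treat = [1.5 * c + rng.gauss(0, 1) for c in conf]
    outcome = [true_effect * t + 3.0 * c + rng.gauss(0, 1)
               for t, c in zip(treat, conf)]
    return conf, treat, outcome

conf, treat, outcome = simulate()
naive = fit_line(treat, outcome)[1]           # biased by the confounder
adjusted = dml_effect(conf, treat, outcome)   # close to the simulated effect
print(f"naive slope: {naive:.2f}, cross-fitted estimate: {adjusted:.2f}")
```

In this simulation the naive slope overstates the effect because the confounder raises both spend and outcome. Residualizing both on the confounder removes that shared variation.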

clinix.ai

Python | pandas | NumPy | scikit-learn | SQLAlchemy | FastAPI | Streamlit | OpenAI/Anthropic | SQLite | matplotlib

A RAG-based symptom extraction pipeline using the OpenAI API and PostgreSQL that achieved a 0.92 F1-score on patient risk classification. Architected SQL-based clinical feature pipelines to process patient data, utilizing Streamlit to visualize real-time triage patterns for medical staff.
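
For readers unfamiliar with the metric, a minimal sketch of how a binary F1-score is computed follows. The labels are made up for illustration and are not the project's data, and the function name is hypothetical. scikit-learn's `f1_score` computes the same quantity for the binary case.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy labels (1 = high risk, 0 = low risk), purely illustrative.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 1, 0, 0]
f1 = f1_score_binary(y_true, y_pred)
print(f"F1 = {f1:.2f}")
```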

PL Predictor

Python | pandas | NumPy | scikit-learn | XGBoost | BeautifulSoup | Selenium | Streamlit

A predictive modeling framework using XGBoost to forecast league standings, with 30+ custom features including Elo-based strength metrics. Automated match data ingestion using BeautifulSoup and Selenium, ensuring 99% data integrity through robust error handling and validation scripts.
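
An Elo-style strength metric of the kind mentioned above could look like the sketch below. It uses the standard Elo formulas. The K-factor of 20, the starting rating of 1500, and the function names are assumptions for illustration, not values taken from the project.

```python
def expected_score(rating_a, rating_b):
    """Expected result for team A against team B on a 0..1 scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a, rating_b, score_a, k=20.0):
    """Return updated ratings; score_a is 1.0 (win), 0.5 (draw), or 0.0 (loss)."""
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b

# Two evenly rated teams; the home side wins.
home, away = update_elo(1500.0, 1500.0, score_a=1.0)
print(home, away)
```

The pre-match rating difference between the two sides is one natural feature to pass to a model such as XGBoost.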

Localytics

Python | R | SQL | scikit-learn | GeoPandas | Tableau

A market segmentation and geospatial analytics project combining demographic and behavioral data. Applied clustering algorithms (K-means, hierarchical) and regression models with scikit-learn to identify distinct customer segments and predict market trends. Used GeoPandas for spatial analysis of geospatial datasets, uncovering location-based patterns and correlations. Engineered features from demographic data, including income levels, age distributions, and population density. Built interactive Tableau dashboards showing segment distributions and geographic heatmaps to communicate findings to stakeholders. Implemented preprocessing pipelines to clean and standardize multi-source datasets, keeping demographic and geographic dimensions consistent.
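
The K-means step can be illustrated with a small pure-Python sketch. It is not the project's scikit-learn code. The two features (income in thousands, median age), the data points, and the fixed initial centroids are all hypothetical, and real features would normally be scaled before clustering.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return dists.index(min(dists))

def kmeans(points, centroids, n_iter=20):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = list(centroids)
    for _ in range(n_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical (income in $k, median age) pairs for six areas.
points = [(30, 25), (32, 27), (31, 24), (90, 50), (95, 52), (88, 49)]
centroids, labels = kmeans(points, centroids=[points[0], points[3]])
print(labels)
```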

Stockly

Python | pandas | NumPy | scikit-learn | TensorFlow/Keras | SQLite | matplotlib | Streamlit

A production-quality stock market prediction and backtesting system with SQLite-based data storage, comprehensive feature engineering, and multiple ML models. Implements a complete pipeline from data acquisition (Alpha Vantage API and CSV ingestion) through SQL schema design for prices, features, targets, and predictions. Engineered technical indicators including RSI, MACD, moving averages, and volatility measures. Trained both baseline classical models (Logistic Regression, Random Forest) and deep learning sequence models (LSTM/GRU) for next-day direction classification and return prediction. Built time-series-aware backtesting with rolling window evaluation, performance metrics, and strategy comparison against buy-and-hold. Features a deliberately retro, blocky pixel-style visualizer in Streamlit with square markers, step plots, and a bold color palette for displaying price charts, prediction signals, and cumulative returns.
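
A rolling-window evaluation of the kind described above can be sketched as a generator of train/test index ranges in which every test window lies strictly after its training window. The window sizes and the function name are illustrative assumptions, not the project's settings.

```python
def rolling_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs that move forward in time."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # slide forward by one test window

splits = list(rolling_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print(list(train), "->", list(test))
```

Because no test index ever precedes a training index, the model is never evaluated on data older than what it was fitted on, which avoids look-ahead leakage in the backtest.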

SwimMatch

HTML | CSS | Web Development | Business Operations | Marketing

A private swim lesson platform connecting students with experienced instructors for personalized 1-on-1 coaching in backyard pools. Developed and maintained the SwimMatch website in HTML and CSS, improving user experience and reducing coach-student match time. Coached 30+ students, building their swimming skills and confidence. Generated $11,000 in revenue within the first month of operation. Implemented online booking and scheduling to manage coach-student sessions more efficiently. Coordinated a team of swim coaches and ran social media marketing campaigns that expanded the user base by 15% in the first two months.

Experience & Leadership

    Skills/Interests

    Languages

    Python | R | SQL | Java | JavaScript | TypeScript | C/C++ | HTML/CSS

    AI/ML

    scikit-learn | XGBoost | CatBoost | LightGBM | TensorFlow | PyTorch | Keras | Transformers

    Data/Viz

    pandas | NumPy | SciPy | Dask | GeoPandas | Statsmodels | Matplotlib | Seaborn | Plotly | Tableau

    Interests

    Traveling | Soccer | Swimming | Philosophy | Family | Food | Gym

    Resume

    My Life in Data

    Projects: 8 completed projects

    Technologies: 25+ tools & languages

    Experience: 2+ years in data science

    Daily Routine

    Learning Progress

    Tech Stack Usage

    My Typing Rhythm