Justin Cheong
Software & Data Engineer · Business Analytics @ NUS
Full-stack engineer and analyst focused on building the technical infrastructure required for systematic research. My experience ranges from architecting RAG-based search engines and data pipelines to developing regime-adaptive backtesting frameworks for equity strategies.
Experience
Where I've worked
- Architecting high-throughput time-series analytics pipelines using PostgreSQL and TimescaleDB to aggregate and process intraday DTCC Swap Data Repository (SDR) ticks
- Implementing dynamic time-bucketing and continuous aggregates for swap volume profiling, eliminating expensive data backfills and accelerating downstream DV01 risk reporting
- Migrating the desk's swap analytics layer from legacy C# jobs and Oracle to PostgreSQL, building C# data-access components and normalized schemas that join vendor SDR feeds with internal yield-curve DV01 analytics
- Building Python report pipelines that synthesize daily sell-side research into AI-generated market briefings delivered to the Fixed Income Relative Value trading desk
- Engineered a production-grade multi-agent LLM pipeline on Databricks to process daily sell-side research into consolidated macro briefings, used by 50+ traders daily and saving each reader 2-3 hours of manual synthesis
- Architected a tiered Bronze-Silver-Gold Delta Lake infrastructure applying SCD Type-2 patterns, ensuring ACID-compliant versioning and full historical traceability of data outputs
- Implemented end-to-end LLMOps using MLflow for complete observability and integrated the Instructor library with Pydantic to enforce typed schema outputs, eliminating runtime parsing errors
- Reduced AI search index rebuild time by 96% (26 min to under 1 min) by parallelizing the full ingestion pipeline (Mistral OCR, Azure Document Intelligence, and OpenAI embeddings) using Python's ThreadPoolExecutor with per-stage worker pools and a thread-safe sliding-window rate limiter
- Architected the downstream RAG retrieval engine using Azure AI Search, implementing a hybrid lexical-semantic fusion pipeline with a Cross-Encoder reranker that improved search relevance by 25% (NDCG@10) and reduced P95 latency by 30%
- Design and backtest systematic and factor-based equity strategies for the student-managed portfolio
- Help maintain research and data infrastructure for screening, backtesting and performance attribution
- Optimized distributed Spark Structured Streaming pipelines to normalize multi-source datasets, reducing end-to-end data latency by 60% for critical risk monitoring
- Managed and deployed workflows using Airflow, automating complex maker-checker reconciliation and improving data availability by 30%
- Achieved an overall effectiveness rating of 4.7/5.0, surpassing the School of Computing faculty average (4.3) by ~9%
- Scored 4.6/5.0 for enhancing students' industry readiness and team effectiveness
- Architected an AI-powered workflow (Python, Google STT, OpenAI) to extract signals from unstructured data, reducing manual processing time by 85% and enabling same-day data synthesis
- Performed statistical analyses in R to support technology and transformation initiatives for financial services clients
- Translated quantitative findings into recommendations for process optimisation and risk management
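
The parallel ingestion pattern mentioned above (a ThreadPoolExecutor stage guarded by a thread-safe sliding-window rate limiter) can be sketched roughly as follows. This is a minimal illustration, not the production code: `SlidingWindowRateLimiter`, `run_stage`, `fake_embed`, and the limits used are all hypothetical names and values chosen for the example.

```python
import threading
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class SlidingWindowRateLimiter:
    """Allow at most `max_calls` acquisitions in any `period_s`-second window."""

    def __init__(self, max_calls, period_s):
        self.max_calls = max_calls
        self.period_s = period_s
        self._calls = deque()          # timestamps of recent acquisitions
        self._lock = threading.Lock()  # guards the deque across worker threads

    def acquire(self):
        while True:
            with self._lock:
                now = time.monotonic()
                # Drop timestamps that have slid out of the window.
                while self._calls and now - self._calls[0] >= self.period_s:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                # Window is full: wait until the oldest call expires.
                wait = self.period_s - (now - self._calls[0])
            # Sleep outside the lock so other threads are not blocked.
            time.sleep(max(wait, 0.0))


def run_stage(items, fn, workers, limiter):
    """Run one pipeline stage over `items` with its own worker pool."""
    def limited(item):
        limiter.acquire()
        return fn(item)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(limited, items))  # map preserves input order


def fake_embed(text):
    """Hypothetical stand-in for a rate-limited API call (e.g. an embedding request)."""
    return len(text)


limiter = SlidingWindowRateLimiter(max_calls=5, period_s=0.2)
results = run_stage(["a", "bb", "ccc"], fake_embed, workers=3, limiter=limiter)
```

Giving each stage its own pool and limiter lets a slow or tightly rate-limited service (OCR, say) be throttled independently of the others.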
Projects
Things I've built
Architected a dual-path Retrieval-Augmented Generation (RAG) system utilizing FastAPI and PostgreSQL to index and query over 100k semi-structured financial research artifacts. Engineered a low-latency hybrid search engine combining HNSW dense vector indexing and BM25 full-text search (tsvector), orchestrated via Reciprocal Rank Fusion to maximize semantic and keyword recall. Designed an autonomous LLM tool-routing layer using Claude's tool-use API to dynamically execute complex SQL aggregations or semantic retrieval paths, accelerating intraday macro-economic analysis.
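
As a rough illustration of the Reciprocal Rank Fusion step, the sketch below merges a dense-vector ranking and a BM25 ranking into one list. The function name, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions for the example, not details taken from the project.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from the two retrieval paths.
dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. HNSW vector search
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # e.g. tsvector full-text search

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and BM25 scores live on different scales.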
AI-powered routing engine that converts text, prompts and images into road-matched GPS routes using RAG (GPT-4o/Sentence-Transformers), iterative scaling on SVG paths and self-hosted OSRM with custom profiles to handle 100+ concurrent low-latency requests.
A two-sided community marketplace connecting vendors that have surplus food with consumers. Built with Vue.js and Firebase, it features a real-time order lifecycle (Pending → Pickup → Completed) and automated inventory synchronization via Cloud Functions, delivered at 98% sprint velocity over two high-intensity Agile cycles.
A volatility-aware systematic trading system that dynamically switches between Trend-Following (Calm) and Mean-Reversion (Panic) modules. Built around a smoothed-VIX regime detector, it achieves a 1.26 in-sample Sharpe ratio and a 3.5% max drawdown through ATR-adjusted risk management and 200SMA trend filters.
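
The regime switch can be pictured with a small sketch like the one below. It assumes a trailing moving average as the smoother and a fixed threshold as the panic trigger; the window length, the threshold of 25, the sample VIX values, and all function names are hypothetical and are not the parameters used in the actual system.

```python
def smooth(values, window):
    """Trailing simple moving average; early points average what is available."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def classify_regimes(vix, window=5, panic_threshold=25.0):
    """Label each day 'panic' when smoothed VIX exceeds the threshold, else 'calm'."""
    return ["panic" if v > panic_threshold else "calm" for v in smooth(vix, window)]


def pick_module(regime):
    """Route to the strategy module that matches the detected regime."""
    return "mean_reversion" if regime == "panic" else "trend_following"


# Hypothetical daily VIX closes.
vix = [14.0, 15.0, 16.0, 30.0, 38.0, 40.0, 22.0]
regimes = classify_regimes(vix, window=3, panic_threshold=25.0)
modules = [pick_module(r) for r in regimes]
print(regimes)  # ['calm', 'calm', 'calm', 'calm', 'panic', 'panic', 'panic']
```

Smoothing delays the switch by a day or so in both directions (the spike to 30 is still labeled calm, and the drop to 22 is still labeled panic), which trades responsiveness for fewer whipsaw flips between modules.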
Education & Awards
Academic record
Skills
Technical skills
Accountability
What I'm working towards
Live stats from my personal tracker
Contact
Get in touch
Open to software engineering and data roles. The fastest way to reach me is email.