Projects

A set of selected research and engineering projects.
Descriptions are aligned with the corresponding entries in my CV.


1. Tensor-Core-Compatible Optimization for CP Decomposition

(Machine Learning Systems Lab, Advisor: Prof. Euhyun Moon)

  • Studied CP decomposition, a tensor factorization method typically computed with the CP-ALS (alternating least squares) algorithm.
  • Identified the MTTKRP (Matricized Tensor Times Khatri-Rao Product) operation, which relies heavily on the Khatri-Rao product, as a major computational bottleneck in CP-ALS.
  • Investigated a reformulation that expresses the Khatri-Rao product as a matrix multiplication so that it can run on Tensor Cores.
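
For reference, the MTTKRP operation can be written out for a 3-way tensor. The sketch below is a plain C++ CPU illustration, not the project code and not the reformulation studied in the project. It uses the textbook mode-1 form M = X_(1) (C ⊙ B) and checks it against the direct triple sum. The Matrix struct, the function names, and the row-major unfolding layout (column index k*J + j) are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major dense matrix (hypothetical helper type for this sketch).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& at(std::size_t r, std::size_t c) { return data[r * cols + c]; }
    double at(std::size_t r, std::size_t c) const { return data[r * cols + c]; }
};

// Khatri-Rao (column-wise Kronecker) product of C (K x R) and B (J x R).
// Row k*J + j of the (K*J) x R result holds C(k, r) * B(j, r).
Matrix khatri_rao(const Matrix& C, const Matrix& B) {
    assert(C.cols == B.cols);
    Matrix out(C.rows * B.rows, C.cols);
    for (std::size_t k = 0; k < C.rows; ++k)
        for (std::size_t j = 0; j < B.rows; ++j)
            for (std::size_t r = 0; r < C.cols; ++r)
                out.at(k * B.rows + j, r) = C.at(k, r) * B.at(j, r);
    return out;
}

// Plain triple-loop matrix multiplication: (m x p) * (p x n).
Matrix matmul(const Matrix& A, const Matrix& B) {
    assert(A.cols == B.rows);
    Matrix out(A.rows, B.cols);
    for (std::size_t i = 0; i < A.rows; ++i)
        for (std::size_t p = 0; p < A.cols; ++p)
            for (std::size_t j = 0; j < B.cols; ++j)
                out.at(i, j) += A.at(i, p) * B.at(p, j);
    return out;
}

// Mode-1 MTTKRP as "materialize the Khatri-Rao product, then multiply".
// X1 is the mode-1 unfolding (I x J*K), with X(i, j, k) at column k*J + j.
Matrix mttkrp_mode1(const Matrix& X1, const Matrix& B, const Matrix& C) {
    assert(X1.cols == B.rows * C.rows);
    return matmul(X1, khatri_rao(C, B));
}

// Direct definition: M(i, r) = sum over j, k of X(i, j, k) * B(j, r) * C(k, r).
Matrix mttkrp_mode1_direct(const Matrix& X1, const Matrix& B, const Matrix& C) {
    assert(X1.cols == B.rows * C.rows && B.cols == C.cols);
    Matrix out(X1.rows, B.cols);
    for (std::size_t i = 0; i < X1.rows; ++i)
        for (std::size_t k = 0; k < C.rows; ++k)
            for (std::size_t j = 0; j < B.rows; ++j)
                for (std::size_t r = 0; r < B.cols; ++r)
                    out.at(i, r) += X1.at(i, k * B.rows + j) * B.at(j, r) * C.at(k, r);
    return out;
}
```

Both functions return the same I x R matrix. The first spends memory on an explicit (J*K) x R Khatri-Rao matrix in exchange for a single dense multiplication, which is the shape of computation that matrix-multiply hardware is built for.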

Code:
Code


2. CUDA Matrix Multiplication Optimization

Tools: C/C++, CUDA

  • Learned and applied optimization techniques such as:
    • global memory coalescing
    • shared memory cache-blocking
    • 1D block tiling, so that each thread computes multiple results
    • 2D block tiling, to increase arithmetic intensity
  • Achieved near-cuBLAS performance.
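
The loop structure behind the tiling techniques listed above can be sketched on the CPU. The code below is not the project's CUDA kernel: it is a sequential C++ emulation in which local buffers stand in for shared memory, each (ty, tx) pair plays the role of one thread, and small arrays stand in for registers. The tile sizes and the function name tiled_matmul are hypothetical choices made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile sizes, chosen only for illustration.
constexpr std::size_t BM = 8, BN = 8, BK = 4;  // tile handled by one "thread block"
constexpr std::size_t TM = 2, TN = 2;          // micro-tile handled by one "thread"
static_assert(BM % TM == 0 && BN % TN == 0, "micro-tiles must divide the block tile");

// Computes C (M x N) = A (M x K) * B (K x N); all matrices are row-major.
std::vector<float> tiled_matmul(const std::vector<float>& A, const std::vector<float>& B,
                                std::size_t M, std::size_t N, std::size_t K) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    std::vector<float> As(BM * BK), Bs(BK * BN);  // stand-ins for shared memory

    for (std::size_t row0 = 0; row0 < M; row0 += BM) {
        for (std::size_t col0 = 0; col0 < N; col0 += BN) {
            // Accumulators of every "thread" in this block (registers on a GPU).
            std::vector<float> acc(BM * BN, 0.0f);

            for (std::size_t k0 = 0; k0 < K; k0 += BK) {
                // Stage one tile of A and one tile of B, zero-padding at the edges.
                for (std::size_t r = 0; r < BM; ++r)
                    for (std::size_t c = 0; c < BK; ++c)
                        As[r * BK + c] = (row0 + r < M && k0 + c < K)
                                             ? A[(row0 + r) * K + (k0 + c)] : 0.0f;
                for (std::size_t r = 0; r < BK; ++r)
                    for (std::size_t c = 0; c < BN; ++c)
                        Bs[r * BN + c] = (k0 + r < K && col0 + c < N)
                                             ? B[(k0 + r) * N + (col0 + c)] : 0.0f;

                // Each "thread" updates its TM x TN micro-tile from the staged tiles.
                for (std::size_t ty = 0; ty < BM / TM; ++ty) {
                    for (std::size_t tx = 0; tx < BN / TN; ++tx) {
                        for (std::size_t kk = 0; kk < BK; ++kk) {
                            float a_reg[TM], b_reg[TN];
                            for (std::size_t m = 0; m < TM; ++m)
                                a_reg[m] = As[(ty * TM + m) * BK + kk];
                            for (std::size_t n = 0; n < TN; ++n)
                                b_reg[n] = Bs[kk * BN + (tx * TN + n)];
                            // TM + TN loads feed TM * TN multiply-adds.
                            for (std::size_t m = 0; m < TM; ++m)
                                for (std::size_t n = 0; n < TN; ++n)
                                    acc[(ty * TM + m) * BN + (tx * TN + n)] += a_reg[m] * b_reg[n];
                        }
                    }
                }
            }

            // Write the finished block tile back, skipping padded positions.
            for (std::size_t r = 0; r < BM && row0 + r < M; ++r)
                for (std::size_t c = 0; c < BN && col0 + c < N; ++c)
                    C[(row0 + r) * N + (col0 + c)] = acc[r * BN + c];
        }
    }
    return C;
}
```

The inner loops show why 2D tiling raises arithmetic intensity: per step of kk, each emulated thread reads TM + TN staged values and performs TM * TN multiply-adds, so larger micro-tiles do more arithmetic per value loaded.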

Code:
Code