Accurate Compute-In-Memory Accelerator With Multi-Level Analog Weight Storage (Case No. 2026-169)

Summary:

UCLA researchers in the Department of Electrical and Computer Engineering have developed a high-density, multi-level CTT-based compute-in-memory architecture that enables precise, energy-efficient analog computation directly within memory for high-performance computing.

Background:

Conventional digital processors for artificial intelligence (AI) workloads are fundamentally limited by the memory wall, where data movement between memory and compute units dominates energy and latency. This challenge is compounded by the need for increasingly dense weight storage for large-scale models. Compute-in-memory (CIM) architectures address this by performing computation directly within the memory array. Despite advancements, existing CIM technologies based on resistive RAM, Flash, or ferroelectric devices often rely on non-standard fabrication processes and suffer from device mismatch, limiting computational accuracy and scalability. Charge trap transistor (CTT)-based CIM architectures offer improved CMOS compatibility, but current approaches remain constrained by limited storage density, restricted dynamic range, and low precision (≤5 bits), along with suboptimal energy and area efficiency. As a result, there remains an unmet need for a high-density, high-precision, and energy-efficient CIM architecture capable of overcoming memory wall limitations while supporting scalable AI workloads.

Innovation:

Professor Sudhakar Pamarti and his team have developed a multi-level CTT-based CIM architecture that enables high-precision analog multiply-accumulate (MAC) operations directly within memory using transistor-level weight storage. Each device simultaneously performs storage and multi-bit analog multiplication, supporting up to 6-bit discrete levels with effective precision approaching 8 bits while maintaining a compact footprint. The architecture achieves a high dynamic range (around 16 bits) and significantly improves efficiency (100 TOPS/W and 10 TOPS/mm²), while requiring only three analog-to-digital converter (ADC) evaluations per macro-operation, compared to 64 in conventional designs, greatly reducing overhead. Its highly linear design, combined with calibration techniques that compensate for device and circuit non-idealities, enables near digital-like accuracy. The approach is also fully compatible with standard CMOS processes, eliminating the need for exotic materials or additional fabrication steps and enabling scalable, high-density integration. These improvements in precision, efficiency, and density position the technology to significantly advance next-generation AI hardware by overcoming key limitations of existing CIM systems.
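
As a rough illustration of what a multi-level in-memory MAC computes, the Python sketch below snaps each weight to one of 64 evenly spaced levels (6 bits) and then sums the input-weight products, as an idealized array column would. It is a purely numerical model under stated assumptions: the function names, the symmetric weight range, and the uniform level spacing are hypothetical choices made for illustration, and it does not model the CTT devices, the ADC readout, or the calibration scheme described above.

```python
def level_step(bits=6, w_max=1.0):
    """Spacing between adjacent weight levels (assumed uniform over [-w_max, w_max])."""
    return 2.0 * w_max / (2 ** bits - 1)


def quantize_weight(w, bits=6, w_max=1.0):
    """Snap a weight to the nearest of 2**bits discrete levels.

    Hypothetical stand-in for programming a weight into one storage device.
    """
    step = level_step(bits, w_max)
    clamped = max(-w_max, min(w_max, w))       # weights outside the range saturate
    index = round((clamped + w_max) / step)    # integer level index, 0 .. 2**bits - 1
    return index * step - w_max


def in_memory_mac(inputs, weights, bits=6, w_max=1.0):
    """Idealized multiply-accumulate: each input is multiplied by its stored
    (quantized) weight and the products are summed."""
    return sum(x * quantize_weight(w, bits, w_max) for x, w in zip(inputs, weights))


if __name__ == "__main__":
    xs = [0.5, -0.25, 1.0, 0.0]
    ws = [0.3, -0.7, 0.11, 0.9]
    exact = sum(x * w for x, w in zip(xs, ws))
    print("full-precision MAC:", exact)
    print("6-bit-weight MAC:  ", in_memory_mac(xs, ws))
```

In this simplified model, each stored weight deviates from its target by at most half a level spacing, so the MAC error is bounded by the sum of the absolute input values times half of `level_step(bits)`. Raising `bits` shrinks that bound, which is the sense in which more levels per device improve precision here.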

Potential Applications:

●   CNN or DNN acceleration for AI workloads
●   General analog signal processing
●   Edge computing and inference accelerators
●   Scientific and numerical computation
●   Energy-efficient embedded computing
●   Data center optimization

Advantages:

●   High-density weight storage
●   Multi-bit analog computation in-memory
●   High dynamic range
●   Near digital-like accuracy
●   Energy-efficient
●   CMOS-compatible

Status of Development:

First description of complete invention 03/2024.

Related Publications:

1. S. Qiao, S. Moran, D. Srinivas, S. Pamarti and S. S. Iyer, "Demonstration of Analog Compute-In-Memory Using the Charge-Trap Transistor in 22 FDX Technology," 2022 International Electron Devices Meeting (IEDM), San Francisco, CA, USA, 2022.
2. S. L. Moran, S. S. Iyer, Z. Wan and S. Pamarti, "Neural network system with neurons including charge-trap transistors and neural integrators and methods therefor," US Patent Application US 2024/0028884 A1, Jan. 25, 2024.

Reference:

UCLA Case No. 2026-169

Lead Inventor:

Sudhakar Pamarti, Faculty, Department of Electrical and Computer Engineering