Summary:
UCLA researchers in the Department of Electrical and Computer Engineering have developed a high-speed runtime reconfigurable processor array (RTRA) that enables on-chip scheduling and rapid multi-program execution with unprecedented energy and area efficiency for dynamic computing workloads.
Background:
Dynamic digital signal processing and machine learning workloads demand high performance, energy efficiency, and low-latency hardware reconfiguration. However, existing reconfigurable architectures and scheduling solutions rely on slow off-chip software scheduling and mapping. These systems are constrained by inherently slow and non-scalable two-dimensional polygon placement algorithms, which limit real-time adaptability and parallel task execution. As a result, current designs fail to fully exploit the spatial and temporal parallelism offered by reconfigurable processor arrays. Thus, there is an unmet need for a partitioning system capable of achieving superior scheduling and mapping performance with high resource utilization and multi-granularity multi-program tenancy.
Innovation:
Professor Dejan Markovic and his research team have developed a runtime reconfigurable processor array (RTRA) that performs on-chip scheduling in just four clock cycles, program switching in sixteen to sixty-four cycles, and complete array mapping in only 384 cycles (less than 1 microsecond). The architecture supports up to twenty-four concurrent programs, combining low-latency reconfiguration with high energy and area efficiency, which makes the system particularly suitable for low-power, multi-program, and fine-grain acceleration. The unidirectional interconnect design achieves a 1.3x shorter critical path than comparable bidirectional processing-element (PE) architectures, while scheduler complexity is reduced via efficient fusion for large program mappings. Compared to prior systems, RTRA delivers 7.5x higher energy efficiency than state-of-the-art multi-core CPUs and 5x higher than FPGA-based solutions. The dense PE array achieves a computational density of 123 GOP/mm², surpassing CPUs by 8x and FPGAs by 6x. Relative to coarse-grained reconfigurable array (CGRA) architectures, RTRA improves energy efficiency by 2-3x and area efficiency by 1.78-6.72x, marking a significant advancement in scalable, high-speed reconfigurable computing.
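The cycle counts above can be turned into wall-clock latencies with simple arithmetic. The following is a minimal sketch, not part of the invention disclosure: the listing states only that 384 cycles complete in less than 1 microsecond, which implies a clock above 384 MHz. The 400 MHz figure and all function and variable names below are illustrative assumptions.

```python
# Illustrative arithmetic only; the clock frequency is NOT stated in the listing.
ASSUMED_CLOCK_HZ = 400e6  # hypothetical clock, chosen to satisfy the < 1 us claim

# Cycle counts as reported in the listing.
REPORTED_CYCLES = {
    "scheduling": 4,
    "program switch (min)": 16,
    "program switch (max)": 64,
    "full array mapping": 384,
}


def cycles_to_ns(cycles: int, clock_hz: float) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / clock_hz * 1e9


def min_clock_hz(cycles: int, budget_s: float) -> float:
    """Lowest clock frequency at which `cycles` fit within `budget_s` seconds."""
    return cycles / budget_s


def latency_table(clock_hz: float = ASSUMED_CLOCK_HZ) -> dict:
    """Latency in nanoseconds for each reported operation at `clock_hz`."""
    return {name: cycles_to_ns(c, clock_hz) for name, c in REPORTED_CYCLES.items()}


if __name__ == "__main__":
    # Clock needed for 384 cycles to finish within 1 microsecond.
    print(f"minimum clock: {min_clock_hz(384, 1e-6) / 1e6:.0f} MHz")
    for name, ns in latency_table().items():
        print(f"{name}: {ns:.0f} ns at the assumed clock")
```

At the assumed 400 MHz clock, full array mapping would take roughly 960 ns, consistent with the sub-microsecond figure; the actual operating frequency may differ.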
Additionally, the inventors have developed two complementary inventions: Circular Elevator Style Network-on-Chip (NoC) (Case No. 2026-012) and Pattern Compilation for Runtime Reconfigurable Arrays (Case No. 2026-013).
To balance processor programmability and performance, the inventors have also introduced Configurable Architecture Design Automation (CADA). This solution automates the design of reconfigurable architectures by optimizing software and hardware needs simultaneously. CADA takes software code as input and outputs an optimized hardware solution suited to the demands of the specific application. This design optimization is especially suited to applications such as autonomous vehicle navigation, Internet of Things (IoT), machine learning, and advanced robotics, where device efficiency and flexibility must both be optimized. These advances may overcome traditional limitations of state-of-the-art CPUs, GPUs, application-specific integrated circuits (ASICs), and field-programmable gate arrays (FPGAs). The inventors report that CADA achieves a 6- to 18-fold improvement in area efficiency and 5- to 8-fold higher energy efficiency across machine learning and digital signal processing domains, while also significantly reducing programming latency. The result is a flexible and scalable design methodology that addresses software and hardware imbalances in advanced circuit design for a wide array of emerging applications.
Potential Applications:
- Machine learning and AI accelerators
- Digital signal processing (DSP) and image processing
- Edge and embedded computing
- Real-time data analytics
- Reconfigurable hardware systems
Advantages:
- Ultra-low-latency reconfiguration
- Supports dynamic, concurrent workloads
- Flexible and efficient resource mapping
- Simplified scheduling architecture
- High energy and area efficiency
State of Development:
First successful demonstration of the invention: February 2025
Related Publications:
- A-SSCC 2025 manuscript (under review)
- Thesis: CADA: Configurable Architecture Design Automation Framework; https://escholarship.org/uc/item/9wr9c448
Reference:
UCLA Case No. 2026-011
Circular Elevator Style Network-on-Chip (NoC) (Case No. 2026-012)
Pattern Compilation for Runtime Reconfigurable Arrays (Case No. 2026-013)
Inventors:
Dejan Markovic, Chekai (Tim) Ling, Hongseok Lee