Low-Latency Decoder for the Bahl, Cocke, Jelinek, Raviv Algorithm (Case No. 2025-114)

Summary

UCLA researchers have developed a low-latency, high-throughput GPU-based implementation of the BCJR (Bahl-Cocke-Jelinek-Raviv) algorithm for decoding the CCSDS Serially-Concatenated Convolutionally-Coded Pulse-Position Modulation (SCPPM) code. The innovation reduces decoding latency from linear to logarithmic in the number of trellis stages, enabling much faster decoding for deep-space optical communications and similar high-photon-efficiency channels.

Background

  • The SCPPM code is used in space optical communication (e.g. NASA’s Deep Space Optical Communications, DSOC) and involves many trellis stages (e.g. thousands). Traditional BCJR decoding must traverse the trellis both forward and backward stage by stage, which incurs latency proportional to the number of stages.

  • For communications over photon-limited or high-delay channels, decoding latency and throughput are critical. FPGA implementations of BCJR are possible but may not scale easily or offer flexibility. CPU implementations are too slow when the trellis has many stages. There is a strong need for decoders that work in real time with very low delay while maintaining error performance.

Innovation

  • The researchers propose a parallel “trellis reduction” method in which pairs of trellis stages are combined into meta-stages. Repeating this step successively reduces the number of stages and lets the BCJR forward and backward metrics (α and β) be computed in parallel, in a tree-like fashion.

  • The algorithm organizes computations so that many α(s) and β(s) values (forward and backward path metrics) can be computed simultaneously in O(log N) steps (where N = number of trellis stages), rather than O(N).

  • Implementation is done using CUDA on NVIDIA GPUs: careful memory hierarchy management, shared memory usage, thread synchronization, and stage combining allow high parallelism.

  • The decoder preserves the same decoding error performance (codeword error rate, etc.) as the traditional BCJR algorithm while dramatically reducing runtime.
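
The log-depth idea in the bullets above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' CUDA implementation. It assumes probability-domain stage matrices (a hypothetical array `gammas`, where `gammas[k][i, j]` is the branch metric from state i to state j at stage k), it uses a simple Hillis-Steele prefix scan rather than the pairwise meta-stage reduction described above, and it covers only the forward metrics α. The function names and toy sizes are illustrative assumptions.

```python
import numpy as np

def forward_sequential(alpha0, gammas):
    """Classic forward recursion: one trellis stage per step, O(N) dependent steps."""
    alphas = [alpha0]
    for g in gammas:
        alphas.append(alphas[-1] @ g)  # alpha_{k+1} = alpha_k * Gamma_k
    return np.stack(alphas)

def forward_log_depth(alpha0, gammas):
    """Same forward metrics via a prefix scan over stage matrices, O(log N) rounds."""
    prefix = gammas.copy()  # prefix[k] becomes gammas[0] @ ... @ gammas[k]
    n = prefix.shape[0]
    d = 1
    while d < n:
        combined = prefix.copy()
        # The n - d matrix products in this round are mutually independent,
        # so on a GPU they could all run at the same time.
        combined[d:] = prefix[:-d] @ prefix[d:]
        prefix = combined
        d *= 2
    # alpha_{k+1} = alpha_0 * (Gamma_0 ... Gamma_k) for every k at once
    later = np.einsum("i,nij->nj", alpha0, prefix)
    return np.vstack([alpha0[None, :], later])

# Toy example (assumed sizes): 16 stages, 4 states, start in state 0.
rng = np.random.default_rng(0)
gammas = rng.random((16, 4, 4))
alpha0 = np.array([1.0, 0.0, 0.0, 0.0])
seq = forward_sequential(alpha0, gammas)
par = forward_log_depth(alpha0, gammas)
print("max abs difference:", np.max(np.abs(seq - par)))
```

With N = 16384 stages this scan needs 14 rounds of independent matrix products instead of 16384 dependent steps. The backward metrics β can be obtained the same way by scanning from the end of the trellis. A practical decoder would typically work in the log domain or normalize the metrics to avoid numerical overflow.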

Advantages

  • Much lower latency: decoding time scales with the logarithm of trellis length rather than linearly.

  • High throughput: able to handle large trellis lengths (e.g. thousands of stages) with shorter runtime.

  • Same error performance as standard BCJR (i.e. no degradation in codeword error rate).

  • Flexibility and portability: implemented on GPU (CUDA), which allows usage on different GPU models; easier development than for FPGA in many cases.

  • Scalable with increased hardware resources (more streaming multiprocessors, more shared memory).

  • Useful for deep-space optical communications or other photon-limited channels where SCPPM is relevant.

Potential Applications

  • Space communication systems (e.g. NASA DSOC, deep-space optical links) where SCPPM is used or similar concatenated codes are used under tight latency constraints.

  • Ground receivers or optical communication terminals requiring fast decoding of photon-limited signals.

  • Any environment where trellis-based decoders (BCJR) with many stages are used and latency is a bottleneck (optical communications, free-space optics, and possibly some quantum communication decoding scenarios).

  • Simulation platforms or testbeds where faster decoding allows more extensive error-rate testing / parameter sweeps.

  • Communication systems in other challenging channels (e.g., underwater optics, low SNR wireless) where efficient, low-latency decoding offers benefit.

Development to Date

  • The paper “Parallel Trellis-Stage-Combining BCJR for High-Throughput CUDA Decoder of CCSDS SCPPM” describes a full implementation in CUDA on GPUs, including detailed simulation results.

  • Demonstrated that for very large trellis lengths (e.g. 2^14 = 16384 stages) the GPU version achieves up to ~15× runtime reduction versus a CPU implementation.

  • Error-rate performance (frame/codeword error rate) matches the baseline CPU BCJR implementation.

Publications / Reference

  • Antonini, A.; Glukhov, E.; Wesel, R.; Divsalar, D.; Hamkins, J. “Parallel Trellis-Stage-Combining BCJR for High-Throughput CUDA Decoder of CCSDS SCPPM.” UCLA / JPL. 2025. DOI or formal publication details pending.

For More Information:
Joel Kehle
Business Development Officer
joel.kehle@tdg.ucla.edu
Inventors:
Richard Wesel
Amaael Antonini
Egor Glukhov
Dariush Divsalar