Subgraph Matching for High-Throughput DNA-Aptamer Secondary Structure Classification and Machine Learning Interpretability (Case No. 2025-104)

Intro Sentence:

UCLA researchers in the Department of Mathematics have developed a machine learning method that rapidly identifies novel aptamer sequences for target binding, accelerating the development of highly accurate diagnostics and therapeutics.

Background:

Aptamers are single-stranded nucleotide polymers that form unique secondary structures and bind with high affinity to targets such as cells and proteins. Their small size, reproducible chemical synthesis, biocompatibility, low immunogenicity, and structural stability make aptamers valuable in fields such as biosensing and therapeutics. The ability to generate aptamers specific to desired molecules allows researchers to develop highly accurate and sensitive sensing technologies, with applications spanning diagnostics, wearable sensors, and environmental monitoring. These aptamers are typically identified from libraries of more than one billion DNA sequences. One current industry standard, Mfold, can analyze only one sequence at a time, making it inefficient when billions of sequences are involved. An earlier tool, Seqfold 2.0, offers limited speed and accuracy for aptamer sequence identification. Neither method provides real-time feedback, which hinders research progress. Addressing these limitations requires a computational method capable of high-throughput analysis and real-time feedback.

Innovation:

UCLA researchers have developed GMfold, a machine learning-driven pipeline for high-throughput aptamer structure prediction and screening. GMfold enables the rapid analysis of large-scale DNA aptamer libraries by integrating subgraph matching with machine learning, significantly outperforming prior methods in both speed and predictive accuracy. GMfold is the research team's primary innovation for practical use and commercialization, whereas Seqfold 2.0 served mainly as an academic exercise to explore graph-based improvements and correct earlier Seqfold versions. GMfold also competes favorably with industry standards such as Mfold and NUPACK, offering improved throughput and accuracy, particularly when analyzing large batches of sequences. Its real-time feedback capability and structural motif clustering enable researchers to rapidly pinpoint high-value aptamers that correlate with strong target affinity, an advantage not currently offered by standard tools. Furthermore, GMfold's recent RNA folding capabilities expand its utility across diverse biological applications and are the subject of an upcoming publication. The platform is particularly well suited for integration with high-throughput screening methods such as SELEX, and it supports batch-scale machine learning analysis of aptamer libraries, a feature not available in existing commercial tools. Released under a permissive BSD v3 open-source license, GMfold is a versatile and scalable solution for accelerating aptamer discovery in diagnostics, therapeutics, biosensing, and environmental monitoring.
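
To make the idea of subgraph matching on secondary structures concrete, the following is a minimal illustrative sketch and not GMfold's code or API. It assumes a structure is given in balanced dot-bracket notation, turns it into a graph with "backbone" and "pair" edges, and counts occurrences of a small hypothetical motif (two stacked base pairs) with a naive backtracking matcher. All function names and the motif are assumptions made for illustration; counts like these could serve as structural features for a downstream classifier.

```python
# Illustrative sketch only; names and motif are hypothetical, not GMfold's API.

def structure_graph(dot_bracket):
    """Build {(i, j): label} edges from a balanced dot-bracket string.

    Consecutive bases share a 'backbone' edge; matched brackets share a
    'pair' edge. Assumes balanced brackets with a non-empty loop.
    """
    edges, stack = {}, []
    for i, ch in enumerate(dot_bracket):
        if i > 0:
            edges[(i - 1, i)] = "backbone"
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            edges[(stack.pop(), i)] = "pair"
    return edges


def _label(edges, a, b):
    """Label of the undirected edge a-b, or None if absent."""
    return edges.get((min(a, b), max(a, b)))


def find_matches(motif, target):
    """Yield injective node maps sending every motif edge to a target
    edge with the same label (naive backtracking subgraph matching)."""
    m_nodes = sorted({n for e in motif for n in e})
    t_nodes = sorted({n for e in target for n in e})

    def extend(mapping):
        if len(mapping) == len(m_nodes):
            yield dict(mapping)
            return
        u = m_nodes[len(mapping)]
        for v in t_nodes:
            if v in mapping.values():
                continue
            # Every motif edge between u and an already-mapped node
            # must exist in the target with the same label.
            if all(_label(target, mapping[w], v) == _label(motif, w, u)
                   for w in mapping if _label(motif, w, u) is not None):
                mapping[u] = v
                yield from extend(mapping)
                del mapping[u]

    yield from extend({})


# Hypothetical motif: two stacked base pairs (0-3 and 1-2).
STACKED_PAIRS = {(0, 1): "backbone", (2, 3): "backbone",
                 (0, 3): "pair", (1, 2): "pair"}


def stacked_pair_count(dot_bracket):
    """Count distinct motif occurrences, ignoring motif symmetries."""
    target = structure_graph(dot_bracket)
    return len({frozenset(m.values())
                for m in find_matches(STACKED_PAIRS, target)})


print(stacked_pair_count("(((...)))"))  # 2
```

The naive matcher is exponential in the worst case and is suitable only for small motifs; a production pipeline would rely on a dedicated subgraph matching algorithm such as the one described in the related publications.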

Potential Applications:

Therapeutics
Drug discovery and development
Diagnostics
Biosensing
Gene regulation
Environmental monitoring
Food safety
Contract research advancement
ML enablement for R&D
RNA identification

Advantages:

Computational Efficiency
- Fast and less resource-intensive
- Can handle large libraries
Real-time feedback
Enhanced classification
Accuracy
Competes favorably with Mfold and NUPACK
Can be used to improve Seqfold 2.0 code

Development-To-Date:

First description of complete invention; pre-print of the technology available at https://doi.org/10.1016/j.mbs.2025.109485

Related Publications:

Paolo Climaco, Noelle M. Mitchell, Matthew Tyler, Kyungae Yang, Anne M. Andrews, Andrea L. Bertozzi, GMFOLD: Subgraph matching for high-throughput DNA-aptamer secondary structure classification and machine learning interpretability, Mathematical Biosciences, 2025, 109485, ISSN 0025-5564, https://doi.org/10.1016/j.mbs.2025.109485.

J. D. Moorman, T. K. Tu, Q. Chen, X. He and A. L. Bertozzi, "Subgraph Matching on Multiplex Networks," in IEEE Transactions on Network Science and Engineering, vol. 8, no. 2, pp. 1367-1384, 1 April-June 2021, doi: 10.1109/TNSE.2021.3056329.

Reference:

UCLA Case No. 2025-104

GitHub Repository:

https://github.com/PaClimaco/GMfold/

Lead Inventor:

Andrea Bertozzi