Peptide drug development with symbolic regression

An example of peptide drug featurization and modelling using anticancer peptides.

The conventional cancer treatments, chemotherapy and radiotherapy, remain the main methods of treatment despite their devastating side effects [1]. They kill not only the cancer cells but also the healthy cells around them, causing hair loss, pain, fatigue and nausea [1].

New treatments are being developed that specifically target cancer cells and thereby reduce toxicity to other cells. Peptides are one example. Peptides are also less toxic because they are easily broken down into their building blocks (amino acids) by the body’s own enzymes. However, peptide treatments need further development to improve target efficiency and reduce production costs [2].

Symbolic regression models can be used to search for and select relevant chemical and structural properties of peptides so that existing drugs can be further developed. In this blog post, we will use the QLattice® to build an explainable AI model that helps us understand which combination of properties makes a better anticancer peptide.

Let’s dig into the details.

Peptides with specific chemical properties have been shown to be toxic to bacteria because of the negative charge on the bacterial cell surface. Charged peptides are able to interact with and destroy the membrane, causing cells to break apart [1], a process often referred to as cell lysis. Peptides are therefore a promising cancer therapy because cancer cells share the same electrochemical properties, having a negative charge distributed over their entire surface [1].

We can explore these therapeutic possibilities by calculating and analysing different chemical properties of anticancer peptides, with the aim of modifying the peptides so that they lyse cancer cells more effectively.

A number of chemical and physical properties, such as hydrophobicity, polarity, and charge, can be calculated from the amino acid sequence. These serve as input to the QLattice. With them, we create a classification model that determines whether a peptide is an anticancer peptide (1) or not (0). In this way, we gain insights into what a good anticancer peptide should (or should not) look like. We will use a publicly available dataset of annotated peptides published in [4].

Building the model

A set of chemical and structural peptide features was calculated: Alpha-helix propensity, beta-sheet propensity, coil propensity, aromaticity, molecular weight (mw), flexibility, instability, half-life, hydrophobicity, extinction coefficient (reduced and oxidised), isoelectric point, and sequence length. In addition, the fractions of all 20 amino acids and of amino acid pairs such as KC (lysine followed by cysteine) and FL (phenylalanine followed by leucine) were considered as input data.
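The composition features described above can be sketched in a few lines of plain Python. This is a minimal illustration and not the code used for the analysis: the function name `composition_features` and the example sequence are made up, and the assumption here is that pair fractions are counted over overlapping windows of two residues. With 20 amino acids there are 20 single-residue fractions and 400 ordered pairs.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def composition_features(sequence):
    """Return amino acid and dipeptide fractions for one peptide sequence.

    Hypothetical helper for illustration; not part of any library.
    """
    seq = sequence.upper()
    n_pairs = max(len(seq) - 1, 1)  # avoid division by zero for very short input
    features = {}

    # Fraction of each single amino acid: count / sequence length.
    for aa in AMINO_ACIDS:
        features[aa] = seq.count(aa) / len(seq) if seq else 0.0

    # Count ordered pairs over overlapping windows (e.g. "KC" = K followed by C).
    pair_counts = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        pair_counts[pair] = pair_counts.get(pair, 0) + 1

    # Fraction of each of the 400 possible ordered pairs.
    for a, b in product(AMINO_ACIDS, repeat=2):
        features[a + b] = pair_counts.get(a + b, 0) / n_pairs

    features["length"] = len(seq)
    return features


# Made-up sequence, used only to show the shape of the output.
example = composition_features("FLKCKC")
print(example["KC"], example["FL"], example["length"])
```

Each peptide becomes one row of such values, which can then be combined with the structural properties listed above.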

To adequately capture the hydrophobicity of peptides, we chose a biological hydrophobicity scale that captures the ability of a sequence to be incorporated into the lipid membrane [7]. In total, 434 data columns were used, from which the QLattice set out to select the most important. From all the hypotheses generated, one hypothesis was selected that contained five features hypothesised to be relevant to membrane interaction: Molecular weight, hydrophobicity, coil propensity, and FL and KC composition.

The selected model predicted the target with an accuracy of 0.798 and an AUC of 0.869 on the test data (Figure 1). The confusion matrix in Figure 2 visualises the performance by showing the predicted classes against the actual classes.

Figure 1: Graph representation of the mathematical model predicting the target outcome with an accuracy of 0.799, an AUC of 0.869, a precision of 0.788 and a recall of 0.82 on the test data. The nodes are coloured according to mutual information, with a high value shown in green and a low value in white.
Figure 2: Confusion matrix (threshold of 0.5) for the predictions made on the test data.
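To make the reported metrics concrete, the sketch below shows how accuracy, precision, recall and AUC follow from predicted probabilities and a 0.5 threshold. The labels and probabilities are toy values invented for illustration, not the blog post's test data, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = tn = fn = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision and recall derived from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def auc(y_true, y_prob):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only (1 = anticancer peptide, 0 = not).
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

metrics = classification_metrics(y_true, y_prob)
auc_value = auc(y_true, y_prob)
print(metrics, auc_value)
```

Accuracy, precision and recall depend on the chosen threshold, whereas AUC is computed from the ranking of the probabilities alone.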

Understanding the model

The selected model (Figure 1) captured several chemical and structural properties of peptides that have been shown to determine anticancer activity. As expected, hydrophobicity and coil propensity play an important role in the peptide-membrane interaction. Hydrophobicity can be understood as the potential of a peptide to interact with the membrane and trigger lysis.

Furthermore, it is known that hydrophobic coil regions can intercalate into the lipid membrane. Interestingly, peptides with a high potential to insert into the membrane showed a low probability of destroying cancer cells. Moreover, peptides with low coil propensity in the lower hydrophobicity range were found to be effective in lysing cancer cells (Figure 3).

This leads to the conclusion that peptides with a low potential for membrane penetration, reflected in secondary structure and biological hydrophobicity, are promising candidates in the development of anticancer peptide drugs. The presence of the KC pair within a sequence (one charged and one very polar amino acid) proved particularly relevant for anticancer properties. The model predicted every peptide containing KC to be an anticancer peptide.

Cysteine, in particular, has been shown to contribute to the cell-penetrating and antimicrobial activity of a peptide sequence. Its cell-destructive properties have already been observed in the snake venom toxin crotamine, which is rich in cysteine residues [5].

Figure 3: With the molecular weight and the fractions of FL and KC fixed at their median values, the model response can be plotted as a function of coil propensity and hydrophobicity. Three different values of coil propensity are plotted against hydrophobicity.
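The procedure behind a plot like Figure 3 can be sketched as follows: fix every feature at its median, sweep hydrophobicity over its observed range, and repeat for a few values of coil propensity. Everything in this sketch is an assumption for illustration: `toy_model` is a made-up logistic function with invented coefficients, not the model found by the QLattice, and the three data rows are placeholders.

```python
import math
from statistics import median


def toy_model(mw, hydrophobicity, coil, fl, kc):
    """HYPOTHETICAL stand-in for a fitted classifier; coefficients are invented."""
    z = 0.5 - 1.2 * hydrophobicity - 2.0 * coil + 0.5 * fl + 3.0 * kc - 0.0002 * mw
    return 1.0 / (1.0 + math.exp(-z))


def response_curves(model, rows, sweep, group, group_values, n_points=5):
    """Sweep one feature for several values of a second feature,
    holding all remaining features at their median."""
    fixed = {name: median(r[name] for r in rows) for name in rows[0].keys()}
    lo = min(r[sweep] for r in rows)
    hi = max(r[sweep] for r in rows)
    grid = [lo + (hi - lo) * i / (n_points - 1) for i in range(n_points)]
    curves = {}
    for g in group_values:
        # Override only the swept feature and the grouping feature.
        curves[g] = [(x, model(**{**fixed, sweep: x, group: g})) for x in grid]
    return curves


# Placeholder feature rows, invented for illustration.
rows = [
    {"mw": 1500.0, "hydrophobicity": -1.0, "coil": 0.2, "fl": 0.0, "kc": 0.0},
    {"mw": 2200.0, "hydrophobicity": 0.5, "coil": 0.4, "fl": 0.1, "kc": 0.0},
    {"mw": 3000.0, "hydrophobicity": 2.0, "coil": 0.6, "fl": 0.0, "kc": 0.1},
]

curves = response_curves(toy_model, rows, sweep="hydrophobicity",
                         group="coil", group_values=[0.2, 0.4, 0.6])
for coil_value, points in curves.items():
    print(coil_value, [round(prob, 3) for _, prob in points])
```

Each entry in `curves` corresponds to one line in such a plot; passing the points to any plotting library gives the predicted probability against hydrophobicity for each coil propensity value.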

From insights to action

We have now learned about some properties we should focus on for anticancer peptides: Hydrophobicity, KC composition, and alpha-helical regions. The focus is on exploring the specific mechanism of the therapeutics themselves: The attack on the membrane. According to the model, introducing KC amino acid pairs into a sequence increases a peptide's predicted ability to lyse cancer cells.

This example shows that there are opportunities to use symbolic regression to find properties and amino acids that can be used in modifying peptide sequences at the early stages of drug development. However, much remains to be done to further explain the mechanism and ensure that drugs reach the actual target cells.

Two major challenges lie ahead: Peptides have a short biological half-life and penetrate poorly into tissues [2][3]. The analysis in this blog post can be applied to these two problems to gain insights into what influences toxicity, solubility, and stability. We hope this will provide insight into how we can use the QLattice in developing peptide therapeutics.
