Chromatographic Determination of mRNA Critical Quality Attributes

Transcript

0:01 My name is Ramesh Indarkanti. I'm a Biologics Business Development Manager with Phenomenex.
0:05 Thank you for coming. mRNA is an important drug modality, as was seen with the recent
0:12 COVID vaccine development, where rapid development and deployment were possible
0:17 because of its many unique properties. Also, last week, the Nobel Prize in Physiology or Medicine
0:23 was awarded for the development and discovery of mRNA vaccines. So, like any other drug modality
0:31 out there, we have to understand some of the critical quality attributes of mRNA for it to
0:37 be useful as a drug. And for today's presentation, we'll focus on some of the chromatographic methods
0:43 by themselves or coupled with mass spectrometry to understand the critical quality attributes of
0:49 the mRNA molecule. Most of this work was carried out by Roxanna, our application scientist, and
0:55 I have the opportunity to present it to you guys here.
0:59 So, this is the brief overview of the presentation here. We'll just start with an introduction of
1:04 the critical quality attributes of mRNA drugs and vaccines, solely focusing on the mRNA molecule
1:11 itself and not the LNP component. Then we move on to looking more in depth about the 5' cap
1:18 characterization and efficiency, which is important for mRNA's efficacy. Then we look at the ways we
1:24 can use enzymatic sequencing as well as mass spectrometry to understand the primary structural
1:30 integrity of mRNA. Then we want to talk about the poly(A) length distribution and heterogeneity,
1:35 which are important for the life of the mRNA in the cells of the organism. And
1:43 then finally, we look at mRNA aggregation as a way to establish the drug substance
1:51 product quality. So, here we're looking at the mRNA critical quality attributes, and
1:59 as you can see here, mRNA contains a 5' cap, which is a highly methylated chemical structure,
2:06 as we'll see in the coming slides. The cap itself determines the mRNA's efficacy, that is, its
2:11 translational efficiency, because this is where the translation factors bind to express
2:17 the protein. So, the amount of cap strongly influences
2:24 the mRNA's expression. It also differentiates the host's endogenous mRNA from that of
2:31 pathogens. So, one of the early discoveries was the importance of having the cap on synthetic
2:36 mRNAs to make them useful as vaccines as well as therapies. The cap is followed by the
2:43 5' UTR, or untranslated region, which contains several regulatory
2:49 elements, followed by the open reading frame or coding sequence, and then the 3' UTR,
2:57 or untranslated region. And we need to understand the sequence integrity of this in order to
3:03 establish the sequence integrity of the translated protein as well. Finally, there is a poly(A) tail
3:09 here that is important in mRNA translocation, and the life of the mRNA is also
3:17 heavily influenced by the length of the poly(A) tail itself. So, we'll look at methods to understand
3:23 each one of these critical quality attributes of the mRNA. And here we are looking at the
3:30 overview of the workflow that's used in this present set of experiments here,
3:34 which starts with heat-denaturing the mRNA in the presence of urea as a denaturant, and
3:40 then digesting it down to smaller oligonucleotides that are much more amenable to mass spectrometry analysis
3:47 using RNase 4. And since RNase 4 cleaves between the U and the A or the U and the G,
3:54 and leaves a 3' phosphate, we are incorporating a T4 polynucleotide kinase that removes the 3'
4:00 phosphate, as well as the 2',3'-cyclic phosphates that are formed during the digestion process.
4:05 Now, this results in a much simpler pool of hydroxyl-terminated oligos, as opposed to having a
4:11 phosphate and cyclic phosphate combination that generates a much more complex pool of
4:17 shorter oligos. Now, we're going to subject this to LC-MS/MS. First, we're going to
4:23 use good chromatography on our ion-pair reverse phase system using the Biozen Oligo column,
4:28 as you'll see, and then couple that to a high-resolution mass spectrometer,
4:32 the SCIEX ZenoTOF 7600 instrument. So, this is the comprehensive overview of the workflow.
4:39 Now, let's take a little bit of a closer look at the type of enzymes and how we would choose
4:45 various nucleases for the mRNA characterization workflow. If I were to use human RNase 4,
4:54 which cleaves between U and A and between U and G, that is, 3' to a U that is followed by A or G, then,
5:02 in the case of the EGFP mRNA example that we'll be looking at in this study, I actually end up generating a
5:10 nice, decent-sized oligonucleotide, 18 nucleotides long without the cap and about
5:17 19 nucleotides long with the cap, which is perfectly useful for mass spectrometry-based sequencing
5:23 and quantification. But on the other hand, if I were to use RNase T1, which cleaves 3'
5:29 to G residues, I'll end up generating really short oligonucleotide fragments that are not as
5:36 useful for sequencing or quantification. At the other extreme, if I were to
5:40 use E. coli MazF, which cleaves 5' to ACA triplets, I'll end up generating about
5:47 a 90-nucleotide oligonucleotide, which makes it very difficult to do sequencing using mass spectrometry.
5:55 So, all in all, it's really important to choose the right type of nuclease based on your
6:01 understanding of the mRNA. In the present study, we are going to be using human RNase 4 in
6:06 combination with T4 polynucleotide kinase to remove the phosphates formed on the 3' end.
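As a rough illustration of the cleavage rules just described, the sketch below applies them in silico to a short, made-up RNA string (not the EGFP sequence). The patterns are simplified readings of what is said above: RNase 4 cuts 3' of a U that is followed by A or G, RNase T1 cuts 3' of every G, and MazF cuts 5' of each ACA triplet. The names `CLEAVAGE_RULES`, `digest`, and `SAMPLE` are illustrative only, and the sketch ignores end chemistry (3' phosphates, T4 PNK treatment) and missed cleavages.

```python
import re

# Simplified cleavage rules from the talk, written as zero-width split points.
# These are assumptions for illustration, not a model of real enzyme behavior.
CLEAVAGE_RULES = {
    "RNase 4": r"(?<=U)(?=[AG])",  # cut 3' of U when the next base is A or G
    "RNase T1": r"(?<=G)",         # cut 3' of every G
    "MazF": r"(?=ACA)",            # cut 5' of each ACA triplet
}


def digest(rna: str, enzyme: str) -> list[str]:
    """Split an RNA string (5' to 3') at every site matching the enzyme rule."""
    pattern = CLEAVAGE_RULES[enzyme]
    # re.split accepts zero-width matches on Python 3.7+; drop empty pieces
    # that appear when a cut site sits at the very start or end of the string.
    return [fragment for fragment in re.split(pattern, rna) if fragment]


# Hypothetical toy sequence, chosen only to exercise all three rules.
SAMPLE = "GGGAUACAUGCCUAAG"

for enzyme in CLEAVAGE_RULES:
    fragments = digest(SAMPLE, enzyme)
    print(enzyme, fragments, "longest:", max(len(f) for f in fragments))
```

Running the same function over a real transcript sequence is one way to compare candidate nucleases by the fragment lengths they would produce before committing to a digestion protocol.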
6:15 Here is the sequence. I mean, mRNAs can be complex and can be a variety of lengths,
6:20 but for the present study, we are using the EGFP, enhanced green fluorescent protein, sequence.
6:26 It's about 908 nucleotides long with a mass of about 294,000 daltons. And as you can see,
6:34 if you were to use human RNase 4, which cleaves between the U and G, as well as the U and
6:42 A residues, you'll generate a nice 5'-terminal fragment with a cap on it that will allow you to
6:48 do MS/MS-based sequencing, as well as good quantification using mass spectrometry
6:54 and chromatography. On the poly(A) tail side, you can do cleavage between this U and
7:02 A, and you'll get an idea of the poly(A) tail length, which you can analyze by HPLC mass
7:10 spectrometry as well. Now, obviously, good chromatography and choosing the appropriate
7:18 column for the type of analyte you're working with is going to be very important.
7:23 Oligonucleotide HPLC column requirements can be very different from those of proteins and small
7:28 molecules. In this regard, Phenomenex offers our Biozen Oligo HPLC column, which is
7:35 based on our core-shell technology that has a solid, impermeable inner core and a porous outer
7:41 shell. And the porous outer shell is the one that's responsible for separation. So this is a C18
7:46 column that incorporates a hybrid particle technology that offers extreme pH stability.
7:52 It comes in our BioTi titanium hardware to reduce sample loss and non-specific binding.
7:58 And it's stable up to pH 12, which is very important when you're working with oligonucleotides.
8:04 In addition, some of the offerings out there for the oligonucleotide columns are based on
8:10 fully porous particles. And fully porous particles, since they have a longer diffusion path, result in
8:16 greater band broadening. The core-shell particles, on the other hand, have shorter
8:24 diffusion paths and, as a result, give you higher efficiency as well as higher resolution.
8:32 Now, we'll be looking at three different things. One is the 5' cap. The second is the
8:38 sequence integrity. And the third one is the poly(A) distribution. And since these three
8:43 studies require three different types of mass spectrometry-based experiments, we've chosen the 7600
8:50 ZenoTOF here. For the cap, it offers MRM-based quantification abilities,
8:58 along with high-resolution measurements, which gives us accurate quantification here.
9:03 And for the sequencing capabilities, it offers data-dependent acquisition that will allow us to
9:10 get good, complete sequence coverage for the mRNA. And for the poly(A) length distribution,
9:18 we have the accurate mass measurements that will give us information about the
9:23 poly(A) heterogeneity itself. Now, let's get a little bit deeper into the
9:29 mRNA cap characterization. Like I mentioned before, the mRNA 5' cap is very important to ensure
9:36 accurate translation of mRNA, as well as efficacy. And it also differentiates the
9:42 host's endogenous nucleic acids from those of pathogens. Since nucleic acids from viruses and
9:48 bacteria don't have the 5' cap, that's how our immune system can differentiate those from the
9:53 endogenous mRNA molecules. And the 5' cap comprises an N7-methylguanosine that is linked
10:02 via this triphosphate linkage, a 5' triphosphate linkage, to the first nucleotide of
10:08 the mRNA. In some cases, there could be a free hydroxyl at the 2' position of the first nucleotide. We
10:15 call that cap 0. And in cases where there is a methylation at the 2' position, we call that cap 1.
10:21 And in the present study, we'll be focusing on the cap 0, which is part of the EGFP mRNA
10:28 we use in these experiments. Now, let's take a closer look at how we calculate the percent cap
10:36 efficiency. So this is the structure here. Obviously, if you were using human RNase 4,
10:43 like I mentioned before, it cleaves between the U and G residues. Since we are using T4 PNK,
10:48 that's the T4 polynucleotide kinase, which removes the phosphate, what we end up with is an oligo about 18 or
10:54 19 nucleotides long, depending on whether it's uncapped or capped. And you can also end up
11:00 with various other degradants as shown here, and these are the accurate masses. So we'll incorporate
11:06 a combination of MRM and accurate mass measurements to understand the levels of the capped versus uncapped
11:14 species present in the samples here. And we're going to use this formula here for calculating the
11:19 capping efficiency in the mRNA samples. Now, a good analysis, mass spectrometry analysis,
11:29 starts with good chromatography, and that's what we're seeing here. Running these samples
11:34 in MRM mode on our bioanalytical column, we can get a nice separation between the cap 0,
11:40 which is actually capped, and the no cap, right? Remember, cap 0 is actually the cap that is not
11:46 methylated at the 2' position. So, the no cap is eluting around 41.5 minutes, and
11:53 the capped mRNA is eluting around 45 minutes. And good, robust chromatography is important,
12:00 so we can see across the replicates we are having very consistent retention times,
12:06 giving us confidence in the robustness of the method. And also, when looking at the
12:11 peak areas for the cap versus uncapped across various replicates, we also have very
12:15 consistent results, and that gives us confidence in the robustness of our method as well.
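The formula shown on the slide is not reproduced in this transcript. A common way to express capping efficiency from MRM peak areas is the capped peak area divided by the summed areas of the capped and uncapped species, times 100. The sketch below assumes that definition; the function name and the peak areas are hypothetical and are not data from the talk.

```python
def capping_efficiency(capped_area: float, uncapped_areas: list[float]) -> float:
    """Percent capped, assuming efficiency = capped / (capped + uncapped) * 100.

    This definition is an assumption; the talk's own formula is on a slide
    that the transcript does not show. Areas are integrated peak areas in
    any consistent unit.
    """
    total = capped_area + sum(uncapped_areas)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * capped_area / total


# Hypothetical peak areas for one injection (arbitrary units).
example = capping_efficiency(capped_area=850.0, uncapped_areas=[150.0])
print(f"capping efficiency: {example:.1f}%")
```

Passing the uncapped species as a list leaves room to include other 5' degradants in the denominator if a method chooses to count them.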
12:22 Here, we are looking at the accurate mass data of the no cap sequence
12:30 that's generated from the RNase 4 digestion. This is the deconvoluted spectral data. In other words,
12:36 it's neutral mass data, and you can see nice isotopic resolution, even for an 18-mer
12:43 oligonucleotide here, which is going to be very important
12:48 to have a more precise understanding of the sequence as well. On the right, we are seeing
12:54 the CID spectral data, and these blue L's represent the fragments that are generated,
13:00 the five-prime fragments that are generated due to the CID in the collision cell, and the red
13:07 L's indicate the three-prime fragments. So, if you're looking at this, either the
13:12 red or the blue by themselves, that is, the five-prime fragments or the three-prime fragments by
13:16 themselves, don't give us complete sequence information, but if you were to combine these
13:21 two data sets, you actually get complete sequence coverage. So, we're not only able to
13:27 quantify the mRNA, but also are able to use the CID capabilities of this instrument to
13:34 get a complete sequence of this five-prime capped oligo. Going back here,
13:42 we estimated the uncapped oligo to be about 14 percent. And along the same lines,
13:50 here we're looking at the m7G-capped oligo, which is a 19-mer oligo, and we also get nice,
13:57 complete sequence coverage for this particular sequence, so it gives us confidence in our
14:02 results. And using the MRM experiments, we estimate the amount of the
14:09 capped oligo to be about 85 percent. Let's move on to looking at the sequence mapping
14:18 information, that is, the sequence coverage, because we need to establish the primary
14:22 structure identity, which gives us confidence in the sequence of the protein being
14:27 expressed. One of the things to keep in mind when you use nucleases is that,
14:32 since mRNA oligos contain only four nucleotides, A, U, G, and C,
14:39 we can end up with multiple sequence variants that have the same exact mass, so
14:44 it's important to have a good chromatography to be able to separate these sequence variants in
14:49 order to establish their identity and get good sequence coverage. Here we are looking at the
14:56 RNase 4-digested mRNA that's run on our Biozen Oligo column using an ion-pair reverse phase
15:02 chromatography method, using hexafluoroisopropanol and isopropylamine as mobile phase modifiers in a
15:09 water/acetonitrile gradient system. And if you're a chromatographer, you can really
15:14 appreciate the quality of the data we are getting here: nice, sharp,
15:20 well-separated peaks, and even for a complex mixture like this, you're getting a good
15:25 distribution of all these peaks. In the later part of the chromatogram here,
15:30 you see the poly(A) tail: all these little bumps, as well as
15:36 the big peak here, are the poly(A) tail, as we will see in the later slides.
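To make the same-mass point above concrete, the sketch below checks whether two oligo sequences share a base composition, which is what makes them indistinguishable by intact mass alone. The residue masses are approximate monoisotopic values for nucleoside monophosphate residues, supplied here as an assumption rather than taken from the talk. The summed residue mass omits terminal-group corrections, which are identical for both sequences and so do not affect the comparison. All names and example sequences are hypothetical.

```python
from collections import Counter

# Approximate monoisotopic masses (Da) of RNA nucleotide residues
# (nucleoside monophosphate minus water). Assumed values for illustration.
RESIDUE_MASS = {
    "A": 329.05252,
    "C": 305.04129,
    "G": 345.04744,
    "U": 306.02530,
}


def residue_mass_sum(seq: str) -> float:
    """Sum residue masses by composition so the result ignores base order."""
    counts = Counter(seq)
    return sum(RESIDUE_MASS[base] * n for base, n in sorted(counts.items()))


def are_sequence_isomers(seq_a: str, seq_b: str) -> bool:
    """True when the sequences differ but contain exactly the same bases."""
    return seq_a != seq_b and Counter(seq_a) == Counter(seq_b)


# Hypothetical short oligos: the same bases in a different order.
first, second = "ACGUA", "AGCUA"
print(are_sequence_isomers(first, second))
print(residue_mass_sum(first) == residue_mass_sum(second))
```

Because the mass depends only on composition, resolving such pairs falls to the chromatography and to the MS/MS fragment ions, not to the intact mass.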
15:40 Now, if you were to subject this to data-dependent acquisition on the
15:45 SCIEX ZenoTOF 7600 mass spectrometer, we can get the CID spectral data, which will allow us to
15:53 establish the sequence of all of this. And from this present experiment using the RNase 4
16:00 nuclease, we are able to get about 96% coverage for this. And obviously, I'm not
16:06 including the poly(A) tail in this 96% number. That will be discussed later separately, and that's a
16:12 separate set of experiments. And as I mentioned before, the short oligonucleotides can have slightly
16:18 different sequences, but would have the same exact mass, in which case, on the mass spectrometer,
16:23 they're indistinguishable. There, it becomes important to
16:28 chromatographically separate these short sequences and then use the mass spectrometry to get their sequence
16:34 identity. And what we're seeing here is three oligonucleotides that have the same exact mass,
16:39 but have different base positions. We call them sequence isomers. And these sequence isomers
16:47 are nicely separated on our Biozen Oligo column, and we also have very consistent retention times
16:53 across replicates as well, giving us confidence in the robustness of the method. So, taking a
16:58 little bit of a deeper look at the data we've shown in the previous slide, we can see
17:04 that we have peaks one to three that have the same exact mass and that are
17:09 indistinguishable by mass spectrometry alone. But after separating them chromatographically, we pick
17:14 the doubly charged negative ions for each one of these peaks, subject them to CID
17:20 fragmentation, and get the complete sequence information for each one of these. So, thereby, we
17:26 are improving overall sequence coverage of our mRNA using this nuclease digestion, and we are able to
17:32 get 96% sequence coverage. Now, with that, you know, little bit of data on the sequence coverage,
17:41 let's move on to the poly(A) tail length distribution and heterogeneity itself. Like I said,
17:47 the poly(A) tail is very important in enhancing the life of the mRNA itself, and also in its
17:55 cellular translocation. So, it's an important attribute to measure in your drug substance as
18:03 well as drug product. Now, for this, we're still using our RNase 4, which is cleaving between this U
18:12 and A and generating a long sequence, and there are various degradants and varying
18:19 poly(A) tail lengths. Taking a closer look at the later part of the chromatogram that we saw a
18:23 few slides ago, you see a lot of bumps here, which we'll zoom into in the next
18:29 slide, that are actually coming from the different poly(A) chain lengths. And this big
18:38 peak has multiple poly(A) chains as well. And if you were to take a look at the MS data, it looks
18:43 extremely complex, but if you were to do a deconvolution on the spectral m/z data,
18:49 and convert this to the mass domain, we can see nice even spacing that corresponds to
18:55 an adenosine nucleotide, giving us confidence that this is the
20:00 poly(A) tail. Here, we're looking at the power of the chromatography itself to separate
20:05 all these various poly(A) tail lengths, and we can nicely separate up to 61 for the poly(A)
20:12 tail length. But as you get to longer and longer oligos, the difference between the
20:17 N-1 and full lengths is small, and as a result, the separation becomes more difficult. But
20:22 nonetheless, we can still use the mass spectrometry to deconvolute this and get additional information.
20:28 So, chromatographically, we are able to separate up to 61 nucleotides in this.
20:33 And if we were to take this big peak here and perform deconvolution, we see that the
20:39 peak spacing is equal to that of an adenosine, telling us that this is, again, the poly(A) tail.
20:48 And we were able to detect a poly(A) tail length of up to 18 nucleotides in this recent study.
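The spacing check described above can be sketched as follows: sort the deconvoluted masses, take successive differences, and compare each with the mass of one adenosine monophosphate residue. The value 329.21 Da is the approximate average residue mass, and the 0.5 Da tolerance is arbitrary; both are assumptions rather than figures from the talk. The mass list is synthetic and all names are illustrative.

```python
# Approximate average mass (Da) added per adenosine monophosphate residue.
# Assumed value for illustration; not quoted in the talk.
A_RESIDUE_AVG = 329.21


def spacing_matches_adenosine(masses: list[float], tol: float = 0.5) -> bool:
    """True if every step between consecutive sorted masses is one A residue."""
    ordered = sorted(masses)
    steps = [high - low for low, high in zip(ordered, ordered[1:])]
    return bool(steps) and all(abs(s - A_RESIDUE_AVG) <= tol for s in steps)


def added_residues(masses: list[float]) -> list[int]:
    """Number of A residues each species carries beyond the lightest one."""
    ordered = sorted(masses)
    return [round((m - ordered[0]) / A_RESIDUE_AVG) for m in ordered]


# Synthetic ladder: a hypothetical base mass plus 0 to 3 extra A residues.
ladder = [20000.0 + n * A_RESIDUE_AVG for n in range(4)]
print(spacing_matches_adenosine(ladder))
print(added_residues(ladder))
```

A ladder spaced by a different residue mass, for example about 305 Da for cytidine, would fail the same check, which is why even spacing at the adenosine value supports the poly(A) assignment.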
20:53 Now, let's finally take a look at an additional quality attribute, which is the aggregation of
20:58 mRNA itself, which, according to USP guidelines, is a product quality attribute one needs to establish in
21:05 your mRNA samples. Now, obviously, size exclusion chromatography is well-suited for separating
21:13 mRNA and its aggregates, since the dimers, the trimers, and so on are going to be
21:17 two times, three times, and so on, the molecular weight of the monomer, so size
21:23 exclusion is going to be a very useful way to separate them. And for this application,
21:28 we have used our Biozen dSEC-7 size exclusion column, which is a 700 angstrom pore size column,
21:34 and it incorporates our BioTi titanium hardware, and it comes in various lengths and dimensions,
21:42 depending on your requirements. Here, we are looking at the EGFP mRNA that is separated on
21:48 the Biozen dSEC-7 HPLC column. And as you can see here, the first peak is coming most likely
21:56 from the aggregate, and the really tall peak is coming from the monomer. But how do we know this
22:01 is aggregate? If you were to take this sample and heat it, the levels of the aggregate
22:06 go down, whereas the levels of the monomer go up, suggesting that a hydrogen bonding
22:12 type of aggregation is happening, and heating it to 70 degrees is actually decreasing the aggregate
22:18 levels, right? And it's great that we were looking at the UV data in the previous slide,
22:24 but what if I want additional information? You can couple your dSEC-7 column to something
22:29 called a multi-angle light scattering detector, or MALS, that can give you molecular weight
22:35 information as well. So, from this, we can see that the major peak is the monomer,
22:41 and the larger aggregates are the dimer, trimer, tetramer, and so on,
22:49 giving us the confidence that we are detecting the aggregation accurately.
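One simple way to read MALS molar masses against the expected monomer is to divide each measured mass by the monomer mass and round to the nearest whole number. The sketch below assumes the roughly 294,000 Da EGFP mRNA mass quoted earlier in the talk as the monomer mass. The measured values are hypothetical, and the function and label names are illustrative only.

```python
# Approximate EGFP mRNA monomer mass (Da) as quoted earlier in the talk.
MONOMER_MASS_DA = 294_000.0

# Labels for the first few states; higher states fall back to "N-mer".
STATE_LABELS = {1: "monomer", 2: "dimer", 3: "trimer", 4: "tetramer"}


def oligomer_state(measured_mass_da: float,
                   monomer_mass_da: float = MONOMER_MASS_DA) -> int:
    """Nearest whole number of monomer units for a measured molar mass."""
    if measured_mass_da <= 0 or monomer_mass_da <= 0:
        raise ValueError("masses must be positive")
    return max(1, round(measured_mass_da / monomer_mass_da))


def state_label(n: int) -> str:
    return STATE_LABELS.get(n, f"{n}-mer")


# Hypothetical MALS molar masses (Da) for three SEC peaks.
for mass in (880_000.0, 588_000.0, 300_000.0):
    print(f"{mass:>10.0f} Da -> {state_label(oligomer_state(mass))}")
```

Rounding to the nearest integer is a deliberate simplification; a real analysis would also weigh the measurement uncertainty before assigning a state.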
22:54 In summary, I hope I was able to convince you that the Phenomenex solutions for mRNA characterization
23:01 and critical quality attribute determination encompassing the oligo sequence mapping,
23:06 5'-cap efficiency, poly(A) tail length distribution, as well as aggregate determination
23:13 can be very helpful in your day-to-day work. Thank you for your time.