Phenomenex
Duration: 23:13 Min
Chromatographic Determination of mRNA Critical Quality Attributes
Transcript
0:01
My name is Ramesh Indarkanti. I'm a Biologics Business Development Manager with Phenomenex.
0:05
Thank you for coming. mRNA is an important drug modality, as we saw with the recent
0:12
COVID vaccine development, where rapid development and deployment were possible
0:17
because of mRNA's many unique properties. Also, last week, the Nobel Prize in Physiology or Medicine
0:23
was awarded for discoveries that enabled the development of mRNA vaccines. So, like any other drug modality
0:31
out there, we have to understand some of the critical quality attributes of mRNA for it to
0:37
be useful as a drug. For today's presentation, we'll focus on chromatographic methods,
0:43
used on their own or coupled with mass spectrometry, to understand the critical quality attributes of
0:49
the mRNA molecule. Most of this work was carried out by Roxanna, our application scientist, and
0:55
I have the opportunity to present it to you guys here.
0:59
So, this is a brief overview of the presentation. We'll start with an introduction to
1:04
the critical quality attributes of mRNA drugs and vaccines, focusing solely on the mRNA molecule
1:11
itself and not the LNP component. Then we move on to a more in-depth look at 5' cap
1:18
characterization and capping efficiency, which are important for mRNA's efficacy. Then we look at the ways we
1:24
can use enzymatic digestion together with mass spectrometry to understand the primary structural
1:30
integrity of mRNA. Then we'll talk about the poly(A) length distribution and heterogeneity,
1:35
which are important for the lifetime of the mRNA in the cells of the organism. And
1:43
finally, we'll look at mRNA aggregation as a way to establish drug substance and drug
1:51
product quality. So, here we're looking at the mRNA critical quality attributes, and
1:59
as you can see here, mRNA contains a 5' cap, which is a methylated chemical structure, as
2:06
we'll see in the coming slides. The cap itself determines the mRNA's efficacy, that is, its
2:11
translational efficiency, because this is where the translation initiation factors bind to start
2:17
synthesis of the protein. So, the extent of capping strongly influences
2:24
the mRNA's expression. The cap also differentiates host endogenous mRNA from the RNA of
2:31
pathogens. So, one of the early discoveries was the importance of the cap in making synthetic
2:36
mRNAs useful as vaccines as well as therapeutics. The cap is followed by the
2:43
5' UTR, or untranslated region, which contains several regulatory
2:49
elements, followed by the open reading frame, or coding sequence, and then the 3' UTR,
2:57
or untranslated region. We need to confirm the sequence integrity of these regions in order to
3:03
establish the sequence integrity of the translated protein as well. Finally, there is a poly(A) tail
3:09
here that is important in mRNA translocation, and the lifetime of the mRNA is also
3:17
heavily influenced by the length of the poly(A) tail itself. So, we'll look at methods to understand
3:23
each one of these critical quality attributes of the mRNA. Here we are looking at the
3:30
overview of the workflow used in this set of experiments,
3:34
which starts with heat-denaturing the mRNA in the presence of urea as a denaturant,
3:40
then digesting it down to smaller oligonucleotides, which are much more amenable to mass spectrometry analysis,
3:47
using RNase 4. Since RNase 4 cleaves between U and A or between U and G,
3:54
and leaves a 3' phosphate, we also add T4 polynucleotide kinase, which removes the 3'
4:00
phosphates as well as the 2',3'-cyclic phosphates that form during the digestion process.
4:05
This results in a much simpler pool of 3'-hydroxyl oligos, as opposed to a
4:11
mixture of phosphate and cyclic-phosphate forms, which generates a much more complex pool of
4:17
short oligos. We then subject this to LC-MS/MS. First, we
4:23
use good chromatography, on our ion-pair reversed-phase system with the bioZen Oligo column,
4:28
as you'll see, and then couple that to a high-resolution mass spectrometer, the
4:32
SCIEX ZenoTOF 7600 instrument. So, this is the comprehensive overview of the workflow.
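To see why the T4 PNK step simplifies the pool, here is a back-of-the-envelope sketch, not part of the presented workflow, of the mass each 3'-end form adds to a fragment, computed from standard monoisotopic atomic masses. It assumes, as an upper bound, that every fragment could appear in every end form; the fragment count of 50 and the helper names are hypothetical.

```python
# Monoisotopic atomic masses (Da)
H, O, P = 1.00782503, 15.99491462, 30.97376163
H2O = 2 * H + O
HPO3 = H + P + 3 * O

# Mass each 3'-end form adds relative to the 3'-hydroxyl oligo.
# A 2',3'-cyclic phosphate is a 3'-phosphate minus one water.
END_SHIFTS = {
    "3'-OH": 0.0,
    "2',3'-cyclic phosphate": HPO3 - H2O,
    "3'-phosphate": HPO3,
}

def candidate_masses(mass_3oh, end_forms):
    """Neutral masses a fragment can appear at, for the allowed 3'-end forms."""
    return {form: mass_3oh + END_SHIFTS[form] for form in end_forms}

def pool_size(n_fragments, end_forms):
    """Upper bound on distinct species if every fragment carries every end form."""
    return n_fragments * len(end_forms)

for form, shift in END_SHIFTS.items():
    print(f"{form:>24}: +{shift:.4f} Da")

# Hypothetical digest of 50 fragments, before and after 3'-dephosphorylation
print(pool_size(50, END_SHIFTS))    # 150
print(pool_size(50, ["3'-OH"]))     # 50
```

In practice the end forms are not equally populated, so the real gain is smaller than this bound, but each fragment collapsing to one 3'-OH mass is what makes the spectra easier to interpret.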
4:39
Now, let's take a closer look at the types of enzymes and how we would choose
4:45
among various nucleases for the mRNA characterization workflow. If I use human RNase 4,
4:54
which cleaves between U and A and between U and G, that is, 3' of a U followed by A or G, then,
5:02
in the case of the EGFP mRNA example we'll be looking at in this study, I end up generating a
5:10
nice, decent-sized 5' oligonucleotide, 18 nucleotides long without the cap and
5:17
19 nucleotides long with the cap, which is perfectly suited for mass spectrometry-based sequencing
5:23
and quantification. On the other hand, if I use RNase T1, which cleaves on the 3' side
5:29
of G residues, I end up generating really short oligonucleotide fragments that are not as
5:36
useful for sequencing or quantification. At the other extreme, if I
5:40
use E. coli MazF, which cleaves 5' of ACA triplets, I end up generating
5:47
a roughly 90-nucleotide oligonucleotide, which is very difficult to sequence by mass spectrometry.
5:55
So, all in all, it's really important to choose the right nuclease based on your
6:01
understanding of the mRNA. In the present study, we are using human RNase 4 in
6:06
combination with T4 polynucleotide kinase to remove the phosphates formed at the 3' end.
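The effect of cleavage specificity on fragment length can be illustrated with a simple in-silico digest. This minimal sketch encodes only the cleavage rules as stated in this talk (RNase 4 at UpA/UpG, RNase T1 after G, MazF 5' of ACA), ignores missed cleavages and sequence-context effects, and runs on a short hypothetical sequence rather than the EGFP construct.

```python
import re

# Cleavage rules as stated in the talk, written as zero-width regex patterns
# that match the cut position. Real enzymes also show context effects and
# missed cleavages, which this sketch ignores.
CUT_SITES = {
    "RNase 4": r"(?<=U)(?=[AG])",  # 3' of U when the next base is A or G
    "RNase T1": r"(?<=G)",         # 3' of every G
    "MazF": r"(?=ACA)",            # 5' of each ACA triplet
}

def digest(seq: str, enzyme: str) -> list[str]:
    """Split an RNA sequence at every cut site of the given enzyme."""
    # re.split accepts empty (zero-width) matches in Python 3.7+;
    # drop empty pieces produced by cuts at either end of the sequence.
    return [frag for frag in re.split(CUT_SITES[enzyme], seq) if frag]

# Short hypothetical sequence, not the EGFP construct from the talk
seq = "GGACAUUGCAUAGGCUACAGUAAAAAAAA"
for enzyme in CUT_SITES:
    frags = digest(seq, enzyme)
    print(f"{enzyme:>8}: {[len(f) for f in frags]} {frags}")
```

Swapping in a full construct sequence lets you compare fragment-length distributions for candidate enzymes before committing to a digestion protocol.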
6:15
Here is the sequence. mRNAs can be complex and can vary widely in length,
6:20
but for the present study, we are using the EGFP (enhanced green fluorescent protein) sequence.
6:26
It's about 908 nucleotides long, with a mass of about 294,000 daltons. And as you can see,
6:34
if you use human RNase 4, which cleaves between U and G as well as between U and
6:42
A residues, you'll generate a nice 5'-terminal fragment carrying the cap, which allows you to
6:48
do MS/MS-based sequencing as well as good quantification using mass spectrometry
6:54
and chromatography. On the poly(A) tail side, cleavage between this U and
7:02
A releases the poly(A) tail, and you can analyze its length distribution by HPLC mass
7:10
spectrometry as well. Now, obviously, good chromatography and choosing the appropriate
7:18
column for the type of analyte you're working with are going to be very important. And the
7:23
column requirements for oligonucleotides can be very different from those for proteins and small
7:28
molecules. In this regard, Phenomenex offers our bioZen Oligo HPLC column, which is
7:35
based on our core-shell technology, with a solid, impermeable inner core and a porous outer
7:41
shell. The porous outer shell is what's responsible for the separation. This is a C18
7:46
column that incorporates hybrid particle technology for extreme pH stability.
7:52
It comes in our bioTi titanium hardware to reduce sample loss and non-specific binding.
7:58
And it's stable up to pH 12, which is very important when you're working with oligonucleotides.
8:04
In addition, some of the oligonucleotide columns out there are based on
8:10
fully porous particles. Because fully porous particles have a longer diffusion path, they cause
8:16
greater band broadening. Core-shell particles, on the other hand, have shorter
8:24
diffusion paths and, as a result, give you higher efficiency as well as higher resolution.
8:32
Now, we'll be looking at three different things. One is the 5' cap. The second is
8:38
sequence integrity. And the third is the poly(A) length distribution. Since these three
8:43
studies require three different types of mass spectrometry experiments, we've chosen the ZenoTOF
8:50
7600 here. For the cap, it offers MRM-based quantification
8:58
along with high-resolution measurements, which enables accurate quantification.
9:03
For sequencing, it offers data-dependent acquisition, which allows us to
9:10
get good, complete sequence coverage of the mRNA. And for the poly(A) length distribution,
9:18
we have accurate mass measurements that give us information about the
9:23
poly(A) heterogeneity itself. Now, let's get a little bit deeper into the
9:29
mRNA cap characterization. As I mentioned before, the mRNA 5' cap is very important to ensure
9:36
accurate translation of the mRNA, as well as its efficacy. It also differentiates
9:42
host endogenous nucleic acids from those of pathogens. Since many viral and
9:48
bacterial nucleic acids lack this 5' cap, our immune system can differentiate them from
9:53
endogenous mRNA molecules. The 5' cap consists of N7-methylguanosine linked
10:02
via a 5'-5' triphosphate bridge to the first nucleotide of
10:08
the mRNA. When the 2' position of the first nucleotide carries a free hydroxyl, we
10:15
call that cap 0. When that 2' hydroxyl is methylated, we call it cap 1.
10:21
In the present study, we'll be focusing on cap 0, which is the cap on the EGFP mRNA
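Although this study focuses on cap 0, cap 0 and cap 1 forms of the same 5'-terminal fragment differ by a single 2'-O-methylation, which high-resolution MS can distinguish by mass. The sketch below only computes that +CH2 shift from monoisotopic atomic masses; `cap1_mass` is an illustrative, hypothetical helper.

```python
# Monoisotopic atomic masses (Da)
C, H = 12.0, 1.00782503

# 2'-O-methylation replaces the 2'-OH hydrogen with CH3: a net gain of CH2.
METHYL_SHIFT = C + 2 * H

def cap1_mass(cap0_mass: float) -> float:
    """Hypothetical helper: expected cap 1 mass of a fragment observed as cap 0."""
    return cap0_mass + METHYL_SHIFT

print(f"cap 0 -> cap 1 shift: +{METHYL_SHIFT:.4f} Da")
```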
10:28
we use in these experiments. Now, let's take a closer look at how we calculate the percent capping
10:36
efficiency. Here is the structure. As I mentioned before, human RNase 4
10:43
cleaves between the U and G residues here. Since we are also using T4 PNK,
10:48
the T4 polynucleotide kinase, which removes the 3' phosphate, the 5' fragment we end up with is 18 or
10:54
19 nucleotides long, depending on whether it's uncapped or capped. You can also end up
11:00
with various other degradants, as shown here along with their accurate masses. So we use
11:06
a combination of MRM and accurate-mass measurements to determine the levels of capped versus uncapped
11:14
species present in the samples. And we use this formula here for calculating the
11:19
capping efficiency in the mRNA samples. Now, a good analysis, a good mass spectrometry analysis,
11:29
starts with good chromatography, and that's what we're seeing here. Running these samples
11:34
in MRM mode on our bioZen Oligo column, we get a nice separation between cap 0,
11:40
which is capped, and the uncapped form, right? Remember, cap 0 is the cap without
11:46
2'-O-methylation on the first nucleotide. The uncapped oligo elutes at around 41.5 minutes, and the
11:53
capped oligo elutes at around 45 minutes. Robust chromatography is important,
12:00
and across the replicates we see very consistent retention times,
12:06
giving us confidence in the robustness of the method. And when looking at the
12:11
peak areas for capped versus uncapped across the replicates, we also see very
12:15
consistent results, which gives us confidence in the robustness of our method as well.
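The capping-efficiency formula on the slide isn't read out in the talk. A common definition, assumed here, is the capped 5' fragment's peak area divided by the summed areas of all 5'-terminal species, times 100; if degradant species are counted in the denominator, capped and uncapped percentages will not sum to exactly 100. The sketch also computes %RSD as one way to express replicate consistency. All peak areas and retention times are hypothetical placeholders, and equal MS response for capped and uncapped fragments is assumed.

```python
from statistics import mean, stdev

def capping_efficiency(capped_area, uncapped_area, other_areas=()):
    """Percent capped: capped / (capped + uncapped + other 5' species) x 100.

    Assumes all 5'-terminal species give the same MS response per mole.
    """
    total = capped_area + uncapped_area + sum(other_areas)
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * capped_area / total

def percent_rsd(values):
    """Relative standard deviation (%) of replicate measurements."""
    return 100.0 * stdev(values) / mean(values)

# Hypothetical replicates: (capped area, uncapped area, capped-peak RT in min)
replicates = [(8.6e6, 1.4e6, 45.01), (8.4e6, 1.5e6, 44.99), (8.5e6, 1.45e6, 45.03)]

efficiencies = [capping_efficiency(c, u) for c, u, _ in replicates]
rts = [rt for _, _, rt in replicates]
print([round(e, 1) for e in efficiencies])
print(f"capping efficiency %RSD: {percent_rsd(efficiencies):.2f}")
print(f"retention time %RSD: {percent_rsd(rts):.3f}")
```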
12:22
Here, we are looking at the accurate-mass data of the uncapped sequence
12:30
generated by the RNase 4 digestion. This is the deconvoluted spectral data, in other words,
12:36
neutral-mass data, and you can see nice isotopic resolution, even for an 18-mer
12:43
oligonucleotide here, which is going to be very important for
12:48
a more precise understanding of the sequence as well. On the right, we are seeing
12:54
the CID spectral data: the blue labels represent the fragments,
13:00
specifically the 5' fragments generated by CID in the collision cell, and the red
13:07
labels indicate the 3' fragments. So, if you look at either the
13:12
red or the blue fragments alone, that is, the 5' fragments or the 3' fragments by
13:16
themselves, they don't give us complete sequence information, but if you combine these
13:21
two data sets, you get complete sequence coverage. So, we're not only able to
13:27
quantify the mRNA, but we can also use the CID capabilities of this instrument to
13:34
get the complete sequence of this 5'-terminal oligo. Going back here,
13:42
we estimated the uncapped oligo to be about 14 percent. Along the same lines,
13:50
here we're looking at the m7G-capped oligo, which is a 19-mer, and we also get nice,
13:57
complete sequence coverage for this particular sequence, which gives us confidence in our
14:02
results. Using the MRM experiments, we estimate the amount of
14:09
capped oligo to be about 85 percent. Let's move on to looking at the sequence mapping
14:18
information, that is, the sequence coverage, because we need to establish the primary
14:22
structure identity, which gives us confidence in the sequence of the protein being
14:27
expressed. One thing to keep in mind when you use nucleases is that,
14:32
since mRNA oligos contain only four nucleotides, A, U, G, and C,
14:39
we can end up with multiple sequence variants that have the same exact mass, so
14:44
it's important to have good chromatography to separate these sequence variants in
14:49
order to establish their identity and get good sequence coverage. Here we are looking at the
14:56
RNase 4-digested mRNA run on our bioZen Oligo column using an ion-pair reversed-phase
15:02
chromatography method, with hexafluoroisopropanol (HFIP) and isopropylamine as mobile-phase modifiers in a
15:09
water-acetonitrile gradient system. And if you're a chromatographer, you can really
15:14
appreciate the quality of the data we are getting here: nice,
15:20
sharp, well-separated peaks, and even for a complex mixture like this, you get a good
15:25
distribution of all these peaks. Toward the later part of the chromatogram here,
15:30
you see all these little bumps, and
15:36
the big peak here is the poly(A) tail, as we will see in later slides.
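The point about same-mass sequence variants follows from how oligo masses are built up: the neutral mass of a linear oligo depends only on its base composition, not on base order, so sequence isomers can only be told apart by retention time and MS/MS. A minimal sketch, assuming 5'-hydroxyl and 3'-hydroxyl termini; the three 5-mers are hypothetical, not the isomers shown on the slides.

```python
# Monoisotopic atomic masses (Da)
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462, "P": 30.97376163}

# Elemental formulas of the four ribonucleosides
NUCLEOSIDES = {
    "A": {"C": 10, "H": 13, "N": 5, "O": 4},  # adenosine
    "C": {"C": 9, "H": 13, "N": 3, "O": 5},   # cytidine
    "G": {"C": 10, "H": 13, "N": 5, "O": 5},  # guanosine
    "U": {"C": 9, "H": 12, "N": 2, "O": 6},   # uridine
}

def formula_mass(formula):
    return sum(MONO[el] * count for el, count in formula.items())

# Each 3'-5' phosphodiester adds HPO3 and releases H2O: a net gain of PO2 minus H.
LINKAGE = MONO["P"] + 2 * MONO["O"] - MONO["H"]

def oligo_mass(seq):
    """Monoisotopic neutral mass of a linear 5'-OH / 3'-OH RNA oligo."""
    return sum(formula_mass(NUCLEOSIDES[b]) for b in seq) + (len(seq) - 1) * LINKAGE

def composition(seq):
    return "".join(f"{b}{seq.count(b)}" for b in "ACGU" if b in seq)

# Hypothetical sequence isomers: same composition, different base order
for s in ["AACGU", "ACAGU", "CAAGU"]:
    print(s, composition(s), f"{oligo_mass(s):.4f}")
```

All three print the same mass, which is why only chromatographic separation followed by CID can assign each one.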
15:40
Now, if we subject this to data-dependent acquisition on the
15:45
SCIEX ZenoTOF 7600 mass spectrometer, we get CID spectral data, which allows us to
15:53
establish the sequences of all of these oligos. From this experiment using the RNase 4
16:00
digestion, we are able to get about 96% sequence coverage. Obviously, I'm not
16:06
including the poly(A) tail in this 96% number. That will be discussed separately later, as it's a
16:12
separate set of experiments. And as I mentioned before, short oligonucleotides can have slightly
16:18
different sequences but the same exact mass, in which case
16:23
they're indistinguishable by mass alone on the mass spectrometer. That's where it becomes important to
16:28
chromatographically separate these short sequences and then use mass spectrometry to establish their sequence
16:34
identity. What we're seeing here are three oligonucleotides that have the same exact mass
16:39
but different base positions. We call them sequence isomers. These sequence isomers
16:47
are nicely separated on our bioZen Oligo column, and we also see very consistent retention times
16:53
across replicates, giving us confidence in the robustness of the method. So, taking a
16:58
deeper look at the data shown
17:04
on the previous slide, peaks one to three have the same exact mass and are
17:09
indistinguishable by mass spectrometry alone. But by separating them chromatographically, we can select
17:14
the doubly charged negative ion for each of these peaks, subject it to CID
17:20
fragmentation, and get the complete sequence information for each one. In this way, we
17:26
improve the overall sequence coverage of our mRNA from this nuclease digestion, and we are able to
17:32
get 96% sequence coverage. Now, with that bit of data on sequence coverage,
17:41
let's move on to the poly(A) tail length distribution and heterogeneity itself. As I said,
17:47
the poly(A) tail is very important in extending the lifetime of the mRNA itself, and also in its
17:55
cellular translocation. So, it's an important attribute to measure in your drug substance as
18:03
well as drug product. For this, we're still using RNase 4, which cuts between this U
18:12
and A and releases a long fragment that is heterogeneous, because there are various degradants and different
18:19
poly(A) tail lengths. Taking a closer look at the later part of the chromatogram that we saw a
18:23
few slides ago, you see a lot of bumps here, which we'll zoom into in the next
18:29
slide, and they actually come from the different poly(A) chain lengths. This big
18:38
peak contains multiple poly(A) chain lengths as well. If you look at the MS data, it looks
18:43
extremely complex, but if you deconvolute the m/z spectral data
18:49
and convert it to the mass domain, we can see nice, even spacing that corresponds to
18:55
an adenosine nucleotide, giving us confidence that this is the
20:00
poly(A) tail. Here, we're looking at the power of the chromatography itself to separate
20:05
all these various poly(A) tail lengths, and we can nicely resolve poly(A) tails up to 61 nucleotides
20:12
long. But as you get to longer and longer oligos, the difference between the
20:17
N-1 and full-length species becomes relatively small, and as a result, separation becomes more difficult.
20:22
Nonetheless, we can still use mass spectrometry to deconvolute this region and get additional information.
20:28
So, chromatographically, we are able to resolve up to 61 nucleotides here.
20:33
And if we were to take this big peak here and perform deconvolution, we see that the
20:39
peak spacing is equal to that of an adenosine, telling us that this is, again, the poly(A) tail.
20:48
And we were able to detect a poly(A) tail length of up to 18 nucleotides in this recent study.
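The deconvolution check described here, confirming that neighboring peaks are spaced by one adenosine residue, can be sketched as follows. The residue mass is the monoisotopic mass of one AMP unit within a chain (C10H12N5O6P, about 329.0525 Da); the ladder masses, the 0.05 Da tolerance, and the function name are hypothetical, and real deconvoluted masses for large species may need a looser window. The last loop illustrates why N-1 separation gets harder: one residue is roughly a 1/N fraction of the chain.

```python
# Monoisotopic atomic masses (Da)
C, H, N, O, P = 12.0, 1.00782503, 14.00307401, 15.99491462, 30.97376163

# One adenosine monophosphate residue within an RNA chain: C10H12N5O6P
A_RESIDUE = 10 * C + 12 * H + 5 * N + 6 * O + P

def ladder_steps(masses, tol_da=0.05):
    """For sorted neutral masses, report how many A residues separate neighbors.

    Returns (difference, n_residues, matches) tuples; `matches` is True when the
    difference is within tol_da of an integer number (>= 1) of A residues.
    """
    ms = sorted(masses)
    steps = []
    for lo, hi in zip(ms, ms[1:]):
        diff = hi - lo
        n = round(diff / A_RESIDUE)
        steps.append((diff, n, n >= 1 and abs(diff - n * A_RESIDUE) <= tol_da))
    return steps

# Hypothetical deconvoluted masses: a base fragment plus 0, 1, 2 and 4 A residues
base = 20000.0
ladder = [base + k * A_RESIDUE for k in (0, 1, 2, 4)]
for diff, n, ok in ladder_steps(ladder):
    print(f"{diff:.4f} Da -> {n} x A, match={ok}")

# Rough relative size change between an N-mer and its N-1 species
for n_len in (20, 61, 120):
    print(n_len, f"~{100 / n_len:.1f}% per residue")
```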
20:53
Now, let's finally take a look at an additional quality attribute, which is the aggregation of
20:58
mRNA itself, which, according to USP guidelines, is a product quality attribute one needs to establish in
21:05
your mRNA samples. Size exclusion chromatography is well suited for separating
21:13
mRNA from its aggregates, since the dimers and trimers are going to be
21:17
two times, three times, and so on, the molecular weight of the monomer, so size
21:23
exclusion is a very useful way to separate them. For this application,
21:28
we have used our bioZen dSEC-7 size exclusion column, which has a 700 Å pore size,
21:34
incorporates our bioTi titanium hardware, and comes in various lengths and dimensions,
21:42
depending on your requirements. Here, we are looking at the EGFP mRNA separated on
21:48
the bioZen dSEC-7 HPLC column. As you can see here, the first peak most likely comes
21:56
from aggregate, and the really tall peak comes from the monomer. But how do we know this
22:01
is aggregate? If you take this sample and heat it, the level of the aggregate
22:06
goes down, whereas the level of the monomer goes up, suggesting that this is a hydrogen-bonding
22:12
type of aggregation: heating the sample to 70 degrees decreases the aggregate
22:18
levels, right? The UV data in the previous slide is useful,
22:24
but what if I want additional information? You can couple your dSEC-7 column to
22:29
a multi-angle light scattering detector, or MALS, which can give you molecular weight
22:35
information as well. From this, we can see that the major peak is the monomer,
22:41
and the larger, earlier-eluting species are the dimer, trimer, tetramer, and so on,
22:49
giving us confidence that we are detecting the aggregation accurately.
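Assigning SEC-MALS peaks to oligomeric states amounts to dividing each peak's measured molar mass by the monomer mass (about 294 kDa for this EGFP mRNA, as quoted earlier in the talk). A minimal sketch, assuming a hypothetical ±15% tolerance, hypothetical MALS masses, and hypothetical UV peak areas; percent aggregate is taken here as the share of UV area in all non-monomer peaks.

```python
MONOMER_DA = 294_000  # approximate EGFP mRNA mass quoted in the talk

def oligomer_state(molar_mass_da, monomer_da=MONOMER_DA, tol=0.15):
    """Nearest integer oligomer (1 = monomer), or None if outside the tolerance."""
    n = round(molar_mass_da / monomer_da)
    if n < 1 or abs(molar_mass_da / (n * monomer_da) - 1) > tol:
        return None
    return n

def percent_aggregate(uv_areas):
    """Share (%) of total UV peak area in species larger than the monomer."""
    total = sum(uv_areas.values())
    return 100.0 * sum(a for n, a in uv_areas.items() if n > 1) / total

# Hypothetical MALS molar masses (Da) for four SEC peaks
for mm in (300_000, 590_000, 880_000, 1_190_000):
    print(mm, oligomer_state(mm))

# Hypothetical UV areas keyed by oligomeric state
print(percent_aggregate({1: 90.0, 2: 7.0, 3: 3.0}))
```

Returning `None` for masses that fall between integer multiples flags peaks that may be partially resolved or not simple oligomers, rather than forcing an assignment.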
22:54
In summary, I hope I was able to convince you that Phenomenex solutions for mRNA characterization
23:01
and critical quality attribute determination, encompassing oligo sequence mapping,
23:06
5' capping efficiency, poly(A) tail length distribution, and aggregate determination,
23:13
can be very helpful in your day-to-day work. Thank you for your time.