Proteins, at their most basic level, are chains of amino acids. Protein sequencing is the determination of the order of these amino acids in a chain. It may be done by removing and analyzing the amino acids one by one, by mass spectrometry, or by examining the DNA or RNA that codes for the protein.
Different Techniques to Analyze Proteins
Proteins consist of one or more chains of amino acids called polypeptides. Their most basic level of structure (primary structure) is the sequence of the amino acids in the chain. Biochemists have a number of techniques to determine a protein's primary structure. Three methods in particular are widely used: Edman degradation, mass spectrometry, and prediction from DNA or RNA.
Image: Electrospray ionization is used in mass spectrometry to analyze protein sequences. Source: Maciej Kotlinski, from Wikimedia Commons, licensed under the GNU Free Documentation License
The Anatomy of a Polypeptide
In biochemistry, an amino acid is built around a single carbon atom, the alpha-carbon. Bonded to it are a carboxyl group (COOH), an amine group (NH2), and an organic side chain (represented as R). The standard genetic code specifies twenty different side chains, and these are quite varied: some are hydrophobic, some are polar, and some are basic or acidic. In a polypeptide, the carboxyl group of one amino acid is linked to the amine group of the next by a covalent bond called a peptide bond. Each peptide has a C-terminus (the end with a free carboxyl group) and an N-terminus (the end with a free amine group).
In principle, sequencing can progress from either end of a peptide, but more methods are available for N-terminal sequencing than for C-terminal sequencing.
The Edman Degradation
The Edman degradation sequences a peptide by removing amino acids one by one from the N-terminus. The peptide is first adsorbed onto a solid surface. The amine group on the N-terminus is then labeled with the Edman reagent, phenylisothiocyanate, under mildly basic conditions (pH around 8.0). This allows only that N-terminal amino acid to be cleaved when the labeled peptide is later placed into acidic conditions. The cleaved amino acid is separated and identified through chromatography or electrophoresis.
The process is repeated on the peptide until the entire sequence is determined. Each step is about 98% efficient, and the losses compound with every cycle, so only peptides about 50 amino acids in length or shorter can be accurately and completely sequenced. Thus, to be sequenced via the Edman degradation, a protein usually must be broken into shorter pieces.
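The effect of the 98% figure can be illustrated with a short calculation. The sketch below assumes a deliberately simple model in which every cycle succeeds independently with the same efficiency; the function names and the 50% cutoff used in the usage note are illustrative choices, not part of any standard protocol.

```python
def cumulative_yield(step_efficiency, cycles):
    """Fraction of peptide molecules still in step after a number of cycles.

    Assumes every cycle has the same, independent efficiency.
    """
    return step_efficiency ** cycles


def max_cycles(step_efficiency, min_yield):
    """Largest number of cycles whose cumulative yield stays at or above min_yield."""
    if not 0 < step_efficiency < 1:
        raise ValueError("step_efficiency must be between 0 and 1 (exclusive)")
    cycles = 0
    while cumulative_yield(step_efficiency, cycles + 1) >= min_yield:
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Show how quickly the signal decays at 98% per-step efficiency.
    for n in (10, 25, 50, 100):
        print(n, round(cumulative_yield(0.98, n), 3))
```

Under this model, about 36% of the molecules are still in step after 50 cycles, and `max_cycles(0.98, 0.5)` shows that the yield falls below one half after 34 cycles. The practical limit depends on how much signal the detection method needs, which is why the figure of about 50 residues is only approximate.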
One drawback of the Edman degradation is that the N-terminus of some proteins is hidden within the tertiary structure.
Mass Spectrometry
Mass spectrometry uses the mass-to-charge ratio of ions to analyze a molecule. A Nobel Prize-winning technique called electrospray ionization allows large molecules like polypeptides to be ionized for use in the mass spectrometer. The protein is first digested by an enzyme to produce an assortment of peptide fragments. These are passed through a liquid chromatography column before being sprayed into the mass spectrometer through a narrow, positively charged nozzle, which converts them into gas-phase ions. The device measures the mass-to-charge ratio of these ions and, in tandem mass spectrometry, of the smaller pieces produced by breaking them apart inside the instrument. The resulting data are processed by a computer to arrive at the amino acid sequence. Unlike the Edman degradation, mass spectrometry does not have an absolute upper size limit for the proteins it sequences, but larger proteins are computationally more difficult to sequence.
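As a rough illustration of the computational step, the sketch below computes the mass-to-charge ratio of a protonated peptide and reads a sequence from an idealized ladder of fragment masses, where each gap between neighboring masses corresponds to one residue. The residue masses are standard monoisotopic values; the function names, the tolerance, and the noise-free ladder are assumptions made for the example. Real spectra contain noise, missing peaks, and several overlapping ion series, so actual software is far more involved.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # mass of each charge-carrying proton


def peptide_mz(sequence, charge=1):
    """Mass-to-charge ratio of a peptide carrying `charge` extra protons."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


def read_ladder(fragment_masses, tolerance=0.01):
    """Name the residue behind each gap in an ascending list of fragment masses.

    Residues with identical masses (leucine and isoleucine) are reported
    together, e.g. "L/I"; a gap that matches nothing is reported as "?".
    """
    residues = []
    for lighter, heavier in zip(fragment_masses, fragment_masses[1:]):
        gap = heavier - lighter
        matches = [aa for aa, mass in RESIDUE_MASS.items()
                   if abs(mass - gap) <= tolerance]
        residues.append("/".join(matches) if matches else "?")
    return residues


if __name__ == "__main__":
    # Build an idealized ladder for a made-up peptide, then read it back.
    ladder = [0.0]
    for aa in "GASPL":
        ladder.append(ladder[-1] + RESIDUE_MASS[aa])
    print(read_ladder(ladder))
    print(round(peptide_mz("GASPL", charge=2), 4))
```

The example also shows one inherent limit of the approach: leucine and isoleucine have the same mass, so mass differences alone cannot tell them apart.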
Prediction from DNA or RNA
The amino acid sequence of every protein made by cells is coded for by the messenger RNA (mRNA) used by ribosomes to create that protein. If researchers know which gene codes for a protein, the protein's primary structure can be determined by sequencing the mRNA or its DNA source. This technique may be used in conjunction with the above techniques. A short segment of the polypeptide may be sequenced and a corresponding RNA segment synthesized. This RNA segment is then used as a probe to find the gene by allowing it to hybridize with candidate mRNA or DNA. A complication of this technique is that the genetic code is degenerate: most amino acids can be coded for by more than one three-base codon, so a given peptide segment may correspond to many different RNA sequences.
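The size of this complication can be estimated by counting codons. The sketch below uses the number of codons assigned to each amino acid in the standard genetic code to count how many distinct RNA sequences could encode a peptide segment, and to pick the segment of a given length with the fewest possibilities. The function names and the example peptide are hypothetical and serve only to illustrate the idea.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 in total).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def count_mrna_sequences(peptide):
    """Number of distinct RNA sequences that could encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide)


def least_degenerate_window(peptide, length):
    """Return the stretch of `length` residues with the fewest encodings.

    Ties are resolved in favor of the window closest to the N-terminus.
    """
    if length < 1 or length > len(peptide):
        raise ValueError("length must be between 1 and the peptide length")
    windows = [peptide[i:i + length] for i in range(len(peptide) - length + 1)]
    best = min(windows, key=count_mrna_sequences)
    return best, count_mrna_sequences(best)


if __name__ == "__main__":
    print(count_mrna_sequences("LSR"))              # three six-codon residues
    print(least_degenerate_window("LSRMWKDE", 3))   # hypothetical peptide
```

A segment rich in leucine, serine, or arginine (six codons each) has many possible encodings, while one containing methionine or tryptophan (one codon each) has far fewer, which is why such segments make more practical starting points for a probe.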
Protein sequencing is closely linked to the field of bioinformatics. As a precursor to de novo protein design, it is now a cornerstone of biochemistry.