Introduction

The last decade in life sciences was deeply influenced by the development of the “Omics” technologies (genomics, transcriptomics, proteomics, and metabolomics), which aim for a global view on biological systems. With these tools at hand, the scientific community is striving to build functional models to develop a global understanding of the living cell.

Table 1 Current commercial software products for 2-D gel image analysis

Company | Products
www.biorad.com | PDQuest, ProteomWeaver
www.compugen.com | Z3 (discontinued)
www.decodon.com | Delta2D
www.gelifesciences.com | DeCyder 2D, ImageMaster Platinum*
www.genebio.com | *Melanie (ImageMaster Platinum)
www.nonlinear.com | Progenesis, SameSpots
www.syngene.com | Dymension

Spot detection first: these are the classical packages where the image information is first condensed into a set of spot centers, boundaries, and possibly spot volumes for each image. Spot matching and the subsequent creation of expression profiles are based on the data about spot geometry and volumes.

Image warping first: these are packages where image warping is applied to remove running differences between gels, based on the whole image information. Spot detection is a separate and independent step. The creation of expression profiles is critically informed (and improved) by the data about positional differences between gels that were gained in the first step.

Performing a biological experiment or selecting a biological object of interest. The first sample preparation step is freezing the sample in the current state.
This includes inactivation of all cellular processes that may change the proteome composition, preventing protease action, disintegration of the cell material, keeping or bringing the proteins into solution, and removing or destroying macromolecules that may disturb the subsequent steps of the 2-D protocol (RNase and/or DNase treatment and centrifuging for cell debris removal). Alternatively or in combination with radiolabeling, covalent fluorescent labeling of proteins can be applied here.

Bringing the proteins into the gel and performing the 2-D separation by combining isoelectric focusing in the first dimension and sodium dodecyl sulfate (SDS) electrophoresis in the second dimension. An alternate 2-D approach uses the combination of two detergent treatments that resolve the protein molecules differently, resulting in a scattered diagonal spot pattern (2D-16-BAC- or 2D-CTAB/SDS-polyacrylamide gel electrophoresis). A variety of staining techniques can be applied before or after separation to enable spot detection.

Capturing the gel images by using scanners, charge-coupled device (CCD) camera-based systems, or laser imaging devices. Depending on the protein labeling or staining techniques, a compatible imaging device has to be chosen. The capturing process results in one or more digitized computer images per gel that can be displayed with common image analysis software. The image capture step transforms the quantitative information of the gel into computer-readable data.

Correction of positional spot variations by image warping. 2-D electrophoresis results in spot patterns with variations in the spot positions between gels. Therefore, gel images are positionally corrected by a combination of global and local image transforms (image warping). The information about differences in spot positions that was gained in this step is reused later for image fusion and for the transfer of the consensus spot pattern.
Image fusion condenses the image information of the whole experiment into one fusion image, also called a proteome map. The proteome map contains the information of all protein spots ever detected in the experiment.

Spot detection is performed on the proteome map. As a result, a consensus spot pattern is generated, which is valid for all gels in the experiment. It describes the position and the general shape of all protein spots from the experiment.

Expression profile analysis identifies interesting spots, which are marked for further analysis, protein identification, and interpretation.

Protein staining

Fig. 3 Protein labeling, staining, and tagging techniques for the selective detection of proteins. By multiplexing detection approaches, image analysis may relate different subsets of the proteome such as phosphorylated or glycosylated proteins

Table 2 The most commonly used dyes in 2-D gels

Dye | Principle | Sensitivity | Quantitation | Amount/signal
Coomassie Brilliant Blue | Absorption | Very low | After calibration | Nonlinear
Colloidal Coomassie Blue | Absorption | (very) high | After calibration | Nonlinear
Silver Staining | Absorption | Very high | Impossible | Logistic
Sypro Ruby | Fluorescence | High | Yes | Linear
Ruthenium II tris (bathophenanthroline disulfonate) | Fluorescence | High | Yes | Linear
Flamingo | Fluorescence | High | Yes | Linear
Lava Purple | Fluorescence | High | Yes | Linear
Krypton | Fluorescence | Very high | Yes | Linear

Characterizing specific protein properties

Recording and preparation of raw image data

The general rule in 2-D gel image analysis is that the quality of the raw data has a significant impact on the final result.
Therefore, it is essential to avoid experimentally caused artifacts and to configure the scanning devices in the best possible way.

Background, artifacts, and noise influence the spot detection and quantitation process. Gel disruptions may truncate spots, speckles may mislead the spot detection or distort quantitation, noise can cover low intensity spots, and background increases the measured quantity and reduces the dynamic range. Background may be caused by insufficiently erased imaging plates (phosphorimaging), insufficient destaining, fluorescing glass plates, gel coverings, and backings. Furthermore, misusing optical filters for fluorescence imaging may cause background. Noise can be produced by high photomultiplier tube voltages, which lead to the amplification of random signals. Phosphor screens that have not been used for a longer time accumulate noise.

Many software packages allow for postscan image manipulations. One has to distinguish between image manipulations that do not change the quantitative information and those that do, incurring some loss of data in the process. All operations that leave pixels intact do not change the measured data, e.g., rotations in 90° steps, mirroring, and cropping (removing areas from the images that do not contain information of interest). Linear enhancements of resolution and gray levels can be undone without data loss and do not influence quantitative data because normalization is used in spot quantitation.

On the other hand, many operations that are used for image enhancement cause minor changes in spot detection and spot quantitation and should be avoided if possible, e.g., free rotations or free scaling change the gray level distribution of the manipulated image.

Finally, there are operations that should definitely be avoided because they result in data loss: for example, gamma correction changes the gray levels nonlinearly, and blurring or converting to JPEG format loses data.
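The distinction between manipulations that leave pixels intact and those that lose data can be illustrated with a small sketch. It assumes an 8-bit grayscale image held in a NumPy array; the synthetic image and the helper name `gamma_correct` are illustrative and not taken from any gel analysis package.

```python
import numpy as np

# Synthetic 8-bit "gel image" (assumption: real data would come from a scanner file).
rng = np.random.default_rng(0)
gel = rng.integers(0, 256, size=(64, 48), dtype=np.uint8)

# A rotation in 90° steps only moves pixels, so it can be undone exactly.
rotated = np.rot90(gel)
restored = np.rot90(rotated, k=-1)
rotation_is_lossless = np.array_equal(restored, gel)

def gamma_correct(img, gamma):
    """Apply a gamma curve to an 8-bit image and round back to 8 bits."""
    scaled = (img.astype(float) / 255.0) ** gamma
    return np.round(scaled * 255.0).astype(np.uint8)

# Gamma correction maps several input gray levels onto the same output level,
# so the original values cannot be recovered afterwards.
levels = np.arange(256, dtype=np.uint8)
corrected = gamma_correct(levels, 2.2)
distinct_levels_after = np.unique(corrected).size
round_trip = gamma_correct(corrected, 1 / 2.2)
gamma_is_lossless = np.array_equal(round_trip, levels)

print(rotation_is_lossless, distinct_levels_after, gamma_is_lossless)
```

In this sketch the rotation round trip reproduces the image exactly, whereas the gamma round trip does not, because the nonlinear curve followed by rounding merges neighboring gray levels.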
Another hazard lies in the application of general purpose image manipulation software to special purpose file formats. In the process of, for example, cropping a gel file in Photoshop, essential calibration information can be lost in the resulting file. Therefore, it is advisable to use specialized software (e.g., the software that came with the scanner, or a 2-D gel image analysis program) that understands the characteristics of the file format.

Removing variations in spot positions: warping

Spot detection and quantitation

The spot detection process can be controlled by setting software-specific parameters, such as the expected size of a spot in pixels, or even the expected number of spots. Due to ambiguities in the gel images (merged spots, weak spots, noise), automated spot detection can only be a heuristic process in some areas. The user will, therefore, sometimes want to change the spot pattern by removing spots, splitting spot clusters, or joining spots. Manual intervention has a downside as well: individual users have different perceptions about the “correct” spot shapes, so reproducibility between different operators of the software suffers. It is, therefore, advisable to reduce the necessary manual interventions to a minimum, e.g., by defining points as “markers” for the creation of new spots or the splitting of existing spots and letting the software determine the adapted boundaries.

In the simplest case, one assumes that gray values found in the image file are directly proportional to image intensities and, by extension, protein quantity in the small gel area corresponding to the pixel. However, more advanced imaging equipment utilizes calibration information that should be used to arrive at correct quantities:

Fig. 10 Example of a gray level calibration curve that is used in special image file formats.
Gray levels found in the image file have to be interpreted according to the curve before being summed up for quantitation. The curve has a lower slope in the low intensity range, resulting in better quantitative resolution for weak signals.

Calibration of the device. Another type of calibration may have to be applied to eliminate variance between imaging devices of the same model. While most of the laser scanners in the market have a built-in autocalibration, many flatbed scanners need to be calibrated manually. This can be done by using calibration wedges, which are offered by several resellers, e.g., Stouffer Industries (Mishawaka, IN, USA), Danes-Picta (Praha, Czech Republic), UVP (Upland, CA, USA).

Secondary calibration can be applied if the user knows about the relationship of protein amount and measured signal. Finally, one can take into account the dye-specific response curve for protein staining. Protein concentration wedges may help to find a stain-specific transfer function for a fluorescent or absorptive stain, from which the protein amount can be derived from the emitted light or the measured absorption, respectively. It is to be expected that even different protein species have different response curves resulting from their biochemical properties.

Fig. 11 Background subtraction using the rolling ball approach

Normalization of spot quantities

As a result of the spot detection and quantitation step, the user gets a variety of data on each gel spot, including normalized spot quantity, background, spot outline, position of the spot center, and spot quality measures.

Factors that affect the measured spot quantities include loss of sample during entry into the IEF gel, efficiency of transfer from first to second dimension, protein loss during staining, staining efficiency, a protein’s staining curve over time, staining curve over concentration, and dye bleaching.
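Gel-wide effects such as sample loss or differences in staining efficiency scale all spot quantities on a gel together. One common way to compensate for them, assumed here for illustration because the text does not name a specific scheme, is to express each spot quantity as a percentage of the total spot quantity on its gel. The function name `normalize_total_quantity` and the example values are hypothetical.

```python
import numpy as np

def normalize_total_quantity(volumes):
    """Scale raw spot volumes so that each gel (row) sums to 100 percent.

    volumes: 2-D array, one row per gel and one column per spot,
             holding background-corrected raw spot volumes.
    """
    volumes = np.asarray(volumes, dtype=float)
    totals = volumes.sum(axis=1, keepdims=True)
    if np.any(totals <= 0):
        raise ValueError("every gel needs a positive total spot volume")
    return 100.0 * volumes / totals

# Two gels of the same sample; the second received half the protein load.
raw = np.array([[100.0, 300.0, 600.0],
                [50.0, 150.0, 300.0]])
normalized = normalize_total_quantity(raw)
print(normalized)
```

After normalization both gels show the same relative quantities, so the loading difference no longer appears as an expression change. This simple scheme assumes that most spots do not change between gels; a few very strong, regulated spots can distort the total.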
Building expression profiles

For comparing the spot intensities over a whole experiment, each spot on a certain gel has to be mapped to the corresponding spots on the other gels in a process called spot matching. The quality of the matching depends on the quality and the reproducibility of electrophoretic separation and spot detection as well as on the methods employed by the software. In the ideal analysis, spot matching would produce exactly one expression profile for every protein species that is visible on any gel.

Analyzing gene expression

Running differences between gels add a source of errors for spot matching, whereas in microarray data, matching is trivial because every gene is spotted at a known row and column.

Spot detection in 2-D gels is much harder because spots may not be distinct.

Gene information is not readily available for spots, so it is harder to correlate or cross-validate expression profiles with gene annotations.

(www.r-project.org; www.bioconductor.org; http://www.tigr.org/software/tm4/mev.html)

Hypothesis-driven methods

Hypothesis-independent methods

Hypothesis-independent methods were developed for the discovery of patterns in large quantities of possibly high-dimensional data, in the fields of data mining and machine learning. As we expect a small number of fundamental biological processes to be reflected in the expression patterns of a large number of proteins, it makes sense to apply these methods to the analysis of 2-D experiments. Again, much of the work on microarray analysis can be transferred easily because the fundamental unit of data is an expression profile. When using separate spot detections on every gel, the missing values will have to be dealt with by the statistical method, for example, by missing value imputation.
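As a minimal sketch of missing value imputation, the following replaces each missing quantity with the mean of the same spot on the gels where it was detected, and reports the fraction of missing values. Row-mean imputation is only one simple choice, assumed here for illustration; the function name `impute_row_mean` and the example data are hypothetical.

```python
import numpy as np

def impute_row_mean(profiles):
    """Fill missing values (NaN) with the mean of the observed values per spot.

    profiles: 2-D array, one row per spot and one column per gel.
    Returns the filled array and the fraction of values that were missing.
    Spots that are missing on every gel stay NaN.
    """
    profiles = np.asarray(profiles, dtype=float)
    missing = np.isnan(profiles)
    counts = (~missing).sum(axis=1)
    sums = np.where(missing, 0.0, profiles).sum(axis=1)
    means = np.full(profiles.shape[0], np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    filled = profiles.copy()
    filled[missing] = np.broadcast_to(means[:, None], profiles.shape)[missing]
    return filled, float(missing.mean())

profiles = np.array([[1.0, np.nan, 3.0],
                     [2.0, 2.0, 2.0],
                     [np.nan, np.nan, 4.0]])
filled, fraction_missing = impute_row_mean(profiles)
print(filled)
print(fraction_missing)
```

The third spot shows the weakness of the approach: two of its three values are invented from a single measurement, so the imputed profile carries little real information.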
A large percentage of missing values decreases the utility of all statistical methods, which is why we recommend using the consensus spot pattern approach described above.

Presentation and visualization: from spots to proteome maps

(http://microbio1.biologie.uni-greifswald.de/starv/movie.htm)

A related feature was implemented in Proteomweaver (Bio-Rad) that allows for the combination of different narrow pI-gradients into a global proteome map. This composite map utilizes the much better resolution of narrow pH-gradient strips and supports image analysis as if the data came from a single, very wide gel.

A proteome map normally serves as the basis for further, especially physiologically oriented, research and is comparable with a DNA array layout. The proteome map defines at which positions a protein spot was identified and can be recovered during gel analysis. A variety of proteome maps of many kinds of samples is available. Most of them show about a thousand different identified protein spots. Additional data can be attached to spots using labels, e.g., protein identification or functional information. Especially in gel regions with a very dense spot pattern, it is a big challenge to display the protein information without obscuring image information with spot labels.

In contrast to a tabular display of spot quantities, a proteome map retains the spatial relations between spots as well as typical spot forms. It is, thus, easier to relate visually to newly produced gel images. The color coding of spots allows for easy visual identification of interesting subsets of the proteome.

Conclusions

(http://www.nature.com/nbt/consult; www.psidev.info)