Before individual probes are selected from either exemplars or consensus sequences, the 5' to 3' orientation of each transcript must be determined. Affymetrix uses computer algorithms that combine information from public annotations with in-house identification of splice signals, polyadenylation sites, and polyadenylation signals to distinguish sense from antisense strands. If contradictory information prevents the orientation from being determined unequivocally, probes are generated for both strands.
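As a rough illustration of how a single line of evidence can hint at orientation, the sketch below checks for the canonical polyadenylation signal AATAAA near the 3' end of a sequence and of its reverse complement. It is not the Affymetrix algorithm, which combines several sources of evidence. The function names and the 50-base search window are assumptions made for this example.

```python
# Illustrative sketch only: uses one signal (AATAAA) and an assumed window size.

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    table = str.maketrans("ACGT", "TGCA")
    return seq.translate(table)[::-1]

def has_polya_signal(seq, window=50, signal="AATAAA"):
    """True if the signal occurs within the last `window` bases of seq."""
    return signal in seq[-window:]

def guess_orientation(seq):
    """Classify a sequence as 'sense', 'antisense', or 'ambiguous'.

    'ambiguous' covers both missing and contradictory evidence; in that
    case probes would be designed against both strands.
    """
    forward = has_polya_signal(seq)
    reverse = has_polya_signal(reverse_complement(seq))
    if forward and not reverse:
        return "sense"
    if reverse and not forward:
        return "antisense"
    return "ambiguous"

def strands_to_probe(seq):
    """Return the list of strand sequences for which probes are needed."""
    orientation = guess_orientation(seq)
    if orientation == "sense":
        return [seq]
    if orientation == "antisense":
        return [reverse_complement(seq)]
    return [seq, reverse_complement(seq)]
```

When the evidence is missing or contradictory, strands_to_probe returns both strands, mirroring the fallback described above.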
In general, 11 to 16 probes are selected from among all possible 25-mers to represent each transcript. In addition to being chosen for their predicted hybridization properties, candidate sequences are filtered for specificity: their potential for cross-hybridizing with similar but unrelated sequences is evaluated.
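The enumeration and filtering steps can be sketched as follows. The code lists every 25-mer in a transcript and discards any candidate that shares a long exact substring with an unrelated background sequence. This is a simplified stand-in for a real cross-hybridization evaluation, not the method used for the arrays. The function names and the 16-base seed length are assumptions chosen for illustration.

```python
# Illustrative sketch only: exact-match seed filtering with an assumed seed length.

PROBE_LENGTH = 25

def candidate_probes(transcript, k=PROBE_LENGTH):
    """Return (offset, sequence) for every k-mer in the transcript."""
    return [(i, transcript[i:i + k]) for i in range(len(transcript) - k + 1)]

def background_seeds(background, seed=16):
    """Collect every seed-length substring of the background sequences."""
    seeds = set()
    for seq in background:
        for i in range(len(seq) - seed + 1):
            seeds.add(seq[i:i + seed])
    return seeds

def filter_specific(candidates, background, seed=16):
    """Keep candidates that share no seed-length substring with the background."""
    seeds = background_seeds(background, seed)
    kept = []
    for offset, probe in candidates:
        windows = (probe[j:j + seed] for j in range(len(probe) - seed + 1))
        if not any(w in seeds for w in windows):
            kept.append((offset, probe))
    return kept
```

A transcript of length n yields n - 24 candidate 25-mers, so even short transcripts provide far more candidates than the 11 to 16 probes finally chosen.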
To obtain a complete picture of a gene's activity, some probes are selected from regions shared by multiple splice or polyadenylation variants. In other cases, unique probes that distinguish between variants are favored. Inter-probe distance is also factored into the selection process. Probes are 3'-biased to match the target generation characteristics of our sample amplification method, but they are also widely spaced to sample various regions of each transcript and provide robustness of detection.
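One simple way to combine a 3' bias with a minimum inter-probe distance is a greedy pass that starts at the 3' end and works toward the 5' end. The sketch below is a hypothetical illustration of that idea, not the actual selection procedure. The function name, the default of 11 probes, and the 30-base minimum gap are assumptions.

```python
# Illustrative sketch only: greedy 3'-first selection with an assumed minimum gap.

def select_spaced_probes(offsets, n_probes=11, min_gap=30):
    """Pick up to n_probes start offsets, favoring the 3' end.

    offsets: start positions of candidate probes on the transcript,
             measured from the 5' end (larger means closer to the 3' end).
    min_gap: minimum distance required between any two chosen starts.
    """
    chosen = []
    # Visit candidates from the 3' end toward the 5' end.
    for offset in sorted(offsets, reverse=True):
        if all(abs(offset - c) >= min_gap for c in chosen):
            chosen.append(offset)
            if len(chosen) == n_probes:
                break
    return sorted(chosen)
```

For a 600-base transcript with every offset available, select_spaced_probes(range(576)) returns eleven starts running from 275 to 575 in steps of 30, so the probes cover roughly the 3' half of the transcript without overlapping.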