\begin{align}
\DeclareMathOperator*{\quantile}{quantile}
\DeclareMathOperator*{\median}{median}
\end{align}

\begin{align}
\text{Total count (TC) } a_{ij} &= \frac{c_{ij}}{\sum\limits_{k=1}^{N} c_{kj}}
\end{align}

\begin{align}
\text{Upper Quartile (UQ) } a_{ij} &= \frac{c_{ij}}{\quantile\limits_{k:\, c_{kj} > 0}^{0.75} c_{kj}}
\end{align}

\begin{align}
\text{Median (M) } a_{ij} &= \frac{c_{ij}}{\median\limits_{k:\, c_{kj} > 0} c_{kj}}
\end{align}
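The three scaling formulas above differ only in the per-sample factor used as the denominator. A minimal pure-Python sketch follows, assuming `counts` is a list of gene rows with one column per sample. The function names are illustrative and not from the paper, and the quartile uses Python's "inclusive" interpolation, which is an assumption (other tools may interpolate differently).

```python
from statistics import median, quantiles

def column(counts, j):
    """Return the counts of sample j, one value per gene."""
    return [row[j] for row in counts]

def total_count(col):
    """TC factor: total number of reads in the sample."""
    return sum(col)

def upper_quartile(col):
    """UQ factor: 75th percentile of the non-zero counts (needs >= 2 of them)."""
    nonzero = [c for c in col if c > 0]
    return quantiles(nonzero, n=4, method="inclusive")[2]

def median_nonzero(col):
    """Median factor: median of the non-zero counts."""
    return median(c for c in col if c > 0)

def scale_samples(counts, factor):
    """Divide every count c_ij by the factor computed from sample j."""
    n_samples = len(counts[0])
    factors = [factor(column(counts, j)) for j in range(n_samples)]
    return [[c / f for c, f in zip(row, factors)] for row in counts]

# Toy data: 4 genes (rows) x 2 samples (columns).
counts = [[10, 20], [0, 40], [30, 60], [60, 80]]
tc = scale_samples(counts, total_count)
uq = scale_samples(counts, upper_quartile)
med = scale_samples(counts, median_nonzero)
```

Passing the factor as a function keeps the shared division step in one place, so a different per-sample statistic only needs a new `factor` function.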

\begin{align}
\text{DESeq } a_{ij} &= \frac{c_{ij}}{\median\limits_{k = 1}^{N} \frac{c_{kj}}{\left( \prod\limits_{v = 1}^{m} c_{kv} \right)^{1/m}}}
\end{align}
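The DESeq scaling factor is a median of ratios: each gene's count is compared with that gene's geometric mean across samples, and the median of these ratios over genes gives one size factor per sample. The sketch below is an illustration in pure Python, not the DESeq package code. The function name is hypothetical, and dropping genes with any zero count (so that the logarithm is defined) is an assumption of this sketch.

```python
import math
from statistics import median

def deseq_size_factors(counts):
    """Return one median-of-ratios size factor per sample (column)."""
    n_samples = len(counts[0])
    # Assumption: keep only genes with positive counts in every sample.
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable gene across samples.
    log_geo_means = [sum(math.log(c) for c in row) / n_samples for row in usable]
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(row[j]) - g for row, g in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors

# Sample 2 is sequenced twice as deeply as sample 1; the last gene is skipped.
counts = [[10, 20], [30, 60], [50, 100], [0, 5]]
factors = deseq_size_factors(counts)
normalized = [[c / f for c, f in zip(row, factors)] for row in counts]
```

Working on the log scale avoids overflow when multiplying many counts together.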

\begin{align}
\text{TMM } a_{ij} &=
\end{align}

\begin{align}
\text{Quantile: Not explained in the paper}
\end{align}

\begin{align}
\text{RPKM } a_{ij} &= \frac{c_{ij}}{\ell_i \cdot M \cdot 10^{-9}}
\end{align}

\begin{align}
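a_{ij} &: \text{normalized count of gene } i \text{ in sample } j \\
c_{ij} &: \text{read count of gene } i \text{ in sample } j \\
\ell_i &: \text{length of gene } i \text{ in base pairs} \\
N &: \text{number of genes} \\
m &: \text{number of samples} \\
M &: \text{number of mapped reads in the sample}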
\end{align}
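As a worked instance of the RPKM formula, the sketch below assumes gene lengths are given in base pairs and that the number of mapped reads is supplied per sample. The function and argument names are illustrative.

```python
def rpkm(counts, gene_lengths, mapped_reads):
    """Reads per kilobase of gene per million mapped reads."""
    return [
        [c * 1e9 / (length * mapped_reads[j]) for j, c in enumerate(row)]
        for row, length in zip(counts, gene_lengths)
    ]

# One gene of 2000 bp with 500 reads in a sample of 10 million mapped reads.
values = rpkm([[500]], gene_lengths=[2000], mapped_reads=[10_000_000])
```

Here 500 reads over 2 kb is 250 reads per kilobase, and dividing by 10 million mapped reads gives an RPKM of 25.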

Real data

Simulation data

Table 1:

SR = single-end read, PE = paired-end read, D = directional, ND = non-directional

Organism | Type | Number of genes | Replicates per condition | Minimum library size | Maximum library size | Correlation between replicates | Correlation between conditions | % Reads associated with the most expressed gene | Library type | Sequencing machine |
---|---|---|---|---|---|---|---|---|---|---|
H. sapiens | RNA | 26 437 | {3, 3} | $2.0 \times 10^{7}$ | $2.8 \times 10^{7}$ | (0.98, 0.99) | (0.93, 0.96) | ≈1% | SR 54, ND | GAIIx |
A. fumigatus | RNA | 9248 | {2, 2} | $8.6 \times 10^{6}$ | $2.9 \times 10^{7}$ | (0.92, 0.94) | (0.88, 0.94) | ≈1% | SR 50, D | HiSeq 2000 |
E. histolytica | RNA | 5277 | {3, 3} | $2.1 \times 10^{7}$ | $3.3 \times 10^{7}$ | (0.85, 0.92) | (0.81, 0.98) | 6.4–16.2% | PE 100, ND | HiSeq 2000 |
M. musculus | miRNA | 669 | {3, 2, 2} | $2.0 \times 10^{6}$ | $5.9 \times 10^{6}$ | (0.95, 0.99) | (0.09, 0.75) | 17.4–51.1% | SR 36, D | GAIIx |

A. Distribution of normalized expression

B. Intra-group variance

C. Differential expression analysis

D. Housekeeping gene variance

Each box corresponds to one sample.

Log-transformed ratio of expression levels between two conditions (y-axis) versus their average log expression (x-axis). Points lying well above or below zero indicate differentially expressed genes.
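The two plotted quantities can be computed per gene as follows. This is a minimal sketch assuming strictly positive expression values and base-2 logarithms; the base, the function name and the variable names are assumptions.

```python
import math

def ma_values(cond_a, cond_b):
    """Return (log ratios, average log expression) for paired expression values."""
    log_ratio = [math.log2(a / b) for a, b in zip(cond_a, cond_b)]
    mean_log = [0.5 * (math.log2(a) + math.log2(b)) for a, b in zip(cond_a, cond_b)]
    return log_ratio, mean_log

# Gene 1 is four times higher in condition A; gene 2 is unchanged.
log_ratio, mean_log = ma_values([8, 4], [2, 4])
```

In this toy input the first gene has a log ratio of 2 and the second a log ratio of 0, so only the first would sit away from the zero line.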

- Housekeeping genes (HKG) are often affected by various uncontrolled factors
- They are usually highly expressed, so they do not represent genes of low intensity
- HKG are usually a very small subset, so fluctuations in their intensities are strongly affected by random or systematic errors
- Their use requires a priori knowledge of which genes are housekeeping genes

False positive rate

Power (Recall)
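On simulated data the truly differentially expressed (DE) genes are known, so both quantities can be computed directly from the list of genes a method calls DE. The sketch below uses the standard definitions. The function and argument names are illustrative.

```python
def fpr_and_power(called, truly_de, all_genes):
    """Return (false positive rate, power) for a set of genes called DE."""
    called, truly_de, all_genes = set(called), set(truly_de), set(all_genes)
    truly_null = all_genes - truly_de
    false_pos = len(called & truly_null)  # called DE but not truly DE
    true_pos = len(called & truly_de)     # called DE and truly DE
    fpr = false_pos / len(truly_null) if truly_null else float("nan")
    power = true_pos / len(truly_de) if truly_de else float("nan")
    return fpr, power

# 10 genes, 4 truly DE; the method calls 4 genes, one of them wrongly.
fpr, power = fpr_and_power(called={0, 1, 2, 7},
                           truly_de={0, 1, 2, 3},
                           all_genes=range(10))
```

With one false call among six null genes and three of four true DE genes recovered, this gives a false positive rate of 1/6 and a power of 0.75.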

- Equal library size, no dominant gene
- Non-equal library size, no dominant gene
- Equal library size, some dominant genes

Log-transformed counts vs GC-content.

Each line is one sample, with color-coded conditions (A,B,C).