Let $(\Omega,\mathscr{U},P)$ be a probability space and $X:\Omega\longrightarrow\mathbb{R}^n$ a randome variable. We define an ordering between two vectors in $\mathbb{R}^n$ as follows: Let $x=(x_1,\cdots,x_n),y=(y_1,\cdots,y_n)\in\mathbb{R}^n$. Then $x\leq y$ means $x_i\leq y_i$ for $i=i,\cdots,n$.

*Definition*. The *distribution function* of $X$ is the function $F_X: \mathbb{R}^n\longrightarrow[0,1]$ defined by

$$F_X(x):=P(X\leq x)$$

for all $x\in\mathbb{R}^n$. If $X_1,\cdots,X_m:\Omega\longrightarrow\mathbb{R}^n$ are random variables, their joint distribution function $F_{X_1,\cdots,X_m}:(\mathbb{R}^n)^m\longrightarrow[0,1]$ is defined by

$$F_{X_1,\cdots,X_m}(x_1,\cdots,x_m):=P(X_1\leq x_1,\cdots,X_m\leq x_m)$$

for all $x_i\in\mathbb{R}^n$ and for all $i=1,\cdots,n$.

*Definition*. Let $X$ be a random variable, $F=F_X$ its distribution function. If there exists a nonnegative integrable function $f:\mathbb{R}^n\longrightarrow\mathbb{R}$ such that

$$F(x)=F(x_1,\cdots,x_n)=\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n}f(y_1,\cdots,y_n)dy_1\cdots dy_n$$

then $f$ is called the *density function* for $X$. More generally,

$$P(X\in B)=\int_B f(x)dx$$

for all $B\in\mathscr{B}$ where $\mathscr{B}$ is the Borel $\sigma$-algebra.

*Example*. If $X:\Omega\longrightarrow\mathbb{R}$ has the density function

$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{|x-m|^2}{2\sigma^2}},\ x\in\mathbb{R}$$

then we say $X$ has a *Gaussian* or *normal* distribution with mean $m$ and variance $\sigma^2$. In this case, we write “*$X$ is an $N(m,\sigma^2)$ random variable*.”

*Example*. If $X: \Omega\longrightarrow\mathbb{R}^n$ has the density

$$f(x)=\frac{1}{\sqrt{(2\pi)^n\det C}}e^{-\frac{1}{2}(x-m)C^{-1}(x-m)^t},\ x\in\mathbb{R}^n$$

for some $m\in\mathbb{R}^n$ and some positive definite symmetric matrix $C$, we say that “*$X$ has a Gaussian or normal distribution with mean $m$ and covariance matrix $C$*.” We write $X$ is an $N(m,C)$ random variable. The *covariance matrix* is given by

\begin{equation}\label{eq:covmatrix}C=E[(X-E(X))^t(X-E(X))]\end{equation}

where $X=(X_1,\cdots,X_n)$, i.e. each $C$ is the matrix whose $(i,j)$ entry is the covariance

$$C_{ij}=\mathrm{cov}(X_i,X_j)=E[(X_i-E(X_i))(X_j-E(X_j))]=E(X_iX_j)-E(X_i)E(X_j)$$

Clearly $C$ is a symmetric matrix. Recall that for a real-valued random matrix $X$ the variance $\sigma^2$ is given by

$$\sigma^2=V(X)=E[(X-E(X))^2]=E[(X-E(X))\cdot (X-E(X))]$$

So one readily sees that \eqref{eq:covmatrix} is a generalization of variance to higher dimensions. It follows from \eqref{eq:covmatrix} that for a vector $b\in\mathbb{R}^n$,

$$V(Xb^t)=bV(X)b^t$$

Since the variance is nonnegative, we see that the covariance matrix is a positive definite matrix. Since $C$ is symmetric, $PCP^{-1}=D$ where $P$ is an orthogonal matrix and $D$ is a diagonal matrix whose main diagonal contains the eigenvalues of $C$. Recall that for two $n\times n$ matrices $A$ and $B$, $\det(AB)=\det(A)\det(B)$ so we see that $\det(C)=\det(D)$. Since all the eigenvalues of a positive definite matrix are positive, $\det(C)>0$.

*Lemma*. Let $X:\Omega\longrightarrow\mathbb{R}^n$ be a random variable and assume that its distribution function $F=F_X$ has the density $f$. Suppose $g:\mathbb{R}^n\longrightarrow\mathbb{R}$ and $Y=g(X)$ is integrable. Then

$$E(Y)=\int_{\mathbb{R}^n}g(x)f(x)dx$$

*Proof*. Suppose first that $g$ is a simple function on $\mathbb{R}^n$.

$$g=\sum_{i=1}^mb_iI_{B_i}\ (B_i\in\mathscr{B})$$

\begin{align*}E(g(X))&=\sum_{i=1}^mb_i\int_{\Omega}I_{B_i}(X)dP\\&=\sum_{i=1}^mb_iP(X\in B_i).\end{align*}

But

\begin{align*}\int_{\mathbb{R}^n}g(x)f(x)dx&=\sum_{i=1}^mb_i\int_{\mathbb{R}^n}I_{B_i}f(x)dx\\&=\sum_{i=1}^nb_i\int_{B_i}f(x)dx\\&=\sum_{i=1}^mb_iP(X\in B_i)\end{align*}

Hence proves the lemma for the case $g$ is a simple function. The rest of the argument extends to general $g$ straightforwardly.

*Corollary*. If $X:\Omega\longrightarrow\mathbb{R}^n$ is a random variable and its distribution function $F=F_X$ has the density $f$, then

$$V(X)=\int_{\mathbb{R}^n}|x-E(X)|^2f(x)dx$$

*Proof*. Recall that $V(X)=E(|X-E(X)|^2)$. Define $g:\mathbb{R}^n\longrightarrow\mathbb{R}$ by

$$g(x)=|x-E(X)|^2$$

for all $x\in\mathbb{R}^n$. Then by the Lemma we have

$$V(X)=\int_{\mathbb{R}^n}|x-E(X)|^2f(x)dx$$

*Corollary*. If $X:\Omega\longrightarrow\mathbb{R}$ is a random variable and its distribution function $F=F_X$ has the density $f$, then $E(X)=\int_{-\infty}^\infty xf(x)dx$ and $V(X)=\int_{-\infty}^\infty |x-E(X)|^2f(x)dx$.

*Proof*. Trivial from the Lemma by taking $g:\mathbb{R}\longrightarrow\mathbb{R}$ the identity map.

*Corollary*. If $X:\Omega\longrightarrow\mathbb{R}^n$ is a random variable and its distribution function $F=F_X$ has the density $f$, then

$$E(X_1\cdots X_n)=\int_{\mathbb{R}^n}x_1\cdots x_nf(x)dx$$

*Proof*. Define $g:\mathbb{R}^n\longrightarrow\mathbb{R}$ by

$$g(x)=x_1\cdots x_n\ \mbox{for all}\ x=(x_1,\cdots,x_n)\in\mathbb{R}^n$$

Then the rest follows by the Lemma.

*Example*. If $X$ is $N(m,\sigma^2)$ then

\begin{align*}

E(X)&=\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty xe^{-\frac{(x-m)^2}{2\sigma^2}}dx\\

&=m\\

V(X)&=\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty (x-m)^2e^{-\frac{(x-m)^2}{2\sigma^2}}dx\\

&=\sigma^2

\end{align*}

Therefore, $m$ is the mean and $\sigma^2$ is the variance.

*References*:

Lawrence C. Evans, An Introduction to Stochastic Differential Equations, Lecture Notes