Independence

Definition. Let $(\Omega,\mathscr{U},P)$ be a probability space and let $A,B\in\mathscr{U}$ be two events with $P(B)>0$. The probability of $A$ given $B$, denoted $P(A|B)$, is defined by
$$P(A|B)=\frac{P(A\cap B)}{P(B)}\ \mbox{if}\ P(B)>0$$
If the events $A$ and $B$ are independent,
$$P(A)=P(A|B)=\frac{P(A\cap B)}{P(B)}$$
i.e.
$$P(A\cap B)=P(A)P(B)$$
This identity is derived under the assumption that $P(B)>0$, but we take it as the definition of independence even when $P(B)=0$.

Definition. Two events $A$ and $B$ are independent if
$$P(A\cap B)=P(A)P(B)$$
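For instance, in two rolls of a fair die the events $A$ = “the first roll is even” and $B$ = “the sum is $7$” are independent: $P(A)=\frac{1}{2}$, $P(B)=\frac{1}{6}$ and $P(A\cap B)=\frac{1}{12}$. The short Python sketch below (my own sanity check, not part of the notes) enumerates all 36 equally likely outcomes and verifies $P(A\cap B)=P(A)P(B)$ exactly.

from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two rolls of a fair die.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    # Probability of an event (a predicate on outcomes) under the uniform measure.
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

A = lambda w: w[0] % 2 == 0        # the first roll is even
B = lambda w: w[0] + w[1] == 7     # the sum is 7

P_A, P_B = prob(A), prob(B)
P_AB = prob(lambda w: A(w) and B(w))

print(P_A, P_B, P_AB)              # 1/2, 1/6, 1/12
assert P_AB == P_A * P_B           # so A and B are independent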

Definition. Let $X_i:\Omega\longrightarrow\mathbb{R}^n$, $i=1,2,\cdots$, be random variables. The random variables $X_1,X_2,\cdots$ are said to be independent if $\forall$ integers $k\geq 2$ and $\forall$ choices of Borel sets $B_1,\cdots,B_k\subset\mathbb{R}^n$
\begin{align*}
P(X_1\in B_1,X_2\in B_2,&\cdots,X_k\in B_k)=\\
&P(X_1\in B_1)P(X_2\in B_2)\cdots P(X_k\in B_k)
\end{align*}

Theorem. The random variables $X_1,\cdots,X_m:\Omega\longrightarrow\mathbb{R}^n$ are independent if and only if
\begin{equation}
\label{eq:indepdistrib}
F_{X_1,\cdots,X_m}(x_1,\cdots,x_m)=F_{X_1}(x_1)\cdots F_{X_m}(x_m)
\end{equation}
$\forall x_i\in\mathbb{R}^n$, $\forall i=1,\cdots,m$. If the random variables have densities, \eqref{eq:indepdistrib} is equivalent to
$$f_{X_1,\cdots,X_m}(x_1,\cdots,x_m)=f_{X_1}(x_1)\cdots f_{X_m}(x_m)$$
$\forall x_i\in\mathbb{R}^n$, $\forall i=1,\cdots,m$, where the functions $f$ are the appropriate densities.

Proof. Suppose that $X_1,\cdots,X_m$ are independent. Then
\begin{align*}
F_{X_1,\cdots,X_m}(x_1,\cdots,x_m)&=P(X_1\leq x_1,\cdots, X_m\leq x_m)\\
&=P(X_1\leq x_1)\cdots P(X_m\leq x_m)\\
&=F_{X_1}(x_1)\cdots F_{X_m}(x_m)
\end{align*}
Conversely, suppose that \eqref{eq:indepdistrib} holds and that the random variables have densities, so that $f_{X_1,\cdots,X_m}(x_1,\cdots,x_m)=f_{X_1}(x_1)\cdots f_{X_m}(x_m)$. Let $B_1,B_2,\cdots,B_m\subset\mathbb{R}^n$ be Borel sets. Then
\begin{align*}
P(X_1\in B_1,\cdots,X_m\in B_m)&=\int_{B_1\times\cdots\times B_m}f_{X_1,\cdots,X_m}(x_1,\cdots,x_m)dx_1\cdots dx_m\\
&=\left(\int_{B_1}f_{X_1}(x_1)dx_1\right)\cdots\left(\int_{B_m}f_{X_m}(x_m)dx_m\right)\\
&=P(X_1\in B_1)P(X_2\in B_2)\cdots P(X_m\in B_m)
\end{align*}
So, $X_1,\cdots,X_m$ are independent.

Theorem. If $X_1,\cdots,X_m$ are independent real-valued random variables with $E(|X_i|)<\infty$ ($i=1,\cdots,m$) then $E(|X_1\cdots X_m|)<\infty$ and
$$E(X_1\cdots X_m)=E(X_1)\cdots E(X_m)$$

Proof.
\begin{align*}
E(X_1\cdots X_m)&=\int_{\mathbb{R}^m}x_1\cdots x_m f_{X_1,\cdots,X_m}(x_1,\cdots,x_m)dx_1\cdots dx_m\\
&=\left(\int_{\mathbb{R}}x_1f_{X_1}(x_1)dx_1\right)\cdots\left(\int_{\mathbb{R}}x_mf_{X_m}(x_m)dx_m\right)\\
&=E(X_1)\cdots E(X_m)
\end{align*}

Theorem. If $X_1,\cdots,X_m$ are independent real-valued random variables with $V(X_i)<\infty$, $i=1,\cdots,m$, then
$$V(X_1+\cdots+X_m)=V(X_1)+\cdots+V(X_m)$$

Proof. We prove for the case when $m=2$. For general $m$ case the proof follows by induction. Let $m_1=E(X_1)$ and $m_2=E(X_2)$. Then
\begin{align*}
E(X_1+X_2)&=\int_{\Omega}(X_1+X_2)dP\\
&=\int_{\Omega}X_1dP+\int_{\Omega}X_2dP\\
&=E(X_1)+E(X_2)\\
&=m_1+m_2
\end{align*}
\begin{align*}
V(X_1+X_2)&=\int_{\Omega}(X_1+X_2-(m_1+m_2))^2dP\\
&=\int_{\Omega}(X_1-m_1)^2dP+\int_{\Omega}(X_2-m_2)^2dP\\
&\quad+2\int_{\Omega}(X_1-m_1)(X_2-m_2)dP\\
&=V(X_1)+V(X_2)+2E[(X_1-m_1)(X_2-m_2)]
\end{align*}
Since $X_1$ and $X_2$ are independent, $E[(X_1-m_1)(X_2-m_2)]=E(X_1-m_1)E(X_2-m_2)=0$. This completes the proof.
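As an illustration only (not part of the proof), the following Python sketch draws two independent samples from different distributions and compares the sample variance of $X_1+X_2$ with the sum of the individual sample variances; the two agree up to Monte Carlo error.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two independent samples with different distributions.
X1 = rng.exponential(scale=2.0, size=n)   # V(X1) = 4
X2 = rng.uniform(-1.0, 1.0, size=n)       # V(X2) = 1/3

print(np.var(X1 + X2))                    # sample variance of the sum
print(np.var(X1) + np.var(X2))            # both are close to 4 + 1/3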

References:

Lawrence C. Evans, An Introduction to Stochastic Differential Equations, Lecture Notes

Distribution Functions

Let $(\Omega,\mathscr{U},P)$ be a probability space and $X:\Omega\longrightarrow\mathbb{R}^n$ a random variable. We define an ordering between two vectors in $\mathbb{R}^n$ as follows: Let $x=(x_1,\cdots,x_n),y=(y_1,\cdots,y_n)\in\mathbb{R}^n$. Then $x\leq y$ means $x_i\leq y_i$ for $i=1,\cdots,n$.

Definition. The distribution function of $X$ is the function $F_X: \mathbb{R}^n\longrightarrow[0,1]$ defined by
$$F_X(x):=P(X\leq x)$$
for all $x\in\mathbb{R}^n$. If $X_1,\cdots,X_m:\Omega\longrightarrow\mathbb{R}^n$ are random variables, their joint distribution function $F_{X_1,\cdots,X_m}:(\mathbb{R}^n)^m\longrightarrow[0,1]$ is defined by
$$F_{X_1,\cdots,X_m}(x_1,\cdots,x_m):=P(X_1\leq x_1,\cdots,X_m\leq x_m)$$
for all $x_i\in\mathbb{R}^n$ and for all $i=1,\cdots,m$.

Definition. Let $X$ be a random variable, $F=F_X$ its distribution function. If there exists a nonnegative integrable function $f:\mathbb{R}^n\longrightarrow\mathbb{R}$ such that
$$F(x)=F(x_1,\cdots,x_n)=\int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_n}f(y_1,\cdots,y_n)dy_1\cdots dy_n$$
then $f$ is called the density function for $X$. More generally,
$$P(X\in B)=\int_B f(x)dx$$
for all $B\in\mathscr{B}$ where $\mathscr{B}$ is the Borel $\sigma$-algebra.

Example. If $X:\Omega\longrightarrow\mathbb{R}$ has the density function
$$f(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{|x-m|^2}{2\sigma^2}},\ x\in\mathbb{R}$$
then we say $X$ has a Gaussian or normal distribution with mean $m$ and variance $\sigma^2$. In this case, we write “$X$ is an $N(m,\sigma^2)$ random variable.”

Example. If $X: \Omega\longrightarrow\mathbb{R}^n$ has the density
$$f(x)=\frac{1}{\sqrt{(2\pi)^n\det C}}e^{-\frac{1}{2}(x-m)C^{-1}(x-m)^t},\ x\in\mathbb{R}^n$$
for some $m\in\mathbb{R}^n$ and some positive definite symmetric matrix $C$, we say that “$X$ has a Gaussian or normal distribution with mean $m$ and covariance matrix $C$.” We write $X$ is an $N(m,C)$ random variable. The covariance matrix is given by
\begin{equation}\label{eq:covmatrix}C=E[(X-E(X))^t(X-E(X))]\end{equation}
where $X=(X_1,\cdots,X_n)$, i.e. each $C$ is the matrix whose $(i,j)$ entry is the covariance
$$C_{ij}=\mathrm{cov}(X_i,X_j)=E[(X_i-E(X_i))(X_j-E(X_j))]=E(X_iX_j)-E(X_i)E(X_j)$$
Clearly $C$ is a symmetric matrix. Recall that for a real-valued random variable $X$ the variance $\sigma^2$ is given by
$$\sigma^2=V(X)=E[(X-E(X))^2]=E[(X-E(X))\cdot (X-E(X))]$$
So one readily sees that \eqref{eq:covmatrix} is a generalization of the variance to higher dimensions. It follows from \eqref{eq:covmatrix} that for a vector $b\in\mathbb{R}^n$,
$$V(Xb^t)=bCb^t$$
Since the variance is nonnegative, the covariance matrix is positive semidefinite; for the Gaussian density above it is assumed to be positive definite. Since $C$ is symmetric, $PCP^{-1}=D$ where $P$ is an orthogonal matrix and $D$ is a diagonal matrix whose main diagonal contains the eigenvalues of $C$. Recall that for two $n\times n$ matrices $A$ and $B$, $\det(AB)=\det(A)\det(B)$, so $\det(C)=\det(D)$. Since all the eigenvalues of a positive definite matrix are positive, $\det(C)>0$.
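To see these properties concretely, here is a small numpy sketch (my own illustration; the mean vector and covariance matrix below are made up): it samples a 3-dimensional Gaussian vector, forms the sample covariance matrix, and checks that it is symmetric with positive eigenvalues, hence positive determinant.

import numpy as np

rng = np.random.default_rng(1)

m = np.array([1.0, -2.0, 0.5])
C_true = np.array([[2.0, 0.6, 0.3],
                   [0.6, 1.0, 0.2],
                   [0.3, 0.2, 0.5]])
X = rng.multivariate_normal(m, C_true, size=100_000)

# Sample covariance matrix: C_ij = E[(X_i - E(X_i))(X_j - E(X_j))].
C = np.cov(X, rowvar=False)

print(np.allclose(C, C.T))        # symmetric
print(np.linalg.eigvalsh(C))      # eigenvalues are all positive
print(np.linalg.det(C) > 0)       # hence det(C) > 0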

Lemma. Let $X:\Omega\longrightarrow\mathbb{R}^n$ be a random variable and assume that its distribution function $F=F_X$ has the density $f$. Suppose $g:\mathbb{R}^n\longrightarrow\mathbb{R}$ and $Y=g(X)$ is integrable. Then
$$E(Y)=\int_{\mathbb{R}^n}g(x)f(x)dx$$

Proof. Suppose first that $g$ is a simple function on $\mathbb{R}^n$.
$$g=\sum_{i=1}^mb_iI_{B_i}\ (B_i\in\mathscr{B})$$
\begin{align*}E(g(X))&=\sum_{i=1}^mb_i\int_{\Omega}I_{B_i}(X)dP\\&=\sum_{i=1}^mb_iP(X\in B_i).\end{align*}
But
\begin{align*}\int_{\mathbb{R}^n}g(x)f(x)dx&=\sum_{i=1}^mb_i\int_{\mathbb{R}^n}I_{B_i}(x)f(x)dx\\&=\sum_{i=1}^mb_i\int_{B_i}f(x)dx\\&=\sum_{i=1}^mb_iP(X\in B_i)\end{align*}
This proves the lemma when $g$ is a simple function. The general case follows by approximating $g$ by simple functions.

Corollary. If $X:\Omega\longrightarrow\mathbb{R}^n$ is a random variable and its distribution function $F=F_X$ has the density $f$, then
$$V(X)=\int_{\mathbb{R}^n}|x-E(X)|^2f(x)dx$$

Proof. Recall that $V(X)=E(|X-E(X)|^2)$. Define $g:\mathbb{R}^n\longrightarrow\mathbb{R}$ by
$$g(x)=|x-E(X)|^2$$
for all $x\in\mathbb{R}^n$. Then by the Lemma we have
$$V(X)=\int_{\mathbb{R}^n}|x-E(X)|^2f(x)dx$$

Corollary. If $X:\Omega\longrightarrow\mathbb{R}$ is a random variable and its distribution function $F=F_X$ has the density $f$, then $E(X)=\int_{-\infty}^\infty xf(x)dx$ and $V(X)=\int_{-\infty}^\infty |x-E(X)|^2f(x)dx$.

Proof. Immediate from the Lemma: take $g:\mathbb{R}\longrightarrow\mathbb{R}$ to be the identity map for $E(X)$ and $g(x)=|x-E(X)|^2$ for $V(X)$.

Corollary. If $X:\Omega\longrightarrow\mathbb{R}^n$ is a random variable and its distribution function $F=F_X$ has the density $f$, then
$$E(X_1\cdots X_n)=\int_{\mathbb{R}^n}x_1\cdots x_nf(x)dx$$

Proof. Define $g:\mathbb{R}^n\longrightarrow\mathbb{R}$ by
$$g(x)=x_1\cdots x_n\ \mbox{for all}\ x=(x_1,\cdots,x_n)\in\mathbb{R}^n$$
Then the rest follows by the Lemma.

Example. If $X$ is $N(m,\sigma^2)$ then
\begin{align*}
E(X)&=\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty xe^{-\frac{(x-m)^2}{2\sigma^2}}dx\\
&=m\\
V(X)&=\frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^\infty (x-m)^2e^{-\frac{(x-m)^2}{2\sigma^2}}dx\\
&=\sigma^2
\end{align*}
Therefore, $m$ is the mean and $\sigma^2$ is the variance.
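The two integrals can be checked symbolically; here is a brief sympy sketch (my own check, not from [1]) that should return $m$ and $\sigma^2$.

import sympy as sp

x, m = sp.symbols('x m', real=True)
sigma = sp.symbols('sigma', positive=True)

# N(m, sigma^2) density
f = sp.exp(-(x - m)**2 / (2*sigma**2)) / sp.sqrt(2*sp.pi*sigma**2)

EX = sp.integrate(x * f, (x, -sp.oo, sp.oo))            # mean
VX = sp.integrate((x - m)**2 * f, (x, -sp.oo, sp.oo))   # variance
print(sp.simplify(EX), sp.simplify(VX))                 # m, sigma**2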
References:

Lawrence C. Evans, An Introduction to Stochastic Differential Equations, Lecture Notes

Probability Measure

In these lecture notes, we study basic measure theory in the context of probability. If you want to learn more about general measure theory, I recommend [2].

Let $\Omega$ be a set whose elements will be called samples.

Definition. A $\sigma$-algebra is a collection $\mathscr{U}$ of subsets of $\Omega$ satisfying

  1. $\varnothing,\Omega\in\mathscr{U}$
  2. If $A\in\mathscr{U}$, then $A^c\in\mathscr{U}$
  3. If $A_1,A_2,\cdots\in\mathscr{U}$, then $\bigcup_{k=1}^\infty A_k,\bigcap_{k=1}^\infty A_k\in\mathscr{U}$

Note: In condition 3, it suffices to require only one of the two closure properties: that $\bigcup_{k=1}^\infty A_k\in\mathscr{U}$ whenever $A_1,A_2,\cdots\in\mathscr{U}$, or that $\bigcap_{k=1}^\infty A_k\in\mathscr{U}$ whenever $A_1,A_2,\cdots\in\mathscr{U}$. For example, let us assume that if $A_1,A_2,\cdots\in\mathscr{U}$, then $\bigcup_{k=1}^\infty A_k\in\mathscr{U}$. Let $A_1,A_2,\cdots\in\mathscr{U}$. Then by condition 2, $(A_1)^c,(A_2)^c,\cdots\in\mathscr{U}$, so $\bigcup_{k=1}^\infty (A_k)^c\in\mathscr{U}$. By condition 2 again together with De Morgan's laws, $\bigcap_{k=1}^\infty A_k=\left[\bigcup_{k=1}^\infty (A_k)^c\right]^c\in\mathscr{U}$.

Definition. Let $\mathscr{U}$ be a $\sigma$-algebra of subsets of $\Omega$. A map $P:\mathscr{U}\longrightarrow[0,1]$ is called a probability measure if $P$ satisfies

  1. $P(\varnothing)=0$, $P(\Omega)=1$
  2. If $A_1,A_2,\cdots\in\mathscr{U}$, then $$P\left(\bigcup_{k=1}^\infty A_k\right)\leq\sum_{k=1}^\infty P(A_k)$$
  3. If $A_1,A_2,\cdots\in\mathscr{U}$ are mutually disjoint, then $$P\left(\bigcup_{k=1}^\infty A_k\right)=\sum_{k=1}^\infty P(A_k)$$

Proposition. Let $A,B\in\mathscr{U}$. If $A\subset B$ then $P(A)\leq P(B)$.

Proof. Let $A,B\in\mathscr{U}$ with $A\subset B$. Then $B=(B-A)\dot\cup A$ where $\dot\cup$ denotes disjoint union. So by condition 3, $P(B)=P(B-A)+P(A)\geq P(A)$ since $P(B-A)\geq 0$.

Definition. A triple $(\Omega,\mathscr{U},P)$ is called a probability space. We say $A\in\mathscr{U}$ is an event and $P(A)$ is the probability of the event $A$. A property which is true except for an event of probability zero is said to hold almost surely (abbreviated “a.s.”).

Example. The smallest $\sigma$-algebra containing all the open subsets of $\mathbb{R}^n$ is called the Borel $\sigma$-algebra and is denoted by $\mathscr{B}$. Here we mean “open subsets” in terms of the usual Euclidean topology on $\mathbb{R}^n$. Since $\mathbb{R}^n$ with the Euclidean topology is second countable, the “open subsets” can be replaced by “basic open subsets”. Assume that a function $f$ is nonnegative, integrable (whatever that means, we will talk about it later) such that $\int_{\mathbb{R}^n}f(x)dx=1$. Define
$$P(B)=\int_Bf(x)dx$$ for each $B\in\mathscr{B}$. Then $(\mathbb{R}^n,\mathscr{B},P)$ is a probability space. The function $f$ is called the density of the probability measure $P$.

Definition. Let $(\Omega,\mathscr{U},P)$ be a probability space. A mapping $X:\Omega\longrightarrow\mathbb{R}^n$ is called an $n$-dimensional random variable if for each $B\in\mathscr{B}$, $X^{-1}(B)\in\mathscr{U}$. Equivalently, we say $X$ is $\mathscr{U}$-measurable. The probability space $(\Omega,\mathscr{U},P)$ is a mathematical construct that we cannot observe directly, but the values $X(\omega)$, $\omega\in\Omega$, of the random variable $X$ are observable. Following customary notation in probability theory, we write $X(\omega)$ simply as $X$. Also, $P(X^{-1}(B))$ is denoted by $P(X\in B)$.

Definition. Let $A\in\mathscr{U}$. Then the indicator $I_A: \Omega\longrightarrow\{0,1\}$ of $A$ is defined by $$I_A(\omega)=\left\{\begin{array}{ccc}1 & \mbox{if} & \omega\in A\\0 & \mbox{if} & \omega\not\in A\end{array}\right.$$
In measure theory the indicator of $A$ is also called the characteristic function of $A$ and is usually denoted by $\chi_A$. Here we reserve the term “characteristic function” for something else. Clearly the indicator is a random variable: both $\{0\}$ and $\{1\}$ are open in the subspace topology on $\{0,1\}$, so the Borel $\sigma$-algebra of $\{0,1\}$ coincides with the discrete topology on $\{0,1\}$. Or, without mentioning the subspace topology, let $B\in\mathscr{B}$, the Borel $\sigma$-algebra of $\mathbb{R}$. If $0\in B$ and $1\notin B$ then $I_A^{-1}(B)=A^c\in\mathscr{U}$. If $0\notin B$ and $1\in B$ then $I_A^{-1}(B)=A\in\mathscr{U}$. If $0,1\notin B$ then $I_A^{-1}(B)=\varnothing\in\mathscr{U}$. If $0,1\in B$ then $I_A^{-1}(B)=\Omega\in\mathscr{U}$.

If $A_1,A_2,\cdots,A_m\in\mathscr{U}$ with $\Omega=\bigcup_{i=1}^m A_i$ and $a_1,a_2,\cdots,a_m\in\mathbb{R}$, then
$$X=\sum_{i=1}^m a_iI_{A_i}$$ is a random variable called a simple function.

Figure: a simple function

Lemma. Let $X: \Omega\longrightarrow\mathbb{R}^n$ be a random variable. Then
$$\mathscr{U}(X)=\{X^{-1}(B): B\in\mathscr{B}\}$$ is the smallest $\sigma$-algebra with respect to which $X$ is measurable. $\mathscr{U}(X)$ is called the $\sigma$-algebra generated by $X$.

Definition. A collection $\{X(t)|t\geq 0\}$ of random variables parametrized by time $t$ is called a stochastic process. For each $\omega\in\Omega$, the map $t\longmapsto X(t,\omega)$ is the corresponding sample path.

Let $(\Omega,\mathscr{U},P)$ be a probability space and $X=\sum_{i=1}^k a_iI_{A_i}$ a simple random variable. The probability that $X=a_i$ is $P(X=a_i)=P(X^{-1}(a_i))=P(A_i)$, so $\sum_{i=1}^k a_iP(A_i)$ is the expected value of $X$. We define the integral of $X$ by
\begin{equation}\label{eq:integral}\int_{\Omega}XdP=\sum_{i=1}^k a_iP(A_i)\end{equation}
if $X$ is a simple random variable. A random variable is not necessarily simple so we obviously want to extend the notion of integral to general random variables. First suppose that $X$ is a nonnegative random variable. Then we define
\begin{equation}\label{eq:integral2}\int_{\Omega}XdP=\sup_{Y\leq X,\ Y\ \mbox{simple}}\int_{\Omega}YdP\end{equation}
Let $X$ be a random variable. Let $X^+=\max\{X,0\}$ and $X^-=\max\{-X,0\}$. Then $X=X^+-X^-$. Define
\begin{equation}\label{eq:integral3}\int_{\Omega}XdP=\int_{\Omega}X^+dP-\int_{\Omega}X^-dP\end{equation}For a random variable $X$, we still call the integral \eqref{eq:integral3} the expected value of $X$ and denote it by $E(X)$. This integral is called the Lebesgue integral in real analysis (see [2]). When I first learned the Lebesgue integral in my senior year in college, it wasn't very clear to me what motivated one to define it the way it is. In terms of probability the motivation is much clearer. I personally think that it would be better to introduce the Lebesgue integral to undergraduate students in the context of probability theory rather than abstract real analysis. If $X:\Omega\longrightarrow\mathbb{R}^n$ is a vector-valued random variable and $X=(X_1,X_2,\cdots,X_n)$, we define $$\int_{\Omega}XdP=\left(\int_{\Omega}X_1dP,\int_{\Omega}X_2dP,\cdots,\int_{\Omega}X_ndP\right)$$As one would expect from an integral, the expected value $E(\cdot)$ is linear.

Definition. We call $$V(X)=\int_{\Omega}|X-E(X)|^2dP$$the variance of $X$.

It follows from the linearity of $E(\cdot)$ that $$V(X)=E(|X-E(X)|^2)=E(|X|^2)-|E(X)|^2$$

Lemma. If $X$ is a random variable and $1\leq p<\infty$, then \begin{equation}\label{eq:chebyshev}P(|X|\geq\lambda)\leq\frac{1}{\lambda^p}E(|X|^p)\end{equation}for all $\lambda>0$. The inequality \eqref{eq:chebyshev} is called Chebyshev’s inequality.

Proof. Since $1\leq p<\infty$, $|X|\geq\lambda\Rightarrow |X|^p\geq\lambda^p$. So, \begin{align*}E(|X|^p)&=\int_{\Omega}|X|^pdP\\&\geq\int_{|X|\geq\lambda}|X|^pdP\\
&\geq\lambda^p\int_{|X|\geq\lambda}dP\\&=\lambda^pP(|X|\geq\lambda).\end{align*}

Example. Let a random variable $X$ have the probability density function $$f(x)=\left\{\begin{array}{ccc}\frac{1}{2\sqrt{3}} & \mbox{if} & -\sqrt{3}<x<\sqrt{3}\\ 0 & \mbox{elsewhere}
\end{array}\right.$$For $p=1$ and $\lambda=\frac{3}{2}$, $\frac{1}{\lambda}E(|X|)=\frac{1}{\sqrt{3}}\approx 0.58$. Note that $E(|X|)=\int_{-\infty}^\infty |x|f(x)dx$. (We will discuss this later.) On the other hand, $P\left(|X|\geq\frac{3}{2}\right)=1-\int_{-\frac{3}{2}}^{\frac{3}{2}}f(x)dx=1-\frac{\sqrt{3}}{2}\approx 0.134$. Since $0.134\leq 0.58$, this confirms Chebyshev's inequality.
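The two numbers above are easy to reproduce numerically. The following Python sketch (my own check) computes $E(|X|)$ and $P(|X|\geq\frac{3}{2})$ for this density by quadrature and compares them.

import numpy as np
from scipy.integrate import quad

a = np.sqrt(3.0)
f = lambda x: 1.0 / (2.0 * a) if -a < x < a else 0.0   # the density above

E_absX, _ = quad(lambda x: abs(x) * f(x), -a, a)        # E(|X|) = sqrt(3)/2
lam = 1.5
bound = E_absX / lam                                    # Chebyshev bound with p = 1
P_tail = 1.0 - quad(f, -lam, lam)[0]                    # P(|X| >= 3/2)

print(P_tail, "<=", bound)                              # about 0.134 <= about 0.577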
References (in no particular order):

  1. Lawrence C. Evans, An Introduction to Stochastic Differential Equations, Lecture Notes
  2. H. L. Royden, Real Analysis, Second Edition, Macmillan
  3. Robert V. Hogg, Joseph W. McKean, Allen T. Craig, Introduction to Mathematical Statistics, Sixth Edition, Pearson

Itô’s Formula

Let us consider the 1-dimensional case ($n=1$) of the Stochastic Equation (4) from the last post
\begin{equation}\label{eq:sd3}dX=b(X)dt+dW\end{equation} with $X(0)=0$.
Let $u: \mathbb{R}\longrightarrow\mathbb{R}$ be a smooth function and $Y(t)=u(X(t))$ ($t\geq 0$). What we learned in calculus (the chain rule) would suggest that $dY$ is
$$dY=u'dX=u'bdt+u'dW,$$
where $'=\frac{d}{dx}$. It may come as a surprise, but this is not correct. First, by Taylor series expansion we obtain
\begin{align*}
dY&=u'dX+\frac{1}{2}u^{\prime\prime}(dX)^2+\cdots\\
&=u'(bdt+dW)+\frac{1}{2}u^{\prime\prime}(bdt+dW)^2+\cdots
\end{align*}
Now we introduce the following striking formula
\begin{equation}\label{eq:wiener2}(dW)^2=dt\end{equation}
The proof of \eqref{eq:wiener2} is beyond the scope of these notes and so it won't be given now or ever. However it can be found, for example, in [2]. Using \eqref{eq:wiener2}, $dY$ can be written as
$$dY=\left(u'b+\frac{1}{2}u^{\prime\prime}\right)dt+u'dW+\cdots$$
The terms beyond $u'dW$ are of order $(dt)^{\frac{3}{2}}$ and higher. Neglecting these terms, we have
\begin{equation}\label{eq:sd4}dY=\left(u'b+\frac{1}{2}u^{\prime\prime}\right)dt+u'dW\end{equation}
\eqref{eq:sd4} is the stochastic differential equation satisfied by $Y(t)$ and it is called Itô's formula, named after the Japanese mathematician Kiyosi Itô.

Example. Let us consider the stochastic differential equation
\begin{equation}\label{eq:sd5}dY=YdW,\ Y(0)=1\end{equation}
Comparing \eqref{eq:sd4} and \eqref{eq:sd5}, we obtain
\begin{align}\label{eq:sd5a}
u'b+\frac{1}{2}u^{\prime\prime}&=0\\\label{eq:sd5b}u'&=u\end{align}
The equation \eqref{eq:sd5b} along with the initial condition $Y(0)=1$ yields $u(X(t))=e^{X(t)}$. Using this $u$ with equation \eqref{eq:sd5a} we get $b=-\frac{1}{2}$ and so the equation \eqref{eq:sd3} becomes
$$dX=-\frac{1}{2}dt+dW$$
in which case $X(t)=-\frac{1}{2}t+W(t)$. Hence, we find $Y(t)$ as
$$Y(t)=e^{-\frac{1}{2}t+W(t)}$$
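As a numerical sanity check (my own, not in [1] or [2]), one can run the Euler–Maruyama scheme for $dY=YdW$ along a sampled Brownian path and compare with $e^{-\frac{1}{2}t+W(t)}$ evaluated on the same path; the two stay close for small step sizes.

import numpy as np

rng = np.random.default_rng(2)
T, n = 1.0, 10_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), size=n)   # Brownian increments
W = np.cumsum(dW)

# Euler-Maruyama for dY = Y dW, Y(0) = 1, along the same Brownian path.
Y = np.empty(n + 1)
Y[0] = 1.0
for k in range(n):
    Y[k + 1] = Y[k] + Y[k] * dW[k]

t = np.linspace(dt, T, n)
Y_exact = np.exp(-0.5 * t + W)              # the solution found above
print(abs(Y[-1] - Y_exact[-1]))             # small for large n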

Example. Let $P(t)$ denote the price of a stock at time $t\geq 0$. A standard model assumes that the relative change of price $\frac{dP}{P}$ evolves according to the stochastic differential equation
\begin{equation}\label{eq:relprice}\frac{dP}{P}=\mu dt+\sigma dW\end{equation}
where $\mu>0$ and $\sigma$ are constants called the drift and the volatility of the stock, respectively. Again using Itô’s formula similarly to what we did in the previous example, we find the price function $P(t)$ which is the solution of
$$dP=\mu Pdt+\sigma PdW,\ P(0)=p_0$$
as
$$P(t)=p_0\exp\left[\left(\mu-\frac{1}{2}\sigma^2\right)t+\sigma W(t)\right].$$

References:

1. Lawrence C. Evans, An Introduction to Stochastic Differential Equations, Lecture Notes

2. Bernt Øksendal, Stochastic Differential Equations, An Introduction with Applications, 5th Edition, Springer, 2000

What is a Stochastic Differential Equation?

Consider the population growth model
\begin{equation}\label{eq:popgrowth}\frac{dN}{dt}=a(t)N(t),\ N(0)=N_0\end{equation}
where $N(t)$ is the size of a population at time $t$ and $a(t)$ is the relative growth rate at time $t$. If $a(t)$ is completely known, one can easily solve \eqref{eq:popgrowth}. In fact, the solution would be $N(t)=N_0\exp\left(\int_0^t a(s)ds\right)$. Now suppose that $a(t)$ is not completely known but can be written as $a(t)=r(t)+\mbox{noise}$. We do not know the exact behavior of the noise, only its probability distribution. In such a case an equation like \eqref{eq:popgrowth} is called a stochastic differential equation. More generally, a stochastic differential equation can be written as
\begin{equation}\label{eq:sd}\frac{dX}{dt}=b(X(t))+B(X(t))\xi(t)\ (t>0),\ X(0)=x_0,\end{equation}
where $b: \mathbb{R}^n\longrightarrow\mathbb{R}^n$ is a smooth vector field and $X: [0,\infty)\longrightarrow\mathbb{R}^n$, $B: \mathbb{R}^n\longrightarrow\mathbb{M}^{n\times m}$ and $\xi(t)$ is an $m$-dimensional white noise. If $m=n$, $x_0=0$, $b=0$ and $B=I$, then \eqref{eq:sd} turns into
\begin{equation}\label{eq:wiener}\frac{dX}{dt}=\xi(t),\ X(0)=0\end{equation}
The solution of \eqref{eq:wiener} is denoted by $W(t)$ and is called the $n$-dimensional Wiener process or Brownian motion. In other words, white noise $\xi(t)$ is the time derivative of the Wiener process. Replace $\xi(t)$ in \eqref{eq:sd} by $\frac{dW(t)}{dt}$ and multiply the resulting equation by $dt$. Then we obtain
\begin{equation}\label{eq:sd2}dX(t)=b(X(t))dt+B(X(t))dW(t),\ X(0)=x_0\end{equation}
The stochastic differential equation \eqref{eq:sd2} is solved symbolically as
\begin{equation}\label{eq:sdsol}X(t)=x_0+\int_0^tb(X(s))ds+\int_0^tB(X(s))dW(s)\end{equation}
for all $t>0$. In order to make sense of $X(t)$ in \eqref{eq:sdsol} we will have to know what $W(t)$ is and what the integral $\int_0^tB(X(s))dW(s)$, which is called a stochastic integral, means.
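To get a feel for $W(t)$ before it is defined precisely, here is a small simulation sketch (my own illustration, assuming the standard fact that Brownian increments over disjoint time steps are independent $N(0,\Delta t)$ random variables): it builds sample paths on $[0,1]$ by accumulating increments and checks that $W(1)$ has mean about $0$ and variance about $1$.

import numpy as np

rng = np.random.default_rng(3)
paths, n = 5_000, 1_000
dt = 1.0 / n

# Approximate W(t) on [0, 1] by summing independent N(0, dt) increments.
dW = rng.normal(0.0, np.sqrt(dt), size=(paths, n))
W = np.cumsum(dW, axis=1)

print(W[:, -1].mean())   # close to 0  (E W(1) = 0)
print(W[:, -1].var())    # close to 1  (V W(1) = 1)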

References:

  1. Lawrence C. Evans, An Introduction to Stochastic Differential Equations, Lecture Notes
  2. Bernt Øksendal, Stochastic Differential Equations, An Introduction with Applications, 5th Edition, Springer, 2000

The Curvature of a Surface in Euclidean 3-Space $\mathbb{R}^3$

In here, it is seen that the curvature of a unit speed parametric curve $\alpha(t)$ in $\mathbb{R}^3$ can be measured by its acceleration $\ddot\alpha(t)$. In this case, the acceleration happens to be a normal vector field along the curve. Now we turn our attention to surfaces in Euclidean 3-space $\mathbb{R}^3$, and we would like to devise a way to measure the bending of a surface in $\mathbb{R}^3$; this may be achieved by studying the change of a unit normal vector field on the surface. To study the change of a unit normal vector field on a surface, we need to be able to differentiate vector fields. But first let us review the directional derivative you learned in multivariable calculus. Let $f:\mathbb{R}^3\longrightarrow\mathbb{R}$ be a differentiable function and $\mathbf{v}$ a tangent vector to $\mathbb{R}^3$ at $\mathbf{p}$. Then the directional derivative of $f$ in the $\mathbf{v}$ direction at $\mathbf{p}$ is defined by
\begin{equation}
\label{eq:directderiv}
\nabla_{\mathbf{v}}f=\left.\frac{d}{dt}f(\mathbf{p}+t\mathbf{v})\right|_{t=0}.
\end{equation}
By chain rule, the directional derivative can be written as
\begin{equation}
\label{eq:directderiv2}
\nabla_{\mathbf{v}}f=\nabla f(\mathbf{p})\cdot\mathbf{v},
\end{equation}
where $\nabla f$ denotes the gradient of $f$
$$\nabla f=\frac{\partial f}{\partial x_1}E_1(\mathbf{p})+\frac{\partial f}{\partial x_2}E_2(\mathbf{p})+\frac{\partial f}{\partial x_3}E_3(\mathbf{p}),$$
where $E_1, E_2, E_3$ denote the standard orthonormal frame in $\mathbb{R}^3$. The directional derivative satisfies the following properties.

Theorem. Let $f,g$ be real-valued differentiable functions on $\mathbb{R}^3$, $\mathbf{v},\mathbf{w}$ tangent vectors to $\mathbb{R}^3$ at $\mathbf{p}$, and $a,b\in\mathbb{R}$. Then

  1. $\nabla_{a\mathbf{v}+b\mathbf{w}}f=a\nabla_{\mathbf{v}}f+b\nabla_{\mathbf{w}}f$
  2. $\nabla_{\mathbf{v}}(af+bg)=a\nabla_{\mathbf{v}}f+b\nabla_{\mathbf{v}}g$
  3. $\nabla_{\mathbf{v}}(fg)=(\nabla_{\mathbf{v}}f)g(\mathbf{p})+f(\mathbf{p})\nabla_{\mathbf{v}}g$

The properties 1 and 2 are linearity and the property 3 is Leibniz rule. The directional derivative \eqref{eq:directderiv} can be generalized to the covariant derivative $\nabla_{\mathbf{v}}X$ of a vector field $X$ in the direction of a tangent vector $\mathbf{v}$ at $\mathbf{p}$:
\begin{equation}
\label{eq:covderiv}
\nabla_{\mathbf{v}}X=\left.\frac{d}{dt}X(\mathbf{p}+t\mathbf{v})\right|_{t=0}.
\end{equation}
Let $X=x_1E_1+x_2E_2+x_3E_3$ in terms of the standard orthonormal frame $E_1,E_2,E_3$. Then $\nabla_{\mathbf{v}}X$ can be written as
\begin{equation}
\label{eq:covderiv2}
\nabla_{\mathbf{v}}X=\sum_{j=1}^3\nabla_{\mathbf{v}}x_jE_j.
\end{equation}
Here, $\nabla_{\mathbf{v}}x_j$ is the directional derivative of the $j$-th component function of the vector field $X$ in the $\mathbf{v}$ direction as defined in \eqref{eq:directderiv}. The covariant derivative satisfies the following properties.

Theorem. Let $X,Y$ be vector fields on $\mathbb{R}^3$, $\mathbf{v},\mathbf{w}$ tangent vectors at $\mathbf{p}$, $f$ a real-valued function on $\mathbb{R}^3$, and $a,b$ scalars. Then

  1. $\nabla_{\mathbf{v}}(aX+bY)=a\nabla_{\mathbf{v}}X+b\nabla_{\mathbf{v}}Y$
  2. $\nabla_{a\mathbf{v}+b\mathbf{w}}X=a\nabla_{\mathbf{v}}X+b\nabla_{\mathbf{w}}X$
  3. $\nabla_{\mathbf{v}}(fX)=(\nabla_{\mathbf{v}}f)X(\mathbf{p})+f(\mathbf{p})\nabla_{\mathbf{v}}X$
  4. $\nabla_{\mathbf{v}}(X\cdot Y)=(\nabla_{\mathbf{v}}X)\cdot Y+X\cdot\nabla_{\mathbf{v}}Y$

The properties 1 and 2 are linearity and the properties 3 and 4 are Leibniz rules.

Hereafter, I assume that surfaces are orientable and have nonvanishing normal vector fields. Let $\mathcal{M}\subset\mathbb{R}^3$ be a surface and $p\in\mathcal{M}$. For each $\mathbf{v}\in T_p\mathcal{M}$, define
\begin{equation}
\label{eq:shape}
S_p(\mathbf{v})=-\nabla_{\mathbf{v}}N,
\end{equation}
where $N$ is a unit normal vector field on a neighborhood of $p\in\mathcal{M}$. Since $N\cdot N=1$, $0=\nabla_{\mathbf{v}}(N\cdot N)=2(\nabla_{\mathbf{v}}N)\cdot N$, so $S_p(\mathbf{v})\cdot N=0$. This means that $S_p(\mathbf{v})\in T_p\mathcal{M}$. Thus, \eqref{eq:shape} defines a linear map $S_p: T_p\mathcal M\longrightarrow T_p\mathcal{M}$. $S_p$ is called the shape operator of $\mathcal{M}$ at $p$ (derived from $N$). For each $p\in\mathcal{M}$, $S_p$ is a symmetric operator, i.e.,
$$\langle S_p(\mathbf{v}),\mathbf{w}\rangle=\langle S_p(\mathbf{w}),\mathbf{v}\rangle$$
for any $\mathbf{v},\mathbf{w}\in T_p\mathcal{M}$.

Let us assume that $\mathcal{M}\subset\mathbb{R}^3$ is a regular surface so that any differentiable curve $\alpha: (-\epsilon,\epsilon)\longrightarrow\mathcal{M}$ is a regular curve, i.e., $\dot\alpha(t)\ne 0$ for every $t\in(-\epsilon,\epsilon)$. If $\alpha$ is a differentiable curve in $\mathcal{M}\subset\mathbb{R}^3$, then
\begin{equation}
\label{eq:acceleration}
\langle\ddot\alpha,N\rangle=\langle S(\dot\alpha),\dot\alpha\rangle.
\end{equation}
$\langle\ddot\alpha,N\rangle$ is the normal component of the acceleration $\ddot\alpha$ to the surface $\mathcal{M}$. \eqref{eq:acceleration} says the normal component of $\ddot\alpha$ depends only on the shape operator $S$ and the velocity $\dot\alpha$. If $\alpha$ is parametrized by arc-length, i.e., $|\dot\alpha|=1$, then we get a measurement of the way $\mathcal{M}$ is bent in the $\dot\alpha$ direction. Hence we have the following definition:

Definition. Let $\mathbf{u}$ be a unit tangent vector to $\mathcal{M}\subset\mathbb{R}^3$ at $p$. Then the number $\kappa(\mathbf{u})=\langle S(\mathbf{u}),\mathbf{u}\rangle$ is called the normal curvature of $\mathcal{M}$ in the $\mathbf{u}$ direction. The normal curvature can be considered as a continuous function $\kappa: S^1\longrightarrow\mathbb{R}$ on the unit circle. Since $S^1$ is compact (closed and bounded), $\kappa$ attains a maximum value and a minimum value, say $\kappa_1$ and $\kappa_2$, respectively. $\kappa_1$, $\kappa_2$ are called the principal curvatures of $\mathcal{M}$ at $p$. The principal curvatures $\kappa_1$, $\kappa_2$ are the eigenvalues of the shape operator $S$, and with respect to a basis of corresponding eigenvectors $S$ can be written as the $2\times 2$ matrix
\begin{equation}
\label{eq:shape2}
S=\begin{pmatrix}
\kappa_1 & 0\\
0 & \kappa_2
\end{pmatrix}.
\end{equation}
The arithmetic mean $H$ and the product $K$ (the square of the geometric mean) of $\kappa_1$, $\kappa_2$
\begin{align}
\label{eq:mean}
H&=\frac{\kappa_1+\kappa_2}{2}=\frac{1}{2}\mathrm{tr}S,\\
\label{eq:gauss}
K&=\kappa_1\kappa_2=\det S
\end{align}
are called, respectively, the mean and the Gaußian curvatures of $\mathcal{M}$. The definitions \eqref{eq:mean} and \eqref{eq:gauss} themselves, however, are not very helpful for calculating the mean and the Gaußian curvatures of a surface. We can compute the mean and the Gaußian curvatures of a parametric regular surface $\varphi: D(u,v)\longrightarrow\mathbb{R}^3$ using Gauß' celebrated formulas
\begin{align}
\label{eq:mean2}
H&=\frac{G\ell+En-2Fm}{2(EG-F^2)},\\
\label{eq:gauss2}
K&=\frac{\ell n-m^2}{EG-F^2},
\end{align}
where
\begin{align*}
E&=\langle\varphi_u,\varphi_u\rangle,\ F=\langle\varphi_u,\varphi_v\rangle,\ G=\langle\varphi_v,\varphi_v\rangle,\\
\ell&=\langle N,\varphi_{uu}\rangle,\ m=\langle N,\varphi_{uv}\rangle,\ n=\langle N,\varphi_{vv}\rangle.
\end{align*}
It is straightforward to verify that
\begin{equation}
\label{eq:normal}
|\varphi_u\times\varphi_v|^2=EG-F^2.
\end{equation}

Example. Compute the Gaußian and the mean curvatures of the helicoid
$$\varphi(u,v)=(u\cos v,u\sin v, bv),\ b\ne 0.$$

Figure: Helicoid

Solution. \begin{align*}
\varphi_u&=(\cos v,\sin v,0),\ \varphi_v=(-u\sin v,u\cos v,b),\\
\varphi_{uu}&=0,\ \varphi_{uv}=(-\sin v,\cos v,0),\ \varphi_{vv}=(-u\cos v,-u\sin v,0).
\end{align*}
$E$, $F$ and $G$ are calculated to be
$$E=1,\ F=0,\ G=b^2+u^2.$$
$\varphi_u\times\varphi_v=(b\sin v,-b\cos v,u)$, so the unit normal vector field $N$ is given by
$$N=\frac{\varphi_u\times\varphi_v}{\sqrt{EG-F^2}}=\frac{(b\sin v,-b\cos v,u)}{\sqrt{b^2+u^2}}.$$
Next, $\ell, m,n$ are calculated to be
$$\ell=0,\ m=-\frac{b}{\sqrt{b^2+u^2}},\ n=0.$$
Finally we find the Gaußian curvature $K$ and the mean curvature $H$:
\begin{align*}
K&=\frac{\ell n-m^2}{EG-F^2}=-\frac{b^2}{(b^2+u^2)^2},\\
H&=\frac{G\ell+En-2Fm}{2(EG-F^2)}=0.
\end{align*}
Surfaces with $H=0$ are called minimal surfaces.
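The computation above is easy to double-check with a computer algebra system. The following sympy sketch (my own, not from the reference below) recomputes $E,F,G,\ell,m,n$ and then $K$ and $H$ for the helicoid symbolically.

import sympy as sp

u, v, b = sp.symbols('u v b', real=True)
phi = sp.Matrix([u*sp.cos(v), u*sp.sin(v), b*v])   # the helicoid

phi_u, phi_v = phi.diff(u), phi.diff(v)
E, F, G = phi_u.dot(phi_u), phi_u.dot(phi_v), phi_v.dot(phi_v)

N = phi_u.cross(phi_v)
N = N / sp.sqrt(N.dot(N))                          # unit normal

ell = N.dot(phi.diff(u, 2))
m = N.dot(phi.diff(u).diff(v))
n = N.dot(phi.diff(v, 2))

K = sp.simplify((ell*n - m**2) / (E*G - F**2))
H = sp.simplify((G*ell + E*n - 2*F*m) / (2*(E*G - F**2)))
print(K, H)                                        # -b**2/(b**2 + u**2)**2, 0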

For further reading on the topic I discussed here, I recommend:

Barrett O'Neill, Elementary Differential Geometry, Academic Press, 1967

Trigonometric Integrals

Let us attempt to calculate $\int\cos^n xdx$ where $n$ is a positive integer. In the following table, the first column represents $\cos^{n-1}x$ and its derivative, and the second column represents $\cos x$ and its integral.
$$\begin{array}{ccc}
\cos^{n-1}x & & \cos x\\
&\stackrel{+}{\searrow}&\\
-(n-1)\cos^{n-2}x\sin x & \stackrel{-}{\longrightarrow} & \sin x\\
\end{array}$$
By integration by parts, we have
\begin{align*}
\int\cos^n xdx&=\cos^{n-1}x\sin x+(n-1)\int\cos^{n-2}x\sin^2xdx\\
&=\cos^{n-1}x\sin x+(n-1)\int\cos^{n-2}xdx-(n-1)\int\cos^nxdx+C'
\end{align*}
where $C'$ is a constant. Solving this for $\int\cos^nxdx$, we obtain
\begin{equation}
\label{eq:cosred}
\int\cos^n xdx=\frac{1}{n}\cos^{n-1}x\sin x+\frac{n-1}{n}\int\cos^{n-2}xdx+C
\end{equation}
where $C=\frac{C'}{n}$. A formula such as \eqref{eq:cosred} is called a reduction formula. Similarly we obtain the following reduction formulae.
\begin{align}
\int\sin^n xdx&=-\frac{1}{n}\sin^{n-1}x\cos x+\frac{n-1}{n}\int\sin^{n-2}xdx\\
\int\tan^nxdx&=\frac{1}{n-1}\tan^{n-1}x-\int\tan^{n-2}xdx,\ n\ne 1\\
\int\sec^nxdx&=\frac{1}{n-1}\sec^{n-2}x\tan x+\frac{n-2}{n-1}\int\sec^{n-2}xdx,\ n\ne 1
\end{align}
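A reduction formula is easy to check by differentiating its right-hand side. Here is a short sympy sketch (my own check, for one concrete value of $n$) for \eqref{eq:cosred}.

import sympy as sp

x = sp.symbols('x')
n = 5   # any positive integer will do

lhs = sp.cos(x)**n
rhs = sp.cos(x)**(n - 1)*sp.sin(x)/n + sp.Rational(n - 1, n)*sp.integrate(sp.cos(x)**(n - 2), x)

# The derivative of the right-hand side should recover cos^n x.
print(sp.simplify(sp.diff(rhs, x) - lhs))   # 0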
Example. Use the reduction formula \eqref{eq:cosred} to evaluate $\int\cos^3xdx$.

Solution.
\begin{align*}
\int\cos^3xdx&=\frac{1}{3}\cos^2x\sin x+\frac{2}{3}\int\cos xdx\\
&=\frac{1}{3}\cos^2x\sin x+\frac{2}{3}\sin x+C,
\end{align*}
where $C$ is a constant.

Integrals like the one in the following example are rather tricky.

Example. Evaluate $\int\sec xdx$.

Solution.
\begin{align*}
\int\sec xdx&=\int\sec x\frac{\sec x+\tan x}{\sec x+\tan x}dx\\
&=\int\frac{\sec^2x+\sec x\tan x}{\sec x+\tan x}dx\\
&=\int\frac{du}{u}\ (\mbox{substitution}\ u=\sec x+\tan x)\\
&=\ln|u|+C\\
&=\ln|\sec x+\tan x|+C,
\end{align*}
where $C$ is a constant.

Example. Evaluate $\int\csc xdx$.

Solution. It can be done similarly to the previous example.
\begin{align*}
\int\csc xdx&=\int\csc x\frac{\csc x+\cot x}{\csc x+\cot x}dx\\
&=-\ln|\csc x+\cot x|+C,
\end{align*}
where $C$ is a constant.

Evaluating Integrals of the Type $\int\sin^mx\cos^nxdx$ Where $m,n$ Are Positive Integers

Case 1. One of the integer powers, say $m$, is odd.

$m=2k+1$ for some integer $k$. So,
\begin{align*}
\sin^mx&=\sin^{2k+1}x\\
&=(\sin^2x)^k\sin x\\
&=(1-\cos^2x)^k\sin x.
\end{align*}
Use the substitution $u=\cos x$ in this case.

Example. Evaluate $\int\sin^3x\cos^2xdx$.

Solution.
\begin{align*}
\int\sin^3x\cos^2xdx&=\int \sin^2x\sin x\cos^2xdx\\
&=\int(1-\cos^2x)\cos^2x\sin xdx\\
&=-\int(1-u^2)u^2du\ (\mbox{substitution}\ u=\cos x)\\
&=\frac{u^5}{5}-\frac{u^3}{3}+C\\
&=\frac{\cos^5x}{5}-\frac{\cos^3x}{3}+C,
\end{align*}
where $C$ is a constant.

Example. Evaluate $\int\cos^3xdx$.

Solution.
\begin{align*}
\int\cos^3xdx&=\int\cos^2x\cos xdx\\
&=\int(1-\sin^2x)\cos xdx\\
&=\int(1-u^2)du\ (\mbox{substitution}\ u=\sin x)\\
&=u-\frac{u^3}{3}+C\\
&=\sin x-\frac{\sin^3x}{3}+C,
\end{align*}
where $C$ is a constant.

Case 2. Both $m$ and $n$ are even.

In this case, use the trigonometric identities
$$\sin^2x=\frac{1-\cos 2x}{2},\ \cos^2x=\frac{1+\cos 2x}{2}.$$

Example. Evaluate $\int\sin^2x\cos^4xdx$.

Solution. Left to the reader as an exercise. The answer is
$$\frac{1}{16}\left(x-\frac{1}{4}\sin 4x+\frac{1}{3}\sin^32x\right)+C,$$
where $C$ is a constant.

Integrals of Powers of $\tan x$ and $\sec x$

Integrals of this type can mostly be done by using the trigonometric identity
$$1+\tan^2x=\sec^2x.$$

Example. Evaluate $\int\tan^4xdx$.

Solution.
\begin{align*}
\int\tan^4xdx&=\int\tan^2x\tan^2xdx\\
&=\int\tan^2x(\sec^2x-1)dx\\
&=\int\tan^2x\sec^2xdx-\int\tan^2xdx\\
&=\int u^2du-\int(\sec^2x-1)dx\ (\mbox{substitution}\ u=\tan x)\\
&=\frac{\tan^3x}{3}-\tan x+x+C,
\end{align*}
where $C$ is a constant.

Example. Evaluate $\int\sec^3xdx$.

Solution.
\begin{align*}
\int\sec^3xdx&=\int\sec x\sec^2xdx\\
&=\sec x\tan x-\int\tan^2x\sec xdx\ (\mbox{integration by parts})\\
&=\sec x\tan x-\int(\sec^2x-1)\sec xdx\\
&=\sec x\tan x-\int\sec^3xdx+\int\sec xdx+C'\\
&=\sec x\tan x-\int\sec^3xdx+\ln|\sec x+\tan x|+C',
\end{align*}
where $C'$ is a constant. Hence,
$$\int\sec^3xdx=\frac{1}{2}\sec x\tan x+\frac{1}{2}\ln|\sec x+\tan x|+C,$$
where $C=\frac{C'}{2}$.

Products of Sines and Cosines

Integrals of this type include $\int\sin mx\sin nxdx$, $\int\sin mx\cos nxdx$, and $\int\cos mx\cos nxdx$. In this case use the identities
\begin{align*}
\sin mx\sin nx&=\frac{1}{2}[\cos(m-n)x-\cos(m+n)x]\\
\sin mx\cos nx&=\frac{1}{2}[\sin(m-n)x+\sin(m+n)x]\\
\cos mx\cos nx&=\frac{1}{2}[\cos(m-n)x+\cos(m+n)x]
\end{align*}
Example. Evaluate $\int\sin 3x\cos5xdx$.

Solution.
\begin{align*}
\int\sin 3x\cos5xdx&=-\frac{1}{2}\int\sin 2xdx+\frac{1}{2}\int\sin 8xdx\\
&=\frac{1}{4}\cos 2x-\frac{1}{16}\cos 8x+C,
\end{align*}
where $C$ is a constant.

Example. Evaluate $\int_0^1\sin m\pi x\sin n\pi xdx$ and $\int_0^1\cos m\pi x\cos n\pi xdx$ where $m$ and $n$ are positive integers.

Solution. If $m=n$, then
\begin{align*}
\int_0^1\sin m\pi x\sin n\pi xdx&=\int_0^1\sin^2m\pi xdx\\
&=\int_0^1\frac{1-\cos 2m\pi x}{2}dx\\
&=\frac{1}{2}\int_0^1dx-\frac{1}{2}\int_0^1\cos 2m\pi xdx\\
&=\frac{1}{2}-\frac{1}{4m\pi}[\sin 2m\pi x]_0^1\\
&=\frac{1}{2}.
\end{align*}
Now we assume that $m\ne n$. Then
\begin{align*}
\int_0^1\sin m\pi x\sin n\pi xdx&=\frac{1}{2}\int_0^1\cos(m-n)\pi xdx-\frac{1}{2}\int_0^1\cos(m+n)\pi xdx\\
&=0.
\end{align*}
So, we can simply write the integral as
\begin{equation}
\label{eq:orthofunct}
\int_0^1\sin m\pi x\sin n\pi xdx=\frac{1}{2}\delta_{mn},
\end{equation}
where
$$\delta_{mn}=\left\{\begin{array}{ccc}
1 & \mbox{if} & m=n,\\
0 & \mbox{if} & m\ne n.
\end{array}\right.$$
$\delta_{mn}$ is called the Kronecker delta.
Similarly, we also have
\begin{equation}
\label{eq:orthofunct2}
\int_0^1\cos m\pi x\cos n\pi xdx=\frac{1}{2}\delta_{mn}.
\end{equation}
The integrals \eqref{eq:orthofunct} and \eqref{eq:orthofunct2} play an important role in studying boundary value problems for the heat equation and the wave equation. They also appear in different branches of mathematics and physics such as functional analysis, Fourier analysis, electromagnetism, and quantum mechanics. In mathematics and physics, functions like $\sin n\pi x$ and $\cos n\pi x$ are often treated as vectors, and integrals like \eqref{eq:orthofunct} and \eqref{eq:orthofunct2} can be considered as inner products $\langle\sin m\pi x,\sin n\pi x\rangle$ and $\langle\cos m\pi x,\cos n\pi x\rangle$, respectively. In this sense, we can say that $\sin m\pi x$ and $\sin n\pi x$ are orthogonal if $m\ne n$. For this reason, the functions $\sin n\pi x$ and $\cos n\pi x$, $n=1,2,\cdots$, are called orthogonal functions.
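The orthogonality relations are also easy to verify directly. The following sympy sketch (an illustration, not part of the text) evaluates \eqref{eq:orthofunct} for a few small values of $m$ and $n$.

import sympy as sp

x = sp.symbols('x')
for m in range(1, 4):
    for n in range(1, 4):
        val = sp.integrate(sp.sin(m*sp.pi*x) * sp.sin(n*sp.pi*x), (x, 0, 1))
        print(m, n, val)   # 1/2 when m == n, 0 otherwise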

Integration by Parts

Let $f(x)$ and $g(x)$ be differentiable functions. Then the product rule
$$(f(x)g(x))'=f'(x)g(x)+f(x)g'(x)$$
leads to the integration formula
\begin{equation}
\label{eq:intpart}
\int f(x)g'(x)dx=f(x)g(x)-\int f'(x)g(x)dx.
\end{equation}
The formula \eqref{eq:intpart} is called integration by parts. If we set $u=f(x)$ and $v=g(x)$, then \eqref{eq:intpart} can be also written as
\begin{equation}
\label{eq:intpart2}
\int udv=uv-\int vdu.
\end{equation}

Example. Evaluate $\int x\cos xdx$.

Solution. Let $u=x$ and $dv=\cos xdx$. Then $du=dx$ and $v=\sin x$. So,
\begin{align*}
\int x\cos xdx&=x\sin x-\int\sin xdx\\
&=x\sin x+\cos x+C,
\end{align*}
where $C$ is a constant.

Example. Evaluate $\int\ln xdx$.

Solution. Let $u=\ln x$ and $dv=dx$. Then $du=\frac{1}{x}dx$ and $v=x$. So,
\begin{align*}
\int\ln xdx&=x\ln x-\int x\cdot\frac{1}{x}dx\\
&=x\ln x-x+C,
\end{align*}
where $C$ is a constant.

Often it is required to apply integration by parts more than once to evaluate a given integral. In that case, it is convenient to use a table as shown in the following example.

Example. Evaluate $\int x^2e^xdx$

Solution. In the following table, the first column represents $x^2$ and its derivatives, and the second column represents $e^x$ and its integrals.
$$\begin{array}{ccc}
x^2 & & e^x\\
&\stackrel{+}{\searrow}&\\
2x & & e^x\\
&\stackrel{-}{\searrow}&\\
2 & & e^x\\
&\stackrel{+}{\searrow}&\\
0 & & e^x.
\end{array}$$
This table shows the repeated application of integration by parts. Following the table, the final answer is given by
$$\int x^2e^xdx=x^2e^x-2xe^x+2e^x+C,$$
where $C$ is a constant.
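A tabular computation like this can be double-checked in one line with sympy (my own sanity check; the difference below is $0$, i.e. the two antiderivatives agree up to a constant).

import sympy as sp

x = sp.symbols('x')
table_answer = x**2*sp.exp(x) - 2*x*sp.exp(x) + 2*sp.exp(x)
print(sp.simplify(sp.integrate(x**2*sp.exp(x), x) - table_answer))   # 0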

Example. Evaluate $\int x^3\sin xdx$.

Solution. In the following table, the first column represents $x^3$ and its derivatives, and the second column represents $\sin x$ and its integrals.
$$\begin{array}{ccc}
x^3 & & \sin x\\
&\stackrel{+}{\searrow}&\\
3x^2 & & -\cos x\\
&\stackrel{-}{\searrow}&\\
6x & & -\sin x\\
&\stackrel{+}{\searrow}&\\
6 & & \cos x\\
&\stackrel{-}{\searrow}&\\
0 & & \sin x.
\end{array}$$
Following the table, the final answer is given by
$$\int x^3\sin xdx=-x^3\cos x+3x^2\sin x+6x\cos x-6\sin x+C,$$
where $C$ is a constant.

Example. Evaluate $\int e^x\cos xdx$.

Solution. In the following table, the first column represents $e^x$ and its derivatives, and the second column represents $\cos x$ and its integrals.
$$\begin{array}{ccc}
e^x & & \cos x\\
&\stackrel{+}{\searrow}&\\
e^x & & \sin x\\
&\stackrel{-}{\searrow}&\\
e^x & & -\cos x.
\end{array}$$
Now, this is different from the previous two examples. While the first column repeats the same function $e^x$, the functions in the second column change from $\cos x$ to $\sin x$ and back to $\cos x$ up to sign. In this case, we stop there, write the answer as we have done in the previous two examples, and add to it $\int e^x(-\cos x)dx$. (Notice that the integrand is the product of the functions in the last row.) That is,
$$\int e^x\cos xdx=e^x\sin x+e^x\cos x-\int e^x\cos xdx.$$
For now we do not worry about the constant of integration. Solving this for $\int e^x\cos xdx$, we obtain the final answer
$$\int e^x\cos xdx=\frac{1}{2}e^x\sin x+\frac{1}{2}e^x\cos x+C,$$
where $C$ is a constant.

Example. Evaluate $\int e^x\sin xdx$.

Solution. In the following table, the first column represents $e^x$ and its derivatives, and the second column represents $\sin x$ and its integrals.
$$\begin{array}{ccc}
e^x & & \sin x\\
&\stackrel{+}{\searrow}&\\
e^x & & -\cos x\\
&\stackrel{-}{\searrow}&\\
e^x & & -\sin x.
\end{array}$$
This is similar to the above example. The first column repeats the same function $e^x$, and the functions in the second column change from $\sin x$ to $\cos x$ and back to $\sin x$ up to sign. So we stop there and write
$$\int e^x\sin xdx=-e^x\cos x+e^x\sin x-\int e^x\sin xdx.$$
Solving this for $\int e^x\sin xdx$, we obtain
$$\int e^x\sin xdx=-\frac{1}{2}e^x\cos x+\frac{1}{2}e^x\sin x+C,$$
where $C$ is a constant.

Example. Evaluate $\int e^{5x}\cos 8xdx$.

Solution. In the following table, the first column represents $e^{5x}$ and its derivatives, and the second column represents $\cos 8x$ and its integrals.
$$\begin{array}{ccc}
e^{5x} & & \cos 8x\\
&\stackrel{+}{\searrow}&\\
5e^{5x} & & \frac{1}{8}\sin 8x\\
&\stackrel{-}{\searrow}&\\
25e^{5x} & & -\frac{1}{64}\cos 8x.
\end{array}$$
The first column repeats the same function $e^{5x}$ up to a constant multiple, and the functions in the second column change from $\cos 8x$ to $\sin 8x$ and back to $\cos 8x$ up to a constant multiple. In this case we do the same:
$$\int e^{5x}\cos 8xdx=\frac{1}{8}e^{5x}\sin 8x+\frac{5}{64}e^{5x}\cos 8x-\frac{25}{64}\int e^{5x}\cos 8xdx.$$
Solving this for $\int e^{5x}\cos 8xdx$, we obtain
$$\int e^{5x}\cos 8xdx=\frac{8}{89}e^{5x}\sin 8x+\frac{5}{89}e^{5x}\cos 8x+C,$$
where $C$ is a constant.

The evaluation of a definite integral by parts can be done as
\begin{equation}
\label{eq:intpart3}
\int_a^b f(x)g'(x)dx=[f(x)g(x)]_a^b-\int_a^b f'(x)g(x)dx.
\end{equation}

Example. Find the area of the region bounded by $y=xe^{-x}$ and the x-axis from $x=0$ to $x=4$.

Figure: the graph of $y=xe^{-x}$, $0\leq x\leq 4$

Solution. Let $u=x$ and $dv=e^{-x}dx$. Then $du=dx$ and $v=-e^{-x}$. Hence,
\begin{align*}
A&=\int_0^4 xe^{-x}dx\\
&=[-xe^{-x}]_0^4+\int_0^4 e^{-x}dx\\
&=-4e^{-4}+[-e^{-x}]_0^4\\
&=1-5e^{-4}.
\end{align*}

A Convergence Theorem for Fourier Series

In here, we have seen that if a function $f$ is Riemann integrable on every bounded interval, it can be expanded as a trigonometric series called a Fourier series, by assuming that the series converges to $f$. So, it would be natural to pose the following question: if $f$ is a periodic function, does its Fourier series always converge to $f$? The answer is affirmative if $f$ is, in addition, piecewise smooth.

Let $S_N^f(\theta)$ denote the $N$-th partial sum of the Fourier series of a $2\pi$-periodic function $f(\theta)$. Then
\begin{equation}
\label{eq:partsum}
\begin{aligned}
S_N^f(\theta)&=\sum_{-N}^N c_ne^{in\theta}\\
&=\frac{1}{2\pi}\sum_{-N}^N\int_{-\pi}^\pi f(\psi)e^{in(\theta-\psi)}d\psi\\
&=\frac{1}{2\pi}\sum_{-N}^N\int_{-\pi}^\pi f(\psi)e^{in(\psi-\theta)}d\psi.
\end{aligned}
\end{equation}
Let $\phi=\psi-\theta$. Then
\begin{align*}
S_N^f(\theta)&=\frac{1}{2\pi}\sum_{-N}^N\int_{-\pi-\theta}^{\pi-\theta} f(\phi+\theta)e^{in\phi}d\phi\\
&=\frac{1}{2\pi}\sum_{-N}^N\int_{-\pi}^\pi f(\phi+\theta)e^{in\phi}d\phi\\
&=\int_{-\pi}^\pi f(\theta+\phi)D_N(\phi)d\phi,
\end{align*}
where
\begin{equation}
\label{eq:dkernel}
\begin{aligned}
D_N(\phi)&=\frac{1}{2\pi}\sum_{-N}^N e^{in\phi}\\
&=\frac{1}{2\pi}\frac{e^{i(N+1)\phi}-e^{-iN\phi}}{e^{i\phi}-1}\\
&=\frac{1}{2\pi}\frac{\sin\left(N+\frac{1}{2}\right)\phi}{\sin\frac{1}{2}\phi}.
\end{aligned}
\end{equation}
$D_N(\phi)$ is called the $N$-th Dirichlet kernel. Note that the Dirichlet kernel can be used to realize the Dirac delta function $\delta(x)$, i.e.
$$\delta(x)=\lim_{n\to\infty}\frac{1}{2\pi}\frac{\sin\left(n+\frac{1}{2}\right)x}{\sin\frac{1}{2}x}.$$

Figure: the Dirichlet kernels $D_n(x)$, $n=1,\cdots,10$, $-\pi\leq x\leq\pi$

Note that
$$\frac{1}{2}+\frac{\sin\left(N+\frac{1}{2}\right)\theta}{2\sin\frac{1}{2}\theta}=1+\sum_{n=1}^N\cos n\theta\ (0<\theta<2\pi)$$
Using this identity, one can easily show that:

Lemma. For any $N$,
$$\int_{-\pi}^0 D_N(\theta)d\theta=\int_0^{\pi}D_N(\theta)d\theta=\frac{1}{2}.$$

Now, we are ready to prove the following convergence theorem.

Theorem. If $f$ is $2\pi$-periodic and piecewise smooth on $\mathbb{R}$, then
$$\lim_{N\to\infty} S_N^f(\theta)=\frac{1}{2}[f(\theta-)+f(\theta+)]$$
for every $\theta$. Here, $f(\theta-)=\lim_{\stackrel{h\to 0}{h>0}}f(\theta-h)$ and $f(\theta+)=\lim_{\stackrel{h\to 0}{h>0}}f(\theta+h)$. In particular, $\lim_{N\to\infty}S_N^f(\theta)=f(\theta)$ for every $\theta$ at which $f$ is continuous.

Proof. By Lemma,
$$\frac{1}{2}f(\theta-)=f(\theta-)\int_{-\pi}^0 D_N(\phi)d\phi,\ \frac{1}{2}f(\theta+)=f(\theta+)\int_0^\pi D_N(\phi)d\phi.$$
So,
\begin{align*}
S_N^f(\theta)-\frac{1}{2}[f(\theta-)+f(\theta+)]&=\int_{-\pi}^0[f(\theta+\phi)-f(\theta-)]D_N(\phi)d\phi+\\
&\int_0^\pi[f(\theta+\phi)-f(\theta+)]D_N(\phi)d\phi\\
&=\frac{1}{2\pi}\int_{-\pi}^0[f(\theta+\phi)-f(\theta-)]\frac{e^{i(N+1)\phi}-e^{-iN\phi}}{e^{i\phi}-1}d\phi\\
&+\frac{1}{2\pi}\int_0^\pi[f(\theta+\phi)-f(\theta+)]\frac{e^{i(N+1)\phi}-e^{-iN\phi}}{e^{i\phi}-1}d\phi.
\end{align*}
Since $f$ is piecewise smooth,
$$\lim_{\phi\to 0+}\frac{f(\theta+\phi)-f(\theta+)}{e^{i\phi}-1}=\frac{f'(\theta+)}{i},\ \lim_{\phi\to 0-}\frac{f(\theta+\phi)-f(\theta-)}{e^{i\phi}-1}=\frac{f'(\theta-)}{i}.$$
Hence, the function
$$g(\phi):=\left\{\begin{aligned}
&\frac{f(\theta+\phi)-f(\theta+)}{e^{i\phi}-1},\ -\pi<\phi<0,\\
&\frac{f(\theta+\phi)-f(\theta-)}{e^{i\phi}-1},\ 0<\phi<\pi
\end{aligned}\right.$$
is piecewise continuous on $[-\pi,\pi]$. By the corollary to Bessel’s inequality,
$$c_n=\frac{1}{2\pi}\int_{-\pi}^\pi g(\phi)e^{in\phi}d\phi\to 0$$
as $n\to\pm\infty$. Therefore,
\begin{align*}
S_N^f(\theta)-\frac{1}{2}[f(\theta-)+f(\theta+)]&=\frac{1}{2\pi}\int_{-\pi}^\pi g(\phi)[e^{i(N+1)\phi}-e^{-iN\phi}]d\phi\\
&=c_{-(N+1)}-c_N\\
&\to 0
\end{align*}
as $N\to\infty$. This completes the proof.
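To see the theorem at work on a concrete jump discontinuity, here is a small Python sketch (my own illustration) using the well-known Fourier sine series of the $2\pi$-periodic square wave $f=-1$ on $(-\pi,0)$, $f=1$ on $(0,\pi)$, namely $\frac{4}{\pi}\sum_{k\ \mathrm{odd}}\frac{\sin k\theta}{k}$.

import numpy as np

def S_N(theta, N):
    # Partial sum (frequencies up to N) of the Fourier series of the square wave.
    k = np.arange(1, N + 1, 2)                  # only odd frequencies contribute
    return (4 / np.pi) * np.sum(np.sin(k * theta) / k)

for N in (10, 100, 1000):
    print(N, S_N(0.0, N), S_N(np.pi / 2, N))

# At the jump theta = 0 every partial sum equals (f(0-)+f(0+))/2 = 0;
# at theta = pi/2 the partial sums approach f(pi/2) = 1.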

Corollary. If $f$ and $g$ are $2\pi$-periodic and piecewise smooth, and $f$ and $g$ have the same Fourier coefficients, then $f=g$.

Proof. If $f$ and $g$ have the same Fourier coefficients, then their Fourier series are the same. Due to the conditions on $f$ and $g$, the Fourier series of $f$ and $g$ converge to $f$ and $g$ respectively by the above convergence theorem. Hence, $f=g$.

The Curvature of a Curve in Euclidean 3-space $\mathbb{R}^3$

The quantity curvature is intended to be a measurement of the bending or turning of a curve. Let $\alpha: I\longrightarrow\mathbb{R}^3$ be a regular curve (i.e. a smooth curve whose derivative never vanishes). Suppose first that $\alpha$ has unit speed, i.e.
\begin{equation}
\label{eq:unitspped}
||\dot\alpha(t)||^2=\dot\alpha(t)\cdot\dot\alpha(t)=1.
\end{equation}
Differentiating \eqref{eq:unitspped}, we see that $\dot\alpha(t)\cdot\ddot\alpha(t)=0$, i.e. the acceleration is normal to the velocity which is tangent to $\alpha$. Hence, measuring the acceleration is measuring the curvature. So, if we denote the curvature by $\kappa$, then
\begin{equation}
\label{eq:curvature}
\kappa=||\ddot\alpha(t)||.
\end{equation}
Remember that the definition of curvature \eqref{eq:curvature} requires the curve $\alpha$ to be a unit speed curve, but this is not always the case. What we know is that we can always reparametrize a curve, and reparametrization does not change the curve itself but only its speed. There is one particular parametrization that we are interested in, as it results in a unit speed curve. It is called parametrization by arc-length. This time let us assume that $\alpha$ is not a unit speed curve and define
\begin{equation}
\label{eq:arclength}
s(t)=\int_a^t||\dot\alpha(u)||du,
\end{equation}
where $a\in I$. Since $\frac{ds}{dt}>0$, $s(t)$ is an increasing function and so it is one-to-one. This means that we can solve \eqref{eq:arclength} for $t$ and this allows us to reparametrize $\alpha(t)$ by the arc-length parameter $s$.

Example. Let $\alpha: (-\infty,\infty)\longrightarrow\mathbb{R}^3$ be given by
$$\alpha(t)=(a\cos t,a\sin t,bt)$$
where $a>0$, $b\ne 0$. $\alpha$ is a right circular helix. Its speed is
$$||\dot\alpha(t)||=\sqrt{a^2+b^2}\ne 1.$$
Measuring arc length from $t=0$, $s(t)=\sqrt{a^2+b^2}\,t$, so $t=\frac{s}{\sqrt{a^2+b^2}}$. The reparametrization of $\alpha(t)$ by $s$ is given by
$$\alpha(s)=\left(a\cos\frac{s}{\sqrt{a^2+b^2}},a\sin\frac{s}{\sqrt{a^2+b^2}},\frac{bs}{\sqrt{a^2+b^2}}\right).$$
Hence the curvature $\kappa$ is
$$\kappa=\frac{a}{a^2+b^2}.$$
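As a check (my own, with sympy), one can reparametrize the helix by arc length symbolically and read off $\kappa=\|\ddot\alpha(s)\|$.

import sympy as sp

s, a, b = sp.symbols('s a b', positive=True)
c = sp.sqrt(a**2 + b**2)

# Arc-length parametrization of the right circular helix.
alpha = sp.Matrix([a*sp.cos(s/c), a*sp.sin(s/c), b*s/c])

acc = alpha.diff(s, 2)                       # acceleration of the unit speed curve
kappa = sp.simplify(sp.sqrt(acc.dot(acc)))   # curvature = ||alpha''(s)||
print(kappa)                                 # a/(a**2 + b**2)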