Probability and Statistics

⌊x⌋ = the greatest integer not greater than x (floor)

sequence

index variable

explicit formulas

recurrence relation

arithmetic sequences

geometric sequences

bounded above/below

(un)bounded

convergent ⇒ bounded

bounded, monotone ⇒ convergent

series

convergent vs divergent series

harmonic series: ∑_{n=1}^∞ 1/n is divergent

p-series: ∑_{n=1}^∞ 1/n^p is convergent ⇔ p > 1

convergent ⇒ lim_{n→∞} a_n = 0

lim_{n→∞} a_n ≠ 0 ⇒ divergent

tests

divergence test

partial sum: s_n = a_1 + a_2 + … + a_n

geometric: ∑_{n=0}^∞ a q^n convergent ⇔ |q| < 1, with sum a/(1 − q)

integral: f continuous, positive, ultimately decreasing, f(n) = a_n ⇒ ∑ a_n and ∫_1^∞ f(x) dx converge or diverge together

comparison: 0 ≤ a_n ≤ b_n ∀ n > n_0, ∑ b_n convergent ⇒ ∑ a_n convergent; ∑ a_n divergent ⇒ ∑ b_n divergent

scaling by a positive constant, or limit comparison (a_n/b_n → c with 0 < c < ∞), gives the same conclusion

alternating: |a_{n+1}| ≤ |a_n| ∀ n > n_0 and lim_{n→∞} a_n = 0 ⇒ convergent

absolute convergence: ∑ |a_n| convergent

conditional convergence: ∑ a_n convergent, but ∑ |a_n| divergent

∑ |a_n| convergent ⇒ ∑ a_n convergent

ratio: L = lim_{n→∞} |a_{n+1}/a_n|; L < 1 ⇒ absolutely convergent, L > 1 ⇒ divergent, L = 1 inconclusive

root: L = lim_{n→∞} |a_n|^{1/n}; same conclusions as the ratio test
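A quick numeric sketch of these tests (my addition, plain Python): partial sums of ∑ 1/n keep growing while ∑ 1/n² settles, and the ratio a_{k+1}/a_k for ∑ k/2^k approaches 1/2 < 1.

```python
# Sketch: watching partial sums and the ratio-test limit numerically.
def partial_sum(a, n):
    """s_n = a(1) + ... + a(n)."""
    return sum(a(k) for k in range(1, n + 1))

harmonic = lambda k: 1.0 / k          # p = 1: divergent
p2       = lambda k: 1.0 / k**2       # p = 2: convergent (to pi^2/6)
geo_like = lambda k: k / 2.0**k       # ratio test: a_{k+1}/a_k -> 1/2 < 1

for n in (10, 100, 1000, 10000):
    print(n, partial_sum(harmonic, n), partial_sum(p2, n))

# ratio-test limit estimate for sum k/2^k
for k in (10, 20, 40):
    print(k, geo_like(k + 1) / geo_like(k))   # approaches 0.5
```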

uniform convergence:

f_n(x) converges to f(x) pointwise: f_n(x) → f(x) as n → ∞ for each fixed x

N(ε) independent of x: ∀ n > N and ∀ x, |f_n(x) − f(x)| < ε ⇔ uniform convergence: f_n ⇉ f

dominated convergence: f_i(x) → f(x) as i → ∞, ∃ g integrable with |f_i(x)| ≤ g(x) ∀ x, i ⇒ lim_{i→∞} ∫_a^b f_i(x) dx = ∫_a^b f(x) dx

power series: ∑_{n=0}^∞ c_n (x − a)^n

center a, coefficients c_n

either: convergent only at x = a,

or ∀x ∈ ℝ, convergent

or ∃ R > 0, |x − a| < R ⇒ convergent, |x − a| > R ⇒ divergent

method: ratio test on the terms, R = lim_{n→∞} |c_n / c_{n+1}| = radius of convergence

|x − a| < R ⇒ absolutely convergent, |x − a| > R ⇒ divergent, from the ratio test

Bessel function of the first kind: J_0(x) = ∑_{n=0}^∞ (−1)^n x^{2n} / (2^{2n} (n!)²), convergent ∀ x ∈ ℝ

basic formula: 1/(1 − x) = ∑_{n=0}^∞ x^n for |x| < 1

pull out factors, create powers of (x − a), rewrite part of the expression in the form 1/(1 − u), expand it as a geometric series, then state the range of convergence

Taylor and Maclaurin Series: f(x) = ∑_{n=0}^∞ (f^{(n)}(a)/n!)(x − a)^n; Maclaurin: a = 0

for |x| < 1

for |x| < 1

differentiation and integration: term by term inside the radius of convergence, (∑ c_n(x − a)^n)′ = ∑ n c_n (x − a)^{n−1}, ∫ ∑ c_n(x − a)^n dx = C + ∑ c_n (x − a)^{n+1}/(n + 1), same R

may lose convergence at the endpoints of the interval

Cauchy product: (∑_{n=0}^∞ a_n)(∑_{n=0}^∞ b_n) = ∑_{n=0}^∞ c_n, where c_n = ∑_{k=0}^n a_k b_{n−k}
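Sketch (my addition): the Maclaurin polynomials T_N(x) = ∑_{n=0}^N x^n/n! of e^x illustrate the Taylor formula above; the error shrinks as N grows.

```python
# Sketch: Maclaurin polynomial of exp(x) compared with math.exp.
import math

def maclaurin_exp(x, N):
    """T_N(x) = sum_{n=0}^{N} x^n / n!"""
    return sum(x**n / math.factorial(n) for n in range(N + 1))

x = 1.5
for N in (2, 4, 8, 16):
    approx = maclaurin_exp(x, N)
    print(N, approx, abs(approx - math.exp(x)))   # error -> 0 as N grows
```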

probability

sample space S: nonempty set

collection of events: subsets of S, i.e. a subcollection of the power set P(S)

probability measure: P(A) ∈ [0, 1], P(∅) = 0, P(S) = 1

P(A_1 ∪ A_2 ∪ …) = P(A_1) + P(A_2) + … if A_1, A_2, … are mutually disjoint

symmetric difference: AΔB = (A ∖ B) ∪ (B ∖ A)

complement: A^c := S ∖ A

disjoint: A ∩ B = ∅ ⇔ A ⊆ B^c; in general A ∖ B = A ∩ B^c

De Morgan Laws: (A ∪ B)c = Ac ∩ Bc, (A ∩ B)c = Ac ∪ Bc

A = B ⇔ A ⊆ B and B ⊆ A

cardinality: |A|= # of elements in A

inclusion-exclusion principle: |A ∪ B| = |A| + |B| − |A ∩ B|, |A ∪ B ∪ C| = |A| + |B| + |C| − |A ∩ B| − |A ∩ C| − |B ∩ C| + |A ∩ B ∩ C|
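A tiny sanity check of inclusion-exclusion on concrete finite sets (illustrative sketch only, my addition):

```python
# Sketch: |A ∪ B ∪ C| = |A|+|B|+|C| - |A∩B| - |A∩C| - |B∩C| + |A∩B∩C|
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {1, 4, 6, 7}

lhs = len(A | B | C)
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))
print(lhs, rhs, lhs == rhs)   # both 7, True
```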

partition of the sample space: B_i disjoint, S = B_1 ∪ B_2 ∪ …

P(A) = P(A ∩ B_1) + P(A ∩ B_2) + …

P(A ∖ B) = P(A ∩ B^c) = P(A) − P(A ∩ B); equals P(A) − P(B) if B ⊆ A

not necessarily disjoint: P(A_1 ∪ A_2 ∪ …) ≤ P(A_1) + P(A_2) + … (union bound)

independent: P(A ∩ B) = P(A)P(B) for two events; for more events, P(A_{i_1} ∩ … ∩ A_{i_k}) = P(A_{i_1})⋯P(A_{i_k}) for every finite subcollection

uniform probability: P(A) = |A| / |S| for finite S with equally likely outcomes

continuous uniform distribution: P([x, y]) = (y − x)/(b − a) for [x, y] ⊆ [a, b]

combinatorics: strings of length k over n symbols ⇒ n^k possibilities

ordered subset: # of ordered k-subsets of an n-set = n(n − 1)⋯(n − k + 1) = n!/(n − k)!

binomial: C(n, k) = n!/(k!(n − k)!) = # of k-element subsets of an n-element set

generating function: G(x) = ∑_n a_n x^n

exponential generating function: ∑_n a_n x^n / n!

Bayes’ theorem: P(B_i | A) = P(A | B_i) P(B_i) / ∑_j P(A | B_j) P(B_j) for a partition {B_j}; in particular P(B | A) = P(A | B) P(B) / P(A)
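Worked Bayes example with hypothetical disease-testing numbers (values assumed, not from the notes):

```python
# Sketch: Bayes' theorem P(D|+) = P(+|D) P(D) / P(+), with P(+) from total probability.
p_d       = 0.01   # prior P(D)              (assumed value)
p_pos_d   = 0.95   # P(+ | D)                (assumed value)
p_pos_not = 0.05   # P(+ | D^c)              (assumed value)

p_pos = p_pos_d * p_d + p_pos_not * (1 - p_d)   # law of total probability
p_d_pos = p_pos_d * p_d / p_pos                 # posterior
print(round(p_d_pos, 4))   # ~0.161: a positive test is still far from certain
```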

random variable: a function X : S → ℝ, e.g. X(outcome) = 2

discrete: X takes values in a finite or countable set

PMF/PDF probability mass/density function: p_X(x) = P(X = x) in the discrete case

law of total probability: P(A) = ∑_i P(A | B_i) P(B_i) for a partition {B_i}

Bernoulli(p): p_X(1) = p, p_X(0) = q = 1 − p

Binomial B(n, p): p_X(k) = C(n, k) p^k q^{n−k}, k = 0, 1, …, n, E(X) = np, var(X) = npq

r_X(t) = E(t^X) = (pt + q)^n

geometric: p_X(k) = p q^k, k = 0, 1, 2, …

negative-binomial(r, p): p_X(k) = C(k + r − 1, k) p^r q^k, k = 0, 1, 2, … (# of failures before the r-th success)

Poisson(λ): p_X(k) = e^{−λ} λ^k / k!, k = 0, 1, 2, …

E(X) = var(X) = λ

hypergeometric(N, K, n): p_X(k) = C(K, k) C(N − K, n − k) / C(N, n)
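Sketch (assumes scipy is installed) comparing a few of the PMFs and moments above with scipy.stats:

```python
# Sketch: check binomial, Poisson and hypergeometric formulas against scipy.stats.
from math import comb, exp, factorial
from scipy import stats

n, p, lam, k = 10, 0.3, 2.5, 4
q = 1 - p

# binomial PMF, mean, variance
print(comb(n, k) * p**k * q**(n - k), stats.binom.pmf(k, n, p))
print(n * p, n * p * q, stats.binom.mean(n, p), stats.binom.var(n, p))

# Poisson PMF and E = var = lambda
print(exp(-lam) * lam**k / factorial(k), stats.poisson.pmf(k, lam))
print(stats.poisson.mean(lam), stats.poisson.var(lam))

# hypergeometric: N objects, K marked, draw nn without replacement
N, K, nn = 20, 7, 5
print(comb(K, k) * comb(N - K, nn - k) / comb(N, nn),
      stats.hypergeom.pmf(k, N, K, nn))
```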

continuous distribution: P(X = x) = 0 ∀ x ∈ ℝ, ∫_{−∞}^{∞} f_X(x) dx = 1

uniform on [a, b]: f_X(x) = 1/(b − a) for x ∈ [a, b], 0 otherwise

exponential(λ): f_X(x) = λ e^{−λx} for x ≥ 0, E(X) = 1/λ, var(X) = 1/λ²

P(X ≥ t) = e^{−λt}

normal N(μ, σ²): f_X(x) = (1/(σ√(2π))) e^{−(x − μ)²/(2σ²)}

gamma function: Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt

Γ(x + 1) = x Γ(x), so Γ(n + 1) = n! for n ∈ ℕ

chi-squared: Χ²(n) is the distribution of Z = X_1² + … + X_n² with X_i ∼ N(0, 1) i.i.d.

E(Z) = n

f_Z(z) = e^{−z/2} / √(2πz) for z ≥ 0 if n = 1

t (or Student): t(n) is the distribution of T = X_0 / √((X_1² + … + X_n²)/n) with X_0, X_1, …, X_n ∼ N(0, 1) i.i.d.
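Sketch (scipy assumed) checking the exponential tail P(X ≥ t) = e^{−λt}, the Χ²(n) mean, and the normal density at 0:

```python
# Sketch: continuous-distribution facts via scipy.stats.
import math
from scipy import stats

lam, t = 1.5, 2.0
# scipy parametrizes the exponential by scale = 1/lambda
print(math.exp(-lam * t), stats.expon.sf(t, scale=1 / lam))   # both ~0.0498

n = 7
print(stats.chi2.mean(n))    # 7.0, matches E(Z) = n

# standard normal density at 0 is 1/sqrt(2*pi)
print(1 / math.sqrt(2 * math.pi), stats.norm.pdf(0))
```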


CDF cumulative distribution function: F_X(x) = P(X ≤ x)

function of a random variable: Y = h(X)

if X discrete, then p_Y(y) = ∑_{x : h(x) = y} p_X(x)

if X continuous, h(x) monotone where f_X(x) > 0, then f_Y(y) = f_X(h^{−1}(y)) |d h^{−1}(y)/dy|
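Simulation sketch (numpy assumed, my addition) of the monotone change-of-variables formula, with X ∼ Exponential(1) and Y = X²:

```python
# Sketch: check f_Y(y) = f_X(h^{-1}(y)) |d h^{-1}(y)/dy| by simulation,
# with X ~ Exponential(1) and Y = X^2 (h monotone for x > 0).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)
y = x**2

# empirical density of Y near y0 vs the formula
y0, h = 1.5, 0.05
empirical = np.mean((y > y0 - h) & (y < y0 + h)) / (2 * h)
formula = np.exp(-np.sqrt(y0)) / (2 * np.sqrt(y0))   # f_X(sqrt(y)) * |d sqrt(y)/dy|
print(empirical, formula)   # close to each other
```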

joint distributions

marginal distribution: F_X(x) = lim_{y→∞} F_{X,Y}(x, y)

joint probability fun: p_{X,Y}(x, y) = P(X = x, Y = y)

marginal PMF: p_X(x) = ∑_y p_{X,Y}(x, y) (row/column sums of the table)

joint density fun: f_{X,Y}(x, y), ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1

marginal PDF: f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy

independent: p_{X,Y}(x, y) = p_X(x) p_Y(y)

or f_{X,Y}(x, y) = f_X(x) f_Y(y)

continuous conditional density: f_{Y|X}(y | x) = f_{X,Y}(x, y) / f_X(x)

law of total prob: P((X, Y) ∈ B) = ∬_B f_X(x) f_{Y|X}(y|x) dx dy

independent random variables: ∀ B_1, B_2, P(X ∈ B_1, Y ∈ B_2) = P(X ∈ B_1) P(Y ∈ B_2)

expectation: E(X) = ∑_x x p_X(x) (discrete) or ∫ x f_X(x) dx (continuous); E(XY) = E(X)E(Y) if X, Y independent

variance: var(X) = E((X − μ_X)²) = E(X²) − (E(X))², var(X + Y) = var(X) + var(Y) + 2 cov(X, Y)

covariance: cov(X, Y) = E((X − μ_X)(Y − μ_Y)) = E(XY) − E(X)E(Y)

cov(aX + bY, Z) = a cov(X, Z) + b cov(Y, Z)

correlation: ρ(X, Y) = cov(X, Y) / (σ_X σ_Y) ∈ [−1, 1]

i.i.d. mutually independent and identically distributed

X_1, …, X_n ∼ Bernoulli(p) i.i.d. ⇒ X_1 + … + X_n ∼ B(n, p)

X_1 ∼ B(n_1, p), X_2 ∼ B(n_2, p) independent ⇒ X_1 + X_2 ∼ B(n_1 + n_2, p)

X_1 ∼ Poisson(λ_1), X_2 ∼ Poisson(λ_2) independent ⇒ X_1 + X_2 ∼ Poisson(λ_1 + λ_2)
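Simulation sketch (numpy assumed) of the Poisson additivity fact above:

```python
# Sketch: X1 ~ Poisson(2), X2 ~ Poisson(3) independent; compare X1 + X2 with Poisson(5).
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
s = rng.poisson(2.0, n) + rng.poisson(3.0, n)
direct = rng.poisson(5.0, n)

print(s.mean(), direct.mean())                 # both ~5
print(s.var(), direct.var())                   # both ~5
print(np.mean(s == 4), np.mean(direct == 4))   # P(S = 4) = e^-5 * 5^4 / 4! ~ 0.175
```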

generating fun: r_X(t) = E(t^X)

r_{X+Y}(t) = r_X(t) r_Y(t) for X, Y independent

kth moment: E(X^k)

kth central moment: E((X − μ_X)^k)

moment-generating fun: M_X(s) = E(e^{sX})

M_X^{(k)}(0) = E(X^k)

M_{X+Y}(s) = M_X(s) M_Y(s) for X, Y independent; the same works for r_X(t)

uniqueness: if ∃ s_0 > 0 with M_X(s) < ∞ ∀ s ∈ (−s_0, s_0) and M_X(s) = M_Y(s) there, then X, Y have the same distribution; also works for r

characteristic fun: c_X(s) = E(e^{isX})

inequalities

Markov’s: P(X ≥ a) ≤ E(X)/a for X ≥ 0, a > 0

Chebyshev’s: P(|X − μ| ≥ a) ≤ var(X)/a² for a > 0

Cauchy–Schwarz: (E(XY))² ≤ E(X²) E(Y²), hence |cov(X, Y)| ≤ σ_X σ_Y

equality (|ρ| = 1) iff X = aY + b with probability 1, provided var(Y) > 0
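Simulation sketch (numpy assumed) showing Markov's and Chebyshev's bounds hold (often loosely) for an Exponential(1) sample:

```python
# Sketch: exponential(1) sample; compare tail probabilities with the bounds.
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(1.0, 500_000)   # E(X) = 1, var(X) = 1
a = 3.0

print(np.mean(x >= a), 1.0 / a)                    # Markov: P(X >= a) <= E(X)/a
print(np.mean(np.abs(x - 1.0) >= a), 1.0 / a**2)   # Chebyshev: P(|X-mu| >= a) <= var/a^2
```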

law of large numbers

sample sum: S_n = X_1 + … + X_n for {X_i} i.i.d.

sample average: M_n = S_n / n

WLLN weak: lim_{n→∞} P(|M_n − μ| ≥ ε) = 0 ∀ ε > 0, for independent {X_i} with common mean μ and var(X_i) ≤ v < ∞ (not necessarily identically distributed)

SLLN strong: P(lim_{n→∞} M_n = μ) = 1 for {X_i} i.i.d.


central limit theorem: lim_{n→∞} P(Z_n ≤ x) = P(Z ≤ x) for {X_i} i.i.d., Z ∼ N(0, 1), Z_n = (S_n − nμ)/(σ√n)
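Simulation sketch (numpy assumed) of the CLT: standardized sums of i.i.d. Uniform(0, 1) variables look N(0, 1):

```python
# Sketch: CLT for Uniform(0,1) samples, mu = 1/2, sigma^2 = 1/12.
import numpy as np

rng = np.random.default_rng(3)
n, reps = 50, 100_000
mu, sigma = 0.5, np.sqrt(1 / 12)

s = rng.uniform(0.0, 1.0, size=(reps, n)).sum(axis=1)
z = (s - n * mu) / (sigma * np.sqrt(n))

print(z.mean(), z.std())    # ~0 and ~1
print(np.mean(z <= 1.96))   # ~P(Z <= 1.96) ~ 0.975
```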

convolution: for X, Y independent, Z = X + Y

discrete: p_Z(z) = ∑_w p_X(z − w) p_Y(w)

continuous: f_Z(z) = ∫_{−∞}^{∞} f_X(z − w) f_Y(w) dw
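Sketch (numpy assumed) of the discrete convolution formula: the PMF of the sum of two fair dice is the convolution of their PMFs:

```python
# Sketch: p_Z = p_X * p_Y (convolution) for Z = X + Y, two fair six-sided dice.
import numpy as np

p_die = np.full(6, 1 / 6)            # support 1..6
p_sum = np.convolve(p_die, p_die)    # support 2..12

for total, prob in zip(range(2, 13), p_sum):
    print(total, round(prob, 4))     # e.g. P(Z = 7) = 6/36 ~ 0.1667
```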

different modes of convergence

in probability: {X_n} converges in probability to Y, if lim_{n→∞} P(|X_n − Y| ≥ ε) = 0 ∀ ε > 0

with probability 1: {X_n} converges with probability 1 (almost surely) to Y, if P(lim_{n→∞} X_n = Y) = 1

in distribution: {X_n} converges in distribution to Y, if lim_{n→∞} P(X_n ≤ x) = P(Y ≤ x) ∀ x with P(Y = x) = 0

sample

sample variance: S_n² = (1/(n − 1)) ∑_{i=1}^n (X_i − M_n)²

kth sample moment: (1/n) ∑_{i=1}^n X_i^k

point estimation

the method of moments: set sample moments equal to theoretical moments, (1/n) ∑_i X_i^k = E(X^k; θ), and solve for θ

MLE maximum likelihood estimation: choose θ̂ that makes L as large as possible

likelihood fun: L(θ; x_1, …, x_n) = P(X_1 = x_1, …, X_n = x_n | θ) or f(x_1, …, x_n | θ)

x_1, …, x_n independent: L(θ) = ∏_i p_X(x_i | θ); equivalently maximize log L(θ) = ∑_i log p_X(x_i | θ)
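Sketch (numpy assumed) of MLE for an Exponential(λ) sample: maximizing log L(λ) = n log λ − λ ∑ x_i gives λ̂ = 1/x̄, confirmed here by a crude grid search:

```python
# Sketch: MLE for the rate of an exponential sample, closed form vs grid search.
import numpy as np

rng = np.random.default_rng(4)
true_lam = 2.0
x = rng.exponential(scale=1 / true_lam, size=5_000)

lam_hat_closed = 1.0 / x.mean()                    # analytic MLE

lams = np.linspace(0.1, 5.0, 2_000)
loglik = len(x) * np.log(lams) - lams * x.sum()    # log L(lambda)
lam_hat_grid = lams[np.argmax(loglik)]

print(lam_hat_closed, lam_hat_grid)                # both ~2
```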

good point estimator

unbiased: E(θ̂_n) = θ

consistent: θ̂_n → θ in probability as n → ∞

mean squared error E((θ̂_n − θ)²) is small

confidence interval: P(θ̂_n^− ≤ θ ≤ θ̂_n^+) ≥ 1 − α

⇐ find a pivot g(θ) with known distribution and P(a ≤ g(θ) ≤ b) ≥ 1 − α, then solve for θ

N, t: by symmetry let a = −b, b the upper α/2 quantile

Χ²: let a = Χ²_{1−α/2}, b = Χ²_{α/2} (not symmetric)

deal with X_i ∼ N(μ, σ²) i.i.d.:

(M_n − μ)/(σ/√n) ∼ N(0, 1), (M_n − μ)/(S_n/√n) ∼ t(n − 1), (n − 1)S_n²/σ² ∼ Χ²(n − 1)

μ = μ_0: under this hypothesis, (M_n − μ_0)/(S_n/√n) ∼ t(n − 1)
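Sketch (numpy/scipy assumed) of a 1 − α t-interval for μ with σ unknown, using (M_n − μ)/(S_n/√n) ∼ t(n − 1); the data values are simulated, not from the notes:

```python
# Sketch: 95% t-interval for the mean of a normal sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(loc=10.0, scale=3.0, size=40)   # assumed data
n, alpha = len(x), 0.05

m = x.mean()
s = x.std(ddof=1)                              # sample standard deviation
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)
half = t_crit * s / np.sqrt(n)
print(m - half, m + half)   # should cover mu = 10 in about 95% of repetitions
```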

hypothesis testing

null hypothesis H0

type I/II error: rejecting H_0 when it is true / failing to reject H_0 when it is false

level of significance α: probability of a type I error

power of the test 1 − β, β: probability of a type II error

p-value: the probability, computed assuming H_0 is true, of a test statistic at least as extreme as the one observed

reject H_0 if p-value < α

for {X_i ∼ N(μ, σ_0²)} i.i.d., use Z = (M_n − μ_0)/(σ_0/√n) ∼ N(0, 1) under H_0 (or T = (M_n − μ_0)/(S_n/√n) ∼ t(n − 1) if σ is unknown)

for two samples, Z = ((M_n^X − M_m^Y) − (μ_X − μ_Y)) / √(σ_X²/n + σ_Y²/m)
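Sketch (numpy/scipy assumed) of the one-sample z-test above with known σ_0 and simulated data:

```python
# Sketch: one-sample z-test of H0: mu = mu0 with known sigma0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
sigma0, mu0 = 2.0, 5.0
x = rng.normal(loc=5.6, scale=sigma0, size=30)   # data actually drawn with mu = 5.6

z = (x.mean() - mu0) / (sigma0 / np.sqrt(len(x)))
p_value = 2 * stats.norm.sf(abs(z))              # two-sided p-value
alpha = 0.05
print(z, p_value, "reject H0" if p_value < alpha else "do not reject H0")
```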

linear regression

linear least squares

correlation coefficient: r = S_xy / √(S_xx S_yy), with S_xy = ∑_j (x_j − x̄)(y_j − ȳ), S_xx = ∑_j (x_j − x̄)², S_yy = ∑_j (y_j − ȳ)²

standard statistical model: Y_j = β_0 + β_1 x_j + e_j, j = 1, 2, …, n

E(e_j) = 0, var(e_j) = σ²

intercept β_0, slope β_1, residual e_j

estimators: β̂_1 = S_xy / S_xx, β̂_0 = ȳ − β̂_1 x̄

E(β̂_0) = β_0, E(β̂_1) = β_1

if e_j ∼ N(0, σ²) i.i.d., then β̂_0, β̂_1 are normally distributed

sum of squared errors SSE: SSE = ∑_j (y_j − β̂_0 − β̂_1 x_j)², s² = SSE/(n − 2) estimates σ²

SSE is independent of β̂_0, β̂_1; SSE/σ² ∼ Χ²(n − 2)

1 − α confidence interval for β_0, β_1:

β̂_1 ± t_{α/2, n−2} · s/√S_xx, β̂_0 ± t_{α/2, n−2} · s √(1/n + x̄²/S_xx)

1 − α confidence interval for σ²: SSE/Χ²_{α/2, n−2} ≤ σ² ≤ SSE/Χ²_{1−α/2, n−2}

1 − α confidence interval for y_0 = E(Y_0) at x = x_0: β̂_0 + β̂_1 x_0 ± t_{α/2, n−2} · s √(1/n + (x_0 − x̄)²/S_xx)
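Sketch (numpy/scipy assumed) of the least-squares estimators and the β_0, β_1 confidence intervals above, on simulated data:

```python
# Sketch: simple linear regression on hypothetical data, plus 95% CIs for beta0, beta1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 30
x = np.linspace(0, 10, n)
y = 1.0 + 2.0 * x + rng.normal(0, 1.5, n)   # assumed model: beta0=1, beta1=2, sigma=1.5

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)
Sxy = np.sum((x - xbar) * (y - ybar))

b1 = Sxy / Sxx                              # slope estimator
b0 = ybar - b1 * xbar                       # intercept estimator
sse = np.sum((y - b0 - b1 * x) ** 2)
s = np.sqrt(sse / (n - 2))                  # estimate of sigma

t_crit = stats.t.ppf(0.975, df=n - 2)
ci_b1 = (b1 - t_crit * s / np.sqrt(Sxx), b1 + t_crit * s / np.sqrt(Sxx))
ci_b0 = (b0 - t_crit * s * np.sqrt(1 / n + xbar**2 / Sxx),
         b0 + t_crit * s * np.sqrt(1 / n + xbar**2 / Sxx))
print(b0, b1)
print(ci_b0, ci_b1)
```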