
Room5: Korea Univ/Artificial Intelligence(COSE361)

[Artificial Intelligence] 13. Probability and Statistics

1. Review of Probabilities and Statistics

  1) Elements of Probability 

    - Random experiment : a trial whose outcome is nondeterministic

    - Sample space : the set of all possible outcomes. The outcomes are mutually exclusive (no two occur together) and exhaustive (their union is the whole set).

    - Events : subsets of the sample space

    - Probability : the proportion, or relative frequency, of an outcome over repeated trials / a degree of belief

  2) Axioms of Probability Theory

    - Every probability lies between 0 and 1: 0 <= P(a) <= 1

    - The probabilities of all outcomes in the sample space sum to 1

    - P(a v b) = P(a) + P(b) - P(a^b)
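
A quick way to check the axioms, and the inclusion-exclusion rule in particular, is to count outcomes. The sketch below assumes a fair six-sided die; the events `a` and `b` are made up for illustration and do not come from the lecture.

```python
from fractions import Fraction

# Sample space of one fair six-sided die (an assumed example).
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event (a subset of the sample space) with equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

a = {2, 4, 6}   # event "the roll is even"
b = {4, 5, 6}   # event "the roll is greater than 3"

p_union = prob(a | b)                                     # P(a v b)
p_inclusion_exclusion = prob(a) + prob(b) - prob(a & b)   # P(a) + P(b) - P(a^b)

print(p_union, p_inclusion_exclusion)  # 2/3 2/3
```

Using `Fraction` keeps the arithmetic exact, so the two sides can be compared with `==` without rounding issues.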

 3) Random Variable and Probability Density

    - Random variable 

        * Domain : the set of values the variable can take

        ① Discrete : discrete random variable

        ② Continuous : continuous random variable

     - Probability density function (pdf) : for a continuous random variable

           p(x) = lim(dx->0) P(x <= X <= x+dx)/dx   (a density, not a probability: P(X = x) itself is 0)

     - Probability mass function (pmf) : for a discrete random variable

          P(X = 3) = P(3) = 1/6   (e.g. one face of a fair six-sided die)

     - Cumulative distribution function (cdf)

         F_X(x) = P(X <= x) = ∫_{-∞}^{x} p(u) du
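
For a discrete variable the integral in the cdf becomes a sum over the pmf. A minimal sketch, assuming the 1/6 above refers to a fair six-sided die (the names `pmf` and `cdf` are chosen here for illustration):

```python
from fractions import Fraction

# pmf of a fair six-sided die (assumed example): every face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """F_X(x) = P(X <= x): sum the pmf over every value not exceeding x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

print(pmf[3])   # 1/6
print(cdf(3))   # 1/2
print(cdf(6))   # 1
```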

  4) 

    - Prior probability (unconditional probability, before any evidence is observed) : P(a)

    - Posterior probability (conditional probability) : P(a|b) = P(a^b)/P(b), for P(b) > 0

    - Joint probability (probability that both occur together) : P(a^b) = P(X = a, Y = b)

    - Product rule : P(a^b) = P(a|b) P(b)

    - Marginal probability (sum over every value y of Y)

         P(X) = ∑_y P(X , y) = ∑_y P(X | y)P(y)

 

    **************P(X ^ Y) = P( X, Y ) = P(X | Y) P(Y) ******************
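
Marginalization, conditioning, and the product rule can all be read off a joint table. The sketch below uses a hypothetical joint distribution P(X, Y) over weather and temperature; the numbers are invented for illustration only.

```python
from fractions import Fraction

# Hypothetical joint distribution P(X, Y); the numbers are made up.
joint = {
    ("sunny", "warm"): Fraction(4, 10),
    ("sunny", "cold"): Fraction(2, 10),
    ("rainy", "warm"): Fraction(1, 10),
    ("rainy", "cold"): Fraction(3, 10),
}

def marginal_x(x):
    """P(X = x) = sum over y of P(X = x, Y = y)."""
    return sum((p for (xv, _), p in joint.items() if xv == x), Fraction(0))

def marginal_y(y):
    """P(Y = y) = sum over x of P(X = x, Y = y)."""
    return sum((p for (_, yv), p in joint.items() if yv == y), Fraction(0))

def conditional_x_given_y(x, y):
    """P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)."""
    return joint[(x, y)] / marginal_y(y)

print(marginal_x("sunny"))                      # 3/5
print(conditional_x_given_y("sunny", "warm"))   # 4/5
# Product rule: P(X | Y) P(Y) recovers the joint entry.
print(conditional_x_given_y("sunny", "warm") * marginal_y("warm") == joint[("sunny", "warm")])  # True
```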

 

  5) Chain Rule

    P(x1, x2, x3, ..., xn)

  = P(xn | x1, ..., xn-1) P(x1, ..., xn-1)

  = P(xn | x1, ..., xn-1) P(xn-1 | x1, ..., xn-2) P(x1, ..., xn-2)

  = P(xn | x1, ..., xn-1) P(xn-1 | x1, ..., xn-2) ... P(x2 | x1) P(x1)

  = ∏_{i=1}^{n} P(xi | x1, ..., xi-1)
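
The chain rule holds for any joint distribution, which can be checked numerically. The sketch below builds an arbitrary joint over three binary variables (the weights are made up) and confirms that the product of conditionals reproduces every joint entry.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution over three binary variables (weights are arbitrary).
weights = [1, 2, 3, 4, 5, 6, 7, 8]
total = sum(weights)
joint = {
    assignment: Fraction(w, total)
    for assignment, w in zip(product([0, 1], repeat=3), weights)
}

def prefix_prob(prefix):
    """P(x1, ..., xk): marginalize out every variable after position k."""
    k = len(prefix)
    return sum((p for a, p in joint.items() if a[:k] == prefix), Fraction(0))

def chain_rule_product(assignment):
    """Multiply P(xi | x1, ..., x(i-1)) for i = 1..n."""
    result = Fraction(1)
    for i in range(1, len(assignment) + 1):
        # P(xi | x1..x(i-1)) = P(x1..xi) / P(x1..x(i-1)); the empty prefix has probability 1.
        result *= prefix_prob(assignment[:i]) / prefix_prob(assignment[:i - 1])
    return result

print(all(chain_rule_product(a) == joint[a] for a in joint))  # True
```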

 

  6) Independence

  - Independence : P(a|b) = P(a), or equivalently P(a ^ b) = P(a)P(b)

  - Conditional independence

     : P(X, Y) != P(X)P(Y)  (X and Y are not independent on their own)

       P(X,Y|Z) = P(X|Z)P(Y|Z)  (once Z is given, X and Y no longer affect each other, i.e. they become independent)

       P(X|Y) != P(X)

       P(X| Y,Z) = P(X|Z)

     ex) X: my arm is sore

          Y: it is raining

          Z: I am holding an umbrella

         When Z is not observed, Y (rain) -> Z (umbrella) -> X (sore arm), so X and Y are dependent.

         When Z is given, the links Y (rain) -> Z (umbrella) and Z (umbrella) -> X (sore arm) decouple, so X and Y carry no further information about each other.
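
The umbrella example can be checked numerically by building the joint as P(y) P(z | y) P(x | z). All probabilities below are invented for illustration; only the chain structure Y -> Z -> X comes from the notes.

```python
from fractions import Fraction
from itertools import product

# Made-up numbers for the chain Y (rain) -> Z (umbrella) -> X (sore arm).
p_y = {True: Fraction(3, 10), False: Fraction(7, 10)}           # P(Y = y)
p_z_given_y = {True: Fraction(9, 10), False: Fraction(1, 10)}   # P(Z = true | Y = y)
p_x_given_z = {True: Fraction(1, 2), False: Fraction(1, 10)}    # P(X = true | Z = z)

def bernoulli(p_true, value):
    """Probability of `value` for a binary variable that is true with probability p_true."""
    return p_true if value else 1 - p_true

# Joint P(x, y, z) = P(y) P(z | y) P(x | z)
joint = {
    (x, y, z): p_y[y] * bernoulli(p_z_given_y[y], z) * bernoulli(p_x_given_z[z], x)
    for x, y, z in product([True, False], repeat=3)
}

def prob(x=None, y=None, z=None):
    """Sum the joint over every assignment consistent with the values that are fixed."""
    return sum(
        (p for (xv, yv, zv), p in joint.items()
         if x in (None, xv) and y in (None, yv) and z in (None, zv)),
        Fraction(0),
    )

# Without Z, X and Y are dependent: P(x, y) != P(x) P(y).
dependent = prob(x=True, y=True) != prob(x=True) * prob(y=True)

# Given Z, they are independent: P(x, y | z) = P(x | z) P(y | z) for every value.
independent_given_z = all(
    prob(x=x, y=y, z=z) / prob(z=z)
    == (prob(x=x, z=z) / prob(z=z)) * (prob(y=y, z=z) / prob(z=z))
    for x, y, z in product([True, False], repeat=3)
)

print(dependent, independent_given_z)  # True True
```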

 

  7) Expectation

   - Expectation : E(X) = ∑_i xi P(X=xi) = μx  (the sum of each value of the random variable times its probability)

   - Covariance : cov(X, Y) = E((X - μx)(Y - μy)) : the expected product of the deviations of X and Y from their means, i.e. how much the two vary together

     * Variance is the special case cov(X, X) = E[(X - μx)^2]

   - Covariance matrix Σ

      : Σij = cov(Xi, Xj) = E( (Xi - μi)(Xj - μj) )

      The (i, j) entry of the covariance matrix is the covariance of Xi and Xj
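
These definitions translate directly into sums over a joint pmf. The sketch below uses a small hypothetical joint over (X, Y), with values made up for illustration, and assembles the 2x2 covariance matrix from them.

```python
from fractions import Fraction

# Hypothetical joint pmf over (X, Y); the values are made up.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 4),
    (1, 2): Fraction(1, 4),
}

def expectation(f):
    """E[f(X, Y)] = sum of f(x, y) * P(X = x, Y = y)."""
    return sum((f(x, y) * p for (x, y), p in joint.items()), Fraction(0))

mu_x = expectation(lambda x, y: x)
mu_y = expectation(lambda x, y: y)

var_x = expectation(lambda x, y: (x - mu_x) ** 2)
var_y = expectation(lambda x, y: (y - mu_y) ** 2)
cov_xy = expectation(lambda x, y: (x - mu_x) * (y - mu_y))

# Entry (i, j) is cov(Xi, Xj); the diagonal holds the variances.
covariance_matrix = [[var_x, cov_xy],
                     [cov_xy, var_y]]

print(mu_x, mu_y)            # 1/2 1
print(var_x, var_y, cov_xy)  # 1/4 1/2 1/4
```

The matrix is symmetric because cov(X, Y) = cov(Y, X).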

 

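  8) Gaussian Distribution (normal distribution)

    * Univariate Gaussian distribution : when x is a scalar

        p(x) = 1/(σ sqrt(2π)) exp( -(x - μ)^2 / (2σ^2) )

    * Multivariate Gaussian distribution : when x is a d-dimensional vector

        p(x) = 1/( (2π)^(d/2) |Σ|^(1/2) ) exp( -(1/2) (x - μ)^T Σ^(-1) (x - μ) )

    * The mean becomes a vector μ and the variance becomes the covariance matrix Σ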
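
For the scalar case, the standard Gaussian density with mean μ and standard deviation σ is p(x) = 1/(σ sqrt(2π)) exp(-(x - μ)^2 / (2σ^2)). A minimal sketch of that formula follows; the function name `gaussian_pdf` is chosen here for illustration, and the standard library's `statistics.NormalDist` is used only as a cross-check.

```python
import math
from statistics import NormalDist

def gaussian_pdf(x, mu, sigma):
    """Density of a univariate Gaussian with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Cross-check against the standard library implementation.
ours = gaussian_pdf(1.5, mu=1.0, sigma=2.0)
reference = NormalDist(mu=1.0, sigma=2.0).pdf(1.5)
print(math.isclose(ours, reference))  # True
```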

 

  9) Central Limit Theorem

   : the distribution of the (standardized) mean of n independent, identically distributed random variables with finite variance approaches a Gaussian distribution as n goes to infinity.
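
The theorem can be illustrated with a small simulation. The sketch below assumes Uniform(0, 1) draws, a choice made here only because its mean (1/2) and variance (1/12) are known; the sample size and trial count are arbitrary.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Draw `trials` sample means, each the average of n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(trials)]

n = 50
means = sample_means(n=n, trials=2000)

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the sample mean should be
# approximately Gaussian with mean 1/2 and standard deviation sqrt(1 / (12 n)).
predicted_std = (1 / (12 * n)) ** 0.5

print(statistics.fmean(means), statistics.stdev(means), predicted_std)
```

Plotting a histogram of `means` would show the bell shape; here only the first two moments are compared against the prediction.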

 

 

 

2. Probabilistic Inference

: given observed evidence, compute the posterior probability that the query proposition is true

 

* Bayes' rule

  - P(Y | X) = P(X|Y)P(Y) / P(X)

  - P(Y | X, e) = P(X|Y,e)P(Y|e) / P(X|e)

 

Y : the cause; the query proposition we want to know

X : the effect or symptom; the evidence

 

Why? P(Y|X) = P(X,Y)/P(X)

      P(X|Y) = P(X, Y)/P(Y)

 -> P(X,Y) = P(Y|X)P(X) = P(X|Y)P(Y) 

 -> P(Y| X) = P(X|Y)P(Y) / P(X) 

 

Alternatively, start from the product rule P(X, Y) = P(X | Y)P(Y)

so that P(Y | X) = P(X, Y)/ P(X) = P(X|Y)P(Y) / P(X)
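
A worked example makes the direction of inference concrete: from P(evidence | cause) to P(cause | evidence). The numbers below are hypothetical (a disease with a 1% prior and an imperfect test) and are not taken from the lecture.

```python
from fractions import Fraction

# Made-up numbers: Y = "has the disease" (cause), X = "test is positive" (evidence).
p_y = Fraction(1, 100)               # prior P(Y)
p_x_given_y = Fraction(9, 10)        # likelihood P(X | Y)
p_x_given_not_y = Fraction(5, 100)   # false-positive rate P(X | not Y)

# P(X) by marginalizing over the cause: P(X | Y) P(Y) + P(X | not Y) P(not Y).
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

# Bayes' rule: P(Y | X) = P(X | Y) P(Y) / P(X)
p_y_given_x = p_x_given_y * p_y / p_x

print(p_y_given_x)  # 2/13
```

Even with a 90% detection rate, the posterior is only about 15% because the prior is small and false positives dominate.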

 

 

* Naive Bayes model

  P(C, E1, E2, ... , En)

  = P(C) P(E1, E2, ... , En|C)

  = P(C) ∏_i P(Ei|C)  (the Ei are conditionally independent of each other given C)

 

  P(C | E1, E2, ... , En)

  = P(E1, E2 , ... , En| C)P(C) / P(E1, E2, .. , En)  <- Bayes rule 

  = αP(C) ∏P(Ei|C)

( α = 1 / P(E1, E2, .. , En) )
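
The normalization constant α never needs to be computed from P(E1, ..., En) directly: score every class with P(C) ∏ P(Ei|C) and divide by the sum of the scores. The sketch below assumes a hypothetical two-class model with two binary features; the class names, feature names, and probabilities are all invented for illustration.

```python
from fractions import Fraction

# Hypothetical model: class C in {"spam", "ham"} with two binary evidence features.
prior = {"spam": Fraction(2, 5), "ham": Fraction(3, 5)}

# P(Ei = true | C) for each feature; all numbers are made up.
likelihood = {
    "spam": {"has_link": Fraction(4, 5), "has_greeting": Fraction(1, 5)},
    "ham":  {"has_link": Fraction(1, 5), "has_greeting": Fraction(3, 5)},
}

def posterior(evidence):
    """Return P(C | evidence) = alpha * P(C) * product over i of P(Ei | C)."""
    unnormalized = {}
    for c, p_c in prior.items():
        score = p_c
        for feature, observed in evidence.items():
            p_true = likelihood[c][feature]
            score *= p_true if observed else 1 - p_true
        unnormalized[c] = score
    alpha = 1 / sum(unnormalized.values())   # alpha = 1 / P(E1, ..., En)
    return {c: alpha * score for c, score in unnormalized.items()}

result = posterior({"has_link": True, "has_greeting": False})
print(result["spam"], result["ham"])  # 16/19 3/19
```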