Information geometry & KL divergence

MATH & ML

BlogYong 2022. 7. 3. 10:24

Idea) KL is the best, and the unique, measure: KL belongs to the Bregman divergences and also to the f-divergences.

1) Bregman divergence

prob model $p(x;\theta)=\frac{\exp(-\theta\cdot x)}{z}$ (exponential family)

$\theta$ : natural parameter

$z=\sum_x \exp(-\theta\cdot x)$ : normalizing constant, so that the probabilities sum to 1

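$\psi(\theta)=\log z(\theta)$ is the cumulant generating function (free energy): differentiating once gives the expectation, $-\frac{\partial\psi}{\partial \theta_i}=\mathbb{E}[x_i]=:\mu_i$, and differentiating twice gives the second cumulant, the covariance $\frac{\partial^2\psi}{\partial\theta_i\partial\theta_j}=\mathrm{Cov}[x_i,x_j]$.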
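The derivative relations above can be checked numerically. The following is a minimal sketch, assuming a scalar statistic $x$ on a small made-up state space; `states`, `theta` and the step `h` are illustrative choices, not values from the post.

```python
import math

def log_partition(theta, states):
    """psi(theta) = log sum_x exp(-theta * x) for a scalar statistic x."""
    return math.log(sum(math.exp(-theta * x) for x in states))

def moments(theta, states):
    """Mean and variance of x under p(x; theta) = exp(-theta * x) / z."""
    z = sum(math.exp(-theta * x) for x in states)
    probs = [math.exp(-theta * x) / z for x in states]
    mean = sum(p * x for p, x in zip(probs, states))
    var = sum(p * (x - mean) ** 2 for p, x in zip(probs, states))
    return mean, var

states = [0.0, 1.0, 2.0, 3.0]  # hypothetical state space
theta, h = 0.4, 1e-4           # hypothetical parameter and step size

# Central finite differences of psi at theta.
d1 = (log_partition(theta + h, states) - log_partition(theta - h, states)) / (2 * h)
d2 = (log_partition(theta + h, states) - 2 * log_partition(theta, states)
      + log_partition(theta - h, states)) / h ** 2

mean, var = moments(theta, states)
print(-d1, mean)  # -dpsi/dtheta compared with E[x]
print(d2, var)    # d^2psi/dtheta^2 compared with Var[x]
```

Note the minus sign on the first derivative: it comes from the $\exp(-\theta\cdot x)$ convention used here.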

What we can actually learn from data is not the $\theta_i$ but the $\mu_i$.

So we want to convert $\psi(\theta)$ into $\phi(\mu)$. (Legendre transformation)

=> $\psi(\theta)+\phi(\mu)=-\theta\cdot\mu$
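As a sanity check on this relation, the sketch below computes $\phi(\mu)=-\theta\cdot\mu-\psi(\theta)$ for a toy model and compares it with the negative entropy $\sum_x p(x)\log p(x)$, which is what the Legendre dual works out to under this sign convention. It also estimates $\frac{d\phi}{d\mu}$ by finite differences, which should come out close to $-\theta$. The state space and parameter values are assumptions made for illustration.

```python
import math

STATES = [0.0, 1.0, 2.0, 3.0]  # hypothetical state space

def dual_coordinates(theta):
    """Return (mu, phi, negative entropy) with phi = -theta * mu - psi(theta)."""
    z = sum(math.exp(-theta * x) for x in STATES)
    probs = [math.exp(-theta * x) / z for x in STATES]
    mu = sum(p * x for p, x in zip(probs, STATES))
    phi = -theta * mu - math.log(z)
    neg_entropy = sum(p * math.log(p) for p in probs)
    return mu, phi, neg_entropy

theta, h = 0.4, 1e-4  # hypothetical parameter and step size
mu, phi, neg_entropy = dual_coordinates(theta)

# Move theta slightly and see how phi changes as a function of mu.
mu_hi, phi_hi, _ = dual_coordinates(theta + h)
mu_lo, phi_lo, _ = dual_coordinates(theta - h)
dphi_dmu = (phi_hi - phi_lo) / (mu_hi - mu_lo)

print(phi, neg_entropy)   # the two values should agree
print(dphi_dmu, -theta)   # dphi/dmu should be close to -theta
```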

Space of $\theta$ : primal space / space of $\mu$ : dual space

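Now we need to define a distance (measure) between two models $p(X;\theta)$ and $p(X;\theta')$. Instead of doing so directly,

since $\psi(\theta)$ is a convex function, we define $D[\psi(\theta),\psi(\theta')]$ as its Bregman divergence.

(The Bregman divergence is the gap at $\theta$ between $\psi(\theta)$ and the tangent line given by the first-order Taylor expansion at $\theta'$: $D_{breg}[\psi(\theta),\psi(\theta')]=\psi(\theta)-\psi(\theta')-\nabla\psi(\theta')\cdot(\theta-\theta')$.)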

This divergence then turns out to be $D_{breg}[\psi(\theta),\psi(\theta')]=D_{KL}[p'||p]$, where $p=p(X;\theta)$ and $p'=p(X;\theta')$!

Likewise, $D_{breg}[\phi(\mu),\phi(\mu')]=D_{KL}[p||p']$
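Both identities can be verified on a toy model. The sketch below assumes a scalar statistic on a made-up state space and uses $\nabla\psi(\theta')=-\mu'$ and $\nabla\phi(\mu')=-\theta'$, which follow from the sign convention of the model above. The names `model`, `kl` and the parameter values are illustrative.

```python
import math

STATES = [0.0, 1.0, 2.0, 3.0]  # hypothetical state space

def model(theta):
    """Return (probs, psi, mu, phi) for p(x; theta) = exp(-theta * x) / z."""
    z = sum(math.exp(-theta * x) for x in STATES)
    probs = [math.exp(-theta * x) / z for x in STATES]
    psi = math.log(z)
    mu = sum(p * x for p, x in zip(probs, STATES))
    phi = -theta * mu - psi  # Legendre dual of psi
    return probs, psi, mu, phi

def kl(a, b):
    """D_KL[a || b] for strictly positive discrete distributions."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

theta, theta_prime = 0.4, -0.3  # hypothetical parameters
p, psi, mu, phi = model(theta)
p_prime, psi_prime, mu_prime, phi_prime = model(theta_prime)

# Bregman divergence of psi; the gradient of psi at theta' is -mu'.
breg_psi = psi - psi_prime - (-mu_prime) * (theta - theta_prime)
# Bregman divergence of phi; the gradient of phi at mu' is -theta'.
breg_phi = phi - phi_prime - (-theta_prime) * (mu - mu_prime)

print(breg_psi, kl(p_prime, p))  # should agree: D_KL[p' || p]
print(breg_phi, kl(p, p_prime))  # should agree: D_KL[p || p']
```

The two Bregman divergences swap the argument order of KL, which is the primal/dual asymmetry the text points at.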

Example)

gaussian model

$p(x;\mu,\sigma)=\exp(-\theta_1x-\theta_2x^2-\psi(\theta_1,\theta_2))=\frac{\exp(-\theta\cdot (x,x^2))}{z}$

with $\theta=(\theta_1,\theta_2)=(-\frac{\mu}{\sigma^2},\frac{1}{2\sigma^2})$

Here there are three spaces:

Space 1) $\theta_1, \theta_2$ (natural parameter space)

Space 2) $\mu,\sigma^2$

Space 3) $\mu_1=\mu$, $\mu_2=\sigma^2+\mu^2$ (dual space)
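To see where the dual coordinates come from, $\psi$ can be written in closed form for the Gaussian case. This is a sketch under one assumption not spelled out in the post: since $x$ is continuous here, $z$ is taken to be an integral over the real line instead of a sum.

```latex
\begin{aligned}
z(\theta) &= \int_{-\infty}^{\infty} \exp(-\theta_1 x - \theta_2 x^2)\,dx
           = \sqrt{\frac{\pi}{\theta_2}}\,\exp\!\left(\frac{\theta_1^2}{4\theta_2}\right),
           \qquad \theta_2 > 0, \\
\psi(\theta) &= \log z(\theta)
           = \frac{\theta_1^2}{4\theta_2} + \frac{1}{2}\log\frac{\pi}{\theta_2}
           = \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2), \\
\mu_1 &= -\frac{\partial\psi}{\partial\theta_1} = -\frac{\theta_1}{2\theta_2} = \mu, \\
\mu_2 &= -\frac{\partial\psi}{\partial\theta_2}
       = \frac{\theta_1^2}{4\theta_2^2} + \frac{1}{2\theta_2} = \mu^2 + \sigma^2 .
\end{aligned}
```

So $\mu_1=\mathbb{E}[x]$ and $\mu_2=\mathbb{E}[x^2]$, the expectations of the two sufficient statistics.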

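Two lines that are perpendicular in space 2 are not orthogonal when viewed in space 1 or space 3. (dually flat)

The measure that satisfies the Pythagorean property in this setting is exactly the KL divergence.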
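One concrete instance of the Pythagorean property, chosen here for illustration and not taken from the post, uses the family of independent (product) distributions on two binary variables. If $q$ is the product of the marginals of a joint distribution $p$, and $r$ is any other product distribution, then $D_{KL}[p||r]=D_{KL}[p||q]+D_{KL}[q||r]$. All numbers below are made up.

```python
import math

def kl(a, b):
    """D_KL[a || b] for strictly positive discrete distributions."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

def product(px, py):
    """Joint distribution of independent x and y, flattened row-major."""
    return [a * b for a in px for b in py]

# Hypothetical joint distribution over (x, y) in {0,1} x {0,1}: p[2*x + y].
p = [0.4, 0.1, 0.2, 0.3]
px = [p[0] + p[1], p[2] + p[3]]  # marginal of x
py = [p[0] + p[2], p[1] + p[3]]  # marginal of y

q = product(px, py)                  # product of the marginals of p
r = product([0.7, 0.3], [0.2, 0.8])  # an arbitrary product distribution

lhs = kl(p, r)
rhs = kl(p, q) + kl(q, r)
print(lhs, rhs)  # the two sides should agree
```

The identity holds exactly for any choice of `r` in the product family, which is the sense in which $p$, $q$, $r$ form a "right triangle" under KL.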

2) f-divergence

For sufficient statistics there is a transformation (a transform that loses none of the information in the original data $x_1,\dots,x_n$):

$T:x\rightarrow y$

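An f-divergence then satisfies $D[q(x),p(x)]=D[\tilde{q}(y),\tilde{p}(y)]$.

Here $D[q(x),p(x)]=\sum_x p(x)f\left(\frac{q(x)}{p(x)}\right)$,

where $f$ must be convex with $f(1)=0$ and $f\geq0$. (The last condition is a normalization: adding $c(u-1)$ to $f(u)$ leaves $D$ unchanged, because $\sum_x p(x)\left(\frac{q(x)}{p(x)}-1\right)=0$.)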
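The definition and the invariance property can be illustrated as follows. This sketch assumes the choice $f(u)=u\log u-u+1$, which is convex, nonnegative and zero at $u=1$, and reproduces $D_{KL}[q||p]$. The distributions are made up so that states 0 and 1 share the same ratio $q/p$: merging them loses no information for telling $q$ from $p$, while merging states 1 and 2 does.

```python
import math

def f_divergence(q, p, f):
    """D[q, p] = sum_x p(x) * f(q(x) / p(x)) for strictly positive p, q."""
    return sum(pi * f(qi / pi) for qi, pi in zip(q, p))

def f_kl(u):
    """Convex, f(1) = 0, f >= 0; generates D_KL[q || p]."""
    return u * math.log(u) - u + 1

def kl(a, b):
    """D_KL[a || b] for strictly positive discrete distributions."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))

# Hypothetical distributions on three states; q/p = 2 on states 0 and 1.
p = [0.1, 0.2, 0.7]
q = [0.2, 0.4, 0.4]

# Merge states 0 and 1 (same likelihood ratio, so nothing is lost).
p_suff = [p[0] + p[1], p[2]]
q_suff = [q[0] + q[1], q[2]]

# Merge states 1 and 2 (different ratios, so information is lost).
p_lossy = [p[0], p[1] + p[2]]
q_lossy = [q[0], q[1] + q[2]]

d_full = f_divergence(q, p, f_kl)
d_suff = f_divergence(q_suff, p_suff, f_kl)
d_lossy = f_divergence(q_lossy, p_lossy, f_kl)

print(d_full, kl(q, p))  # this f gives D_KL[q || p]
print(d_suff)            # unchanged by the information-preserving merge
print(d_lossy)           # strictly smaller after the lossy merge
```

Swapping in another convex `f` with `f(1) = 0` gives a different member of the family with the same invariance behaviour.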

(Reference book: Information Geometry and Its Applications)

3) Mirror descent

(To be added later)

License: Attribution, No Derivatives
