[Artificial Intelligence] Neural Network Basics


aajin126 2023. 3. 28. 15:18

Modeling Pavlov Using a Neuron

A function that output 1.0 only when the food input was present comes, through STDP (spike-timing-dependent plasticity), to output 1.0 even when only the bell input is present.

Terms

Activation = feature

Weight = filter / kernel

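Thresholding : a function that further processes the value computed in the previous step (in the figure above, the computed result is compared with 0.5; values below 0.5 produce an output of -1.0, and values of 0.5 or above produce 1.0).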

Mathematical Formulation of a Neuron

$O_j = \mathrm{act}_{\Theta_j}\left(\sum_{i}^{n} x_i w_{ij}\right)$

The transfer function in the figure above can be viewed as a single neuron, which is also called a "Perceptron".
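The formulation above can be sketched in a few lines of Python. This is only an illustration: the function names `threshold` and `neuron_output`, the weight values, and the `[food, bell]` input ordering are assumptions made for the example and do not come from the lecture.

```python
def threshold(value, theta=0.5):
    """Return 1.0 when value reaches theta, otherwise -1.0."""
    return 1.0 if value >= theta else -1.0


def neuron_output(inputs, weights, theta=0.5):
    """Compute O_j = act_theta(sum_i x_i * w_ij) for a single neuron j."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return threshold(weighted_sum, theta)


# Hypothetical weights: a strong "food" connection and no "bell" connection.
weights_before = [1.0, 0.0]  # [food, bell]
# Hypothetical weights after learning has strengthened the bell connection.
weights_after = [1.0, 1.0]

bell_only = [0.0, 1.0]
print(neuron_output(bell_only, weights_before))  # -1.0
print(neuron_output(bell_only, weights_after))   # 1.0
```

With the bell weight at 0.0 the weighted sum stays below the 0.5 threshold, so the neuron outputs -1.0; once that weight grows, the same bell-only input crosses the threshold.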

The Most Successful One : Gradient Descent

An optimization algorithm that minimizes the cost function.
It repeatedly moves in the direction of steepest descent, which is mathematically defined as the negative of the gradient.

$x' = x -\eta\nabla C$

$\eta$ = Learning rate

$\nabla C$ = Gradient of the cost $C$ (the slope at the current point)

If the learning rate is too small, reaching the minimum takes a long time; if it is too large, the updates oscillate or diverge and never reach the minimum. Choosing the learning rate well is therefore important.
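The effect of the learning rate can be seen in a small experiment. The sketch below assumes an illustrative one-dimensional cost $C(x) = x^2$ (gradient $2x$), which is not from the lecture; the function names and the three learning rates are likewise chosen only for the example.

```python
def gradient_descent(grad, x0, eta, steps):
    """Repeat x' = x - eta * grad(x) and return the final x."""
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)
    return x


def cost_gradient(x):
    """Gradient of the illustrative cost C(x) = x**2."""
    return 2.0 * x


# Try a very small, a moderate, and a too-large learning rate.
for eta in (0.001, 0.1, 1.1):
    x_final = gradient_descent(cost_gradient, x0=1.0, eta=eta, steps=50)
    print(f"eta={eta}: x after 50 steps = {x_final:.4g}")
```

Starting from $x = 1$, the smallest rate barely moves toward the minimum at 0 in 50 steps, the moderate rate gets very close to it, and the largest rate overshoots on every step so that the magnitude of $x$ keeps growing.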

Training a Neuron

Example :

Training a neuron is solving an optimization problem with respect to weights (Find weights that minimize the error)

$error = target - output$

$w'_{00} = w_{00} - \eta\nabla w_{00}$

$w'_{10} = w_{10} - \eta\nabla w_{10}$

Initialize the weights to small random numbers.
For each training sample $x^{(k)}$
1. Compute the output value.

Multiply each activation by its weight and sum the products to obtain the output.

2. Compute the gradients.

$\nabla w_{ij} = -x_i^{(k)}$

$\frac{\partial\, error}{\partial w_{00}} = \frac{\partial (target - output)}{\partial w_{00}} = \frac{\partial (-output)}{\partial w_{00}} = \frac{-\partial (w_{00}x_0 + w_{10}x_1)}{\partial w_{00}} = \frac{-\partial (w_{00}x_0)}{\partial w_{00}} = -x_0$

(The target is a constant, so it drops out of the expression.)

3. Update the weights.

$w'_{ij} = w_{ij} - \eta\nabla w_{ij}$

learning rate : 0.2

This process is repeated, and training is considered complete once the output equals the target.
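A single pass through the three steps can be sketched as follows. The code follows the gradient exactly as derived in these notes ($-x_i$), so the target does not appear in the update because it drops out of the derivative. The function name `train_step`, the initialization range, and the sample values are assumptions made for illustration.

```python
import random


def train_step(weights, x, eta=0.2):
    """Apply one update following the three steps in the notes."""
    # 1. Output: sum of activation * weight (no thresholding here).
    output = sum(w_i * x_i for w_i, x_i in zip(weights, x))
    # 2. Gradients: d(error)/d(w_i) = -x_i, as derived in the notes.
    gradients = [-x_i for x_i in x]
    # 3. Update: w' = w - eta * gradient.
    new_weights = [w_i - eta * g_i for w_i, g_i in zip(weights, gradients)]
    return output, new_weights


random.seed(0)  # fixed seed so the run is repeatable
weights = [random.uniform(-0.1, 0.1) for _ in range(2)]  # small random init
sample = [1.0, 0.0]  # hypothetical training sample x^(k)

output, weights_next = train_step(weights, sample)
print("output before update:", output)
print("weights after update:", weights_next)
```

Only the weight attached to a nonzero input changes: with the sample `[1.0, 0.0]`, the first weight increases by $\eta \cdot x_0 = 0.2$ and the second stays where it was.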

Matrix Notation :

Batch Computing :

Train the neuron so that the output is -1 only when both inputs are 0.

Calculate gradients (learning rate = 0.2)

Why Batch Computing?

The gradients must be computed using the whole dataset.
A single matrix computation is much faster than the equivalent series of vector or scalar computations.
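As a rough illustration of batch computing, the sketch below stacks all four input combinations into one matrix and evaluates them with a single matrix product. The weights in `W` and the helper `matmul` are hypothetical and chosen only so that the example produces the target pattern (-1 only when both inputs are 0); the 0.5 threshold is the one from the Terms section. The pure-Python helper shows the shape of the computation only; the speed advantage in practice comes from optimized matrix libraries and hardware.

```python
def matmul(a, b):
    """Multiply matrix a (m x n) by matrix b (n x p), both lists of rows."""
    return [[sum(a_ik * b_kj for a_ik, b_kj in zip(row, col))
             for col in zip(*b)]
            for row in a]


# All four input combinations stacked as one batch, one sample per row.
X = [[0.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0],
     [1.0, 1.0]]
# Targets: -1 only when both inputs are 0.
targets = [-1.0, 1.0, 1.0, 1.0]
# Hypothetical weights for a single output neuron (2 x 1 matrix).
W = [[0.6],
     [0.6]]

weighted_sums = matmul(X, W)  # one matrix product covers the whole batch
outputs = [1.0 if row[0] >= 0.5 else -1.0 for row in weighted_sums]
print(outputs)  # [-1.0, 1.0, 1.0, 1.0]
```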

Epoch and Mini-Batch

Computing the gradient over the whole dataset at once is, in practice, infeasible.
The training dataset is therefore split into smaller units called mini-batches.
One complete pass through the whole dataset is called an epoch.
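The relationship between mini-batches and epochs can be sketched as a pair of nested loops. The helper name `iterate_minibatches`, the dataset of 10 integers, and the choice to shuffle the samples each epoch are assumptions for this example; the notes only state that the dataset is split into mini-batches.

```python
import random


def iterate_minibatches(dataset, batch_size, rng):
    """Yield shuffled mini-batches that together cover the dataset once."""
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


dataset = list(range(10))  # stand-in for 10 training samples
rng = random.Random(0)
num_epochs = 2
batch_size = 4

for epoch in range(num_epochs):
    batches = list(iterate_minibatches(dataset, batch_size, rng))
    # A real training loop would compute gradients and update weights per batch.
    print(f"epoch {epoch}: batch sizes = {[len(b) for b in batches]}")
```

With 10 samples and a batch size of 4, each epoch consists of three mini-batches of sizes 4, 4, and 2, and every sample is visited exactly once per epoch.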

Hyperparameters

We need to tune the following variables :

$\eta$ the learning rate
Mini-batch size
# of epochs

These variables are called "hyperparameters".

Choosing the hyperparameters well directly affects both training speed and accuracy.

References

Ewha Womans Univ. Artificial Intelligence (2023-1. Prof. Jaehyeong Sim)
