## Section 1

For each of the problems below:

- Identify the loss function L, the design variable θ, and its dimension D.
- Derive the necessary condition for a minimum (no need to solve it).
- Derive the Hessian of the loss function.

### Question 1

**Projectile range**: On a flat surface, the distance traveled by a projectile is $R = \frac{v_0^2 \sin(2\theta)}{g}$, where $v_0$ is the launch velocity of the projectile at angle $\theta$, and $g = 9.8\ \mathrm{m/s^2}$ is the gravitational acceleration. What should the angle $\theta$ and velocity $v_0$ be to get the maximum range?

- **Identify the loss function L, design variable θ, and its dimension D**

Maximizing the range is equivalent to minimizing its negative, so

$$\mathcal L(\theta, v_0) = -\frac{v_0^2}{g}\sin(2\theta)$$

with design variables $(\theta, v_0)$ and dimension $D = 2$.

- **Derive the necessary condition for minimum. No need to solve it, though.**

$$\mathcal L'(\theta, v_0) = \begin{Bmatrix} -\frac{2v_0^2}{g}\cos(2\theta) \\ -\frac{4v_0}{g}\cos(\theta)\sin(\theta) \end{Bmatrix} = 0$$

- **Derive the Hessian of the loss function**

$$H = \mathcal L''(\theta, v_0) = \begin{Bmatrix} \frac{4v_0^2}{g}\sin(2\theta) & -\frac{4v_0}{g}\cos(2\theta) \\ -\frac{4v_0}{g}\cos(2\theta) & -\frac{2}{g}\sin(2\theta) \end{Bmatrix}$$
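The gradient above can be sanity-checked against central finite differences (a quick sketch, assuming numpy; the test point is arbitrary and not part of the assignment):

```python
import numpy as np

def loss(theta, v0, g=9.8):
    # Negative range: minimizing L maximizes R = v0^2 sin(2 theta) / g.
    return -v0**2 * np.sin(2 * theta) / g

def grad(theta, v0, g=9.8):
    # Analytic gradient from the derivation above
    # (note -4 v0 cos(theta) sin(theta) / g = -2 v0 sin(2 theta) / g).
    return np.array([-2 * v0**2 * np.cos(2 * theta) / g,
                     -2 * v0 * np.sin(2 * theta) / g])

theta, v0, h = 0.5, 20.0, 1e-6
fd = np.array([
    (loss(theta + h, v0) - loss(theta - h, v0)) / (2 * h),
    (loss(theta, v0 + h) - loss(theta, v0 - h)) / (2 * h),
])
print(np.allclose(fd, grad(theta, v0), atol=1e-4))  # True
```

Setting the first gradient component to zero gives $\cos(2\theta) = 0$, i.e. the familiar $\theta = 45°$.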

### Question 2

**Beam design**: We want to design a cantilever beam, 1 meter long, that can support a vertical load of 10 N at its free end. If the cross-section is circular, what should its diameter and the beam's length be to minimize the deflection of the beam? The maximum deflection of the beam (from structural mechanics) is given by

$$\delta = \frac{64FL^3}{3\pi ED^4}$$

where $F$ is the applied force, $L$ is the length of the beam, $E = 1$ GPa is the Young's modulus, and $D$ is the diameter of the beam. Plot for diameter between 1 cm and 5 cm.

- **Identify the loss function L, design variable θ, and its dimension D**

$$\mathcal L(D, L) = \frac{64FL^3}{3\pi ED^4}$$

The design variables are $(D, L)$, with dimension 2.

- **Derive the necessary condition for minimum. No need to solve it, though.**

$$\mathcal L'(D, L) = \begin{Bmatrix} -\frac{256FL^3}{3\pi ED^5} \\ \frac{192FL^2}{3\pi ED^4} \end{Bmatrix} = 0$$

- **Derive the Hessian of the loss function**

$$H = \mathcal L''(D, L) = \begin{Bmatrix} \frac{1280FL^3}{3\pi ED^6} & -\frac{768FL^2}{3\pi ED^5} \\ -\frac{768FL^2}{3\pi ED^5} & \frac{384FL}{3\pi ED^4} \end{Bmatrix}$$
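The requested plot of $\delta$ over diameters from 1 cm to 5 cm can be sketched as follows (a minimal numpy version, using the problem's values $F = 10$ N, $E = 1$ GPa, $L = 1$ m; the grid size is arbitrary):

```python
import numpy as np

F = 10.0       # applied force, N
E = 1e9        # Young's modulus, Pa (1 GPa)
L_beam = 1.0   # beam length, m

def deflection(D, L=L_beam):
    # Maximum deflection of a circular cantilever beam.
    return 64 * F * L**3 / (3 * np.pi * E * D**4)

# Evaluate for diameters between 1 cm and 5 cm, as the problem asks.
D = np.linspace(0.01, 0.05, 50)
delta = deflection(D)
print(delta[0], delta[-1])
# To plot: import matplotlib.pyplot as plt; plt.plot(D, delta); plt.show()
```

Since $\delta \propto D^{-4}$, the deflection decreases monotonically with diameter, which matches the sign of $\partial\mathcal L/\partial D$ above.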

## Section 2

Compute the gradient $\nabla \mathcal L(\theta)$ and Hessian $\nabla^2 \mathcal L(\theta)$ of the function $\mathcal L(\theta) = 100(\theta_2 - \theta_1^2)^2 + (1 - \theta_1)^2$, and show that $\theta^* = (1, 1)^\intercal$ is a local minimizer of this function.

For the first component of the gradient:

$$\frac{\partial \mathcal L}{\partial \theta_1} = 200(\theta_2 - \theta_1^2)(-2\theta_1) - 2(1 - \theta_1) = -400\theta_1\theta_2 + 400\theta_1^3 - 2 + 2\theta_1$$

$$\nabla \mathcal L(\theta) = \begin{Bmatrix} -400\theta_1\theta_2 + 400\theta_1^3 - 2 + 2\theta_1 \\ 200(\theta_2 - \theta_1^2) \end{Bmatrix}$$

$$\nabla^2 \mathcal L(\theta) = \begin{Bmatrix} -400\theta_2 + 1200\theta_1^2 + 2 & -400\theta_1 \\ -400\theta_1 & 200 \end{Bmatrix}$$

**Is $\theta^* = (1, 1)^\intercal$ a local minimizer?** First, the gradient at this point:

$$\nabla \mathcal L(1, 1) = \begin{Bmatrix} -400 + 400 - 2 + 2 \\ 0 \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \end{Bmatrix}$$

So $(1, 1)$ is a stationary point. What is the Hessian at this point?

$$\nabla^2 \mathcal L(1, 1) = \begin{Bmatrix} -400 + 1200 + 2 & -400 \\ -400 & 200 \end{Bmatrix} = \begin{Bmatrix} 802 & -400 \\ -400 & 200 \end{Bmatrix}$$
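As a numerical cross-check, the eigenvalues of this Hessian can be computed directly (a sketch using numpy, not part of the assignment):

```python
import numpy as np

# Hessian of the Rosenbrock function at theta* = (1, 1), as derived above.
H = np.array([[802.0, -400.0],
              [-400.0, 200.0]])

eig = np.linalg.eigvalsh(H)  # symmetric input -> real eigenvalues, ascending
print(eig)
# Both eigenvalues are positive, so H is positive definite.
```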

Is this positive definite? $\det[\nabla^2 \mathcal L(1, 1)] = 802 \cdot 200 - (-400)(-400) = 400 > 0$, and the top-left element is positive. Therefore the Hessian is positive definite, and $\theta^* = (1, 1)^\intercal$ is a local minimizer.

## Section 3

Find the stationary point of the function $\mathcal L(\theta) = 8\theta_1 + 12\theta_2 + \theta_1^2 - 2\theta_2^2$ and show that it is neither a maximum nor a minimum, but a saddle point.

$$\mathcal L'(\theta_1, \theta_2) = \begin{Bmatrix} 8 + 2\theta_1 \\ 12 - 4\theta_2 \end{Bmatrix} = 0 \implies \theta^* = (-4, 3)^\intercal$$

$$H = \mathcal L''(\theta_1, \theta_2) = \begin{Bmatrix} 2 & 0 \\ 0 & -4 \end{Bmatrix}$$

$$\det(H - \lambda I) = (2 - \lambda)(-4 - \lambda) - 0 = 0$$
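Independently of the eigenvalues, the saddle structure can be seen by sampling the loss around the stationary point $(-4, 3)$ (a quick numerical sketch; the step size is arbitrary):

```python
# Loss from Section 3; (-4, 3) solves the gradient equations above.
def L(t1, t2):
    return 8 * t1 + 12 * t2 + t1**2 - 2 * t2**2

t = 0.1
base = L(-4, 3)
print(L(-4 + t, 3) - base)  # positive: the loss rises along theta_1
print(L(-4, 3 + t) - base)  # negative: the loss falls along theta_2 -> saddle
```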

$\lambda = 2,\ \lambda = -4$. The eigenvalues are a mixture of positive and negative, therefore we have a saddle point.

## Section 4

Which of the following expressions are valid in index notation?

(a) $\alpha = a_i b_j$
(b) $\alpha = a_i b_i$
(c) $A_{ij} = a_i b_i$
(d) $A_{ij} = a_i b_j$
(e) $a_i = A_{ij} b_j$
(f) $a_j = A_{ij} b_j$
(g) $\alpha = A_{ij} b_i b_j$
(h) $\alpha = A_{ij} B_{ij}$
(i) $\alpha = A_{ii}$

Valid: b, d, e, g, h, i.

## Section 5

For the following expressions in index notation, find the derivative with respect to $\theta_k$.

(a) $\alpha = a_i \theta_i$

$$\partial\alpha/\partial\theta_k = a_i\,\partial\theta_i/\partial\theta_k = a_i\delta_{ik} = a_k$$

(b) $a_i = A_{ij}\theta_j$

$$\partial a_i/\partial\theta_k = \partial(A_{ij}\theta_j)/\partial\theta_k = A_{ij}\,\partial\theta_j/\partial\theta_k = A_{ij}\delta_{jk} = A_{ik}$$

(c) $\alpha = A_{ij}\theta_i\theta_j$

$$\partial\alpha/\partial\theta_k = \partial(A_{ij}\theta_i\theta_j)/\partial\theta_k = A_{ij}(\delta_{ik}\theta_j + \theta_i\delta_{jk}) = A_{kj}\theta_j + A_{ik}\theta_i = (A_{kj} + A_{jk})\theta_j$$

## Section 6

Let $a$ be a given $D$-dimensional vector, and $A$ a given $D \times D$ symmetric matrix. Write the following two loss functions in index notation: $\mathcal L_1(\theta) = a^\intercal\theta$ and $\mathcal L_2(\theta) = \theta^\intercal A\theta$. Next, find their gradient and Hessian.

In index notation: $\mathcal L_1(\theta) = a_i\theta_i$ and $\mathcal L_2(\theta) = \theta_i A_{ij}\theta_j$.

**Gradient of $\mathcal L_1$:**

$$\partial \mathcal L_1/\partial\theta_k = \partial(a_i\theta_i)/\partial\theta_k = a_i\,\partial\theta_i/\partial\theta_k = a_i\delta_{ik} = a_k, \quad \nabla \mathcal L_1 = a$$

**Hessian of $\mathcal L_1$:**

$$\partial^2\mathcal L_1/\partial\theta_k\partial\theta_l = \partial a_k/\partial\theta_l = 0, \quad \nabla^2\mathcal L_1 = 0$$

**Gradient of $\mathcal L_2$:** using the result of Section 5(c) and the symmetry of $A$,

$$\partial\mathcal L_2/\partial\theta_k = (A_{kj} + A_{jk})\theta_j = 2A_{kj}\theta_j, \quad \nabla\mathcal L_2 = 2A\theta$$

**Hessian of $\mathcal L_2$:**

$$\partial^2\mathcal L_2/\partial\theta_k\partial\theta_l = 2A_{kl}, \quad \nabla^2\mathcal L_2 = 2A$$
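A finite-difference check of the $\mathcal L_2$ gradient (a sketch, assuming numpy; the random matrix and point are arbitrary test data):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
A0 = rng.standard_normal((D, D))
A = (A0 + A0.T) / 2                # symmetric, as the problem assumes
theta = rng.standard_normal(D)

# L2 = theta^T A theta; claimed gradient 2 A theta, Hessian 2 A.
def L2(th):
    return th @ A @ th

h = 1e-6
fd_grad = np.array([
    (L2(theta + h * np.eye(D)[k]) - L2(theta - h * np.eye(D)[k])) / (2 * h)
    for k in range(D)
])
print(np.allclose(fd_grad, 2 * A @ theta, atol=1e-5))  # True
```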

## Section 7

Let $\mathcal L: \mathbb{R}^D \rightarrow \mathbb{R}$ be continuously differentiable, and let $g = \nabla \mathcal L(\theta)$. Show that

$$\lim_{\epsilon\rightarrow0}\frac{\mathcal L(\theta+\epsilon p)-\mathcal L(\theta)}{\epsilon} = g^\intercal p$$

By the first-order Taylor expansion,

$$\mathcal L(\theta+\epsilon p) = \mathcal L(\theta) + \epsilon\nabla \mathcal L(\theta)^\intercal p + o(\epsilon)$$

where $o(\epsilon)/\epsilon \rightarrow 0$ as $\epsilon \rightarrow 0$.

$$\therefore \lim_{\epsilon\rightarrow0}\frac{\mathcal L(\theta+\epsilon p)-\mathcal L(\theta)}{\epsilon} = \nabla \mathcal L(\theta)^\intercal p + \lim_{\epsilon\rightarrow0}\frac{o(\epsilon)}{\epsilon} = \nabla \mathcal L(\theta)^\intercal p$$
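This limit can be illustrated numerically on a sample smooth function (a hypothetical example, not part of the problem; the forward-difference error shrinks roughly linearly in $\epsilon$):

```python
import numpy as np

# Sample smooth function and its gradient (chosen for illustration only).
def L(th):
    return np.sin(th[0]) + th[1]**2

def grad(th):
    return np.array([np.cos(th[0]), 2 * th[1]])

theta = np.array([0.3, -1.2])
p = np.array([1.0, 2.0])
g_dot_p = grad(theta) @ p

for eps in [1e-2, 1e-4, 1e-6]:
    fd = (L(theta + eps * p) - L(theta)) / eps
    print(eps, abs(fd - g_dot_p))  # error shrinks as eps -> 0
```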

Since $g$ and $\nabla \mathcal L(\theta)$ are the same, this equals $g^\intercal p$.

## Section 8

Consider the function $\mathcal L(\theta_1, \theta_2) = (\theta_1 + \theta_2^2)^2$. At the point $\theta_0^\intercal = [1, 0]$, we consider the search direction $p^\intercal = [-1, 1]$. Show that $p$ is a descent direction and find all minimizers of the problem $\arg\min_{\alpha>0} \mathcal L(\theta_0 + \alpha p)$.

$p$ is a descent direction if $g^\intercal p < 0$:

$$g = \nabla \mathcal L(\theta) = \begin{Bmatrix} 2(\theta_1+\theta_2^2) \\ 4\theta_2(\theta_1+\theta_2^2) \end{Bmatrix}$$

$$\therefore \nabla \mathcal L(\theta_0) = \begin{Bmatrix} 2 \\ 0 \end{Bmatrix}$$

$$g^\intercal p = \begin{Bmatrix} 2 & 0 \end{Bmatrix}\begin{Bmatrix} -1 \\ 1 \end{Bmatrix} = -2 + 0 = -2 < 0$$

Therefore, direction $p$ is indeed a descent direction.

Next: $\mathcal L(\theta_0 + \alpha p) = \mathcal L([1,0] + [-\alpha, \alpha]) = \mathcal L(1-\alpha, \alpha) = ((1-\alpha) + \alpha^2)^2$. Critical points satisfy $\frac{d}{d\alpha}\mathcal L(\theta_0 + \alpha p) = 0$:

$$\frac{d\mathcal L}{d\alpha} = 2(2\alpha-1)(1-\alpha+\alpha^2) = 0$$

So $\alpha = \frac12$ or $\alpha = \frac{1\pm\sqrt{-3}}{2}$; the second factor has only complex roots, so $\alpha = \frac12$ is the only real critical point. Is it a minimum?

$$\frac{d^2\mathcal L}{d\alpha^2} = 2\left[2(1-\alpha+\alpha^2) + (2\alpha-1)^2\right]$$

At $\alpha = \frac12$:

$$\frac{d^2\mathcal L}{d\alpha^2}\bigg|_{\alpha=\frac12} = 2\left[2\left(1-\frac12+\frac14\right) + 0\right] = 3 > 0$$

So $\alpha = \frac12$ is the unique minimizer of $\arg\min_{\alpha>0}\mathcal L(\theta_0+\alpha p)$.
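A quick grid search over the line-search objective confirms this (a numerical sketch; the grid range and resolution are arbitrary):

```python
import numpy as np

# phi(alpha) = L(theta_0 + alpha p) = (1 - alpha + alpha^2)^2 from the derivation above.
def phi(alpha):
    return (1 - alpha + alpha**2)**2

alphas = np.linspace(0.0, 2.0, 2001)
best = alphas[np.argmin(phi(alphas))]
print(best)  # close to 0.5
```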

## Section 9

Find the condition number of the Hessian of $\mathcal L(\theta) = \theta_1^2 + 10^5(\theta_2 - 2)^2$.

$$\mathcal L'(\theta) = \begin{Bmatrix} 2\theta_1 \\ 2 \cdot 10^5(\theta_2-2) \end{Bmatrix}$$

$$\mathcal L''(\theta) = \begin{Bmatrix} 2 & 0 \\ 0 & 2 \cdot 10^5 \end{Bmatrix}$$

$$\det(\mathcal L'' - \lambda I) = \det\begin{Bmatrix} 2 - \lambda & 0 \\ 0 & 2\cdot10^5 - \lambda \end{Bmatrix} = (2-\lambda)(2\cdot10^5-\lambda) = 0$$

$\therefore \lambda = 2 \text{ or } 2 \cdot 10^5$, so the condition number is $\kappa = \frac{2\cdot10^5}{2} = 10^5$.
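The same numbers drop out of a direct eigenvalue computation (a sketch, assuming the loss is $\theta_1^2 + 10^5(\theta_2-2)^2$ as reconstructed from the gradient above):

```python
import numpy as np

# Hessian of L(theta) = theta_1^2 + 1e5 * (theta_2 - 2)^2.
H = np.diag([2.0, 2.0e5])
eig = np.linalg.eigvalsh(H)   # ascending order
cond = eig[-1] / eig[0]
print(cond)  # 1e5: a badly conditioned problem for gradient descent
```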
