Introduction
You have probably seen ropes and chains hanging between two posts. Ever wondered what kind of shape they take? A first guess would be a parabola or some higher-order polynomial, but that would be wrong. It turns out to be a \(\cosh\), scaled properly to pass through the end points and have the correct length. The rope sags into the shape that minimizes its total potential energy. Calculus of variations [1] is the study of such problems, and it forms the mathematical foundation of classical and modern physics.
This would have been just one of many posts you can find elsewhere if the title did not include “with a sliding end.” What we want to describe is a case where one end of the rope is anchored and the other end is free to slide down the post. You can think of it as being tied to a ring that can slide freely up and down. Such problems are known as variational problems with movable boundaries, and they are much richer than plain old fixed-end problems. I will set the stage by starting with the fixed-ends case and later move to the sliding-end problem. Just for entertainment, I also built a simple jig to physically verify that this is not just mathematical wizardry.
Functionals
A functional can be considered as an operation that takes in a function and returns a number. The most familiar functional is integration with fixed limits: it takes in \(f(x)\) and spits out \(\int_a^b f(x)\, dx\), which is just a number. Such integrals appear naturally in physics. For example, consider a rope hanging from two anchor points as illustrated in Fig. 1. The potential energy of the differential length \(ds\) is \(d g\, y(x) ds\), where \(d\) is the linear mass density of the rope and \(g\) is the gravitational acceleration. We can compute the total potential energy by integrating over the length to get \(v=d g \int_a^b y(x)\, ds\), where \(ds=\sqrt{dx^2+dy^2}=\sqrt{1+y'^2}dx\). This is also how \(y'\equiv\frac{dy}{dx}\) naturally shows up in such problems.
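If the word “functional” still feels abstract, here is a minimal numerical sketch of the idea in Python: a small function that takes in a curve \(y(x)\) and returns the single number \(v\). The interval, the trial curves, and the choice \(d g=1\) are illustrative and are not the values used later in the post.

import numpy as np

def potential_energy(y, a=-1.0, b=1.0, dg=1.0, n=1001):
    """Approximate v = dg * integral of y(x) * sqrt(1 + y'(x)^2) dx over [a, b]."""
    x = np.linspace(a, b, n)
    yx = y(x)
    yp = np.gradient(yx, x)                       # numerical y'(x)
    integrand = yx * np.sqrt(1.0 + yp**2)
    # trapezoidal rule: a function goes in, a single number comes out
    return dg * np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))

print(potential_energy(np.cosh))                  # a catenary-like trial curve
print(potential_energy(lambda x: 1.0 + x**2))     # a parabola, for comparison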
Variational calculus
In a generic case we will have the functional \(v\) of this form: \[\begin{eqnarray} v=\int^{x_1}_{x_0} \mathscr{L}(x,y,y') dx, \tag{1} \end{eqnarray}\] where \(\mathscr{L}\) is the function of interest [more on that later]. Let’s assume that we have a function \(y(x)\) that gives the minimum value for \(v\). If we fiddle \(y\) around the optimal function by a small amount \(\alpha \eta(x)\), i.e., \(y(x)\rightarrow y(x)+\alpha \eta(x)\), where \(\eta(x)\) is an arbitrary function and \(\alpha\) is a small number, then the change in \(v\) should vanish to first order in \(\alpha\). This is analogous to requiring that the derivative vanish at a local extremum of a function, that is: \(\frac{df(x)}{dx}\vert_{x=x^*}=0\). Rigorously speaking, we can define the following function of \(\alpha\), \[\begin{eqnarray} v(\alpha)=\int^{x_1(\alpha)}_{x_0(\alpha)} \mathscr{L}(x,y+\alpha \eta,y'+\alpha \eta') dx, \tag{2} \end{eqnarray}\] and require that \[\begin{eqnarray} \frac{d v(\alpha)}{d\alpha}\bigg \vert_{\alpha=0}=0. \tag{3} \end{eqnarray}\] How we proceed will depend on the conditions we impose on the end points \(x_0\) and \(x_1\).
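Before specializing to the rope, here is a quick numerical illustration of Eq. (3) with the simplest functional I can think of, the arc length \(v[y]=\int\sqrt{1+y'^2}\,dx\): the straight line between two fixed end points minimizes it, so \(dv/d\alpha\) should vanish at \(\alpha=0\) for any wiggle \(\eta\) that keeps the end points fixed (the case treated in the next section). The interval, end points, and choice of \(\eta\) below are arbitrary illustrative values.

import numpy as np

x = np.linspace(0.0, 1.0, 2001)
y_line = 2.0 * x                   # straight line from (0, 0) to (1, 2): the minimizer
eta = x * (1.0 - x)                # an arbitrary wiggle with eta(0) = eta(1) = 0

def v(alpha):
    """Arc length of the perturbed curve y_line + alpha * eta."""
    yp = np.gradient(y_line + alpha * eta, x)
    integrand = np.sqrt(1.0 + yp**2)
    return np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))

eps = 1e-4
print((v(eps) - v(-eps)) / (2 * eps))   # central difference: ~ 0 at alpha = 0
print(v(0.1) - v(0.0))                  # any finite wiggle only makes the curve longer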
Both ends fixed
Consider a problem where the end points are specified. This implies that we are not free to wiggle \(y\) at the end points \(x_0\) and \(x_1\), i.e., \[\begin{eqnarray} \eta(x_0)=\eta(x_1)=0. \tag{4} \end{eqnarray}\] The variation is illustrated in Fig. 2. Keeping the boundary conditions in Eq. (4) in mind, let us calculate Eq. (3): \[\begin{eqnarray} \frac{d v(\alpha)}{d\alpha}\bigg \vert_{\alpha=0}&=&\int^{x_1}_{x_0} \frac{d}{d\alpha}\mathscr{L}(x,y+\alpha \eta,y'+\alpha \eta')\bigg \vert_{\alpha=0} dx=\int^{x_1}_{x_0}\left[\frac{\partial}{\partial y}\mathscr{L}(x,y,y') \eta +\frac{\partial}{\partial y'}\mathscr{L}(x,y,y') \frac{d\eta}{dx}\right]dx\nonumber\\ &=&\int^{x_1}_{x_0}\left[\frac{\partial}{\partial y}\mathscr{L}(x,y,y') \eta +\frac{d}{dx}\left(\frac{\partial}{\partial y'}\mathscr{L}(x,y,y')\eta\right)- \frac{d}{dx}\left(\frac{\partial}{\partial y'}\mathscr{L}(x,y,y')\right)\eta\right]dx\nonumber\\ &=&\int^{x_1}_{x_0}\left[\frac{\partial\mathscr{L}}{\partial y} -\frac{d}{dx}\left(\frac{\partial\mathscr{L}}{\partial y'}\right)\right]\eta dx +\frac{\partial\mathscr{L}}{\partial y'}\eta\bigg \vert_{x_0}^{x_1}=\int^{x_1}_{x_0}\left[\frac{\partial\mathscr{L}}{\partial y} -\frac{d}{dx}\left(\frac{\partial\mathscr{L}}{\partial y'}\right)\right]\eta dx, \tag{5} \end{eqnarray}\] where the boundary terms vanish due to the constraints in Eq. (4). Since \(\eta\) is an arbitrary function, the only way to make this integral vanish is to require the following:
\[\begin{eqnarray} \frac{\partial\mathscr{L}}{\partial y} -\frac{d}{dx}\left(\frac{\partial\mathscr{L}}{\partial y'}\right)=0. \tag{6} \end{eqnarray}\] Equation (6) is known as the Euler-Lagrange equation, and the function \(\mathscr{L}\) is called the Lagrangian. In the case of the hanging rope, we have \[\begin{eqnarray} \mathscr{L}=d g\, y \sqrt{1 +y'^2}. \tag{7} \end{eqnarray}\] We can plug this into Eq. (6) and solve the resulting differential equation for \(y(x)\). However, it is easy to see that it will be a second order differential equation. It won’t be too hard to solve, but we can do better than that. The crucial observation is that \(\mathscr{L}\) has no explicit \(x\) dependence, i.e., \(\frac{\partial\mathscr{L}}{\partial x}=0\). This means the total derivative of \(\mathscr{L}\) can be written as: \[\begin{eqnarray} \frac{d\mathscr{L}}{dx}=\frac{\partial\mathscr{L}}{\partial x}+\frac{\partial\mathscr{L}}{\partial y} y'+\frac{\partial\mathscr{L}}{\partial y'} y''=\cancel{\frac{\partial\mathscr{L}}{\partial x}}+\frac{\partial\mathscr{L}}{\partial y} y'+\frac{\partial\mathscr{L}}{\partial y'} y''=\frac{\partial\mathscr{L}}{\partial y} y'+\frac{\partial\mathscr{L}}{\partial y'} y''. \tag{8} \end{eqnarray}\] We can create these terms out of Eq. (6) if we multiply it by \(y'\): \[\begin{eqnarray} \frac{\partial\mathscr{L}}{\partial y}y' -\frac{d}{dx}\left(\frac{\partial\mathscr{L}}{\partial y'}\right)y'&=&\frac{\partial\mathscr{L}}{\partial y}y' +\frac{\partial\mathscr{L}}{\partial y'}y''-\frac{d}{dx}\left(\frac{\partial\mathscr{L}}{\partial y'}y'\right)=\frac{d}{dx}\left( \mathscr{L}-\frac{\partial\mathscr{L}}{\partial y'}y'\right)=0, \tag{9} \end{eqnarray}\] where we used Eq. (8) in the last step. This means that \[\begin{eqnarray} \mathscr{L}-\frac{\partial\mathscr{L}}{\partial y'}y'=C. \tag{10} \end{eqnarray}\] This reduces the order of the differential equation from two to one! Inserting the expression for \(\mathscr{L}\) from Eq. (7) into Eq. (10), and absorbing the constant prefactor \(d g\) into \(C\), we get: \[\begin{eqnarray} y \sqrt{1 +y'^2}-\frac{y y'^2}{ \sqrt{1 +y'^2}}=\frac{y}{\sqrt{1 +y'^2}}=C. \tag{11} \end{eqnarray}\] It is best to solve the equation in a parametric form by defining \(y'=\sinh t\): \[\begin{eqnarray} y=C\sqrt{1 +y'^2}=C\sqrt{1 +\sinh^2 t}=C\cosh t. \tag{12} \end{eqnarray}\] We can extract \(x(t)\) as follows: \[\begin{eqnarray} \frac{dy}{dt}=C\sinh t=\frac{dy}{dx}\frac{dx}{dt}=\sinh t\frac{dx}{dt} \rightarrow \frac{dx}{dt}=C, \tag{13} \end{eqnarray}\] which gives \(x=C t+D\). We can eliminate \(t\) in favor of \(x\): \(t=\frac{x-D}{C}\), and put it back into \(y(t)\) to get \[\begin{eqnarray} y(x)=C \cosh \left(\frac{x-D}{C}\right). \tag{14} \end{eqnarray}\] \(C\) and \(D\) are the integration constants, and they can be fixed by requiring that \(y(x_0)=y_0\) and \(y(x_1)=y_1\), i.e., the anchor points are fixed.
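Before moving on, it is worth a quick sanity check on the algebra. The short symbolic snippet below (it uses sympy, which is not needed anywhere else in this post) verifies that the cosh curve of Eq. (14) satisfies the first-order equation (11), written in the equivalent squared form \(y^2=C^2(1+y'^2)\).

import sympy as sp

x, C, D = sp.symbols('x C D', real=True)
y = C * sp.cosh((x - D) / C)    # candidate solution, Eq. (14)
yp = sp.diff(y, x)              # y'

# this vanishes if and only if y / sqrt(1 + y'^2) = C, i.e., Eq. (11)
print(sp.simplify(y**2 - C**2 * (1 + yp**2)))   # -> 0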
If you think about this practically, you will notice a problem. You have a rope, and you can choose the anchor points as you wish. That completely fixes all the constants. How about the length of the rope, though? A longer rope will definitely have a different shape than a shorter one. There should have been another parameter in our solution so that it can be adjusted to give the correct length. That is why we have to introduce a Lagrange multiplier to address variational problems with constraints. In this case the constraint is that the solution should give the correct length: \(\int_{x_0}^{x_1} ds=L\), \(L\) being the length of the rope. In order to enforce this requirement we revise the functional to \(v-\lambda \left(\int_{x_0}^{x_1} ds -L\right)\), where \(\lambda\) is the Lagrange multiplier. The new Lagrangian can be written as1 \[\begin{eqnarray} \mathscr{L}=d g (y-\lambda) \sqrt{1 +y'^2}. \tag{15} \end{eqnarray}\] Note that we don’t have to solve the differential equation all over again, since the new term just shifts \(y\). Therefore, the final solution is simply a shifted version of the previous one: \[\begin{eqnarray} y(x)=\lambda+C \cosh \left(\frac{x-D}{C}\right). \tag{16} \end{eqnarray}\] This makes more sense now: we have a solution with \(3\) free parameters and we have \(3\) conditions [\(2\) end points and the length]. Imposing the conditions, we will get a unique solution. Let’s do that by first calculating the length: \[\begin{eqnarray} L&=&\int_{x_0}^{x_1} ds =\int_{x_0}^{x_1} \sqrt{1 +y'^2} dx=\int_{x_0}^{x_1} \sqrt{1 +\sinh^2 \left(\frac{x-D}{C}\right)} dx=\int_{x_0}^{x_1} \cosh \left(\frac{x-D}{C}\right) dx\nonumber\\ &=&C\left[ \sinh \left(\frac{x_1-D}{C}\right)-\sinh \left(\frac{x_0-D}{C}\right)\right]. \tag{17} \end{eqnarray}\] Along with the end point requirements, we have the following conditions: \[\begin{eqnarray} y(x_0)=\lambda+C \cosh \left(\frac{x_0-D}{C}\right)&\equiv& y_0\nonumber\\ y(x_1)=\lambda+C \cosh \left(\frac{x_1-D}{C}\right)&\equiv& y_1\nonumber\\ C\left[ \sinh \left(\frac{x_1-D}{C}\right)-\sinh \left(\frac{x_0-D}{C}\right)\right]&=&L. \tag{18} \end{eqnarray}\] In principle, these equations can be solved numerically, but we can simplify them a bit. Taking the difference of the first two lines gives: \[\begin{eqnarray} \frac{y_1-y_0}{C}=\cosh \left(\frac{x_1-D}{C}\right)- \cosh \left(\frac{x_0-D}{C}\right). \tag{19} \end{eqnarray}\] Take its square and subtract the square of the third line: \[\begin{eqnarray} \frac{(y_1-y_0)^2}{C^2} -\frac{L^2}{C^2}&=&\cosh^2 \left(\frac{x_1-D}{C}\right)- 2 \cosh \left(\frac{x_1-D}{C}\right)\cosh \left(\frac{x_0-D}{C}\right)+\cosh^2 \left(\frac{x_0-D}{C}\right)\nonumber\\ &&-\sinh^2 \left(\frac{x_1-D}{C}\right)+2 \sinh \left(\frac{x_1-D}{C}\right)\sinh \left(\frac{x_0-D}{C}\right)-\sinh^2 \left(\frac{x_0-D}{C}\right)\nonumber\\ &=&2\left[1-\cosh\left( \frac{x_1-x_0}{C}\right) \right], \tag{20} \end{eqnarray}\] which still needs to be solved numerically. What we accomplished by going through the algebra is that we reduced the problem from three equations with three unknowns to one equation with one unknown, \(C\). Once we solve Eq. (20) for \(C\), we can insert it back into Eq. (19) to get \(D\), and finally we can compute \(\lambda\) from either of the first two lines of Eq. (18).
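Here is a minimal numerical sketch of that recipe using scipy’s root finder. The anchor points and the rope length are made-up illustrative numbers, and the brackets handed to brentq are chosen by eye for these particular values.

import numpy as np
from scipy.optimize import brentq

x0, y0 = 0.0, 1.0   # left anchor (illustrative)
x1, y1 = 2.0, 1.5   # right anchor (illustrative)
L = 3.0             # rope length; must exceed the distance between the anchors

# Step 1: the single equation for C, Eq. (20)
def f_C(C):
    return ((y1 - y0)**2 - L**2) / C**2 - 2.0 * (1.0 - np.cosh((x1 - x0) / C))

C = brentq(f_C, 0.05, 100.0)

# Step 2: plug C into the difference equation, Eq. (19), and solve for D
def f_D(D):
    return C * (np.cosh((x1 - D) / C) - np.cosh((x0 - D) / C)) - (y1 - y0)

D = brentq(f_D, x0 - 5.0, x1 + 5.0)

# Step 3: the shift lambda follows from the left anchor point
lam = y0 - C * np.cosh((x0 - D) / C)

# sanity checks: the right anchor and the total length are reproduced
print(lam + C * np.cosh((x1 - D) / C))                        # ~ y1
print(C * (np.sinh((x1 - D) / C) - np.sinh((x0 - D) / C)))    # ~ L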
One end sliding
In many physical problems the end points might be movable. Consider a case where one end is fixed and the other one is not, as in Fig. 3. If we repeat the calculation of Eq. (5) while letting the end point \(x_1\) move, the Euler-Lagrange equation (6) must still hold in the interior, but the variation now picks up a boundary contribution that has to vanish on its own: \[\begin{eqnarray} \mathscr{L}\bigg \vert_{x_1}\delta x_1+\frac{\partial\mathscr{L}}{\partial y'}\bigg \vert_{x_1}\delta y\bigg \vert_{x_1}=0, \tag{21} \end{eqnarray}\] where \(\delta y\vert_{x_1}\) is the variation of the curve evaluated at the original end point, while \(\delta x_1\) and \(\delta y_1\) denote the displacement of the end point itself. As seen from the geometry in Fig. 4, we can relate them with the following equation \[\begin{eqnarray} \delta y\bigg \vert_{x_1}=\delta y_1- y'(x_1)\delta x_1. \tag{22} \end{eqnarray}\] Putting this back in Eq. (21) we get \[\begin{eqnarray} \left[\mathscr{L} -y'\frac{\partial\mathscr{L}}{\partial y'}\right]_{x_1}\delta x_1+\frac{\partial\mathscr{L}}{\partial y'}\bigg \vert_{x_1}\delta y_1=0. \tag{23} \end{eqnarray}\]
If \(\delta x_1\) and \(\delta y_1\) are independent, we need to set the two terms in Eq. (23) to \(0\) individually. However, in most physical problems the end point is constrained to move on a curve \(y_1=\varphi(x_1)\). In such cases we have \(\delta y_1=\varphi'(x_1)\delta x_1\), and Eq. (23) simplifies to \[\begin{eqnarray} \left[\mathscr{L} +\left(\varphi'-y'\right)\frac{\partial\mathscr{L}}{\partial y'}\right]_{x_1}\delta x_1=0, \tag{24} \end{eqnarray}\] which is known as the transversality condition. For the specific case of \(\mathscr{L}\) in Eq. (15) we get
\[\begin{eqnarray} \left[\frac{(y-\lambda)(1+\varphi' y') }{\sqrt{1 +y'^2}}\right]_{x_1}=0. \tag{25} \end{eqnarray}\] Assuming \(y-\lambda\neq0\) at the boundary, the only way to satisfy this equation is to require \(\varphi' y'\vert_{x_1}=-1\), that is, the curve \(y\) should be orthogonal to the curve \(\varphi\). This is really neat. It all boils down to this: the curve will still be a catenary, but when it hits the boundary, it has to be perpendicular to it. For example, if you have a vertical post, the chain will be parallel to the ground at the post!
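As a sanity check on this claim, here is a small numerical sketch of the simplest sliding-end setup: the left end anchored at \((x_0, y_0)\) and the right end free to slide on a vertical post at \(x=x_1\). Transversality forces \(y'(x_1)=0\), so the bottom of the cosh sits exactly at the post, i.e., \(D=x_1\), and only \(C\) and the shift \(\lambda\) remain, fixed by the rope length and the anchor point. The numbers are again illustrative.

import numpy as np
from scipy.optimize import brentq

x0, y0 = 0.0, 1.0   # fixed anchor (illustrative)
x1 = 1.0            # position of the vertical post (illustrative)
L = 1.5             # rope length; must exceed x1 - x0

# with D = x1, the length condition reduces to L = C * sinh((x1 - x0) / C)
C = brentq(lambda C: C * np.sinh((x1 - x0) / C) - L, 0.05, 100.0)
lam = y0 - C * np.cosh((x0 - x1) / C)   # shift fixed by the anchor point

x = np.linspace(x0, x1, 200)
y = lam + C * np.cosh((x - x1) / C)

print((y[-1] - y[-2]) / (x[-1] - x[-2]))  # ~ 0: the rope meets the post horizontally
print(y[-1] - y[0])                       # negative: the sliding end hangs below the anchor
print(C * np.sinh((x1 - x0) / C))         # ~ L, sanity check on the length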
Neural Networks
Why on earth would you want to solve a differential equation with neural networks (NNs)? The first answer is: because why not, if you can and you like applying NNs to everything? A more reasonable answer would be that sometimes you have a differential equation that encodes the physics of the problem, as well as empirical data you have collected, and you want to blend the two to get the best solution. This is referred to as physics-informed neural networks (PINNs) [2], [3].
In this approach, the residual of the differential equation and the deviations from the boundary and initial conditions are integrated into the loss function. During the training process, the network optimizes its parameters so that the approximate solution satisfies the differential equation, the boundary conditions, and, if available, the data, with the least amount of error. The method ensures that the final solution complies with the differential equation, which stems from the underlying physical theory, hence the name physics-informed neural network.
To be absolutely clear, for this particular problem there is no reason to use a neural network other than that it is fun. Let’s do that and solve Eq. (11); the code below takes \(C=1\), i.e., \(y'=-\sqrt{y^2-1}\).
# Solving the catenary problem with Neural Networks, just for fun
# 06/14/2021, tetraquark@gmail.com
# Requires Python 3.x and TensorFlow v2.2
# tetraquark.netlify.app/post/catenary/index.html
# Ack: Nicholas P.
import pandas as pd
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from scipy.integrate import odeint
import math

# The first-order catenary equation with C=1: y' = -sqrt(y^2 - 1)
def f(y, t):
    return -(abs(y**2 - 1))**0.5

# numerical solution for verification
def num_solve(y0, t):
    y_num = odeint(f, y0, t)
    return [t, y_num[:, 0]]

# a small fully connected network mapping x -> y(x)
def NNmodel():
    m = tf.keras.models.Sequential([
        tf.keras.layers.Dense(16, use_bias=True, activation='tanh'),
        tf.keras.layers.Dense(16, use_bias=True, activation='tanh'),
        tf.keras.layers.Dense(1, activation='linear')
    ])
    return m

# define loss: residual of the differential equation + initial condition error
@tf.function
def loss(m, x, t):
    with tf.GradientTape() as gt2:
        gt2.watch(t)
        NNsolution = m(t, training=True)
    dydx = gt2.gradient(NNsolution, t)  # derivative of the NN output wrt its input
    x0 = tf.reshape(tf.constant(0.0, dtype=tf.float32), (1, 1))  # initial point
    with tf.GradientTape() as gt:
        gt.watch(x0)
        NNsolution0 = m(x0, training=True)
    dydx0 = gt.gradient(NNsolution0, x0)
    errEquation = dydx*dydx - (NNsolution**2 - 1)  # error is written in the quadratic form to avoid sqrts
    errInitVal = NNsolution0 - y0
    Lequation = tf.reduce_sum(errEquation*errEquation)/npoints*2
    LinitVal = tf.reduce_sum(errInitVal*errInitVal)
    return Lequation + LinitVal

npoints = 100; xmax = 1
y0 = np.cosh(1)  # initial condition y(0)
y_init = np.full(npoints, 0)  # initial guess, all zeros
t_init = np.linspace(0, xmax, npoints)  # will solve this from 0 to xmax, and shift it at the end

# setup model
m = NNmodel()
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
nepochs = 200
for i in range(nepochs):
    # create a training batch
    samples = len(y_init)
    samples_index = np.random.choice(len(t_init), samples, replace=False)
    t = tf.reshape(tf.constant(t_init[samples_index], dtype=tf.float32), (samples, 1))
    x = tf.reshape(tf.constant(y_init[samples_index], dtype=tf.float32), (samples, 1))  # placeholder for data values (not used in the loss)
    for j in range(10):  # train on the batch
        # compute gradient of loss wrt NN weights
        with tf.GradientTape() as tape:
            los = loss(m, x, t)
        gradients = tape.gradient(los, m.trainable_variables)
        opt.apply_gradients(zip(gradients, m.trainable_variables))
    if (i % 10) == 0:
        print('epoch: ' + str(i) + '/' + str(nepochs) + ', loss = ' + str(los.numpy()))

x = np.linspace(0., xmax, npoints + 1)
y = m(x.reshape(-1, 1)).numpy()  # evaluate the trained NN
[x_sim, y_sim] = num_solve(y0, x)  # solve it with the ODE solver
yf = [val for sublist in y for val in sublist]
x = x - 1  # shift so that the lowest point of the curve sits at x = 0
dataset = pd.DataFrame({'x': x, 'y_nn': yf, 'y_sim': y_sim}, columns=['x', 'y_nn', 'y_sim'])
dataset.to_csv('catenary.csv', index=False)
plt.plot(x, y_sim, label='Numerical Solution')
plt.plot(x, y, linestyle='--', label='NN Solution')
plt.legend(loc="upper left")
plt.xlabel('x')
plt.ylabel('y(x)')
plt.title('Catenary Curve')
plt.show()
Figure 7 shows that it is indeed possible to solve a differential equation with neural networks.
Experimental tests
Does this work in real life? Can we confirm that ropes and chains indeed take the catenary shape? Let’s take a look at an image taken in the beautiful city of Estes Park, CO, and see what it tells us.
Image processing with Python
This is not much of a processing task: we just want to load the image and overlay a \(\cosh\) curve to see if it fits the shape of the rope.
# Fitting a cosh curve on a rope in a photo
# 06/17/2021, tetraquark@gmail.com
# tetraquark.netlify.app
from PIL import Image
import math

imgid = 'featured.jpg'
img = Image.open(imgid)
pixels = img.load()

# pixel coordinates of the two posts and of the lowest point of the rope
leftpostx = 550
leftposty = 1280
rightpostx = 3100
rightposty = 1350
midx = (leftpostx + rightpostx)/2 + 50
midy = 1720
offset = 750
catparameter = img.size[1] - midy + offset  # the C (in pixels) of the overlaid catenary

def catenary(c, x):
    return c * math.cosh(x/c)

for i in range(img.size[0]):
    for j in range(img.size[1]):
        # cyan: the catenary curve, drawn between the two posts
        catpoint = img.size[1] - catenary(catparameter, (i - midx)) + offset
        if abs(j - catpoint) < 5 and i > leftpostx and i < rightpostx:
            pixels[i, j] = (0, 255, 255)
        # red, green, blue dots mark the left post, the right post, and the chosen low point
        if ((i - leftpostx)**2 + (j - leftposty)**2)**0.5 < 20:
            pixels[i, j] = (255, 0, 0)
        if ((i - rightpostx)**2 + (j - rightposty)**2)**0.5 < 20:
            pixels[i, j] = (0, 255, 0)
        if ((i - midx)**2 + (j - midy)**2)**0.5 < 20:
            pixels[i, j] = (0, 0, 255)

img.show()
img.save('marked_' + imgid)
A test setup
How about the sliding-end case? We probably won’t find one in the wild, so I had to build a test rig. I have a pile of metal shafts that I have been pulling out of dead printers. They are well polished and have low friction. I also found a plastic cylinder that fits perfectly on the shaft and slides easily. The curve indeed hits the shaft at \(90^{\circ}\), and I find it very cool!
Build your own catenary
Since you have made it to the end of the post, I won’t let you go without having some fun. You can build your own catenary by adjusting the sliders below. The path is calculated in the back end using JavaScript.
This is coming soon, stay tuned!
We absorb the prefactor \(gd\) by redefining \(\frac{\lambda}{gd}\) as \(\lambda\).↩︎