Difference of two linearly distributed random numbers

\(\require{cancel}\) \(\require{boldsymbol}\)

A couple years ago, I was reading through the lecture notes from Arpaci-Dusseau[1] on operating systems and came across some calculations on HDD seek time. I didn’t quite agree with some of the hidden assumptions in the notes, and decided to take that on as a fun exercise with random variables. I wrote a blog post on it and shared it with one of the authors of the lecture notes. He liked my analysis so much so that he thought it should be extended and submitted for publication in a peer-reviewed conference. We just did that and we are waiting for the decision. Meanwhile, I want to isolate a portion of that blog post and discuss the calculation of probability density of the the absolute difference of two random numbers, which are themselves distributed linearly in a range.

Calculations

Consider a random variable \(R\) which is distributed linearly in a range \([r_\text{i},r_\text{o}]\):

\[\begin{eqnarray} f_R(r)\equiv \frac{2 r}{r_\text{o}^2-r_\text{i}^2}, \text{ where } r_\text{i} \leq r \leq r_\text{o}. \tag{1} \end{eqnarray}\] We take two such random variables and define their absolute difference as a new random variable: \[\begin{eqnarray} S=|R_1-R_2| \tag{2}. \end{eqnarray}\] The task is to figure out the probability density of \(S\), i.e., \(f_S(s)\). To this end, it is best to start from the cumulative distribution: \[\begin{eqnarray} F_S(s)=P(|r_1-r_2|<s)=\iint\limits_{|r_2-r_1|<s}dr_1dr_2 f_R(r_1)f_R(r_2). \tag{3} \end{eqnarray}\] We need to figure out the domain for which \(|r_1-r_2|<s\) is satisfied. It is the green shaded area in Fig. 1.
The domain of interest for integration. In the green shaded area $|r_2-r_1|<s$ is satisfied.

Figure 1: The domain of interest for integration. In the green shaded area \(|r_2-r_1|<s\) is satisfied.

Therefore, the cumulative probability function of the difference can be written as \[\begin{eqnarray} F_S(s)&=&\iint\limits_{|r_2-r_1|<s}dr_1dr_2 f_R(r_1)f_R(r_2)= \iint\limits_{\text{green}}dr_1dr_2 f_R(r_1)f_R(r_2)=1- \iint\limits_{\text{red}}dr_1dr_2 f_R(r_1)f_R(r_2)\nonumber\\ &=&1-2 \int_{r_\text{i}}^{r_\text{o}-s} dr_1 f_R(r_1)\int_{r_1+s}^{r_\text{o}}f_R(r_2) dr_2=1-2 \int_{r_\text{i}}^{r_\text{o}-s} dr_1 f_R(r_1)\left[ F_R(r_\text{o})-F_R(r_1+s)\right], \tag{4} \end{eqnarray}\] from which we can get the probability density by differentiating with respect to \(s\): \[\begin{eqnarray} f_S(s)&=&\frac{\partial}{\partial s}F_S(s)=2\int_{r_\text{i}}^{r_\text{o}-s}dr_1f_R(r_1)f_R(r_1+s)=\frac{4}{3\left(r_\text{o}^2-r_\text{i}^2\right)^2}\left[2(r_\text{o}^3-r_\text{i}^3)-3(r_\text{o}^2+r_\text{i}^2)s+s^3 \right]. \tag{5} \end{eqnarray}\]

Simulation

We first need a way of creating random numbers with the distribution in Eq. (1). Such a linear distribution is not typically available in standard programming languages, and we have to build the distribution ourselves. There are multiple ways of doing this. For example, we can take two uniform random numbers, add them up to get a triangular distribution (this simply follows from the convolution of two rectangular distributions.) We can then carve out the range we are interested by simply rejecting the instances outside. Alternatively, we can take two random variables \(X\) and \(Y\) uniformly distributed in the domain defined by \(r^2_i<x^2+y^2<r^2_o\) with the density \(\frac{1}{\pi(r^2_0-r^2_i)}\). We then define two new random variables \(R=(X^2+Y^2)^{1/2}\) and \(\Phi=\text{sign}(Y)\arccos\left(\frac{X}{(X^2+Y^2)^{1/2}}\right)\). The measure of the integral will transform with the Jacobian matrix \[\begin{eqnarray} \frac{dxdy}{\pi(r^2_0-r^2_i)}=\frac{1}{\pi(r^2_0-r^2_i)} \left|\frac{d(x,y)}{d(r,\phi)}\right|drd\phi =\frac{1}{\pi(r^2_0-r^2_i)}\left|\begin{matrix}cos\phi & -r sin\phi \\sin\phi & r \cos\phi\end{matrix}\right|drd\phi =\frac{r}{\pi(r^2_0-r^2_i)}drd\phi. \tag{6} \end{eqnarray}\] Upon integrating out \(\phi\), we pick up a factor of \(2\pi\), and get the same expression for \(f(r)\) as we had in Eq. (1).

Yet another method is to use inverse transform sampling. Consider a random variable \(U\sim \text{Unif}(0,1)\) and define a random variable \(X=F_X^{-1}(U)\). With this construction, \(X\) will have CDF as \(F_X\). Let’s give this a try. We first calculate \(F_R\) given \(f_R\) in Eq. (1):

\[\begin{eqnarray} F_R(r)\equiv\int_{-\infty}^r d\tau f_R(\tau)=\frac{ r^2-r_\text{i}^2}{r_\text{o}^2-r_\text{i}^2}, \text{ where } r_\text{i} \leq r \leq r_\text{o}, \tag{7} \end{eqnarray}\] which results in \[\begin{eqnarray} F^{-1}_R(U)=\sqrt{r_\text{i}^2+ (r_\text{o}^2-r_\text{i}^2) U}. \tag{8} \end{eqnarray}\] In summary, if we pull numbers from a uniform random variable \(U\sim \text{Unif}(0,1)\) and apply the function in Eq. (8), we will get the linear distribution. Since this is the simplest one, we will take this approach in the simulation. I will simply simulate the random numbers with \(r_\text{i}=1\) and \(r_\text{o}=2\), and overlay the distributions from the simulation with the expected ones from the model above. Figures 2 and 3 show the perfect agreement with the calculated and simulated distributions.

You can find the details of the simulation code below.
  # A code to simulate difference of two rand numbers
  #on 11/17/2023, tetraquark.netlify.app/post/difference_lin_rand/difference_lin_rand/?src=embeddedCode  
  library(plotly);library(htmlwidgets);
  tickFontSize=20;# sets all tick font for all images
  colorBarTickFontSize=15;
  titleFont <- list( #family = "Courier New",
  size = 20,color = '#1f77b4')
  
  m <- list(
    l = 120,
    r = 10,
    b = 80,
    t = 80,
    pad = 4
  )
  
  lg <- list(
       x = 0.25,
      y =1,
     # orientation = "h",
  
    font = list(
    #family = "sans-serif",
      size = 18,
      color = "black"
      )
  #  bgcolor = "#E2E2E2",
  # bordercolor = "#FFFFFF",
  #borderwidth = 2
  )
  plotW=900;plotH=400
  t1 <- list(
    family = "Times New Roman",
    color = "black")
  
  
  
  set.seed(15)
  samplesize<-50000;
  ri=1;ro=2;
  delta_rmax<-ro-ri  # set radii info
  U<-runif(samplesize);
  dsim<-data.frame("R1"=sqrt(ri^2+(ro^2-ri^2)*U))
  U<-runif(samplesize);
  dsim$R2<-sqrt(ri^2+(ro^2-ri^2)*U)
  dsim$s<-abs(dsim$R2-dsim$R1)
  
  fs<-function(s){
    fsNum=2*(ro^3-ri^3)-3*(ro^2+ri^2)*s+s^3
    fsDeNum=3* (ro^2-ri^2)^2
    retV=4*fsNum/fsDeNum
    if(s>delta_rmax | s<0){retV=0}
    return(retV)
  }
  
  fr<-function(r){
    retV=2*r/(ro^2-ri^2)
      if(r>ro | r<ri){retV=0}
    return(retV)
  }
  dth<-data.frame("R"=seq(0,ro+0.5,0.02))
  dth$'fr'<-sapply(dth$R, fr)
  
  binSize=0.6
  figTS<-plot_ly()
  figTS<-figTS %>%  add_lines( x = dth$R, y = dth$fr, line = list(width = 4,color='black'), name="Model")%>% 
    add_histogram(x = ~dsim$R1,
                 type = "bar",
                 histnorm = "probability", name='<span style="color:#1f77b4">Simulation</span>', opacity= 0.3,nbinsx=binSize, hoverinfo = "skip",yaxis = "y2",color=I("#1f77b4")) %>% 
    layout(yaxis2 = list(overlaying = "y", 
                         side = "right", 
                         showgrid = T, 
                         showticklabels = FALSE,
                         zeroline = F,hoverformat= '.2f'))
  
  
  figTS <- figTS %>%
    layout(
              title= list(text = "TeX($$\\Large{\\text{Simulation}}$$)",font = titleFont,  y = 0.78, x = 0.5, xanchor = 'center', yanchor =  'top'),
  
      bargap=0,
      xaxis = list(
        showgrid = T, 
        title = "TeX($$\\Huge{r}$$)",
        font = t1,
         tickmode = "array",
              tickfont = list(size = tickFontSize),
        hoverformat= '.2f',
        tick0 = 0.0
     
    ),
        yaxis = list(
        range = list(0,0.01+max(dth$fr)),
        title = "TeX($$\\Huge{f_R(r)}$$)",
        font = t1,
         tickmode = "array",
              tickfont = list(size = tickFontSize),
        hoverformat= '.2f',
        tick0 = 0.02,
        zeroline=T,
        zerolinecolor="lightgray"
      # ticktext=ticktext,
      # tickvals=tickvals
        )
        ,
        yaxis2 = list(
        range = list(0,.0135),
  overlaying = "y", 
               side = "right", 
               showgrid = F, 
               hoverformat= '.2f',
  zeroline=F
  
    )
    )
  figTS <- figTS %>% layout(paper_bgcolor='rgba(0,0,0,0)',paper_bgcolor='rgba(0,0,0,0)')
  
  figTS <- figTS %>% layout(autosize = F, width = plotW, height = plotH,margin=m)
  figTS <- figTS %>%layout(title="")%>% layout(showlegend = T)  %>%layout(legend=lg)
  
  
  
  figTS%>%
    config(mathjax = "cdn")
  
  ############# Seek Time Distribution Simulation
  
  #htmltools::save_html(figTS, file = "difference_lin_rand\\figTSx.html")
  
  
  
  dths<-data.frame("s"=seq(0,ro-ri,0.02))
  dths$'fs'<-sapply(dths$s, fs)
  binSize=0.6
  figTSd<-plot_ly()
  figTSd<-figTSd %>%  add_lines( x = dths$s, y = dths$fs, line = list(width = 4,color='black'), name="Model")%>% 
    add_histogram(x = ~dsim$s,
                 type = "bar",
                 histnorm = "probability", name='<span style="color:#1f77b4">Simulation</span>', opacity= 0.3,nbinsx=binSize, hoverinfo = "skip",yaxis = "y2",color=I("#1f77b4")) %>% 
    layout(yaxis2 = list(overlaying = "y", 
                         side = "right", 
                         showgrid = T, 
                         showticklabels = FALSE,
                         zeroline = F,hoverformat= '.2f'))
  
  
  
  
  figTSd <- figTSd %>%
    layout(
              title= list(text = "TeX($$\\Large{\\text{Simulation}}$$)",font = titleFont,  y = 0.78, x = 0.5, xanchor = 'center', yanchor =  'top'),
  
      bargap=0,
      xaxis = list(
        range=list(0.01,delta_rmax),
        showgrid = T, 
        title = "TeX($$\\Huge{s}$$)",
        font = t1,
         tickmode = "array",
              tickfont = list(size = tickFontSize),
        hoverformat= '.2f',
        tick0 = 0.0
     
    ),
        yaxis = list(
        range = list(0,0.01+max(dths$fs)),
        title = "TeX($$\\Huge{f_S(s)}$$)",
        font = t1,
         tickmode = "array",
              tickfont = list(size = tickFontSize),
        hoverformat= '.2f',
        tick0 = 0.02,
        zeroline=T,
        zerolinecolor="lightgray"
      # ticktext=ticktext,
      # tickvals=tickvals
        )
        ,
        yaxis2 = list(
        range = list(0,.021),
  overlaying = "y", 
               side = "right", 
               showgrid = F, 
               hoverformat= '.2f',
  zeroline=F
  
    )
    )
  figTSd <- figTSd %>% layout(paper_bgcolor='rgba(0,0,0,0)',paper_bgcolor='rgba(0,0,0,0)')
  figTSd <- figTSd %>% layout(autosize = F, width = plotW, height = plotH,margin=m)
  figTSd <- figTSd %>%layout(title="")%>% layout(showlegend = T)  %>%layout(legend=lg)
  
  figTSd%>%
    config(mathjax = "cdn")
  
  htmltools::save_html(figTSd, file = "figTSd.html")

The linear distribution created out of the uniform random variable with $r_\text{i}=1$ and $r_\text{o}=2$.

Figure 2: The linear distribution created out of the uniform random variable with \(r_\text{i}=1\) and \(r_\text{o}=2\).

Distribution of the absolute difference.

Figure 3: Distribution of the absolute difference.

[1]
R. H. Arpaci-Dusseau and A. C. Arpaci-Dusseau, Operating Systems: Three Easy Pieces, 1.00 ed. https://pages.cs.wisc.edu/~remzi/OSTEP/file-disks.pdf; Arpaci-Dusseau Books, 2018.

Related