Filter Matching

Problem Statement

Consider the set of numbers \([0,10^N-1)\) where \(N\) is the number of digits. How many of these numbers will have a certain filter pattern, i.e., for a given \(N\), how many of the numbers will contain \(01\)?

  • \(N<3\): We can immediately see that the numbers will not have any \(01\) in them .
  • \(N=3\): We can list the numbers with the required pattern: \(\{101,201,\cdots,901 \}\).
  • \(N>3\): How can we calculate the number of instances that include the pattern we are looking for?

Note that we are trying to find out whether a number contains any \(01\). If it does, no matter how many times \(01\) appears in the number, we will count it as \(1\). If we were simply calculating how many times the pattern appears, it would have been a much simpler problem. To illustrate, consider \(50101\): \(01\) appears twice in this number, however this is will count as one.

Total number of appearances

Consider \(n\) digit numbers. First of all, note that the numbers do not start with \(0\) by construction. Therefore, the leading digit, \(X\), can only be \([1-9]\), i.e., it has \(9\) combinations. We can place \(01\) anywhere in the remaining \(n-1\) digits, however, given the fact that \(01\) has two digits, we have \(n-2\) spots available for it:

\[\begin{equation} \underbrace{X \,0\,\overbrace{1\underbrace{\cdots\cdots}_\text{n-3}}^\text{n-2} }_\text{n}\tag{1} \end{equation}\]

Therefore, total number of \(01\) appearances is:

\[\begin{equation} \rm{Total\, number\,of \, appearances}\simeq 9 (n-2)10^{n-3}, \tag{2} \end{equation}\] where the factors \(9\), \(n-2\) and \(10^{n-3}\) come from total possible cases for \(X\), the possible positions of \(01\) and, possible combinations of \(n-3\) digit numbers, respectively. We used \(\simeq\) rather than \(=\) since we will need to correct for the over-counting.

Multiple appearances

Eq. (2) over counts the numbers because it assumes where ever \(01\) lands, it creates a new number. This is not necessarily correct when we already have \(01\) in the \(n-3\) digit number. Swapping \(01\)’s will not create a new number. We need to identify such cases and subtract them out. It will look as follows:

\[\begin{equation} \underbrace{X \fbox{01} \fbox{01}\underbrace{\cdots\cdots}_\text{n-5}}_\text{n} \tag{3} \end{equation}\]

we have two \(\fbox{01}\) objects, and \(n-5\) digits, i.e., \(n-3\) objects total. The total number of different combinations we can create is given by

\[\begin{equation} \rm{first}\, \rm{correction}= 9 \frac{(n-3)!}{2! (n-5)!} 10^{n-5},\tag{4} \end{equation}\] where \((n-3)!\) represents different configurations of \(n-3\) objects, \(2!\) removes the over-counting due to the fact that swapping \(\fbox{01}\)’s does not give a new combination, \((n-5)!\) removes the over-counting within \(n-5\) digits since that is already taken into account with the term \(10^{n-5}\). Subtracting this from Eq. (2), we get

\[\begin{equation} \rm{Total\, number\,of \, appearances}\simeq 9 (n-2)10^{n-3}-9 \frac{(n-3)!}{2! (n-5)!} 10^{n-5}. \end{equation}\]

This equation is almost correct. We need to check for another over-counting we did in Eq. (4), just like in the case of Eq. (2): we assumed that swapping of \(n-5\) with \(\fbox{01}\) creates new combinations. This is clearly not correct if \(n-5\) has \(\fbox{01}\)’s in it. In that case, we need to consider the following: \[\begin{equation} \underbrace{X \fbox{01} \fbox{01}\underbrace{\fbox{01}\underbrace{\cdots\cdots}_\text{n-7}}_\text{n-5}}_\text{n} \tag{5} \end{equation}\]

The number of combinations we can create is: \[\begin{equation} \rm{second}\,\rm{correction}= 9 \frac{(n-4)!}{3! (n-7)!} 10^{n-7}, \tag{6} \end{equation}\]

which is the over counting we did in Eq. (4), i.e., we need to correct the correction term by subtracting this out. \[\begin{equation} \rm{Total\, number\,of \, appearances}\simeq 9 (n-2)10^{n-3}-9 \frac{(n-3)!}{2! (n-5)!} 10^{n-5}+9 \frac{(n-4)!}{3! (n-7)!} 10^{n-7}. \tag{7} \end{equation}\]

Note that in Eq. (6), we again assumed \(n-7\) digits had no \(\fbox{01}\)’s in them, but this might be the case. We have to rinse and repeat: correct the correction to the correction. It might look like a never ending battle, however, we quickly notice the pattern of alternating corrections and shifted factorials. Therefore we can arrive at the final answer as a sum of alternating terms:

\[\begin{equation} \rm{Total\, number\,of \, appearances}=9\sum_{j=2}(-1)^{j}\frac{(n-j)!}{j!(n-2j+1)!}10^{n-2j+1}, \tag{8} \end{equation}\] where the summation will gracefully terminate for \(j>\frac{n+1}{2}\) since factorial of negative numbers is defined as \(\infty\). Note that this is the number of matches for the set of all \(n\) digit numbers. If we wanted to compute the total of all matches up to and including \(N\) digit numbers, we just sum over \(n\): \[\begin{equation} \rm{Total\, number\,of \, appearances\, up\, to \, N\, digits}=9\sum_{n=3}^N\sum_{j=2}(-1)^{j}\frac{(n-j)!}{j!(n-2j+1)!}10^{n-2j+1}, \tag{9} \end{equation}\] where the summation over \(n\) can be started from \(3\) since we know there is no match for \(n<3\).

An approximate solution

If we are just interested in an estimate of the fraction of the numbers that will match the filter pattern, we can come up with a quick estimation. We have an \(n\)- digit number, we are sliding a pattern of \(01\) over \(n-1\) digits to check if they match. The probability that we will not observe \(01\) is \(0.99\). We need to slide the pattern and keep checking for a total of \(n-2\) times. The probability that there will be no match is \(0.99^{n-2}\), which means the probability of matching is \(1-0.99^{n-2}\). This shows that as \(n\) increases, most of the files will have \(01\)’s in it.

The story of \(\pi\)

\(\pi\) is an irrational number and its digits don’t repeat themselves. Many mathematicians believe that it contains every possible finite sequences of numbers in it. How is \(\pi\) relevant to this problem? It just illustrates that if the number is long enough, it will contain any pattern you can imagine. Do you want to find where your name appears in \(\pi\), convert your name to ASCII here, and search for it in \(\pi\) at this site.

Results

Table 1: the table of computed numbers.
digitsCountCount.cumul.PCT.cumul.PCT.appr.firstsecondthirdfourth
100000000
200000000
3990.919000
41801891.891.99180000
5269128802.882.972700-900
635730386103.863.9436000-27000
74446094832194.834.9450000-540090
8531036057935795.795.855400000-900003600
961658991674525706.756.796.3e+07-13500009000-9
107012795507687321207.697.737.2e+08-18900000180000-450
11785113650986198686298.628.658.1e+09-2.52e+083150000-13500
128.681e+109.543e+109.549.569e+10-3.24e+0950400000-315000