\documentclass[12pt,reqno]{article}
\usepackage[usenames]{color}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{amscd}
\usepackage[colorlinks=true,
linkcolor=webgreen,
filecolor=webbrown,
citecolor=webgreen]{hyperref}
\definecolor{webgreen}{rgb}{0,.5,0}
\definecolor{webbrown}{rgb}{.6,0,0}
\usepackage{color}
\usepackage{fullpage}
\usepackage{float}
\usepackage{psfig}
\usepackage{graphics,amsmath,amssymb}
\usepackage{amsthm}
\usepackage{amsfonts}
\usepackage{latexsym}
\usepackage{epsf}
\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{.1in}
\setlength{\evensidemargin}{.1in}
\setlength{\topmargin}{-.5in}
\setlength{\textheight}{8.9in}
\newcommand{\seqnum}[1]{\href{http://oeis.org/#1}{\underline{#1}}}
\begin{document}
\begin{center}
\epsfxsize=4in
\leavevmode\epsffile{logo129.eps}
\end{center}
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{conjecture}[theorem]{Conjecture}
\theoremstyle{remark}
\newtheorem{remark}[theorem]{Remark}
\begin{center}
\vskip 1cm{\LARGE\bf
The $3n+1$ Problem: \\
\vskip .12in
A Probabilistic Approach
}
\vskip 1cm
\large
Darrell Cox \\
\href{mailto:dlcox@graysoncable.com}{\tt dlcox@graysoncable.com} \\
\end{center}
\vskip .2 in
\begin{abstract}
The Poisson probability distribution is used to model the number of cycles
in the generalized Collatz problem.
First, \emph{interrelated cycles} are defined and used as a criterion
in counting the cycles for a given $q$ value. Initially, archived data in the
mathematical literature (giving the known $3n+q$ cycles) is analyzed. For
large samples, the Poisson probability distribution gives a poor fit of the
data (there are too many cycles for the large $x$ values). \emph{Associated cycles}
are defined and used as an additional criterion in counting cycles; this
improves the data fit substantially. Some theory and empirical results are
given in an attempt to explain the origin of this distribution of cycle
counts. Degrees of freedom in probability distributions involving the
difference between the number of odd and even elements in a cycle are
shown to be a partial explanation for the distribution of cycle counts.
\emph{($L$, $K$) trees} (generalized associated associated cycles) are defined and
used to account for the smallest difference between the number of odd
and even elements in the cycles for a given $q$ value. The article consists
entirely of analysis of empirical results; no proofs are given.
\end{abstract}
\section{Introduction}
Let $n$ be a natural number. If $n$ is odd, let the next number in a sequence be
$3n+1$, otherwise let the next number be $n/2$. The Collatz conjecture states
that starting with any initial $n$ value, the sequence always ends in the cycle
(4, 2, 1). A standard technique in mathematics is to study generalizations of
apparently intractable problems. Let $d$ be an odd natural number not divisible
by 3. In the following, the $3n+d$ sequence is considered. As is standard in
the mathematical literature, the element after an odd element $n$ is defined to
be $(3n+d)/2$. Let $K$ be the number of odd elements in a $3n+d$ cycle and $L$
the number of even elements. For a given $d$ value, cycles having the same length
($K+L$) and the same number of odd elements will be said to be interrelated.
When counting cycles for a given $d$ value, only one of the interrelated cycles
will be included. In this case, a Poisson probability distribution ($P(x)=
(\lambda^{x}/x!)e^{-\lambda}$, $x=0$, 1, 2, \ldots) where $\lambda=1$ can be used to model the
number
of $3n+d$ cycles for a given $d$ value. (For this parameter value, the $x=1$
count should equal the $x=0$ count, the $x=2$ count should equal half the $x=1$
count, the $x=3$ count should equal a third of the $x=2$ count, etc.) Belaga
and Mignotte~\cite{bm} have tabulated the known primitive $3n+d$ cycles for $d$ less
than 20000 (a cycle is said to be primitive if its elements are not a common
multiple of the elements of another cycle). They state that they believe
these are the only such cycles that exist. For $d$ less than 100, the Poisson
probability distribution models the number of cycles quite well; there are 14
$d$ values where there is only 1 cycle, there are 12 $d$ values where there are
exactly 2 cycles, there are 6 $d$ values where there are exactly 3 cycles, and
there is 1 $d$ value where there are exactly 4 cycles (again, taking interrelated
cycles into account). (They also include $3n-1$ cycles in their table. Here,
these cycles are counted with those for $d=1$.) Note that a cycle count of 1
corresponds to $x=0$, a cycle count of 2 corresponds to $x=1$, etc.
In the remainder of this article, larger samples are considered and $n$ is
allowed to be an integer (positive or negative). (Consequently, the cycles up
to 10000 have been recomputed by the author~\cite[``test23h'' in the software
section]{dc}. There are new cycles that do not appear in Belaga and Mignotte's
table.) There does not appear to be any mention of Poisson probability
distributions in Lagarias'~\cite{jla, jlb} annotated bibliographies of the $3x+1$ problem.
(Lagarias~\cite{jlc} also gives the history of the $3x+1$ problem and surveys the
literature concerning the problem. By way of a collection of papers, Lagarias~\cite{jld}
reports on what is currently known on the $3x+1$ problem.) Davison~\cite{ld}
presented a probabilistic argument concerning the number of circuits required
to iterate from a number $n$ to 1; this appears to be the only probabilistic
argument concerning potential cycles (or cycles) in the literature.
\section{Interrelated Cycles}
A parity vector gives the order of even and odd elements in a $3n+d$ sequence,
a ``1'' for an odd element and a ``0'' for an even element. Let $k_{0}$, $k_{1}$, $k_{2}$, \dots,
$k_{K-1}$ denote the positions of the 1's in a parity vector containing $K$ 1's and
$L$ 0's. Let
$$Z := 3^{K-1}2^{k_{0}}+3^{K-2}2^{k_{1}}+3^{K-3}2^{k_{2}}+\cdots +3^{0}2^{k_{K-1}}.$$
A cycle
exists if and only if $dZ/(2^{K+L}-3^{K})$ is an integer (see Bohm and Sontacchi~\cite{bs}).
Note that every parity vector containing at least one 1 corresponds to a cycle
for some $d$ value (duplicated parity sub-vectors correspond to duplicated sub-
cycles). For example, for the $d=17$ cycle of (2, 1, 10, 5, 16, 8, 4), $k_{0}=1$ and
$k_{1}=3$, $Z=3^{1}2^{1}+3^{0}2^{3}=14$, $119=2^{7}-3^{2}$,
and $(17\cdot14)/119$ is an integer.
There do not appear to be any factors of $d$ left over after the division by
$2^{K+L}-3^{K}$. This has been confirmed experimentally (for the 7429 distinct
($d$, $L$, $K$) values of the $3n+d$ cycles for $d<10000$):
\begin{conjecture}
A $3n+d$ cycle exists only if $d$ divides $2^{K+L}-3^{K}$.
\end{conjecture}
If this conjecture holds, $Z$ need not be considered when searching for
cycles; only the factors of $2^{K+L}-3^{K}$ need be considered. When $d=|2^{K+L}-3^{K}|$,
an arbitrarily constructed parity vector with a length of $K+L$ and containing
$K$ 1's corresponds to a cycle (or possibly duplicated sub-cycles if $L$ and
$K$ are not relatively prime), but the cycle is not guaranteed to be primitive.
When reduced, such a cycle corresponds to a cycle for a $d$ value that is a
proper divisor of $2^{K+L}-3^{K}$. In this sense, there is no problem of finding
cycles; they're all well-defined and determined by the parity vectors. Even the
number of interrelated cycles is determined by the combinatorics of generating
parity vectors that are distinct under rotation. For example, for $L$ and $K$
greater than or equal to 1 and less than or equal to 10, the numbers of distinct
parity vectors are as follows:
\begin{center}
\begin{tabular}{ccccccccccc}
& $K=1$ & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\
& & & & & & & & & & \\
$L=1$ & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
2 & 1 & 2 & 2 & 3 & 3 & 4 & 4 & 5 & 5 & 6 \\
3 & 1 & 2 & 4 & 5 & 7 & 10 & 12 & 15 & 19 & 22 \\
4 & 1 & 3 & 5 & 10 & 14 & 22 & 30 & 43 & 55 & 73 \\
5 & 1 & 3 & 7 & 14 & 26 & 42 & 66 & 99 & 143 & 201 \\
6 & 1 & 4 & 10 & 22 & 42 & 80 & 132 & 217 & 335 & 504 \\
7 & 1 & 4 & 12 & 30 & 66 & 132 & 246 & 429 & 715 & 1144 \\
8 & 1 & 5 & 15 & 43 & 99 & 217 & 429 & 810 & 1430 & 2438 \\
9 & 1 & 5 & 19 & 55 & 143 & 335 & 715 & 1430 & 2704 & 4862 \\
10 & 1 & 6 & 22 & 73 & 201 & 504 & 1144 & 2438 & 4862 & 9252 \\
\end{tabular}
\end{center}
There is, however, the problem of determining which $d$ values the primitive
cycles map to. For example, for $K+L=1$ and $K=1$, the parity vector is (1)
(corresponding to the cycle ($-1$) for $d=1$), for $K+L=2$ and $K=1$, the
parity vector is (0, 1) (corresponding to the cycle (2, 1) for $d=1$), and for
$K+L=3$ and $K=2$, the parity vector is (1, 1, 0) (corresponding to the cycle
($-5$, $-7$, $-10$) for $d$=1). For $K+L=11$ and $K=7$, there are 30 distinct
parity vectors, the first few of which are (0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0),
(1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 0), (1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0), \ldots
(corresponding to cycles for $d=139$). The cycle corresponding to the first
parity vector is not primitive (when $d=139$) and corresponds to the remaining
known $d=1$ cycle of ($-34$, $-17$, $-25$, $-37$, $-55$, $-82$, $-41$, $-61$, $-91$, $-136$,
$-68$). Another example factorization will be given. For ($K+L$, $K$)=(8, 4),
there are 10 distinct parity vectors (corresponding to cycles for $d=175$). The
parity vector (1, 1, 0, 0, 1, 1, 0, 0) corresponds to two duplicated primitive
cycles for $d=7$, the parity vector (0, 1, 0, 1, 0, 1, 0, 1) corresponds to four
duplicated primitive cycles for $d=1$, the parity vector (1, 0, 1, 1, 0, 1, 0, 0)
corresponds to a primitive cycle for $d=25$, the parity vectors (0, 0, 1, 1, 1, 1,
0, 0) and (0, 1, 1, 0, 1, 1, 0, 0) correspond to primitive cycles for $d=35$, and
the parity vectors (1, 0, 0, 1, 1, 1, 0, 0), (0, 1, 0, 1, 1, 1, 0, 0), (1, 0, 1, 0,
1, 1, 0, 0), (1, 1, 0, 1, 0, 1, 0, 0), and (0, 1, 1, 1, 0, 1, 0, 0) correspond to
primitive cycles for $d=175$. Although 5 also divides 175, there are no cycles
for this $d$ value when ($K+L$, $K$)=(8, 4) (the parity vectors have been used up
by the cycles for larger $d$ values and $d=1$).
Apparently, every $d$ value is covered in this mapping. A more appropriate
question to ask than why cycles do not exist for a given $d$ value is why they
do exist and what the expected number of cycles is. For the 1667 $d$ values
less than or equal to 4999, the number of $d$ values having 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, and 11 cycles (counting only one of interrelated cycles) are 535,
607, 310, 115, 57, 22, 9, 5, 4, 2, and 1 respectively. This distribution has
a mean of 1.235. For the 3333 $d$ values less than or equal to 9997, the
number of $d$ values having 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, and
15 cycles (counting only one of interrelated cycles) are 1117, 1193, 591, 228,
104, 48, 25, 11, 5, 5, 2, 2, 0, 1, and 1 respectively. This distribution has
a mean of 1.229. This resembles a Poisson probability distribution where
$\lambda=1$, but the counts for the larger $x$ values are too big.
\section{Associated Cycles}
This raises the question of whether there are other ways for cycles to be
interrelated so that the data would more closely fit a Poisson probability
distribution. There are frequently ($L$, $K$) values of the cycles for a given $d$
value that are multiples of other ($L$, $K$) values. For example, for $d=269$, the
($L$, $K$) values of the cycles are (4, 5), (8, 10), (12, 15), and (16, 20). Also,
there are frequently ($L$, $K$) values of the cycles for a given $d$ value that are
in arithmetic progression (and are not multiples of other ($L$, $K$) values). For
example, for $d=1843$, the sorted ($L$, $K$) values (by increasing $L$ values) of
the cycles are (6, 11), (12, 22), (18, 9), (24, 20), (30, 31), (36, 42), (42, 53),
(48, 64), (54, 51), and (90, 117) ((18, 9), (24, 20), (30, 31), (36, 42), (42, 53),
and (48, 64) are in arithmetic progression with an increment of (6, 11)).
Empirical evidence supports the following
\begin{conjecture}
For a given $d$ value, there are cycles with ($L$, $K$) values that
are in arithmetic progression (excluding arithmetic progressions consisting of
multiples of an ($L$, $K$) value) only
if one of the $|L-K|$ values is relatively
small.
\end{conjecture}
The term ``relatively small'' will be defined in more detail in the next section.
The ($L$, $K$) values of just a pair of cycles are considered to be in arithmetic
progression if the difference between them is the ($L$, $K$) value of another cycle.
Let ($L_{1}$, $K_{1}$), ($L_{2}$, $K_{2}$), ($L_{3}$, $K_{3}$), \ldots, denote the sorted ($L$, $K$) values.
Cycles for a given $d$ value with ($L$, $K$) values that are in arithmetic progression
with an increment of ($L_{1}$, $K_{1}$) (and aren't multiples of ($L_{1}$, $K_{1}$)) will be said
to be associated with each other. (Increments of ($L_{2}$, $K_{2}$), ($L_{3}$, $K_{3}$), etc. will
be considered later.) In the case of $d=1843$, the cycles with ($L$, $K$) values of
(24, 20), (30, 31), (36, 42), (42, 53), and (48, 64) will not be counted when
computing the frequency distribution of cycle counts. For $d=4879$, the sorted
($L$, $K$) values of the cycles are (11, 14), (22, 28), (25, 10), (33, 42), (36, 24),
(44, 56), (58, 52), (66, 84), (69, 66), (77, 98), and (80, 80). Note that the
($L$, $K$) values that are in arithmetic progression need not be contiguous;
(25, 10) and (36, 24) are in arithmetic progression and (58, 52), (69, 66),
and (80, 80) are in arithmetic progression (with an increment of (11, 14)).
In this case, the cycles with ($L$, $K$) values of (36, 24), (69, 66), and (80, 80) will
not be counted. ($2^{K_{1}+L_{1}}-3^{K_{1}}$ is frequently negative, in which case the
corresponding cycles do not appear in Belaga and Mignotte's table.)
Let ($L_{b}$, $K_{b}$) denote the first ($L$, $K$) value in the sorted ($L$, $K$) values
for which the arithmetic progression(s) begins. Of the 3333 $d$ values less than
or equal to 9997, associated cycles occur for 328 $d$ values and $L_{1}>K_{1}$ and
$L_{b}K_{b}$ in all but 20 instances. For 17 of these
$d$ values, there are only two ($L$, $K$) values that are in arithmetic progression
(with an increment of ($L_{1}$, $K_{1}$)). For example, for $d=1679$, the sorted
($L$, $K$) values are (20, 28), (37, 32), (97, 116), and (117, 144) ((97, 116) and
(117, 144) are in arithmetic progression with an increment of (20, 28)).
For the remaining 3 $d$ values, there are two pairs of ($L$, $K$) values each that
are in arithmetic progression. When $d=2305$, the sorted ($L$, $K$) values are
(30, 17), (40, 38), (50, 59), (70, 55), and (80, 76). The pairs of ($L$, $K$) values
that are in arithmetic progression are ((40, 38), (70,55)) and ((50, 59), (80, 76))
(note that 2$\cdot$(40, 38)=(80, 76)). In this case, $30>17$ ((30, 17) equals ($L_{1}$, $K_{1}$))
and $50<59$ ((50, 59) is the ($L$, $K$) value starting the second arithmetic
progression). When $d=5113$, the sorted ($L$, $K$) values are (39, 27), (42, 40),
(45, 53), (48, 66), (81, 67), (84, 80), and (180, 212). The pairs of ($L$, $K$)
values that are in arithmetic progression are ((42, 40), (81, 67)) and ((45, 53),
(84, 80)) (note that 2$\cdot$(42, 40)=(84, 80)). In this case, $39>27$ ((39, 27) equals
($L_{1}$, $K_{1}$)) and $45<53$ ((45, 53) is the ($L$, $K$) value starting the second
arithmetic progression). When $d=6989$, the sorted ($L$, $K$) values are (32, 50),
(52, 55), (72, 60), (84, 105), (104, 110), (124, 115), and (188, 215). The pairs
of ($L$, $K$) values that are in arithmetic progression are ((52, 55), (84, 105))
and ((72, 60), (104, 110)) (note that 2$\cdot$(52, 55)=(104, 110)). In this case,
$32<50$ ((32, 50) equals ($L_{1}$, $K_{1}$)) and $72>60$ ((72, 60) is the ($L$, $K$)
value starting the second arithmetic progression).
Even including the 20 $d$ values where $L_{1}>K_{1}$ and $L_{b}>K_{b}$ or $L_{1}