\documentclass[12pt,reqno]{article}
\usepackage[usenames]{color}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{amscd}
\usepackage[colorlinks=true,
linkcolor=webgreen,
filecolor=webbrown,
citecolor=webgreen]{hyperref}
\definecolor{webgreen}{rgb}{0,.5,0}
\definecolor{webbrown}{rgb}{.6,0,0}
\usepackage{color}
\usepackage{fullpage}
\usepackage{float}
\usepackage{psfig}
\usepackage{graphics,amsmath,amssymb}
\usepackage{amsthm}
\usepackage{amsfonts}
\usepackage{latexsym}
\usepackage{epsf}
\setlength{\textwidth}{6.5in}
\setlength{\oddsidemargin}{.1in}
\setlength{\evensidemargin}{.1in}
\setlength{\topmargin}{-.5in}
\setlength{\textheight}{8.9in}
\newcommand{\seqnum}[1]{\href{http://oeis.org/#1}{\underline{#1}}}
\DeclareMathOperator{\sgn}{sgn}
\begin{document}
\begin{center}
\epsfxsize=4in
\leavevmode\epsffile{logo129.eps}
\end{center}
\theoremstyle{plain}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{proposition}[theorem]{Proposition}
\theoremstyle{definition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{conjecture}[theorem]{Conjecture}
\theoremstyle{remark}
\newtheorem{remark}[theorem]{Remark}
\begin{center}
\vskip 1cm{\LARGE\bf
Clustering Words and Interval Exchanges
}
\vskip 1cm
\large
S\'ebastien Ferenczi \\
IMPA\\
CNRS --- UMI 2924\\
Estrada Dona Castorina 110 \\
Rio de Janeiro, RJ 22460-32\\
Brazil \\
\href{mailto:ferenczi_sebastien@yahoo.fr}{\tt ferenczi\_sebastien@yahoo.fr}\\
\ \\
Luca Q. Zamboni \\
Institut Camille Jordan\\
Universit\'e Claude Bernard Lyon 1\\
43 boulevard du 11 novembre 1918\\
F-69622 Villeurbanne Cedex \\
France \\
and \\
Department of Mathematics \\
and Turku Centre for Computer Science \\
University of Turku \\
20014 Turku \\
Finland \\
\href{mailto:zamboni@math.univ-lyon1.fr}{\tt zamboni@math.univ-lyon1.fr}
\end{center}
\vskip .2 in
\newcommand{\GN}{{\mathbb N}}
\newcommand{\GR}{{\mathbb R}}
\newcommand{\GZ}{{\mathbb Z}}
\newcommand{\GT}{{\mathbb T}}
\def \ind{_{n \in {\mbox{\rm {\scriptsize I$\!$N}}}}}
\newcommand{\ab}{|}
\begin{abstract}
We characterize words which cluster under the Burrows-Wheeler transform as those words $w$ such that $ww$ occurs in a trajectory of an interval exchange transformation, and build examples of clustering words.
\end{abstract}
\section{Introduction}
In 1994 Michael Burrows and David Wheeler \cite{bw} introduced a
transformation on words which proved very powerful in data
compression. The aim of the present note is to characterize those
words which cluster under the Burrows-Wheeler transform, that is to
say, those words that are transformed into expressions such as
$4^a3^b2^c1^d$ or $2^a5^b3^c1^d4^e.$ Clustering words on a binary
alphabet have already been extensively studied (see, for instance,
\cite{jz,rr4}) and identified as particular factors of the Sturmian
words. Some generalizations and partial characterizations to $r$
letters appear in Restivo and Rosone \cite{rr3}, but it had not yet
been observed that clustering words are intrinsically related to a
dynamical object called {\em interval exchange transformations}
introduced in Oseledets \cite{ose}: we shall define them in Definitions
\ref{d1} and \ref{d2} below, and refer the reader to \cite{v} which
constitutes a classical course on general interval exchange
transformations and contains many of the technical terms found in
Section \ref{s3} below. This link comes essentially from the fact
that the array of conjugates used to define the Burrows-Wheeler
transform gives rise to a {\it discrete} interval exchange
transformation sending its first column to its last column. It turns
out that the converse is also true: interval exchange transformations
generate clustering words. Indeed we prove that clustering words are
exactly those words $w$ such that $ww$ occurs in a trajectory of an
interval exchange transformation. On a binary letter alphabet, this
condition amounts to saying that $ww$ is a factor of an infinite
Sturmian word. We end the paper by some examples and questions on how
to generate clustering words.
This paper began during a workshop on board Via Rail Canada train
number 2. We are grateful to Laboratoire International
Franco-Qu\'eb\'ecois de Recherche en Combinatoire (LIRCO) for funding
and Via for providing optimal working conditions. The second author is
partially supported by a FiDiPro grant from the Academy of Finland.
The authors owe much to Jean-Paul Allouche, the first mathematician who
revealed to both of them the beauty of combinatorics on words, and
who taught the first author that not every paper needs to begin with ``Let
$(X,T,\mu)$ be a system ...''.
\section{Definitions}
Let $A=\{a_1