\documentclass[11pt]{article}
%\usepackage{a4}
\hoffset -0.25in
\advance\textwidth by 0.5in
\newtheorem{thm}{Theorem}
\newtheorem{prop}{Proposition}
\newtheorem{cor}{Corollary}
\newtheorem{lem}{Lemma}
\pagestyle{myheadings}
\markright{\sc the electronic journal of combinatorics 2 (1995), \#R24\hfill}
\thispagestyle{empty}
\def\Ins{{\itshape Insert}}
\def\Del{{\itshape Delete}}
\def\DelMin{{\itshape Delete-Minimum}}
\date{Submitted: February 14 1995; Accepted November 10 1995.}
\author{
M.D. Atkinson\\
S.A. Linton\\
L.A. Walker\\
\\
Department of Mathematical and Computational Sciences\\
University of St. Andrews\\ North Haugh,\\ St. Andrews KY16 9SS,
Scotland\\
\texttt{\{sal,mda,louise\}@dcs.st-and.ac.uk}}
\title{Priority Queues and Multisets}
\begin{document}
\maketitle
\abstract{ A priority queue, a container data structure equipped with the operations
insert and delete-minimum, can re-order its input in various ways,
depending both on the input and on the sequence of operations used. If a
given input $\sigma$ can produce a particular output $\tau$ then
$(\sigma,\tau)$ is said to be an {\em allowable pair}. It is shown that
allowable pairs on a fixed multiset are in one-to-one correspondence with
certain k-way trees and, consequently, the allowable pairs can be
enumerated. Algorithms are presented for determining the number of
allowable pairs with a fixed input component, or with a fixed output
component. Finally, generating functions are used to study the maximum
number of output components with a fixed input component, and a symmetry
result is derived.
}
Mathematical Reviews Subject Classifications: 68R05,05A15,68P05
\section{Introduction}
Abstract data types (ADTs) are a fundamental design tool in modern
software systems. They are setting the direction of programming in
the 1980's and 1990's as firmly as Structured Programming set it in
the 1960's and 1970's. In principle there is an infinity of possible
ADTs since an ADT is defined once its permitted set of operations has
been specified, and there is no restriction on this set except for
common sense. In practice, a small number of ADTs occur over and over
again (stacks, arrays, queues, dictionaries etc.) suggesting that some
ADTs are more ``natural'' than others.
Exactly the same phenomenon occurs in algebra and, significantly, the
natural algebraic systems (groups, rings, modules etc.) have very rich
theories. Many of the commonly occurring ADTs are {\em container}
data types: they are holders for collections of data items and support
an \Ins\ operation and a \Del\ operation (often
restricted in some way).
Normally, a container data type is used in the following way. First it is
initialised as empty. Then some input sequence of data items is inserted
into it one by one (the \Ins\ operation) and these items are removed (the
\Del\ operation) in some order and placed in an output sequence. The
insertions and deletions may be interleaved in any way (provided that the
\Del\ operation is never used on the empty container). Thus, a container
data type is a mechanism for transforming an input sequence into an output
sequence. The functional behaviour of a container data type is essentially
characterised by the relationship between the input
sequences and the output sequences. An understanding of this
relationship allows us to judge the capabilities of a data type and to
assess its potential applications. In general, if $\sigma$ is an
input sequence that gives rise to an output sequence $\tau$ then we
shall say that $(\sigma,\tau)$ is {\em allowable} (with respect to
the data type). The statistics of
the allowability relation associated with a given data type
are measures of the transformational
capability of that data type.
In the case of a queue (where the \Del\ operation always removes the
element which has been in the queue the longest) the allowability
relation is trivial since the output sequence is always the same as
the input sequence; in other words, the only allowable pairs are of the
form $(\sigma,\sigma)$. In the case of stacks (where the \Del\ operation
always removes the element which was placed in the stack most
recently) the relation is more complicated and there are significant
connections with a number of other combinatorial objects such as trees
\cite{Knuth1} 2.3.4.4, ballot sequences \cite{Knuth3} p531, Young
tableaux \cite{Knuth3} pp63,64, and triangulations of polygons
\cite{Cormen} pp320--324.
For dictionaries (which have an unrestricted \Del\ operation) the
output can be any permutation of the input.
The allowability relation has also been considered for various types
of deques. For a general deque (where insertions and deletions are
permitted at both ends) Pratt \cite{Pratt} has characterised the
allowable pairs in terms of forbidden subsequences. For input
restricted deques (where insertion is permitted at one end only) and
output restricted deques, Knuth \cite{Knuth1} 2.2.1 has given
generating functions and recurrence relations for computing the number
of allowable pairs of a given length.
In this paper we shall consider the case of priority queues. Priority
queues are characterised by two main operations, \Ins\ and
\DelMin. Several efficient implementations of them are known
in which $O(\log n)$ operations suffice for each of \Ins\ and
\DelMin, where $n$ is the number of items currently in the
priority queue. A {\em priority queue algorithm} is defined to be a
sequence of \Ins\ and \DelMin\ operations which is well-formed
in the same sense as bracket sequences. In other words there are
equal numbers of each operation and (in order that a \DelMin\ is
only executed when the priority queue is non-empty) the
$t^{\mbox{th}}$ \DelMin\ operation must be preceded by at least
$t$ \Ins\ operations. A priority queue algorithm with $n$ \Ins\ and
$n$ \DelMin\ operations transforms an input sequence of length
$n$ into an output sequence of length $n$. In the terminology
introduced above, the pair $(\sigma,\tau)$ is allowable if and only if
there is a priority queue algorithm which transforms $\sigma$ into
$\tau$.
Our focus here is on the allowability relation for priority queues.
The case of input and output sequences which are permutations of
${1,2,\ldots ,n}$ was considered in
\cite{Atkinson-Thiyagarajah,Atkinson-Beals,Golin-Zaks} whilst binary
sequences were considered in \cite{Atkinson}. As was shown in
\cite{Atkinson-Beals,Golin-Zaks} allowable pairs of permutations are
in one-to-one correspondence with labelled trees for which there is an
extensive enumeration theory \cite{Moon} beginning with Cayley's
famous enumeration formula. Labelled trees are in one-to-one
correspondence with other combinatorial objects; for example, with the
number of ways of expressing an $n$-cycle as a product of $n-1$
transpositions \cite{Cameron} p86. We shall consider the
general case in which the input and output sequences are elements of a
finite multiset $S$. From now on $S$ will denote a fixed multiset
of size $n$ whose elements are $1,2,\ldots,k$ with element $i$ having
multiplicity $a_i$. Our results unify those in
\cite{Atkinson-Thiyagarajah,Atkinson-Beals,Atkinson,Golin-Zaks} and
lead to some combinatorial results which are peculiar to the general
case.
In the next section we give a one-to-one correspondence between the set of
allowable pairs and a set of $k$-way trees. This correspondence
permits us to enumerate the set of allowable pairs. In section 3 we
give polynomial time algorithms for computing the number of outputs
that can arise from a fixed input, and for the number of inputs that
can give rise to a fixed output. Finally, in section~4, we consider the maximal
number of outputs that a given input can generate as a function of
$a_1,\ldots,a_k$. We derive recurrences and a multi-variate generating
function, and conclude with an unexpected symmetry result.
\section{Allowable pairs}
The main theorem in this section
concerns the number of allowable pairs of sequences
in which the elements of the input (and output) sequences are
permutations of the fixed multiset $S$. Although in reality equal
symbols of a multiset cannot be distinguished we shall find it
convenient, as an expositional device only, to pretend that each symbol has a unique
identity. The symbols all equal to some fixed $i$ will be
distinguished from each other by their order of occurrence in the
input sequence $\sigma$. We further suppose, again as an
expositional device only, that occurrences of multiple copies of the same symbol are
deleted from the priority queue in the same order as they are
inserted; so the order of occurrence of equal symbols in the output sequence
$\tau$ is the same as their order in $\sigma$.
We occasionally make this convention explicit by subscripting each
symbol of a given value with the order of its occurrence among the
symbols of this value. For example, the allowable pair of Figure~1
below would be written:
$$(3_1 2_1 3_2 1_1 1_2 2_2 2_3 3_3 1_3 2_4 1_4, 1_1 2_1 3_1 1_2 2_2
3_2 2_3 1_3 3_3 1_4 2_4)$$
These two devices allow us to associate each input symbol with a
unique output symbol. This allows us to track a symbol's progress as
it goes into the priority queue and eventually emerges as a member of
the output sequence. In practice, of course, when there are several
equal minimal elements in the priority queue, any one of them might be
deleted by a \DelMin\ operation. But each of them will produce
the same value in the output sequence, so it does no harm to suppose
that it is the minimal value of longest residence in the priority
queue that is deleted.
\begin{thm}
The number of allowable pairs $(\sigma,\tau)$ where $\sigma$ (and
$\tau$) are permutations of the multiset $S$ is
$$\frac{1}{n+1}\prod_{i=1}^k{{n+1}\choose a_i}$$
\end{thm}
\begin{cor}
The number of allowable pairs $(\sigma,\tau)$ where $\sigma$ ranges
over all the $k^n$ sequences of length $n$ with at most $k$ distinct
members is $$\frac{1}{n+1}{{kn+k}\choose{n}}$$
\end{cor}
\noindent {\bf Proof} of Corollary. The number in question is
$$\sum{\frac{1}{n+1}\prod_{i=1}^{k}{{n+1}\choose{a_i}}}$$
where the summation is over all integer vectors $(a_1,a_2,\ldots,a_k)$ with
non-negative components such that $\sum{a_i} = n$. This is clearly the coefficient of
$x^n$ in
$$\frac{1}{n+1}(1+x)^{n+1}(1+x)^{n+1}\ldots (1+x)^{n+1} = \frac{1}{n+1}(1+x)^{kn+k}$$
from which the result follows.
In the case that all $a_i=1$ and $k=n$, Theorem 1 corresponds to
sequences which are permutations of $\{1,2,\ldots,n\}$ and gives the
main theorem of \cite{Atkinson-Thiyagarajah} while the case
$k=2$ of the Corollary corresponds to binary sequences and gives one
of the results of \cite{Atkinson}.
Theorem 1 is proved by establishing a one-to-one correspondence
between allowable pairs and certain trees. We recall the definition
of a $k$-way tree: a {\em k-way tree} is either empty or it consists
of a root node and a sequence of $k$ $k$-way subtrees. It is common
to represent a $k$-way tree by a diagram in which the root node is
connected to its non-empty subtrees by edges which point in one of $k$
fixed directions.
\setlength{\unitlength}{0.75cm}
\begin{picture}(16.4,8)(3,0)
\thicklines
\put (6,7){\line(-1,-1){3}}
\put (6,7){\line(1,-1){3}}
\put (6,7){\line(0,-1){3}}
\put (6,4){\line(1,-1){1}}
\put (9,4){\line(-1,-1){1}}
\put (7,3){\line(0,-1){1}}
\put (8,3){\line(0,-1){1}}
\put (7,2){\line(-1,-1){1}}
\put (7,2){\line(1,-1){1}}
\put (8,1){\line(-1,-1){1}}
\put (8,1){\line(0,-1){1}}
\put (6,7) {\circle*{0.2}}
\put (3,4) {\circle*{0.2}}
\put (6,4) {\circle*{0.2}}
\put (9,4) {\circle*{0.2}}
\put (7,3) {\circle*{0.2}}
\put (8,3) {\circle*{0.2}}
\put (7,2) {\circle*{0.2}}
\put (8,2) {\circle*{0.2}}
\put (6,1) {\circle*{0.2}}
\put (8,1) {\circle*{0.2}}
\put (7,0) {\circle*{0.2}}
\put (8,0) {\circle*{0.2}}
\put (13,7){\line(-1,-1){3}}
\put (13,7){\line(1,-1){3}}
\put (13,7){\line(0,-1){3}}
\put (13,4){\line(0,-1){2}}
\put (16,4){\line(0,-1){2}}
\put (13,2){\line(-1,-1){1}}
\put (13,2){\line(1,-1){2}}
\put (14,1){\line(-1,-1){1}}
\put (13,7) {\circle*{0.2}}
\put (10,4) {\circle*{0.2}}
\put (13,4) {\circle*{0.2}}
\put (16,4) {\circle*{0.2}}
\put (13,3) {\circle*{0.2}}
\put (16,3) {\circle*{0.2}}
\put (13,2) {\circle*{0.2}}
\put (16,2) {\circle*{0.2}}
\put (12,1) {\circle*{0.2}}
\put (14,1) {\circle*{0.2}}
\put (13,0) {\circle*{0.2}}
\put (15,0) {\circle*{0.2}}
\put (13,7.2){\makebox(0,0)[b]{$\infty$}}
\put (9.8,4){\makebox(0,0)[r]{$1$}}
\put (16.2,4){\makebox(0,0)[l]{$3$}}
\put (13.2,4){\makebox(0,0)[l]{$2$}}
\put (13.2,3){\makebox(0,0)[l]{$3$}}
\put (13.2,2){\makebox(0,0)[lb]{$2$}}
\put (16.2,3){\makebox(0,0)[l]{$1$}}
\put (16.2,2){\makebox(0,0)[l]{$2$}}
\put (11.8,1){\makebox(0,0)[r]{$1$}}
\put (14.2,1){\makebox(0,0)[lb]{$3$}}
\put (13,-0.2){\makebox(0,0)[t]{$1$}}
\put (15,-0.2){\makebox(0,0)[t]{$2$}}
\end{picture}
\setlength{\unitlength}{0.85cm}
\nopagebreak
\begin{center}
$(3_1 2_1 3_2 1_1 1_2 2_2 2_3 3_2 1_3 2_4 1_4,
1_1 2_1 3_1 1_2 2_3 3_3 2_3 1_3 3_3 1_4 2_4)$\nobreak\\[1ex]
Figure 1: a 3-way tree, its corresponding unambiguous tree and
allowable pair
\end{center}
We also define an {\em unambiguous} tree. Such a tree is rooted,
unordered, and labelled; the root is labelled $\infty$ and the
non-root nodes are labelled with the elements of a multiset of
integers subject to the condition that the children of every node
have distinct labels.
There is an obvious one-to-one correspondence between $k$-way trees
and unambiguous trees on multisets on $\{1,2,\ldots,k\}$. Given the
diagram of a k-way tree we can label the root node $\infty$, and label
the lower node of any edge in the $i^{\mbox {th}}$ direction with the
label $i$; this gives an unambiguous tree. Conversely, from any
unambiguous tree we can use the labels to direct the edges in the
appropriate directions thereby obtaining a $k$-way tree. In this
correspondence the number of labels equal to $i$ in an unambiguous
tree is equal to the number of edges pointing in the $i^{\mbox {th}}$ direction
in the associated $k$-way tree. Figure 1 depicts a 3-way tree and
corresponding unambiguous tree. It also shows the allowable pair
associated with the unambiguous tree by the one-to-one correspondence
of Theorem 2.
Our first lemma gives a decomposition of an allowable pair which we
use as an inductive tool throughout this section and the next. In this
lemma we introduce the practice, continued in the proof of Theorem 2,
of prefacing input sequences by the symbol $\infty$. There are two
reasons for this. The first is that we often need to refer to the
symbol that immediately precedes one of the symbols $k$ in $\sigma$;
and if $k$ happens to be the first symbol of $\sigma$, we can, by use
of the $\infty$ symbol, handle this in a uniform way. The second
reason has to do with Theorem 2, where we associate unambiguous trees
on $n+1$ nodes with allowable pairs $(\sigma,\tau)$ of length $n$; we
can then find a labeling of the tree nodes with the symbols in $\infty\sigma$.
\begin{lem}\label{lemmaA} Let $\sigma,\tau$ be sequences of length $n$ with $a_i$
occurrences of $i$ (for $i=1,2,\ldots,k$). Suppose that $\tau$ is
expressed as $\tau=\tau_0 k \tau_1\ldots k \tau_{a_k}$ where none of
the $\tau_i$ contain $k$. The pair $(\sigma,\tau)$ is allowable if
and only if there exist subsequences
$\sigma_0,\sigma_1,\ldots,\sigma_{a_k}$ of $\sigma$ and elements
$x_1,\ldots,x_{a_k}$ of $\infty\sigma$ such that
\begingroup
\def\theenumi{\roman{enumi}}
\def\labelenumi{(\theenumi)}
\begin{enumerate}
\item for each $0 \leq i \leq a_k$, $(\sigma_i,\tau_i)$ is allowable,
\item $\sigma$ with all occurrences of $k$ removed is the sequence
$\sigma_0\sigma_1\ldots \sigma_{a_k}$,
\item for each $1 \leq j \leq a_k$, $x_j$ is immediately before the
$j^{\mbox {th}}$ $k$ in $\infty\sigma$ and the position of $x_j$ in
$\infty\sigma$ is strictly to the left of the position of the $j^{\mbox
{th}}$ $k$ in $\infty\tau$
\end{enumerate}
\endgroup
\end{lem}
\noindent{\bf Proof} First assume that $(\sigma,\tau)$ is allowable and
let $\cal G$ be a priority queue algorithm which transforms $\sigma$
into $\tau$. We recall our convention that occurrences of multiple
copies of the same symbol are deleted from the priority queue by
$\cal G$ in the same order as they are inserted.
For each $j=1,2,\ldots,a_k$, the $j^{\mbox {th}}$ $k$ must be deleted
by $\cal G$ after all elements appearing before it in $\sigma$ have
been deleted (because those elements are all less than $k$ or equal
to and more senior than this $k$). Therefore, defining $x_j$ to be
the symbol of $\infty\sigma$ immediately before the $j^{\mbox {th}}$
$k$, the position of $x_j$ in $\infty\sigma$ is strictly to the left
of the position of the $j^{\mbox {th}}$ $k$ in $\infty\tau$; this
gives condition (iii).
Define, for every $i=0,\dots,a_k$, the sequence $\sigma_i$ to be the
subsequence of $\sigma$ which is mapped to $\tau_i$ under the action
of $\cal G$. Since $(\sigma,\tau)$ is allowable, $(\sigma_i,\tau_i)$
is also allowable for all $0 \leq i \leq a_k$ and so (i) holds.
To show (ii), it is enough to prove that for any
$i=0,\ldots,a_{k-1}$, if $x$ appears in $\tau_i$ and $y$ appears in
$\tau_{i+1}$, then $x$ precedes $y$ in $\sigma$. Assume otherwise
and argue for a contradiction. Since $y$ precedes $x$ in $\sigma$
and succeeds $x$ in $\tau$, it follows that $y>x$. However the
$i^{\mbox {th}}$ $k$ in $\tau$ appears before $y$ in $\tau$ and so
this symbol $k$ appears before $y$ in $\sigma$. Therefore when $x$
is deleted from the priority queue by $\cal G$, $y$ and $k$ must be
present in the priority queue; it follows that $y$ will be deleted
before $k$ since $y0$ (if not, we replace $k$ by $k-1$) we may put $\sigma= \alpha k \beta$
where $\beta= b_1 b_2\ldots b_r$ contains no element equal to $k$. Then we
note that
$$t(\alpha k\beta) = \left\{\begin{array}{ll}t(\alpha) & \hbox{\rm if }r = 0\\
t(\alpha)t(\beta)+t(\alpha b_1k b_2\ldots b_r)&\hbox{\rm otherwise}\end{array}\right.$$
%\begin{eqnarray*}
%\lefteqn{t(\alpha k \beta)= } & &\\ &&\left\{\begin{array}{ll}
%\displaystyle t(\alpha)\hbox{ if }r= 0\\ \displaystyle t(\alpha)t(\beta)+t(\alpha b_1 k b_2\ldots
%b_r)\hbox{ otherwise} \end{array}\right..\end{eqnarray*}
\noindent (the term $t(\alpha)t(\beta)$ enumerates those outputs which
arise from deleting the element $k$ from the priority queue before
inserting the first symbol of $\beta$ (at that point the priority
queue will be empty) and the term $t(\alpha b_1 k b_2 \dots b_r)$
enumerates the outputs which arise from inserting $b_1$ before the
element $k$ is deleted).
As in \cite{Atkinson-Beals} these equations can be made the basis of a
dynamic programming algorithm with execution time $O(n^4)$.
The quantity $s(\tau)$ cannot be calculated by generalising the
permutation algorithm of \cite{Atkinson-Beals} (that algorithm uses
the distinctness of the elements in an essential way). We can,
however, use Lemma \ref{lemmaA}. In the notation of that lemma we have
$$s(\tau)= P\prod_{i= 1}^{k}s(\tau_i)$$ where $P$ is the number of
ways in which the $a_k$ occurrences of $k$ may be validly positioned
within $\sigma_0 \sigma_1 \ldots \sigma_{a_k}$. A valid positioning
is, by Lemma \ref{lemmaA}, one where each $k$ occurs no later than the
corresponding $k$ in $\tau$.
In order to calculate $P$, let $r= a_k$ and let $n_1