Published: 17th August 2011DOI:
10.4204/EPTCS.63ISSN: 2075-2180
| ELibM mirror of EPTCS 63 |

Words 2011

Prague, Czech Republic, 12-16th September 2011

This volume contains proceedings of the Eighth International Conference WORDS 2011. It was held in Prague, Czech Republic, from 12th to 16th September 2011 as a joint undertaking of the Czech Technical University and the Charles University.

The conference is a biannual meeting devoted to the mathematical theory of words, i.e., finite or infinite sequences of symbols over a finite alphabet. Since the first meeting in 1997 in Rouen, France, WORDS became the main event in the field. Words arise as a natural object in many areas and therefore the conference is also open to applications, be it in computer science, biology, linguistics or physics, while the emphasis remains on combinatorial, algebraic and algorithmic points of view.

Program committee selected 27 contributions out of 40 submissions to be presented at the conference. Each paper was reviewed by two referees. The reader will find here revised versions of the accepted contributions as well as summaries of six invited lectures.

We thank authors for their interest in WORDS, program committee members and other reviewers (listed below) for their effort, and the EPTCS editors, in particular Wolfgang Thomas and the Editor-in-Chief Rob van Glabbeek, for accepting the publication of the proceedings and for their helpful approach.

The event was financially supported by the Faculty of Nuclear Sciences and Physical Engineering of the Czech Technical University, by the Faculty of Mathematics and Physics of the Charles University, by the Doppler Institute for Mathematical Physics and Applied Mathematics, grant LC06002, and by the Czech Science Foundation, grant GAČR 201/09/0584.

Julien Cassaigne, Marseille

James Currie, Winnipeg

Clelia de Felice, Salerno

Christiane Frougny, Paris

Štěpán Holub, Prague (co-chair)

Juhani Karhumäki, Turku

Jean Neraud, Rouen

Dirk Nowotka, Stuttgart

Edita Pelantová, Prague (co-chair)

Laurent Vuillon, Le Bourget-du-Lac

Lubomíra Balková

Daniel Dombek

Štěpán Holub

Karel Klouda

Milan Krbálek

Zuzana Masáková

Edita Pelantová

Štěpán Starosta

There are two attributes by which the Prague Linguistic School is generally characterized: 'structural' and 'functional'. While 'structural' is a common denominator of several linguistic trends that originated in the first decades of the 20th century following Ferdinard de Saussure's pioneering linguistic approach, the term 'functional' was used by de Saussure only quite occasionally and it is supposed to be a distinctive feature of the writings of the Prague scholars, who at the same time as they recognized the necessity to describe and explain the collection of language phenomena as a structured whole rather than as a mechanical agglomeration, they emphasized that this structured whole — language — should be understood as a functioning means of communication.

In our contribution, we would like to characterize the Prague functionalism by pointing out some basic features of this approach as they are reflected in the Prague School writings both from the pre-war period as well as in the analyses of those linguists belonging to what may be called "second generation" of the Prague School. We focus on three issues which we consider to be of a more general interest:

**(i) The notion of function in limguistic studies:**
The difficulty of an adequate characterization of the use of the term function and its derivatives
consists in the fact that the very term function is homonymous and its use varies also in different
scientific domains (Nagel 1961 distinguishes six various meanings of the term function in modern science).
The ambiguity of this term can be demonstrated also on the different understanding of function by the Prague
School. It was mainly Roman Jakobson who treated the concept of function in linguistics in the general
theoretical framework of finalism or teleology. In his understanding, the claim "A phenomenon *x* is a means
for the realization of an end *F*" is equivalent to the claim "A phenomenon *x* has a function *f*",
that is to say that to have a function *f* is equivalent to "to serve as a means for the end (purpose) *F*".
The teleological approach was reflected by Jakobson in his principle of 'therapeutic changes' in the
development of language: language system always tends to a certain balance and the distortion of
this balance initiates changes — which removes this insufficiency — but evokes imbalance in some
other parts of the system and the process of therapeutic changes continues ad infinitum.

**(ii) The articulation of the form — function relation into levels:**
The need for a systematic and integrated description of the relation of functions and forms has led
to conceive the core of language system as consisting of levels: the units of the lower level may
be looked upon as representations of the units levels can be understood as representations (forms)
of the units of the adjacent higher levels, which in turn are viewed as functions of the adjacent
lower levels units. From the methodological standpoint, there are two possible ways how to proceed:
either to adopt the speaker's point of view and to proceed from function to form; i.e., from common needs
of communication, or to take an opposite point of view and to proceed from form to function.
A far-reaching significance for the understanding of the relations between units of adjacent levels
is the notion of asymmetrical dualism introduced by S. Karcevskij (1929). The main idea consists in
the recognition that the sign and its meaning (or rather function; Karcevskij uses the French
term signification) do not cover in all their points the same field: the same sign has several functions
(e.g. the morpheme -a in Czech may have the function of Nominative singular of feminine nouns,
as in \v{z}en-a, or Nominative plural of neuter nouns as in m\v{e}st-a, and so on), and the same function
can be expressed by several signs (e.g. the function of Nominative singular of feminine nouns can
be expressed by the morpheme -a, or -e, or zero). There is always a certain tension between signifiant
and signifi\'e and the asymmetrical dualism of the structure of the sign makes it possible for language to develop.

**(iii) The communicative role of language:**
Language is regarded as a functioning system, adapted to its communicative role, diversified
in more or less different social and local varieties, open to possible "variation", to change;
this is reflected in the system itself: the system has a stable (solid central) core and peripheral domains
(irregularities, obsolete or recent marginal phenomena) need not be in complete accordance with the laws and
tendencies governing the central core. One of the important contributions of Prague scholars is their
systematic attention paid to the information structure of sentences, i.e., in brief, to the distinction
between the topic of the sentence("what is the sentence 'about'") and its focus ("what the sentence says about the topic").
Praguian scholars were the first to emphasize that this distinction has a semantic relevance.

In the full version of the paper, the above mentioned aspects will be discussed both from the historical perspective and the present day tenets of modern Prague theoretical linguistics and they will be illustrated by rich empirical material.

A *pattern* is a finite word of variables. A word *w* *avoids*
pattern *P* if for any substitution *φ* of the variables of *P* with
non-empty words, *φ(P)* is not a factor of *w*.
Let Σ_{i} denote the *i*-letter alphabet *{0,1,…,i−1}*.
The *avoidability index* *λ(P)* of *P* is the smallest integer *k* such that
there exists an infinite word *w* over *Σ _{k}* avoiding

An infinite word *w* is said to be HDOL if it is the image by a morphism *h*
of the fixed point of a morphism *m*, that is, *w=h(m ^{ω}(0))*.
Cassaigne conjectures that for every avoidable pattern

Lower bounds on *λ(P)* are quite easy to obtain by backtracking,
and Cassaigne's conjecture is supported by the fact that most of the proofs
of upper bounds consist in finding a suitable HDOL word over *Σ _{λ(P)}*
and showing that it avoids

For a given pattern, there can exist many HDOL words. For example in the case of squares,
we have that *λ(AA)=3* and the fixed point *m ^{ω}(0)*
of any ternary square-free morphism

Thue proved that ternary square-free words avoiding *010* and *212*
that are bi-prolongable are exactly the factors of the fixed point *t*
of *0→012*, *1→02*, *2→1*. More recently, we proved that
the only infinite binary word avoiding *AABBCABBA* and the factors *0011* and *1100*
is essentially the image of *t* by the morphism

0→ | 0010110111011101001, |

1→ | 00101101101001, |

2→ | 00010. |

In this talk, we consider the possibility that for every avoidable pattern *P*, there exists a finite set *S*
of forbidden patterns and factors, containg *P*, such that the words over *Σ _{λ(P)}* are essentially the factors of an HDOL word.
This is a strong version of Cassaigne's conjecture. I will give many examples of such HDOL words characterized by forbidden patterns and factors,
as well as related open problems. We will also discuss the factor complexity of words avoiding patterns.

In this paper, we study an algebraic and combinatorial problem concerning bounded context-free languages.
A strictly related notion is that of sparse language: a language *L* is termed sparse if its counting function,
that is, the function *f _{L}* that maps every integer

The starting point of this investigation is the following result proved in [3]:
for every sparse context-free language *L _{1}*,
there exists a regular language

A conceptual limit of the above mentioned construction is that the regular language *L _{2}*
is on a different alphabet from the one of

Now the general problem above can be formulated as follows:
given a bounded context-free language *L _{1}*, does there exist a regular language

In the sequel, for short, we refer to it as CE (Commutative Equivalence) Problem. The CE Problem was conjectured to have an affirmative solution by Stefano Varricchio and told to the authors during their past collaboration. In this paper, we prove the conjecture. More precisely, we prove the following statement.

**Theorem 1**
*Every bounded context-free language L_{1} is commutatively equivalent to a regular language
L_{2}. Moreover the language L_{2} can be effectively constructed starting
from an effective presentation of L_{1}.*

Observe that the CE problem can be seen as a kind of counting problem in the class of bounded context-free languages.
Despite the fact that such class has been widely investigated in the past, CE problem
appears to require some new techniques. Indeed, bounded context-free languages can be inherently ambiguous;
in addition, if *u _{1}^{*}⋯ u_{n}^{*}, u_{i}∈ A^{+}*,
is the set that contains the bounded context-free language, then, in general,

Now we would like to give a broader picture about the relationships between the main result of this paper
and some classical theorems on bounded context-free
languages. The first result that deserves to be mentioned is a well-known theorem by Parikh
[16].
For this purpose, let us first introduce a notion. Given two languages *L _{1}*
and

It is worth noticing that the property of Parikh equivalence by no means implies the property of commutative equivalence.
Indeed, let *A = {a, b}* and let *L _{1} = (ab)^{*} ∪ (ba)^{*}* and

Another theorem that is central in this context has been proved by Ginsburg (see [8], Theorem 5.4.2).
For this purpose, let us first introduce a notion.
Let *L⊆ u _{1}^{*}⋯ u_{n}^{*}* be a bounded language
where, for every

Due to the space limitation, we are not able to report our proof of Theorem 1.
Therefore, we limit ourselves to give an idea of the main techniques involved in the proof.
A basic tool that has been used to study our problem is the celebrated theorem by Eilenberg and
Schützenberger that provides an algebraic characterization of semi-linear sets in terms of semi-simple
sets ([6,14). The proof is essentially divided in two steps. The first deals with the case where
the bounded context-free language is described, via the Ginsburg map *φ*, by a linear set.
In this case, the proof makes use of a technique of combinatorics on words concerning variable-length codes and of
a new technique (inspired to our work [5]) of geometrical nature for the decomposition of linear sets.
The second step deals with the general case, that is, when the language is described, via the Ginsburg map *φ*,
by a semi-linear set. In this step, together with the mathematical tools mentioned above, some arguments of
elementary number theoretical nature are used to get our main theorem (see e.g. [9]).

[2] N. Chomsky, and M.-P. Schützenberger (1963): The Algebraic Theory of Context-free Languages, in P. Braffort and D. Hirschberg (eds.), "Computer Programming and Formal Systems", pp. 118–161, North Holland Publishing Company, Amsterdam, doi:10.1016/S0049-237X(08)72023-8.

[3] F. D'Alessandro, B. Intrigila, and S. Varricchio (2006): On the structure of the counting function of context-free languages,

[4] F. D'Alessandro, B. Intrigila, and S. Varricchio (2009): The Parikh counting functions of sparse context-free languages are quasi-polynomials,

[5] F. D'Alessandro, B. Intrigila, and S. Varricchio (2009): On some counting problems for semi-linear sets,

[6] S. Eilenberg, and M.-P. Schützenberger (1969): Rational sets in commutative monoids,

[7] P. Flajolet (1987): Analytic models and ambiguity of context-free languages,

[8] S. Ginsburg (1966):

[9] G. H. Hardy, and E. M. Wright (1979):

[10] J. Honkala (1998): Decision problems concerning thinness and slenderness of formal languages,

[11] O. Ibarra, and B. Ravikumar (1986): On sparseness, ambiguity and other decision problems for acceptors and transducers,

[12] L. Ilie, G. Rozenberg, and A. Salomaa (2000): A characterization of poly-slender context-free languages,

[13] R. Incitti (2001): The growth function of context-free languages,

[14] R. Ito, Every semi-linear set is a finite union of disjoint linear sets (1969):

[15] M. Latteux and G. Thierrin (1984): On bounded context-free languages,

[16] R. J. Parikh (1966): On context-free languages,

[17] D. Raz (1997): Length considerations in context-free languages,