Chapter-04 -- Indexed grammars and languages

The Wayback Machine - https://web.archive.org/web/20070311042935/http://www.cogs.susx.ac.uk:80/research/nlp/gazdar/nlp-in-prolog/ch04/chapter-04-sh-1.6.3.html

Indexed grammars and languages

If, in addition to a finite set of attribute value pairs, categories can also employ a single recursive feature called STACK, say, and if rules are able to assign to, test and augment this feature, then the resulting expressive power is that of a class of grammars known as the indexed grammars.

For example, suppose we allow categories X that can be described in the following way:







<X cat> = S,

<X stack top> = a,

<X stack stack top> = a,

<X stack stack stack top> = z.

and had a grammar with rules such as:







    Rule {push endmarker on to stack}

        S  ->  a  A:

            <A stack top> = z

            <A stack stack> = <S stack>.

    Rule {push 'a' on to stack}

        A0  ->  a  A1:

            <A1 cat> = A

            <A2 cat> = A

            <A1 stack top> = a

            <A1 stack stack> = <A0 stack>.

    Rule {copy stack from A to B}

        A   ->  B:

            <A stack> = <B stack>.

    Rule {pop 'a' off stack}

        B0  ->  b  B1 c:

            <B0 cat> = B

            <B1 cat> = B

            <B0 stack top> = a

            <B0 stack stack> = <B1 stack>.

    Rule {pop endmarker off stack}

        B   ->  b c:

            <B stack top> = z.

Then we have a grammar for the language a\un\db\un\dc\un\d, a language that is not a CFL. How have we managed to describe a non-context-free language in the same PATR notation that produced CF-PSGs like Grammar1 and Grammar4? By introducing the recursive feature STACK, we have moved to a situation where there are infinitely many categories - a situation not permitted in CF-PSGs. Thus, for instance, for every n there is a category X that we can describe by:







<X stack\un\d top> = a

and no two such categories are the same.

The languages describable

by indexed grammars - namely, the indexed languages - are a natural class of formal languages which form a proper superset of the CFLs and a proper subset of the context-sensitive or type 1 languages. They are of interest to NLP researchers because no phenomena are yet known which would lead them to believe that the natural languages do not form a subset of the indexed languages.

In particular, it is clear that indexed grammars are available for the Swiss German facts and for most other sets of facts that have been even conjectured to hold problems for context-free description.

The indexed languages thus provide us, at least for the moment, with a kind of upper bound for natural language syntax.

We can no longer be surprised by non-CFL patterns, although their rarity is a matter of some interest, but we should be very surprised at, and duly suspicious of, putatively non-indexed language phenomena.

Send us a comment.

Feb	MAR	Apr
	11
2006	2007	2008