The Synchronous Structured String-Tree Correspondence (S-SSTC) [2] is a flexible annotation schema that declaratively describes (possibly irregular and non-standard) correspondences between a pair of SSTCs, i.e. between a pair of strings and their respective arbitrary tree representations. The S-SSTC has applications in various NLP tasks, such as example-based machine translation and question answering.

An S-SSTC is a general structure, comprising a pair of SSTCs. The ‘synchronised’ correspondences between the two SSTCs are specified on two levels:

- synchronous correspondences between the *tree nodes* (i.e. *lexical alignments*) are described by SNODE interval correspondences;
- synchronous correspondences between the *subtrees* (i.e. *structural alignments*) are described by STREE interval correspondences.

## Formal Definition

Let $S$ and $T$ be two SSTCs. An S-SSTC is a triple $(S,T,\varphi_{S,T})$ where $\varphi_{S,T}$ is a set of links defining the synchronous correspondences between $S$ and $T$ at different internal levels of the two SSTC structures.

A synchronous correspondence link $\ell \in \varphi_{S,T}$ can be of type $\mathop{\ell}\limits_\text{sn}$ or $\mathop{\ell}\limits_\text{st}$.

### SNODE Correspondences

$\mathop{\ell}\limits_\text{sn}$ records the synchronous correspondences at the level of nodes in $S$ and $T$ (i.e. lexical correspondences between specified nodes) and normally $\mathop{\ell}\limits_\text{sn} = (X_1, X_2)$ where $X_1$ and $X_2$ are sequences of SNODE correspondences in $\text{co}$, which may be empty.

- More specifically, $\mathop{\ell}\limits_\text{sn}$ is a pair $( \mathop{\ell}\limits_{\text{sn}_S}, \mathop{\ell}\limits_{\text{sn}_T} )$ where $\mathop{\ell}\limits_{\text{sn}_S}$ is from the first SSTC ($S$) and $\mathop{\ell}\limits_{\text{sn}_T}$ is from the second SSTC ($T$).
- $\mathop{\ell}\limits_\text{sn}$ is represented by sets of intervals such that:
  - $\mathop{\ell}\limits_{\text{sn}_S} = \lbrace i_1\_j_1 + \ldots + i_k\_j_k + \ldots + i_p\_j_p \rbrace$ where $i_k\_j_k \in X:\text{SNODE}$ correspondence in $\text{co}$ of $S$
  - $\mathop{\ell}\limits_{\text{sn}_T} = \lbrace i_1\_j_1 + \ldots + i_k\_j_k + \ldots + i_p\_j_p \rbrace$ where $i_k\_j_k \in X:\text{SNODE}$ correspondence in $\text{co}$ of $T$
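The interval-set notation above can be read mechanically. The following is a minimal sketch, assuming each side of a link is written as a string such as `1_2+4_5`; the function names `parse_interval_set` and `parse_link` are hypothetical and are not part of the S-SSTC definition.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (i, j) stands for the interval written i_j


def parse_interval_set(text: str) -> List[Interval]:
    """Parse 'i1_j1+...+ip_jp' into a list of (i, j) pairs."""
    text = text.strip()
    if not text:
        return []  # one side of a link may be empty
    intervals = []
    for part in text.split("+"):
        start, end = part.split("_")
        i, j = int(start), int(end)
        if i > j:
            raise ValueError(f"malformed interval: {part}")
        intervals.append((i, j))
    return intervals


def parse_link(source_side: str, target_side: str):
    """Build a link as a pair (intervals in S, intervals in T)."""
    return parse_interval_set(source_side), parse_interval_set(target_side)


print(parse_link("1_2+4_5", "1_2"))  # ([(1, 2), (4, 5)], [(1, 2)])
```

Keeping each side as a list of intervals, rather than a single span, is what lets one link cover discontiguous material on either side.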

### STREE Correspondences

Similarly, $\mathop{\ell}\limits_\text{st}$ records the synchronous correspondences at the level of subtrees in $S$ and $T$ (i.e. structural correspondences between specified subtrees) and normally $\mathop{\ell}\limits_\text{st} = (Y_1, Y_2)$ where $Y_1$ and $Y_2$ are sequences of STREE correspondences in $\text{co}$, which may be empty.

More specifically, $\mathop{\ell}\limits_\text{st}$ is a pair $( \mathop{\ell}\limits_{\text{st}_S}, \mathop{\ell}\limits_{\text{st}_T} )$ where $\mathop{\ell}\limits_{\text{st}_S}$ is from the first SSTC ($S$) and $\mathop{\ell}\limits_{\text{st}_T}$ is from the second SSTC ($T$) as defined below:

- $\mathop{\ell}\limits_{\text{st}_S} = \lbrace i_1\_j_1 + \ldots + i_k\_j_k + \ldots + i_p\_j_p \rbrace$ where $i_k\_j_k \in Y:\text{STREE}$ correspondence in $\text{co}$ of $S$, or $(i_k\_j_k) = (i_k\_j_k) - (i_u\_j_v) \quad|\quad i_u \geq i_k , j_v \leq j_k$; i.e. $(i_u\_j_v) \subseteq (i_k\_j_k)$, which corresponds to an incomplete subtree.
- $\mathop{\ell}\limits_{\text{st}_T} = \lbrace i_1\_j_1 + \ldots + i_k\_j_k + \ldots + i_p\_j_p \rbrace$ where $i_k\_j_k \in Y:\text{STREE}$ correspondence in $\text{co}$ of $T$, or $(i_k\_j_k) = (i_k\_j_k) - (i_u\_j_v) \quad|\quad i_u \geq i_k , j_v \leq j_k$; i.e. $(i_u\_j_v) \subseteq (i_k\_j_k)$, which corresponds to an incomplete subtree.
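The incomplete-subtree case can be illustrated with plain interval arithmetic. This is a sketch only: it assumes the removed interval lies inside the subtree interval, as the containment $(i_u\_j_v) \subseteq (i_k\_j_k)$ requires, and it represents what remains as a list of leftover intervals. That representation and the name `subtract_subtree` are illustrative assumptions, not part of the formal definition.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (i, j) stands for the interval written i_j


def subtract_subtree(whole: Interval, removed: Interval) -> List[Interval]:
    """Return the parts of `whole` left after taking out `removed`."""
    i_k, j_k = whole
    i_u, j_v = removed
    # Containment check: the removed interval must sit inside the whole one.
    if not (i_k <= i_u <= j_v <= j_k):
        raise ValueError("removed interval must lie inside the whole interval")
    remaining = []
    if i_k < i_u:
        remaining.append((i_k, i_u))  # material to the left of the gap
    if j_v < j_k:
        remaining.append((j_v, j_k))  # material to the right of the gap
    return remaining


# Removing an embedded span 2_4 from a subtree spanning 0_5
print(subtract_subtree((0, 5), (2, 4)))  # [(0, 2), (4, 5)]
```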

## Bilingual Translation Example Annotation

The figure below shows an S-SSTC representing an annotated English–Malay translation example.

| SNODE correspondences | STREE correspondences |
|---|---|
| `(0_1, 0_1)` `(1_2+4_5, 1_2)` `(3_4, 2_3)` `(2_3, 3_4)` | `(0_5, 0_5)` `(0_1, 0_1)` `(2_4, 2_4)` `(3_4, 3_4)` |

**Figure 1:** Example S-SSTC

The SNODE correspondence (`0_1`, `0_1`) states that English ‘He’ corresponds to Malay ‘Dia’, while the SNODE correspondence (`1_2+4_5`, `1_2`) captures the correspondence between the discontiguous ‘picked…up’ and ‘kutip’. The STREE correspondence (`2_4`, `2_4`) captures the correspondence between the English noun phrase ‘the ball’ and the Malay noun phrase ‘bola itu’, as well as between the subtrees containing these phrases.
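To make the reading of the SNODE column concrete, the sketch below resolves each interval pair to the words it covers. The token lists are an assumption: they are reconstructed from the intervals and the words quoted above (‘He’, ‘picked…up’, ‘the ball’, ‘Dia’, ‘kutip’, ‘bola itu’), and the helper names `words` and `lexical_alignments` are hypothetical.

```python
# Assumed tokenisation, reconstructed from the intervals in Figure 1.
english = ["He", "picked", "the", "ball", "up"]
malay = ["Dia", "kutip", "bola", "itu"]

# SNODE correspondences from Figure 1, as (intervals in S, intervals in T).
snode_links = [
    ([(0, 1)], [(0, 1)]),
    ([(1, 2), (4, 5)], [(1, 2)]),
    ([(3, 4)], [(2, 3)]),
    ([(2, 3)], [(3, 4)]),
]


def words(tokens, intervals):
    """Join the tokens covered by each interval; ' ... ' marks a gap."""
    return " ... ".join(" ".join(tokens[i:j]) for i, j in intervals)


def lexical_alignments(src, tgt, links):
    """Resolve every link to a (source words, target words) pair."""
    return [(words(src, s), words(tgt, t)) for s, t in links]


for pair in lexical_alignments(english, malay, snode_links):
    print(pair)
```

Under these assumed token lists, the second link resolves to `picked ... up` on the English side and `kutip` on the Malay side, which is the discontiguous many-to-one case described above.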

## Handling Non-Standard Phenomena

The S-SSTC annotation framework is flexible and robust in handling non-standard translation phenomena. We describe some example cases, which are drawn from the problem of using synchronous formalisms to define translations between languages (see [1]).

### Many-to-One Mapping

Figure 1 illustrates a case where the English sentence has non-standard cases of featurisation, crossed dependency and a many-to-one synchronous correspondence in *“picks up”*. Another case is the reordering of words within phrases, as seen in the phrase *“the heavy box”* and its corresponding phrase *“kotak berat itu”* in the target.

### Crossed Dependencies

[todo]

### Elimination of Dominance

[todo]

### Inversion of Dominance

[todo]

*COLING 2002 Post-Conference Workshop “Workshop on Machine Translation in Asia”*. Taipei, Taiwan.