C-ORAL-ROM, a spontaneous spoken corpus
The corpus used for the present study was developed within the framework of the European project C-ORAL-ROM. The main goal of this project was to build four corpora, one for each of four Romance languages (Italian, Spanish, French and Portuguese), with similar design features: the same number of words, the same types of communicative situations and the same transcription format.
C-ORAL-ROM is made up of 300,000 words covering a wide range of topics, genres and communicative situations, from colloquial interactions to interactions highly influenced by the written modality, such as those found in the mass media or in formal contexts. The corpus design is shown in Tables 1.1 and 1.2 below.
This variety of modalities and communicative genres offers a good opportunity to validate this annotation model. The Spanish C-ORAL-ROM corpus is annotated in XML with categorical and semantic information based on eventive semantics.