Projects.

Corpora built at bmanuel.org.

CT.

"Corpus Taurinense".

Corpus Taurinense is a corpus of 13th-century Florentine texts, POS-tagged according to an expressly devised EAGLES tagset. Besides the homepage, Version 1.8 of the corpus is fully available.

NUNC.

"Newsgroups UseNet Corpora".

NUNC is a multilingual (It. De. Fr. En. Es. Ma. Su. Ee. Pt.) suite of corpora based on the language of newsgroups, freely available and queryable online. Devised by Manuel Barbera, NUNC was born in 2002 and is currently under development by A. Allora, M. Barbera, S. Colombo, E. Corino, C. Marello, S. Casavecchia, C. Onesti, M. Tomatis, L. Valle and others. Some betas are already available for testing (Italian, UK English, French and Spanish).

Jus Jurium.

Jus Jurium (viz. 'minestrone of Laws') is a free Italian corpus covering the full legal universe of discourse current in contemporary Italy, POS-tagged and with textual and diplomatic markup. Devised by Manuel Barbera, soon joined by Cristina Onesti and Elisa Corino, the Corpus Juris was born in February 2005. Besides the homepage, full documentation and a first beta of the corpus will be available soon.

Athenaeum.

Athenaeum is a free corpus built from texts produced by Turin University, POS-tagged and classified by topic and text genre. Athenaeum was born in 2004 to celebrate Turin University's 6th centenary. Besides the homepage, a first version of the corpus is already available.

VALICO.

"Varietà di Apprendimento della Lingua Italiana: Corpus Online".

VALICO is an international learner corpus of Italian, freely available and queryable online. Devised by Manuel Barbera and Carla Marello, soon joined by Elisa Corino, VALICO was born on the 17th of June 2003. Besides the homepage, the Guidelines, a fully functional beta, and some exercises especially devised for VALICO and VINCA are also available.

VINCA.

"Varietà di Italiano di Nativi Corpus Appaiato".

VINCA is a corpus of native written Italian, freely available and queryable online. Devised by Manuel Barbera and Carla Marello, soon joined by Elisa Corino, VINCA was born in 2004 as a paired corpus for VALICO. Besides the homepage, the Guidelines and some exercises especially devised for VALICO and VINCA are now available.

Internal Documentation.

Some useful (but unstable: it is always under development) documentation, intended primarily for internal use: Athenaeum header template and markup file, CT specification, NUNC header template and markup file, VALICO header template, VINCA header template, and the FIRB macro-header template. Beware that these TXT files may sometimes look scrambled when viewed in some web browsers, but will be fine once downloaded to your own machine.

Additional Information.

All the corpora are (and will remain) freely available online: you are legally entitled to use them, provided that you acknowledge the source of your data. All corpora are encoded in CQP format and are accessed through the Corpus Query Workbench of IMS Stuttgart (a manual in PS, PDF or HTML is available online at the IMS site).

***HTML code & design by Manuel Barbera***