The programme for the Theoretical and Methodological Seminar has been published. It will take place in room P104 in the main building on Wednesday from 14:10 to 15:40. The seminar will also be broadcast online.
Release 14 of the SYN corpus of contemporary written Czech has been published. With the inclusion of journalistic texts from 2024, its size has reached almost 5.5 billion words. New in version 14 is a completely reworked annotation of multi-word units.
We have released SYN2025, a new synchronic representative corpus. It is the sixth continuation of the series of reference corpora of printed Czech, with its design and annotation (including syntax) corresponding in particular to the SYN2020 corpus.
We have released a new version of the hosted EEBO corpus. Compared to its predecessor, EEBO v2 is almost twice as large and also brings linguistic annotation (regularisation, lemmatisation and POS tagging) that makes it much easier to use.
We are proud to announce that the CLARIN K-centre certificate for the Czech National Corpus has been renewed. The centre specializes in corpus linguistics with an emphasis on empirical research on Czech.
Our colleagues from the Slovak National Corpus JÚĽŠ SAV in Bratislava released a brand new version of the CNC Mapka application adapted for working with Slovak dialects. Congratulations on their achievement!
InterCorp release 16ud has been published. Its texts are the same as in release 16, but it comes with UD annotation, which is comparable across languages and also includes syntax. In addition, release 16ud features metrics of syntactic complexity and lexical diversity.
In cooperation with ICL, we have released an updated version of the corpus of contemporary Czech poetry (KSP). Compared to the previous version, a number of printed collections have been added, poems from web servers have been filtered more carefully, and the corpus structure has been simplified.
If you need to find out what things look like in corpus data, you no longer need to look for a suitable application and learn to write a CQL query. You can simply ask the “Corpus Linguist” model in ChatGPT, which will query the CNC for you.
Monday, September 9, 2024, marks exactly 30 years since the Institute of the Czech National Corpus was founded at the Faculty of Arts. We have prepared several new corpora for the anniversary, so you really have a lot to look forward to!