DeepPavlov/rubert-base-cased-conversational
Conversational RuBERT (Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters) was trained on OpenSubtitles[1], Dirty, Pikabu, and the Social Media segment of the Taiga corpus[2]. We assembled a new vocabulary for the Conversational RuBERT model on this data and initialized the model weights from RuBERT.
08.11.2021: uploaded the model with MLM and NSP heads
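The card itself ships no usage code, so here is a minimal sketch of loading the model with the Hugging Face transformers library and exercising the MLM head mentioned above. The example sentence and the mask-decoding logic are illustrative assumptions, not part of the original card.

```python
# Minimal masked-language-modeling sketch for the conversational RuBERT
# checkpoint. The model id comes from the card; everything else is an
# assumed, illustrative usage pattern.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "DeepPavlov/rubert-base-cased-conversational"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

# An example Russian sentence with one masked token (assumed input).
text = f"Привет, как {tokenizer.mask_token} дела?"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the [MASK] position and take the highest-scoring vocabulary entry.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```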
[1]: P. Lison and J. Tiedemann. 2016. OpenSubtitles2016: Extracting Large Parallel Corpora from Movie and TV Subtitles. In Proceedings of the 10th International Conference on Language Resources and Evaluation (LREC 2016).
[2]: T. Shavrina and O. Shapovalova. 2017. To the Methodology of Corpus Construction for Machine Learning: "Taiga" Syntax Tree Corpus and Parser. In Proceedings of the "CORPORA 2017" International Conference, Saint-Petersburg.