

        Releasing Hindi ELECTRA model

        This is a first attempt at a Hindi language model trained with Google Research’s ELECTRA.
        As of 2022, I recommend Google’s MuRIL model, which was trained on English, Hindi, and other major Indian languages, in both their native scripts and Latin transliteration: https://huggingface.co/google/muril-base-cased and https://huggingface.co/google/muril-large-cased
        For causal language models, I would suggest https://huggingface.co/sberbank-ai/mGPT, though this is a large model.
        Tokenization and training CoLab
        I originally used a modified ELECTRA for finetuning, but now use SimpleTransformers.
        Blog post that greatly influenced this work: https://huggingface.co/blog/how-to-train
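
        If you just want to use the pretrained weights, here is a minimal sketch of loading them with the transformers library, assuming the released checkpoint is the monsoon-nlp/hindi-bert repository on the Hugging Face Hub (swap in a local directory if you trained your own):

        from transformers import AutoTokenizer, AutoModel

        # assumed Hub repo id; replace with a local path if needed
        tokenizer = AutoTokenizer.from_pretrained("monsoon-nlp/hindi-bert")
        model = AutoModel.from_pretrained("monsoon-nlp/hindi-bert")

        inputs = tokenizer("यह एक उदाहरण वाक्य है।", return_tensors="pt")
        outputs = model(**inputs)
        print(outputs.last_hidden_state.shape)  # (1, sequence_length, hidden_size)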


        Example Notebooks

        This small model has comparable results to Multilingual BERT on BBC Hindi news classification
        and on Hindi movie reviews / sentiment analysis (using SimpleTransformers)
        You can get higher accuracy with ktrain by adjusting the learning rate (and by changing model_type in config.json; this is an open issue with ktrain): https://colab.research.google.com/drive/1mSeeSfVSOT7e-dVhPlmSsQRvpn6xC05w?usp=sharing
        Question-answering on MLQA dataset: https://colab.research.google.com/drive/1i6fidh2tItf_-IDkljMuaIGmEU6HT2Ar#scrollTo=IcFoAHgKCUiQ
        A larger model (Hindi-TPU-Electra) using ELECTRA base size outperforms both models on Hindi movie reviews / sentiment analysis, but
        does not perform as well on the BBC news classification task.
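
        For reference, fine-tuning with SimpleTransformers in those notebooks follows roughly this pattern; this is a minimal sketch, and the toy DataFrame below is a placeholder for the BBC Hindi news or movie-review data:

        import pandas as pd
        from simpletransformers.classification import ClassificationModel

        # placeholder data; real runs load the BBC Hindi or movie-review datasets
        train_df = pd.DataFrame(
            [["यह फिल्म बहुत अच्छी थी", 1], ["यह फिल्म खराब थी", 0]],
            columns=["text", "labels"],
        )

        model = ClassificationModel(
            "electra",
            "monsoon-nlp/hindi-bert",
            num_labels=2,
            args={"num_train_epochs": 3, "overwrite_output_dir": True},
            use_cuda=False,  # set True on a GPU machine
        )
        model.train_model(train_df)
        predictions, raw_outputs = model.predict(["फिल्म शानदार थी"])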


        Corpus

        Download: https://drive.google.com/drive/folders/1SXzisKq33wuqrwbfp428xeu_hDxXVUUu?usp=sharing
        The corpus is two files:

        • Hindi CommonCrawl deduped by OSCAR https://traces1.inria.fr/oscar/
        • the latest Hindi Wikipedia dump ( https://dumps.wikimedia.org/hiwiki/ ), converted to plain text with WikiExtractor

        Bonus notes:

        • Adding English Wikipedia text or a parallel corpus could help with cross-lingual tasks and training
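
        If you want to reproduce the corpus preparation, here is a minimal sketch of merging the two downloads into one training text file; the filenames are placeholders for the OSCAR export and the WikiExtractor output, not the exact files used here:

        corpus_files = ["hi_dedup.txt", "hiwiki_text.txt"]  # placeholder names for the two downloads

        with open("hindi_corpus.txt", "w", encoding="utf-8") as out:
            for name in corpus_files:
                with open(name, encoding="utf-8") as f:
                    for line in f:
                        # drop the <doc ...> / </doc> wrapper lines that WikiExtractor emits
                        if line.startswith("<doc") or line.startswith("</doc"):
                            continue
                        out.write(line)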


        Vocabulary

        https://drive.google.com/file/d/1-6tXrii3tVxjkbrpSJE9MOG_HhbvP66V/view?usp=sharing
        Bonus notes:

        • Created with HuggingFace Tokenizers; you can increase the vocabulary size and re-train, but remember to change vocab_size in the ELECTRA config to match
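
        A minimal sketch of building such a vocabulary with HuggingFace Tokenizers; the corpus filename and vocab_size are example values, not the exact settings used for this model:

        from tokenizers import BertWordPieceTokenizer

        tokenizer = BertWordPieceTokenizer(lowercase=False, strip_accents=False)
        tokenizer.train(
            files=["hindi_corpus.txt"],  # combined corpus from the section above
            vocab_size=30000,            # must match vocab_size in the ELECTRA config
            min_frequency=2,
            special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
        )
        tokenizer.save_model("trainer")  # writes trainer/vocab.txt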


        Training

        Structure your files with the data dir (named “trainer” here) as follows:
        trainer
        - vocab.txt
        - pretrain_tfrecords
        -- (all .tfrecord... files)
        - models
        -- modelname
        --- checkpoint
        --- graph.pbtxt
        --- model.*

        The CoLab notebook gives examples of GPU vs. TPU setup; training hyperparameters are set in configure_pretraining.py
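
        For orientation, here is a rough sketch of the two steps as launched from Python, assuming the scripts and flags of the upstream google-research/electra repository; paths and hparams are examples, not the exact settings used for this model:

        import subprocess

        # 1) turn the raw text corpus into the .tfrecord files under trainer/pretrain_tfrecords
        subprocess.run([
            "python3", "build_pretraining_dataset.py",
            "--corpus-dir", "./corpus_txt",
            "--vocab-file", "./trainer/vocab.txt",
            "--output-dir", "./trainer/pretrain_tfrecords",
            "--max-seq-length", "128",
            "--num-processes", "4",
        ], check=True)

        # 2) start (or resume) pretraining; --model-name selects trainer/models/modelname
        subprocess.run([
            "python3", "run_pretraining.py",
            "--data-dir", "./trainer",
            "--model-name", "modelname",
            "--hparams", '{"model_size": "small", "vocab_size": 30000}',
        ], check=True)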


        Conversion

        Use this process to convert an in-progress or completed ELECTRA checkpoint to a Transformers-ready model:
        git clone https://github.com/huggingface/transformers
        python ./transformers/src/transformers/convert_electra_original_tf_checkpoint_to_pytorch.py \
          --tf_checkpoint_path=./models/checkpointdir \
          --config_file=config.json \
          --pytorch_dump_path=pytorch_model.bin \
          --discriminator_or_generator=discriminator
        Then, in Python, convert the PyTorch weights into a TensorFlow checkpoint (this writes tf_model.h5 into a directory named “tf”):

        from transformers import TFElectraForPreTraining
        model = TFElectraForPreTraining.from_pretrained("./dir_with_pytorch", from_pt=True)
        model.save_pretrained("tf")

        Once you have assembled one directory containing config.json, pytorch_model.bin, tf_model.h5, special_tokens_map.json, tokenizer_config.json, and vocab.txt at the same level, run:
        transformers-cli upload directory
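
        As a sanity check before (or after) uploading, the assembled directory should load as an ELECTRA discriminator; a minimal sketch, assuming the directory from the steps above:

        from transformers import ElectraForPreTraining, ElectraTokenizerFast

        model_dir = "./dir_with_pytorch"  # the assembled directory from the steps above
        tokenizer = ElectraTokenizerFast.from_pretrained(model_dir)
        model = ElectraForPreTraining.from_pretrained(model_dir)

        inputs = tokenizer("यह एक परीक्षण वाक्य है।", return_tensors="pt")
        outputs = model(**inputs)
        print(outputs.logits.shape)  # one replaced-token-detection logit per input token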
