

        Releasing Hindi ELECTRA model

        This is a first attempt at a Hindi language model trained with Google Research’s ELECTRA.
As of 2022, I recommend Google's MuRIL model, trained on English, Hindi, and other major Indian languages, in both their native scripts and Latin transliteration: https://huggingface.co/google/muril-base-cased and https://huggingface.co/google/muril-large-cased
For causal language models, I would suggest https://huggingface.co/sberbank-ai/mGPT, though this is a large model.
Tokenization and training Colab
I originally used a modified ELECTRA codebase for finetuning, but now use SimpleTransformers.
Blog post that greatly influenced me: https://huggingface.co/blog/how-to-train


        Example Notebooks

This small model has comparable results to Multilingual BERT on BBC Hindi news classification
and on Hindi movie reviews / sentiment analysis (using SimpleTransformers; a minimal fine-tuning sketch appears at the end of this section).
You can get higher accuracy using ktrain by adjusting the learning rate (you also need to change model_type in config.json; this is an open issue with ktrain): https://colab.research.google.com/drive/1mSeeSfVSOT7e-dVhPlmSsQRvpn6xC05w?usp=sharing
        Question-answering on MLQA dataset: https://colab.research.google.com/drive/1i6fidh2tItf_-IDkljMuaIGmEU6HT2Ar#scrollTo=IcFoAHgKCUiQ
A larger model (Hindi-TPU-Electra), using the ELECTRA base size, outperforms both models on Hindi movie reviews / sentiment analysis, but
does not perform as well on the BBC news classification task.
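
A minimal sketch of the SimpleTransformers fine-tuning setup (the CSV file names, column layout, and hyperparameters here are illustrative, not taken from the original notebooks):

import pandas as pd
from simpletransformers.classification import ClassificationModel

# Hypothetical data files with "text" and "labels" columns (e.g. 0 = negative, 1 = positive)
train_df = pd.read_csv("hindi_reviews_train.csv")
eval_df = pd.read_csv("hindi_reviews_eval.csv")

# model_type="electra" with this repo's weights; set use_cuda=False if no GPU is available
model = ClassificationModel(
    "electra",
    "monsoon-nlp/hindi-bert",
    num_labels=2,
    args={"num_train_epochs": 3, "overwrite_output_dir": True},
)

model.train_model(train_df)
result, model_outputs, wrong_predictions = model.eval_model(eval_df)
print(result)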


        Corpus

        Download: https://drive.google.com/drive/folders/1SXzisKq33wuqrwbfp428xeu_hDxXVUUu?usp=sharing
        The corpus is two files:

• Hindi CommonCrawl, deduplicated by OSCAR: https://traces1.inria.fr/oscar/
• the latest Hindi Wikipedia dump ( https://dumps.wikimedia.org/hiwiki/ ), converted to plain text with WikiExtractor (see the invocation sketch below)
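
For reference, a typical WikiExtractor invocation on the Hindi dump looks roughly like this (the output directory name is illustrative):

python -m wikiextractor.WikiExtractor hiwiki-latest-pages-articles.xml.bz2 -o hiwiki_text

The extracted article text lands in plain files under hiwiki_text/, which can then be concatenated into a single corpus file.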

        Bonus notes:

• Adding English Wikipedia text or a parallel corpus could help with cross-lingual tasks and training


        Vocabulary

        https://drive.google.com/file/d/1-6tXrii3tVxjkbrpSJE9MOG_HhbvP66V/view?usp=sharing
        Bonus notes:

• Created with HuggingFace Tokenizers; you can increase the vocabulary size and re-train; remember to change vocab_size in the ELECTRA config to match (a training sketch follows)
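
A minimal sketch of building such a WordPiece vocabulary with HuggingFace Tokenizers (corpus file names and vocab_size are placeholders):

from tokenizers import BertWordPieceTokenizer

# Train a WordPiece vocabulary on the corpus files (paths are illustrative)
tokenizer = BertWordPieceTokenizer()
tokenizer.train(
    files=["hi_dedup.txt", "hiwiki.txt"],
    vocab_size=30000,  # if you change this, update vocab_size in the ELECTRA config to match
)
tokenizer.save_model(".")  # writes vocab.txt to the current directory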


        Training

Structure your files as follows, with the data directory named "trainer" here:
        trainer
        - vocab.txt
        - pretrain_tfrecords
        -- (all .tfrecord... files)
        - models
        -- modelname
        --- checkpoint
        --- graph.pbtxt
        --- model.*

The Colab notebook gives examples of GPU vs. TPU setup and of running configure_pretraining.py.
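
With that layout in place, pretraining is launched through ELECTRA's run_pretraining.py; a typical invocation (the hyperparameters shown are illustrative) looks like:

python run_pretraining.py \
    --data-dir trainer \
    --model-name modelname \
    --hparams '{"model_size": "small", "vocab_size": 30000}'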


        Conversion

        Use this process to convert an in-progress or completed ELECTRA checkpoint to a Transformers-ready model:
git clone https://github.com/huggingface/transformers
python ./transformers/src/transformers/convert_electra_original_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path=./models/checkpointdir \
    --config_file=config.json \
    --pytorch_dump_path=pytorch_model.bin \
    --discriminator_or_generator=discriminator
python

from transformers import TFElectraForPreTraining
# from_pt=True loads the PyTorch weights into a TensorFlow model; save_pretrained writes tf_model.h5
model = TFElectraForPreTraining.from_pretrained("./dir_with_pytorch", from_pt=True)
model.save_pretrained("tf")

        Once you have formed one directory with config.json, pytorch_model.bin, tf_model.h5, special_tokens_map.json, tokenizer_config.json, and vocab.txt on the same level, run:
        transformers-cli upload directory
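
Before uploading, it is worth sanity-checking that the assembled directory loads cleanly; a minimal check (directory name as in the conversion step above):

from transformers import ElectraTokenizer, ElectraForPreTraining

# Both the tokenizer and the model should load from the assembled directory without errors
tokenizer = ElectraTokenizer.from_pretrained("./dir_with_pytorch")
model = ElectraForPreTraining.from_pretrained("./dir_with_pytorch")
print(model.config.model_type)  # should print "electra"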
