
        deepset/gbert-base-germandpr-question_encoder


        Overview

        Language model: gbert-base-germandpr
        Language: German
        Training data: GermanDPR train set (~ 56MB)
        Eval data: GermanDPR test set (~ 6MB)
        Infrastructure: 4x V100 GPU
        Published: Apr 26th, 2021


        Details

        • We trained a dense passage retrieval (DPR) model with two gbert-base models as encoders for questions and passages.
        • The dataset is GermanDPR, a new German-language dataset that we hand-annotated and published online.
        • It comprises 9275 question/answer pairs in the training set and 1025 pairs in the test set.
          Each pair comes with one positive context and three hard negative contexts (see the data layout sketched below).
        • As the basis of the training data, we used our hand-annotated GermanQuAD dataset for positive samples and generated hard negative samples from the latest German Wikipedia dump (6GB of raw txt files).
        • The data dump was cleaned with tailored scripts, leading to 2.8 million indexed passages from German Wikipedia.

        See https://deepset.ai/germanquad for more details and dataset download.
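        For illustration, a single training example in the DPR JSON layout that GermanDPR follows looks roughly like the sketch below. The field names mirror the public DPR format and are an assumption here; verify them against the downloaded files.

        # One GermanDPR training example (illustrative values, assumed field names).
        example = {
            "question": "Wie hoch ist die Zugspitze?",
            "answers": ["2962 Meter"],
            "positive_ctxs": [
                {"title": "Zugspitze", "text": "Die Zugspitze ist mit 2962 Metern ..."}
            ],
            "hard_negative_ctxs": [
                # three hard negatives per question in GermanDPR
                {"title": "Watzmann", "text": "Der Watzmann ist ..."},
            ],
        }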


        Hyperparameters

        batch_size = 40
        n_epochs = 20
        num_training_steps = 4640
        num_warmup_steps = 460
        max_seq_len = 32 tokens for question encoder and 300 tokens for passage encoder
        learning_rate = 1e-6
        lr_schedule = LinearWarmup
        embeds_dropout_prob = 0.1
        num_hard_negatives = 2
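        As a rough sketch of how these hyperparameters map onto a training run with Haystack's DensePassageRetriever: the snippet below is a sketch only, assuming Haystack 1.x and DPR-format JSON files on disk; the file paths are hypothetical, and the exact train() signature depends on your Haystack version.

        from haystack.nodes import DensePassageRetriever

        retriever = DensePassageRetriever(
            document_store=document_store,            # an initialized DocumentStore
            query_embedding_model="deepset/gbert-base",
            passage_embedding_model="deepset/gbert-base",
            max_seq_len_query=32,                     # max_seq_len for the question encoder
            max_seq_len_passage=300,                  # max_seq_len for the passage encoder
        )
        retriever.train(
            data_dir="data/germandpr",                # hypothetical path
            train_filename="GermanDPR_train.json",    # hypothetical file names
            dev_filename="GermanDPR_test.json",
            n_epochs=20,
            batch_size=40,
            num_hard_negatives=2,
            learning_rate=1e-6,
            num_warmup_steps=460,
            save_dir="saved_models/gbert-base-germandpr",
        )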


        Performance

        During training, we monitored the in-batch average rank and the loss, and we evaluated different batch sizes, numbers of epochs, and numbers of hard negatives on a dev set split from the train set.
        The dev split contained 1030 question/answer pairs.
        Even without thorough hyperparameter tuning, we observed stable learning: multiple restarts with different seeds produced very similar results.
        Note that the in-batch average rank is influenced by the batch size and the number of hard negatives: a smaller number of hard negatives makes the task easier.
        After fixing the hyperparameters, we trained the model on the full GermanDPR train set.
        We further evaluated the retrieval performance of the trained model on the full German Wikipedia, using the GermanDPR test set as labels. To this end, we converted the GermanDPR test set to SQuAD format. The DPR model drastically outperforms the BM25 baseline with regard to recall@k, as sketched in the computation below.
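        Here, recall@k is the fraction of test questions for which at least one gold positive passage appears among the top-k retrieved passages. A minimal, self-contained computation; the function and variable names are illustrative, not part of any library:

        # recall@k over ranked retrieval results (illustrative helper).
        def recall_at_k(ranked_ids, gold_ids, k):
            """ranked_ids: per-question ranked lists of passage ids;
            gold_ids: per-question sets of gold passage ids."""
            hits = sum(
                1 for ranked, gold in zip(ranked_ids, gold_ids)
                if any(pid in gold for pid in ranked[:k])
            )
            return hits / len(ranked_ids)

        # Example: 2 questions, top-3 retrieval -> recall@3 = 0.5
        print(recall_at_k([[7, 2, 9], [4, 5, 1]], [{2}, {8}], k=3))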

        [Figure: recall@k on German Wikipedia, DPR model vs. BM25 baseline]


        Usage


        In Haystack

        You can load the model in Haystack as a retriever for doing QA at scale:

        from haystack.nodes import DensePassageRetriever  # import path varies by Haystack version

        retriever = DensePassageRetriever(
            document_store=document_store,
            query_embedding_model="deepset/gbert-base-germandpr-question_encoder",
            passage_embedding_model="deepset/gbert-base-germandpr-ctx_encoder",
        )
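        Once documents are indexed in the document store, the retriever can be queried directly. A minimal sketch, assuming Haystack's standard retriever interface; the Document result fields vary slightly across Haystack versions:

        # Retrieve the top-k passages for a German question.
        candidates = retriever.retrieve(query="Wie hoch ist die Zugspitze?", top_k=10)
        for doc in candidates:
            # doc.content holds the passage text in Haystack 1.x (older versions used doc.text)
            print(doc.score, doc.content[:100])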


        Authors

        • Timo Möller: timo.moeller [at] deepset.ai
        • Julian Risch: julian.risch [at] deepset.ai
        • Malte Pietsch: malte.pietsch [at] deepset.ai


        About us

        We bring NLP to the industry via open source!
        Our focus: Industry specific language models & large scale QA systems.
        Some of our work:

        • German BERT (aka “bert-base-german-cased”)
        • GermanQuAD and GermanDPR datasets and models (aka “gelectra-base-germanquad”, “gbert-base-germandpr”)
        • FARM
        • Haystack

        Get in touch:
        Twitter | LinkedIn | Website
        By the way: we’re hiring!
