DistilBert for Dense Passage Retrieval trained with Balanced Topic Aware Sampling (TAS-B)

We provide a DistilBERT-based retrieval model trained with Balanced Topic Aware Sampling (TAS-B) on MSMARCO-Passage. We call the architecture, a dual encoder followed by dot-product scoring, BERT_Dot.
This instance was trained with a batch size of 256. It can be used to re-rank a candidate set or directly for dense retrieval with a vector index. The architecture is a 6-layer DistilBERT without additions or modifications (training changes only the weights). To obtain a query or passage representation, we pool the CLS vector. We use the same BERT layers for both query and passage encoding, which yields better results and lowers memory requirements.
If you want to know more about our efficient batch composition procedure and dual supervision for dense retrieval training (training can be done on a single consumer GPU in 48 hours), check out our paper: https://arxiv.org/abs/2104.06967
For more information and a minimal usage example please visit: https://github.com/sebastian-hofstaetter/tas-balanced-dense-retrieval
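
The CLS pooling and dot-product scoring described above can be sketched as follows. This is a minimal illustration and not the authors' code: random arrays stand in for the encoder's last hidden states, and the names `cls_pool` and `rerank` are hypothetical helpers introduced here. The only model-specific assumption is the DistilBERT hidden size of 768.

```python
import numpy as np

HIDDEN = 768  # hidden size of DistilBERT


def cls_pool(last_hidden_state):
    """Return the first ([CLS]) token vector of every sequence.

    last_hidden_state has shape (batch, seq_len, hidden).
    """
    return last_hidden_state[:, 0, :]


def rerank(query_vec, passage_vecs, passage_ids):
    """Score passages by dot product with the query and sort descending."""
    scores = passage_vecs @ query_vec            # shape: (num_passages,)
    order = np.argsort(-scores, kind="stable")   # highest score first
    return [(passage_ids[i], float(scores[i])) for i in order]


# Stand-in encoder outputs (assumption: a real run would take these
# from the shared DistilBERT encoder for the query and the passages).
rng = np.random.default_rng(0)
query_states = rng.normal(size=(1, 8, HIDDEN))     # one query, 8 tokens
passage_states = rng.normal(size=(3, 32, HIDDEN))  # three passages, 32 tokens

query_vec = cls_pool(query_states)[0]
passage_vecs = cls_pool(passage_states)
ranking = rerank(query_vec, passage_vecs, ["p1", "p2", "p3"])
```

Because query and passage vectors are independent, passage vectors can be computed once and stored in a vector index; only the query has to be encoded at search time.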

Effectiveness on MSMARCO Passage & TREC-DL’19

We trained our model on the standard MSMARCO training triples (the “small” set with 400K queries), re-sampled with our TAS-B method. As teachers we used the BERT_CAT pairwise scores and, for in-batch-negative signals, the ColBERT model, published here: https://github.com/sebastian-hofstaetter/neural-ranking-kd

MSMARCO-DEV (7K)

Model	MRR@10	NDCG@10	Recall@1K
BM25	.194	.241	.857
TAS-B BERT_Dot (Retrieval)	.347	.410	.978
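
For readers unfamiliar with the MRR@10 column, the metric can be sketched as below. This is a simplified illustration under stated assumptions, not the official MSMARCO evaluation script: the function name `mrr_at_k` and the toy data are hypothetical, and the mean is taken over the queries present in `rankings`.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant passage within the top k.

    rankings: dict mapping query id -> list of passage ids, best first
    relevant: dict mapping query id -> set of relevant passage ids
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked_ids in rankings.items():
        rel = relevant.get(qid, set())
        for rank, pid in enumerate(ranked_ids[:k], start=1):
            if pid in rel:
                total += 1.0 / rank  # only the first hit counts
                break
    return total / len(rankings)


# Toy data: q1 hits at rank 2, q2 at rank 1, q3 has no hit.
toy_rankings = {"q1": ["a", "b", "c"], "q2": ["x", "y", "z"], "q3": ["m", "n"]}
toy_relevant = {"q1": {"b"}, "q2": {"x"}, "q3": {"other"}}
toy_mrr = mrr_at_k(toy_rankings, toy_relevant)  # (1/2 + 1 + 0) / 3
```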
