Switch Transformers by Google Brain
Tags: Commercial AI, Productivity, Deep Learning, Natural Language Processing
Switch Transformers is an architecture for scaling language models to trillion-parameter sizes, using simple and efficient sparsity to speed up the training and pre-training of large-scale language models.
Services: Productivity, Deep Learning, Natural Language Processing, Commercial AI.
About Switch Transformers by Google Brain
In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead selects different parameters for each incoming example. The result is a sparsely-activated model — with outrageous numbers of parameters — but a constant computational cost. However, despite several notable successes of MoE, widespread adoption has been hindered by complexity, communication costs and training instability — we address these with the Switch Transformer. We simplify the MoE routing algorithm and design intuitive improved models with reduced communication and computational costs. Our proposed training techniques help wrangle the instabilities and we show large sparse models may be trained, for the first time, with lower precision (bfloat16) formats. We design models based off T5-Base and T5-Large to obtain up to 7x increases in pre-training speed with the same computational resources. These improvements extend into multilingual settings where we measure gains over the mT5-Base version across all 101 languages. Finally, we advance the current scale of language models by pre-training up to trillion parameter models on the "Colossal Clean Crawled Corpus" and achieve a 4x speedup over the T5-XXL model.
What is "Switch Transformers by Google Brain"?
The paper introduces Switch Transformers, a model that scales to trillion-parameter sizes through simple and efficient sparsity. By selecting different parameters for each incoming example, Switch Transformers produce a sparsely activated model with an enormous number of parameters but a constant computational cost per token.
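To make the per-example parameter selection concrete, below is a minimal sketch of top-1 ("switch") routing in JAX. The names and shapes (switch_ffn, router_w, expert_w_in, expert_w_out) are illustrative assumptions, not the official implementation; a production version dispatches tokens to experts under a capacity limit rather than gathering per-token weight copies as done here for clarity.

```python
# A minimal, illustrative sketch of top-1 ("switch") routing in JAX.
# Names and shapes are assumptions for exposition, not the official code.
import jax
import jax.numpy as jnp

def switch_ffn(tokens, router_w, expert_w_in, expert_w_out):
    """Route each token to exactly one expert feed-forward network.

    tokens:       [num_tokens, d_model]
    router_w:     [d_model, num_experts]        router weights
    expert_w_in:  [num_experts, d_model, d_ff]  per-expert FFN weights
    expert_w_out: [num_experts, d_ff, d_model]
    """
    # Router: a probability distribution over experts for every token.
    logits = tokens @ router_w                       # [num_tokens, num_experts]
    probs = jax.nn.softmax(logits, axis=-1)
    expert_idx = jnp.argmax(probs, axis=-1)          # top-1 expert per token
    gate = jnp.take_along_axis(probs, expert_idx[:, None], axis=-1)

    # Each token passes through only its selected expert, so per-token FLOPs
    # stay roughly constant no matter how many experts (parameters) exist.
    # (A real implementation dispatches tokens to experts under a capacity
    # limit; gathering per-token weight copies here is just for clarity.)
    w_in = expert_w_in[expert_idx]                   # [num_tokens, d_model, d_ff]
    w_out = expert_w_out[expert_idx]                 # [num_tokens, d_ff, d_model]
    h = jax.nn.relu(jnp.einsum('td,tdf->tf', tokens, w_in))
    out = jnp.einsum('tf,tfd->td', h, w_out)

    # The expert output is scaled by the router probability, keeping the
    # routing decision differentiable.
    return gate * out
```

Because only one expert is active per token, adding experts grows the parameter count without growing the per-token computation, which is the property the abstract describes as "outrageous numbers of parameters" at "a constant computational cost".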
What features does "Switch Transformers by Google Brain" offer?
1. Simplified MoE routing: Switch Transformers simplify the Mixture of Experts (MoE) routing algorithm, reducing complexity and communication cost.
2. Lower communication and computational costs: the design introduces intuitive, improved models that cut both communication and computation.
3. Improved training techniques: the paper proposes training techniques that tame training instability and shows that large sparse models can be trained in lower-precision (bfloat16) formats (see the sketch after this list).
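As a rough illustration of the lower-precision training in item 3, the paper describes computing the router in full float32 precision while the rest of the model runs in bfloat16. The sketch below shows that idea in JAX; the function name and argument names are hypothetical.

```python
# Hedged sketch: router computed in float32, model activations in bfloat16.
import jax
import jax.numpy as jnp

def router_probs(tokens_bf16, router_w_bf16):
    """Hypothetical helper: cast the router inputs up to float32, take the
    softmax there for numerical stability, then cast the probabilities back
    to the model's bfloat16 working precision."""
    logits = jnp.dot(tokens_bf16.astype(jnp.float32),
                     router_w_bf16.astype(jnp.float32))
    probs = jax.nn.softmax(logits, axis=-1)   # exponentiation done in float32
    return probs.astype(jnp.bfloat16)
```

Keeping only the router in float32 preserves most of the memory and communication savings of bfloat16 while avoiding the instability that the softmax's exponentiation can cause at low precision.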
Use cases:
Switch Transformers can be applied to a wide range of deep learning tasks, particularly natural language processing and machine translation. They can be used to train large-scale language models, speed up pre-training, and achieve better results in multilingual settings.
How to use "Switch Transformers by Google Brain"?
Switch Transformers can be used via the code and datasets referenced in the paper. Users can train and pre-train the models to suit their own needs and apply them to a variety of deep learning tasks.
Official link for Switch Transformers by Google Brain
https://arxiv.org/abs/2101.03961