

        Vision Transformer (base-sized model)

        Vision Transformer (ViT) model pre-trained on ImageNet-21k (14 million images, 21,843 classes) at resolution 224×224. It was introduced in the paper An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale by Dosovitskiy et al. and first released in this repository. The weights were converted from the timm repository by Ross Wightman, who had already converted them from JAX to PyTorch. Credits go to him.
        Disclaimer: The team releasing ViT did not write a model card for this model so this model card has been written by the Hugging Face team.


        Model description

        The Vision Transformer (ViT) is a transformer encoder model (BERT-like) pretrained on a large collection of images in a supervised fashion, namely ImageNet-21k, at a resolution of 224×224 pixels.
        Images are presented to the model as a sequence of fixed-size patches (resolution 16×16), which are linearly embedded. A [CLS] token is added to the beginning of the sequence so it can be used for classification tasks, and absolute position embeddings are added before the sequence is fed to the layers of the Transformer encoder.
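        As a sanity check on the arithmetic above: a 224×224 image cut into 16×16 patches yields (224/16)² = 196 patches, and prepending the [CLS] token gives a sequence of 197 tokens.

```python
# Token count for ViT at 224x224 input with 16x16 patches
# (a sketch of the arithmetic described above).
image_size = 224
patch_size = 16

patches_per_side = image_size // patch_size   # 14 patches per side
num_patches = patches_per_side ** 2           # 196 fixed-size patches
seq_len = num_patches + 1                     # +1 for the [CLS] token

print(num_patches, seq_len)  # 196 197
```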
        Note that this model does not provide any fine-tuned heads, as these were zeroed by Google researchers. However, the model does include the pre-trained pooler, which can be used for downstream tasks (such as image classification).
        By pre-training the model, it learns an inner representation of images that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled images for instance, you can train a standard classifier by placing a linear layer on top of the pre-trained encoder. One typically places a linear layer on top of the [CLS] token, as the last hidden state of this token can be seen as a representation of an entire image.
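        The classifier setup described above can be sketched with plain NumPy: take the [CLS] position (index 0) of the encoder's last hidden state and apply a linear layer. The weights and class count here are random placeholders for illustration, not the model's actual parameters.

```python
import numpy as np

# Hypothetical shapes for ViT-Base: batch 1, 197 tokens, hidden size 768.
rng = np.random.default_rng(0)
last_hidden_state = rng.standard_normal((1, 197, 768))

num_classes = 10                                 # placeholder label count
W = rng.standard_normal((768, num_classes)) * 0.01
b = np.zeros(num_classes)

# The [CLS] token's last hidden state serves as the image representation.
cls_embedding = last_hidden_state[:, 0, :]
logits = cls_embedding @ W + b
print(logits.shape)  # (1, 10)
```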


        Intended uses & limitations

        You can use the raw model for image classification. See the model hub to look for
        fine-tuned versions on a task that interests you.


        How to use

        Here is how to use this model in PyTorch:
        from transformers import ViTImageProcessor, ViTModel
        from PIL import Image
        import requests

        # Load an example image from a URL.
        url = 'https://res.www.futurefh.com/2023/05/20230526095402-647081ba87f76.jpg'
        image = Image.open(requests.get(url, stream=True).raw)

        # Load the pre-trained image processor and encoder.
        processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')
        model = ViTModel.from_pretrained('google/vit-base-patch16-224-in21k')

        # Preprocess the image and run it through the encoder.
        inputs = processor(images=image, return_tensors="pt")
        outputs = model(**inputs)
        last_hidden_states = outputs.last_hidden_state

        Here is how to use this model in JAX/Flax:
        from transformers import ViTImageProcessor, FlaxViTModel
        from PIL import Image
        import requests

        # Load an example image from a URL.
        url = 'https://res.www.futurefh.com/2023/05/20230526095402-647081ba87f76.jpg'
        image = Image.open(requests.get(url, stream=True).raw)

        # Load the pre-trained image processor and the Flax encoder.
        processor = ViTImageProcessor.from_pretrained('google/vit-base-patch16-224-in21k')
        model = FlaxViTModel.from_pretrained('google/vit-base-patch16-224-in21k')

        # Preprocess the image (NumPy tensors for Flax) and run the encoder.
        inputs = processor(images=image, return_tensors="np")
        outputs = model(**inputs)
        last_hidden_states = outputs.last_hidden_state


        Training data

        The ViT model was pretrained on ImageNet-21k, a dataset consisting of 14 million images and 21k classes.


        Training procedure


        Preprocessing

        The exact details of preprocessing of images during training/validation can be found here.
        Images are resized/rescaled to the same resolution (224×224) and normalized across the RGB channels with mean (0.5, 0.5, 0.5) and standard deviation (0.5, 0.5, 0.5).
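        With per-channel mean 0.5 and standard deviation 0.5, this normalization maps pixel values from [0, 1] to [-1, 1]. A minimal sketch of the transform:

```python
import numpy as np

# Sketch of the preprocessing described above: rescale 8-bit pixel values
# to [0, 1], then normalize with mean (0.5, 0.5, 0.5) and std (0.5, 0.5, 0.5),
# which maps the range to [-1, 1].
mean = np.array([0.5, 0.5, 0.5])
std = np.array([0.5, 0.5, 0.5])

pixels = np.array([[0, 128, 255]], dtype=np.float64)  # toy RGB pixel values
rescaled = pixels / 255.0
normalized = (rescaled - mean) / std

print(normalized)  # 0 maps to -1.0, 255 maps to +1.0
```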


        Pretraining

        The model was trained on TPUv3 hardware (8 cores). All model variants are trained with a batch size of 4096 and learning rate warmup of 10k steps. For ImageNet, the authors found it beneficial to additionally apply gradient clipping at global norm 1. Pre-training resolution is 224.
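        The 10k-step learning-rate warmup mentioned above can be sketched as a simple linear schedule. The base learning rate below is a placeholder for illustration, not a value taken from the paper.

```python
# Linear learning-rate warmup over 10,000 steps (a sketch; base_lr is a
# hypothetical peak value, not the paper's hyperparameter).
WARMUP_STEPS = 10_000
base_lr = 1e-3  # placeholder peak learning rate

def warmup_lr(step: int) -> float:
    """Scale the learning rate linearly from 0 to base_lr over warmup,
    then hold it constant."""
    return base_lr * min(1.0, step / WARMUP_STEPS)

print(warmup_lr(0), warmup_lr(5_000), warmup_lr(20_000))  # 0.0 0.0005 0.001
```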


        Evaluation results

        For evaluation results on several image classification benchmarks, we refer to tables 2 and 5 of the original paper. Note that for fine-tuning, the best results are obtained at a higher resolution (384×384), and that increasing the model size generally results in better performance.


        BibTeX entry and citation info

        @misc{wu2020visual,
        title={Visual Transformers: Token-based Image Representation and Processing for Computer Vision},
        author={Bichen Wu and Chenfeng Xu and Xiaoliang Dai and Alvin Wan and Peizhao Zhang and Zhicheng Yan and Masayoshi Tomizuka and Joseph Gonzalez and Kurt Keutzer and Peter Vajda},
        year={2020},
        eprint={2006.03677},
        archivePrefix={arXiv},
        primaryClass={cs.CV}
        }

        @inproceedings{deng2009imagenet,
        title={Imagenet: A large-scale hierarchical image database},
        author={Deng, Jia and Dong, Wei and Socher, Richard and Li, Li-Jia and Li, Kai and Fei-Fei, Li},
        booktitle={2009 IEEE conference on computer vision and pattern recognition},
        pages={248--255},
        year={2009},
        organization={IEEE}
        }
