Focal Transformer | Official Implementation of Focal Transformer

Focal Transformer

This is the official implementation of our Focal Transformer -- "Focal Self-attention for Local-Global Interactions in Vision Transformers", by Jianwei Yang, Chunyuan Li, Pengchuan Zhang, Xiyang Dai, Bin Xiao, Lu Yuan and Jianfeng Gao.

Introduction

Our Focal Transformer introduces a new self-attention mechanism, called focal self-attention, for vision transformers. In this mechanism, each token attends to its closest surrounding tokens at fine granularity and to tokens far away at coarse granularity, and can thus capture both short- and long-range visual dependencies efficiently and effectively.
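
To make the idea concrete, below is a minimal, self-contained PyTorch sketch of focal attention, not the repository's actual implementation: queries inside each window attend jointly to fine-grained tokens from their own window and to coarse-grained tokens pooled from the whole feature map. The function name, window size, pooling factor, and the omission of learned query/key/value projections are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def toy_focal_attention(x, window=4, pool=4):
    """Toy focal attention over a feature map x of shape (B, H, W, C).

    Assumes H and W are divisible by both `window` and `pool`.
    Returns a tensor of the same shape.
    """
    B, H, W, C = x.shape
    scale = C ** -0.5

    # Fine level: keys/values are the tokens inside each non-overlapping window.
    fine = x.reshape(B, H // window, window, W // window, window, C)
    fine = fine.permute(0, 1, 3, 2, 4, 5).reshape(B, -1, window * window, C)   # (B, nWin, win*win, C)

    # Coarse level: pooled summaries of the whole map, shared by every window
    # (a stand-in for the paper's coarser focal levels).
    coarse = F.avg_pool2d(x.permute(0, 3, 1, 2), kernel_size=pool)             # (B, C, H/p, W/p)
    coarse = coarse.flatten(2).transpose(1, 2)                                  # (B, nPool, C)
    coarse = coarse.unsqueeze(1).expand(-1, fine.shape[1], -1, -1)             # (B, nWin, nPool, C)

    q = fine                                                                    # queries: fine tokens per window
    kv = torch.cat([fine, coarse], dim=2)                                       # fine + coarse keys/values

    attn = (q @ kv.transpose(-2, -1)) * scale                                   # (B, nWin, win*win, win*win + nPool)
    attn = attn.softmax(dim=-1)
    out = attn @ kv                                                             # (B, nWin, win*win, C)

    # Scatter windows back onto the (H, W) grid.
    out = out.reshape(B, H // window, W // window, window, window, C)
    out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
    return out

if __name__ == "__main__":
    x = torch.randn(2, 16, 16, 96)
    print(toy_focal_attention(x).shape)  # torch.Size([2, 16, 16, 96])
```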

With Focal Transformers, we achieve superior performance over state-of-the-art vision Transformers on a range of public benchmarks. In particular, our Focal Transformer models with a moderate size of 51.1M parameters and a larger size of 89.8M parameters reach 83.6% and 84.0% top-1 accuracy, respectively, on ImageNet classification at 224x224 resolution. Using Focal Transformers as backbones, we obtain consistent and substantial improvements over the current state of the art across 6 different object detection methods trained with the standard 1x and 3x schedules. Our largest Focal Transformer yields 58.7/58.9 box mAP and 50.9/51.3 mask mAP on COCO mini-val/test-dev, and 55.4 mIoU on ADE20K semantic segmentation.

Benchmarking

Image Classification on ImageNet-1K

| Model | Pretrain | Use Conv | Resolution | acc@1 | acc@5 | #params | FLOPs | Checkpoint | Config |
|---|---|---|---|---|---|---|---|---|---|
| Focal-T | IN-1K | No | 224 | 82.2 | 95.9 | 28.9M | 4.9G | download | yaml |
| Focal-T | IN-1K | Yes | 224 | 82.7 | 96.1 | 30.8M | 4.9G | download | yaml |
| Focal-S | IN-1K | No | 224 | 83.6 | 96.2 | 51.1M | 9.4G | download | yaml |
| Focal-B | IN-1K | No | 224 | 84.0 | 96.5 | 89.8M | 16.4G | download | yaml |

Object Detection and Instance Segmentation on COCO

Mask R-CNN

| Backbone | Pretrain | Lr Schd | #params | FLOPs | box mAP | mask mAP |
|---|---|---|---|---|---|---|
| Focal-T | ImageNet-1K | 1x | 49M | 291G | 44.8 | 41.0 |
| Focal-T | ImageNet-1K | 3x | 49M | 291G | 47.2 | 42.7 |
| Focal-S | ImageNet-1K | 1x | 71M | 401G | 47.4 | 42.8 |
| Focal-S | ImageNet-1K | 3x | 71M | 401G | 48.8 | 43.8 |
| Focal-B | ImageNet-1K | 1x | 110M | 533G | 47.8 | 43.2 |
| Focal-B | ImageNet-1K | 3x | 110M | 533G | 49.0 | 43.7 |

RetinaNet

| Backbone | Pretrain | Lr Schd | #params | FLOPs | box mAP |
|---|---|---|---|---|---|
| Focal-T | ImageNet-1K | 1x | 39M | 265G | 43.7 |
| Focal-T | ImageNet-1K | 3x | 39M | 265G | 45.5 |
| Focal-S | ImageNet-1K | 1x | 62M | 367G | 45.6 |
| Focal-S | ImageNet-1K | 3x | 62M | 367G | 47.3 |
| Focal-B | ImageNet-1K | 1x | 101M | 514G | 46.3 |
| Focal-B | ImageNet-1K | 3x | 101M | 514G | 46.9 |

Other detection methods

| Backbone | Pretrain | Method | Lr Schd | #params | FLOPs | box mAP |
|---|---|---|---|---|---|---|
| Focal-T | ImageNet-1K | Cascade Mask R-CNN | 3x | 87M | 770G | 51.5 |
| Focal-T | ImageNet-1K | ATSS | 3x | 37M | 239G | 49.5 |
| Focal-T | ImageNet-1K | RepPointsV2 | 3x | 45M | 491G | 51.2 |
| Focal-T | ImageNet-1K | Sparse R-CNN | 3x | 111M | 196G | 49.0 |

Semantic Segmentation on ADE20K

| Backbone | Pretrain | Method | Resolution | Iters | #params | FLOPs | mIoU | mIoU (MS) |
|---|---|---|---|---|---|---|---|---|
| Focal-T | ImageNet-1K | UPerNet | 512x512 | 160k | 62M | 998G | 45.8 | 47.0 |
| Focal-S | ImageNet-1K | UPerNet | 512x512 | 160k | 85M | 1130G | 48.0 | 50.0 |
| Focal-B | ImageNet-1K | UPerNet | 512x512 | 160k | 126M | 1354G | 49.0 | 50.5 |
| Focal-L | ImageNet-22K | UPerNet | 640x640 | 160k | 240M | 3376G | 54.0 | 55.4 |

Getting Started
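
As a first sanity check, here is a minimal sketch for inspecting a downloaded classification checkpoint with plain PyTorch. The file name below is a placeholder for whichever checkpoint you download from the table above, and the nesting of weights under a "model" key is an assumption borrowed from Swin-style checkpoints; adjust it to whatever the actual file contains.

```python
import torch

# Hypothetical local path; substitute the checkpoint file you downloaded.
ckpt = torch.load("focal_tiny_224.pth", map_location="cpu")

# Many Swin-style checkpoints store the weights under a "model" key;
# fall back to the whole object if that key is absent.
state_dict = ckpt.get("model", ckpt) if isinstance(ckpt, dict) else ckpt

# Print a few parameter names and shapes to confirm the download is intact.
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))
```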

Citation

If you find this repo useful for your project, please consider citing it with the following BibTeX entry:

@misc{yang2021focal,
    title={Focal Self-attention for Local-Global Interactions in Vision Transformers}, 
    author={Jianwei Yang and Chunyuan Li and Pengchuan Zhang and Xiyang Dai and Bin Xiao and Lu Yuan and Jianfeng Gao},
    year={2021},
    eprint={2107.00641},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgement

Our codebase is built on top of Swin-Transformer. We thank the authors for the nicely organized code!

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Download Details:

Author: microsoft

Source Code: https://github.com/microsoft/Focal-Transformer 

A Deep Dive Into the Transformer Architecture – The Transformer Models

Transformers for Natural Language Processing

It may seem like a long time since the world of natural language processing (NLP) was transformed by the seminal “Attention is All You Need” paper by Vaswani et al., but in fact that was less than 3 years ago. The relative recency of the introduction of transformer architectures and the ubiquity with which they have upended language tasks speaks to the rapid rate of progress in machine learning and artificial intelligence. There’s no better time than now to gain a deep understanding of the inner workings of transformer architectures, especially with transformer models making big inroads into diverse new applications like predicting chemical reactions and reinforcement learning.

Whether you’re an old hand or you’re paying attention to transformer-style architectures for the first time, this article should offer something for you. First, we’ll dive deep into the fundamental concepts used to build the original 2017 Transformer. Then we’ll touch on some of the developments implemented in subsequent transformer models. Where appropriate we’ll point out some limitations and how modern models inheriting ideas from the original Transformer are trying to overcome various shortcomings or improve performance.

What Do Transformers Do?

Transformers are the current state-of-the-art type of model for dealing with sequences. Perhaps the most prominent application of these models is in text processing tasks, and the most prominent of these is machine translation. In fact, transformers and their conceptual progeny have infiltrated just about every benchmark leaderboard in natural language processing (NLP), from question answering to grammar correction. In many ways transformer architectures are undergoing a surge in development similar to what we saw with convolutional neural networks following the 2012 ImageNet competition, for better and for worse.

