Daniel  Hughes


Tools and Modeling Code for The MASSIVE Dataset with Python


👉 Join the MMNLU-22 slack workspace here 👈


  • 12 Aug: We welcome submissions until Sep 2nd for the MMNLU-22 Organizers’ Choice Award, as well as direct paper submissions until Sep 7th. The Organizers’ Choice Award is based primarily on our assessment of the promise of an approach, not only on the evaluation scores. To be eligible, please (a) make a submission on eval.ai to either MMNLU-22 task and (b) send a brief (<1 page) writeup of your approach to mmnlu-22@amazon.com describing the following:
    • Your architecture,
    • Any changes to training data, use of non-public data, or use of public data,
    • How dev data was used and what hyperparameter tuning was performed,
    • Model input and output formats,
    • What tools and libraries you used, and
    • Any additional training techniques you used, such as knowledge distillation.
  • 12 Aug: We are pleased to declare the HIT-SCIR team as the winner of the MMNLU-22 Competition Full Dataset Task. Congratulations to Bo Zheng, Zhuoyang Li, Fuxuan Wei, Qiguang Chen, Libo Qin, and Wanxiang Che from the Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology. The team has been invited to speak at the MMNLU-22 workshop on Dec 7th, where you can learn more about their approach.
  • 12 Aug: We are pleased to declare the FabT5 team as the winner of the MMNLU-22 Competition Zero-Shot Task. Congratulations to Massimo Nicosia and Francesco Piccinno from Google. They have been invited to speak at the MMNLU-22 workshop on Dec 7th, where you can learn more about their approach.

MASSIVE is a parallel dataset of > 1M utterances across 51 languages with annotations for the Natural Language Understanding tasks of intent prediction and slot annotation. Utterances span 60 intents and include 55 slot types. MASSIVE was created by localizing the SLURP dataset, composed of general Intelligent Voice Assistant single-shot interactions.

Accessing and Processing the Data

The dataset can be downloaded here.

The unlabeled MMNLU-22 eval data can be downloaded here.

$ curl https://amazon-massive-nlu-dataset.s3.amazonaws.com/amazon-massive-dataset-1.0.tar.gz --output amazon-massive-dataset-1.0.tar.gz
$ tar -xzvf amazon-massive-dataset-1.0.tar.gz
$ tree 1.0
└── data
    ├── af-ZA.jsonl
    ├── am-ET.jsonl
    ├── ar-SA.jsonl

The dataset is organized into files of JSON lines. Each locale (according to ISO-639-1 and ISO-3166 conventions) has its own file containing all dataset partitions. An example JSON line for de-DE has the following:

{
  "id": "0",
  "locale": "de-DE",
  "partition": "test",
  "scenario": "alarm",
  "intent": "alarm_set",
  "utt": "weck mich diese woche um fünf uhr morgens auf",
  "annot_utt": "weck mich [date : diese woche] um [time : fünf uhr morgens] auf",
  "worker_id": "8",
  "slot_method": [
    {
      "slot": "time",
      "method": "translation"
    },
    {
      "slot": "date",
      "method": "translation"
    }
  ],
  "judgments": [
    {
      "worker_id": "32",
      "intent_score": 1,
      "slots_score": 0,
      "grammar_score": 4,
      "spelling_score": 2,
      "language_identification": "target"
    },
    {
      "worker_id": "8",
      "intent_score": 1,
      "slots_score": 1,
      "grammar_score": 4,
      "spelling_score": 2,
      "language_identification": "target"
    },
    {
      "worker_id": "28",
      "intent_score": 1,
      "slots_score": 1,
      "grammar_score": 4,
      "spelling_score": 2,
      "language_identification": "target"
    }
  ]
}

id: maps to the original ID in the SLURP collection, so each localized utterance can be traced back to the SLURP en-US utterance that served as the basis for the localization.

locale: the language and country code according to ISO-639-1 and ISO-3166.

partition: either train, dev, or test, according to the original split in SLURP.

scenario: the general domain, aka "scenario" in SLURP terminology, of an utterance.

intent: the specific intent of an utterance within a domain, formatted as {scenario}_{intent}.

utt: the raw utterance text without annotations.

annot_utt: the text from utt with slot annotations formatted as [{label} : {entity}].

worker_id: The obfuscated worker ID from MTurk of the worker completing the localization of the utterance. Worker IDs are specific to a locale and do not map across locales.

slot_method: for each slot in the utterance, whether that slot was a translation (i.e., same expression just in the target language), localization (i.e., not the same expression but a different expression was chosen more suitable to the phrase in that locale), or unchanged (i.e., the original en-US slot value was copied over without modification).

judgments: Each judgment collected for the localized utterance has 6 keys. worker_id is the obfuscated worker ID from MTurk of the worker completing the judgment. Worker IDs are specific to a locale and do not map across locales, but are consistent across the localization tasks and the judgment tasks, e.g., judgment worker ID 32 in the example above may appear as the localization worker ID for the localization of a different de-DE utterance, in which case it would be the same worker.

intent_score : "Does the sentence match the intent?"
  0: No
  1: Yes
  2: It is a reasonable interpretation of the goal

slots_score : "Do all these terms match the categories in square brackets?"
  0: No
  1: Yes
  2: There are no words in square brackets (utterance without a slot)

grammar_score : "Read the sentence out loud. Ignore any spelling, punctuation, or capitalization errors. Does it sound natural?"
  0: Completely unnatural (nonsensical, cannot be understood at all)
  1: Severe errors (the meaning cannot be understood and doesn't sound natural in your language)
  2: Some errors (the meaning can be understood but it doesn't sound natural in your language)
  3: Good enough (easily understood and sounds almost natural in your language)
  4: Perfect (sounds natural in your language)

spelling_score : "Are all words spelled correctly? Ignore any spelling variances that may be due to differences in dialect. Missing spaces should be marked as a spelling error."
  0: There are more than 2 spelling errors
  1: There are 1-2 spelling errors
  2: All words are spelled correctly

language_identification : "The following sentence contains words in the following languages (check all that apply)"
  1: target
  2: english
  3: other
  4: target & english
  5: target & other
  6: english & other
  7: target & english & other
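The record layout described above can be read with the standard library alone. The following is a minimal sketch, not code from the MASSIVE repository: the helper names (parse_annot_utt, read_partitions, mean_score) are hypothetical, and the two abbreviated records stand in for lines of the de-DE and en-US files.

```python
import io
import json
import re
from collections import defaultdict
from statistics import mean

# Matches one "[label : entity]" span inside annot_utt.
SLOT_PATTERN = re.compile(r"\[\s*([^\]:]+?)\s*:\s*([^\]]+?)\s*\]")


def parse_annot_utt(annot_utt):
    """Return a list of (slot_label, entity_text) pairs found in annot_utt."""
    return [(m.group(1), m.group(2)) for m in SLOT_PATTERN.finditer(annot_utt)]


def read_partitions(lines):
    """Group JSON-lines records by the value of their "partition" key."""
    partitions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Some records carry no slot_method or judgments; default to empty lists.
        record.setdefault("slot_method", [])
        record.setdefault("judgments", [])
        partitions[record["partition"]].append(record)
    return partitions


def mean_score(record, key):
    """Average one judgment score (e.g. "slots_score"); None without judgments."""
    scores = [judgment[key] for judgment in record["judgments"]]
    return mean(scores) if scores else None


# Abbreviated stand-ins for real records (most keys omitted for brevity).
de_record = {
    "id": "0", "locale": "de-DE", "partition": "test", "intent": "alarm_set",
    "annot_utt": "weck mich [date : diese woche] um [time : fünf uhr morgens] auf",
    "judgments": [
        {"worker_id": "32", "slots_score": 0},
        {"worker_id": "8", "slots_score": 1},
        {"worker_id": "28", "slots_score": 1},
    ],
}
en_record = {
    "id": "0", "locale": "en-US", "partition": "test", "intent": "alarm_set",
    "annot_utt": "wake me up at [time : five am] [date : this week]",
}

# In practice this would be an open file handle on one of the locale files.
sample_file = io.StringIO("\n".join(json.dumps(r) for r in (de_record, en_record)))

partitions = read_partitions(sample_file)
for record in partitions["test"]:
    slots = parse_annot_utt(record["annot_utt"])
    print(record["locale"], slots, mean_score(record, "slots_score"))
```

Defaulting the missing keys to empty lists keeps downstream code free of per-locale special cases.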

Note that the en-US JSON lines will not have the slot_method or judgments keys, as there was no localization performed. The worker_id key in the en-US file corresponds to the worker ID from SLURP.

{
  "id": "0",
  "locale": "en-US",
  "partition": "test",
  "scenario": "alarm",
  "intent": "alarm_set",
  "utt": "wake me up at five am this week",
  "annot_utt": "wake me up at [time : five am] [date : this week]",
  "worker_id": "1"
}

Preparing the Data in datasets format (Apache Arrow)

The data can be prepared in the datasets Apache Arrow format using our script:

python scripts/create_hf_dataset.py -d /path/to/jsonl/files -o /output/path/and/prefix

If you already have number-to-intent and number-to-slot mappings, those can be used when creating the datasets-style dataset:

python scripts/create_hf_dataset.py \
    -d /path/to/jsonl/files \
    -o /output/path/and/prefix \
    --intent-map /path/to/intentmap \
    --slot-map /path/to/slotmap
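The on-disk format expected by --intent-map and --slot-map is not described here, so the following sketch only illustrates the idea of a number-to-label mapping. It assumes a JSON object keyed by stringified integers. The helper name build_number_to_label and the extra intent labels are hypothetical, and the files written by create_hf_dataset.py may be laid out differently.

```python
import json


def build_number_to_label(labels):
    """Map consecutive integers (as strings, for JSON) to sorted unique labels."""
    return {str(i): label for i, label in enumerate(sorted(set(labels)))}


# Illustrative intent labels, as they might be collected from the "intent" key.
intents = ["alarm_set", "alarm_query", "alarm_set", "weather_query"]
intent_map = build_number_to_label(intents)

# Serialize and restore to confirm the mapping survives a JSON round trip.
restored = json.loads(json.dumps(intent_map))
print(restored)
```

Reusing one mapping across runs matters because a model's output indices only have meaning relative to the mapping it was trained with.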

Training an Encoder Model

We have included intent classification and slot-filling models based on the pretrained XLM-R Base or mT5 encoders coupled with JointBERT-style classification heads. Training can be conducted using the Trainer from transformers.

We have provided some helper functions in massive.utils.training_utils, described below:

  • create_compute_metrics creates the compute_metrics function, which is used to calculate evaluation metrics.
  • init_model initializes one of our provided models.
  • init_tokenizer initializes one of the pretrained tokenizers.
  • prepare_collator prepares a collator with user-specified max length and padding strategy.
  • prepare_train_dev_datasets loads the datasets prepared as described above.
  • output_predictions outputs the final predictions when running test.

Training is configured in a yaml file. Examples are given in examples/. A given yaml file fully describes its respective experiment.

Once an experiment configuration file is created, training can be performed using our provided training script. We have also provided a conda environment configuration file with the necessary dependencies, which you may choose to use.

conda env create -f conda_env.yml
conda activate massive

Set the PYTHONPATH if needed:

export PYTHONPATH=${PYTHONPATH}:/PATH/TO/massive/src/

Then run training:

python scripts/train.py -c YOUR/CONFIG/FILE.yml

Distributed training can be run using torchrun for PyTorch v1.10 or later, or torch.distributed.launch for earlier PyTorch versions. For example, with PyTorch v1.10 or later:

torchrun --nproc_per_node=8 scripts/train.py -c YOUR/CONFIG/FILE.yml

Or on older PyTorch versions:

python -m torch.distributed.launch --nproc_per_node=8 scripts/train.py -c YOUR/CONFIG/FILE.yml

Seq2Seq Model Training

Sequence-to-sequence (Seq2Seq) model training is performed using the MASSIVESeq2SeqTrainer class. This class inherits from Seq2SeqTrainer from transformers. The primary difference with this class is that autoregressive generation is performed during validation, which is turned on using the predict_with_generate training argument. Seq2Seq models use teacher forcing during training.
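The difference between teacher forcing and autoregressive generation can be shown with a toy example. This sketch is independent of transformers and of MASSIVESeq2SeqTrainer: the token ids, the BOS/EOS values, and the scripted toy_next_token "model" are all assumptions made purely for illustration.

```python
# Toy vocabulary: token ids are plain integers.
BOS, EOS = 0, 1


def shift_right(labels, bos=BOS):
    """Teacher forcing: the decoder input is the gold sequence shifted right."""
    return [bos] + labels[:-1]


def toy_next_token(prefix):
    """Hypothetical stand-in for one decoder step: emits 5, 6, 7, then EOS."""
    script = [5, 6, 7, EOS]
    # The number of tokens produced so far (excluding BOS) selects the next one.
    return script[min(len(prefix) - 1, len(script) - 1)]


def greedy_generate(next_token_fn, max_new_tokens=10):
    """Autoregressive decoding: each step consumes the model's own output."""
    prefix = [BOS]
    for _ in range(max_new_tokens):
        token = next_token_fn(prefix)
        prefix.append(token)
        if token == EOS:
            break
    return prefix[1:]  # drop BOS


gold = [5, 6, 7, EOS]
decoder_inputs = shift_right(gold)           # what the decoder sees in training
generated = greedy_generate(toy_next_token)  # what validation would score
print(decoder_inputs, generated)
```

During training every decoder position is conditioned on gold tokens, whereas during generation an early mistake propagates to later steps, which is why validating with generation gives a more realistic estimate of test-time behavior.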

For text-to-text modeling, we have included the following functions in massive.utils.training_utils:

  • convert_input_to_t2t
  • convert_intents_slots_to_t2t
  • convert_t2t_batch_to_intents_slots
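To make the text-to-text idea concrete, here is a minimal sketch of one possible encoding. It does not reproduce the repository's functions: the names to_t2t and from_t2t are hypothetical, and the format itself is an assumption (a task prefix "Annotate: " on the input, and a target made of one slot label per whitespace token, using "Other" for unlabeled tokens, followed by the intent).

```python
import re

# Matches one "[label : entity]" span inside annot_utt.
SLOT_PATTERN = re.compile(r"\[\s*([^\]:]+?)\s*:\s*([^\]]+?)\s*\]")


def token_labels(annot_utt, outside="Other"):
    """Return (tokens, labels) with one label per whitespace-separated token."""
    tokens, labels = [], []
    pos = 0
    for match in SLOT_PATTERN.finditer(annot_utt):
        # Tokens before the bracketed span get the outside label.
        for tok in annot_utt[pos:match.start()].split():
            tokens.append(tok)
            labels.append(outside)
        # Every token inside the span gets the span's slot label.
        for tok in match.group(2).split():
            tokens.append(tok)
            labels.append(match.group(1))
        pos = match.end()
    for tok in annot_utt[pos:].split():
        tokens.append(tok)
        labels.append(outside)
    return tokens, labels


def to_t2t(annot_utt, intent, prefix="Annotate: "):
    """Build (source, target) strings for a text-to-text model."""
    tokens, labels = token_labels(annot_utt)
    return prefix + " ".join(tokens), " ".join(labels + [intent])


def from_t2t(target):
    """Invert a target string: the last item is the intent, the rest are labels."""
    *labels, intent = target.split()
    return labels, intent


source, target = to_t2t(
    "wake me up at [time : five am] [date : this week]", "alarm_set"
)
print(source)
print(target)
print(from_t2t(target))
```

Keeping the intent as the final item of the target lets a single decoder pass produce both predictions, and the conversion stays invertible as long as labels contain no spaces.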

For example, mT5 Base can be trained on an 8-GPU instance as follows:

For PyTorch v1.10 or later:

torchrun --nproc_per_node=8 scripts/train.py -c examples/mt5_base_t2t_20220411.yml 2>&1 | tee /PATH/TO/LOG/FILE

Or on older PyTorch versions:

python -m torch.distributed.launch --nproc_per_node=8 scripts/train.py -c examples/mt5_base_t2t_20220411.yml 2>&1 | tee /PATH/TO/LOG/FILE

Performing Inference on the Test Set

Test inference requires a test block in the configuration. See examples/xlmr_base_test_20220411.yml for an example. Test inference, including evaluation and output of all predictions, can be executed using the scripts/test.py script. For example:

For PyTorch v1.10 or later:

torchrun --nproc_per_node=8 scripts/test.py -c examples/xlmr_base_test_20220411.yml 2>&1 | tee /PATH/TO/LOG/FILE

Or on older PyTorch versions:

python -m torch.distributed.launch --nproc_per_node=8 scripts/test.py -c examples/xlmr_base_test_20220411.yml 2>&1 | tee /PATH/TO/LOG/FILE

Be sure to include a test.predictions_file in the config to output the predictions.

For official test results, please upload your predictions to the eval.ai leaderboard.

MMNLU-22 Eval

To create predictions for the Massively Multilingual NLU 2022 competition on eval.ai, you can follow these example steps using the model you've already trained. An example config is given at examples/mt5_base_t2t_mmnlu_20220720.yml.

Download and untar:

curl https://amazon-massive-nlu-dataset.s3.amazonaws.com/amazon-massive-dataset-heldout-MMNLU-1.0.tar.gz --output amazon-massive-dataset-heldout-MMNLU-1.0.tar.gz

tar -xzvf amazon-massive-dataset-heldout-MMNLU-1.0.tar.gz

Create the huggingface version of the dataset using the mapping files used when training the model.

python scripts/create_hf_dataset.py \
    -d /PATH/TO/mmnlu-eval/data \
    -o /PATH/TO/hf-mmnlu-eval \
    --intent-map /PATH/TO/massive_1.0_hf_format/massive_1.0.intents \
    --slot-map /PATH/TO/massive_1.0_hf_format/massive_1.0.slots

Create a config file similar to examples/mt5_base_t2t_mmnlu_20220720.yml.

Kick off inference from within your environment with dependencies loaded, etc:

For PyTorch v1.10 or later:

torchrun --nproc_per_node=8 scripts/predict.py -c PATH/TO/YOUR/CONFIG.yml 2>&1 | tee PATH/TO/LOG

Or on older PyTorch versions:

python -m torch.distributed.launch --nproc_per_node=8 scripts/predict.py -c PATH/TO/YOUR/CONFIG.yml 2>&1 | tee PATH/TO/LOG

Upload results to the MMNLU-22 Phase on eval.ai.

Hyperparameter Tuning

Hyperparameter tuning can be performed using the Trainer from transformers. As with training, we combine all configurations into a single yaml file. An example is given here: examples/xlmr_base_hptuning_20220411.yml.

Once a configuration file has been made, the hyperparameter tuning run can be initiated using our provided scripts/run_hpo.py script. Relative to train.py, this script uses an additional function called prepare_hp_search_args, which converts the hyperparameter search space provided in the configuration into an instantiated ray search space.
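The idea of turning a declarative search space into something that can be sampled is sketched below. This is not prepare_hp_search_args: the spec format, the function name build_search_space, and the hyperparameter ranges are assumptions, and Python's random module is swapped in for Ray's search-space objects so that the example runs without extra dependencies.

```python
import math
import random


def build_search_space(spec):
    """Turn a declarative spec into a dict of zero-argument sampler callables.

    Hypothetical spec format: {"name": {"type": ..., ...}}.
    """
    builders = {
        "uniform": lambda p: (lambda: random.uniform(p["min"], p["max"])),
        "loguniform": lambda p: (
            lambda: math.exp(random.uniform(math.log(p["min"]), math.log(p["max"])))
        ),
        "choice": lambda p: (lambda: random.choice(p["values"])),
    }
    space = {}
    for name, params in spec.items():
        kind = params["type"]
        if kind not in builders:
            raise ValueError(f"Unsupported search-space type: {kind!r}")
        space[name] = builders[kind](params)
    return space


# Illustrative search space, as it might be parsed from a yaml configuration.
spec = {
    "learning_rate": {"type": "loguniform", "min": 1e-6, "max": 1e-3},
    "warmup_ratio": {"type": "uniform", "min": 0.0, "max": 0.2},
    "per_device_train_batch_size": {"type": "choice", "values": [8, 16, 32]},
}

random.seed(0)  # fixed seed so repeated runs draw the same trial
space = build_search_space(spec)
trial = {name: sample() for name, sample in space.items()}
print(trial)
```

Keeping the search space in the configuration file means an experiment remains fully described by a single yaml file, as with training.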




We ask that you cite both our MASSIVE paper and the paper for SLURP, given that MASSIVE used English data from SLURP as seed data.

MASSIVE paper:

      title={MASSIVE: A 1M-Example Multilingual Natural Language Understanding Dataset with 51 Typologically-Diverse Languages}, 
      author={Jack FitzGerald and Christopher Hench and Charith Peris and Scott Mackie and Kay Rottmann and Ana Sanchez and Aaron Nash and Liam Urbach and Vishesh Kakarala and Richa Singh and Swetha Ranganath and Laurie Crist and Misha Britan and Wouter Leeuwis and Gokhan Tur and Prem Natarajan},

SLURP paper:

    title = "{SLURP}: A Spoken Language Understanding Resource Package",
    author = "Bastianelli, Emanuele  and
      Vanzo, Andrea  and
      Swietojanski, Pawel  and
      Rieser, Verena",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2020.emnlp-main.588",
    doi = "10.18653/v1/2020.emnlp-main.588",
    pages = "7252--7262",
    abstract = "Spoken Language Understanding infers semantic meaning directly from audio data, and thus promises to reduce error propagation and misunderstandings in end-user applications. However, publicly available SLU resources are limited. In this paper, we release SLURP, a new SLU package containing the following: (1) A new challenging dataset in English spanning 18 domains, which is substantially bigger and linguistically more diverse than existing datasets; (2) Competitive baselines based on state-of-the-art NLU and ASR systems; (3) A new transparent metric for entity labelling which enables a detailed error analysis for identifying potential areas of improvement. SLURP is available at https://github.com/pswietojanski/slurp."

Old News

  • 30 Jul: Based on compelling feedback, we have updated our rules as follows: Contestants for the top-scoring model awards must submit their predictions on the evaluation set by the original deadline of Aug 8th. Contestants for the "organizers' choice award" can submit their predictions until Sep 2nd. The organizers' choice award will be based primarily on the promise of the approach, but we will also consider evaluation scores.
  • 29 Jul 2022: (Outdated -- see above) We have extended the deadline for MMNLU-22 evaluation to Sep 2nd. Additionally, besides the winners of the “full dataset” and “zero-shot” categories, we plan to select one team (“organizer’s choice award”) to present their findings at the workshop. This choice will be made based on the promise of the approach, not just on model evaluation scores.
  • 25 Jul 2022: The unlabeled evaluation set for the Massively Multilingual NLU 2022 Competition has been released. Please note that (1) the eval data is unlabeled, meaning that the keys scenario, intent, and annot_utt are not present, as well as any judgment data, and (2) the intent and slot maps from your previous training run should be used when creating a new huggingface-style dataset using create_hf_dataset.py. More details can be found in the section with heading "MMNLU-22 Eval".
  • 7 Jul 2022: Get ready! The unlabeled evaluation data for the Massively Multilingual NLU 2022 Competition will be released on July 25th. Scores can be submitted to the MMNLU-22 leaderboard until Aug 8th. Winners will be invited to speak at the workshop, colocated with EMNLP.
  • 30 Jun 2022: (CFP) Paper submissions for Massively Multilingual NLU 2022, a workshop at EMNLP 2022, are now being accepted. MASSIVE is the shared task for the workshop.
  • 22 Jun 2022: We updated the evaluation code to fix bugs identified by @yichaopku and @bozheng-hit (Issues 13 and 21, PRs 14 and 22). Please pull commit 3932705 or later to use the remedied evaluation code. The baseline results on the leaderboard have been updated, as well as the preprint paper on arXiv.
  • 20 Apr 2022: Launch and release of the MASSIVE dataset, this repo, the MASSIVE paper, the leaderboard, and the Massively Multilingual NLU 2022 workshop and competition.

Download Details:

Author: alexa
Source Code: https://github.com/alexa/massive

License: View license

