Metadata-Version: 2.1
Name: optimum
Version: 1.18.1
Summary: Optimum Library is an extension of the Hugging Face Transformers library, providing a framework to integrate third-party libraries from Hardware Partners and interface with their specific functionality.
Home-page: https://github.com/huggingface/optimum
Author: HuggingFace Inc. Special Ops Team
Author-email: hardware@huggingface.co
License: Apache
Keywords: transformers,quantization,pruning,optimization,training,inference,onnx,onnx runtime,intel,habana,graphcore,neural compressor,ipu,hpu
Classifier: Development Status :: 5 - Production/Stable
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.7.0
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: coloredlogs
Requires-Dist: sympy
Requires-Dist: transformers[sentencepiece]<4.40.0,>=4.26.0
Requires-Dist: torch>=1.11
Requires-Dist: packaging
Requires-Dist: numpy
Requires-Dist: huggingface-hub>=0.8.0
Requires-Dist: datasets
Provides-Extra: amd
Requires-Dist: optimum-amd; extra == "amd"
Provides-Extra: benchmark
Requires-Dist: optuna; extra == "benchmark"
Requires-Dist: tqdm; extra == "benchmark"
Requires-Dist: scikit-learn; extra == "benchmark"
Requires-Dist: seqeval; extra == "benchmark"
Requires-Dist: torchvision; extra == "benchmark"
Requires-Dist: evaluate>=0.2.0; extra == "benchmark"
Provides-Extra: dev
Requires-Dist: accelerate; extra == "dev"
Requires-Dist: pytest<=8.0.0; extra == "dev"
Requires-Dist: requests; extra == "dev"
Requires-Dist: parameterized; extra == "dev"
Requires-Dist: pytest-xdist; extra == "dev"
Requires-Dist: Pillow; extra == "dev"
Requires-Dist: sacremoses; extra == "dev"
Requires-Dist: torchvision; extra == "dev"
Requires-Dist: diffusers>=0.17.0; extra == "dev"
Requires-Dist: torchaudio; extra == "dev"
Requires-Dist: einops; extra == "dev"
Requires-Dist: invisible-watermark; extra == "dev"
Requires-Dist: timm; extra == "dev"
Requires-Dist: scikit-learn; extra == "dev"
Requires-Dist: rjieba; extra == "dev"
Requires-Dist: black~=23.1; extra == "dev"
Requires-Dist: ruff==0.1.5; extra == "dev"
Provides-Extra: diffusers
Requires-Dist: diffusers; extra == "diffusers"
Provides-Extra: doc-build
Requires-Dist: accelerate; extra == "doc-build"
Provides-Extra: exporters
Requires-Dist: onnx; extra == "exporters"
Requires-Dist: onnxruntime; extra == "exporters"
Requires-Dist: timm; extra == "exporters"
Provides-Extra: exporters-gpu
Requires-Dist: onnx; extra == "exporters-gpu"
Requires-Dist: onnxruntime-gpu; extra == "exporters-gpu"
Requires-Dist: timm; extra == "exporters-gpu"
Provides-Extra: exporters-tf
Requires-Dist: tensorflow<=2.12.1,>=2.4; extra == "exporters-tf"
Requires-Dist: tf2onnx; extra == "exporters-tf"
Requires-Dist: onnx; extra == "exporters-tf"
Requires-Dist: onnxruntime; extra == "exporters-tf"
Requires-Dist: timm; extra == "exporters-tf"
Requires-Dist: h5py; extra == "exporters-tf"
Requires-Dist: numpy<1.24.0; extra == "exporters-tf"
Requires-Dist: transformers[sentencepiece]<4.38.0,>=4.26.0; extra == "exporters-tf"
Provides-Extra: furiosa
Requires-Dist: optimum-furiosa; extra == "furiosa"
Provides-Extra: graphcore
Requires-Dist: optimum-graphcore; extra == "graphcore"
Provides-Extra: habana
Requires-Dist: optimum-habana; extra == "habana"
Requires-Dist: transformers<4.38.0,>=4.37.0; extra == "habana"
Provides-Extra: intel
Requires-Dist: optimum-intel>=1.15.0; extra == "intel"
Provides-Extra: neural-compressor
Requires-Dist: optimum-intel[neural-compressor]>=1.15.0; extra == "neural-compressor"
Provides-Extra: neuron
Requires-Dist: optimum-neuron[neuron]>=0.0.20; extra == "neuron"
Requires-Dist: transformers==4.36.2; extra == "neuron"
Provides-Extra: neuronx
Requires-Dist: optimum-neuron[neuronx]>=0.0.20; extra == "neuronx"
Requires-Dist: transformers==4.36.2; extra == "neuronx"
Provides-Extra: nncf
Requires-Dist: optimum-intel[nncf]>=1.15.0; extra == "nncf"
Provides-Extra: onnxruntime
Requires-Dist: onnx; extra == "onnxruntime"
Requires-Dist: onnxruntime>=1.11.0; extra == "onnxruntime"
Requires-Dist: datasets>=1.2.1; extra == "onnxruntime"
Requires-Dist: evaluate; extra == "onnxruntime"
Requires-Dist: protobuf>=3.20.1; extra == "onnxruntime"
Provides-Extra: onnxruntime-gpu
Requires-Dist: onnx; extra == "onnxruntime-gpu"
Requires-Dist: onnxruntime-gpu>=1.11.0; extra == "onnxruntime-gpu"
Requires-Dist: datasets>=1.2.1; extra == "onnxruntime-gpu"
Requires-Dist: evaluate; extra == "onnxruntime-gpu"
Requires-Dist: protobuf>=3.20.1; extra == "onnxruntime-gpu"
Requires-Dist: accelerate; extra == "onnxruntime-gpu"
Provides-Extra: openvino
Requires-Dist: optimum-intel[openvino]>=1.15.0; extra == "openvino"
Provides-Extra: quality
Requires-Dist: black~=23.1; extra == "quality"
Requires-Dist: ruff==0.1.5; extra == "quality"
Provides-Extra: tests
Requires-Dist: accelerate; extra == "tests"
Requires-Dist: pytest<=8.0.0; extra == "tests"
Requires-Dist: requests; extra == "tests"
Requires-Dist: parameterized; extra == "tests"
Requires-Dist: pytest-xdist; extra == "tests"
Requires-Dist: Pillow; extra == "tests"
Requires-Dist: sacremoses; extra == "tests"
Requires-Dist: torchvision; extra == "tests"
Requires-Dist: diffusers>=0.17.0; extra == "tests"
Requires-Dist: torchaudio; extra == "tests"
Requires-Dist: einops; extra == "tests"
Requires-Dist: invisible-watermark; extra == "tests"
Requires-Dist: timm; extra == "tests"
Requires-Dist: scikit-learn; extra == "tests"
Requires-Dist: rjieba; extra == "tests"

[![ONNX Runtime](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml/badge.svg)](https://github.com/huggingface/optimum/actions/workflows/test_onnxruntime.yml)

# Hugging Face Optimum

🤗 Optimum is an extension of 🤗 Transformers and Diffusers, providing a set of optimization tools enabling maximum efficiency to train and run models on targeted hardware, while keeping things easy to use.

## Installation

🤗 Optimum can be installed using `pip` as follows:

```bash
python -m pip install optimum
```

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

| Accelerator                                                                                                            | Installation                                      |
|:-----------------------------------------------------------------------------------------------------------------------|:--------------------------------------------------|
| [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/overview)                                                                           | `pip install --upgrade-strategy eager optimum[onnxruntime]`       |
| [Intel Neural Compressor](https://huggingface.co/docs/optimum/intel/index)       | `pip install --upgrade-strategy eager optimum[neural-compressor]`|
| [OpenVINO](https://huggingface.co/docs/optimum/intel/index)                                                                 | `pip install --upgrade-strategy eager optimum[openvino,nncf]`    |
| [AMD Instinct GPUs and Ryzen AI NPU](https://huggingface.co/docs/optimum/amd/index)                     | `pip install --upgrade-strategy eager optimum[amd]`              |
| [Habana Gaudi Processor (HPU)](https://huggingface.co/docs/optimum/habana/index)                                                            | `pip install --upgrade-strategy eager optimum[habana]`           |
| [FuriosaAI](https://huggingface.co/docs/optimum/furiosa/index)                                                                                   | `pip install --upgrade-strategy eager optimum[furiosa]`          |

The `--upgrade-strategy eager` option is needed to ensure the different packages are upgraded to the latest possible version.

To install from source:

```bash
python -m pip install git+https://github.com/huggingface/optimum.git
```

For the accelerator-specific features, prefix the Git URL in the above command with `optimum[accelerator_type]@`:

```bash
python -m pip install optimum[onnxruntime]@git+https://github.com/huggingface/optimum.git
```
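
After installing one or more extras, it can be handy to confirm which backends are actually importable in the current environment. The snippet below is a minimal sketch using only the standard library; the mapping from extra name to top-level module name (`BACKEND_MODULES`) and the helper `available_backends` are assumptions made for illustration and are not part of 🤗 Optimum.

```python
from importlib.util import find_spec

# Assumed mapping from an Optimum extra to a top-level module it installs.
# Adjust it to the extras you care about.
BACKEND_MODULES = {
    "onnxruntime": "onnxruntime",
    "openvino": "openvino",
    "neural-compressor": "neural_compressor",
}


def available_backends(modules=BACKEND_MODULES):
    """Return the sorted extras whose marker module can be located (nothing is imported)."""
    return sorted(
        extra for extra, module in modules.items() if find_spec(module) is not None
    )


print("Importable backends:", available_backends())
```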

## Accelerated Inference

🤗 Optimum provides multiple tools to export and run optimized models on various ecosystems:

- [ONNX](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model) / [ONNX Runtime](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/models)
- TensorFlow Lite
- [OpenVINO](https://huggingface.co/docs/optimum/intel/inference)
- Habana first-gen Gaudi / Gaudi2, more details [here](https://huggingface.co/docs/optimum/main/en/habana/usage_guides/accelerate_inference)

The [export](https://huggingface.co/docs/optimum/exporters/overview) and optimizations can be performed either programmatically or from the command line.
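
As an illustration of the programmatic route, the sketch below wraps `main_export` from `optimum.exporters.onnx`. It assumes `optimum[exporters]` is installed; the wrapper name `export_to_onnx` is hypothetical, and the import is done lazily so the function can be defined even when the extra is missing.

```python
def export_to_onnx(model_id, output_dir, task="auto"):
    """Export a Hub model (or local checkpoint) to ONNX files in output_dir."""
    # Lazy import: requires `pip install optimum[exporters]`.
    from optimum.exporters.onnx import main_export

    # task="auto" asks the exporter to infer the task from the checkpoint.
    main_export(model_name_or_path=model_id, output=output_dir, task=task)
```

A call such as `export_to_onnx("deepset/roberta-base-squad2", "roberta_base_qa_onnx")` would download the checkpoint and write the exported files to the given directory.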

### Features summary

| Features                           | [ONNX Runtime](https://huggingface.co/docs/optimum/main/en/onnxruntime/overview)| [Neural Compressor](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc)| [OpenVINO](https://huggingface.co/docs/optimum/main/en/intel/inference)| [TensorFlow Lite](https://huggingface.co/docs/optimum/main/en/exporters/tflite/overview)|
|:----------------------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
| Graph optimization                 | :heavy_check_mark: | N/A                | :heavy_check_mark: | N/A                |
| Post-training dynamic quantization | :heavy_check_mark: | :heavy_check_mark: | N/A                | :heavy_check_mark: |
| Post-training static quantization  | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: |
| Quantization Aware Training (QAT)  | N/A                | :heavy_check_mark: | :heavy_check_mark: | N/A                |
| FP16 (half precision)              | :heavy_check_mark: | N/A                | :heavy_check_mark: | :heavy_check_mark: |
| Pruning                            | N/A                | :heavy_check_mark: | :heavy_check_mark: | N/A                |
| Knowledge Distillation             | N/A                | :heavy_check_mark: | :heavy_check_mark: | N/A                |


### OpenVINO

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade-strategy eager optimum[openvino,nncf]
```

It is possible to export 🤗 Transformers and Diffusers models to the OpenVINO format easily:

```bash
optimum-cli export openvino --model distilbert-base-uncased-finetuned-sst-2-english distilbert_sst2_ov
```

If you add `--int8`, the weights will be quantized to INT8. Static quantization can also be applied to the activations using [NNCF](https://github.com/openvinotoolkit/nncf); see the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov) for more information.
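
To give an intuition for what weight quantization to INT8 means numerically, here is a generic, pure-Python sketch of symmetric per-tensor quantization. It is only an illustration of the arithmetic under simplifying assumptions; it is not the scheme NNCF or OpenVINO actually applies, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Avoid a zero scale when every weight is zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.27]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized, scale, restored)
```

Each restored value differs from the original by at most about half a quantization step (`scale / 2`), which is the precision traded for the smaller storage size.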

To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class. To load a PyTorch checkpoint and convert it to the OpenVINO format on-the-fly, you can set `export=True` when loading your model.

```diff
- from transformers import AutoModelForSequenceClassification
+ from optimum.intel import OVModelForSequenceClassification
  from transformers import AutoTokenizer, pipeline

  model_id = "distilbert-base-uncased-finetuned-sst-2-english"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForSequenceClassification.from_pretrained(model_id)
+ model = OVModelForSequenceClassification.from_pretrained("distilbert_sst2_ov")

  classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
  results = classifier("He's a dreadful magician.")
```
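
The on-the-fly conversion with `export=True` mentioned above could look like the following sketch. It assumes `optimum[openvino]` is installed; the helper name `load_and_convert` and its `save_dir` parameter are hypothetical, and the imports are lazy so the function can be defined without the extra.

```python
def load_and_convert(model_id, save_dir=None):
    """Load a PyTorch checkpoint, convert it to OpenVINO, and optionally save it."""
    # Lazy imports: require `pip install optimum[openvino]`.
    from optimum.intel import OVModelForSequenceClassification
    from transformers import AutoTokenizer

    # export=True converts the PyTorch checkpoint to the OpenVINO format at load time.
    model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    if save_dir is not None:
        # Saving avoids repeating the conversion on the next load.
        model.save_pretrained(save_dir)
        tokenizer.save_pretrained(save_dir)
    return model, tokenizer
```

For instance, `load_and_convert("distilbert-base-uncased-finetuned-sst-2-english", "distilbert_sst2_ov")` would produce a directory that can later be loaded without `export=True`.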

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/inference) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/openvino).

### Neural Compressor

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade-strategy eager optimum[neural-compressor]
```

Dynamic quantization can be applied on your model:

```bash
optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert
```
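
A rough Python counterpart of the command above is sketched below using `INCQuantizer` from `optimum.intel` and `PostTrainingQuantConfig` from `neural_compressor`. It assumes `optimum[neural-compressor]` is installed; the wrapper name `quantize_dynamic_inc` is hypothetical, and the CLI may apply additional defaults that this sketch does not reproduce.

```python
def quantize_dynamic_inc(model_id, save_dir):
    """Apply post-training dynamic quantization and save the result to save_dir."""
    # Lazy imports: require `pip install optimum[neural-compressor]`.
    from neural_compressor.config import PostTrainingQuantConfig
    from optimum.intel import INCQuantizer
    from transformers import AutoModelForQuestionAnswering

    model = AutoModelForQuestionAnswering.from_pretrained(model_id)

    # Dynamic quantization needs no calibration dataset.
    quantization_config = PostTrainingQuantConfig(approach="dynamic")
    quantizer = INCQuantizer.from_pretrained(model)
    quantizer.quantize(
        quantization_config=quantization_config,
        save_directory=save_dir,
    )
```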

To load a model quantized with Intel Neural Compressor, whether stored locally or hosted on the 🤗 Hub:

```python
from optimum.intel import INCModelForSequenceClassification

model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
model = INCModelForSequenceClassification.from_pretrained(model_id)
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/intel/optimization_inc) and in the [examples](https://github.com/huggingface/optimum-intel/tree/main/examples/neural_compressor).

### ONNX + ONNX Runtime

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install optimum[exporters,onnxruntime]
```

It is possible to export 🤗 Transformers and Diffusers models to the [ONNX](https://onnx.ai/) format and perform graph optimization as well as quantization easily:

```bash
optimum-cli export onnx -m deepset/roberta-base-squad2 --optimize O2 roberta_base_qa_onnx
```

The model can then be quantized using `onnxruntime`:

```bash
optimum-cli onnxruntime quantize \
  --avx512 \
  --onnx_model roberta_base_qa_onnx \
  -o quantized_roberta_base_qa_onnx
```

These commands will export `deepset/roberta-base-squad2` and perform [O2 graph optimization](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization#optimization-configuration) on the exported model, and finally quantize it with the [avx512 configuration](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.AutoQuantizationConfig.avx512).
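
The quantization step can also be sketched in Python with `ORTQuantizer` and `AutoQuantizationConfig.avx512`. This assumes `optimum[onnxruntime]` is installed and that the export directory contains a single ONNX file; the wrapper name `quantize_avx512` is hypothetical, and the chosen arguments (`is_static=False`, `per_channel=False`) are an assumption about what a dynamic, per-tensor setup looks like rather than a guaranteed match for the CLI defaults.

```python
def quantize_avx512(onnx_dir, save_dir):
    """Dynamically quantize an exported ONNX model for AVX512-capable CPUs."""
    # Lazy imports: require `pip install optimum[onnxruntime]`.
    from optimum.onnxruntime import ORTQuantizer
    from optimum.onnxruntime.configuration import AutoQuantizationConfig

    # is_static=False selects dynamic quantization, so no calibration data is needed.
    qconfig = AutoQuantizationConfig.avx512(is_static=False, per_channel=False)

    quantizer = ORTQuantizer.from_pretrained(onnx_dir)
    quantizer.quantize(save_dir=save_dir, quantization_config=qconfig)
```

Calling `quantize_avx512("roberta_base_qa_onnx", "quantized_roberta_base_qa_onnx")` would mirror the directory names used in the commands above.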

For more information on the ONNX export, please check the [documentation](https://huggingface.co/docs/optimum/exporters/onnx/usage_guides/export_a_model).

#### Run the exported model using ONNX Runtime

Once the model is exported to the ONNX format, we provide Python classes that let you run the exported ONNX model seamlessly, using [ONNX Runtime](https://onnxruntime.ai/) as the backend:

```diff
- from transformers import AutoModelForQuestionAnswering
+ from optimum.onnxruntime import ORTModelForQuestionAnswering
  from transformers import AutoTokenizer, pipeline

  model_id = "deepset/roberta-base-squad2"
  tokenizer = AutoTokenizer.from_pretrained(model_id)
- model = AutoModelForQuestionAnswering.from_pretrained(model_id)
+ model = ORTModelForQuestionAnswering.from_pretrained("roberta_base_qa_onnx")
  qa_pipe = pipeline("question-answering", model=model, tokenizer=tokenizer)
  question = "What's Optimum?"
  context = "Optimum is an awesome library everyone should use!"
  results = qa_pipe(question=question, context=context)
```

For more details on running ONNX models with the `ORTModelForXXX` classes, see the [documentation](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models).

### TensorFlow Lite

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install optimum[exporters-tf]
```

Just as for ONNX, it is possible to export models to [TensorFlow Lite](https://www.tensorflow.org/lite) and quantize them:

```bash
optimum-cli export tflite \
  -m deepset/roberta-base-squad2 \
  --sequence_length 384  \
  --quantize int8-dynamic roberta_tflite_model
```

## Accelerated training

🤗 Optimum provides wrappers around the original 🤗 Transformers [Trainer](https://huggingface.co/docs/transformers/main_classes/trainer) to enable training on powerful hardware easily.
The following providers are supported:

- Habana's Gaudi processors
- ONNX Runtime (optimized for GPUs)

### Habana

Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install --upgrade-strategy eager optimum[habana]
```

```diff
- from transformers import Trainer, TrainingArguments
+ from optimum.habana import GaudiTrainer, GaudiTrainingArguments

  # Download a pretrained model from the Hub
  model = AutoModelForXxx.from_pretrained("bert-base-uncased")

  # Define the training arguments
- training_args = TrainingArguments(
+ training_args = GaudiTrainingArguments(
      output_dir="path/to/save/folder/",
+     use_habana=True,
+     use_lazy_mode=True,
+     gaudi_config_name="Habana/bert-base-uncased",
      ...
  )

  # Initialize the trainer
- trainer = Trainer(
+ trainer = GaudiTrainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      ...
  )

  # Use Habana Gaudi processor for training!
  trainer.train()
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/habana/quickstart) and in the [examples](https://github.com/huggingface/optimum-habana/tree/main/examples).

### ONNX Runtime

```diff
- from transformers import Trainer, TrainingArguments
+ from optimum.onnxruntime import ORTTrainer, ORTTrainingArguments
  from transformers import AutoModelForSequenceClassification

  # Download a pretrained model from the Hub
  model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

  # Define the training arguments
- training_args = TrainingArguments(
+ training_args = ORTTrainingArguments(
      output_dir="path/to/save/folder/",
      optim="adamw_ort_fused",
      ...
  )

  # Create an ONNX Runtime Trainer
- trainer = Trainer(
+ trainer = ORTTrainer(
      model=model,
      args=training_args,
      train_dataset=train_dataset,
      ...
  )

  # Use ONNX Runtime for training!
  trainer.train()
```

You can find more examples in the [documentation](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/trainer) and in the [examples](https://github.com/huggingface/optimum/tree/main/examples/onnxruntime/training).
