Metadata-Version: 2.1
Name: asian-mtl
Version: 0.1.1
Summary: Seamlessly translate your novels with deep learning models.
Home-page: https://github.com/EasierMTL/asian_mtl
Keywords: nlp,translation
Author: Joseph Chen
Author-email: jchen42703@gmail.com
Requires-Python: >=3.8,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Requires-Dist: boto3 (>=1.21.40,<2.0.0)
Requires-Dist: botocore (>=1.24.43,<2.0.0)
Requires-Dist: dacite (>=1.6.0,<2.0.0)
Requires-Dist: datasets (>=2.1.0,<3.0.0)
Requires-Dist: gdown (>=4.4.0,<5.0.0)
Requires-Dist: optimum (>=1.1.0,<2.0.0)
Requires-Dist: psutil (>=5.9.4,<6.0.0)
Requires-Dist: pydantic (>=1.9.0,<2.0.0)
Requires-Dist: pyquery (>=1.4.3,<2.0.0)
Requires-Dist: sentencepiece (>=0.1.96,<0.2.0)
Requires-Dist: torch (>=1.11.0,<2.0.0)
Requires-Dist: tqdm (>=4.64.0,<5.0.0)
Requires-Dist: transformers (>=4.18.0,<5.0.0)
Project-URL: Repository, https://github.com/EasierMTL/asian_mtl
Description-Content-Type: text/markdown

# `asian_mtl`

This repository contains the code and documentation for the machine translation models used for EasierMTL's API.

This is an improved version of the models in the original repository: [EasierMTL/chinese-translation-app](https://github.com/EasierMTL/chinese-translation-app/tree/main/server/chinese_translation_api)

## Supported Translators

All translators support dynamic quantization! [Our benchmarks](#benchmarks) indicate that quantization roughly doubles inference speed while losing less than 1% BLEU.

- `ChineseToEnglishTranslator()`
- `EnglishToChineseTranslator()`
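
A minimal usage sketch is below. The import path, the `translate()` method, and the `quantize()` call are assumptions based on the package layout, not a documented API; check the package source for the exact names:

```python
from asian_mtl.models.base import ChineseToEnglishTranslator

translator = ChineseToEnglishTranslator()
translator.quantize()  # optional: enable dynamic quantization

# Translate a single sentence (method name assumed)
print(translator.translate("你好，世界"))
```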

## Getting Started

```bash
pip install asian_mtl
```

And you're good to go!

If you are contributing, run:

```bash
# https://stackoverflow.com/questions/59882884/vscode-doesnt-show-poetry-virtualenvs-in-select-interpreter-option

poetry config virtualenvs.in-project true

# list this project's virtual environments
poetry env list

poetry install
```

## Usage

When you are using quantized models from this repository, make sure to set `torch.set_num_threads(1)`. This is not set under the hood because doing so would invasively interfere with user setups.

Without it, the quantized models will run slower than their vanilla counterparts.
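
For example, pin PyTorch to a single intra-op thread before running any quantized model (`torch.set_num_threads` is the standard PyTorch call; the translator-specific steps from the section above would follow it):

```python
import torch

# Quantized models are fastest single-threaded; without this, they can be
# slower than their vanilla (unquantized) counterparts.
torch.set_num_threads(1)

print(torch.get_num_threads())  # → 1
```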

## Evaluation

See [`scripts`](./scripts) for evaluation scripts.

To run the scripts, simply run:

```bash
# Run the evaluation CLI with the Helsinki-NLP config
python ./scripts/evaluation/eval.py -c ./scripts/evaluation/configs/helsinki.yaml
```

Edit the config [`helsinki.yaml`](./scripts/evaluation/configs/helsinki.yaml) to enable quantization or adapt it to your specific use case.

### Benchmarks

Here are some basic benchmarks of models in this repository:

| Model                      | Quantized? | N   | BLEU  | Runtime |
| -------------------------- | ---------- | --- | ----- | ------- |
| Helsinki-NLP/opus-mt-zh-en | No         | 100 | 0.319 | 27s     |
|                            | Yes        | 100 | 0.306 | 13.5s   |

The benchmarks described in the [docs](./docs/evaluation/EVALUATION_REG.md) are slightly out of date.

