compressed-tensors 0.12.2

Released: Oct 07, 2025
Author: Neuralmagic, Inc.
compressed-tensors

The compressed-tensors library extends the safetensors format, providing a versatile and efficient way to store and manage compressed tensor data. It supports a variety of quantization and sparsity schemes, serving as a unified format for model optimizations such as GPTQ, AWQ, SmoothQuant, INT8, FP8, SparseGPT, and more.

Why compressed-tensors?

As model compression becomes increasingly important for the efficient deployment of LLMs, the landscape of quantization and compression techniques has grown fragmented. Each method often comes with its own storage format and loading procedure, making it challenging to work with multiple techniques or switch between them. compressed-tensors addresses this by providing a single, extensible format that can represent a wide variety of compression schemes.

  • Unified Checkpoint Format: Supports various compression schemes in a single, consistent format.
  • Wide Compatibility: Works with popular quantization methods such as GPTQ, SmoothQuant, and FP8 (see llm-compressor).
  • Flexible Quantization Support:
    • Weight-only quantization (e.g., W4A16, W8A16, WnA16)
    • Activation quantization (e.g., W8A8)
    • KV cache quantization
    • Non-uniform schemes (different layers can be quantized in different ways; see the sketch below)
  • Sparsity Support: Handles both unstructured and semi-structured (e.g., 2:4) sparsity patterns.
  • Open-Source Integration: Designed to work seamlessly with Hugging Face models and PyTorch.

This allows developers and researchers to easily experiment with composing different quantization methods, simplify model deployment pipelines, and reduce the overhead of supporting multiple compression formats in inference engines.
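
For illustration, a non-uniform scheme can be described with the quantization config objects exposed by compressed_tensors.quantization. The sketch below is an assumption-laden example (class and field names such as QuantizationScheme, QuantizationArgs, targets, and strategy reflect recent versions and may differ in yours): it quantizes most Linear layers to 4-bit weights while giving down_proj layers 8-bit weights and dynamic 8-bit activations.

from compressed_tensors.quantization import (
    QuantizationArgs,
    QuantizationConfig,
    QuantizationScheme,
)

# hypothetical mixed scheme: W4A16 for most Linear layers, W8A8 for down_proj
config = QuantizationConfig(
    config_groups={
        "group_w4a16": QuantizationScheme(
            targets=["Linear"],
            weights=QuantizationArgs(num_bits=4, type="int", symmetric=True, strategy="group", group_size=128),
        ),
        "group_w8a8": QuantizationScheme(
            targets=["re:.*down_proj"],  # regex-style target matching layer names
            weights=QuantizationArgs(num_bits=8, type="int", symmetric=True, strategy="channel"),
            input_activations=QuantizationArgs(num_bits=8, type="int", symmetric=True, strategy="token", dynamic=True),
        ),
    },
    ignore=["lm_head"],  # leave the output head unquantized
)

Each group maps a set of targets to its own weight and activation arguments, which is what lets a single checkpoint carry several quantization schemes at once.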

Installation

From PyPI

Stable release:

pip install compressed-tensors

Nightly release:

pip install --pre compressed-tensors

From Source

git clone https://github.com/neuralmagic/compressed-tensors
cd compressed-tensors
pip install -e .

Getting started

Saving/Loading Compressed Tensors (Bitmask Compression)

The function save_compressed uses the compression_format argument to apply compression to tensors. The function load_compressed reverses the process, converting compressed weights on disk back into decompressed weights in device memory.

from compressed_tensors import save_compressed, load_compressed, BitmaskConfig
from torch import Tensor
from typing import Dict

# the example BitmaskConfig method efficiently compresses
# tensors with a large number of zero entries
compression_config = BitmaskConfig()

tensors: Dict[str, Tensor] = {"tensor_1": Tensor(
    [[0.0, 0.0, 0.0], 
     [1.0, 1.0, 1.0]]
)}
# compress tensors using BitmaskConfig compression format (save them efficiently on disk)
save_compressed(tensors, "model.safetensors", compression_format=compression_config.format)

# decompress tensors (load_compressed returns a generator for memory efficiency)
decompressed_tensors = {}
for tensor_name, tensor in load_compressed("model.safetensors", compression_config=compression_config):
    decompressed_tensors[tensor_name] = tensor
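
To see what bitmask compression actually writes, the resulting file can be opened with the plain safetensors API. Each original tensor is stored as a handful of smaller tensors (roughly, the packed nonzero values plus a bitmask of their positions); the exact key names are an implementation detail and may change between versions.

from safetensors import safe_open

# list the tensors that bitmask compression wrote for "tensor_1"
with safe_open("model.safetensors", framework="pt") as f:
    for key in f.keys():
        print(key, f.get_slice(key).get_shape())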

Saving/Loading Compressed Models (Bitmask Compression)

We can apply bitmask compression to a whole model. For a more detailed example, see the example directory.

from compressed_tensors import save_compressed_model, load_compressed, BitmaskConfig
from transformers import AutoModelForCausalLM

model_name = "neuralmagic/llama2.c-stories110M-pruned50"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

original_state_dict = model.state_dict()

compression_config = BitmaskConfig()

# save compressed model weights
save_compressed_model(model, "compressed_model.safetensors", compression_format=compression_config.format)

# load compressed model weights (`dict` turns generator into a dictionary)
state_dict = dict(load_compressed("compressed_model.safetensors", compression_config))
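
Because bitmask compression is lossless, the round trip can be checked against the state dict captured before compression. A minimal sanity check using the original_state_dict from above:

import torch

# every decompressed tensor should match the original exactly
for name, original in original_state_dict.items():
    assert torch.equal(original, state_dict[name]), f"mismatch in {name}"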

For a more in-depth tutorial on bitmask compression, refer to the notebook.

Saving a Compressed Model with PTQ

We can use compressed-tensors to run basic post-training quantization (PTQ) and save the quantized model compressed on disk:

model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda:0", torch_dtype="auto")

config = QuantizationConfig.parse_file("./examples/bit_packing/int4_config.json")
config.quantization_status = QuantizationStatus.CALIBRATION
apply_quantization_config(model, config)

dataset = load_dataset("ptb_text_only")["train"]
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(examples):
    return tokenizer(examples["sentence"], padding=False, truncation=True, max_length=1024)

tokenized_dataset = dataset.map(tokenize_function, batched=True)
data_loader = DataLoader(tokenized_dataset, batch_size=1, collate_fn=DefaultDataCollator())

with torch.no_grad():
    for idx, sample in tqdm(enumerate(data_loader), desc="Running calibration"):
        sample = {key: value.to(device) for key, value in sample.items()}
        _ = model(**sample)

        if idx >= 512:
            break

model.apply(freeze_module_quantization)
model.apply(compress_quantized_weights)

output_dir = "./ex_llama1.1b_w4a16_packed_quantize"
compressor = ModelCompressor(quantization_config=config)
compressed_state_dict = compressor.compress(model)
model.save_pretrained(output_dir, state_dict=compressed_state_dict)
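
To get a feel for what the compressor produced, the compressed state dict can be inspected directly; quantized weights are replaced by packed tensors plus their quantization parameters (the exact key names depend on the chosen format).

# print a few entries of the compressed state dict
for name, tensor in list(compressed_state_dict.items())[:8]:
    print(name, tuple(tensor.shape), tensor.dtype)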

For a more in-depth tutorial on quantization compression, refer to the notebook.

Dependencies:
  • torch (>=1.7.0)
  • transformers
  • pydantic (>=2.0)
  • loguru