llguidance 1.3.0


pip install llguidance


Released: Oct 20, 2025


Meta
Author: Michal Moskal
Requires Python: >=3.9


Low-level Guidance (llguidance)


[Figure: Performance results from MaskBench]



About

This library implements constrained decoding (also called constrained sampling or structured outputs) for Large Language Models (LLMs). It can enforce an arbitrary context-free grammar on the output of an LLM and is fast: on the order of 50μs of CPU time per token (for a 128k-token tokenizer) with negligible startup costs.

The following grammar formats are supported:

The internal format is the most powerful (though the Lark-like format is catching up, and there are plans to convert the libraries to use it) and can be generated by the following libraries:

The library can be used from:

Integrations

The library is currently integrated in:

Technical details

See Making Structured Outputs Go Brrr for an overview of the library, including the design decisions, performance, and how it compares to other approaches.

Given a context-free grammar, a tokenizer, and a prefix of tokens, llguidance computes a token mask - the set of tokens from the tokenizer that, when appended to the current token prefix, can still lead to a valid string in the language defined by the grammar. Mask computation takes approximately 50μs of single-core CPU time for a tokenizer with 128k tokens. While this timing depends on the exact grammar, it holds, for example, for grammars derived from JSON schemas. There is no significant startup cost.
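
To make the mask concrete, here is a deliberately naive sketch of the idea (not the llguidance API or algorithm): a token is allowed if appending its bytes keeps the output a prefix of some string in the grammar, and disallowed tokens are excluded from sampling by setting their logits to negative infinity. llguidance arrives at the same answer far faster by walking a prefix trie of the vocabulary, as described below.

```python
import math

# Toy "grammar": the output must be exactly "true" or "false".
def is_valid_prefix(s: str) -> bool:
    return "true".startswith(s) or "false".startswith(s)

def compute_mask(vocab, output, valid_prefix):
    """Naive O(|vocab|) mask: a token is allowed iff appending it keeps
    the output a prefix of some string in the language."""
    return [valid_prefix(output + tok) for tok in vocab]

def apply_mask(logits, mask):
    """Disallowed tokens get -inf, i.e. probability zero after softmax."""
    return [l if ok else -math.inf for l, ok in zip(logits, mask)]

vocab = ["t", "true", "f", "fal", "x", "se"]
mask = compute_mask(vocab, "", is_valid_prefix)
print([t for t, ok in zip(vocab, mask) if ok])  # ['t', 'true', 'f', 'fal']
```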

The library implements a context-free grammar parser using Earley’s algorithm on top of a lexer based on derivatives of regular expressions. Mask computation is achieved by traversing the prefix tree (trie) of all possible tokens, leveraging highly optimized code.
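
For intuition, here is a minimal, textbook sketch of regular-expression derivatives (the Brzozowski technique the lexer builds on), not the library's actual data structures: the derivative of a regex r with respect to a character c matches exactly the suffixes s such that r matches c followed by s, so matching reduces to repeated derivation.

```python
from dataclasses import dataclass

class Re: pass

@dataclass(frozen=True)
class Empty(Re): pass          # matches nothing
@dataclass(frozen=True)
class Eps(Re): pass            # matches only the empty string
@dataclass(frozen=True)
class Char(Re): c: str         # matches a single character
@dataclass(frozen=True)
class Seq(Re): a: Re; b: Re    # concatenation
@dataclass(frozen=True)
class Alt(Re): a: Re; b: Re    # alternation
@dataclass(frozen=True)
class Star(Re): a: Re          # Kleene star

def nullable(r: Re) -> bool:
    """Does r match the empty string?"""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Seq):
        return nullable(r.a) and nullable(r.b)
    if isinstance(r, Alt):
        return nullable(r.a) or nullable(r.b)
    return False  # Empty, Char

def deriv(r: Re, ch: str) -> Re:
    """Regex matching { s : r matches ch + s }."""
    if isinstance(r, Char):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.a, ch), deriv(r.b, ch))
    if isinstance(r, Star):
        return Seq(deriv(r.a, ch), r)
    if isinstance(r, Seq):
        d = Seq(deriv(r.a, ch), r.b)
        return Alt(d, deriv(r.b, ch)) if nullable(r.a) else d
    return Empty()  # Empty, Eps

def matches(r: Re, s: str) -> bool:
    for ch in s:
        r = deriv(r, ch)
    return nullable(r)

# (ab)* matches "abab" but not "aba".
ab_star = Star(Seq(Char("a"), Char("b")))
assert matches(ab_star, "abab") and not matches(ab_star, "aba")
```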

Grammars can also be used to speed up decoding via fast-forward tokens.
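
A hedged sketch of the idea (the function names are hypothetical stand-ins, not the llguidance API): whenever the grammar leaves exactly one legal next token, it can be appended directly, skipping the model's forward pass entirely.

```python
# Sketch only: `allowed_tokens` and `sample_masked` stand in for an engine's
# mask computation and masked sampling; they are not llguidance calls.
def generate_step(tokens, allowed_tokens, sample_masked):
    # Fast-forward: while the grammar forces a unique continuation
    # (e.g. closing quotes and braces in JSON), append it for free.
    allowed = allowed_tokens(tokens)
    while len(allowed) == 1:
        tokens.append(allowed[0])
        allowed = allowed_tokens(tokens)
    # Otherwise run the model once and sample among the allowed tokens.
    # (A real engine also handles the end-of-grammar case.)
    tokens.append(sample_masked(tokens, allowed))
    return tokens
```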

Comparison and performance

See MaskBench in JSON Schema Bench for detailed performance comparisons.

LM-format-enforcer and llama.cpp grammars are similar to llguidance in that they dynamically build token masks for every step of the decoding process. Both are significantly slower - the former because it is implemented in straightforward Python, and the latter because it lacks a lexer and uses a backtracking parser, which, while elegant, is inefficient.

Outlines builds an automaton from constraints and then pre-computes token masks for all automaton states, potentially making sampling fast but inherently limiting constraint complexity and introducing significant startup cost and memory overhead. Llguidance computes token masks on the fly and has essentially no startup cost. The lexer’s automata in llguidance are built lazily and are typically much smaller, as the context-free grammar imposes the top-level structure.

XGrammar follows an approach similar to llama.cpp (an explicit stack-based, character-level parser) with additional pre-computation of certain token masks, similar to Outlines. The pre-computation often takes seconds, and sometimes minutes. When the pre-computation fits a given input, masks are computed quickly (under 8μs for half of the masks we tested); when it does not, mask computation can take tens or hundreds of milliseconds.

In llguidance, full mask computation for a typical JSON schema takes about 1.5ms (for a 128k tokenizer). However, the "slicer" optimization very often applies, so the average mask computation in JSON Schema Bench (2.5M tokens, 10k schemas) is under 50μs, with less than 1% of masks taking longer than 1ms and 0.001% taking longer than 10ms (but still under 30ms). The optimization involves no significant pre-computation.

Thus, with 16 cores and a 10ms forward pass, llguidance can handle batch sizes up to 3200 without slowing down the model. (Note that a 10ms forward pass for small batch sizes typically increases to 20ms+ for batch sizes of 100-200.)
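
The batch-size figure is simple arithmetic over the numbers above:

```python
cores = 16
forward_pass_s = 10e-3   # one 10 ms forward pass
mask_cpu_s = 50e-6       # ~50 us of single-core CPU time per mask
print(cores * forward_pass_s / mask_cpu_s)  # 3200.0 masks per forward pass
```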

Building

If you just need the C or Rust library (llguidance), check the parser directory.

For Python bindings:

  • install Python 3.9 or later; you will very likely need a virtual env or conda
  • run ./scripts/install-deps.sh
  • to build (and rebuild after any changes), run ./scripts/test-guidance.sh

This builds the Python bindings for the library and runs the tests (most of which live in the Guidance repo; the script will clone it).

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.
