Hyper-Connections
Meta
Author: Phil Wang
Requires Python: >=3.9
Classifiers
- Development Status :: 4 - Beta
- Intended Audience :: Developers
- License :: OSI Approved :: MIT License
- Programming Language :: Python :: 3.9
- Topic :: Scientific/Engineering :: Artificial Intelligence

Hyper Connections
An attempt to make multiple residual streams, as proposed in the Hyper-Connections paper from the ByteDance AI lab, accessible as an easy-to-use library, and to follow any new research in this direction.
Write-up on mHC by Subhadip Mitra
Install
$ pip install hyper-connections
Usage
import torch
from torch import nn
# a single branch layer
branch = nn.Linear(512, 512)
# before
residual = torch.randn(2, 1024, 512)
residual = branch(residual) + residual
# after, say 4 streams in paper
from hyper_connections import get_init_and_expand_reduce_stream_functions
init_hyper_conn, expand_stream, reduce_stream = get_init_and_expand_reduce_stream_functions(4)
# 1. wrap your branch function
hyper_conn_branch = init_hyper_conn(dim = 512, branch = branch)
# 2. expand to 4 streams, this must be done before your trunk, typically a for-loop with many branch functions
residual = expand_stream(residual)
# 3. forward your residual as usual into the wrapped branch function(s)
residual = hyper_conn_branch(residual)
# 4. reduce 4 streams with a summation, this has to be done after your for-loop trunk. for transformer, unsure whether to do before or after final norm
residual = reduce_stream(residual)
Or doing it manually, as in the paper
import torch
from torch import nn
# a single branch layer
branch = nn.Linear(512, 512)
# before
residual = torch.randn(2, 1024, 512)
residual = branch(residual) + residual
# after, say 4 streams in paper
from hyper_connections import get_init_and_expand_reduce_stream_functions
init_hyper_conn, expand_stream, reduce_stream = get_init_and_expand_reduce_stream_functions(4)
# 1. instantiate hyper connection with correct number of streams (4 in this case) - or use the init function above
hyper_conn = init_hyper_conn(dim = 512)
# 2. expand to 4 streams
residual = expand_stream(residual)
# 3. forward your residual into hyper connection for the branch input + add residual function (learned betas)
branch_input, add_residual = hyper_conn(residual)
branch_output = branch(branch_input)
residual = add_residual(branch_output)
# or you can do it in one line as so -> residual = hyper_conn.decorate_branch(branch)(residual)
# 4. reduce 4 streams with a summation, this has to be done after your for loop trunk
residual = reduce_stream(residual)
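To make the branch input and the `add_residual` step more concrete, here is a simplified sketch of a static hyper-connection in plain PyTorch. It is not the library's implementation: the class name `StaticHyperConnection`, the parameter names, and the leading stream axis are all assumptions made for readability, and the input-dependent weights described in the paper are left out.

```python
import torch
from torch import nn

class StaticHyperConnection(nn.Module):
    """Hypothetical, simplified hyper-connection with fixed (non input-dependent) weights."""

    def __init__(self, num_streams, layer_index = 0):
        super().__init__()
        # weights that mix the streams into one branch input (one-hot at init)
        alpha_input = torch.zeros(num_streams)
        alpha_input[layer_index % num_streams] = 1.
        self.alpha_input = nn.Parameter(alpha_input)

        # stream-to-stream mixing of the residuals (identity at init)
        self.alpha_residual = nn.Parameter(torch.eye(num_streams))

        # weights that write the branch output back to each stream (ones at init)
        self.beta = nn.Parameter(torch.ones(num_streams))

    def forward(self, streams):
        # streams: (num_streams, batch, seq, dim)
        branch_input = torch.einsum('s,sbnd->bnd', self.alpha_input, streams)
        mixed = torch.einsum('st,sbnd->tbnd', self.alpha_residual, streams)

        def add_residual(branch_output):
            # add the branch output to every stream, scaled by that stream's beta
            return mixed + self.beta.view(-1, 1, 1, 1) * branch_output

        return branch_input, add_residual

branch = nn.Linear(8, 8)
x = torch.randn(2, 5, 8)

# expand: copy the input into 4 streams
streams = x.unsqueeze(0).repeat(4, 1, 1, 1)

hyper_conn = StaticHyperConnection(4)
branch_input, add_residual = hyper_conn(streams)
streams = add_residual(branch(branch_input))

# reduce: sum over the stream axis
out = streams.sum(dim = 0)
```

With this initialization every stream starts out as an ordinary residual update, `branch(x) + x`, and training is then free to move the mixing weights away from that starting point.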
To compare hyper connections to plain residual without changing the code, just pass disable = True when fetching the functions
get_init_and_expand_reduce_stream_functions(4, disable = True)
To use the fractionated feature dimensions proposed in a follow-up paper by the same authors, instantiate with num_fracs greater than 1, as so
get_init_and_expand_reduce_stream_functions(1, num_fracs = 4) # also allows you to mix streams and fractions of feature dimension
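As a rough illustration of the idea, fractions split the existing feature dimension instead of copying the whole hidden state into extra streams. The helpers `split_fracs` and `merge_fracs` below are hypothetical names for this sketch and do not necessarily reflect how the library lays out its tensors.

```python
import torch

def split_fracs(x, num_fracs):
    # (..., dim) -> (..., num_fracs, dim // num_fracs)
    *lead, dim = x.shape
    assert dim % num_fracs == 0, 'feature dimension must be divisible by num_fracs'
    return x.reshape(*lead, num_fracs, dim // num_fracs)

def merge_fracs(fracs):
    # (..., num_fracs, frac_dim) -> (..., num_fracs * frac_dim)
    return fracs.flatten(-2)

x = torch.randn(2, 16, 512)

fracs = split_fracs(x, 4)       # 4 fractions of 128 features each
restored = merge_fracs(fracs)   # back to the original layout

# copying x into 4 full streams would hold 4x the elements,
# while 4 fractions hold exactly as many elements as x itself
stream_elements = 4 * x.numel()
frac_elements = fracs.numel()
```

Each fraction can then play the role that a full stream plays in hyper connections, without growing the size of the hidden state.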
Citation
@article{Zhu2024HyperConnections,
title = {Hyper-Connections},
author = {Defa Zhu and Hongzhi Huang and Zihao Huang and Yutao Zeng and Yunyao Mao and Banggu Wu and Qiyang Min and Xun Zhou},
journal = {ArXiv},
year = {2024},
volume = {abs/2409.19606},
url = {https://api.semanticscholar.org/CorpusID:272987528}
}
@misc{Rubin2024,
author = {Ohad Rubin},
url = {https://medium.com/@ohadrubin/exploring-weight-decay-in-layer-normalization-challenges-and-a-reparameterization-solution-ad4d12c24950}
}
@article{Zhu2025FracConnectionsFE,
title = {Frac-Connections: Fractional Extension of Hyper-Connections},
author = {Defa Zhu and Hongzhi Huang and Jundong Zhou and Zihao Huang and Yutao Zeng and Banggu Wu and Qiyang Min and Xun Zhou},
journal = {ArXiv},
year = {2025},
volume = {abs/2503.14125},
url = {https://api.semanticscholar.org/CorpusID:277104144}
}
@misc{xie2025mhcmanifoldconstrainedhyperconnections,
title = {mHC: Manifold-Constrained Hyper-Connections},
author = {Zhenda Xie and Yixuan Wei and Huanqi Cao and Chenggang Zhao and Chengqi Deng and Jiashi Li and Damai Dai and Huazuo Gao and Jiang Chang and Liang Zhao and Shangyan Zhou and Zhean Xu and Zhengyan Zhang and Wangding Zeng and Shengding Hu and Yuqing Wang and Jingyang Yuan and Lean Wang and Wenfeng Liang},
year = {2025},
eprint = {2512.24880},
archivePrefix = {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2512.24880},
}
Release history
- 0.4.9 (Feb 04, 2026)
- 0.4.8 (Feb 04, 2026)
- 0.4.7 (Jan 18, 2026)
- 0.4.6 (Jan 17, 2026)
- 0.4.5 (Jan 17, 2026)
- 0.4.4 (Jan 16, 2026)
- 0.4.3 (Jan 16, 2026)
- 0.4.2 (Jan 15, 2026)
- 0.4.1 (Jan 15, 2026)
- 0.4.0 (Jan 10, 2026)
- 0.3.16 (Jan 09, 2026)
- 0.3.15 (Jan 09, 2026)
- 0.3.14 (Jan 09, 2026)
- 0.3.12 (Jan 08, 2026)
- 0.3.11 (Jan 06, 2026)
- 0.3.10 (Jan 06, 2026)
- 0.3.9 (Jan 06, 2026)
- 0.3.8 (Jan 05, 2026)
- 0.3.7 (Jan 05, 2026)
- 0.3.6 (Jan 05, 2026)
- 0.3.5 (Jan 05, 2026)
- 0.3.4 (Jan 04, 2026)
- 0.3.3 (Jan 04, 2026)
- 0.3.2 (Jan 04, 2026)
- 0.3.1 (Jan 02, 2026)
- 0.3.0 (Jan 02, 2026)
- 0.2.1 (Jun 17, 2025)
- 0.2.0 (Jun 17, 2025)
- 0.1.15 (Feb 15, 2025)
- 0.1.14 (Feb 15, 2025)
- 0.1.12 (Feb 15, 2025)
- 0.1.11 (Jan 30, 2025)
- 0.1.10 (Jan 30, 2025)
- 0.1.9 (Jan 21, 2025)
- 0.1.8 (Jan 05, 2025)
- 0.1.7 (Jan 01, 2025)
- 0.1.6 (Jan 01, 2025)
- 0.1.5 (Dec 29, 2024)
- 0.1.4 (Dec 29, 2024)
- 0.1.2 (Dec 29, 2024)
- 0.1.1 (Dec 29, 2024)
- 0.1.0 (Dec 29, 2024)
- 0.0.24 (Dec 29, 2024)
- 0.0.23 (Dec 29, 2024)
- 0.0.22 (Dec 29, 2024)
- 0.0.21 (Dec 28, 2024)
- 0.0.20 (Dec 28, 2024)
- 0.0.19 (Dec 27, 2024)
- 0.0.18 (Dec 27, 2024)
- 0.0.17 (Dec 27, 2024)
- 0.0.16 (Dec 27, 2024)
- 0.0.15 (Dec 27, 2024)
- 0.0.14 (Dec 26, 2024)
- 0.0.12 (Dec 26, 2024)
- 0.0.11 (Dec 26, 2024)
- 0.0.10 (Dec 26, 2024)
- 0.0.9 (Dec 26, 2024)
- 0.0.8 (Dec 25, 2024)
- 0.0.7 (Dec 25, 2024)
- 0.0.6 (Dec 25, 2024)
- 0.0.5 (Dec 25, 2024)
- 0.0.4 (Dec 25, 2024)
- 0.0.3 (Dec 25, 2024)
- 0.0.2 (Dec 25, 2024)
- 0.0.1 (Dec 24, 2024)