Swin Transformers
SwinTransformer
SwinTransformer(
    patch_size: tuple[int, int] = (4, 4),
    in_chans: int = 3,
    embedding_dim: int = 96,
    depths: list[int] | None = None,
    num_heads: list[int] | None = None,
    window_size: int = 7,
    mlp_ratio: float = 4.0,
    device: device | None = None,
)
Bases: Chain
Swin Transformer (arXiv:2103.14030)
Currently specific to MVANet; only square inputs are supported.
Source code in src/refiners/foundationals/swin/swin_transformer.py
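Example (a minimal sketch, not from the library docs; since the model is a `Chain`, it is assumed to be callable directly on an image tensor, and the exact output structure is an assumption):

```python
import torch

from refiners.foundationals.swin.swin_transformer import SwinTransformer

backbone = SwinTransformer()  # defaults: 4x4 patches, dim 96, window 7

# 224 is square and window-friendly: 224 / 4 = 56 patches per side, and 56 = 7 * 8,
# so every stage's resolution stays divisible by the window size.
x = torch.randn(1, 3, 224, 224)
features = backbone(x)  # output structure depends on the implementation
```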
WindowAttention
WindowAttention(
    dim: int,
    window_size: int,
    num_heads: int,
    shift: bool = False,
    device: device | None = None,
)
Bases: Chain
Window-based Multi-head Self-Attention (W-MSA), optionally shifted (SW-MSA).
It has a trainable relative position bias (RelativePositionBias).
The input projection is stored as a single Linear for q, k and v.
Source code in src/refiners/foundationals/swin/swin_transformer.py
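To make the window mechanics concrete, here is an illustrative PyTorch sketch (not the library's code) of the partitioning that W-MSA attends within; SW-MSA additionally shifts the feature map by half a window (e.g. with `torch.roll`) before partitioning:

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into (B * num_windows, ws * ws, C) token groups."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)

feats = torch.randn(1, 14, 14, 96)    # a 14x14 feature map with dim 96
windows = window_partition(feats, 7)  # (4, 49, 96): 4 windows of 49 tokens each
# Self-attention then runs independently inside each group of 49 tokens,
# with the learned relative position bias added to the attention logits.
print(windows.shape)
```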
MVANet
MVANet(
    embedding_dim: int = 128,
    n_logits: int = 1,
    depths: list[int] | None = None,
    num_heads: list[int] | None = None,
    window_size: int = 12,
    device: device | None = None,
)
Bases: Chain
Multi-view Aggregation Network for Dichotomous Image Segmentation
See Multi-view Aggregation Network for Dichotomous Image Segmentation (arXiv:2404.07445) for more details.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `embedding_dim` | `int` | embedding dimension | `128` |
| `n_logits` | `int` | number of output logits; a single logit is used for alpha matting, foreground-background segmentation, and salient object detection (SOD) | `1` |
| `depths` | `list[int] \| None` | see `SwinTransformer` | `None` |
| `num_heads` | `list[int] \| None` | see `SwinTransformer` | `None` |
| `window_size` | `int` | defaults to 12; see `SwinTransformer` | `12` |
| `device` | `device \| None` | the device to use | `None` |
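Example (a hypothetical end-to-end sketch; the module path and input resolution are assumptions, with the MVANet paper operating on 1024x1024 images):

```python
import torch

from refiners.foundationals.swin.mvanet import MVANet  # assumed module path

model = MVANet()  # defaults: embedding_dim=128, n_logits=1, window_size=12

image = torch.randn(1, 3, 1024, 1024)  # square input, per the backbone constraint
with torch.no_grad():
    logits = model(image)
mask = torch.sigmoid(logits)  # single-channel foreground probability map
```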