# Data Sparsifier
## Intro
The data sparsifier inherits from the `BaseSparsifier` class. It attempts to sparsify data tensors in general (trainable and non-trainable).

## Implementation Details
The data sparsifier does not receive a model or a layer to sparsify, so it must own the mask itself. It does this through a private container model that registers the data as a parametrized buffer.

`BaseDataSparsifier` handles all the housekeeping, so a subclass only needs to implement the `update_mask` logic.

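The container mechanism described above can be sketched with PyTorch's public `torch.nn.utils.parametrize` utilities. This is a minimal illustration under stated assumptions, not the data sparsifier's actual implementation: `MaskParametrization`, `container`, and the buffer name `my_data` are hypothetical names chosen for the example.

```python
import torch
from torch import nn
from torch.nn.utils import parametrize


class MaskParametrization(nn.Module):
    """Hypothetical parametrization: returns the data multiplied by a mask."""

    def __init__(self, mask):
        super().__init__()
        self.register_buffer("mask", mask)

    def forward(self, x):
        return x * self.mask


# A bare module plays the role of the private container.
container = nn.Module()
data = torch.arange(1.0, 7.0).reshape(2, 3)
container.register_buffer("my_data", data)

# The mask starts as all ones, so nothing is sparsified yet.
mask = torch.ones_like(data)
parametrize.register_parametrization(container, "my_data", MaskParametrization(mask))

# Updating the mask in place changes what the container returns.
mask[0, :] = 0.0

sparsified = container.my_data  # data * mask
original = container.parametrizations.my_data.original  # unmasked data
```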
## Supported data
1. torch tensors (`torch.Tensor`)
2. parameters (`nn.Parameter`)
3. embeddings and embedding bags (`nn.Embedding` / `nn.EmbeddingBag`)

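As a rough illustration of how these three kinds of data can be treated uniformly, the sketch below maps each one to a single tensor. It assumes that sparsifying an embedding means sparsifying its `weight`; `extract_tensor` is a hypothetical helper written for this example, not part of the data sparsifier API.

```python
import torch
from torch import nn


def extract_tensor(data):
    """Hypothetical helper: return the tensor that would be sparsified."""
    if isinstance(data, (nn.Embedding, nn.EmbeddingBag)):
        # Assumption: for embeddings, the weight matrix is the data of interest.
        return data.weight
    if isinstance(data, torch.Tensor):  # also covers nn.Parameter
        return data
    raise TypeError(f"unsupported data type: {type(data).__name__}")


candidates = (
    torch.randn(4, 3),
    nn.Parameter(torch.randn(2, 5)),
    nn.Embedding(10, 8),
    nn.EmbeddingBag(10, 8),
)
shapes = [tuple(extract_tensor(d).shape) for d in candidates]
```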
## API details
`BaseDataSparsifier`: base class with the abstract method `update_mask`, which computes the new mask for all the data.

`add_data`: accepts a name and the data, and registers the data as a parametrized buffer inside the container model. The data is always associated with a name. A custom sparse config can be provided along with the name and data; if none is provided, the default config is applied during sparsification.
If data with the same name already exists, it is replaced with the new data. The existing config and mask are retained for the new data unless specified otherwise.
To discard the old mask, set `reuse_mask=False`. If `config` is passed in explicitly, the retained config is updated with it.

**Note**: a name containing '.' is not a valid name for the data sparsifier.

```
data_sparsifier = ImplementedDataSparsifier()
data_sparsifier.add_data(name=name, data=data, **some_config)
```

`step`: applies the `update_mask()` logic to all the data.

```
data_sparsifier.step()
```

`get_mask`: retrieves the mask given the name of the data.

`get_data`: retrieves the data given the `name` argument. It accepts an additional argument, `return_original`, which when set to `True` returns the data tensor without applying the mask. Example:

```
original_data = data_sparsifier.get_data(name=name, return_original=True)  # returns data with no mask applied
sparsified_data = data_sparsifier.get_data(name=name, return_original=False)  # returns data * mask
```

`squash_mask`: removes the parametrizations on the data and, when `leave_parametrized=True`, applies the mask to the data. It also accepts a list of names whose masks should be squashed; if none is given, it squashes the masks for all the keys.

```
data_sparsifier.squash_mask()
```

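The effect of squashing can be illustrated with `torch.nn.utils.parametrize.remove_parametrizations`, which exposes a flag of the same name, `leave_parametrized`. The sketch below does not call the data sparsifier itself; the `Mask` parametrization, the bare `nn.Module` container, and the buffer name `w` are assumptions made for the example.

```python
import torch
from torch import nn
from torch.nn.utils import parametrize


class Mask(nn.Module):
    """Hypothetical parametrization that multiplies the data by a fixed mask."""

    def __init__(self, mask):
        super().__init__()
        self.register_buffer("mask", mask)

    def forward(self, x):
        return x * self.mask


def squash(leave_parametrized):
    # Register a small buffer with a mask that zeros out the middle entry.
    container = nn.Module()
    container.register_buffer("w", torch.tensor([1.0, 2.0, 3.0]))
    parametrize.register_parametrization(container, "w", Mask(torch.tensor([1.0, 0.0, 1.0])))
    # Remove the parametrization, either keeping the masked value or the original one.
    parametrize.remove_parametrizations(container, "w", leave_parametrized=leave_parametrized)
    return container.w, parametrize.is_parametrized(container)


masked, still_parametrized = squash(leave_parametrized=True)  # mask is baked into the data
unmasked, _ = squash(leave_parametrized=False)  # original data is restored
```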
`state_dict`: returns a dictionary that can be serialized.

## Write your own data sparsifier
Custom data sparsifiers should inherit from the `BaseDataSparsifier` class and implement `update_mask()`. For example, the following data sparsifier zeros out all entries of the tensor whose absolute value is smaller than some threshold.

```
import torch
# Import path is an assumption; it may differ across PyTorch versions.
from torch.ao.pruning._experimental.data_sparsifier import BaseDataSparsifier

class ImplementedDataSparsifier(BaseDataSparsifier):
    def __init__(self, threshold):
        super().__init__(threshold=threshold)  # becomes the default config

    def update_mask(self, name, data, threshold):
        mask = self.get_mask(name)
        # Modify the mask in place so that the sparsifier sees the change.
        mask[torch.abs(data) < threshold] = 0.0
```

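For intuition, the thresholding rule used in `update_mask` can be tried on a plain tensor, outside of any sparsifier. The values and variable names below are made up for illustration.

```python
import torch

data = torch.tensor([[0.1, -0.7], [0.3, -0.05]])
threshold = 0.2

# A fresh mask of ones keeps every entry.
mask = torch.ones_like(data)

# In-place update: zero the mask wherever the magnitude is below the threshold.
mask[torch.abs(data) < threshold] = 0.0

# Applying the mask is an elementwise product.
sparsified = data * mask
```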
## Using Data Sparsifier
### Simple example

```
import torch
from torch import nn

tensor1 = torch.randn(100, 100)
param1 = nn.Parameter(torch.randn(200, 32))

my_sparsifier = ImplementedDataSparsifier(threshold=0.2)
my_sparsifier.add_data(name='tensor1', data=tensor1, threshold=0.5)  # per-data config overrides the default
my_sparsifier.add_data(name='param1', data=param1)  # uses the default threshold of 0.2

my_sparsifier.step()  # computes mask

my_sparsifier.squash_mask()  # applies and removes mask
```

### Sparsifying model embeddings

```
class Model(nn.Module):
    def __init__(self, feature_dim, emb_dim, num_classes):
        super().__init__()  # required before assigning submodules
        self.emb = nn.EmbeddingBag(feature_dim, emb_dim)
        self.linear1 = nn.Linear(emb_dim, 32)
        self.linear2 = nn.Linear(32, num_classes)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.emb(x)
        out = self.relu(self.linear1(out))
        out = self.linear2(out)
        return out

model = Model(100, 32, 10)
my_sparsifier = ImplementedDataSparsifier(threshold=0.5)
my_sparsifier.add_data(name='emb', data=model.emb)

...
# Train model
...

my_sparsifier.step()  # creates mask for embeddings

my_sparsifier.squash_mask()  # applies and removes mask
```

### Using in the context of training data
When the input data can be sparsified before it is sent to the model, the data sparsifier can be used for that as well.

Each batch of input data needs to be attached to the data sparsifier before it is sent to the model.

```
model = SomeModel()

data_sparsifier = ImplementedDataSparsifier(threshold=0.2)

data_name = 'train_data'

for x, y in train_data_loader:
    # attach the batch under the same name on every iteration
    x = data_sparsifier.add_data(name=data_name, data=x)
    ...
    y_out = model(x)
    ...
    data_sparsifier.step()
```

**Note**:
1. It is the responsibility of the `BaseDataSparsifier` to call `self.update_mask` when appropriate.
2. The mask should be modified in place.

    Some valid in-place operations are:
    1. Change a portion of a mask: `mask[:10] = torch.zeros(10)`
    2. Use an in-place operator: `mask *= another_mask`
    3. Change the underlying data: `mask.data = torch.zeros_like(mask)`

    Non-in-place operations are not valid and might lead to bugs. For example:

    1. Reassignment of a mask: `mask = torch.zeros_like(mask)`
    2. Non-in-place arithmetic operations: `mask = mask * another_mask`
3. The data sparsifier `name` argument cannot contain a '.'.
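The reason reassignment is unsafe can be seen in a small sketch. Here a plain dictionary, `state`, stands in for wherever the sparsifier keeps its mask; that stand-in and both function names are hypothetical. An in-place update changes the stored tensor, while reassignment only rebinds a local name and leaves the stored mask untouched.

```python
import torch

# Stand-in for the mask stored by the sparsifier (hypothetical).
state = {"mask": torch.ones(4)}


def update_in_place(mask):
    mask[:2] = 0.0  # modifies the stored tensor


def update_by_reassignment(mask):
    mask = torch.zeros_like(mask)  # rebinds the local name only


update_by_reassignment(state["mask"])
after_reassignment = state["mask"].clone()  # still all ones

update_in_place(state["mask"])
after_in_place = state["mask"].clone()  # first two entries are now zero
```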