SparseTensor
============

Sparse Tensors are stored as two dense tensors and a shape:

* `indices`: a `brain::Tensor` storing a matrix of indices, typically `int64`.
* `values`: a `brain::Tensor` storing a vector with values of type `T`.
* `shape`: a `TensorShape` storing the bounds of the underlying tensor.
* `order`: (optional) a `gtl::InlinedVector<int64,8>` with the dimensions
  along which the indices are ordered.

Let

    ix = indices.matrix<int64>()
    vals = values.vec<T>()

The shape of `ix` is `N x NDIMS`, and each row corresponds to the
index of a single element of the sparse tensor.

The length of `vals` must be `N`, and `vals(i)` corresponds to the
value with index `ix(i,:)`.

The shape must be a `TensorShape` with `dims() == NDIMS`; it is the full
shape of the dense tensor these indices represent.

To be specific, the representation (pseudocode) is:

    tensor[ix[i,:]] == vals[i] for i = 0, ..., N-1

Ordering
--------

Indices need not be provided in order. For example, the following
index matrix is ordered according to dimension order `{0, 1, 2}`:

    [0 0 1]
    [0 1 1]
    [2 0 2]

However, you can provide an unordered version:

    [2 0 2]
    [0 0 1]
    [0 1 1]

If the SparseTensor is constructed without a provided order, the
default order is `{-1, ..., -1}`. Certain operations will fail or crash
when the order is not provided.

Resorting the SparseTensor in-place (which resorts the underlying index and
values tensors in-place) will update the order. The cost of reordering the
matrix is `O(N*log(N))`, and it requires `O(N)` additional temporary space to
store a reordering index. If the default order is not specified and reordering
is not performed, the following will happen:

* `group()` will **raise an assertion failure**
* `IndicesValid()` will **raise an assertion failure**

To update the internal index ordering after construction, call
`Reorder<T>()` via, e.g., `Reorder<T>({0,1,2})`.
After this step, all the above methods should work correctly.

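The reorder step can be sketched without the TensorFlow types: build an `O(N)` permutation array, sort it lexicographically by the requested dimension order in `O(N*log(N))`, then apply it to both the indices and the values. `ReorderCoo` is an illustrative name, not the library's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Sort COO entries lexicographically by the given dimension order.
// `indices` is N rows of NDIMS coordinates; `values[i]` pairs with row i.
// Cost: O(N log N) comparisons plus O(N) temporary space for `perm`.
void ReorderCoo(std::vector<std::vector<int64_t>>* indices,
                std::vector<double>* values,
                const std::vector<int64_t>& order) {
  const size_t n = values->size();
  std::vector<size_t> perm(n);  // the O(N) reordering index
  std::iota(perm.begin(), perm.end(), 0);
  std::sort(perm.begin(), perm.end(), [&](size_t a, size_t b) {
    for (int64_t d : order) {  // compare dimensions in the requested order
      if ((*indices)[a][d] != (*indices)[b][d])
        return (*indices)[a][d] < (*indices)[b][d];
    }
    return false;  // equal keys
  });
  std::vector<std::vector<int64_t>> new_ix(n);
  std::vector<double> new_vals(n);
  for (size_t i = 0; i < n; ++i) {  // apply the permutation to both tensors
    new_ix[i] = (*indices)[perm[i]];
    new_vals[i] = (*values)[perm[i]];
  }
  *indices = std::move(new_ix);
  *values = std::move(new_vals);
}
```

Running this with order `{0, 1, 2}` on the unordered index matrix shown above yields the ordered one.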
The method `IndicesValid()` checks to make sure:

* `0 <= ix(i, d) < shape.dim_size(d)`
* indices do not repeat
* indices are in order

Iterating
---------

### group({grouping dims})

* provides an iterator that groups entries according to the
  dimensions you care about
* may require a sort if your data isn't presorted in a way that's
  compatible with grouping_dims
* for each group, returns the group index (values of the group
  dims for this iteration), the subset of indices in this group,
  and the subset of values in this group. These are lazy outputs,
  so to read them individually, copy them as per the example
  below.

#### **NOTE**
`group({dim0, ..., dimk})` will **raise an assertion failure** if the
order of the SparseTensor does not match the dimensions you wish to group by.
You must either have your indices in the correct order and construct the
SparseTensor with

    order = {dim0, ..., dimk, ...}

or call

    Reorder<T>({dim0, ..., dimk, ...})

to sort the SparseTensor before grouping.

Example of grouping:

    Tensor indices(DT_INT64, TensorShape({N, NDIMS}));
    Tensor values(DT_STRING, TensorShape({N}));
    TensorShape shape({dim0,...});
    SparseTensor sp(indices, values, shape);
    sp.Reorder<tstring>({1, 2, 0, 3, ...});  // Must provide NDIMS dims.
    // group according to dims 1 and 2
    for (const auto& g : sp.group({1, 2})) {
      cout << "vals of ix[:, 1,2] for this group: "
           << g.group()[0] << ", " << g.group()[1];
      cout << "full indices of group:\n" << g.indices();
      cout << "values of group:\n" << g.values();

      TTypes<int64>::UnalignedMatrix g_ix = g.indices();
      TTypes<tstring>::UnalignedVec g_v = g.values();
      ASSERT(g_ix.dimension(0) == g_v.size());  // number of elements match.
    }

ToDense
--------

Converts the sparse tensor to its dense form. You must provide a pointer
to the (preallocated) dense tensor. `ToDense()` will optionally
preinitialize the tensor with zeros.

Shape checking is performed, as is boundary checking.

    Tensor indices(DT_INT64, TensorShape({N, NDIMS}));
    Tensor values(DT_STRING, TensorShape({N}));
    TensorShape shape({dim0,...});
    SparseTensor sp(indices, values, shape);
    ASSERT(sp.IndicesValid());  // checks ordering & index bounds.

    Tensor dense(DT_STRING, shape);
    // initialize other indices to zero.  copy.
    ASSERT(sp.ToDense<tstring>(&dense, true));

Concat
--------

Concatenates multiple SparseTensors and returns a new SparseTensor.
This concatenation is with respect to the "dense" versions of these
SparseTensors. Concatenation is performed along dimension `order[0]`
of all tensors. As a result, `shape[order[0]]` may differ across
the inputs, but `shape[d]` for `d != order[0]` must match across all inputs.

We call `order[0]` the **primary dimension**.

**Prerequisites**

* The inputs' ranks must all match.
* The inputs' `order[0]` must all match.
* The inputs' shapes must all match except for dimension `order[0]`.
* The inputs' values must all be of the same type.

If any of these are false, concat will die with an assertion failure.

158
159Example:
160Concatenate two sparse matrices along columns.
161
162Matrix 1:
163
164 [0 0 1]
165 [2 0 0]
166 [3 0 4]
167
168Matrix 2:
169
170 [0 0 0 0 0]
171 [0 1 0 0 0]
172 [2 0 0 1 0]
173
174Concatenated Matrix:
175
176 [0 0 1 0 0 0 0 0]
177 [2 0 0 0 1 0 0 0]
178 [3 0 4 2 0 0 1 0]
179
180Expected input shapes, orders, and `nnz()`:
181
182 shape_1 = TensorShape({3, 3})
183 shape_2 = TensorShape({3, 8})
184 order_1 = {1, 0} // primary order is 1, columns
185 order_2 = {1, 0} // primary order is 1, must match
186 nnz_1 = 4
187 nnz_2 = 3
188
189Output shapes and orders:
190
191 conc_shape = TensorShape({3, 11}) // primary dim increased, others same
192 conc_order = {1, 0} // Orders match along all inputs
193 conc_nnz = 7 // Sum of nonzeros of inputs
194
Coding Example:

    Tensor ix1(DT_INT64, TensorShape({N1, 3}));
    Tensor vals1(DT_STRING, TensorShape({N1}));
    Tensor ix2(DT_INT64, TensorShape({N2, 3}));
    Tensor vals2(DT_STRING, TensorShape({N2}));
    Tensor ix3(DT_INT64, TensorShape({N3, 3}));
    Tensor vals3(DT_STRING, TensorShape({N3}));

    SparseTensor st1(ix1, vals1, TensorShape({10, 20, 5}), {1, 0, 2});
    SparseTensor st2(ix2, vals2, TensorShape({10, 10, 5}), {1, 0, 2});
    // For kicks, st3 indices are out of order, but order[0] matches so we
    // can still concatenate along this dimension.
    SparseTensor st3(ix3, vals3, TensorShape({10, 30, 5}), {1, 2, 0});

    SparseTensor conc = SparseTensor::Concat<tstring>({st1, st2, st3});
    Tensor ix_conc = conc.indices();
    Tensor vals_conc = conc.values();
    EXPECT_EQ(conc.nnz(), st1.nnz() + st2.nnz() + st3.nnz());
    EXPECT_EQ(conc.Shape(), TensorShape({10, 60, 5}));
    EXPECT_EQ(conc.Order(), {-1, -1, -1});

    // Reorder st3 so all input tensors have the exact same orders.
    st3.Reorder<tstring>({1, 0, 2});
    SparseTensor conc2 = SparseTensor::Concat<tstring>({st1, st2, st3});
    EXPECT_EQ(conc2.Order(), {1, 0, 2});
    // All indices' orders matched, so the output is in order.
    EXPECT_TRUE(conc2.IndicesValid());