SparseTensor
============

Sparse Tensors are stored as two dense tensors and a shape:

*  `indices`: a `brain::Tensor` storing a matrix, typically `int64`
*  `values`: a `brain::Tensor` storing a vector with values of type `T`
*  `shape`: a `TensorShape` storing the bounds of the underlying tensor
*  `order`: (optional) a `gtl::InlinedVector<int64,8>` with the dimensions
            along which the indices are ordered

Let

    ix = indices.matrix<int64>()
    vals = values.vec<T>()

The shape of `ix` is `N x NDIMS`, and each row corresponds to the
index of a single element of the sparse tensor.

The length of `vals` must be `N`, and `vals(i)` corresponds to the
value with index `ix(i,:)`.

`shape` must be a `TensorShape` with `dims() == NDIMS`.
The shape is the full shape of the dense tensor these indices
represent.

To be specific, the representation (pseudocode) is:

    tensor[ix[i,:]] == vals[i] for i = 0, ..., N-1

Ordering
--------

Indices need not be provided in order.  For example, the following
index matrix is ordered according to dimension order `{0, 1, 2}`.

    [0 0 1]
    [0 1 1]
    [2 0 2]

However, you can provide an unordered version:

    [2 0 2]
    [0 0 1]
    [0 1 1]

If the SparseTensor is constructed without a provided order, then the
default order is `{-1, ..., -1}`.  Certain operations will fail or crash
when the order is not provided.

Resorting the SparseTensor in-place (which resorts the underlying index and
values tensors in-place) will update the order.  The cost of reordering the
matrix is `O(N*log(N))`, and it requires `O(N)` additional temporary space to
store a reordering index.  If the SparseTensor is left with the default order
and no reordering is performed, the following will happen:

*  `group()` will **raise an assertion failure**
*  `IndicesValid()` will **raise an assertion failure**

To update the internal index ordering after construction, call
`Reorder<T>()` via, e.g., `Reorder<T>({0,1,2})`.
After this step, all the above methods should work correctly.

The method `IndicesValid()` checks to make sure:

*  `0 <= ix(i, d) < shape.dim_size(d)`
*  indices do not repeat
*  indices are in order

Iterating
---------

### group({grouping dims})

*  provides an iterator that groups entries according to
   dimensions you care about
*  may require a sort if your data isn't presorted in a way that's
   compatible with grouping_dims
*  for each group, returns the group index (values of the group
   dims for this iteration), the subset of indices in this group,
   and the subset of values in this group.  These are lazy outputs,
   so to read them individually, copy them as per the example
   below.

#### **NOTE**
`group({dim0, ..., dimk})` will **raise an assertion failure** if the
order of the SparseTensor does not match the dimensions you wish to group by.
You must either have your indices in the correct order and construct the
SparseTensor with

    order = {dim0, ..., dimk, ...}

or call

    Reorder<T>({dim0, ..., dimk, ...})

to sort the SparseTensor before grouping.

Example of grouping:

    Tensor indices(DT_INT64, TensorShape({N, NDIMS}));
    Tensor values(DT_STRING, TensorShape({N}));
    TensorShape shape({dim0,...});
    SparseTensor sp(indices, values, shape);
    sp.Reorder<tstring>({1, 2, 0, 3, ...});  // Must provide NDIMS dims.
    // group according to dims 1 and 2
    for (const auto& g : sp.group({1, 2})) {
      cout << "vals of ix[:, 1,2] for this group: "
           << g.group()[0] << ", " << g.group()[1];
      cout << "full indices of group:\n" << g.indices();
      cout << "values of group:\n" << g.values();

      TTypes<int64>::UnalignedMatrix g_ix = g.indices();
      TTypes<tstring>::UnalignedVec g_v = g.values();
      ASSERT(g_ix.dimension(0) == g_v.size());  // number of elements match.
    }

ToDense
--------

Converts the sparse tensor to a dense one.  You must provide a pointer to the
dense tensor (preallocated).  `ToDense()` will optionally
preinitialize the tensor with zeros.

Shape checking is performed, as is boundary checking.

    Tensor indices(DT_INT64, TensorShape({N, NDIMS}));
    Tensor values(DT_STRING, TensorShape({N}));
    TensorShape shape({dim0,...});
    SparseTensor sp(indices, values, shape);
    ASSERT(sp.IndicesValid());  // checks ordering & index bounds.

    Tensor dense(DT_STRING, shape);
    // initialize other indices to zero.  copy.
    ASSERT(sp.ToDense<tstring>(&dense, true));

Concat
--------

Concatenates multiple SparseTensors and returns a new SparseTensor.
This concatenation is with respect to the "dense" versions of these
SparseTensors.  Concatenation is performed along dimension `order[0]`
of all tensors.  As a result, `shape[order[0]]` may differ across
the inputs, but `shape[d]` for `d != order[0]` must match across all inputs.

We call `order[0]` the **primary dimension**.

**Prerequisites**

*  The inputs' ranks must all match.
*  The inputs' `order[0]` must all match.
*  The inputs' shapes must all match except for dimension `order[0]`.
*  The inputs' values must all be of the same type.

If any of these are false, `Concat` will die with an assertion failure.

Example:
Concatenate two sparse matrices along columns.

Matrix 1:

    [0 0 1]
    [2 0 0]
    [3 0 4]

Matrix 2:

    [0 0 0 0 0]
    [0 1 0 0 0]
    [2 0 0 1 0]

Concatenated Matrix:

    [0 0 1 0 0 0 0 0]
    [2 0 0 0 1 0 0 0]
    [3 0 4 2 0 0 1 0]

Expected input shapes, orders, and `nnz()`:

    shape_1 = TensorShape({3, 3})
    shape_2 = TensorShape({3, 5})
    order_1 = {1, 0}  // primary order is 1, columns
    order_2 = {1, 0}  // primary order is 1, must match
    nnz_1 = 4
    nnz_2 = 3

Output shapes and orders:

    conc_shape = TensorShape({3, 8})  // primary dim increased, others same
    conc_order = {1, 0}  // Orders match along all inputs
    conc_nnz = 7  // Sum of nonzeros of inputs

Coding Example:

    Tensor ix1(DT_INT64, TensorShape({N1, 3}));
    Tensor vals1(DT_STRING, TensorShape({N1}));
    Tensor ix2(DT_INT64, TensorShape({N2, 3}));
    Tensor vals2(DT_STRING, TensorShape({N2}));
    Tensor ix3(DT_INT64, TensorShape({N3, 3}));
    Tensor vals3(DT_STRING, TensorShape({N3}));

    SparseTensor st1(ix1, vals1, TensorShape({10, 20, 5}), {1, 0, 2});
    SparseTensor st2(ix2, vals2, TensorShape({10, 10, 5}), {1, 0, 2});
    // For kicks, st3 indices are out of order, but order[0] matches so we
    // can still concatenate along this dimension.
    SparseTensor st3(ix3, vals3, TensorShape({10, 30, 5}), {1, 2, 0});

    SparseTensor conc = SparseTensor::Concat<string>({st1, st2, st3});
    Tensor ix_conc = conc.indices();
    Tensor vals_conc = conc.values();
    EXPECT_EQ(conc.nnz(), st1.nnz() + st2.nnz() + st3.nnz());
    EXPECT_EQ(conc.Shape(), TensorShape({10, 60, 5}));
    EXPECT_EQ(conc.Order(), {-1, -1, -1});

    // Reorder st3 so all input tensors have the exact same orders.
    st3.Reorder<tstring>({1, 0, 2});
    SparseTensor conc2 = SparseTensor::Concat<string>({st1, st2, st3});
    EXPECT_EQ(conc2.Order(), {1, 0, 2});
    // All indices' orders matched, so output is in order.
    EXPECT_TRUE(conc2.IndicesValid());
223