Clustering Boolean Tensors
Much of the computational complexity of Boolean CP and Tucker3 tensor decompositions comes from the fact that the rank1 components can overlap. Yet, in many realworld applications, at least one mode can be considered nonoverlapping (for example, in subject–relation–object data the relations behave differently to subjects and objects). It is therefore natural to ask what can be gained (and what will be lost) if we restrict the Boolean CP decomposition by requiring one factor matrix to be a cluster assignment matrix, that is, a binary matrix where each row has exactly one nonzero.
We [1] phrase the problem as a maximumsimilarity problem (see Figure 1): Given a binary \(n\)by\(m\)by\(l\) tensor \(\tens{X}\) and an integer \(k\), our goal is to find matrices \(A\in\{0,1\}^{n\times k}\), \(B\in\{0,1\}^{m\times k}\), and \(C\in\{0,1\}^{l\times k}\), where \(C\) is a cluster assignment matrix, such that \(\tilde{\tens{X}} = A\otimes B\otimes C\) (\(\otimes\) denotes the outer product) is as similar to \(\tens{X}\) as possible.
Given a tensor \(\tens{X}\), for the optimal solution of the problem, we need binary matrices \(A\), \(B\), and \(C\) that maximize the similarity between \(\tens{X}\) and \(A\otimes B\otimes C\), which can be expressed as \(C(B\odot A)^T\) (\(\odot\) denotes the KhatriRao product) when we compare to the matricization of \(\tens{X}\) along the third mode, \(X_{(3)}\): \(\text{sim}(X_{(3)}C(B\odot A)^T)\). If we replace \(B\odot A\) with an arbitrary binary matrix, this would be equal to the Hypercube segmentation problem defined by Kleinberg et al. [2004]: Given a set \(S\) of \(l\) vertices of the \(d\)dimensional cube \(\{0,1\}^d\), find \(k\) vertices \(P_1,\ldots,P_k\in \{0,1\}^d\) and a partition of \(S\) into \(k\) segments to maximize \(\sum_{i=1}^k\sum_{c\in S} \text{sim}(P_i, c)\). Therefore we employ an algorithm that resembles those for hypercube segmentation, with the added restrictions to the centroids.
Our algorithm, SaBoTeur (Sampling for Boolean Tensor clustering), acts on the unfolded tensor \(X_{(3)}\). That is the tensor \(\tens{X}\) turned into a matrix by arranging its tubes (fibers of the third mode) as columns of a matrix. In each iteration, SaBoTeur samples \(k\) rows of \(X_{(3)}\) as the initial, unrestricted centroids. The next step is to turn these centroids into the restricted type. This means, the maximumsimilarity binary rank1 decomposition of each centroid has to be obtained. Then, SaBoTeur assigns each row of \(X_{(3)}\), i.e. each slice of \(\tens{X}\) to its closest restricted centroid. The sampling is repeated multiple times, and in the end the factors that gave highest similarity are returned.
We show that, like the hypercube segmentation problem, the maximum similarity binary rank1 decomposition also admits a PTAS. For hypercube segmentation, Alon and Sudakov [1999] present a randomized algorithm that on expectation attains a similarity within \((1  \epsilon)\) of the optimum and can be derandomized. To show the approximabilty of the maximum similarity binary rank1 decomposition, we proof that a variation of Alon's and Sudakov's algorithm solves the decomposition while the approximation bounds are maintained.
Source code
References

Clustering Boolean Tensors.
Data Mining and Knowledge Discovery
29(5),
,
1343–1373.
10.1007/s1061801504203
[tech. rep.  manuscript  pdf (Springer)  source code] 
Clustering Boolean Tensors.
arXiv:1501.00696 [cs.NA]
.
[pdf (arXiv)]