Material
The slides are available as a PDF.
Outline of the tutorial
Part I: Definitions and Theory
- Background
- Binary factorisations under normal algebra
- Binary factorisations under Boolean algebra
- Binary factorisations under modulo-2 algebra
- Matrix ranks
- Different views on Boolean rank
- A note on inverses
- Computational complexity
- Open problems
Part II: Algorithms and Extensions
- Algorithms rank-1 factorisations
- Algorithms for normal algebra
- Algorithms for Boolean algebra
- Algorithms for modulo-2 algebra
- Selecting the rank
- Sparse matrices
- Open problems
Bibliography
Background
- Beasley, L.B. & Guterman, A.E., 2005. Rank inequalities over semirings. Journal of the Korean Mathematical Society, 42(2), pp. 223–241.
- Beasley, L.B. & Pullman, N.J., 1984. Boolean-rank-preserving operators and Boolean-rank-1 spaces. Linear Algebra and its Applications, 59, pp. 55–77.
- Beasley, L.B. & Pullman, N.J., 1988. Semiring rank versus column rank. Linear Algebra and its Applications, 101, pp. 33–48.
- Cayley, A., 1858. A memoir on the theory of matrices. Philosophical transactions of the Royal society of London, 148, pp. 17–37.
- Doherty, F.C.C., Lundgren, J.R. & Siewert, D.J., 1999. Biclique covers and partitions of bipartite graphs and digraphs and related matrix ranks of {0, 1}-matrices. Congressus Numerantium, 136, pp. 73–96.
- Gregory, D.A. & Pullman, N.J., 1983. Semiring rank: Boolean rank and nonnegative rank factorizations. Journal of Combinatorics, Information & System Sciences, 8(3), pp. 223–233.
- Kim, K.H., 1982. Boolean matrix theory and applications, New York: Marcel Dekker.
- Miettinen, P., 2009. Matrix Decomposition Methods for Data Mining: Computational Complexity and Algorithms. Department of Computer Science, University of Helsinki.
- Monson, S.D., Pullman, N.J. & Rees, R., 1995. A Survey of Clique and Biclique Coverings and Factorizations of (0,1)-Matrices. Bulletin of the ICA, 14, pp. 17–86.
- Peirce, C.S., 1873. Description of a notation for the logic of relatives, resulting from an amplification of the conceptions of Boole's calculus of logic. Memoirs of the American academy of arts and sciences, 9(2), pp. 317–378.
Computational Complexity
- Alon, N., Panigrahy, R. & Yekhanin, S., 2009. Deterministic Approximation Algorithms for the Nearest Codeword Problem. In 12th Intl. Workshop on Approximation Algorithms for Combinatorial Optimization Problems and 13th Intl. Workshop on Randomization and Computation. Springer Berlin Heidelberg, pp. 339–351.
- Amaldi, E. & Kann, V., 1995. The complexity and approximability of finding maximum feasible subsystems of linear relations. Theoretical Computer Science, 147(1-2), pp. 181–210.
- Amilhastre, J., Vilarem, M.-C. & Janssen, P., 1998. Complexity of Minimum Biclique Cover and Minimum Biclique Decomposition for Bipartite Domino-free Graphs. Discrete Applied Mathematics, 86(2-3), pp.125–144.
- Arora, S. et al., 1993. The Hardness of Approximate Optima in Lattices, Codes, and Systems of Linear Equations. In 34th Annual IEEE Symposium on Foundations of Computer Science. IEEE, pp. 724–733.
- Binkele-Raible, D. et al., 2010. Exact exponential-time algorithms for finding bicliques. Information Processing Letters, 111(2), pp. 64–67.
- Carr, R.D. et al., 2000. On the Red-Blue Set Cover Problem. In 11th Annual ACM-SIAM symposium on Discrete algorithms. pp. 345–353.
- Dinur, I. et al., 2003. Approximating CVP to Within Almost-Polynomial Factors is NP-Hard. Combinatorica, 23(2), pp. 205–243.
- Hochbaum, D.S., 1998. Approximating clique and biclique problems. Journal of Algorithms, 29(1), pp. 174–200.
- Miettinen, P., 2009. Matrix Decomposition Methods for Data Mining: Computational Complexity and Algorithms. Department of Computer Science, University of Helsinki.
- Miettinen, P., 2008. On the positive-negative partial set cover problem. Information Processing Letters, 108(4), pp. 219–221.
- Nau, D.S. et al., 1978. A Mathematical Analysis of Human Leukocyte Antigen Serology. Mathematical Biosciences, 40, pp. 243–270.
- Orlin, J., 1977. Contentment in graph theory: covering graphs with cliques. Indagationes Mathematicae, 80(5), pp. 406–424.
- Peeters, R., 2003. The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics, 131(3), pp. 651–654.
- Peleg, D., 2007. Approximation algorithms for the Label-CoverMAX and Red-Blue Set Cover problems. Journal of Discrete Algorithms, 5(1), pp. 55–64.
- Simon, H.U., 1990. On approximate solutions for combinatorial optimization problems. SIAM Journal on Discrete Mathematics, 3(2), pp. 294–310.
- Stockmeyer, L.J., 1975. The Set Basis Problem is NP-complete, IBM Thomas J. Watson Research Center.
Algorithms
- Bělohlávek, R. & Vychodil, V., 2010. Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences, 76(1), pp. 3–20.
- Koyutürk, M. & Grama, A., 2003. PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets. In 9th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM Request Permissions, pp. 147–156.
- Lu, H., 2011. Boolean matrix decomposition and extension with applications. Rutgers University.
- Lu, H., Vaidya, J. & Atluri, V., 2008. Optimal Boolean Matrix Decomposition: Application to Role Engineering. In 24th IEEE International Conference on Data Engineering. pp. 297–306.
- Lucchese, C., Orlando, S. & Perego, R., 2010. Mining Top-K Patterns from Binary Datasets in presence of Noise. In 2010 SIAM International Conference on Data Mining. pp. 165–176.
- Miettinen, P. et al., 2008. The Discrete Basis Problem. IEEE Transactions on Knowledge and Data Engineering, 20(10), pp. 1348–1362.
- Peleg, D., 2007. Approximation algorithms for the Label-CoverMAX and Red-Blue Set Cover problems. Journal of Discrete Algorithms, 5(1), pp. 55–64.
- Shen, B.-H., Ji, S. & Ye, J., 2009. Mining Discrete Patterns via Binary Matrix Factorization. In 15th ACM SIGKDD international conference on Knowledge discovery and data mining. New York, New York, USA: ACM Press, pp. 757–765.
- Zhang, Z.-Y. et al., 2010. Binary matrix factorization for analyzing gene expression data. Data Mining and Knowledge Discovery, 20(1), pp. 28–52.
Miscellaneous
- Bělohlávek, R. & Vychodil, V., 2010. Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences, 76(1), pp. 3–20.
- Cattell, R.B., 1966. The Scree Test For The Number Of Factors. Multivariate Behavioral Research, 1(2), pp. 245–276.
- Couturier, J.-F. & Kratsch, D., 2012. Bicolored independent sets and bicliques. Information Processing Letters, 112(8-9), pp. 329–334.
- Frank, M., Chehreghani, M.H. & Buhmann, J.M., 2011. The Minimum Transfer Cost Principle for Model-Order Selection. In D. Hutchison et al., eds. 2011 European Conference on Machine Learning and Knowledge Discovery in Databases – Part III. Springer Berlin Heidelberg, pp. 423–438.
- Miettinen, P., 2010. Sparse Boolean Matrix Factorizations. In 10th IEEE International Conference on Data Mining. pp. 935–940.
- Miettinen, P. & Vreeken, J., 2011. Model Order Selection for Boolean Matrix Factorization. In 20th ACM SIGKDD international conference on Knowledge discovery and data mining. pp. 51–59.
- Owen, A.B. & Perry, P.O., 2009. Bi-cross-validation of the SVD and the nonnegative matrix factorization. Annals of Applied Statistics, 3(2), pp. 564–594.
- Yeomans, K.A. & Golder, P.A., 1982. The Guttman–Kaiser criterion as a predictor of the number of common factors. The Statistician, 31(3), pp. 221–229.
Related Topics
- Besson, J. et al., 2006. Constraint-Based Mining of Fault-Tolerant Patterns from Boolean Data. In 4th International Workshop on Knowledge Discovery in Inductive Databases. pp. 55–71.
- Geerts, F., Goethals, B. & Mielikäinen, T., 2004. Tiling databases. In 7th International Conference on Discovery Science. pp. 77–122.
- Gionis, A., Mannila, H. & Seppänen, J.K., 2004. Geometric and Combinatorial Tiles in 0–1 Data. In 8th European Conference on Principles and Practice of Knowledge Discovery in Databases. pp. 173–184.
- Hochbaum, D.S., 1998. Approximating clique and biclique problems. Journal of Algorithms, 29(1), pp. 174–200.
- Kontonasios, K.-N. & De Bie, T., 2010. An information-theoretic approach to finding informative noisy tiles in binary databases. In 2010 SIAM International Conference on Data Mining. pp. 153–164.
- Lempel, A., 1975. Matrix Factorization over GF(2) and Trace-Orthogonal Bases of GF(2n). SIAM Journal on Computing, 4(2), pp. 175–186.
- Lu, H. et al., 2012. Constraint-Aware Role Mining Via Extended Boolean Matrix Decomposition. IEEE Transactions on Dependable and Secure Computing.
- Lu, H. et al., 2009. Extended Boolean Matrix Decomposition. In 9th IEEE International Conference on Data Mining. IEEE, pp. 317–326.
- Miettinen, P., 2012. On Finding Joint Subspace Boolean Matrix Factorizations. In 2012 SIAM International Conference on Data Mining. pp. 954–965.
- Miettinen, P., 2008. The Boolean Column and Column-Row Matrix Decompositions. Data Mining and Knowledge Discovery, 17(1), pp. 39–56.
- Pensa, R.G. & Boulicaut, J.-F., 2005. Towards fault-tolerant formal concept analysis. In 2005 Advances in Artificial Intelligence. pp. 212–223.
- Streich, A. et al., 2009. Multi-assignment clustering for Boolean data. In 26th Annual International Conference on Machine Learning. pp. 969–976.
- Vaidya, J. et al., 2009. Edge-RMP: Minimizing administrative assignments for role-based access control. Journal of Computer Security, 17(2), pp. 211–235.
- Wicker, J., Pfahringer, B. & Kramer, S., 2012. Multi-Label Classification Using Boolean Matrix Decomposition. In 27th Annual ACM Symposium on Applied Computing. New York, New York, USA: ACM Press, pp. 179–186.
- Xiang, Y. et al., 2011. Summarizing transactional databases with overlapped hyperrectangles. Data Mining and Knowledge Discovery, 23(2), pp. 215–251.