Boolean Tensor Factorization

Defining SPARQL with Boolean Tensor Operations

The Resource Description Framework (RDF) represents information in the form of subject–predicate–object triples. We can represent such RDF data using 3-way binary tensors: An element \((i,j,k)\) of \(\tens{T}\) is \(1\) if and only if the respective subject-predicate-object triple \((s_i,p_j,o_k)\) is present in the RDF graph \(G\).

An example for a simple SPARQL query that has a single triple pattern as basic graph pattern is SELECT * WHERE {?a \(G\):p\(_j\) \(G\):o\(_k\)}. The keyword SELECT acts as a projection operator. It identifies the variables to appear in the query result. It matches all RDF triples of \(G\) that have predicate p\(_j\) and object o\(_k\). If the RDF data is represented as a binary 3-way tensor \(\tens{T}\), the triple pattern selects the fiber \(t_{:jk}\). That is a vector of all subjects with predicate \(j\) and object \(k\). This vector has a \(1\) at positions \(i\) that correspond to an RDF triple \((s_i,p_j,o_k)\) present in the RDF graph \(G\). A slice \(T_{:j:}\) of \(\tens{T}\) would be selected if only one mode was fixed by the query as in SELECT * WHERE {?a \(G\):p\(_j\) ?b}.

It turns out that most SPARQL operations, especially the joins, can be expressed using the Khatri–Rao product [1]. Thus, insights on the computation of the Khatri–Rao can lead to direct real-world benefits. This re-interpretation also allows us to use techniques more familiar in relational data bases natively with RDF data.

References

  1. Saskia Metzler Pauli Miettinen Join Size Estimation on Boolean Tensors of RDF Data. WWW 2015 Companion Volume, , 7778.
    10.1145/2740908.2742738
    [manuscript | tech. rep. | pdf (WWW) | pdf (ACM)]
  2. Saskia Metzler Pauli Miettinen On Defining SPARQL with Boolean Tensor Algebra. arXiv:1503.00301 [cs.DB]
    [pdf (arXiv)]