Defining SPARQL with Boolean Tensor Operations
The Resource Description Framework (RDF) represents information in the form of subject–predicate–object triples. We can represent such RDF data using 3-way binary tensors: An element \((i,j,k)\) of \(\tens{T}\) is \(1\) if and only if the respective subject-predicate-object triple \((s_i,p_j,o_k)\) is present in the RDF graph \(G\).
An example for a simple SPARQL query that has a single triple pattern as basic graph pattern is SELECT * WHERE {?a
\(G\):p
\(_j\) \(G\):o
\(_k\)}
. The keyword SELECT
acts as a projection operator. It identifies the variables to appear in the query result.
It matches all RDF triples of \(G\) that have predicate p
\(_j\) and object o
\(_k\). If the RDF data is represented as a binary 3-way tensor \(\tens{T}\), the triple pattern selects the fiber \(t_{:jk}\). That is a vector of all subjects with predicate \(j\) and object \(k\). This vector has a \(1\) at positions \(i\) that correspond to an RDF triple \((s_i,p_j,o_k)\) present in the RDF graph \(G\). A slice \(T_{:j:}\) of \(\tens{T}\) would be selected if only one mode was fixed by the query as in SELECT * WHERE {?a
\(G\):p
\(_j\) ?b}
.
It turns out that most SPARQL operations, especially the joins, can be expressed using the Khatri–Rao product [1]. Thus, insights on the computation of the Khatri–Rao can lead to direct real-world benefits. This re-interpretation also allows us to use techniques more familiar in relational data bases natively with RDF data.
References
-
Join Size Estimation on Boolean Tensors of RDF Data.
WWW 2015 Companion Volume,
,
77–78.
10.1145/2740908.2742738
[manuscript | tech. rep. | pdf (WWW) | pdf (ACM)] -
On Defining SPARQL with Boolean Tensor Algebra.
arXiv:1503.00301 [cs.DB]
[pdf (arXiv)]