Triplify update vocabulary
When RDB data is published on the Web, e.g. as Linked Data, it is important to keep track of database (and hence RDF) updates, so that crawlers know what has changed since their last crawl and which resources should be re-retrieved from that endpoint.
A centralized registry (such as the one implemented by the PingTheSemanticWeb service) does not seem feasible once Linked Data becomes more popular: think of millions of Linked Data endpoints pinging such a registry every time a small change occurs.
The approach: Linked Data Update Logs
Each Linked Data endpoint publishes information about the updates performed within a certain timespan as a special, standardized Linked Data source.
Let's assume the company Example.com provides a Linked Data endpoint with information about its products, employees, etc. The endpoint is reachable at http://example.com/lod/.
The LOD endpoint contains a special LOD space below http://example.com/lod/update, which holds information about updates.
For example, http://example.com/lod/update returns the following RDF:
<http://example.com/lod/update/2007> rdf:type update:UpdateCollection .
<http://example.com/lod/update/2008> rdf:type update:UpdateCollection .
http://example.com/lod/update/2008 could then return the following RDF:
<http://example.com/lod/update/2008/Jan> rdf:type update:UpdateCollection .
<http://example.com/lod/update/2008/Feb> rdf:type update:UpdateCollection .
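The listings above could be generated from the timestamps of the logged updates. The following Python sketch shows one possible way to do this; it is not Triplify's implementation. `BASE`, `MONTHS`, `path_components` and `child_collections` are hypothetical names, the three-letter English month segments are inferred from the example URIs, and the `rdf:` and `update:` prefixes are assumed to be declared elsewhere. Only child collections that contain at least one update are listed.

```python
from datetime import datetime

# Base of the update space, taken from the example above.
BASE = "http://example.com/lod/update"

# Month segments as they appear in the example URIs (assumed English abbreviations).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def path_components(ts: datetime) -> list[str]:
    """Split a timestamp into the nested segments year/Mon/DD/hh/mm/ss."""
    return [str(ts.year), MONTHS[ts.month - 1], f"{ts.day:02d}",
            f"{ts.hour:02d}", f"{ts.minute:02d}", f"{ts.second:02d}"]


def child_collections(parent: list[str], updates: list[datetime]) -> list[str]:
    """Turtle triples for the non-empty child collections of `parent`.

    `parent` holds the segments below BASE, e.g. [] for the root or ["2008"].
    Children appear in chronological order because the updates are sorted first.
    """
    depth = len(parent)
    children: dict[str, None] = {}  # insertion-ordered set of child segments
    for ts in sorted(updates):
        parts = path_components(ts)
        if depth < len(parts) and parts[:depth] == parent:
            children.setdefault(parts[depth], None)
    prefix = "/".join([BASE, *parent])
    return [f"<{prefix}/{child}> rdf:type update:UpdateCollection ."
            for child in children]
```

For instance, `child_collections([], updates)` would list the year collections and `child_collections(["2008"], updates)` the month collections of 2008; a year without any updates simply yields an empty list.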
This nesting could continue until we finally reach a URL that exposes all updates performed within a single second. For very frequently updated LOD endpoints (e.g. Wikipedia), an interval of one second should be small enough that the related update information can still be retrieved easily. For rarely updated LOD endpoints (e.g. a personal Weblog), links should only point to non-empty Update Collections, in order to prevent crawlers from performing unnecessary HTTP requests.
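On the crawler side, the nesting lets a client skip every collection whose time span ended before its last crawl. The following Python sketch assumes that updates are logged as they happen, so a collection whose span ended before `last_crawl` holds nothing new; `second_collection_uri`, `collection_span` and `should_visit` are hypothetical helpers, not part of Triplify, and the month segments are again assumed to be English abbreviations.

```python
from datetime import datetime, timedelta

# Base of the update space, taken from the example above.
BASE = "http://example.com/lod/update"

# Month segments as they appear in the example URIs (assumed English abbreviations).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def second_collection_uri(ts: datetime) -> str:
    """URI of the collection holding all updates performed in the second `ts`."""
    return (f"{BASE}/{ts.year}/{MONTHS[ts.month - 1]}/{ts.day:02d}/"
            f"{ts.hour:02d}/{ts.minute:02d}/{ts.second:02d}")


def collection_span(path: list[str]) -> tuple[datetime, datetime]:
    """Half-open time span [start, end) covered by a collection path.

    `path` holds the segments below BASE, e.g. ["2008", "Jan", "01"].
    """
    year = int(path[0])
    month = MONTHS.index(path[1]) + 1 if len(path) > 1 else 1
    rest = [int(p) for p in path[2:]]  # day, hour, minute, second (if present)
    day = rest[0] if rest else 1
    hour, minute, second = (rest[1:] + [0, 0, 0])[:3]
    start = datetime(year, month, day, hour, minute, second)
    if len(path) == 1:  # a whole year
        end = datetime(year + 1, 1, 1)
    elif len(path) == 2:  # a whole month; December rolls over to next year
        end = datetime(year + month // 12, month % 12 + 1, 1)
    else:  # a day, hour, minute or second
        step = [timedelta(days=1), timedelta(hours=1),
                timedelta(minutes=1), timedelta(seconds=1)][len(path) - 3]
        end = start + step
    return start, end


def should_visit(path: list[str], last_crawl: datetime) -> bool:
    """True if the collection may contain updates made at or after `last_crawl`."""
    _, end = collection_span(path)
    return end > last_crawl
```

With a last crawl on 2008-01-15, `should_visit(["2007"], ...)` is false and `should_visit(["2008", "Jan"], ...)` is true, so the crawler descends only into January 2008 and later. Comparing the span end rather than the start is what allows whole subtrees to be pruned without requesting them.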
The collection http://example.com/lod/update/2008/Jan/01/17/58/06 would then, for example, contain RDF links (and additional metadata) to the Linked Data documents updated on January 1st, 2008 at 17:58:06, e.g. the following triples:
<http://example.com/lod/update/2008/Jan/01/17/58/06/user123> update:updatedResource <http://example.com/lod/users/JohnDoe> .
<http://example.com/lod/update/2008/Jan/01/17/58/06/user123> update:updatedAt "2008-01-01T17:58:06"^^xsd:dateTime .
<http://example.com/lod/update/2008/Jan/01/17/58/06/user123> update:updatedBy <http://example.com/lod/users/JohnDoe> .
Individual updates are identified by a sequential identifier (user123 in the example). Arbitrary metadata can be attached to these updates, such as the time of the update (arguably redundant, since it can be inferred from the URL) or the person who performed the update.
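Because the time and the performing user are optional metadata, a generator for the per-update triples might take them as parameters. This Python sketch is a hypothetical helper (`update_triples` is not a Triplify function). It writes `update:updatedAt` in the `YYYY-MM-DDThh:mm:ss` lexical form that `xsd:dateTime` requires and assumes the `update:` and `xsd:` prefixes are declared elsewhere.

```python
from datetime import datetime
from typing import Optional


def update_triples(update_uri: str, resource: str, when: datetime,
                   user: Optional[str] = None) -> list[str]:
    """Turtle triples describing a single update (prefixes declared elsewhere)."""
    subject = f"<{update_uri}>"
    # xsd:dateTime expects the lexical form YYYY-MM-DDThh:mm:ss.
    stamp = when.isoformat(timespec="seconds")
    triples = [
        f"{subject} update:updatedResource <{resource}> .",
        f'{subject} update:updatedAt "{stamp}"^^xsd:dateTime .',
    ]
    if user is not None:  # the performing user is optional metadata
        triples.append(f"{subject} update:updatedBy <{user}> .")
    return triples
```

Called with the URIs of the user123 example and `datetime(2008, 1, 1, 17, 58, 6)`, it would reproduce the three triples of that example; omitting `user` drops the `update:updatedBy` triple.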
Triplify automatically generates all resources in the update URI space when the Triplify configuration $triplify['queries'] contains a query named update. This query has to return at least two columns: the first contains the date when the update occurred, the second the id of the updated resource. An example is given below:
SELECT p.changed AS id, p.id AS 'update:updatedResource->project' FROM project p
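The mapping from query rows to update resources can be pictured as follows. This Python sketch only mimics the behaviour described above and is not Triplify's (PHP) implementation: rows are plain `(changed, id)` tuples standing in for the result of the `update` query, and the `update<N>` identifiers and the `http://example.com/lod/project/<id>` resource URIs are assumptions.

```python
from collections import defaultdict
from datetime import datetime

# Base of the example endpoint.
BASE = "http://example.com/lod"

# Month segments as they appear in the example URIs (assumed English abbreviations).
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def second_path(ts: datetime) -> str:
    """Nested path for one second, e.g. 2008/Jan/01/17/58/06."""
    return (f"{ts.year}/{MONTHS[ts.month - 1]}/{ts.day:02d}/"
            f"{ts.hour:02d}/{ts.minute:02d}/{ts.second:02d}")


def updates_from_rows(rows: list[tuple[datetime, int]]) -> dict[str, list[str]]:
    """Group (changed, id) rows of the update query into per-second collections.

    Returns a mapping from per-second collection URI to the Turtle triples of
    the updates it contains, in chronological order. Each update gets a
    sequential (hypothetical) identifier of the form update<N>.
    """
    collections = defaultdict(list)
    for seq, (changed, project_id) in enumerate(sorted(rows), start=1):
        coll_uri = f"{BASE}/update/{second_path(changed)}"
        # The '->project' suffix in the column alias marks the value as a link
        # to a project resource, assumed here to live at BASE/project/<id>.
        collections[coll_uri].append(
            f"<{coll_uri}/update{seq}> update:updatedResource "
            f"<{BASE}/project/{project_id}> ."
        )
    return dict(collections)
```

Two projects changed in the same second end up in the same per-second collection, while each change still gets its own update resource.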
The workings of the Triplify Linked Data Update Logs can be observed in Triplify's own datasource registry.
A collection of updates.
An atomic update performed on an RDF resource.
Represents the deletion of an RDF resource.
Points to the resource which was updated.
Refers to the date and time when a certain update occurred.
Points to the description of the user who performed the update.
Last Modification: 2008-08-01 13:48:21 by Elias Theodorou