
Dockerizing Linked Data

How to Use the Bootstrap Script

You can configure your environment using a YAML (.yml) config file, which is read by the dld.py bootstrap script. Create a file named dld.yml; the script looks for this file by default. The script's behaviour can also be adjusted with the following command line switches:

-c or --config
specifies another file path from which to load the configuration YAML
-f or --file
specifies on-the-fly a local dump file to be imported (equivalent to a datasets entry with a file key)
-l or --location
specifies on-the-fly a dump file on the web to be downloaded and imported (equivalent to a datasets entry with a location key)
-u or --uri
specifies on-the-fly the default graph name (equivalent to the default_graph key under settings)
-w or --working-directory
specifies where to place the Docker Compose configuration and the dataset dumps copied or downloaded for import
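
To make the switches above concrete, here is a minimal sketch of how such a command line could be declared with Python's argparse module. It is a hypothetical re-creation, not the actual dld.py source: the help texts, the function name build_parser, and the default working directory wd-dld (inferred from the paths in the example output further down) are assumptions.

```python
import argparse


def build_parser():
    """Declare the switches described above (hypothetical re-creation)."""
    parser = argparse.ArgumentParser(prog="dld.py")
    parser.add_argument("-c", "--config", default="dld.yml",
                        help="path to the configuration YAML")
    parser.add_argument("-f", "--file",
                        help="local dump file to import")
    parser.add_argument("-l", "--location",
                        help="URL of a dump file to download and import")
    parser.add_argument("-u", "--uri",
                        help="default graph name")
    # The default "wd-dld" is an assumption taken from the example paths.
    parser.add_argument("-w", "--working-directory", default="wd-dld",
                        help="where to place the Compose file and dumps")
    return parser


# Parse a sample command line instead of sys.argv for demonstration.
args = build_parser().parse_args(["-f", "hello.ttl", "-u", "http://example.org/"])
print(args.config, args.file, args.uri)
# prints: dld.yml hello.ttl http://example.org/
```

Because -c has a default of dld.yml, a call without any switches falls back to the configuration file in the current directory, which matches the behaviour described above.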

dld.py will process the configuration YAML, taking care of the following steps for you:

  • collecting (by copy or download) the specified RDF dump files into a single location that is provided to the load component
  • generating a Docker Compose configuration file that defines the containers as specified and establishes the appropriate links, (shared) volumes, and ports exposed to the host system, so that you can interact with front-end components in your browser
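
The first of these steps can be pictured roughly as follows. This is a sketch under assumptions, not the code used by dld.py: the function name collect_dumps is hypothetical, and the models subdirectory is inferred from the volume paths in the example output further down.

```python
import shutil
import urllib.parse
import urllib.request
from pathlib import Path


def collect_dumps(datasets, working_dir):
    """Copy or download every dataset dump into <working_dir>/models.

    `datasets` is the mapping found under the `datasets` key of dld.yml.
    Only the `file` and `location` keys are handled in this sketch.
    """
    models_dir = Path(working_dir) / "models"  # assumed directory name
    models_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    for name, spec in datasets.items():
        if "file" in spec:
            # Local dump: copy it next to the other dumps.
            source = Path(spec["file"])
            target = models_dir / source.name
            shutil.copy(source, target)
        elif "location" in spec:
            # Remote dump: name the download after the last URL path segment.
            url = spec["location"]
            target = models_dir / Path(urllib.parse.urlparse(url).path).name
            urllib.request.urlretrieve(url, str(target))
        else:
            continue  # list files are not handled in this sketch
        collected.append(target)
    return collected
```

Gathering everything in one directory means that a single volume mount is enough to hand all dumps to the load component.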

The containers of the resulting orchestration coordinate themselves so that load components perform bulk imports into the store component. The present components know how to use the SPARQL endpoint exposed by the store to allow for navigation (and possibly authoring).

Understanding the Configuration File

The main part of the script's work is to translate the dld.yml file into a Docker Compose file. The script also downloads or copies your data into a given or default working directory and informs Docker Compose about the location of the provisioned data.

The following list explains all keywords that can be used in the dld configuration file:

datasets
Defines the set of data sources to be preloaded and used.
graph_name
Specifies the base URI for the data set. Can only be used inside a datasets statement.
location
Points to a dump file on the web. Can only be used inside a datasets statement.
location_list
Points to a local file listing dump files on the web to download and import. (One URL per line, empty lines allowed.) Can only be used inside a datasets statement.
file
Points to a local file dump. Can only be used inside a datasets statement.
file_list
Points to a local file listing dump files to import. (One file path per line, empty lines allowed.) Can only be used inside a datasets statement.
settings
Defines the set of settings.
default_graph
Sets the default graph for the triple store. Can only be used inside a settings statement.
components
Defines the different parts of your linked data environment as key-value mappings. Each value is either just an image name or a key-value mapping for further container configuration.
store
Sets the triple store container you want to use. Can only be used inside a components statement.
Example: aksw/dld-store-virtuoso7
load
Sets a component to perform bulk import into the store. Can only be used inside a components statement.
present
Collection of presentation components for your linked datasets. Either just an image name or a key-value mapping for further container configuration. Can only be used inside a components statement.
Example: aksw/dld-present-ontowiki
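
The location_list and file_list keys both point to a plain text file with one entry per line, where empty lines are allowed. A minimal sketch of reading such a file could look like this; the function name read_list_file is hypothetical and not taken from dld.py.

```python
def read_list_file(path):
    """Return the non-empty lines of a list file, stripped of whitespace.

    Works for both `file_list` (file paths) and `location_list` (URLs),
    since both formats are one entry per line with empty lines allowed.
    """
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

Each returned entry can then be treated like the value of a single file or location key.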

The keys under components and present (e.g. store, load, ontowiki) become part of the names of the created Docker containers. The corresponding values can be either just image names or key-value mappings themselves. These subordinate key-value pairs are copied unaltered as docker-compose settings for the created containers (see the docker-compose YAML reference).
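
As an illustration of the two value forms, the following sketch turns a components mapping into one settings mapping per container. It is an assumption-laden illustration rather than the dld.py implementation: the function names are hypothetical, present is assumed to be a mapping of named components, and the container name for a presentation component is formed by prefixing its key with "present", as the name presentontowiki in the example output further down suggests.

```python
def as_mapping(value):
    """Expand a bare image name into a docker-compose style mapping."""
    if isinstance(value, str):
        return {"image": value}
    return dict(value)  # already a mapping: copy it unaltered


def normalize_components(components):
    """Flatten `components` into {container_name: compose_settings}."""
    services = {}
    for key, value in components.items():
        if key == "present":
            # Assumed naming scheme: "present" + component key.
            for sub_key, sub_value in value.items():
                services["present" + sub_key] = as_mapping(sub_value)
        else:
            services[key] = as_mapping(value)
    return services


services = normalize_components({
    "store": "aksw/dld-store-virtuoso7",
    "present": {"ontowiki": {"image": "aksw/dld-present-ontowiki",
                             "ports": ["8080:80"]}},
})
print(sorted(services))
# prints: ['presentontowiki', 'store']
```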

Example

The following shows a simple example of a `dld.yml` file. Keys marked in yellow are introduced and processed by the dld bootstrap script and keys marked in blue are docker-compose keys.

Simple dld.yml file


datasets:
    site: #identifier for dataset
        graph_name: "http://example.org/"
        file: "hello.ttl"

components:
    store:
        image: aksw/dld-store-virtuoso7
        ports: ["8895:8890"]
        environment:
            PWDDBA: "dba"
    load:
        image: aksw/dld-load-virtuoso

settings:
    default_graph: "http://example.org/"

Output for Docker Compose


load:
  environment:
    DEFAULT_GRAPH: 'http://example.org/'
  image: aksw/dld-load-virtuoso
  links:
    - store
  volumes:
    - '/usr/src/app/examples/simple/wd-dld/models:/import'
  volumes_from:
    - store

store:
  environment:
    DEFAULT_GRAPH: 'http://example.org/'
    PWDDBA: dba
  image: aksw/dld-store-virtuoso7
  ports:
    - '8895:8890'
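
The relationship between this input and output can be approximated in a few lines of Python. The rules below are inferred only from the example pair above and are not taken from the dld.py source: every container receives DEFAULT_GRAPH in its environment, and the load container is additionally linked to the store and given the collected dumps under /import. The function name to_compose and the plain-dict representation of the parsed YAML are assumptions.

```python
def to_compose(config, models_dir):
    """Translate a parsed dld.yml (as plain dicts) into compose services."""
    default_graph = config.get("settings", {}).get("default_graph")
    services = {}
    for name, spec in config["components"].items():
        service = dict(spec)  # docker-compose keys are copied unaltered
        environment = dict(service.get("environment", {}))
        if default_graph is not None:
            environment["DEFAULT_GRAPH"] = default_graph
        service["environment"] = environment
        if name == "load":
            # Inferred from the example: the loader reaches the store and
            # sees the collected dumps under /import.
            service["links"] = ["store"]
            service["volumes"] = [f"{models_dir}:/import"]
            service["volumes_from"] = ["store"]
        services[name] = service
    return services


simple = {
    "datasets": {"site": {"graph_name": "http://example.org/",
                          "file": "hello.ttl"}},
    "components": {
        "store": {"image": "aksw/dld-store-virtuoso7",
                  "ports": ["8895:8890"],
                  "environment": {"PWDDBA": "dba"}},
        "load": {"image": "aksw/dld-load-virtuoso"},
    },
    "settings": {"default_graph": "http://example.org/"},
}
compose = to_compose(simple, "/usr/src/app/examples/simple/wd-dld/models")
```

Serialising compose with a YAML library would yield a file of the shape shown above.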

And now a more complicated example. Again, the yellow marked keys are processed by the dld bootstrap script and keys marked in blue are docker-compose keys.

Complicated dld.yml file


  datasets:
      site:
          graph_name: "http://pfarrerbuch.comiles.eu/"
          file: "pfarrerbuch.ttl"
      sachsen:
          graph_name: "http://pfarrerbuch.comiles.eu/sachsen/"
          file: "sachsen.ttl"
      hp-vocab:
          graph_name: "http://purl.org/voc/hp/"
          file: "hp-vocab.ttl"

  components:
      store:
          image: aksw/dld-store-virtuoso7
          volume: /tmp/volumes/virtuoso/
          environment:
              PWDDBA: "aslkdcoiqwecpqow"
      load:
          image: aksw/dld-load-virtuoso
      backup:
          image: aksw/dld-backup-virtuoso
          target:
              git: git@git.aksw.org:/pfarrerbuch.models
              sshkey: "id_rsa"
      present:
          ontowiki:
              image: aksw/dld-present-pfarrerbuch
              ports: '8080:80'

  settings:
      default_graph: "http://pfarrerbuch.comiles.eu/"

Output for Docker Compose


  load:
    environment:
        DEFAULT_GRAPH: 'http://pfarrerbuch.comiles.eu/'
    image: aksw/dld-load-virtuoso
    links:
        - store
    volumes:
        - '/usr/src/app/examples/pfarrerbuch/wd-dld/models:/import'
    volumes_from:
        - store

  presentontowiki:
    environment:
        DEFAULT_GRAPH: 'http://pfarrerbuch.comiles.eu/'
    image:
        - aksw/dld-present-pfarrerbuch
    links:
        - store
    ports:
        - '8080:80'

  store:
    environment:
        DEFAULT_GRAPH: 'http://pfarrerbuch.comiles.eu/'
        PWDDBA: aslkdcoiqwecpqow
    image: aksw/dld-store-virtuoso7
    volume: /tmp/volumes/virtuoso/

And something else...

You can dive a little deeper into the purpose of the Dockerizing project by reading the final presentation from the two graduated students who helped us during the semester.