Welcome to DASF’s documentation!

The Data Analytics Software Framework (DASF) helps scientists conduct data analysis in distributed IT infrastructures by sharing data analysis tools and data. For this purpose, DASF defines a remote procedure call (RPC) messaging protocol that uses a central message broker instance. Scientists can augment their tools and data with this protocol to share them with others or reuse them in different contexts.

Language support

The DASF RPC messaging protocol is based on JSON and uses WebSockets for the underlying data exchange. The protocol is therefore language agnostic: any language with WebSocket support can be used. As a start, DASF provides two ready-to-use language bindings for the messaging protocol, one for Python and one for TypeScript.
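Because the protocol is plain JSON, any language that can serialize JSON and open a WebSocket can take part. The following minimal Python sketch illustrates the general idea of a JSON-encoded RPC request and its round trip through serialization. The field names (`request_id`, `function`, `arguments`) and the function name `compute_mean` are hypothetical and chosen for illustration only; they are not the actual DASF message schema, which is defined by the language bindings.

```python
import json
import uuid


def build_request(function, arguments, request_id=None):
    """Serialize a hypothetical RPC request to a JSON string.

    The keys used here are illustrative assumptions, not the DASF wire format.
    """
    message = {
        # Correlation id so that a response can be matched to its request.
        "request_id": request_id or str(uuid.uuid4()),
        "function": function,
        "arguments": arguments,
    }
    return json.dumps(message, sort_keys=True)


def parse_request(raw):
    """Parse a JSON string back into a dict and check the required keys."""
    message = json.loads(raw)
    missing = {"request_id", "function", "arguments"} - message.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return message


if __name__ == "__main__":
    raw = build_request("compute_mean", {"values": [1, 2, 3]})
    print(parse_request(raw)["function"])
```

Since both sides only agree on a JSON document, the same request could be produced by a TypeScript client and consumed by a Python backend, or the other way round.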

Needed infrastructure

DASF relies on Apache Pulsar as its underlying message broker. Apache Pulsar can be set up in various ways, e.g. locally, in Docker, or in a cluster. Please refer to the corresponding Pulsar documentation for your own setup. We tested DASF with Pulsar version 2.7.*; later versions should generally work as well.

Docker Image

The Apache Pulsar Docker image is available on Docker Hub as apachepulsar/pulsar-standalone.

You can start a standalone instance of Pulsar with:

    docker run -d --name pulsar -p 80:80 -p 8080:8080 -p 6650:6650 apachepulsar/pulsar-standalone:2.7.4
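Once the container is running, a quick sanity check is whether the published ports accept TCP connections. The sketch below assumes the broker runs on `localhost` and probes port 6650 (Pulsar's binary protocol) and port 8080 (HTTP, which also serves the WebSocket endpoints). The helper name `port_is_open` is hypothetical and not part of DASF or Pulsar.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the docker run command; "localhost" is an assumption.
    for port in (6650, 8080):
        state = "open" if port_is_open("localhost", port) else "closed"
        print(f"localhost:{port} is {state}")
```

An open port only shows that something is listening; it does not prove that Pulsar has finished starting, so a short wait and retry may be needed right after `docker run`.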

WebSocket Service

Since the DASF RPC messaging protocol uses WebSockets, Pulsar's WebSocket service has to be enabled. If you set up the standalone variant, this should already be the case.

To enable the WebSocket service for the other variants, please consult the corresponding Pulsar documentation.
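To give an idea of what travels over Pulsar's WebSocket service, the following sketch builds the producer and consumer endpoint URLs and wraps a JSON body in a Pulsar WebSocket frame (a JSON object whose `payload` is base64-encoded), then decodes a consumer frame and prepares the acknowledgment. It assumes a standalone broker on `localhost:8080` with the default `public/default` tenant and namespace; the topic name, subscription name, and message body are hypothetical. How DASF itself maps its RPC messages onto these frames is handled by the language bindings and is not shown here.

```python
import base64
import json


def producer_url(topic, host="localhost", port=8080,
                 tenant="public", namespace="default"):
    """Build the Pulsar WebSocket producer endpoint for a persistent topic."""
    return (f"ws://{host}:{port}/ws/v2/producer/persistent/"
            f"{tenant}/{namespace}/{topic}")


def consumer_url(topic, subscription, host="localhost", port=8080,
                 tenant="public", namespace="default"):
    """Build the Pulsar WebSocket consumer endpoint for a subscription."""
    return (f"ws://{host}:{port}/ws/v2/consumer/persistent/"
            f"{tenant}/{namespace}/{topic}/{subscription}")


def encode_frame(body, context="1"):
    """Wrap a JSON-serializable body in a producer frame (base64 payload)."""
    payload = base64.b64encode(json.dumps(body).encode("utf-8")).decode("ascii")
    return json.dumps({"payload": payload, "context": context})


def decode_frame(raw):
    """Decode a consumer frame; return the body and the ack frame to send back."""
    frame = json.loads(raw)
    body = json.loads(base64.b64decode(frame["payload"]).decode("utf-8"))
    ack = json.dumps({"messageId": frame["messageId"]})
    return body, ack


if __name__ == "__main__":
    # "demo-topic" and "demo-sub" are placeholder names.
    print(producer_url("demo-topic"))
    print(consumer_url("demo-topic", "demo-sub"))
    print(encode_frame({"hello": "world"}))
```

The functions are deliberately free of network calls, so they can be combined with any WebSocket client library.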

Open Source and Open Science

License

All DASF modules are released under the Apache-2.0 license.

Repository

The individual DASF modules are developed in the following Git group. Feel free to check out the source code or leave a comment via the service desk or directly via the issue tracker.

Citation

If you use DASF in your own work, please cite it using the following DOI reference.

Eggert, Daniel; Dransch, Doris (2021): DASF: A data analytics software framework for distributed environments. GFZ Data Services. https://doi.org/10.5880/GFZ.1.4.2021.004

Acknowledgment

DASF is developed at the GFZ German Research Centre for Geosciences (https://www.gfz-potsdam.de) and was funded by the Initiative and Networking Fund of the Helmholtz Association through the Digital Earth project (https://www.digitalearth-hgf.de/).