Welcome to DASF’s documentation!
The Data Analytics Software Framework (DASF) helps scientists conduct data analysis in distributed IT infrastructures by sharing data analysis tools and data. For this purpose, DASF defines a remote procedure call (RPC) messaging protocol that uses a central message broker instance. Scientists can augment their tools and data with this protocol to share them with others or to re-use them in different contexts.
Language support
The DASF RPC messaging protocol is based on JSON and uses WebSockets for the underlying data exchange. The protocol is therefore language agnostic: any language with WebSocket support can be used. As a start, DASF provides two ready-to-use language bindings for the messaging protocol, one for Python and one for TypeScript.
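To illustrate what "JSON over WebSockets" means at the transport level, here is a minimal sketch that wraps a JSON request in the frame layout used by Pulsar's WebSocket producer endpoint (a JSON object whose payload field is Base64 encoded). The request fields func_name and args are hypothetical placeholders and are not taken from the DASF protocol definition; the helper names encode_frame and decode_frame are likewise made up for this example.

```python
import base64
import json


def encode_frame(request: dict, context: str = "1") -> str:
    """Wrap a JSON-serializable request in a Pulsar WebSocket producer frame."""
    body = json.dumps(request).encode("utf-8")
    frame = {
        # Pulsar's WebSocket API expects the payload as a Base64 string.
        "payload": base64.b64encode(body).decode("ascii"),
        # Application-defined identifier, echoed back in the broker's reply.
        "context": context,
    }
    return json.dumps(frame)


def decode_frame(frame_text: str) -> dict:
    """Recover the JSON request from a frame produced by encode_frame."""
    frame = json.loads(frame_text)
    return json.loads(base64.b64decode(frame["payload"]).decode("utf-8"))


# Hypothetical request; the real DASF message schema may differ.
request = {"func_name": "hello_world", "args": {"name": "DASF"}}
frame = encode_frame(request)
assert decode_frame(frame) == request
```

Because the frame is plain JSON text, the same structure can be produced from any language with a JSON library and a WebSocket client, which is what makes the protocol language agnostic.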
Needed infrastructure
DASF relies on Apache Pulsar as its underlying message broker. Apache Pulsar can be set up in various ways, e.g. locally, with Docker, or in a cluster. Please refer to the corresponding documentation for your own setup. We tested DASF with version 2.7.*, but later versions should generally also be supported.
Docker Image
The Apache Pulsar Docker image is available on Docker Hub as apachepulsar/pulsar-standalone.
You can start a standalone instance of Pulsar with:
docker run -d --name pulsar -p 80:80 -p 8080:8080 -p 6650:6650 apachepulsar/pulsar-standalone:2.7.4
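After starting the container, it can be useful to confirm that the published ports accept connections. The following is a minimal sketch using only the Python standard library; it assumes the container runs on localhost and checks ports 6650 (Pulsar's binary protocol) and 8080 (HTTP and WebSocket) from the command above. The helper name port_open is made up for this example.

```python
import socket


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Ports taken from the docker command above; "localhost" is an assumption.
    for name, port in (("binary protocol", 6650), ("HTTP/WebSocket", 8080)):
        state = "reachable" if port_open("localhost", port) else "not reachable"
        print(f"Pulsar {name} port {port}: {state}")
```

An open port only shows that something is listening; the broker may still need a few seconds after container start before it accepts requests.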
WebSocket Service
Since the DASF RPC messaging protocol uses WebSockets, Pulsar's WebSocket service has to be enabled. If you set up the standalone variant, it should already be enabled.
To enable the WebSocket service for the other variants, please consult the corresponding documentation.
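Once the WebSocket service is enabled, clients reach topics through URLs of the form documented for Pulsar's WebSocket API. The sketch below builds such URLs. It assumes a standalone instance on localhost:8080 with the default public tenant and default namespace; the topic and subscription names are hypothetical, and pulsar_ws_url is a helper made up for this example, not part of DASF or Pulsar.

```python
from urllib.parse import quote


def pulsar_ws_url(role, topic, subscription=None, host="localhost", port=8080,
                  tenant="public", namespace="default"):
    """Build a Pulsar WebSocket endpoint URL for a producer or a consumer."""
    if role not in ("producer", "consumer"):
        raise ValueError("role must be 'producer' or 'consumer'")
    if role == "consumer" and not subscription:
        raise ValueError("a consumer URL needs a subscription name")

    parts = ["ws", "v2", role, "persistent", tenant, namespace, topic]
    if role == "consumer":
        parts.append(subscription)
    # Percent-encode each path segment so unusual names stay valid in a URL.
    path = "/".join(quote(part, safe="") for part in parts)
    return f"ws://{host}:{port}/{path}"


# Hypothetical topic and subscription names.
print(pulsar_ws_url("producer", "my-topic"))
print(pulsar_ws_url("consumer", "my-topic", subscription="my-sub"))
```

For a broker served over TLS the scheme would be wss instead of ws; that variant is left out here to keep the sketch short.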
Open Source and Open Science
License
All DASF modules are released under the Apache-2.0 license.
Repository
The individual DASF modules are developed in the following Git group. Feel free to check out the source code or leave a comment via the service desk or directly via the issue tracker.
Gitlab Repository URL
Citation
If you use DASF in your own work, please cite it using the following DOI reference.
Citation DOI
Eggert, Daniel; Dransch, Doris (2021): DASF: A data analytics software framework for distributed environments. GFZ Data Services. https://doi.org/10.5880/GFZ.1.4.2021.004
Acknowledgment
DASF is developed at the GFZ German Research Centre for Geosciences (https://www.gfz-potsdam.de) and was funded by the Initiative and Networking Fund of the Helmholtz Association through the Digital Earth project (https://www.digitalearth-hgf.de/).