Workflow development Guide #

Workflows are all things that can be computed, broadly speaking.

For reproducibility, we want our workflows to be repeatable, that is, to produce the same output every time they are computed. This is easy to achieve to a first approximation, but it can be harder than it seems when the workflow relies on external resources. Since every execution is tracked, there is no need to worry about these subtleties at every moment.

Generally, we want workflows to be parametrized. Non-parametrized but strictly repeatable notebooks are less reusable, since they always produce the same output data.

One way to create a workflow in ODA is to build a Jupyter notebook.

Simple ODA Jupyter Notebook to a workflow #

Write a working repeatable workflow #

We recommend starting by creating your project as a Renku repository with this link; you will probably want to use the “astronomy/mmoda” namespace.

First, make sure your notebook runs in a cloud environment. It needs to be repeatable, i.e. you can run it many times. If it depends on external services, try to make sure the requests are also repeatable; you might need to specify sufficient details. If the notebook does not produce exactly the same result every time, that is unfortunate, but do not worry too much: it might still be reproducible (see motivation on the difference between reproducibility and repeatability).

  • write your notebook, and make sure it runs from top to bottom
  • make a requirements.txt with the modules you need for this notebook

You can use a mock lightcurve notebook as an example.
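As a small illustration of repeatability, the sketch below generates a mock light curve from a fixed random seed, so that repeated runs give identical values. The function name `mock_lightcurve` and its arguments are hypothetical and are not taken from the example notebook.

```python
import random


def mock_lightcurve(n_bins: int, seed: int = 0) -> list:
    """Return n_bins mock count rates drawn from a seeded generator."""
    # A local generator with an explicit seed avoids depending on global state.
    rng = random.Random(seed)
    return [rng.gauss(10.0, 1.0) for _ in range(n_bins)]


first = mock_lightcurve(5)
second = mock_lightcurve(5)
print(first == second)  # True: the same seed gives the same light curve
```

Without the explicit seed, each run would draw different values, and the notebook would no longer be strictly repeatable.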

Parametrize the notebook #

Create a cell with the tag “parameters” (see the papermill manual):

  • the names of the declared variables will be used as parameter names in the MMODA service (except for the default parameters, see below)
  • if not annotated, the types of the input parameters are determined from the parameter default values
  • one can annotate an input parameter by adding a comment containing a term from the ontology.
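To make the role of the tag concrete, here is a minimal sketch of how a tagged cell appears in the notebook file. It relies only on the fact that a notebook is a JSON document whose cells keep their tags under `metadata.tags`. The helper `tagged_cells` and the example notebook are hypothetical, and this is not how MMODA itself reads notebooks.

```python
def tagged_cells(notebook: dict, tag: str) -> list:
    """Return the source text of every code cell carrying the given tag."""
    found = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        if tag in cell.get("metadata", {}).get("tags", []):
            source = cell.get("source", "")
            # The source is stored either as one string or as a list of lines.
            found.append("".join(source) if isinstance(source, list) else source)
    return found


# Minimal in-memory notebook with one "parameters" cell (illustrative values).
example_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "metadata": {"tags": ["parameters"]},
            "source": [
                "n_bins = 5\n",
                'src_name = "Crab"  # http://odahub.io/ontology#AstrophysicalObject\n',
            ],
            "outputs": [],
            "execution_count": None,
        },
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print(n_bins)\n"],
            "outputs": [],
            "execution_count": None,
        },
    ],
}

print(tagged_cells(example_notebook, "parameters")[0])
```

For a notebook on disk, the same helper could be applied to the result of `json.load` on the `.ipynb` file.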

Default parameters #

Several default common parameters are always set by the MMODA frontend. These include:

Type annotation                                   Parameter default name
http://odahub.io/ontology#PointOfInterestRA       RA
http://odahub.io/ontology#PointOfInterestDEC      DEC
http://odahub.io/ontology#StartTime               T1
http://odahub.io/ontology#EndTime                 T2
http://odahub.io/ontology#AstrophysicalObject     src_name

If the notebook contains parameters annotated with these types, their names will be automatically converted by the dispatcher plugin to the default ones. If some of them are omitted, they will be added to the list of workflow parameters automatically.
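A “parameters” cell that interacts with these defaults might look like the sketch below. The variable names `ra_deg` and `dec_deg`, all values, and the placement of the annotation on the same line as the assignment are assumptions made for illustration.

```python
# Contents of a cell tagged "parameters" (illustrative values only).

# Annotated with default types: according to the rule above, these would be
# exposed under the default names RA and DEC rather than ra_deg and dec_deg.
ra_deg = 83.63  # http://odahub.io/ontology#PointOfInterestRA
dec_deg = 22.01  # http://odahub.io/ontology#PointOfInterestDEC

# T1, T2 and src_name are deliberately not declared here; according to the
# rule above, they would be added to the workflow parameters automatically.

# An ordinary workflow-specific parameter; without an annotation its type
# (int) is taken from the default value.
n_bins = 5
```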

Note

Note that both the target (Point of Interest) source name and the target source coordinates are passed to the workflow, and in principle there is no guarantee that the coordinates are those of the source. Indeed, the exact choice of coordinates for a given source depends on the energy band, desired precision, etc. For now, we leave it up to the workflow developer to reconcile these parameters.
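One possible way to reconcile the two, sketched here under strong assumptions, is to compare the coordinates passed to the workflow with a reference position that the workflow obtains for the source name, and to flag a large offset. The reference position, the tolerance, and the function name below are hypothetical; a real workflow would choose them to suit its energy band and precision.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2)
    sin_dra = math.sin((ra2 - ra1) / 2)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # min() guards against rounding slightly above 1 for antipodal points.
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


# Parameters as they might be passed to the workflow (illustrative values).
src_name = "Crab"
RA, DEC = 83.63, 22.01

# Hypothetical reference position the workflow has for src_name
# (approximate, for illustration only).
reference_ra, reference_dec = 83.6331, 22.0145

tolerance_deg = 0.1  # assumed tolerance, to be set by the workflow developer
offset = angular_separation_deg(RA, DEC, reference_ra, reference_dec)
if offset > tolerance_deg:
    print(f"warning: coordinates are {offset:.3f} deg away from {src_name}")
```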

Annotate the notebook outputs #

  • define the notebook outputs similarly, by creating a cell with the tag “outputs”.
    • outputs may be strings, floats, or lists
    • outputs may also be strings containing the names of valid files. In that case, the whole file will be considered an output.
  • if you want to give a more detailed description of the notebook inputs and outputs, use terms from the ontology.
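The output rules above can be sketched as follows. The first part stands in for results computed earlier in the notebook, and the last part shows what a cell tagged “outputs” might contain. All variable names, values, and the file name are hypothetical.

```python
import csv

# Stand-in for results computed earlier in the notebook (illustrative values).
times = [0.0, 1.0, 2.0, 3.0]
rates = [10.2, 9.8, 10.5, 9.9]

# Write a product file that one of the outputs will refer to.
with open("lightcurve.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time", "rate"])
    writer.writerows(zip(times, rates))

# --- contents of the cell tagged "outputs" ---
mean_rate = sum(rates) / len(rates)  # float output
rate_values = rates  # list output
summary = f"{len(rates)} bins"  # string output
lightcurve_file = "lightcurve.csv"  # names an existing file: the file is the output
```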

Publish your workflow as a test service #

  • once the automated bots have done their job, the workflow will be automatically installed in MMODA (by default, on a staging instance), and you will receive an email!

Try to access your new service #

  • Assuming lightcurve-example from above was used, and the notebook name was random, you can run this:
$ oda-api -u https://dispatcher-staging.odahub.io get -i lightcurve-example -p random -a n_bins=5
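The same request can probably also be made from Python with the `oda_api` package. The sketch below assumes that `DispatcherAPI` accepts a `url` argument and that `get_product` takes `instrument`, `product`, and the workflow parameters as keyword arguments; check the `oda_api` documentation before relying on it. The wrapper `fetch_products` is hypothetical.

```python
def fetch_products(
    url="https://dispatcher-staging.odahub.io",
    instrument="lightcurve-example",
    product="random",
    **parameters,
):
    """Request a product from the dispatcher and return whatever it sends back."""
    # Imported lazily so this sketch can be loaded without oda_api installed.
    from oda_api.api import DispatcherAPI

    dispatcher = DispatcherAPI(url=url)
    return dispatcher.get_product(
        instrument=instrument, product=product, **parameters
    )


# Example call (needs network access and a deployed workflow):
# products = fetch_products(n_bins=5)
```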

TODO: workflow version, plot here and in renku create

(optional) Try a test service #

  • install the nb2workflow tooling: pip install 'nb2workflow[cwl,service,rdf,mmoda]>=1.3.30' --upgrade. Note that this command should be the only one you need to install the necessary dependencies for the workflow engine. You may of course also need some domain-specific packages.
  • inspect the notebook: nbinspect my-notebook.ipynb
  • try to run the notebook: nbrun my-notebook.ipynb
    • it will use all default parameters
    • you can specify parameters as nbrun --inp-nbins=10 my-notebook.ipynb, if nbins happens to be one of the parameters
  • try to start the service: nb2service my-notebook.ipynb

Note

If you experience issues testing the service due to an “import error” or other strange messages, try the containerized service (note that it will not work in Renku):

  • nb2deploy $PWD test --local
  • then look at http://0.0.0.0:8000 for some metadata about the service
  • try to run some simple queries at http://0.0.0.0:8000/apidocs/
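A quick check that the local service responds can also be scripted. The sketch below only fetches the root URL mentioned above and makes no assumption about the service's API routes. The helper `fetch_text` is hypothetical.

```python
import urllib.request


def fetch_text(url: str, timeout: float = 5.0):
    """Return the response body as text, or None if the service is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except OSError:
        # Covers refused connections, timeouts and HTTP errors.
        return None


body = fetch_text("http://0.0.0.0:8000")
print("service not reachable" if body is None else body[:200])
```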

Note

If you still experience issues with your local environment, try developing the workflow directly in renkulab. Note that some commands, like nb2deploy, will not work in this case.

Developing service in Renku #

https://renkulab.io/

TODO: explain how to run server

(optional) Add some verification test cases #

To make sure your service does not break with future updates, it is useful to express some assumptions about the service outputs in a few reference cases. They will be tested automatically every time a new workflow version is installed.

We will explain later how to do this.