Creating and executing an analysis¶

The purpose of “analyses” in a paper version is to automate the generation of variable files, table data files and figures for inclusion in the manuscript. You write Python functions that generate summary data, generate table data and perform plotting. In an analysis specification file (spec.yaml) you then declare how these functions should be used to generate the variable/table/figure files.

The Python functions should be located in one or more .py modules in the analysis folder. Alternatively, if you would like to work with the analysis functions interactively, an IPython notebook (.ipynb) can also be used (see Using IPython notebooks).

Creating variable files¶

A data summary function for creating variable files should return a (nested) dictionary of strings, numbers or lists. Numpy arrays will be automatically converted to lists before the dictionary is saved to a YAML file in the variables folder.

Within and across analyses, all YAML variable files are merged prior to the preprocessing step. It is therefore important to make sure that variable names are unique and do not clash.
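
Because the variable files are merged, it can help to check for name clashes yourself before building. The following is a minimal sketch of such a check, not PaperBuilder's own merge logic: the helper `merge_variables`, the file names and the values are all hypothetical, and it assumes that a clash means two files defining the same top-level key.

```python
def merge_variables(named_dicts):
    """Merge several variable dictionaries, refusing duplicate top-level keys.

    `named_dicts` maps a file name to the dictionary loaded from that file.
    """
    merged = {}
    origin = {}  # remembers which file first defined each key
    for filename, variables in named_dicts.items():
        for key, value in variables.items():
            if key in merged:
                raise ValueError(
                    f"variable '{key}' is defined in both "
                    f"{origin[key]} and {filename}"
                )
            merged[key] = value
            origin[key] = filename
    return merged


# Hypothetical contents of two variable files (made-up values).
files = {
    "behavior.stats.yaml": {"n_subjects": 12, "mean_rt": 0.43},
    "behavior.tests.yaml": {"wilcoxon": {"p": 0.03}},
}
merged = merge_variables(files)
```

Raising an error on the first duplicate makes a clash visible immediately instead of letting one value silently replace another.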

In the spec.yaml, data summary functions need to be declared in a summary block. Below is an example in which stats and tests are the identifiers of the summaries to be generated, and for each a function in <module>.<function> notation is indicated. Optionally, extra keyword arguments are given as well. Note that a Python module <module>.py needs to exist in the analysis folder. The name of the variable files is formed from the name of the analysis and the summary identifier, i.e. <analysis>.<summary>.yaml.

summary:
  stats: module.stats_function
  tests:
    function: module.tests_function
    args:
      test: wilcoxon

See Insert variable values for more information about how to refer to variables in the document file.
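
As an illustration, a data summary function matching the `module.stats_function` entry above might look like the sketch below. The data values and the optional `data` argument are made up for the example; a real function would load its own data. It returns only plain Python types, although, as noted above, numpy arrays would also be converted to lists.

```python
import statistics


def stats_function(data=None):
    """Return summary values as a nested dictionary of plain Python types."""
    if data is None:
        # Made-up reaction times (seconds), for illustration only.
        data = {"low": [0.41, 0.39, 0.45], "high": [0.35, 0.33, 0.37]}
    summary = {}
    for group, values in data.items():
        summary[group] = {
            "n": len(values),
            "mean": round(statistics.mean(values), 3),
            "values": list(values),
        }
    return summary


result = stats_function()
```

The nested keys (here `low` and `high`, each with `n`, `mean` and `values`) become the structure of the saved YAML variable file.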

Creating tables¶

Functions for generating tables should return two outputs: a list of column names and a sequence of rows (where each row is a sequence of table cell data). The output of the function will be saved as a comma-separated csv file.

In the spec.yaml file, table generating functions need to be declared in a tables block. Below is an example in which table1 and table2 are the identifiers of the tables to be generated, and for each a function in <module>.<function> notation is indicated. Optionally, extra keyword arguments are given as well. Note that a Python module <module>.py needs to exist in the analysis folder. The name of the csv files is formed from the name of the analysis and the table identifier, i.e. <analysis>.<table>.csv.

tables:
  table1: module.table_one
  table2:
    function: module.table_two
    args:
      nrows: 3

See Insert tables for more information about how to incorporate tables from csv files in the document file.
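
A table function matching the `module.table_two` entry above could be sketched as follows. The column names and cell values are invented for the example. The second half uses Python's standard csv module to show roughly what the saved file would contain; how PaperBuilder itself writes the file is not shown here and may differ in detail.

```python
import csv
import io


def table_two(nrows=3):
    """Return (column names, rows) for a table with at most `nrows` rows."""
    columns = ["group", "n", "mean_rt"]
    # Made-up values, for illustration only.
    all_rows = [
        ("low", 12, 0.42),
        ("high", 12, 0.35),
        ("control", 10, 0.47),
        ("pilot", 4, 0.51),
    ]
    return columns, all_rows[:nrows]


columns, rows = table_two(nrows=3)

# Write the header and rows to an in-memory buffer to preview the csv text.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(rows)
csv_text = buffer.getvalue()
```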

Creating figures¶

To create a figure, you first have to define a figure layout that sets the location of the axes for plotting. Individual axes or groups of axes are labeled and organized in a (possibly nested) dictionary. There are three ways in which a layout can be specified in the figures block in the spec.yaml file:

grid layout: a regular grid of axes created by a call to matplotlib’s subplots function. You specify the number of rows and columns, as well as any additional arguments that should be passed to the subplots function. By default, axes are organized in a flat map and labeled ax1, ax2, etc. You may also provide a custom label prefix or a custom list of labels by specifying the label option.

Optionally, axes may be grouped column-wise or row-wise using the group option. By default, the groups and the axes inside a group are labeled col1, col2, … or row1, row2, … (depending on the grouping dimension). A custom label prefix or list of labels can be specified for groups and axes inside groups by the group_label and label options respectively.

If the array option is set to True, then (grouped) axes are organized in an array and not labeled individually. If no grouping is performed, then the label option must be specified as a string to set the label of the axes array. If grouping is performed, then group labels are determined as described before and the label option is ignored.

Here is an example figure layout definition that creates a figure with a 2x2 grid of axes, grouped by column, and specifies custom labels.
```
figures:
  main:
    layout:
      kind: grid
      nrows: 2
      ncols: 2
      group: columns
      group_label: column
      label: [top, bottom]
      args:
        figsize: [8,4]
```
The resulting hierarchy of labeled axes would be:
```
-column1
  ├─ top
  └─ bottom
-column2
  ├─ top
  └─ bottom
```
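
To make the column-wise grouping concrete, the sketch below builds the same nested dictionary from a 2x2 grid. It only illustrates the resulting data structure and is not PaperBuilder's implementation; the helper `group_by_columns` is hypothetical, and strings stand in for matplotlib axes so that the example runs without matplotlib.

```python
def group_by_columns(grid, group_label="col", labels=None):
    """Arrange a rows-by-columns grid of axes into a nested dictionary.

    `grid` is any 2D sequence, for example the axes array returned by
    matplotlib.pyplot.subplots(nrows, ncols, squeeze=False).
    """
    nrows = len(grid)
    ncols = len(grid[0])
    if labels is None:
        # Default labels for the axes inside each column group.
        labels = [f"row{r + 1}" for r in range(nrows)]
    if len(labels) != nrows:
        raise ValueError("need exactly one label per row")
    layout = {}
    for c in range(ncols):
        layout[f"{group_label}{c + 1}"] = {
            labels[r]: grid[r][c] for r in range(nrows)
        }
    return layout


# Placeholder strings instead of real axes objects.
grid = [["ax00", "ax01"], ["ax10", "ax11"]]
layout = group_by_columns(grid, group_label="column", labels=["top", "bottom"])
```

Here `layout["column1"]["top"]` refers to the upper-left axes and `layout["column2"]` to the whole second column.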
svg layout: a layout is created from an svg drawing in which rectangles are tagged as axes. Groups of axes can also be specially tagged to create a hierarchy. PaperBuilder uses the FigureFirst Python module. See the FigureFirst documentation for details on how to tag rectangles and groups in Inkscape. It is strongly recommended to use a separate layer in Inkscape for the layout elements and to put other drawings that need to appear above or below the plots in dedicated overlay/underlay layers. To define an svg layout in the spec.yaml, provide the name of the svg file. Below are three examples (for figures main, suppl1 and suppl2) that show how this can be done:
```
figures:
  main: main_layout.svg
  suppl1:
    layout: suppl1_layout.svg
    style:
     - seaborn-white
     - lines.linewidth: 5
       font.size: 9
  suppl2:
    layout:
      kind: svg
      file: suppl2_layout.svg
      output_layer: output
      hide_layer: layout
```
(Note that the style block for the suppl1 figure will be explained later). The suppl2 example shows two extra options that can be set. output_layer sets the name of the layer in the svg file in which the axes are drawn (default: output). hide_layer sets the name(s) of layer(s) that need to be hidden, which is usually the layer that contains the layout (default: layout).
custom layout: a figure and axes layout is created by a custom Python function. The function should return two values: the matplotlib figure object and a (nested) dictionary of axes. In the spec.yaml file, the function and (optionally) extra arguments for the function can be specified in the following ways:
```
figures:
  main: module.main_layout
  suppl1:
    layout: module.suppl1_layout
    style:
     - seaborn-white
     - lines.linewidth: 5
       font.size: 9
  suppl2:
    layout:
      kind: function
      function: module.suppl2_layout
      args:
        n: 1
```
Note that the custom layout function could also perform all the necessary plotting to fully create the figure, without making use of the plotting functions (see below). The downside of this approach is the strong coupling between layout and content generation, which does not allow the flexible reuse of (parameterized) plots across layouts.
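
A custom layout function of the kind referenced as `module.suppl2_layout` above might be sketched like this. The meaning given to the `n` argument (number of stacked detail panels), the axes labels and the figure size are assumptions made for the example.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def suppl2_layout(n=1):
    """Return a figure and a nested dictionary of labeled axes."""
    fig = plt.figure(figsize=(8, 4))
    # One wide overview panel on the left, n stacked detail panels on the right.
    grid = fig.add_gridspec(nrows=n, ncols=2, width_ratios=[2, 1])
    axes = {
        "overview": fig.add_subplot(grid[:, 0]),
        "detail": {
            f"panel{i + 1}": fig.add_subplot(grid[i, 1]) for i in range(n)
        },
    }
    return fig, axes


fig, axes = suppl2_layout(n=2)
```

Keeping the function limited to creating axes, and leaving the drawing to separate plotting functions, avoids the coupling between layout and content described above.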

Creating plot content¶

To plot the data in a figure, plotting functions need to be mapped to the labeled axes or groups of axes that were defined in the figure layout. The first argument to the plotting function will be the destination axes, an array of axes or a dictionary representing a group of axes. To define which plotting function should be used for which (group of) axes, one could put the following in the spec.yaml:

plots:
  column1.top: module.plot1
  column2:
    function: module.plot2
    args:
      npoints: 100
    style:
      - default
      - lines.linewidth: 1

The plots section in the spec.yaml file maps a (possibly nested) label in the figure layout to a Python function in a local <module>.py, with optional extra arguments. Deeper levels of the axes dictionary in the figure layout are addressed using dot notation. Given the grid layout example presented previously, the example above maps the plot1 function to the top axes in the first column and the plot2 function to the whole second column (i.e. a dictionary with the top and bottom axes). Note that a plotting style is also defined for the column2 entry; this is explained further below.
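
The sketch below shows what a plotting function for a group of axes could look like, together with a small helper that mimics dot-notation lookup in a nested axes dictionary. Both are illustrative: `resolve` is a hypothetical helper and not part of PaperBuilder, and the sine curves are placeholder content.

```python
import math


def resolve(layout, dotted_label):
    """Follow a dotted label such as 'column1.top' through nested dicts."""
    target = layout
    for part in dotted_label.split("."):
        target = target[part]
    return target


def plot2(axes, npoints=100):
    """Draw a sine curve into every axes of a group.

    `axes` is the dictionary for one group of the layout, mapping labels
    to matplotlib axes. `npoints` must be at least 2.
    """
    x = [2 * math.pi * i / (npoints - 1) for i in range(npoints)]
    for index, (label, ax) in enumerate(axes.items(), start=1):
        # Use a different frequency per axes so the panels differ.
        y = [math.sin(index * value) for value in x]
        ax.plot(x, y)
        ax.set_title(label)
```

With the nested dictionary from the grid layout example, `resolve(layout, "column1.top")` would return a single axes, while `resolve(layout, "column2")` would return the dictionary that `plot2` expects as its first argument.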

Configuring plotting options¶

Matplotlib’s plotting functions accept arguments to set (line) style, line color, etc. each time you call them. However, if you need to consistently apply the same plotting style across figures, it is more convenient to use matplotlib’s style system rather than hard-coding the style in calls to plotting functions. PaperBuilder provides a mechanism to set the plotting style through plot_options.yaml files. Similar to the configuration system (Configuration options), plotting options can be set at the user, project and paper version level by the corresponding plot_options.yaml file. In addition, plot options can also be specified at the figure level (see Creating figures) and plot level (see Creating plot content).

PaperBuilder will call the plotting function as defined in the spec.yaml file in the context of the desired default plotting style. In the spec.yaml file, this plotting style is defined in the style block, for example:

style:
  - seaborn-white
  - lines.linewidth: 5
    font.size: 14

The content of the style block should be a valid argument for the matplotlib.style.use function, i.e. a string specifying the name of a style, a dictionary of rc parameters or a list of these. For the library of named styles that ship with matplotlib, see the style sheets reference. Note that default plotting styles are combined across all levels, and are applied in order from general (user level) to specific (figure and plot level).
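
The following sketch shows how such a list of styles behaves when handed to matplotlib directly. It uses matplotlib.style.context, which applies a style temporarily; that PaperBuilder applies styles in exactly this way is an assumption. The built-in `default` style replaces `seaborn-white` here because the seaborn style names differ between matplotlib versions.

```python
import matplotlib
import matplotlib.style

# A style may be a name, a dictionary of rc parameters, or a list of both.
# Entries are applied in order, so later entries override earlier ones.
style = ["default", {"lines.linewidth": 5, "font.size": 14}]

with matplotlib.style.context(style):
    # Inside the block the combined style is active.
    inside = (
        matplotlib.rcParams["lines.linewidth"],
        matplotlib.rcParams["font.size"],
    )
```

Since entries are applied in order, putting the more specific settings last mirrors the general-to-specific ordering described above.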

In some cases, it is useful to temporarily set a custom plot style other than the default and other than the ones that ship with matplotlib. For example, you may have a default plot style set for drawing the data, but you would like to use a different plot style for annotations (e.g. thinner lines). To do this, you can create a custom style sheet under a given name in the style_library section of a plot_options.yaml file. For example:

style_library:
  annotation:
    lines.linewidth: 1

Within a plotting function, you can now do:

with matplotlib.style.context('annotation'):
    ax.plot([0,1], [0,1])

Often, you find yourself using the same colors across plots because they represent the same experimental group in the data. Piggy-backing on matplotlib’s internal map of named colors, you can create custom named colors in a plot_options.yaml file. For example:

colors:
  reward-low: royalblue
  reward-high: crimson
  annotation: black
  order-first: mediumseagreen
  order-last: seagreen
  delay: cadetblue
  ontime: orange

You can either map the custom color names to existing named colors (as is done above), or use one of the accepted color formats (see https://matplotlib.org/api/colors_api.html).

With the above custom colors defined, you can now do the following in a plotting function:

# plot data for low reward group
ax.plot([0,1], [0,1], color='reward-low')
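
For readers curious how custom color names can be made known to matplotlib, here is a minimal sketch that registers two of the names above in matplotlib's global named-color mapping. This is one possible mechanism and an assumption about how such a feature could work, not a description of PaperBuilder's code.

```python
import matplotlib.colors as mcolors

custom_colors = {"reward-low": "royalblue", "reward-high": "crimson"}

named = mcolors.get_named_colors_mapping()
for name, value in custom_colors.items():
    # Store a resolved hex string, because names in the mapping are not
    # looked up recursively.
    named[name] = mcolors.to_hex(value)

# The custom name now resolves like any built-in color name.
low = mcolors.to_hex("reward-low")
```

Converting each value with `to_hex` first means a custom name can point at an existing color name or at any other accepted color format.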

Executing an analysis¶

(to be completed)

Using IPython notebooks¶

PaperBuilder supports IPython notebooks in addition to standard Python modules, if you prefer to build and test the analysis functions interactively. Thus, when specifying any function in the spec.yaml file, the module part may refer to an .ipynb file in the analysis folder.

However, there are a few caveats:

When a notebook is imported, only import statements, constants (i.e. capitalized module-level variables) and function definitions are imported. No other code in cells is executed. This means that the summary / table / plot functions should work independently of the remaining code.
Currently, plotting styles, custom named styles and named colors are not automatically set when you work interactively in the notebook. Future work may add this functionality.
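
The first caveat can be pictured with the sketch below, which filters the source of notebook cells using Python's ast module and keeps only imports, function definitions and capitalized constants. It is an illustration of the rule, not PaperBuilder's importer: the helper `importable_source` is hypothetical, it takes cell sources as plain strings, and it does not handle IPython magics such as `%matplotlib`. It requires Python 3.9 or later for `ast.unparse`.

```python
import ast


def importable_source(cell_sources):
    """Keep only imports, function definitions and capitalized constants."""
    kept = []
    for source in cell_sources:
        for node in ast.parse(source).body:
            # A constant is an assignment whose targets are all upper-case names.
            is_constant = isinstance(node, ast.Assign) and all(
                isinstance(target, ast.Name) and target.id.isupper()
                for target in node.targets
            )
            if is_constant or isinstance(
                node, (ast.Import, ast.ImportFrom, ast.FunctionDef)
            ):
                kept.append(node)
    module = ast.Module(body=kept, type_ignores=[])
    return ast.unparse(module)


# Two made-up notebook cells.
cells = [
    "import math\nN_POINTS = 4\ndata = [1, 2, 3]",
    "def area(r):\n    return math.pi * r ** 2\nprint(area(1))",
]
filtered = importable_source(cells)
```

In this example the lowercase variable `data` and the `print` call are dropped, so a function that relied on `data` being defined at module level would fail when called from the spec.yaml.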

Code organization¶