
Execution engine - Handle map and reduce #30

Open
anibalsolon opened this issue Jan 16, 2020 · 4 comments

anibalsolon commented Jan 16, 2020

To support map/reduce operations, I propose two operators: @ for mapping, and % for reducing.

Each use of @ spawns a new dimension for the subsequent workflow (assuming a flat workflow is zero-dimensional). The result of using this operator is equivalent to the outer product of the dimensions, with each entry being a job parametrization to execute.

job_b.field @= job_a.list_field

It reads as "map each entry from job_a.list_field to job_b.field".
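
A rough plain-Python sketch of that expansion, just to illustrate the outer-product behaviour (the helper name below is made up for this illustration, not part of the engine):

from itertools import product

# Illustration only: each @ mapping contributes one dimension, and the job
# runs once per entry of the outer product of all mapped dimensions.
def job_parametrizations(mapped):
    # mapped: field name -> list of values, one entry per @ mapping
    names = list(mapped)
    return [dict(zip(names, values)) for values in product(*mapped.values())]

job_parametrizations({'a': [1, 2], 'b': ['x', 'y', 'z']})
# 2 x 3 = 6 parametrizations: [{'a': 1, 'b': 'x'}, {'a': 1, 'b': 'y'}, ...]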

The % operator collapses dimensions, transforming their entries back into lists in the original ordering. It is a binary operation, in which the first argument is the entries to transform into a list, and the second argument is the dimensions to collapse.

job_c.field %= job_c.out_field @ job_a.list_field

It reads as "reduce the entries from job_c.out_field over the dimension job_a.list_field to job_c.field".
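
For reference, both operators fit standard Python operator overloading; a minimal sketch of the plumbing (the class and attribute names below are made up for illustration, not the engine's actual API):

# job['field'] @= source              ->  __getitem__, then __imatmul__, then __setitem__
# job['field'] %= entries @ dimension ->  __matmul__ pairs the entries with the
#                                         dimension, __imod__ records the reduction
class Field:
    def __init__(self, job, name):
        self.job, self.name = job, name
        self.mapped_over = None    # dimension spawned by @
        self.reduced_over = None   # dimension collapsed by %
        self.reduce_source = None  # field whose entries are gathered by %

    def __matmul__(self, dimension):     # field @ dimension
        self.mapped_over = dimension
        return self

    __imatmul__ = __matmul__             # so field @= dimension works the same way

    def __imod__(self, entries):         # field %= entries @ dimension
        self.reduce_source = entries
        self.reduced_over = entries.mapped_over
        return self

class Job:
    def __init__(self):
        self._fields = {}

    def __getitem__(self, name):         # job['field'] yields a Field to augment
        return self._fields.setdefault(name, Field(self, name))

    def __setitem__(self, name, field):  # the augmented Field is stored back
        self._fields[name] = field

    def __getattr__(self, name):         # job.some_output also yields a Field
        return self[name]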

A full example:

rp = ResourcePool()

pieces = lambda path: {'pieces': path.split('/'), 'indexes': range(len(path.split('/')))}
uppercase = lambda text: {'text': text.upper()}
indexed_uppercase = lambda text, index: {'text': f'{index}-{text.upper()}'}
join = lambda pieces: {'text': '/'.join(pieces)}

job_pieces = PythonJob(function=pieces, reference='pieces_job')
job_pieces['path'] = Resource('usr/lib/libgimp.so')

job_uppercase = PythonJob(function=uppercase, reference='uppercase_job')
job_uppercase['text'] @= job_pieces.pieces

job_join = PythonJob(function=join, reference='join_job')
job_join['pieces'] %= job_uppercase.text @ job_pieces.pieces

rp[R('text')] = job_join.text

rp = DependencySolver(rp).execute(executor=Execution())
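
For intuition, here is what the example computes, written as plain Python (this is only the equivalent computation, not how the engine would execute it):

# Plain-Python equivalent of the workflow above, for intuition only.
path = 'usr/lib/libgimp.so'
pieces = path.split('/')               # ['usr', 'lib', 'libgimp.so']
upper = [p.upper() for p in pieces]    # one uppercase_job node per entry of the dimension
text = '/'.join(upper)                 # the dimension reduced back into a single value
# text == 'USR/LIB/LIBGIMP.SO'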

To ease the mnemonics, one can name the dimensions, so when using the reduce operator, one can simply use the dimension name instead of having to refer to the original job:

job_uppercase['text'] @= job_pieces.pieces, 'path_pieces'

job_join['pieces'] %= job_uppercase.text @ 'path_pieces'

There are situations in which one might want to link several fields within the same dimension. By providing a tuple of fields, each one is mapped to the corresponding field in the selector:

job_uppercase = PythonJob(function=indexed_uppercase, reference='uppercase_job')
job_uppercase[['text', 'index']] @= (job_pieces.pieces, job_pieces.indexes), 'path_pieces'

# All these reducing operators execute the same operation
job_join['pieces'] %= job_uppercase.text @ 'path_pieces'
job_join['pieces'] %= job_uppercase.text @ (job_pieces.pieces, job_pieces.indexes)
job_join['pieces'] %= job_uppercase.text @ (job_pieces.pieces)
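
In plain Python terms, linking fields in the same dimension behaves like a zip rather than an outer product (a sketch, not engine code):

# Linked fields share one dimension: the entries are paired up (zip),
# not combined as an outer product.
pieces = ['usr', 'lib', 'libgimp.so']
indexes = range(len(pieces))
texts = [f'{index}-{piece.upper()}' for piece, index in zip(pieces, indexes)]
# ['0-USR', '1-LIB', '2-LIBGIMP.SO']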

anibalsolon commented Mar 19, 2020

@ccraddock @puorc Please, whenever you have time, take a look and let's discuss! And let me know if you need more details.


puorc commented Mar 20, 2020

Hi Anibal, the map operator is nice! But I feel a little confused about 'reduce the entries from job_c.out_field over the dimension job_a.list_field to job_c.field'. Could you elaborate a bit more on 'dimension'? Does it represent one attribute of the return dict?

anibalsolon commented

Yes, so every time we map a list of N items to N nodes, we create a dimension for the node: from a dot (a simple workflow, in a zero-dimensional space), we go to a line with N points.

[figure: a single node, a zero-dimensional workflow]

Mapping a list:

[figure: one list mapped to N nodes along one dimension]

Mapping two lists, so the input of each node is the combination of each pair of items in both lists:

[figure: two lists mapped to a grid of nodes, one per pair of items]

And as we reduce each of the dimensions, we gather the results of the nodes back into lists, in their original order. It is still a reducing operation, but instead of summarizing the results (like an average would), we are building lists.

[figure: a dimension reduced back into a list of node results]

If we reduced/squashed both dimensions at the same time and fed that into a node, the value would be a list of lists (i.e. a matrix) of the results.
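
A plain-Python sketch of the shapes involved:

# Shapes when reducing dimensions, sketched in plain Python.
xs, ys = [1, 2, 3], ['a', 'b']
f = lambda x, y: f'{x}{y}'

# Mapping both lists spawns one node per (x, y) pair (outer product).
results = {(x, y): f(x, y) for x in xs for y in ys}

# Reducing only the ys dimension: the xs dimension survives, so a downstream
# node still runs once per x, and each run receives a list of 2 results.
per_x_inputs = {x: [results[(x, y)] for y in ys] for x in xs}

# Reducing both dimensions at once: one downstream node receives a
# list of lists (a matrix) of all the results.
matrix_input = [[results[(x, y)] for y in ys] for x in xs]
# [['1a', '1b'], ['2a', '2b'], ['3a', '3b']]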


puorc commented Apr 2, 2020

Thank you for the illustration. It's nice to work on.
