See there: Synthesising and Verifying Multi-Core Parallelism in Categories of Nested Code Graphs.

On Reddit, this paper was given an alternative title: How to write code which is 4 times faster than hand-optimised C with assembly fragments in it by using a Haskell domain specific language.

So there you have it.