Stress-flow is a new method of programming that seeks to provide benefits of both data-flow and control-flow methods of programming in one cohesive, powerful, object-oriented scheme. Stress-flow makes synchronization a natural part of every programming object or sub-routine.
The method results in new software and compiler methodology as well as a new view of computer architecture. Although it can be utilized on almost any existing architecture (single- and multi-processor), gearing computer architecture toward the stress-flow method leads to unprecedented capabilities and flexibility of computer systems.
The goal of the method was to meet the high-level goals naturally and directly, without any centralized supervisors, schedulers, or complex translation layers between the programmer-provided model/program and its execution. To be able to do that, the process of designing the programming model was reversed. Since only a nearly directly implementable model was acceptable, experiments with implementations came first, eventually leading to a working stressflow implementation and, at the end, the formal model.
For these reasons, stressflow is quintessentially an implementation method at least as much as it is a programming model. This has somewhat complicated the detailed specification of stressflow, which mixes the description of the programming method with ways to compile and implement it on various multi-processor hardware. The purpose of this short description is to present the basic stressflow programming model only, without a detailed discussion of how it is accomplished in compilers and hardware. The important thing to remember, however, is that ease of implementation, as described in its entirety in the detailed specification, is the key factor here, without which the proposed method could never hope to become a practical programming tool.
1.1 Basic Layout of Stressflow Program
At the core, any stressflow program is a collection of stressflow atoms. Each stressflow atom is a sequence of instructions divided into a “stressed” front portion and a “relaxed” bottom portion. Each call to any stressflow atom generates a separately executed mini-thread. The caller of a stressflow atom is prevented from doing anything until the previous call to the same stressflow atom exits the stressed section. Later, it is prevented from changing its state (generally, executing beyond the call or a group of them) until the called atom “relaxes”, i.e. finishes executing the stressed section and enters the relaxed section. Therefore, only one task corresponding to the same stressflow atom of the same object can be executing within the stressed section at any given moment, while many such tasks can be executing simultaneously within the relaxed section.
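Stressflow is a proposed method with no standard implementation available; purely as an illustration, the stressed/relaxed call discipline described above can be roughly emulated with ordinary threads and a lock. All names below (`StressflowAtom`, `stressed`, `relaxed`) are invented for this sketch and are not part of any real stressflow system.

```python
import threading

class StressflowAtom:
    """Rough emulation of one stressflow atom: every call runs a short
    'stressed' part serialized by the atom's lock, then a 'relaxed' part
    on its own mini-thread, overlapping with other calls."""
    def __init__(self, stressed, relaxed):
        self._stressed = stressed
        self._relaxed = relaxed
        self._lock = threading.Lock()   # one stressed section at a time
        self._threads = []

    def __call__(self, *args):
        # The caller waits here until the previous call has left the
        # stressed section, then until its own stressed section is done
        # (the atom "relaxes"); only the relaxed part runs concurrently.
        with self._lock:
            state = self._stressed(*args)
        t = threading.Thread(target=self._relaxed, args=(state,))
        t.start()
        self._threads.append(t)

    def join(self):                     # wait for all relaxed sections
        for t in self._threads:
            t.join()

results, res_lock = [], threading.Lock()

def stressed(x):
    return x * 2        # quick, serialized work (e.g. grab the data)

def relaxed(v):
    with res_lock:      # long work that may overlap other calls
        results.append(v)

# Calling the atom from a loop: a new mini-thread starts as soon as the
# previous call's stressed section is over, so the relaxed work overlaps.
atom = StressflowAtom(stressed, relaxed)
for i in range(5):
    atom(i)
atom.join()
```

In this sketch the caller, not a scheduler, enforces the serialization, which mirrors the "no centralized supervisor" goal stated earlier.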
When stressflow atoms are fairly large, with one large stressflow atom calling a few smaller ones, this essentially replicates the common scheme of a main thread with worker threads, where the smaller atoms' stressed sections serve as the synchronization mechanisms between the main thread and the workers. When a stressflow program consists of a large number of stressflow atoms, each of them containing few instructions and no loops, this, in essence, replicates the behavior of a dataflow machine. The important difference, however, is that the stressflow atom calls now representing the data connections are explicit, meaning the caller indicates whom it is “stressing”/connecting to. This is far more natural than expressing a dependency on a set of data being produced.
The key advantage of stressflow is that it can therefore easily represent any large control-flow model (threads cooperating together), any dataflow model or anything resembling a multi-path graph, and any hybrid of them in between. It also allows taking any existing code and including parts of stressflow code in it, or gradually converting all single-threaded code into completely parallel code capable of taking advantage of many processors.
A stressflow atom can call other stressflow atoms from within its stressed or relaxed sections. A portion of a stressflow program in which a few stressflow atoms call one another from within their stressed sections allows strongly tied-together operations performed by different objects on the same data or machine state. This is how any complex synchronization mechanism can easily be described in an easy-to-see structural fashion. On the other hand, work placed within the relaxed section of a called stressflow atom allows the caller to do other things, including making more calls to the same stressflow atom. Since each call results in a separately running mini-thread, this allows the caller to distribute parts of the same work between many mini-threads running the same code.
Therefore, when a stressflow atom calls another from a loop, this results in creating another instance of the called stressflow atom almost as soon as the previous instance finishes its stressed-section sequence, as shown on the diagram below. This allows, among many other things, partitioning work to be done on large collections of data into many parallel-running routines, each of them working on the entire collection or a portion of it.
Another key advantage of stressflow is the simplicity of optimizing parallelism. It simply involves moving as much work as possible to the relaxed section and leaving only what is necessary for synchronization in the stressed section. This simple rule intuitively leads to the other changes that need to be made in order to make full parallelism possible.
1.2 Object-Oriented Aspect of Stressflow
An important part of the stressflow programming method is how it fits together with all previous object-oriented ideas. The word “previous” needs to be stressed here because, from the standpoint of true modularity of code, inherent, built-in support for parallelism should have been at least as important as, say, inheritance. Parallelism in stressflow becomes part of the frontal/visible object design structure thanks to the stressed/relaxed layout of stressflow atoms and the way a stressflow atom derives the behavior of its stressed section from the manner in which it has been defined.
Control over the stressed section is derived from the scope in which a particular stressflow atom has been defined. Globally defined and static stressflow atoms will have global or static access means to the stressed section. A stressflow atom defined as a member of an object will have these means replicated with the different instances of the object. Among many other things, this allows using inheritance methods to control access to resources.
For example, the code shown above called the same stressflow atom in terms of both the routine and the object being called. If the code were modified so that the calls made from inside the loop went to the same routine but in different data instances of the same object, the stressed sections would not be constrained to the same access means, as everything, together with the access mechanisms, would be replicated. However, this process could be reversed by a stressed-section call to a non-localized stress-flow atom. This call could be direct, or part of a base routine or constructor from which the accessed stressflow object is derived.
1.3 Splitting and Joining Stressflow Paths
Being able to partition certain kinds of complex work into two or more parallel-running paths/tasks and then collect the results back is easy to understand and natural. Many software packages and programming models have sought to do just that: make software design similar to connecting, for example, electronic components. Examples here include LabView™, SoftWire™, Verilog, and many others. The objective of stressflow, related to this issue, was not to provide such a “wiring” model as the basis of all programming, but as a means of interconnection between sequential (control-flow) routines, large or small, when it makes sense in terms of the problem being represented, and as one outcome of a more universal programming scheme. The second, already mentioned and equally important goal was being able to do this without any centralized schedulers and without flattening all the interrelated model parts entered by the programmer into a single sequence of instructions. This was the key weakness of the previous “software wiring” methods, which made them unsuitable as practical tools for multi-processor or even professional, high-performance programming.
The key implementation rule of stressflow is this: if atom A communicates with atoms B and C, then all that is necessary to assure synchronization and data cohesion between A, B, and C is a direct connection between A, B, and C. No execution layers or threads that queue or sequence the connections between A, B, and C together with any other connections, and no “middlemen” in between. By accomplishing this, stressflow becomes a superior internal implementation method for any “software wiring” scheme.
From a strictly abstract model view, splitting execution into two or more paths simply involves calling two or more stressflow atoms as part of the same operation or expression. Merging two or more paths, on the other hand, simply involves defining a routine that has one body but two or more entry points, each of them possibly supplying different parameters. In a script language, this can be done by defining/allowing a procedure with two headers.
These schemes, by themselves, are nothing earth-shaking; they were always possible by some direct or less direct means. However, when implemented without the stressed/relaxed functionality, such schemes either made the required parallelism impossible or needed a lot of extra code using rudimentary synchronization tools to assure data cohesiveness. This is why a procedure with two headers isn’t exactly a well-known, commonly used thing.
In the diagram shown above, atom A calls atoms B and C as a part of the same operation. This means that A is stalled until access to both B and C is obtained together or, in other words, until B and C finish the stressed sections of the tasks generated by the previous calls. Later, A is prevented from changing its state (continuing forward past a certain point after the call) until the later of B and C finishes its stressed section and enters the relaxed section.
Next, atoms B and C call two separate inputs of atom D. What this means in practice is that D starts executing when the second call arrives, and both B and C are prevented from changing their state (executing past certain points after the calls) until D executes and reaches its relaxed section.
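The two-entry behavior of atom D can be sketched in plain Python threads; this is only a hedged emulation, and the names (`MergeAtom`, `entry_a`, `entry_b`) are invented for the sketch, not part of any stressflow specification.

```python
import threading

class MergeAtom:
    """Rough emulation of a stressflow atom with one body but two entry
    points ("a procedure with two headers"); the body fires only when
    both entries have been called, each supplying one parameter."""
    def __init__(self, body):
        self._body = body
        self._lock = threading.Lock()
        self._args = {}                  # entry name -> supplied value

    def _entry(self, name, value):
        fire = None
        with self._lock:
            self._args[name] = value
            if len(self._args) == 2:     # second call arrived: run body
                fire, self._args = self._args, {}
        if fire is not None:             # executed by the later caller
            self._body(fire["a"], fire["b"])

    def entry_a(self, value): self._entry("a", value)   # first header
    def entry_b(self, value): self._entry("b", value)   # second header

# B and C each call one input of D; D runs once both inputs are in,
# and neither caller proceeds past its join until D has finished.
out = []
d = MergeAtom(lambda a, b: out.append(a + b))
b = threading.Thread(target=d.entry_a, args=(3,))
c = threading.Thread(target=d.entry_b, args=(4,))
b.start(); c.start()
b.join(); c.join()
```

Whichever of B or C arrives second executes D's body itself, which is one simple way to realize "D starts executing when the second call arrives" without any central scheduler.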
When connections are made as shown on the diagram above, with all calls happening out of stressed sections, A will be prevented from changing its state until D gets called and reaches its relaxed section. A is completely unaware of D’s existence, but gets constrained by it as part of the definition of the tasks it wanted performed by B and C. This simple scheme naturally allows cascading the necessary dependencies between objects and using mechanisms like inheritance and dynamic connecting of atoms (described later) for that purpose. Each stressflow object, therefore, only defines its own constraints, while the interconnection scheme alone cascades the constraints into more complex dependencies when necessary.
1.4 Ordered Relay Calls
Suppose the diagram shown above was used to implement a dataflow-like calculation in parallel, with atom A feeding many pairs of (X,Y) into the diagram, atom B receiving X and calculating its square, atom C receiving Y and calculating its square, and atom D receiving X squared and Y squared and calculating the square root of their sum. Connecting the atoms in the way shown above would produce correct results, but the only parallelism achieved would be the overlapping of the square calculations in atoms B and C. Ideally, to replicate dataflow machine operation better, a new pair should be fed in long before the square root of the previous pair is calculated. The layout of atom A depends on its relation to its data source. In most cases, the call (or a loop feeding pairs of data) can be placed inside atom A's relaxed section, so we will assume it is, but this has no bearing on the performance of this problem.
However, if the calculation of the squares were moved to the relaxed sections of B and C, the results produced would most often be incorrect. There would be nothing preventing, say, the square of X1 from arriving after the square of X2, thus yielding an erroneous calculation on the pair (X2,Y1) as the first result.
There is no need to have all calculations synchronized in this case as completely interconnected operations. All that is needed is making sure that the results arrive in the right order. Suppose now that some pre-reservation calls are made out of the stressed sections of B and C, informing atom D to expect the data in the order the pre-reservation calls are made. The pre-reserved place is then utilized when the actual data is ready. This is all that is required to guarantee the proper order of the data.
The diagram using the thus described “ordered relay calls” is shown above. The reservation call can be made manually out of the stressed section and the pre-reservation ID saved to be used later with the actual call. However, due to the common usefulness of the scheme, it makes sense for this functionality to be generated internally by the compiler, by providing language means to define functions or atom calls as “ordered relays”. The compiler can then automatically insert all the needed pre-reservation calls as a prolog of the stressed section. It can also deal with the situation where a pre-reservation was made but not utilized, by inserting a proper epilog at the end of the relaxed section.
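The pre-reservation idea can be sketched as follows; again, this is only a hedged thread-based emulation with invented names (`OrderedRelay`, `reserve`, `deliver`), not the compiler-generated mechanism itself.

```python
import threading

class OrderedRelay:
    """Rough emulation of an 'ordered relay' call: slots are reserved
    in stressed-section order, filled later from relaxed sections, and
    the target consumes them strictly in reservation order."""
    def __init__(self, body):
        self._body = body
        self._lock = threading.Lock()
        self._next_ticket = 0      # next reservation id to hand out
        self._next_served = 0      # next reservation id to deliver
        self._held = {}            # ticket -> value that arrived early

    def reserve(self):
        # Called from the producer's stressed section, so tickets
        # reflect the order in which the producers were serialized.
        with self._lock:
            ticket = self._next_ticket
            self._next_ticket += 1
            return ticket

    def deliver(self, ticket, value):
        # Called later from the relaxed section, possibly out of order;
        # results are released to the body strictly in ticket order.
        with self._lock:
            self._held[ticket] = value
            while self._next_served in self._held:
                self._body(self._held.pop(self._next_served))
                self._next_served += 1

# The square of X2 may finish before the square of X1; the relay still
# hands the results to atom D in the reserved order.
seen = []
relay = OrderedRelay(seen.append)
t0, t1 = relay.reserve(), relay.reserve()
relay.deliver(t1, "x2 squared")   # arrives early, held back
relay.deliver(t0, "x1 squared")   # releases both, in order
```

The `reserve`/`deliver` split corresponds to the prolog and body the compiler would generate for an atom call declared as an ordered relay.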
As before, the details of the actual implementation scheme are very important here, as they allow performing the entire scheme with just a few instructions on existing processors, making it a practical and efficient scheme.
1.5 Connector Construct
An implicit call (direct connection) to the next stressflow atom in the program/diagram works well and is natural. There are, however, situations where this is inadequate. For example, many different objects may want to be called whenever the mouse moves or some hardware or timer changes state. Having to write the mouse/timer/hardware driver in a way where all “interested” atoms were listed by name would be impractical. In such a situation, a totally dataflow-like concept makes perfect sense.
To accommodate this need, stressflow defines the concept of a “connector” as another element fitting well within the stressed/relaxed scheme. To the caller, the connector looks like a regular stressflow atom to call and provides exactly the same interface. To the objects using the connector, it is a device to which they can attach their stressflow atoms, either as part of the object definition or through an explicit instruction inside any code, constructors for example.
The key factor is that the connector is a wrapper around the stressflow atom concept. It not only allows forking execution, but assures all necessary process and data synchronization. The connector does not reserve/begin executing any target atoms unless they can all be reserved together. Once they execute, the caller is prevented from changing its state (executing past a certain point) until the last of the target stressflow atoms exits its stressed section. This assures that all targets get a chance to process or copy the data from the driver before the driver changes its state, no matter how complex the interconnection topology.
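A connector's fan-out behavior can be sketched as a hedged thread emulation (the names `Connector`, `attach`, and `listener` are invented for this sketch; the attached callables stand in for the targets' stressed sections only).

```python
import threading

class Connector:
    """Rough emulation of a connector: it looks like a single stressflow
    atom to the caller, but fans each call out to every attached atom
    and holds the caller until all of them have finished."""
    def __init__(self):
        self._targets = []            # attached stressed-section bodies

    def attach(self, stressed_fn):
        self._targets.append(stressed_fn)

    def __call__(self, *args):
        threads = [threading.Thread(target=f, args=args)
                   for f in self._targets]
        for t in threads:
            t.start()
        for t in threads:             # the caller cannot change its
            t.join()                  # state until every target is done

# Several objects attach to a 'mouse moved' connector; the driver makes
# one call and every listener copies the event before the driver moves on.
copies, copy_lock = [], threading.Lock()

def listener(pos):
    with copy_lock:
        copies.append(pos)

mouse_moved = Connector()
for _ in range(3):
    mouse_moved.attach(listener)
mouse_moved((10, 20))
```

The driver never names its listeners; they register themselves, which is the dataflow-like inversion the text calls for.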
1.6 Socket Construct
The socket construct is another structural tool designed around the stressflow framework. A socket is simply a stressflow atom whose body is defined inside another routine, rather than as a separate routine/sequence of instructions. The routine containing a socket is suspended when it reaches the socket, until the socket is called from the outside. This roughly mimics the behavior of the Ada rendezvous construct using the far more “atomic” stressflow atom concept.
A socket is useful when it is necessary to initiate parallel processing from a single routine or thread and collect the results back in the same routine or thread. Although fairly unnecessary in software designed entirely in the spirit of stressflow, the socket is a convenient tool to use when stressflow code is used inside existing non-stressflow code or when gradually converting non-stressflow code to stressflow.
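The suspend-until-called behavior of a socket can be sketched with an event; as with the other examples, this is only a hedged emulation and the names (`Socket`, `wait`, `result_socket`) are invented for illustration.

```python
import threading

class Socket:
    """Rough emulation of a socket: a call point embedded inside a
    routine; the routine suspends at the socket until another thread
    calls it and supplies a value."""
    def __init__(self):
        self._arrived = threading.Event()
        self._value = None

    def __call__(self, value):      # called from outside, like an atom
        self._value = value
        self._arrived.set()

    def wait(self):                 # the enclosing routine blocks here
        self._arrived.wait()
        self._arrived.clear()
        return self._value

# A single routine starts parallel work, then collects the result back
# in the same routine through the socket, rendezvous-style.
result_socket = Socket()
collected = []

worker = threading.Thread(target=lambda: result_socket(42))
worker.start()
collected.append(result_socket.wait())   # suspended until worker calls
worker.join()
```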
1.7 Other Stressflow Constructs
The complete stressflow specification defines more structural tools, all of them based around the same stressed/relaxed framework. Many more such tools are possible.
The stressflow method, therefore, provides an efficient, atomic, low-level scheme for describing and implementing true, massive parallelism through a structural object approach. Parallelism is defined by the interconnection topology and not by constructs or instructions hidden deep inside a series of instructions. Because the parallel interface is also the front of all the structural stressflow elements and objects, the concept is well suited to both script and visual design. It can be used by itself for the efficient description of parallelism in critical software, or as an implementation layer for more abstract, application-specific languages and tools. Most importantly, it can naturally and easily represent all dataflow algorithms and concepts, all control-flow algorithms, and anything in between, which opens entirely new territory in modular, object-oriented parallel software design.