How can you coordinate processes to start after others have finished? How does this work on a single system? How about a distributed system? What if each server can be running a different operating system? How can one system communicate with another when it's finished a task?
Polling
One way to coordinate processes is polling, but this has disadvantages. The way polling works is:
1. Check to see whether the task is complete.
2a. If the task is complete, congratulations! Take the result and run the next piece of code.
2b. If the task is not complete, wait for a period of time, say X. Then repeat from step 1.
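As a rough illustration, the steps above might look like the following Python sketch. The names poll_until and task_is_complete are hypothetical, and the timeout is an added assumption so the loop cannot run forever; the steps as described have no timeout.

```python
import time
from typing import Callable


def poll_until(is_complete: Callable[[], bool], interval: float, timeout: float) -> bool:
    """Call is_complete() every `interval` seconds until it returns True
    or `timeout` seconds have passed. Returns whether the task completed."""
    deadline = time.monotonic() + timeout
    while True:
        if is_complete():                 # step 1: check the task
            return True                   # step 2a: done, caller can carry on
        if time.monotonic() >= deadline:  # assumption: give up eventually
            return False
        time.sleep(interval)              # step 2b: wait for X, then repeat


# Hypothetical stand-in for a real task: it reports completion on the third check.
checks = {"count": 0}


def task_is_complete() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


finished = poll_until(task_is_complete, interval=0.01, timeout=1.0)
```

Notice that the waiting code has to run, and ask, three times here just to learn one fact.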
Polling is a popular method of checking the state of something: maybe a process, a file, or something else. It's popular because it's simple, and because many developers don't think of, or even know about, the problems it brings. The main problem is that it takes processing power away from other tasks. This is probably more obvious if I reword the description to show what's actually going on:
1. Make a time-consuming call to the file system or across a network to see whether the task is complete. If you're calling across the network, you'll need to wait for the response from the remote hardware, which will have to take time out from performing its own processes to respond. You might be able to make this asynchronous, but there's no guarantee of this.
2a. If the task is complete, congratulations! Take the result and run the next piece of code.
2b. If the task is not complete, wait for a period of time, say X. Then repeat from step 1.
The more frequently you check the condition you're waiting for, the more processor power it uses, and the more resources it uses communicating with the other task or process. However, if you poll less frequently, the average time to detect the condition increases. For example, if the polling loop checks every 10 seconds, a change will take an average of 10 / 2 = 5 seconds to be detected. In the worst case, the task completes just after it's been checked, so it takes just under 10 seconds to detect the change. Whichever way you tune it, you lose something, which is why polling is generally a poor option.
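To check the arithmetic, here is a small sketch that works out the detection delay for a 10-second polling interval. The function detection_delay is a hypothetical helper, and the completion times are made-up values spread evenly through one interval.

```python
import math


def detection_delay(completed_at: float, interval: float) -> float:
    """Seconds between the task finishing and the next poll noticing it."""
    next_poll = math.ceil(completed_at / interval) * interval
    return next_poll - completed_at


interval = 10.0
# Made-up completion times: 0.5 s, 1.5 s, ... 9.5 s after the last poll.
completion_times = [k + 0.5 for k in range(10)]
delays = [detection_delay(t, interval) for t in completion_times]

average_delay = sum(delays) / len(delays)  # half the interval
worst_delay = max(delays)                  # task finished just after a poll
```

With these values the average delay is 5 seconds and the worst is 9.5 seconds. The closer a task finishes to the previous poll, the closer the delay gets to the full 10 seconds.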
Event-Driven Processing
Event-driven systems use the occurrence of an "event" - say the change to a file or process we mentioned earlier - to trigger the execution of another piece of code. This is built into most operating systems, and is usually how they work anyway.
For example, when you're developing Windows software, you can use event-driven code that gets called when registry values are changed, when the file system is updated, when the system becomes idle, and so on.
Linux lets you do a similar thing, using inotify to monitor file system changes.
Event-driven systems trigger parts of the system when something happens somewhere else. For example, closing a file might result in one function being called, and a response from a network request might cause a different function to be called.
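Within a single program, the same idea can be sketched with Python's threading.Event. This is only a minimal illustration: the worker and next_step functions and the "report generated" work item are hypothetical stand-ins.

```python
import threading

results = []
task_done = threading.Event()


def worker() -> None:
    results.append("report generated")  # stand-in for the real work
    task_done.set()                      # raise the event: "I'm finished"


def next_step() -> None:
    # Blocks until the event is set. The thread sleeps while it waits
    # instead of repeatedly asking whether the worker has finished.
    task_done.wait(timeout=5)
    results.append("next step started")


follower = threading.Thread(target=next_step)
producer = threading.Thread(target=worker)
follower.start()
producer.start()
producer.join()
follower.join()
```

The follower does nothing at all until the worker signals it, and then it starts straight away.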
If you have multiple systems with different operating systems, you can use TCP sockets, or maybe webhooks. Webhooks are HTTP calls to an API that trigger subsequent processes. They let you pass data between processes in the request body, and they work the same way whatever operating system is on either end. You can make them asynchronous, and many commercial services already integrate with them.
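Here is a rough sketch of a webhook using only Python's standard library, with both ends running in one script for the sake of the example. The /hooks/task-complete path, the JSON fields, and the WebhookHandler class are all made up for illustration; a real service will define its own URL and payload.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        received.append(payload)  # a real receiver would start the next process here
        self.send_response(204)   # acknowledge with "No Content"
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass  # keep the example quiet


# Receiving system: listen on any free local port.
server = HTTPServer(("127.0.0.1", 0), WebhookHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Sending system: announce that the task has finished.
body = json.dumps({"task": "nightly-export", "status": "complete"}).encode("utf-8")
request = urllib.request.Request(
    f"http://127.0.0.1:{port}/hooks/task-complete",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# An empty ProxyHandler makes sure the local call is not routed through a proxy.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
with opener.open(request, timeout=5) as response:
    status = response.status

server.shutdown()
server.server_close()
```

The sender makes one call when it has something to say, and the receiver does nothing until that call arrives.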
Another pattern that you often see in event-driven systems is Publish/Subscribe, or Pub/Sub. This is generally even more flexible than webhooks: it allows a source to publish an event, which can then be picked up asynchronously by multiple subscribers. Neither the publisher nor the subscribers need to have any dependencies on each other, and using channels allows one publisher to publish different events to different subscribers.
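The shape of Pub/Sub can be sketched with a tiny in-process broker. The Broker class and the channel names are hypothetical, and this toy version calls subscribers synchronously to stay short; a real message broker would normally deliver events asynchronously, often across machines.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class Broker:
    """Toy broker: remembers which handlers are subscribed to which channel."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: dict) -> int:
        """Pass the event to every subscriber of the channel; return how many got it."""
        handlers = list(self._subscribers.get(channel, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = Broker()
audit_log: List[str] = []
emails: List[str] = []

# Two independent subscribers on one channel, and one on a different channel.
broker.subscribe("export.finished", lambda e: audit_log.append(e["file"]))
broker.subscribe("export.finished", lambda e: emails.append(f"Export ready: {e['file']}"))
broker.subscribe("export.failed", lambda e: emails.append(f"Export failed: {e['file']}"))

# The publisher only knows the channel name, not who is listening.
delivered = broker.publish("export.finished", {"file": "sales.csv"})
```

The publisher never refers to a subscriber directly, so you can add or remove subscribers without touching the publishing code.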
Event-driven processing has several advantages over polling. At a CPU level, only one process is active - the one that's actually doing work. There isn't another process constantly asking "Are you finished yet?". That means the worker process can work uninterrupted, and therefore more efficiently. It also means the process that needs the result of the current one doesn't start until it needs to, and then starts as soon as the event arrives rather than at the next scheduled check. This drastically reduces the time between the first process finishing and the next one starting, making the system more responsive.
As well as all this, event-driven architectures mean that the CPU isn't using as much power as with a polling architecture. This might not seem significant, but when you have an enterprise system, it can make a difference when you get the bill.
Summary
Polling requires constant checks on progress, and is an example of the Observer Effect. It's like working with a micromanager, or driving somewhere with the kids in the back asking "Are we there yet?" every couple of minutes. All of this excessive questioning slows everything down and generally makes you wish you'd done something else. It's a last resort.
Event-driven systems let each task get on with its work, and trust it to notify the next in line when it's their turn to start. It's like driving somewhere and then telling the kids in the back when you've arrived. It's much more efficient, peaceful and civilized. That's why experienced engineers generally use this method.