Commonly Found Mistakes in Power Distribution Systems
September 10, 2014
Failures in power distribution systems are not all that common, but they do happen. Some of them trace back to poor design of the power distribution model itself, which means the disaster was set in motion at the very beginning. We look at some of these mistakes and disaster scenarios so that readers can avoid repeating them when designing their own power distribution system.
The split brain
We'll start with the classic problem associated with distributed systems. It emerges when machines responsible for providing a fault-tolerant service lose contact with each other and, with communication broken down, begin to operate independently. This is dangerous for the entire system because it creates a split brain, with each part acting as a full-fledged brain on its own. When different parts of the system diverge, each group of machines may incorrectly conclude that the other groups have failed, without verifying whether those groups are in fact still fully functional. Both sets of machines then act authoritatively, with full control over the service they provide. In other words, they both start acting like the brain of the system when there should only be one.
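One common guard against split brain is a majority (quorum) rule: a group of machines may act authoritatively only if it can reach more than half of the cluster. The following is a minimal sketch of that idea, not a complete protocol. The function names and the way reachable peers are counted are hypothetical and chosen only for illustration.

```python
def has_quorum(reachable_peers: int, cluster_size: int) -> bool:
    """Return True if this machine plus the peers it can reach
    form a strict majority of the cluster."""
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    group_size = reachable_peers + 1  # count this machine as well
    return group_size > cluster_size // 2


def may_act_as_brain(reachable_peers: int, cluster_size: int) -> bool:
    """Only a majority group is allowed to take control of the service.
    A minority group should stop serving instead of assuming the rest failed."""
    return has_quorum(reachable_peers, cluster_size)


# A 5-machine cluster splits into groups of 3 and 2.
larger_side = may_act_as_brain(reachable_peers=2, cluster_size=5)   # True
smaller_side = may_act_as_brain(reachable_peers=1, cluster_size=5)  # False
```

Because two disjoint groups cannot both hold a strict majority, at most one side keeps control. The trade-off is that an even split (for example 2 and 2 in a 4-machine cluster) leaves no side in control.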
Site failures
A site failure occurs when all the machines within a given site or group stop functioning and shut down completely. There can be many causes, the most common of which is power loss. A UPS network is usually installed to combat such scenarios; however, if the power loss lasts long enough, the UPSs themselves give out and the entire site shuts down. A site failure is often regarded as a double fault, because more than one machine fails at once and distributed systems are typically built to handle only single component failures. Many distributed systems are designed from the ground up on the assumption that site failures do not occur. Yet they do occur, and it is commonly understood that such systems must have a backup plan to withstand a power loss for a reasonable amount of time.
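Whether a site rides out a power loss comes down to how long the UPS can carry the load compared with how long the outage lasts. Below is a rough back-of-the-envelope sketch of that comparison. The function names, the 0.9 inverter efficiency, and the sample figures are assumptions made for illustration; real runtime also depends on battery age, temperature, and discharge characteristics.

```python
def ups_runtime_minutes(battery_wh: float, load_w: float,
                        efficiency: float = 0.9) -> float:
    """Estimate UPS runtime as usable stored energy divided by load.
    efficiency is an assumed inverter efficiency, not a measured value."""
    if load_w <= 0:
        raise ValueError("load_w must be positive")
    usable_wh = battery_wh * efficiency
    return usable_wh / load_w * 60  # hours converted to minutes


def survives_outage(battery_wh: float, load_w: float,
                    outage_minutes: float, efficiency: float = 0.9) -> bool:
    """Return True if the estimated runtime covers the whole outage."""
    return ups_runtime_minutes(battery_wh, load_w, efficiency) >= outage_minutes


# Hypothetical site: 1000 Wh of battery carrying a 500 W load.
runtime = ups_runtime_minutes(battery_wh=1000, load_w=500)
short_outage_ok = survives_outage(1000, 500, outage_minutes=60)
long_outage_ok = survives_outage(1000, 500, outage_minutes=120)
```

With these sample numbers the estimate is a little under two hours, so the one-hour outage is covered and the two-hour outage is not, which is exactly the case where a backup plan beyond the UPS is needed.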
Delayed messages
Imagine a situation in which the delivery of messages is not synchronized with the detection of failures within the same system. Messages are sent and received under a specific configuration, and the machines involved also belong to a specific configuration. The problems start when a message sent under one configuration ends up being received by a machine running under a completely different configuration. As with the assumption about site failures mentioned earlier, major application components are often built on the assumption that configurations will never conflict and that the configuration itself will not change while messages are in flight. Yet it does. And when that happens, serious problems arise.
In fact, delayed messaging scenarios can be quite difficult to diagnose and test within a distributed system. Developers often build these applications on the assumption that networks never behave this way. Yet these problems are all too common in WAN configurations, where network variability is at its highest.
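A frequently used defense is to stamp every message with the configuration it was sent under and have the receiver reject anything from a different configuration. The sketch below illustrates that check. The `Message` and `Node` classes and the `config_epoch` field are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    config_epoch: int  # configuration the sender was in when it sent this
    payload: str


class Node:
    def __init__(self, config_epoch: int) -> None:
        self.config_epoch = config_epoch
        self.accepted: list[str] = []

    def reconfigure(self, new_epoch: int) -> None:
        """Move this node to a newer configuration."""
        self.config_epoch = new_epoch

    def receive(self, msg: Message) -> bool:
        """Accept a message only if it was sent under this node's
        current configuration; otherwise drop it."""
        if msg.config_epoch != self.config_epoch:
            return False
        self.accepted.append(msg.payload)
        return True


node = Node(config_epoch=1)
delayed = Message(config_epoch=1, payload="old instruction")
node.reconfigure(2)  # the system reconfigures while the message is in flight
delayed_accepted = node.receive(delayed)                # False, stale epoch
fresh_accepted = node.receive(Message(2, "new instruction"))  # True
```

Dropping the stale message is the simplest policy. A real system might instead ask the sender to retry under the current configuration.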
Inconsistent failure detection
Whenever a component within a distributed system fails, a process should be in place that correctly identifies the affected component so that countermeasures such as troubleshooting or reconfiguration can begin. Inconsistent failure detection, or IFD, occurs when different components of the system detect different sets of failures for the same problem.
Distributed systems today are often designed on the assumption that failure detection is consistent throughout the system. When that assumption does not hold, the system can fail in strange ways that are not only harmful but also hard to diagnose. For example, if three machines (X, Y, and Z) are supposed to compensate for the failure of other machines in the system, their differing observations about which machines have failed could lead them to take incorrect, and sometimes harmful, actions that corrupt data.
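One way to keep X, Y, and Z from acting on different views is to have them pool their observations and act only on failures that a majority of them agree on. The following is a simplified sketch of that voting step. The function name and the machine labels A and B are hypothetical, and a production system would normally rely on a proper membership or consensus protocol instead.

```python
from collections import Counter


def agreed_failures(suspicions: dict[str, set[str]]) -> set[str]:
    """Given each observer's set of suspected machines, return only the
    machines that a strict majority of observers suspect."""
    votes = Counter(
        machine for suspects in suspicions.values() for machine in suspects
    )
    majority = len(suspicions) // 2 + 1
    return {machine for machine, count in votes.items() if count >= majority}


# X, Y, and Z disagree about which of machines A and B have failed.
observations = {
    "X": {"A"},
    "Y": {"A", "B"},
    "Z": set(),
}
confirmed = agreed_failures(observations)  # {"A"}: two of three suspect A
```

Here only A is treated as failed, since B is suspected by Y alone. All three machines then compensate for the same failure instead of each acting on its own view.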
Power distribution systems are extremely complex to design and even more so to implement and troubleshoot. Steering clear of these mistakes will go a long way toward keeping your systems performing optimally.
- Steward Hudson is a researcher/blogger with experience writing for multiple industries including health, energy, finance, and more. He currently writes for electrical engineering company Current Solutions PC of White Plains, NY.