The Importance of Health Monitoring
Robin Hughes, Sales Director, Secure Logiq
Regardless of the build quality of a server, some component failures are inevitable over a long period. HDDs in an Enterprise server build are spinning disks, and in a security environment they are worked harder than in conventional IT-centric applications. In addition, security storage volumes are too large for regular data back-ups to be feasible.
Hopefully, most Enterprise surveillance applications will have resilience built-in, such as battery-backed RAID, mirrored OS drives, redundant PSUs and an appropriate level of RAID resilience. However, the importance of monitoring the health of these systems cannot be overstated.
Many systems are not constantly observed, and may only be visited on occasion to review an incident, so it is important that alerts are raised if the hardware is experiencing issues, especially if the component in question represents a potential single point of failure.
Most systems can withstand a camera failure but, with the exception of expensive failover models, most are not designed to withstand a critical server failure. For this reason, CCTV-centric cloud-based hardware monitoring and alerting utilities are vital. All Secure Logiq servers are shipped with the health monitoring and alert software Logical Healthcheck Pro. This monitors critical components in the hardware and raises an alert in the event of something unexpected occurring.
It is important that such software has a negligible effect on the server’s resources. Extremely low bandwidth requirements also enable it to run over 3G networks if required.
Health monitoring offers huge advantages to the integrator, enabling them to proactively monitor their entire server estate from a single screen. Pre-emptive maintenance can reduce expensive call-outs, and can eliminate unforeseen downtime on a customer’s system. For example, a fan failure on a server would not present any immediate risk, but if unaddressed could lead to a system or component overheating, increasing the possibility of server failure. Most servers can withstand a drive or two failing, but one more would result in corruption of all data in the array and the video archive being lost.
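The drive-failure point can be illustrated with a minimal sketch. It assumes the standard fault tolerance of common RAID levels (RAID 5 survives one failed drive, RAID 6 survives two, and RAID 1 is taken here to be a two-drive mirror). The function name and status strings are hypothetical and are not taken from any particular monitoring product.

```python
# Drives each RAID level can lose before the array itself is lost.
# RAID1 is assumed to be a simple two-drive mirror.
RAID_FAULT_TOLERANCE = {"RAID0": 0, "RAID1": 1, "RAID5": 1, "RAID6": 2}


def array_status(level: str, failed_drives: int) -> str:
    """Classify an array by how close it is to data loss (illustrative only)."""
    if failed_drives < 0:
        raise ValueError("failed_drives cannot be negative")
    tolerance = RAID_FAULT_TOLERANCE[level]
    if failed_drives == 0:
        return "healthy"
    if failed_drives > tolerance:
        return "failed"  # the archive on this array is lost
    if failed_drives == tolerance:
        return "degraded-critical"  # one more failure loses the array
    return "degraded"


if __name__ == "__main__":
    for failed in range(4):
        print("RAID6", failed, array_status("RAID6", failed))
```

An alert raised at the first "degraded" state gives the integrator time to replace the drive before the array reaches the point where one more failure loses the archive.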
Multiple techniques are utilised to monitor the various processes and components within the server, and this data is available to the integrator via a simple GUI. Customisable thresholds allow settings to be adjusted to suit the environment, and alerts to be raised and escalated via email or SMS, providing warnings for abnormal hardware usage as well as temperature and component failure.
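As a rough illustration of customisable thresholds with escalation, the sketch below classifies readings against warning and critical limits and picks notification channels accordingly. The metric names, limit values and the email-then-SMS escalation rule are all assumptions made for the example. They do not describe how Logical Healthcheck Pro is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float


# Hypothetical defaults; an integrator would tune these per site.
THRESHOLDS = {
    "cpu_temp_c": Threshold(warning=70.0, critical=85.0),
    "disk_usage_pct": Threshold(warning=90.0, critical=97.0),
    "memory_usage_pct": Threshold(warning=85.0, critical=95.0),
}

# Assumed escalation rule: warnings go by email, critical alerts add SMS.
CHANNELS = {"warning": ["email"], "critical": ["email", "sms"]}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning' or 'critical' for a single reading."""
    limits = THRESHOLDS[metric]
    if value >= limits.critical:
        return "critical"
    if value >= limits.warning:
        return "warning"
    return "ok"


def evaluate(readings: dict) -> list:
    """Build one alert per reading that breaches a threshold."""
    alerts = []
    for metric, value in readings.items():
        severity = classify(metric, value)
        if severity != "ok":
            alerts.append({
                "metric": metric,
                "value": value,
                "severity": severity,
                "channels": CHANNELS[severity],
            })
    return alerts


if __name__ == "__main__":
    sample = {"cpu_temp_c": 88.0, "disk_usage_pct": 91.0, "memory_usage_pct": 40.0}
    for alert in evaluate(sample):
        print(alert)
```

Keeping the limits in a single table means a site with a warm plant room, for example, can raise its temperature thresholds without touching the evaluation logic.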
IT-centric monitoring tools are available but are often too expensive or complicated for surveillance applications, despite the mission-critical nature of most installations.
Many VMS packages offer local alerts, but because these are designed for use with a wide range of hardware, reporting is often limited to basic information. Setting up both the hardware and software alerting facilities is critical for any integrator looking to provide the best level of service to their customer.
Setting up local alerts is relatively common, but configuring offsite alerts is often met with resistance from end users concerned about their security system being online. To address this, a monitoring utility should use a certificate, and all outgoing traffic should pass through an encrypted SSL connection and contain only information relevant to the health monitoring application. No site or personal data can be sent via the utility, making it fully compliant with GDPR. Tiered user management also ensures only authorised users have access to the information.
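A minimal sketch of those two safeguards follows, using only the Python standard library. It assumes a hypothetical allow-list of health fields, so anything else (camera names, site addresses, operator details) is dropped before a message is built. It also creates a client context that verifies the server certificate. Python's ssl module negotiates TLS, the successor to SSL. The field names and function names are illustrative assumptions, not the format used by any real product.

```python
import json
import ssl
from typing import Optional

# Hypothetical allow-list: only these health-related fields may leave the site.
ALLOWED_FIELDS = frozenset({"server_id", "timestamp", "metric", "value", "severity"})


def build_payload(reading: dict) -> bytes:
    """Serialise a reading, silently dropping any field not on the allow-list."""
    filtered = {key: val for key, val in reading.items() if key in ALLOWED_FIELDS}
    return json.dumps(filtered, sort_keys=True).encode("utf-8")


def make_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context with certificate and hostname verification enabled."""
    context = ssl.create_default_context(cafile=ca_file)
    # Refuse legacy protocol versions (assumed policy for this example).
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


if __name__ == "__main__":
    reading = {
        "server_id": "srv-01",
        "metric": "cpu_temp_c",
        "value": 88.0,
        "severity": "critical",
        "camera_name": "Front gate",  # site data: must not be transmitted
    }
    print(build_payload(reading).decode("utf-8"))
```

Filtering against an allow-list, rather than removing known sensitive fields, means a field added later is excluded by default until someone deliberately permits it.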
Health monitoring is an engineer-led and integrator-driven technology, and at Secure Logiq our R&D is driven by feedback from them. Future additions will include maintenance, out-of-warranty and end-of-life alerts, historical data logging and health reporting with export functionalities.
As a final thought, proactively monitoring the ‘engine’ of a system can help identify problems elsewhere in the installation. One customer observed regular nightly alerts for increased network load, and on investigation found external lighting was being switched off at a critical time, causing substantial image noise (and the inevitable image quality issues associated with low light applications).
Another customer identified unauthorised internet access as the root cause of a memory leak in the workstations which was causing the VMS client application to continually restart.