The term ‘Intelligent Video Analytics’ has been around for some time, and whilst analytics certainly add value to a large number of systems, there has been some debate as to how ‘intelligent’ the technology really is. Some (often those selling the analytics engines) will argue that a high degree of intelligence is on offer, while others (typically those configuring the solutions) may tend to disagree. The introduction of deep learning technology to security systems could be about to change all that!
During a recent conversation with a leading high-tech manufacturer, they told Benchmark about an application where the company had supplied approximately 1,000 channels of video analytics. Whilst the contract was a major one for the company, what concerned them was they felt the end user’s expectations had not been met. It wasn’t a case of the analytics not performing correctly, or of the delivered value not being what was anticipated at the start of the contract; something else was amiss.
Sensing some reluctance on the part of the customer to implement a second phase of the installation, the manufacturer decided to visit the site to try to better understand why the client had seemingly developed ‘cold feet’ about moving forwards with their roll-out plans.
During the visit it became apparent that the customer only had around 150 channels of analytics implemented. This meant that approximately 85 per cent of their investment in IVA technology wasn’t actually doing anything. It simply had not been implemented. As such, much of the originally calculated return on investment was not going to be realised.
Some might take the attitude that if an end user opts not to implement security resources they have paid for, that is their concern. However, anything that leaves customer expectations unmet can be a negative point, especially where on-going projects are concerned. Rather than just walking away, the manufacturer decided to try to discover why the implementation had not been completed. The reason was simple: it all came down to labour costs.
There are two elements to the implementation of video analytics. The first is the task of adding the analytics engines to the various video streams and enabling them. This is typically fairly simple and easy to achieve. The second part of the implementation, and the part of the process that can be far more time-consuming, is configuring them for accurate performance. This can take repeat site visits, with time spent assessing the events and violations and tweaking the rules to not only eliminate unwanted events, but to ensure that the customer is getting the results that they need.
Every site is bespoke, and even very similar sites using the same analytic rules may need to deliver very different results, based upon the specific operational requirements. One site might want all traffic – people and vehicles – to create events at specified times, whilst another might only be interested in vehicles that stop for a defined time in certain secure areas. One user might focus on security while another is more interested in site management.
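To illustrate how the same detection engine can serve very different operational requirements, here is a minimal, hypothetical sketch. The event fields and rule logic are invented for this example and do not reflect any particular manufacturer's system:

```python
# Hypothetical sketch: identical detection events, filtered by site-specific rules.
# Field names ("type", "zone", "hour", "stopped_for") are invented for illustration.

def site_a_rule(event):
    """Site A: flag all traffic - people and vehicles - outside working hours."""
    return (event["type"] in ("person", "vehicle")
            and (event["hour"] >= 18 or event["hour"] < 6))

def site_b_rule(event):
    """Site B: only vehicles stopped in a secure area for a defined time."""
    return (event["type"] == "vehicle"
            and event["zone"] == "secure"
            and event["stopped_for"] >= 120)  # seconds

events = [
    {"type": "person",  "zone": "gate",   "hour": 22, "stopped_for": 0},
    {"type": "vehicle", "zone": "secure", "hour": 14, "stopped_for": 300},
]

site_a_alerts = [e for e in events if site_a_rule(e)]
site_b_alerts = [e for e in events if site_b_rule(e)]
```

The same two events produce different alerts at each site, which is exactly why rule configuration, rather than the detection itself, consumes the engineering time.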
The sheer scale of flexibility on offer from IVA is what makes it so attractive to end users. However, the trade-off for this is that it takes time to ensure performance is optimised. There are some who point towards self-learning IVA systems, and to date such solutions have proven that they can help to reduce the time required for configurations. However, it is fair to say that installers and integrators still need to revisit applications to assess how environmental factors impact overall performance. There will also inevitably be events created by certain circumstances that users either don’t want notifications for, or want treated in different ways.
IVA is like a made-to-measure suit: if the time isn’t taken to ensure the fit is correct, then the final product will never be as good as it could be. Even with a self-learning system, there may be a need to adjust detection zones and masks, camera angles, perspective settings, etc.
Defining intelligence
The security sector, like many other industries, uses the terms ‘intelligent’ and ‘smart’ quite freely. There is nothing wrong with this, although the terminology is debatable. It could be argued that a device using two sources of data to define whether or not an event has occurred is ‘intelligent’ compared to one that uses a single source of data. However, most people associate the description with systems that can make decisions based upon a wide variety of criteria.
Such systems may well be computationally powerful, but the decisions being made are typically not based upon reasoning or an understanding of the events unfolding. They are based upon alphanumeric values such as pixel changes, and how those changes correlate with ‘known’ patterns generated by advanced algorithms. The systems are certainly complex and highly advanced, but they are only looking for known or predicted patterns.
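A basic analytics stage of this kind really does come down to numeric pixel values compared against thresholds. The following is a deliberately tiny sketch, with frame sizes and threshold values chosen purely for illustration:

```python
# Minimal sketch of pixel-change detection: compare two greyscale frames
# and declare an event if enough pixels change. All thresholds are illustrative.

def changed_pixels(prev_frame, curr_frame, pixel_threshold=30):
    """Count pixels whose brightness changed by more than pixel_threshold."""
    count = 0
    for prev_row, curr_row in zip(prev_frame, curr_frame):
        for p, c in zip(prev_row, curr_row):
            if abs(p - c) > pixel_threshold:
                count += 1
    return count

def motion_event(prev_frame, curr_frame, area_threshold=4):
    """An 'event' is declared when the changed area exceeds area_threshold pixels."""
    return changed_pixels(prev_frame, curr_frame) > area_threshold

# Two tiny 4x4 frames: a bright 3x2 'object' appears in the second frame.
frame1 = [[10] * 4 for _ in range(4)]
frame2 = [[10] * 4 for _ in range(4)]
for r in range(3):
    for c in range(2):
        frame2[r][c] = 200
```

Real engines layer far more sophisticated pattern matching on top of this, but the underlying data remains alphanumeric values rather than any understanding of what the object is.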
This isn’t to say that they’re dumb; IVA offers a wide range of flexibility and can introduce high levels of added value. What the algorithms lack is the ability to recognise objects or scenarios and apply discretion based upon that information when deciding how to manage incidents. Effectively, standard IVA technology uses defined behaviours – either inputted by the installer or integrator or defined by the manufacturer – to check for rules violations.
The application of reasoning is, some would argue, the basic foundation for intelligence in terms of how a system operates. If a system can recognise objects, detect behavioural patterns and then apply a degree of reasoning based upon the context of the gathered information, it is closer to being ‘intelligent’.
Thanks to numerous Hollywood productions, science fiction shows and a general misunderstanding of the way in which technology works, many end users think that the ability for systems to apply reasoning is already with us. This can actually be a negative, because it means that their expectations for advanced technologies are much higher than they should be.
For installers and integrators, this can be a stumbling block. Whilst those in the industry understand that video analytics requires varying degrees of configuration, the end user often underestimates the need for such set-up and tweaking.
Deep learning
For a good few years, AI (artificial intelligence) and neural networking, and latterly deep learning, have been buzz-phrases associated with the advancement of machine intelligence. Many academics will point out that deep learning is merely neural networking revisited. In the past, some manufacturers have claimed to offer systems based upon neural networking. However, because of limitations on processing capabilities at the time, these options barely scratched the surface.
Deep learning is based upon the use of multiple cascaded layers of algorithms. Each layer makes use of input and output signals. Effectively (and in simple terms) a layer will receive an input, which it will then modify and pass on as an output. Each subsequent layer treats the output of the previous layer as its input. Deep learning is so-called because it makes use of multiple processing layers. Each layer can ‘transform’ the input it receives based upon the parameters it has learned.
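The layer-on-layer flow described above can be sketched in a few lines. This toy example uses arbitrarily chosen weights and exists only to show each layer taking the previous layer's output as its input:

```python
import math

def layer(inputs, weights, bias):
    """One toy layer: a weighted sum of inputs, passed through a non-linearity."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return [1.0 / (1.0 + math.exp(-total))]  # sigmoid 'transform'

# Three cascaded layers: each receives the previous layer's output as input.
signal = [0.5, 0.8]                        # raw input (e.g. pixel-derived features)
signal = layer(signal, [0.4, -0.2], 0.1)   # layer 1
signal = layer(signal, [1.5], -0.3)        # layer 2 (input = layer 1's output)
signal = layer(signal, [-0.7], 0.2)        # layer 3 (input = layer 2's output)
output = signal[0]
```

In a real deep network each layer holds thousands of learned parameters rather than a handful of hand-picked numbers, but the cascade principle is the same.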
The key here is learning. Rather than simply following a defined set of rules, the advanced processing replicates the way neurons work in the brain. Certain inputs can be given greater priority than others, and the system can decide whether inputs are of value or not.
By breaking tasks down and filtering specific information, deep learning enables machines to perform complex tasks in a manner similar to the human brain. While standard IVA systems tend to look at pixel values for input data, deep learning systems can use pixel values, edges, vector shapes and a host of other visual elements to ‘recognise’ objects. This is achieved by effectively ‘training’ the algorithms.
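‘Training’ means adjusting the learned parameters so the system's output moves towards known answers. A deliberately tiny single-parameter sketch, with invented data (real training uses images, deep networks and vast datasets):

```python
# Toy training loop: one weight is repeatedly nudged towards labelled examples.
# The samples are invented; the target relationship is simply output = input.

samples = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # (input, expected output)
weight = 0.1                                     # start with a poor guess
learning_rate = 0.1

for _ in range(200):                             # repeated exposure to the data
    for x, target in samples:
        prediction = weight * x
        error = prediction - target
        weight -= learning_rate * error * x      # gradient step on squared error
```

After enough passes the weight converges towards 1.0, the value that fits the examples. Scaled up across millions of parameters, this is what ‘training the algorithms’ amounts to.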
In the past, the machine learning community has worked to train deep learning-based systems to recognise objects. For example, a machine can not only differentiate between a quadruped animal and a human on their hands and knees, but can also identify whether that animal is a dog, a horse or any other creature it has been trained to recognise.
In security and video analytics applications, the training can include behavioural traits, identification of individuals, unexpected or unusual activity, etc.
This is important because it allows video analytics to be deployed with much less incident-specific configuration. Instead systems can present end-users with a range of events and incidents. The user can then decide which of these are important and which are innocuous and therefore of little interest. This will then ensure that the system only identifies events of interest.
Deep learning allows the system to continue to train itself over a period of time. This means that if it detects an object, activity or an event that it has not seen before, this can be flagged and presented to the end-user. They can then decide whether or not the system should continue to notify for such occurrences or ignore any similar incidents in the future.
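In practice, the user's decisions become a preference store the system consults before notifying. A hypothetical sketch of that feedback loop — the event class names and the interface are invented for illustration:

```python
# Hypothetical sketch: unseen event classes are flagged once; the user's
# decision is remembered and applied to future events of the same class.

preferences = {}       # event class -> True (keep notifying) / False (ignore)
notifications = []

def handle_event(event_class, ask_user):
    if event_class not in preferences:
        # First sighting of this class: flag it and let the user decide.
        preferences[event_class] = ask_user(event_class)
    if preferences[event_class]:
        notifications.append(event_class)

# The user keeps vehicle alerts but silences fox sightings.
decisions = {"vehicle_stopped": True, "fox_in_yard": False}
for ev in ["vehicle_stopped", "fox_in_yard", "fox_in_yard", "vehicle_stopped"]:
    handle_event(ev, decisions.get)
```

The second fox sighting generates no notification and no further question: the system has ‘learned’ the user's preference for that class of event.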
It is important to understand that deep learning is an intelligence-based technology, and will have an increasing number of applications as it is rolled out into common usage. Many of these will be outside of the security sector. Indeed, experts estimate that in the future up to 80 per cent of service-oriented jobs will be disrupted by the introduction of systems using deep learning technologies.
For security applications, and specifically video- and audio-based surveillance tasks, it is vital that the technology is properly implemented. This makes it very important for installers and integrators to work with manufacturers who have an established track record in delivering security technologies. Whilst deep learning is intelligent, it will not be all things to all men. Not all deep learning systems are the same.
One for the future?
Deep learning and artificial intelligence are topics that tend to crop up regularly. There is no end of learned sources eager to tell you how it will be the next big thing. However, deep learning is here today and many manufacturers are already well down the development path creating deep learning-based solutions. Admittedly, we are unlikely to see deep learning systems replacing the traditional security operator in the near future. What we will see very quickly is the introduction of flexible and beneficial features based upon system intelligence when analysing video streams.
The adoption of technology is always accelerated when significant business use cases exist. In the world of video surveillance, features such as detection, smart search, business intelligence, integrated solution automation and the growth of smart buildings and cities are all in demand. Deep learning systems not only add value but also make systems more efficient, and increased efficiency has never been unpopular with businesses and organisations!
By considering deep learning as a technology for the future, installers and integrators could potentially miss out on enhanced benefits and real-world flexibility today.
The wider impact of deep learning will increase, but even at the most basic level its implementation in a system can enhance functionality immeasurably. The good news for those installing the systems is that deep learning will actually reduce complexity. All of the high level programming and coding is done by the manufacturer. The installer or integrator enjoys a simpler configuration, and the end-user receives a more efficient and effective security system.
What has made deep learning move from being a theoretical goal to a reality is the increased use of GPUs (graphics processing units). For many years the CPU (central processing unit) has been considered as the ‘brains’ of servers and computers. The GPU had more to do with the delivery of visual information.
The requirement for high quality 3D modelled graphics in the gaming sector saw the power of GPUs increase. Whilst CPUs are generally made up of a few cores, GPUs utilise hundreds of cores. This means they are capable of managing thousands of data processes simultaneously. Inevitably this huge amount of computational power means that increasingly the GPU is utilised for much more than simply rendering images. Other tasks undertaken include transcoding, pattern matching, image analysis and recognition, signal processing, etc.
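The contrast is between a few powerful cores working through data sequentially and many simple cores applying the same operation to many data elements at once. A rough CPU-side analogy in Python, using a thread pool to stand in for parallel cores — this is purely illustrative, not real GPU code, and the scale is trivial compared with a GPU's thousands of simultaneous operations:

```python
from concurrent.futures import ThreadPoolExecutor

def brighten(pixel):
    """The same simple operation, applied independently to every pixel."""
    return min(pixel + 50, 255)

pixels = list(range(0, 240, 10))   # a tiny stand-in for an image

# Sequential ('CPU-style'): one worker walks through the data in order.
sequential = [brighten(p) for p in pixels]

# Data-parallel ('GPU-style'): many workers each handle part of the data.
# ThreadPoolExecutor is only an analogy here; a GPU does this in hardware.
with ThreadPoolExecutor(max_workers=8) as pool:
    parallel = list(pool.map(brighten, pixels))
```

Both approaches produce identical results; the difference is that the data-parallel version can, on suitable hardware, process every element at the same time — which is precisely the property deep learning workloads exploit.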
It is also worth noting that the mainstream use of GPUs, plus their increased deployment in other applications with high computational needs, does ensure that economies of scale exist. Systems utilising the technology can still be affordable, and indeed may be very cost-effective due to the fact that configuration times will be significantly reduced.
In summary
We started by looking at an issue which led to an end-user not fully utilising their investment in video analytics. That situation is one that will be well understood by many who have had to configure systems for a user’s bespoke requirements. Setting up video analytics correctly does take time (often over long periods to allow for changing conditions) and that carries a cost.
Where end-users do not enjoy the full functionality they have invested in, or fail to see the projected return on investment materialise because their resources have not been fully implemented, expectations will be missed. Such a situation is not good for installers, integrators, manufacturers or the end-user.
Deep learning makes video analytics more effective and efficient. It also enables installers and integrators to work more closely with their customers, and for those customers to realise greater benefits from their investment.