Face-based Video Analytics

Facial detection and facial recognition technologies have been trialled in a variety of security and safety applications in the past, with varied success. While the technologies have appeared in a range of applications from the very basic right up to advanced and complex critical solutions, often their effectiveness has depended upon the expectations of the user and the fitness for purpose in a given application. Not all face-based systems are the same and with the advent of AI and high level processing, can the benefits now be realised?

One of the most effective ways of identifying individuals, whether for security or site management purposes, is through facial information. As far as humans go, recognition using facial details is both intuitive and well practised. It therefore stands to reason that any system which can accurately and consistently use facial information as an identifier can add benefits for many smart solutions.

While facial recognition is intuitive for humans, processing and storage can be slow. Faces can be identified but recalling exact information can take time. ‘I know your face; wait, it’ll come to me,’ isn’t what you need in a critical situation. If that is the case with a modest number of people, the situation quickly deteriorates when faced with tens of thousands of individuals.

Facial recognition is a cognitive skill that humans are very good at. Processing data and recalling stored information are things that computers are very good at. For many years the challenge has been to bring the two together. With the understanding that it is more realistic to teach a machine to recognise faces than to teach people to handle thousands of computational commands every second, the race to improve facial recognition systems has been ongoing for some time.

Accuracy has always been a challenge for facial recognition, because there are so many variables. Because of technological limitations in the early days, a set number of criteria were used to define any given face. This usually included plotting certain characteristics, mapping the face and gathering data of general detail such as skin tone, hair colour, etc.

When using basic plotting to identify a face, there are many conditions that can change the final mapping. For example, the dimensions of a face using eyes, nose and mouth will change if that person is static, talking, sneezing or yawning. The fewer plotting points used, the more that errors will be introduced.
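The weakness of basic plotting can be illustrated with a minimal sketch. The landmark coordinates, field names and the two-ratio ‘template’ below are assumptions made purely for illustration; they do not reflect any specific product’s algorithm, and real systems use far more measurements.

```python
import math

def plot_template(landmarks):
    """Build a crude face 'template' from ratios between plotted points.

    Ratios, rather than raw distances, give some tolerance to the face
    appearing larger or smaller in the frame.
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    eye_span = dist(landmarks["left_eye"], landmarks["right_eye"])
    eye_to_nose = dist(landmarks["left_eye"], landmarks["nose"])
    nose_to_mouth = dist(landmarks["nose"], landmarks["mouth"])
    # Two ratios are far too few measurements for a reliable identity,
    # which is exactly why early systems were error-prone.
    return (eye_to_nose / eye_span, nose_to_mouth / eye_span)

# Hypothetical landmark coordinates (x, y) in pixels for the same person.
neutral = {"left_eye": (100, 100), "right_eye": (160, 100),
           "nose": (130, 140), "mouth": (130, 170)}
yawning = dict(neutral, mouth=(130, 195))  # an open mouth shifts one point

print(plot_template(neutral))
print(plot_template(yawning))  # same face, yet a different template
```

With so few plotting points, an everyday change of expression is enough to shift the template, exactly the kind of error the article describes.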

In the early days of facial recognition, restriction on processing power limited systems to using basic plotting points, which naturally impacted on accuracy.

One of the hurdles facial recognition had to overcome in the past few decades was that many of the high profile use-cases turned out to be flawed. This was partially because the technology was at a very early stage in its lifecycle. As a result, many of the issues that came to light during trials either couldn’t be corrected or were due to the systems being expected to perform in a different way than their capabilities allowed.

Another problem was due to unrealistic expectations from end users and the general public. They believed that facial recognition could pick out a single individual in a crowded scene and accurately identify them to the police or other relevant authorities. When this didn’t happen, the general consensus was that the technology itself had failed.

By far the most reported ‘failure’ of facial recognition followed the use of the technology in Newham in the late 1990s. Launched with much fanfare, the system became the object of battles between the council, the police and various human rights organisations. The mass media lined up to mock the system amid claims that during its trial period it never actually identified a single person correctly.

Anyone with any understanding of Moore’s Law will appreciate that the system, which was installed two decades ago, would be positively archaic today. Transistor counts, and with them processing power, double approximately every two years, which means in the interim period it has become common for home computing devices to have more power than the base system used in that trial.

Since then, the world of facial recognition and facial detection has moved on in leaps and bounds. Processing power is no longer a limitation and the cost of storage has been slashed.

GPU-based systems have enabled genuine implementations of artificial intelligence (AI) and deep learning. Providers of video analytics have moved beyond simple facial plotting. If anything, the time is now right for the many benefits of facial recognition and facial detection to be exploited.

A subtle difference

When it comes to facial-based smart video applications, there are two methods which are very different in terms of implementation and achievable results. These are facial detection and facial recognition. The definition is implicit in the terminology but still causes confusion for some when considering what the systems will and will not be capable of doing.

Both share a common first step: they need to recognise that there is a human face within the viewed scene. The traditional method for achieving this is to use triangulation, making use of facial features (eyes, nose and mouth) to identify the patterns inherent in a human face.

With the growing use of AI and deep learning in many smart video functions, this approach now seems somewhat archaic (although it is still relatively effective). However, AI can effectively ‘learn’ to recognise people in a scene and can also then track them until a suitable facial image can be captured for the given purpose. This approach is more fluid and may deliver superior results, but it also requires increased processing capabilities.

The significant difference between detection and recognition becomes evident once the presence of a face has been identified in a video scene.

Facial detection is just that: the algorithm detects that a face is present in a scene and this information is then used to trigger a pre-defined action or to alter the system configuration. This might sound somewhat limited but the potential on offer does deliver a cost-effective way to obtain maximum efficiencies from a surveillance system.

For example, video could be recorded making use of a lower frame rate or reduced image resolution when there is no activity or activity which does not provide a positive identification of individuals in a scene, i.e. when no faces are detected. However, both the frame rate and resolution could be increased when a face is detected.
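This behaviour can be sketched as a simple profile switch. The frame rates and resolutions below are illustrative assumptions, not figures taken from any particular VMS or camera.

```python
# Hypothetical recording profiles: low-cost background recording versus
# full-quality evidential recording.
IDLE_PROFILE = {"fps": 5, "resolution": (1280, 720)}
FACE_PROFILE = {"fps": 25, "resolution": (1920, 1080)}

def recording_profile(face_detected: bool) -> dict:
    """Raise frame rate and resolution only while a face is in the scene."""
    return FACE_PROFILE if face_detected else IDLE_PROFILE

print(recording_profile(False))  # no faces: conserve storage and bandwidth
print(recording_profile(True))   # face detected: capture identifiable detail
```

In practice the same trigger could also start pre-event buffering or adjust compression, but the principle is the same: spend resources only when a face makes the footage worth it.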

Video which includes face-based information could be automatically bookmarked or flagged as footage of interest, making any searches following an incident a simpler task. It is also possible to use the presence of a face in a viewed scene to dynamically create a region of interest, allowing a higher level of automation to be implemented.

Motion-based searches could also be filtered. For example, specifying a motion search on a doorway would show any activity in the selected area, regardless of whether an individual could be identified from it. By applying a filter based upon facial detection, this would ensure that the search only returned motion where a person could potentially be identified from the footage.
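The filtered search amounts to selecting only the motion events that carry a face flag. The event structure and field names below are assumptions for the sketch, not a real system’s API.

```python
# Hypothetical motion events returned by a search over a doorway region.
events = [
    {"time": "09:02", "face_detected": False},  # e.g. a door swinging shut
    {"time": "09:15", "face_detected": True},   # a person facing the camera
    {"time": "09:40", "face_detected": True},
]

# Apply the facial-detection filter so the search only returns motion
# from which a person could potentially be identified.
identifiable = [e for e in events if e["face_detected"]]
print([e["time"] for e in identifiable])  # → ['09:15', '09:40']
```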

Facial recognition makes use of facial detection in the first instance. It then creates a ‘template’ from the facial information and compares that with stored data to ascertain whether the individual is known on the system. This allows the creation of white-lists (persons who are permitted to be in the location) and black-lists (people who are not authorised to be there).

Facial recognition is more processor-hungry than facial detection. It carries out the same detection methods as facial detection but then creates a template and searches the database for a match that falls within the prescribed thresholds to ensure accuracy. If the thresholds are tightened, searching takes longer but is more accurate. Relaxing the thresholds makes the process faster but will lead to a greater number of false accepts or incorrect matches.
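The threshold trade-off can be sketched as a distance comparison. Real systems compare high-dimensional templates; the three-value templates, names and threshold figures below are illustrative assumptions only.

```python
import math

def distance(a, b):
    """Euclidean distance between two simplified face templates."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_matches(probe, database, threshold):
    """Return every enrolled identity whose template falls within the threshold."""
    return [name for name, template in database.items()
            if distance(probe, template) <= threshold]

# Hypothetical enrolled templates and a fresh capture of the first person.
database = {"alice": (0.10, 0.80, 0.30), "bob": (0.90, 0.20, 0.60)}
probe = (0.12, 0.78, 0.33)

print(find_matches(probe, database, threshold=0.1))  # tight: correct match only
print(find_matches(probe, database, threshold=1.5))  # loose: false accepts appear
```

A tight threshold returns only the genuine match; a loose one sweeps in dissimilar templates as well, which is the false-accept behaviour described above.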

While facial recognition is more relevant for many sites than the more basic facial detection, it has historically presented its own issues. For example, in an attempt to offer a differentiation from other competitive products, one manufacturer included facial recognition as a standard feature of its NVRs when the technology was still relatively new. The requirement for processing and storage meant that in order to be effective, the number of stored and searched templates was less than ten.

This limitation made the additional cost of the NVR hard to justify. Only small sites with a user count in single digits could find a credible use for such facial recognition, and that many faces falls well within the range a human operator can remember unaided. Admittedly there was the ability to use the facial information for automation, but there were also more cost-effective ways of doing this.

When server-based systems were expanded to support the use of databases containing hundreds or thousands of images, storage and processing demands often made them unrealistic.

The AI solution

In recent years the development of facial recognition technologies has, much like other video analytics, advanced significantly. Algorithms to detect faces have improved and accuracy has reached a level that makes it suitable for an increasingly wide range of applications.

Despite increases in processing speed over the many years that the technology has been developed, facial recognition – much like many analytics – only allowed small amounts of the gathered data to be utilised. However, this situation has changed with the introduction of GPUs into security hardware and servers.

Because of the huge increase in processing cores, GPU-based AI and deep learning not only enhances the performance of facial recognition systems but also enables in-depth forensic searches to be carried out across multiple streams on large and distributed systems. It also allows searches to be historical, in that they can include all retained data.

For example, when an individual is identified the system operator can search for all instances in which they appear. The system will find the most obvious matches; these may be further refined using other information such as the colour of clothing.

When the system delivers matches, the operator can select those that are correct and reject any that are not. The system will then use the accepted instances to further learn about the individual being searched for. This enables facial information captured at different angles to be included in searches.
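The feedback loop can be sketched in miniature. Folding operator-confirmed captures into the template by averaging is an assumed, heavily simplified stand-in for the deep-learning refinement described above; the values are illustrative only.

```python
def refine_template(template, accepted_captures):
    """Fold operator-confirmed matches back into the search template."""
    samples = [template] + accepted_captures
    # Average each dimension across the original template and all
    # accepted captures; rejected matches never reach this function.
    return tuple(sum(values) / len(samples) for values in zip(*samples))

template = (0.20, 0.60)
accepted = [(0.26, 0.58), (0.23, 0.62)]  # matches the operator confirmed
rejected = [(0.90, 0.10)]                # rejected matches are discarded

template = refine_template(template, accepted)
print(template)
```

Each confirmation nudges the template towards the individual’s true appearance, which is how captures from other angles, and eventually other days, start to match.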

As the system learns the facial information, supplementary data, such as clothing colour, becomes less critical. This can result in appearances of the individual from other days being flagged, if any exist.

The speed of the search and the fact that the system is learning more about the facial information means it can also identify any appearances of the individual on live streams, essentially tracking them around the site if they are still at the location.

The speed and ease with which the system can search, coupled with the ability to constantly learn more about the facial data being searched for, allows the technology to be used for a wide range of applications.

Security is an obvious use. The ability to identify suspects, build an audit trail of their activity prior to an event and even search for them in real-time empowers operators and allows black-lists to be managed on-the-fly rather than retrospectively.

However, facial recognition could also be used for access control. Currently, biometric technologies are used for contactless access control but these often require users to stop and wait while the scanning takes place, usually at a terminal. This can create bottle-necks. Facial recognition can be carried out at a greater distance, while people are on the move.

Another option, and one which has been deployed by a number of airports, is to use facial recognition for status reporting. Capturing facial information allows the airport authorities to assess flow through the facility. As passengers disembark facial information is captured. This then provides information about how long it takes for passengers to clear immigration, baggage collection and customs. It can also be used as people enter the airport to qualify times for security checks, transfers, etc.

Another area that could benefit from AI-based facial recognition is the care of vulnerable people. With an ageing population needing increased levels of support, facial recognition could be used to ensure their status and location are reported on a regular basis, enabling carers to respond should any of them find themselves in difficulty.

In summary

Facial recognition has had a challenging period of development, and because of the resources needed to make it effective it developed a reputation, albeit an undeserved one, for failing to deliver.

However, the increased use of GPUs – and the subsequent sea-change in processing capabilities – mean that the technology now offers a valuable and flexible tool for many applications.

Using the right facial analytics processing engine, coupled with GPU-enabled servers, delivers a tool that empowers users, adds real-world benefits and creates a smart solution that exceeds user expectations.

BENCHMARK
Benchmark is the industry's only publication for installers and integrators which is dedicated to technological innovation and the design and implementation of smarter solutions. With an unrivalled level of experience in technology-based systems, Benchmark delivers independent and credible editorial content.