Audio detection and analytics

When it comes to situational awareness, video surveillance is an impressive tool with a wide range of benefits. Add video analytics and its power increases substantially. However, when considering detection and analytics, audio is often under-used. Benchmark considers the role of audio detection and analytics in the creation of credible solutions.

For many years, video surveillance has led the way with regard to situational awareness. The increased implementation of intelligent video analytics (IVA) has pushed the boundaries with regard to performance, and ensures that the benefits of video can be fully exploited by those seeking an advanced level of protection. The power offered by a well-designed and correctly implemented video system using IVA is rarely matched by other technologies.

Whilst video surveillance continues to grow in the UK and other parts of the world, both in terms of scale and efficiency, and is accepted by the vast majority as an effective tool to mitigate risks and enhance business management, audio surveillance has not always been equally embraced. This is despite it being of significant value in assessing and investigating events.

A few years ago, Benchmark visited an educational environment and was discussing its system with the security manager. He explained that his biggest frustration was the inability to deploy audio surveillance across the site. There had been incidents where pupils had been in the grounds out of school times, and had been confrontational and at times threatening when approached by maintenance and other non-teaching staff. However, without audio, this was not always obvious when the video footage was viewed. Additionally, cases of vandalism had been hard to identify as there was no audio-based evidence of the exact time of an incident.

Such anecdotal evidence is not unusual. Audio can add a layer of information that assists in investigations, and its absence can mean that some threats go undetected.

Audio detection and the classification of sounds has become a realistic and affordable option with the development of new technologies. Where once the use of audio in security was merely to allow conversations to be reviewed (when privacy restrictions allowed), today a more proactive approach is possible.

Systems can create alerts based upon voice-based or other sounds. In the first case, increases in volume (shouting), voice characteristics (such as anger), key words or changes in expected behaviour can be used to generate alerts. In the second case, unexpected noises, volume increases or specific classifications of noise can also create alarms. Such classifications might include gun shots, breaking glass, sirens, etc.

Many end users, along with some installers and integrators, shy away from the use of audio surveillance technologies because of fears relating to privacy breaches. Staff or visitors to a site do not want to be listened to. They view such an action as snooping or eavesdropping, and as such they feel it infringes their rights.

Whilst people’s actions and movements might be open for analysis via CCTV, their words are considered as more private. For example, whilst few workers will object to the use of CCTV in the workplace, resistance to audio surveillance remains high.

Whilst audio surveillance might be limited in its applications, there are still many benefits that audio information can offer. The detection of audio exceptions via intelligent analytics, as mentioned earlier, can enhance situational awareness by triggering actions in response to unusual, unexpected or predefined sounds.

Analysing audio

Audio technology predominantly appears in the surveillance sector in one of two guises. The first is a basic audio detection function, which will generally detect sounds that exceed a pre-defined threshold. The second is audio analytics, which classifies the sounds it detects.

The main difference between detection and analysis is that detection functions tend not to seek out specific sound classifications. A window being broken, a gun shot or a scream may all be treated in the same way as thunder, a vehicle with a faulty exhaust, an emergency services vehicle with siren sounding or even an ice cream van!

Audio analysis will be more defined. A sound will need to match criteria with regard to its classification: these might include (but will not be limited to) frequency (often multiple frequencies), volume and duration. With regard to multiple frequencies, breaking windows have two distinct frequencies that occur within a very small time window. The first is a lower frequency flex as the glass receives an impact, followed by a higher frequency sound as it shatters.
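For readers who want to see how a two-stage signature of this kind might be checked in software, the following Python sketch is illustrative only. It uses the Goertzel algorithm to measure energy at one low and one high frequency in each short frame of audio, and reports an event when a low-frequency frame is followed shortly afterwards by a high-frequency frame. The frequencies, threshold, frame size and function names are all assumptions chosen for the example, not values taken from any product; commercial analytics rely on far richer models.

```python
import math

def goertzel_power(frame, freq_hz, sample_rate):
    """Normalised signal power at the DFT bin nearest to freq_hz (Goertzel algorithm)."""
    n = len(frame)
    k = round(n * freq_hz / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev = s_prev2 = 0.0
    for x in frame:
        s = x + coeff * s_prev - s_prev2
        s_prev2 = s_prev
        s_prev = s
    power = s_prev * s_prev + s_prev2 * s_prev2 - coeff * s_prev * s_prev2
    return power / (n * n)

def find_two_stage_events(samples, sample_rate=16000, frame_size=512,
                          low_hz=250.0, high_hz=5000.0,
                          power_threshold=0.01, max_gap_frames=3):
    """Return (low_frame, high_frame) pairs where a low-frequency frame is
    followed by a high-frequency frame within max_gap_frames.
    All default values are illustrative assumptions."""
    events = []
    last_low_frame = None
    for i in range(len(samples) // frame_size):
        frame = samples[i * frame_size:(i + 1) * frame_size]
        low = goertzel_power(frame, low_hz, sample_rate) > power_threshold
        high = goertzel_power(frame, high_hz, sample_rate) > power_threshold
        if (high and last_low_frame is not None
                and 0 < i - last_low_frame <= max_gap_frames):
            events.append((last_low_frame, i))
            last_low_frame = None
        if low:
            last_low_frame = i
    return events

def tone(freq_hz, n, sample_rate=16000, amplitude=0.5):
    """Synthetic test tone standing in for recorded audio."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

silence = [0.0] * 1024
clip = silence + tone(250.0, 512) + tone(5000.0, 512) + silence
print(find_two_stage_events(clip))  # [(2, 3)]
```

A high-frequency sound on its own, or one arriving long after the low-frequency impact, does not satisfy the sequence and so is not reported. This is the essence of matching on a signature rather than on volume.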

As detection usually involves the sensing of any sound above a defined volume threshold, the technology might initially appear to be limited. This is because specific noises cannot be identified, and only general exceptions can be used to trigger alarms or automated actions.

For example, if you consider an office environment where interactions with the public take place, an exception might include someone shouting at staff, a door being slammed, a screen or desk being banged, etc. All these incidents might represent a cause for concern. However, analytics based solely on volume would also generate an alarm if a baby was crying, if an item was dropped or during thunderstorms!
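The volume-based approach described above can be sketched in a few lines of Python. This is a minimal illustration, assuming audio is supplied as floating-point samples between -1.0 and 1.0; the frame size, the -20 dBFS threshold and the function names are hypothetical choices for the example rather than settings from any particular camera or encoder.

```python
import math

def rms_dbfs(frame):
    """RMS level of a frame of samples (floats in -1.0..1.0), in dB relative to full scale."""
    if not frame:
        return float("-inf")
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

def detect_loud_frames(samples, frame_size=1024, threshold_dbfs=-20.0):
    """Return the indices of frames whose RMS level exceeds the threshold."""
    hits = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        if rms_dbfs(frame) > threshold_dbfs:
            hits.append(start // frame_size)
    return hits

def tone(amplitude, n, freq_hz=440.0, sample_rate=8000):
    """Synthetic test tone standing in for recorded audio."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

# Three quiet frames, one loud frame, two quiet frames.
audio = tone(0.01, 3 * 1024) + tone(0.5, 1024) + tone(0.01, 2 * 1024)
print(detect_loud_frames(audio))  # [3]
```

Note that the detector has no notion of what made the noise: a shout, a slammed door and a clap of thunder at the same level would all produce the same result.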

Whether this is acceptable or not very much depends upon the operational requirements of the system. If the role of the analytics is to alert an operator or security personnel to an event, with the onus on them to then verify the situation, it may be acceptable. Alternatively, in a closed building at night or during weekends, sounds other than those created by the general building fabric might warrant further investigation.

Where the security team has an on-site presence, such alerts could be sent to an operator or even to a patrol via a handheld device. As with any detection technology, if the rate of nuisance alarms is high, the effectiveness of the system may suffer as events will inevitably be ignored.

If a more intelligent solution is required, analytics should be sought which can differentiate specific sounds whilst ignoring others, regardless of their volume.

With the ability to differentiate specific sound signatures, the power of audio analytics is significantly enhanced. The ability to filter and identify whether an alert is created by breaking glass, a gun shot, a crying baby, an impact, aggressive behaviour, fire or smoke alarms, keywords, machinery malfunctioning (or even stopping) makes audio analytics a powerful tool for both security and business management.
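To illustrate the difference this makes, the short Python sketch below filters events by classification rather than by loudness. It assumes an upstream classifier has already labelled each sound; the SoundEvent structure, the class labels, the watch list and the confidence threshold are all hypothetical and exist only for the example.

```python
from dataclasses import dataclass

@dataclass
class SoundEvent:
    label: str          # class name assigned by the (hypothetical) classifier
    confidence: float   # classifier confidence, 0.0 to 1.0
    level_dbfs: float   # measured loudness, included only for comparison

# Hypothetical watch list for a site; the labels are illustrative.
ALERT_CLASSES = {"breaking_glass", "gun_shot", "aggression"}

def should_alert(event, alert_classes=ALERT_CLASSES, min_confidence=0.7):
    """Alert on class and confidence only; loudness is deliberately ignored."""
    return event.label in alert_classes and event.confidence >= min_confidence

events = [
    SoundEvent("thunder", 0.95, -3.0),          # very loud, not of interest
    SoundEvent("breaking_glass", 0.88, -30.0),  # quiet, but on the watch list
    SoundEvent("crying_baby", 0.91, -8.0),      # loud, not of interest
    SoundEvent("gun_shot", 0.40, -5.0),         # on the list, low confidence
]
alerts = [e.label for e in events if should_alert(e)]
print(alerts)  # ['breaking_glass']
```

The loud thunder and crying baby are ignored while the much quieter breaking glass raises an alert, which is the behaviour a purely volume-based detector cannot provide.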

Platform options

Just as is the case with video analytics, installers and integrators have a variety of ways to implement audio detection and its associated analytics. A growing number of camera manufacturers are now utilising spare processing power in their devices to add either audio detection or audio analytics functionality. As such, it is increasingly a standard feature.

Where a more specialised approach is required, many of the ‘open platform’ cameras and encoders allow installers and integrators to run analytics Apps. Higher level audio analytics are ideal for such an approach, allowing installers and integrators to select the specific audio software options required for any given application. This reduces costs, as there will often not be a requirement for a full suite of audio sensors. As with all App-based functionality, installers and integrators can take a ‘mix and match’ approach to ensure they maximise the potential on offer from what is a very significant and powerful detection option.

As with many things in surveillance today, there exists something of a debate over whether video analytics are better served by being executed centrally or at the edge. This is also true for audio-based analytics.

Some will argue that the process is optimised by deploying a dedicated server running multiple channels of the analytics software. This does allow all analytics to be operated in an optimised environment, but can also increase the capital investment for the user. Others believe that analytics at the edge sits better with modern system design. It also allows the use of specific analytics at certain locations, thus enabling a best-of-breed solution.

Increasingly, VMS providers are also working with many of the leading audio analytics companies. This means that some detection options can be managed directly from the video surveillance GUI, and can be allocated to specific cameras or groups of devices.

Many audio analytics providers stress that their audio sensors are compatible with low cost microphones, and given that many cameras are equipped for two-way audio, it makes sense to minimise installation time and use these rather than fitting discrete devices. Of course, it pays to be prudent and carry out field trials with the specified audio package prior to installation.

In summary

Audio detection for alarm triggering is currently widely available as an integral feature of cameras and encoders. Whilst the majority of options are limited by the fact they are volume-based, they can still deliver benefits in a wide range of applications.

For those seeking a higher degree of intelligence, systems using advanced processing and filtering are becoming more common and easier to implement. It is a case of seeking out an advanced camera with good audio analytics, an open platform device or a compatible VMS; a dedicated server could also be deployed if a variety of analytics are to be implemented.

Whichever route is taken, professional audio analytics can significantly enhance situational awareness.
