Performance and Load Estimation

These results should be viewed as guidelines and not performance guarantees, since there are many variables that affect performance (file set, network configurations, hardware characteristics, etc.). If throughput is important to your implementation, OPSWAT recommends site-specific benchmarking before implementing a production solution.

Factors that affect performance

  • MetaDefender Core version

  • MetaDefender Core engine package and configuration

    • set of engines (which and how many)
    • product configuration (e.g., thread pool size)
  • MetaDefender Distributed Cluster API Gateway version

  • System environment

    • server profile (CPU, RAM, hard disk)
    • client application location (remote or local)
    • system caching and engine level caching
  • Dataset

    • encrypted or unencrypted

    • file types

      • different file types (e.g., document, image, executable)
      • archive file or compound document format files
    • file size

    • bad or unknown (assumed to be clean)

  • Performance tool

Performance metrics

While processing files, service performance is measured by various metrics. The following are commonly used to define performance levels:

Number of processed objects per hour vs. Number of processed files per hour

On MetaDefender Core, “files” and “objects” do not mean the same thing.

  • “files”: exclusively refers to original files submitted to MetaDefender Core. These could be either archive or non-archive file formats. For archives, depending on archive handling settings, MetaDefender Core may need to extract them and process all nested files inside as well. For example, one archive file could contain millions of nested files inside.
  • “objects”: refers to any individual files that MetaDefender Core must process. These could be separate original files submitted to MetaDefender Core, or extracted files coming from an archive. The number of processed objects is considered to be a more accurate throughput metric to measure MetaDefender Core performance.

The primary metric used to measure average vs peak throughput of a MetaDefender Core system is “processed objects per hour.”

Submission load

(number of successful requests per second)

This performance metric measures the load generated by a test client application that simulates loads submitted to MetaDefender Core.

A submission is considered successful when the client app submits a file to MetaDefender Core and receives a dataID, which indicates that the file has successfully been added to the Queue.

Submission load should measure both average and peak loads.

Average processing time per object

The primary metric used to measure the processing time of a MetaDefender Core system is “avg processing time (seconds/object).”

Total processing time

(against a given dataset)

Total processing time is a typical performance metric to measure the time it takes to complete the processing of a whole dataset.

How test results are calculated

Performance (mainly scanning speed) is measured by throughput rather than unit speed. For example, if it takes 10 seconds to process 1 object, and it also takes 10 seconds to process 10 objects, then performance is quantified as 1 second per object, rather than 10 seconds.

  • total time / total number of objects processed: 10 seconds / 10 objects = 1 second / object.

Dataset

File category        | File type     | Number of files | Total size                              | Average file size
Document             | DOC           | 3,820           | 534 MB                                  | 0.14 MB
Medium archive files | RPM, CAB, EXE | 50              | Compressed: 2.8 GB; Extracted: 12.09 GB | Compressed: 56.02 MB; Extracted: 0.036 MB
Big archive files    | CAB           | 4               | Compressed: 2.9 GB; Extracted: 124 GB   | Compressed: 715 MB

Environment

Topology

The tests use an AWS environment with the specifications below:

MDDC system

Component                            | MD Core                     | File Storage                  | API Gateway                 | PostgreSQL                   | RabbitMQ                   | Redis
OS                                   | Windows Server 2022         | Rocky Linux 9                 | Rocky Linux 9               | Rocky Linux 9                | Rocky Linux 9              | Rocky Linux 9
AWS instance type                    | c5.2xlarge                  | c5n.4xlarge                   | c5n.2xlarge                 | c5.xlarge                    | c5.xlarge                  | c5.xlarge
vCPU                                 | 8                           | 16                            | 4                           | 4                            | 4                          | 4
Memory                               | 16 GB                       | 32 GB                         | 8 GB                        | 8 GB                         | 8 GB                       | 32 GB
Disk (all gp3)                       | 3000 IOPS, 125 MB/s, 100 GB | 12000 IOPS, 1000 MB/s, 150 GB | 3000 IOPS, 256 MB/s, 100 GB | 10000 IOPS, 550 MB/s, 100 GB | 3000 IOPS, 125 MB/s, 80 GB | 3000 IOPS, 125 MB/s, 80 GB
Network bandwidth (baseline / burst) | 2.5 / 10 Gbps               | 15 / 25 Gbps                  | 5 / 25 Gbps                 | 1.25 / 10 Gbps               | 1.25 / 10 Gbps             | 1.25 / 10 Gbps
Benchmark (Geekbench)                | EC2 c5.2xlarge              | EC2 c5n.4xlarge               | EC2 c5n.2xlarge             | EC2 c5.xlarge                | EC2 c5.xlarge              | EC2 c5.xlarge

Client tool

Detail            | Value
OS                | Rocky Linux 9
AWS instance type | c5n.xlarge
vCPU              | 4
Memory            | 10 GB
Disk              | gp3, 3000 IOPS, 125 MB/s, 80 GB
Network bandwidth | 5 Gbps baseline, 10 Gbps burst

Product information

  • MetaDefender Core v5.14.2

  • Engines:

    • Metascan 8: Ahnlab, Avira, ClamAV, ESET, Bitdefender, K7, Quick Heal, VirIT Explorer
    • Archive v7.4.0
    • File type analysis v7.4.0
  • MDDC Control Center v2.0.0

  • MDDC API Gateway v2.0.0

  • MDDC File Storage v2.0.0

  • PostgreSQL v14.17

  • RabbitMQ v3.12.6

  • Redis v7.2.1

MetaDefender Core settings

General settings

  • Turn off data retention
  • Turn off engine update
  • Scan queue: 1000 (for Load Balancer deployment)

Archive Extraction settings

  • Max recursion level: 99999999
  • Max number of extracted files: 99999999
  • Max total size of extracted files: 99999999
  • Timeout: 10 minutes
  • Handle archive extraction task as Failed: true
    • Extracted partially: true

Metascan settings

  • Max file size: 99999999
  • Scan timeout: 10 minutes
  • Per engine scan timeout: 1 minute

Advanced settings

RabbitMQ

  • RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS=-rabbit consumer_timeout unlimited default_consumer_prefetch {false,525}

Redis

  • redis-cli flushall
  • redis-cli config set save ''
  • redis-cli config set maxmemory 25gb
  • redis-cli config set maxmemory-policy volatile-ttl

Performance results

Load Balancer deployment vs. MDDC deployment

Multiple tests were conducted with 12 MetaDefender Core instances across two deployment types, MetaDefender Distributed Cluster (MDDC) and Load Balancer, to compare the two deployments on four different datasets and demonstrate the advantage of MDDC.

Scenarios:

  • Aggressively submitted 2M non-archive files at a rate of 800 files per second.
  • Submitted 400 medium archive files at a rate of 1 file per second.
  • Submitted a mix of 189K non-archive and medium archive files at a rate of 180 files per second.
  • Submitted 4 large CAB files. This scenario replicates 2 different routing cases of a common Load Balancer:
    • LB OneToOne: ideal routing that ensures each CAB file is routed to a separate MD Core.
    • LB FourToOne: worst-case routing that delivers all four CAB files to a single MD Core.


Archive distribution

In the workflow, the setting "Load shared among MetaDefender Core instances for archive processing" is enabled.

Scaling out

In the following test scenarios, we ran experiments on four datasets using 4 and 12 MD Core instances in a MetaDefender Distributed Cluster (MDDC), demonstrating the benefits of increased instance counts.

Scenarios:

  • Aggressively submitted 2M non-archive files at a rate of 800 files per second.
  • Submitted 400 medium archive files at a rate of 1 file per second.
  • Submitted a mix of 189K non-archive and medium archive files at a rate of 60 files per second.
  • Submitted 4 large CAB files.

Archive distribution

In the workflow, the setting "Load shared among MetaDefender Core instances for archive processing" is enabled.
