Capacity Planning for Data Replication and Clustering

The tables in this topic help you provision your deployment to support data replication and clustering based on your expected usage rates. They cover physical appliances and cloud-based deployments on AWS and Azure.

Note: Refer to Data Durability and Availability in Stellar Cyber for a discussion of best practices related to maintaining data availability.

Capacity Planning for Physical Appliances

The # Appliances column in the table below indicates the number of data nodes. Keep in mind that a multi-node cluster will include one additional appliance operating as the Master DP. The Master DP doesn't process data itself but distributes it to workers and coordinates the cluster.

Data Replica  # Appliances  Ingestion (GB/day)  # Reports  # Playbooks  # Tenants  # Concurrent Sessions
N             1             300                 100        1000         50         15
N             2             600                 200        2000         100        30
N             3             900                 300        3000         150        45
N             4             1200                400        4000         200        60
N             5             1500                500        5000         250        75
N             6             1800                600        6000         300        90
N             7             2100                700        7000         350        105
Yes           2             400                 200        2000         100        30
Yes           3             600                 300        3000         150        45
Yes           4             800                 400        4000         200        60
Yes           5             1000                500        5000         250        75
Yes           6             1200                600        6000         300        90
Yes           7             1400                700        7000         350        105
Yes           8             1600                800        8000         400        120
Yes           9             1800                900        9000         450        135
Yes           10            2000                1000       10000        500        150
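The rows in the physical appliance table scale linearly with the number of data nodes: each node adds 300 GB/day of ingestion without data replication (200 GB/day with it), plus 100 reports, 1000 playbooks, 50 tenants, and 15 concurrent sessions. The following Python sketch turns that pattern into a sizing helper. The function and parameter names are illustrative assumptions and are not part of any Stellar Cyber tool; the per-node figures and node ranges are read directly from the table, and the Master DP rule follows the note that a multi-node cluster includes one additional appliance.

```python
import math

# Per-node figures read off the physical appliance table (illustrative constants).
PER_NODE = {"reports": 100, "playbooks": 1000, "tenants": 50, "sessions": 15}
INGESTION_PER_NODE_GB = {False: 300, True: 200}  # keyed by "data replica enabled?"
NODE_RANGE = {False: (1, 7), True: (2, 10)}      # smallest and largest row in the table


def data_nodes_needed(ingestion_gb_day, replica, reports=0, playbooks=0,
                      tenants=0, sessions=0):
    """Return the smallest number of data nodes whose table row covers every input."""
    demand = {"reports": reports, "playbooks": playbooks,
              "tenants": tenants, "sessions": sessions}
    needed = math.ceil(ingestion_gb_day / INGESTION_PER_NODE_GB[replica])
    for key, value in demand.items():
        needed = max(needed, math.ceil(value / PER_NODE[key]))
    low, high = NODE_RANGE[replica]
    needed = max(needed, low)
    if needed > high:
        raise ValueError(
            f"{needed} data nodes exceeds the {high}-node maximum listed in the table")
    return needed


def total_appliances(data_nodes):
    """A multi-node cluster adds one appliance that operates as the Master DP."""
    return data_nodes + 1 if data_nodes > 1 else data_nodes


if __name__ == "__main__":
    nodes = data_nodes_needed(850, replica=True, tenants=120)
    print(nodes, total_appliances(nodes))
```

For example, 850 GB/day with data replication needs five data nodes under these figures, which means six appliances once the Master DP is counted.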

Capacity Planning for Cloud-Based Deployments

The DL Count column in the AWS and Azure tables below indicates the number of DL instances that actually store data. Keep in mind that a cloud-based deployment will include one additional Dedicated DL-Master node when the cluster scales to daily ingestion greater than 250 GB and requires more than a single DL node. The Dedicated DL-Master node provides storage management and Elasticsearch operations but does not store data itself.
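As a rough illustration of the Dedicated DL-Master rule, the following Python sketch computes how many DL instances to provision in total for a given DL Count. The function name and signature are assumptions made for this example; the 250 GB threshold and the "more than a single DL node" condition come from the paragraph above.

```python
def total_dl_instances(dl_count, ingestion_gb_day):
    """Return the DL Count plus the Dedicated DL-Master node, when one is required."""
    # The DL-Master is added only when both conditions hold:
    # daily ingestion above 250 GB and more than one data-storing DL node.
    needs_master = ingestion_gb_day > 250 and dl_count > 1
    return dl_count + 1 if needs_master else dl_count


if __name__ == "__main__":
    print(total_dl_instances(1, 250))   # single DL node, no DL-Master
    print(total_dl_instances(2, 300))   # two DL nodes plus one DL-Master
```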

AWS Capacity Planning

Data Replica?  Data Ingestion (GB/day)  # Tenants  # Reports  # ATH  # Concurrent Sessions  DA Instance  DA Count  Per DA CPU  Per DA Memory  DL Instance  DL Count  Per DL CPU  Per DL Memory
N/A            50                       10         10         100    5                      N/A          N/A       N/A         N/A            r5.4xlarge   1         16          128GB
N/A            100 to 250               25         100        1000   15                     m5.4xlarge   1         16          64GB           r5.4xlarge   1         16          128GB
N/A            300                      50         100        1000   15                     m5.4xlarge   1         16          64GB           r5.4xlarge   2         16          128GB
N/A            350                      50         100        1000   15                     m5.4xlarge   2         16          64GB           r5.4xlarge   2         16          128GB
No             500                      75         200        1500   20                     m5.4xlarge   2         16          64GB           r5.4xlarge   2         16          128GB
No             600                      75         300        2000   30                     m5.4xlarge   2         16          64GB           r5.4xlarge   3         16          128GB
No             900                      100        400        3000   45                     m5.4xlarge   3         16          64GB           r5.4xlarge   4         16          128GB
Yes            400                      75         300        2000   30                     m5.4xlarge   2         16          64GB           r5.4xlarge   3         16          128GB
Yes            600                      100        400        3000   45                     m5.4xlarge   2         16          64GB           r5.4xlarge   4         16          128GB
Yes            800                      100        400        2000   45                     m5.4xlarge   3         16          64GB           r5.4xlarge   4         16          128GB

Azure Capacity Planning

Data Replica?  Data Ingestion (GB/day)  # Tenants  # Reports  # ATH  # Concurrent Sessions  DA Instance       DA Count  Per DA CPU  Per DA Memory  DL Instance       DL Count  Per DL CPU  Per DL Memory
N/A            50                       10         10         100    5                      Standard_D16s_v3  N/A       N/A         N/A            Standard_E16s_v3  1         16          128GB
N/A            100 to 250               25         100        1000   15                     Standard_D16s_v3  1         16          64GB           Standard_E16s_v3  1         16          128GB
N/A            300                      50         100        1000   15                     Standard_D16s_v3  1         16          64GB           Standard_E16s_v3  2         16          128GB
N/A            350                      50         100        1000   15                     Standard_D16s_v3  2         16          64GB           Standard_E16s_v3  3         16          128GB
No             500                      75         200        1500   20                     Standard_D16s_v3  2         16          64GB           Standard_E16s_v3  3         16          128GB
No             750                      75         300        2000   30                     Standard_D16s_v3  3         16          64GB           Standard_E16s_v3  4         16          128GB
No             1000                     100        400        3000   45                     Standard_D16s_v3  4         16          64GB           Standard_E16s_v3  5         16          128GB
Yes            300                      75         300        2000   30                     Standard_D16s_v3  2         16          64GB           Standard_E16s_v3  3         16          128GB
Yes            450                      100        400        3000   45                     Standard_D16s_v3  2         16          64GB           Standard_E16s_v3  4         16          128GB
Yes            600                      100        400        2000   45                     Standard_D16s_v3  2         16          64GB           Standard_E16s_v3  5         16          128GB

Data Sinks and Capacity Planning

If you enable a data sink in your deployment, stay within the guidelines in the tables above and provision enough DA nodes for your anticipated ingestion. Do not exceed 300 GB of daily ingestion per DA node.

Data sink performance depends heavily on the I/O bandwidth between the DA nodes and the data sink, and adding a data sink can reduce DA performance by 30-40%. For this reason, when a data sink is enabled, plan to load each DA node with no more than the maximum of 300 GB of daily ingestion per node described in the tables above. If your current configuration exceeds this per-DA load, add DA nodes to your cluster before enabling the data sink.
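The per-DA check can be expressed as simple arithmetic. The following Python sketch estimates how many DA nodes to add before enabling a data sink, using only the 300 GB/day per-node limit stated above. The function and constant names are assumptions for illustration, and the sketch does not model the I/O bandwidth to the data sink itself.

```python
import math

MAX_GB_PER_DA_WITH_SINK = 300  # daily ingestion limit per DA node, from the text above


def da_nodes_to_add(ingestion_gb_day, current_da_nodes):
    """Return how many DA nodes to add so that no node exceeds 300 GB/day."""
    required = math.ceil(ingestion_gb_day / MAX_GB_PER_DA_WITH_SINK)
    return max(0, required - current_da_nodes)


if __name__ == "__main__":
    # 900 GB/day across 2 DA nodes is 450 GB/day per node, so one more node is needed.
    print(da_nodes_to_add(900, 2))
```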