Adding a General S3 Data Sink

You must have Root scope to use this feature.

You can use the System | Data Processor | Data Sinks page to add a General S3 Data Sink to Stellar Cyber. A General S3 Data Sink is a data sink that uses any AWS S3-Compatible Storage Service for its storage (for example, wasabi).

Adding a General S3 Data Sink consists of the following major steps:

Add a user in your S3-compatible storage service with the appropriate permissions and retrieve its Access and Secret keys.
Get the bucket name in your S3-compatible storage service.
Get the endpoint URL in your S3-compatible storage service.
Add the General S3 Data Sink in Stellar Cyber.

There are many S3-compatible storage services available on the market. The procedures used to create a user and retrieve the bucket name and endpoint URL vary between vendors. The examples in this procedure use the wasabi S3-compatible storage service.

You can also add an actual AWS S3 data sink using the General S3 option as long as you know the endpoint URL. However, Stellar Cyber recommends that you use the dedicated S3 type for actual AWS S3 data sinks.

Adding a User and Obtaining Keys

The following procedure provides a summary of how to add a user in wasabi. The instructions vary depending on the storage vendor you are using:

Navigate to www.wasabi.com and create an account.
Once you have created and confirmed your wasabi account, log in to the wasabi console.
Click on Users in the left panel. Then click the Create User button, as illustrated below:
Supply a username, check the box for Programmatic access, and click Next, as illustrated below:
Choose an optional Group for the user and then click Next. We've added our user to the admins group in the image below:
You can either assign a policy to the user now or create one later with the minimum permissions. We'll add the minimum permissions later and just click Next for now.
Click Create User to finish creating the user.
Wasabi provides you with the Access Key and Secret Key for your account. You'll need these later on when you add the data sink in Stellar Cyber, so make sure you click Download CSV to save them.

Here's our new user in the Users List:

Create a Bucket for the Data Sink

Next, we'll create a bucket and retrieve its name so we can add it in Stellar Cyber:

In the wasabi console, navigate to Buckets and click the Create Bucket button, as illustrated below:
Supply a Bucket Name and Region for the bucket and click Next.

Make sure you note the Bucket Name. You will need it when you add the data sink in Stellar Cyber. In this example, our bucket is named stellarbucket.
Set the Bucket Properties as desired and click Next.
Review your settings and click Create Bucket to finish creating the bucket.

The new bucket appears in the bucket list.

Create a Policy with the Minimum Permissions

You can either use a built-in wasabi policy that provides full permissions (for example, AmazonS3FullAccess) or create your own policy with just the minimum permissions and assign that policy to the bucket you created earlier. The procedure below explains how to create a policy with just the minimum permissions:

In the wasabi console, navigate to Policies and click the Create Policy button, as illustrated below:
Supply a Name and Description for the policy.

Use the Policy Editor to define the permissions for the policy. Our policy includes only the following minimum permissions:

"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",

Here is a sample policy with the necessary minimum permissions granted. Replace stellarbucket with your own bucket name.

{
        "Version": "2012-10-17",
        "Statement": [
                {
                        "Effect": "Allow",
                        "Action": [
                                "s3:ListBucket"
                        ],
                        "Resource": [
                                "arn:aws:s3:::stellarbucket"
                        ]
                },
                {
                        "Effect": "Allow",
                        "Action": [
                                "s3:PutObject",
                                "s3:GetObject"
                        ],
                        "Resource": [
                                "arn:aws:s3:::stellarbucket/*"
                        ]
                }
        ]
}
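
If you maintain more than one bucket, the sample policy above can be generated instead of edited by hand. The following Python sketch assembles the same two statements for a given bucket name. The function name `build_minimum_policy` is hypothetical and is not part of wasabi or Stellar Cyber; it only produces JSON that you would still paste into your vendor's policy editor.

```python
import json


def build_minimum_policy(bucket_name: str) -> dict:
    """Return a policy document with the minimum permissions shown above.

    Hypothetical helper for illustration; it only assembles JSON.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level permission: applies to the bucket ARN itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket_name}"],
            },
            {
                # Object-level permissions: apply to every key in the bucket.
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_minimum_policy("stellarbucket"), indent=4))
```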

You may also want to grant s3:DeleteObject as an additional privilege for convenience. Note that s3:ListBucket is already included in the sample policy above.

Here's an example of our policy in the wasabi console using the stellarbucket name:

Granting Multipart Permissions

If you set a Batch Window of greater than 60 seconds, you also need to enable the following permissions for the Multipart Upload feature in S3:

"s3:ListMultipartUploadParts"

"s3:AbortMultipartUpload"

"s3:ListBucketMultipartUploads"

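One way to express these permissions is as extra policy statements. The Python sketch below assumes the resource scoping that AWS S3 uses, where s3:ListBucketMultipartUploads applies to the bucket and the other two actions apply to objects in the bucket; check your storage vendor's documentation to confirm that it follows the same scoping. The helper name `multipart_statements` is hypothetical.

```python
def multipart_statements(bucket_name: str, batch_window_seconds: int) -> list:
    """Return extra policy statements for the Multipart Upload feature.

    Returns an empty list when the Batch Window is 60 seconds or less,
    because the extra permissions are only required beyond that.
    """
    if batch_window_seconds <= 60:
        return []
    return [
        {
            # Bucket-level action (assumed scoping, as in AWS S3).
            "Effect": "Allow",
            "Action": ["s3:ListBucketMultipartUploads"],
            "Resource": [f"arn:aws:s3:::{bucket_name}"],
        },
        {
            # Object-level actions (assumed scoping, as in AWS S3).
            "Effect": "Allow",
            "Action": ["s3:ListMultipartUploadParts", "s3:AbortMultipartUpload"],
            "Resource": [f"arn:aws:s3:::{bucket_name}/*"],
        },
    ]


# A 6-hour Batch Window (21600 seconds) needs both extra statements.
print(len(multipart_statements("stellarbucket", 6 * 3600)))
```
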
Adding the General S3 Data Sink in Stellar Cyber

To add a General S3 Data Sink:

Click System | Data Processor | Data Sinks. The Data Sink list appears.
Click Create. The Setup Data Sink screen appears.
Enter the Name of your new Data Sink. This field does not support multibyte characters.
Choose General S3 for the Type.

Additional fields appear in the Setup Data Sink screen:
Use the Region dropdown to select the region where the S3 bucket is located. You supplied the bucket's region in Create a Bucket for the Data Sink.
Supply the name of the bucket in the Bucket field. You supplied this name in Create a Bucket for the Data Sink.
Supply the Access Key and Secret Key you identified in Adding a User and Obtaining Keys in the corresponding fields.
Supply the Endpoint for the bucket. The Endpoint is the publicly accessible URL of the domain where your bucket is located. For example, our stellarbucket bucket in wasabi is located in the N. Virginia us-east-1 region, which corresponds to an endpoint of s3.wasabisys.com.

Stellar Cyber uses the endpoint together with the bucket name to construct the full URL to your data sink. For example, the full URL to our stellarbucket bucket is https://s3.wasabisys.com/stellarbucket.

Keep in mind that not all buckets are created with public access enabled by default. Follow your storage vendor's documentation to enable public access.
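
To check the Endpoint and Bucket values before you enter them, you can reproduce the URL from the example above. The following Python sketch is only an illustration of that example: it assumes HTTPS and a path-style URL (endpoint followed by bucket name), and the function name `data_sink_url` is hypothetical rather than anything Stellar Cyber exposes.

```python
def data_sink_url(endpoint: str, bucket: str) -> str:
    """Join an endpoint host and a bucket name into a path-style URL."""
    # Accept an endpoint given with or without a scheme or trailing slash.
    host = endpoint.strip()
    for scheme in ("https://", "http://"):
        if host.startswith(scheme):
            host = host[len(scheme):]
    host = host.rstrip("/")
    return f"https://{host}/{bucket}"


print(data_sink_url("s3.wasabisys.com", "stellarbucket"))
# https://s3.wasabisys.com/stellarbucket
```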
Select the types of data to send to the Data Sink by toggling the following checkboxes:
- Raw Data – Raw data received from sensors, log analysis, and connectors after normalization and enrichment has occurred and before the data is stored in the Data Lake.
- Alerts – Security anomalies identified by Stellar Cyber using machine learning and third-party threat-intelligence feeds, reported in the Alerts interface, and stored in the aella-ser-* index.
- Assets – MAC addresses, IP addresses, and routers identified by Stellar Cyber based on network traffic, log analysis, and imported asset feeds and stored in the aella-assets-* index.
- Users – Users identified by Stellar Cyber based on network traffic and log analysis and stored in the aella-users-* index.
Alerts, assets, and users are also known as derived data because Stellar Cyber extrapolates them from raw data.
Click Next.

At this point, Stellar Cyber attempts to reach the bucket for the data sink using the settings you have specified in the previous steps:
- If Stellar Cyber is not able to reach the bucket, an error message appears and you must check and correct the settings as necessary.
- If the settings are validated successfully, the Advanced (Optional) page appears, as illustrated below:
Stellar Cyber can detect and alert you to the following errors in your data sink configuration:
- Missing required parameters
- Failed to connect to bucket (for example, due to an incorrect region or endpoint URL)
- Incorrect access key
- Incorrect secret key
Specify whether to partition records into files based on their write_time (the default) or timestamp.

Every Interflow record includes both of these fields:
- write_time indicates the time at which the Interflow record was actually created.
- timestamp indicates the time at which the action documented by the Interflow record took place (for example, the start of a session, the time of an update, and so on).
When files are written to the Data Sink they are stored at a path like the following, with separate files for each minute:

In this example, we see the path for November 9, 2021 at 00:23. The records that appear in this file depend on the Partition time by setting, as follows:
- If write_time is enabled, then all records stored under this path would have a write_time value falling into the minute of UTC 2021.11.09 - 00:23.
- If timestamp is enabled, then all records stored under this path would have a timestamp value falling into the minute of UTC 2021.11.09 - 00:23.
In most cases, you will want to use the default of write_time. It tends to result in a more cost-efficient use of resources and is also compatible with future use cases of data backups and cold storage using a data sink as a target.
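
The effect of the Partition time by setting can be sketched in Python as a grouping of records by minute. This is an illustration only: it assumes that `write_time` and `timestamp` hold epoch milliseconds, and the grouping key it builds is a readable label, not the actual path Stellar Cyber writes. The function name `partition_by_minute` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_by_minute(records, field="write_time"):
    """Group records by the UTC minute of the chosen field.

    Assumes each record is a dict whose `write_time` and `timestamp`
    values are epoch milliseconds (an assumption for this sketch).
    """
    if field not in ("write_time", "timestamp"):
        raise ValueError("field must be 'write_time' or 'timestamp'")
    buckets = defaultdict(list)
    for record in records:
        moment = datetime.fromtimestamp(record[field] / 1000, tz=timezone.utc)
        # Label only; the real file path format is not reproduced here.
        buckets[moment.strftime("%Y.%m.%d - %H:%M")].append(record)
    return dict(buckets)


def to_ms(*args):
    """Convert a UTC date and time to epoch milliseconds."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp() * 1000)


# One record whose session started at 00:22 but was written at 00:23.
record = {"timestamp": to_ms(2021, 11, 9, 0, 22, 59),
          "write_time": to_ms(2021, 11, 9, 0, 23, 30)}
print(list(partition_by_minute([record], "write_time")))
print(list(partition_by_minute([record], "timestamp")))
```

The same record lands in a different minute depending on which field drives the partitioning, which is why the choice matters when you later search the sink by time.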
Enable the Compression option to specify that records be written to the Data Sink in compressed (gzip) format.

For most use cases, Stellar Cyber recommends enabling the compression option to save on storage costs. Compression results in file sizes roughly one-tenth the size of uncompressed files.
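
To get a feel for how gzip affects record data, you can compress a sample yourself. The Python sketch below uses synthetic, highly repetitive records with hypothetical field values, so the ratio it prints is not representative of real Interflow data; it only demonstrates the measurement.

```python
import gzip
import json


def compression_ratio(records) -> float:
    """Return compressed size divided by uncompressed size for JSON lines."""
    raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    compressed = gzip.compress(raw)
    return len(compressed) / len(raw)


# Synthetic records with made-up values; real data compresses differently.
sample = [{"msg_class": "example", "srcip": "10.0.0.1", "seq": i}
          for i in range(1000)]
print(f"compressed/uncompressed = {compression_ratio(sample):.2f}")
```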
You can use the Retrieve starting from field to specify a date and time from which Stellar Cyber should attempt to write alert, asset, and user records to a newly created Data Sink. You can click in the field to use a calendar to set the time and date.

Note the following:
- If you do not set this option, Stellar Cyber simply writes data from the time at which the sink is created.
- This option only affects alert, asset, and user records. Raw data is written from the time at which the sink is created regardless of the time/date specified here.
- If you set a time/date earlier than available data, Stellar Cyber silently skips the time without any available records.
Use the Time Format, Batch Window and Batch Size fields to specify how often data is written to the sink. The frequency with which data is written to the sink also affects the size of the files – the longer you wait to send files, the larger they will be.
- Start by setting the Time Format option. This specifies the units for the Batch Window and gives you access to different granularities for the files written to the Data Sink. You can specify Seconds, Minutes, or Hours (the default).
- The Batch Window specifies the maximum amount of time that can elapse before data is written to the Data Sink. By default, this is set to 6 hours. The values available depend on the units selected for Time Format:
  - If Time Format is set to Hours, you can select from 1, 4, 6, 12, or 24 hours. The 24-hour setting is the coarsest granularity available; beyond that, files start to become too large for efficient storage.
  - If Time Format is set to Minutes, you can select from 5, 10, 20, or 30 minutes. As you can see, each of these values divides evenly into an hour, giving you a precise idea of the number of files stored to the sink per hour by each worker. For example, with a setting of 30 minutes, there will be two files per hour for each worker in the sink.
  - If Time Format is set to Seconds, you can specify any value up to 60. This was the granularity supported in Data Sink versions prior to 4.3.7.
  The Batch Window helps you balance granularity with costs when storing data in external cloud storage, where vendors often charge you by the API call.
  For example, writing to a data sink with a fine granularity expressed in seconds may result in excessive files and folders written to your external cloud storage and require you to incur the costs of a more expensive storage tier. By using a coarser granularity, you can ensure that the data files written to the cloud are larger and written less frequently. For example, the default granularity of six hours typically ensures that data files are larger than 128 KB, allowing you to take advantage of, for example, the less costly AWS S3 Intelligent Storage Tier. Contact Customer Success if you are interested in moving your Data Sink to a different storage tier.
  
  Note that after upgrading to 4.3.7, existing AWS and OCI data sinks with a Batch Window greater than 60 seconds are converted to the nearest available selection expressed in minutes or hours.
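
  The files-per-hour arithmetic for the Minutes options described above is simple division. A minimal Python sketch follows; the function name `files_per_hour` is hypothetical, and the result is per worker, as in the text.

```python
def files_per_hour(batch_window_minutes: int) -> int:
    """Return how many files each worker writes per hour."""
    allowed = (5, 10, 20, 30)  # Minutes options listed above
    if batch_window_minutes not in allowed:
        raise ValueError(f"Batch Window must be one of {allowed}")
    return 60 // batch_window_minutes


for minutes in (5, 10, 20, 30):
    print(minutes, files_per_hour(minutes))
```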
- The Batch Size specifies the maximum number of records that can accumulate before they are sent to the Data Sink. You can specify either 0 (disabled) or a number of records between 100 and 10,000.
  
  The Batch Size option is only available when the Batch Window is set to a per-second interval and Size Format is set to Records. This is the batching implementation used in versions prior to 4.3.7. For all other Batch Windows, the Batch Size is set to Unlimited and cannot be changed.
  
  If you use the per-second interval together with the Batch Size option, Stellar Cyber batches data to the Data Sink depending on whichever of these parameters is reached first. Consider a Data Sink with a Batch Window of 30 seconds and a Batch Size of 300 records:
  - If at the end of the Batch Window of 30 seconds, Stellar Cyber has 125 records, it sends them to the data sink. The Batch Window was reached before the Batch Size.
  - If at the end of 10 seconds, Stellar Cyber has 300 records, it sends the 300 records to the Data Sink. The Batch Size was reached before the Batch Window.
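
  The "whichever is reached first" rule in the example above can be sketched as a small decision function. This is an illustration of the rule as described, not Stellar Cyber's implementation, and the name `flush_reason` is hypothetical.

```python
def flush_reason(elapsed_seconds: float, record_count: int,
                 batch_window: int = 30, batch_size: int = 300):
    """Return why a batch would be sent, or None if it keeps filling.

    A batch_size of 0 means the record limit is disabled.
    """
    if batch_size and record_count >= batch_size:
        return "batch_size"
    if elapsed_seconds >= batch_window:
        return "batch_window"
    return None


print(flush_reason(30, 125))  # batch_window
print(flush_reason(10, 300))  # batch_size
print(flush_reason(10, 125))  # None
```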
Note that if you set a Batch Window of greater than 60 seconds, you must grant additional Multipart Upload permissions to the S3 user for this data sink.
You can use the Filter options to Exclude or Include specific Message Classes for the Data Sink. By default, Filter is set to None. If you check either Exclude or Include, an additional Message Class field appears where you can specify the message classes to use as part of the filter. For example:

You can find the available message classes to use as filters by searching your Interflow in the Investigate | Threat Hunting | Interflow Search page. Search for the msg_class field in any index to see the prominent message classes in your data.
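
The Filter options can be thought of as a per-record test on the msg_class field. The Python sketch below assumes that Include sends only the listed message classes and Exclude sends everything except them; the function name `passes_filter` and the message class values in the example are hypothetical.

```python
def passes_filter(record: dict, mode: str = "None", message_classes=()) -> bool:
    """Decide whether a record would be sent under the chosen filter mode."""
    if mode == "None":
        return True
    matched = record.get("msg_class") in set(message_classes)
    if mode == "Include":
        return matched
    if mode == "Exclude":
        return not matched
    raise ValueError("mode must be 'None', 'Include', or 'Exclude'")


record = {"msg_class": "class_a"}  # made-up message class
print(passes_filter(record, "Include", ["class_a"]))  # True
print(passes_filter(record, "Exclude", ["class_a"]))  # False
```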
Click Next to review the Data Sink configuration. Use the Back button to correct any errors you notice.
When you are satisfied with the sink's configuration, click Submit to add it to the DP.

The new Data Sink is added to the list.