Hunting with Jupyter Notebooks


This documentation describes the tools and workflows that enable threat hunting operations using Jupyter Notebooks.

Secureworks uses Jupyter Notebooks to automate threat hunting procedures. Threat hunting procedures implemented as Jupyter Notebooks can:

Jupyter Notebook Integration

Secureworks released several open-source projects to integrate Taegis™ and Jupyter notebooks. These tools drive our threat hunting workflows:

| Tool | Purpose |
| --- | --- |
| Taegis™ SDK for Python | Low-level Python client to call Taegis™ GraphQL APIs |
| Taegis™ Magic | Convenience utilities to interact with Taegis™ from the command line and Jupyter Notebooks |

Taegis™ SDK for Python

The Taegis™ SDK for Python is the official Python client to call Taegis™ GraphQL APIs.

The Taegis™ SDK for Python is generated from the GraphQL schema. GraphQL operations and complex types are represented as Python dataclasses. The static analysis features of Jupyter Notebooks, such as tab completion, integrate natively with dataclasses, making it easy to browse available operations and construct input arguments from the notebook.

Note

While a deep dive into the fundamentals of GraphQL is outside the scope of this document, it is helpful to understand how GraphQL operations and types map to Python objects in the Taegis™ SDK for Python:

  • Each Taegis™ API exposes GraphQL operations that an authorized client is allowed to call. These operations include queries (read), mutations (write), and subscriptions (a protocol for bidirectional communication between the client and server over WebSockets).
  • GraphQL allows API developers to define types, which are data structures that are used as both arguments to, and return values from, operations. GraphQL queries allow the client to request specific fields on the return type; this improves performance and reduces the number of round-trip API calls required for the client.
  • GraphQL APIs are self-describing. Metadata about all of the available GraphQL operations, types, and fields is defined in a schema and is exposed via introspection queries.

In the Taegis™ SDK for Python, each service module represents a logical grouping of GraphQL types and operations based on functionality. For example, the alerts service is a Python module that contains the queries, mutations, subscriptions, and types related to Taegis™ XDR alerts.
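To make the mapping between GraphQL types and Python dataclasses more concrete, the following is a minimal sketch that uses only the standard library. `AlertSearchInput`, its fields, and `to_variables` are hypothetical stand-ins invented for illustration; they are not the classes generated by the Taegis™ SDK for Python. The sketch shows how a dataclass can model a GraphQL input type and be serialized into a variables dictionary.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class AlertSearchInput:
    """Hypothetical stand-in for a code-generated GraphQL input type."""

    query: str
    limit: int = 100
    offset: int = 0
    tenant_id: Optional[str] = None


def to_variables(search_input: AlertSearchInput) -> dict:
    """Serialize the dataclass into GraphQL variables, dropping unset fields."""
    return {k: v for k, v in asdict(search_input).items() if v is not None}


# Field names are discoverable through tab completion because they are
# ordinary dataclass attributes.
search_input = AlertSearchInput(query="severity >= 0.6", limit=10)
variables = to_variables(search_input)
print(variables)
```

Because the fields are declared as typed attributes, editor and notebook completion can list them without calling the API.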

See the Taegis™ SDK for Python documentation for further details.

Taegis™ Magic Jupyter Integration

Taegis™ Magic Jupyter Integration provides a convenience layer to interact with Taegis™ APIs on the command line and from Jupyter Notebooks. Taegis™ Magic alleviates the need for notebook users to implement boilerplate logic, such as pagination, when trying to perform routine tasks. Taegis™ Magic is implemented as an IPython Magic, which is a special kind of macro supported in Jupyter Notebooks.
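As an illustration of the kind of boilerplate that a convenience layer can hide, the following sketch shows a generic offset-based pagination loop. It is not how Taegis™ Magic is implemented; `fetch_page`, `paginate`, and the in-memory data are hypothetical stand-ins for an API call.

```python
from typing import Callable, Iterator, List

# Hypothetical in-memory data standing in for an API backend.
_FAKE_ALERTS = [{"id": f"alert-{i}"} for i in range(25)]


def fetch_page(offset: int, limit: int) -> List[dict]:
    """Hypothetical page fetcher; a real one would call an API."""
    return _FAKE_ALERTS[offset : offset + limit]


def paginate(
    fetch: Callable[[int, int], List[dict]], page_size: int = 10
) -> Iterator[dict]:
    """Yield every record by requesting pages until a short page is returned."""
    offset = 0
    while True:
        page = fetch(offset, page_size)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size


all_alerts = list(paginate(fetch_page))
print(len(all_alerts))
```

Without a helper like this, every notebook that needs a complete result set would have to repeat the same loop.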

When Taegis™ Magic is used as a standalone command-line tool, results are returned to the console as JSON. When invoked from a Jupyter Notebook, results are returned as pandas DataFrames. A pandas DataFrame is a powerful two-dimensional data structure for manipulating and analyzing data in Python. Returning Taegis™ query results as pandas DataFrames allows hunters to quickly dissect the data to find evidence of a threat.
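The following sketch suggests how a hunter might dissect results once they are in a pandas DataFrame. The records and column names are illustrative assumptions, not the shape of real Taegis™ query results.

```python
import pandas as pd

# Illustrative records only; real query results would have different fields.
records = [
    {"host": "ws-01", "process": "powershell.exe", "severity": 0.8},
    {"host": "ws-01", "process": "cmd.exe", "severity": 0.3},
    {"host": "ws-02", "process": "powershell.exe", "severity": 0.9},
    {"host": "ws-03", "process": "explorer.exe", "severity": 0.1},
]
df = pd.DataFrame(records)

# Keep only the higher-severity rows.
high = df[df["severity"] >= 0.6]

# Count the high-severity rows per host.
per_host = high.groupby("host").size()

print(high)
print(per_host)
```

Filtering with a boolean mask and grouping by a column are two of the most common first steps when narrowing a large result set down to a few hosts of interest.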

Taegis™ Magic can automatically construct XDR investigations while using a Jupyter Notebook. XDR investigations are a central part of the XDR analysis workflow and data model. They allow analysts or automated systems to organize relevant security information, such as events, alerts, assets, search queries, and key findings. XDR investigations are the primary medium to communicate findings and collaborate towards the resolution of a security incident. We leverage Jupyter Notebooks' native support for rich media output, such as markdown and HTML, to populate the key findings section of XDR investigations. The Taegis™ Magic Jupyter Integration allows notebook users to stage evidence in an XDR investigation based on resource identifiers found inside pandas DataFrames.

See the Taegis™ Magic Jupyter Integration documentation for further details.

Hunting Workflows

The following sections describe how we use Jupyter Notebooks to support common threat hunting workflows.

Exploratory Data Analysis (EDA)

EDA Hunting

Exploratory analysis is an ad hoc workflow in which threat hunters rapidly iterate to test their hypotheses about how to find evidence of a threat in the available data. Threat hunters use Taegis™ Magic from Jupyter Notebooks to query for relevant data, then analyze the results as pandas DataFrames. Relevant information can then be escalated as an XDR investigation.

  1. Load Taegis™ Magics notebook extension.
  2. Query for data.
  3. Analyze results.
  4. Stage evidence for an investigation.
  5. Create investigation.
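The later steps of this workflow can be sketched in plain Python. This is only an outline under stated assumptions: the rows, the `-enc` hypothesis, and the `investigation` dictionary are hypothetical, and the sketch does not call Taegis™ Magic or any Taegis™ API.

```python
# Step 2 (stand-in): rows that a query might have returned.
rows = [
    {"id": "event-1", "type": "event", "command": "whoami"},
    {"id": "event-2", "type": "event", "command": "powershell -enc AAAA"},
    {"id": "alert-7", "type": "alert", "command": "powershell -enc BBBB"},
]

# Step 3: analyze, here with a simple hypothesis about encoded commands.
suspicious = [row for row in rows if "-enc" in row["command"]]

# Step 4: stage evidence by grouping resource identifiers by resource type.
staged: dict = {}
for row in suspicious:
    staged.setdefault(row["type"], []).append(row["id"])

# Step 5 (stand-in): a hypothetical payload describing the investigation.
investigation = {
    "title": "Encoded PowerShell commands",
    "evidence": staged,
}
print(investigation)
```

Only the resource identifiers are carried forward, which mirrors the idea of staging evidence from identifiers found in query results.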

Automating Hunting Procedures

Create Notebook

If a threat hunter identifies a useful way to find evidence of some threat from the EDA workflow, then they can formalize the steps taken as a threat hunting procedure. In this context, a threat hunting procedure is a Jupyter Notebook that is intended to be reused and shared with other analysts.

Unlike a notebook used for free-form EDA, a threat hunting procedure notebook should contain canned language in markdown to help future threat hunters and customers understand the methodology. It should also include additional metadata to help organize the procedure in a catalog of hunting procedures managed under version control. See the Metadata section below for more information about suggested notebook metadata for threat hunting purposes.

Once a threat hunting procedure has been formalized in a Jupyter Notebook and committed to version control, other analysts can instantiate a copy of that notebook to repeat the procedure.

  1. Create new notebook.
  2. Add notebook metadata to catalog hunting procedure.
  3. Add markdown cells to create desired report structure.
  4. Add code cells to fetch and analyze relevant evidence.
  5. Add code cells to automate investigation creation.
  6. Commit notebook under version control.
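As a rough sketch of steps 1 through 5, a notebook skeleton can be assembled as a plain dictionary that follows the version 4 notebook format and then serialized with the standard library. The `hunting` metadata key, the procedure title, and the helper functions are hypothetical choices made for this example.

```python
import json


def markdown_cell(text: str) -> dict:
    """Build a minimal markdown cell."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source: str) -> dict:
    """Build a minimal, not-yet-executed code cell."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {
            "name": "python3",
            "display_name": "Python 3",
            "language": "python",
        },
        # Hypothetical custom key used to catalog the procedure.
        "hunting": {"title": "Example procedure"},
    },
    "cells": [
        markdown_cell("# Example procedure"),
        code_cell("# fetch and analyze relevant evidence here"),
        code_cell("# automate investigation creation here"),
    ],
}

serialized = json.dumps(notebook, indent=1)
print(serialized[:80])
```

Writing `serialized` to a file with an `.ipynb` extension would produce a notebook that can then be committed under version control.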

Hunting at Scale

Hunting Fan Out

We treat threat hunting procedure notebooks as discrete units of work, usually written to operate on a single tenant. However, we can run notebooks at scale across many tenants by taking a fan-out/fan-in approach.

We run a hunting procedure across many individual tenants, then review the results in aggregate. Taegis™ Magic enables hunting at scale through its ability to cache query results. Once the threat hunting procedure notebook has finished executing across the tenants in scope, threat hunters can parse the cached query results. For tenants that have no evidence related to the threat, we can automatically create a null findings investigation to show our work even though no threats were identified. Tenants that have interesting search results related to the threat are assigned to a threat hunter for manual review. The threat hunter can simply open the already-executed copy of the notebook containing the cached query results.

  1. Select target tenants.
  2. Select hunting notebook.
  3. Execute notebook in parallel across tenants.
  4. Review results in aggregate.
  5. Assign to threat hunters for human analysis and/or create null findings investigations.
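The fan-out/fan-in pattern in these steps can be sketched with the standard library's thread pool. `run_procedure` is a hypothetical stand-in for executing one notebook against one tenant, and the tenant names and hit counts are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_procedure(tenant_id: str) -> dict:
    """Hypothetical stand-in for executing one notebook against one tenant."""
    # Pretend that only tenants whose ID ends in an even digit have results.
    hits = 3 if int(tenant_id[-1]) % 2 == 0 else 0
    return {"tenant_id": tenant_id, "hits": hits}


tenants = ["tenant-1", "tenant-2", "tenant-3", "tenant-4"]

# Fan out: run the procedure for each tenant in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_procedure, tenants))

# Fan in: split tenants into manual review and null findings.
needs_review = [r["tenant_id"] for r in results if r["hits"] > 0]
null_findings = [r["tenant_id"] for r in results if r["hits"] == 0]
print(needs_review, null_findings)
```

`pool.map` returns results in the same order as the input tenants, which keeps the aggregate review deterministic.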

Managing Hunting Procedures

Version Control

We recommend storing threat hunting procedure notebooks under version control, such as Git. Version control systems provide several benefits for threat hunting procedure notebooks, such as allowing multiple hunters to share and collaborate on content and tracking the change history of a notebook over time. Many popular hosted version control systems, such as GitHub and GitLab, also provide essential workflow tools in the form of issue tracking and peer review mechanisms.

Jupyter Notebooks can be challenging to store under version control because they are JSON documents, which are difficult to compare line-by-line. Some popular hosted version control systems have special support for notebook files to address these inconveniences. Alternatively, you can use tools like jupytext to convert notebooks into a structured markdown document format. We use jupytext to store all threat hunting notebooks in MyST markdown format.

Linting

Linting is a form of static analysis that reads source code and identifies potential problems, such as syntax and formatting errors. In the context of threat hunting, linting is extremely useful to ensure that threat hunting procedure notebooks contain the necessary metadata, markdown content, and code content. We run pytest in a CI job to enforce conventions and quality standards across the threat hunting notebooks in a given project.

In our experience, different threat hunting teams have unique needs and preferences, so a one-size-fits-all approach will likely meet resistance. It is helpful to write general purpose pytest fixtures to read notebooks, then ask each hunting team to implement their own linting rules.
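One way to structure this split between shared plumbing and team-specific rules is sketched below, using only the standard library. `load_notebook` plays the role that a general-purpose pytest fixture could fill, and `lint_notebook` represents one team's rules. Both function names and the `hunting` metadata key are hypothetical.

```python
import json
from typing import List


def load_notebook(text: str) -> dict:
    """Parse notebook JSON; a pytest fixture could wrap this to read files."""
    return json.loads(text)


def lint_notebook(nb: dict, required_keys: List[str]) -> List[str]:
    """Return human-readable problems; an empty list means the notebook passes."""
    problems = []
    for key in required_keys:
        if key not in nb.get("metadata", {}):
            problems.append(f"missing metadata key: {key}")
    cell_types = {cell["cell_type"] for cell in nb.get("cells", [])}
    for expected in ("markdown", "code"):
        if expected not in cell_types:
            problems.append(f"no {expected} cells found")
    return problems


sample = json.dumps(
    {
        "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
        "cells": [{"cell_type": "markdown", "metadata": {}, "source": "# Title"}],
    }
)
problems = lint_notebook(load_notebook(sample), required_keys=["kernelspec", "hunting"])
print(problems)
```

In a pytest suite, a test function could simply assert that the returned list is empty, so each failure message names the rule that was violated.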

Metadata

Jupyter Notebooks contain multiple kinds of metadata. The official Jupyter Notebook specification defines several required metadata fields, which determine the behavior of the notebook and how it is rendered in the browser. Jupyter Notebooks also support arbitrary JSON objects in the metadata for custom use cases.

In the context of threat hunting procedures, we use Jupyter Notebook metadata:

Notebook-Level Metadata

Required Notebook Metadata

All Jupyter Notebooks must contain kernelspec metadata. This metadata tells Jupyter which kernel to use when evaluating code cells. Kernels correspond to a programming language, such as Python, and a runtime environment where necessary dependencies are located.

Note

We recommend curating a shared Jupyter kernel for threat hunting purposes. This kernel should include taegis-sdk-python, taegis-magic, pandas, and any other Python packages commonly used in threat hunting procedures.

Hunting-Specific Notebook Metadata

Beyond the required metadata, we recommend the following notebook-level metadata for each threat hunting procedure notebook:

This information is useful when threat hunters need to search through a repository containing many threat hunting procedures.

Cell-Level Metadata

Each cell in a Jupyter notebook has its own metadata. The notebook interface is typically responsible for setting the required metadata fields. Cell metadata is used to determine how the browser should render the cell.

The kind of cell metadata most relevant to users is cell tags. Tags are a JSON array of arbitrary string labels. The notebook web interface lets users conveniently add and remove cell tags.

In the context of threat hunting, there are two categories of custom tags that we recommend using:

| Cell Tag | Purpose |
| --- | --- |
| remove_input | Removes the code input from the cell in the export |
| remove_output | Removes the code output from the cell in the export |
| remove_cell | Removes both inputs and outputs from the export |

Info

The remove_cell tag is commonly used to display guidance that is only intended for the threat hunter and not the customer. This guidance is removed when exporting the notebook as the key findings section of an XDR investigation. The following is an example of using the nbconvert CLI to export a notebook without any cells tagged with remove_cell:

jupyter nbconvert --to markdown --no-input my-hunting-notebook.ipynb --TagRemovePreprocessor.enabled=True --TagRemovePreprocessor.remove_cell_tags remove_cell
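To show what these tags mean at the notebook JSON level, the following sketch approximates their effect with the standard library. It is not nbconvert's implementation; `apply_export_tags` is a hypothetical helper, and it blanks tagged inputs and outputs on a copy of the notebook rather than hiding them at render time.

```python
import copy


def apply_export_tags(nb: dict) -> dict:
    """Return a copy of the notebook with tagged content removed."""
    exported = copy.deepcopy(nb)
    kept = []
    for cell in exported["cells"]:
        tags = cell.get("metadata", {}).get("tags", [])
        if "remove_cell" in tags:
            continue  # drop the whole cell
        if cell["cell_type"] == "code":
            if "remove_input" in tags:
                cell["source"] = ""
            if "remove_output" in tags:
                cell["outputs"] = []
        kept.append(cell)
    exported["cells"] = kept
    return exported


notebook = {
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {"tags": ["remove_cell"]},
            "source": "Analyst-only guidance",
        },
        {
            "cell_type": "code",
            "metadata": {"tags": ["remove_input"]},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": "3 rows"}],
            "source": "print(len(df), 'rows')",
        },
        {"cell_type": "markdown", "metadata": {}, "source": "## Findings"},
    ]
}
exported = apply_export_tags(notebook)
print([cell["source"] for cell in exported["cells"]])
```

Working on a deep copy leaves the original notebook untouched, so the analyst-only guidance stays available to the threat hunter.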

Markdown Cells

Jupyter Notebooks natively support rendering HTML and markdown text. This feature allows notebooks to mix human-readable language with programmatically executable code.

In the context of threat hunting notebooks, the markdown cells are used for two important purposes:

The markdown content of a hunting notebook can be exported and used to populate the key findings section of an XDR investigation. Therefore, we include canned language that describes the what, why, and how of the threat hunting procedure as markdown text. We generally structure hunting notebooks into the following sections using markdown headings:

| Section | Purpose |
| --- | --- |
| Title | Identifies the threat hunting notebook or procedure |
| Executive Summary | High-level description of the threat and the performed hunting procedures |
| Technical Analysis | Technical description of the performed hunting procedures |
| Findings | Written by analysts upon completing hunting procedures |
| Recommendations | Suggestions for remediation guidance, prevention, etc. |

The structure and content of the markdown text should be adapted for the specific procedure and hunting use case.
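A team that adopts this section structure could check it automatically. The sketch below extracts markdown headings from a notebook dictionary and reports which of the named sections are absent. The function names and the sample notebook are hypothetical, and the expected list should be adapted to each procedure.

```python
import re
from typing import List

EXPECTED_SECTIONS = [
    "Executive Summary",
    "Technical Analysis",
    "Findings",
    "Recommendations",
]


def extract_headings(nb: dict) -> List[str]:
    """Collect the text of every markdown heading in notebook order."""
    headings = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "markdown":
            continue
        source = cell["source"]
        # Notebook sources may be stored as a string or a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        for line in text.splitlines():
            match = re.match(r"^#{1,6}\s+(.*\S)\s*$", line)
            if match:
                headings.append(match.group(1))
    return headings


def missing_sections(nb: dict) -> List[str]:
    """List the expected sections that have no matching heading."""
    found = extract_headings(nb)
    return [name for name in EXPECTED_SECTIONS if name not in found]


sample = {
    "cells": [
        {"cell_type": "markdown", "source": "# My Hunt\n\n## Executive Summary"},
        {"cell_type": "code", "source": "# not a heading"},
        {"cell_type": "markdown", "source": ["## Findings\n", "None yet."]},
    ]
}
print(missing_sections(sample))
```

Code cells are skipped on purpose, so Python comments that begin with `#` are never mistaken for headings.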

For more information about XDR Investigation best practices, see our Knowledge Base article series.

Code Cells

Jupyter Notebooks can evaluate and execute code for many interpreted programming languages, the most popular of which is Python.

In the context of threat hunting notebooks, the code cells contain the automation for threat hunting procedures. These procedures typically involve:

 
