Insights and Data Gathering (Nebula)

Hubble can gather incredible amounts of raw data from your hosts for later analysis. The codename for the insights piece of hubble is Nebula. It primarily uses osquery which allows you to query your system as if it were a database.

Module Documentation

Nebula (


Nebula queries are formatted into query groups which allow you to schedule sets of queries to run at different cadences. The names of these groups are arbitrary, but the queries provided in hubblestack_data are grouped by timing:

    query: SELECT t.unix_time AS query_time, AS process, AS process_id, p.pgroup AS process_group, p.cmdline, p.cwd, p.on_disk, p.resident_size AS mem_used, p.user_time, p.system_time, (SELECT strftime('%s','now')-ut.total_seconds+p.start_time FROM uptime AS ut) AS process_start_time, p.parent, AS parent_name, g.groupname AS 'group', g.gid AS group_id, u.username AS user, u.uid AS user_id, eu.username AS effective_username, eg.groupname AS effective_groupname, p.path, h.md5 AS md5, h.sha1 AS sha1, h.sha256 AS sha256, '__JSONIFY__'||(SELECT json_group_array(json_object('fd',pof.fd, 'path',pof.path)) FROM process_open_files AS pof WHERE GROUP BY AS open_files, '__JSONIFY__'||(SELECT json_group_array(json_object('variable_name',pe.key, 'value',pe.value)) FROM process_envs AS pe WHERE GROUP BY AS environment FROM processes AS p LEFT JOIN processes AS pp ON LEFT JOIN users AS u ON p.uid=u.uid LEFT JOIN users AS eu ON p.euid=eu.uid LEFT JOIN groups AS g ON p.gid=g.gid LEFT JOIN groups AS eg ON p.gid=eg.gid LEFT JOIN hash AS h ON p.path=h.path LEFT JOIN time AS t WHERE p.parent IS NOT 2 AND (process NOTNULL OR p.parent NOTNULL);
    query: SELECT t.unix_time AS query_time, CASE WHEN 2 THEN 'ipv4' WHEN 10 THEN 'ipv6' ELSE END AS family, h.md5 AS md5, h.sha1 AS sha1, h.sha256 AS sha256, AS directory, ltrim(pos.local_address, ':f') AS src_connection_ip, pos.local_port AS src_connection_port, pos.remote_port AS dest_connection_port, ltrim(pos.remote_address, ':f') AS dest_connection_ip, AS name, AS pid, p.parent AS parent_pid, AS parent_process, p.path AS file_path, f.size AS file_size, p.cmdline AS cmdline, u.uid AS uid, u.username AS username, CASE pos.protocol WHEN 6 THEN 'tcp' WHEN 17 THEN 'udp' ELSE pos.protocol END AS transport FROM process_open_sockets AS pos JOIN processes AS p ON LEFT JOIN processes AS pp ON LEFT JOIN users AS u ON p.uid=u.uid LEFT JOIN time AS t LEFT JOIN hash AS h ON h.path=p.path LEFT JOIN file AS f ON f.path=p.path WHERE NOT pos.remote_address='' AND NOT pos.remote_address='::' AND NOT pos.remote_address='' AND NOT pos.remote_address='' AND (pos.local_port,pos.protocol) NOT IN (SELECT lp.port, lp.protocol FROM listening_ports AS lp);
    query: SELECT t.unix_time AS query_time, c.event, c.minute, c.hour, c.day_of_month, c.month, c.day_of_week, c.command, c.path AS cron_file FROM crontab AS c JOIN time AS t;
    query: SELECT t.unix_time AS query_time, l.username AS user, l.tty,, l.type AS utmp_type, CASE l.type WHEN 1 THEN 'RUN_LVL' WHEN 2 THEN 'BOOT_TIME' WHEN 3 THEN 'NEW_TIME' WHEN 4 THEN 'OLD_TIME' WHEN 5 THEN 'INIT_PROCESS' WHEN 6 THEN 'LOGIN_PROCESS' WHEN 7 THEN 'USER_PROCESS' WHEN 8 THEN 'DEAD_PROCESS' ELSE l.type END AS utmp_type_name, AS src, l.time FROM last AS l LEFT JOIN time AS t WHERE l.time > strftime('%s','now') - 3600;
    query: SELECT t.unix_time AS query_time, AS container_id, AS container_name, dc.image AS image_name, di.created AS image_created_time, di.size_bytes AS image_size, di.tags AS image_tags, dc.image_id AS image_id, dc.command AS container_command, dc.created AS container_start_time, dc.state AS container_state, dc.status AS status, '__JSONIFY__'||(SELECT json_group_array(json_object('key',dcl.key, 'value',dcl.value)) FROM docker_container_labels AS dcl WHERE GROUP BY AS container_labels, '__JSONIFY__'||(SELECT json_group_array(json_object('mount_type',dcm.type, 'mount_name',, 'mount_host_path',dcm.source,  'mount_container_path',dcm.destination, 'mount_driver',dcm.driver, 'mount_mode',dcm.mode, 'mount_rw',, 'mount_progpagation',dcm.propagation)) FROM docker_container_mounts AS dcm WHERE GROUP BY AS container_mounts, '__JSONIFY__'||(SELECT json_group_array(json_object('port_type',dcport.type, 'port',dcport.port, 'host_ip',dcport.host_ip,  'host_port',dcport.host_port)) FROM docker_container_ports AS dcport WHERE GROUP BY AS container_ports, '__JSONIFY__'||(SELECT json_group_array(json_object('network_name',, 'network_id',dcnet.network_id, 'endpoint_id',dcnet.endpoint_id, 'gateway',dcnet.gateway, 'container_ip',dcnet.ip_address, 'container_ip_prefix_len',dcnet.ip_prefix_len, 'ipv6_gateway',dcnet.ipv6_gateway, 'container_ipv6_address',dcnet.ipv6_address, 'container_ipv6_prefix_len',dcnet.ipv6_prefix_len, 'container_mac_address',dcnet.mac_address)) FROM docker_container_networks AS dcnet WHERE GROUP BY AS container_networks FROM docker_containers AS dc JOIN docker_images AS di ON LEFT JOIN time AS t;
    query: SELECT t.unix_time AS query_time,, rpm.version, rpm.release, rpm.source AS package_source, rpm.size, rpm.sha1, rpm.arch FROM rpm_packages AS rpm JOIN time AS t;
    query: SELECT t.unix_time AS query_time, os.* FROM os_version AS os LEFT JOIN time AS t;
    query: SELECT t.unix_time AS query_time, ia.interface, ia.address, id.mac FROM interface_addresses AS ia JOIN interface_details AS id ON ia.interface=id.interface LEFT JOIN time AS t WHERE NOT ia.interface='lo';

Nebula query data is very verbose, and is really meant to be sent to a central aggregation location such as splunk or logstash for further processing. However, if you would like to run the queries manually you can call the nebula execution module:

hubble nebula.queries day
hubble nebula.queries hour verbose=True


Nebula supports organizing query groups across files, and combining/targeting them via a top.nebula file (similar to topfiles in SaltStack):

  - '*':
      - hubblestack_nebula_queries
  - 'G@splunk_index:some_team':
      - some_team

Each entry under nebula is a SaltStack style compound match that describes which hosts should receive the list of queries. All queries are merged, and conflicts go to the last-defined file.

The files referenced are relative to salt://hubblestack_nebula_v2/ and leave off the .yaml extension.

You can also specify an alternate top.nebula file.

For more details, see the module documentation: Nebula (