Real-time Reporting

Overview

The Real-Time Reporting Service enables data analysis of data collected from multiple sources over time. The service provides:

  • A means to retrieve metadata about existing data tables.
  • A means to create, update, and delete data tables.
  • A query mechanism for event data that allows users to discover and interpret meaningful patterns in the data.

The query mechanism excels at aggregating time-based data, and allows for multi-dimensional operations such as limiting, grouping, and filtering. Depending on the data that you have submitted to the system, you might use the Real-Time Reporting Service's query functionality to determine, during a specified time period:

  • The two most popular browsers used by a particular type of user
  • The most frequently-viewed product categories grouped by day for the last week
  • The total number of successful checkout events in the last month
  • The total number of site visitors for each hour during the previous day

NOTE: Data is automatically ingested into the Real-Time Reporting system via the Profile Edge service. If you create a table whose name and type are both 'clickstream' or are both 'commerce', relevant events submitted to the Profile Edge service will be ingested into this table.

NOTE: It is strongly recommended that PII (Personally Identifiable Information) be excluded from data submitted to the Real-Time Reporting Service.


API Reference

/{tenant}

/{tenant}/tables

get

Returns a description of each table defined for the tenant. Requires a hybris.realtime_reporting_view scope.

/{tenant}/tables/{tableId}

A table represents a logical grouping of dimension columns as well as an input specification.

get

Retrieves the table definition (metadata). Requires a hybris.realtime_reporting_view scope.

put

Creates or replaces a table definition, which defines table dimension columns and parameters for dimension data. If replacement excludes a pre-existing dimension column, the associated data will be maintained but will not be accessible through subsequent queries. Requires a hybris.realtime_reporting_manage scope.

/{tenant}/tables/{tableId}/query

post

Performs the provided query, targeting the specified tenant and table, and returns a JSON representation of the query results. Requires a hybris.realtime_reporting_view scope.

/{tenant}/tables/{tableId}/events

patch

Posts a request to edit or delete data from a specified table. Such modification requests are executed periodically. Requires a hybris.realtime_reporting_manage scope.


Data Ingestion

The Real-time Reporting Service can automatically ingest event data submitted to the Edge Service. This includes the following event types:

  • Clickstream and page-load data representing user interaction with your web interface.
  • Commerce data about customer transactions.

Event data is ingested by the Real-time Reporting Service if a Real-time Reporting table associated with that data has been created.

Transformation of JSON Values to Target Column Type

In order to validate input data and guarantee ingestion, it is strongly recommended that event properties destined for numeric columns have an appropriate numeric JSON type. Properties targeting 'Long'-type columns should be expressed as an integral number in the input JSON; properties targeting 'Float'-type columns should be expressed as a JSON 'number' type.

The Real-time Reporting Service currently offers implicit conversion of String values to numeric columns, as long as the String presentation can be parsed into the target type. For instance, the String "1234.67" cannot be parsed into a Long column, because it represents a decimal value.

At this time, the service will strip any commas from Strings targeting numeric columns prior to attempting a conversion. The period symbol '.' will be treated as decimal point. This implicit conversion behavior may change in future releases and should be considered "experimental".

Messages containing String properties that cannot be successfully parsed will be rejected entirely at ingestion time.


Table Queries

The Real-time Reporting Service provides a rich query language to aggregate ingested data in a wide variety of ways.

As we explore the individual aspects of the query language, we will use some sample data to show how different elements of a query affect the data returned.

Example

The following represents a sample dataset ingested into the Real-time Reporting Service:

timestampidbrowserreferrer_domaincategory
2017-04-01T15:00:00Zevent1chromegoogle.comshoes
2017-04-01T15:01:00Zevent2firefoxfacebook.comelectronics
2017-04-01T15:03:00Zevent3chromegoogle.comelectronics
2017-04-02T11:03:00Zevent4unknowncnn.comshoes
2017-04-02T11:05:00Zevent5unknowncnn.comfood
2017-04-02T11:07:00Zevent6safaricnn.comfood
2017-04-02T11:09:00Zevent7unknowncnn.comelectronics
2017-04-02T11:11:00Zevent8nullcnn.comshoes

The following is an example query for this data that performs the following aggregation:

  • Retrieves data within an interval between April 1 (inclusive) and April 3, 2017 (exclusive)
  • Counts the number of page views per browser
  • Orders results by page views, from highest to lowest
  • Excludes results wherein the value of the "browser" column is null
  • Limits results to four records maximum
{
    "interval": "2017-04-01/2017-04-03",
    "granularity": "all",
    "dimensions": ["browser"],
    "aggregations": [
        {"type": "count", "name": "page_views"}
    ],
    "order": [
        {"column": "page_views", "direction": "descending"}
    ],
    "limit": 4,
    "filter": { "type": "notequal", "column": "browser", "value": null }
}

The above query produces the following results:

[
  {
    "page_views": 3,
    "browser": "unknown"
  },
  {
    "page_views": 2,
    "browser": "chrome"
  },
  {
    "page_views": 1,
    "browser": "firefox"
  },
  {
    "page_views": 1,
    "browser": "safari"
  }
]

Parameters

This section describes in detail the seven elements, or parameters, used in a Table Query.

Interval

REQUIRED An interval represents the time period to target for a query. Only events that fall within the provided interval are considered in the query. The interval must be a string representing an ISO-8601 interval. Each of the following examples represents an interval of two days, starting on April 1st, 2017:

  • 2017-04-01T00:00:00Z/2017-04-03T00:00:00Z
  • 2017-04-01/2017-04-03
  • 2017-04-01/P2D

NOTE that if an interval specifies a start and end date (as in the first two examples above), then the first date is inclusive and the second date is exclusive.

Read more about ISO-8601 time intervals here.

Granularity

REQUIRED A granularity represents the timeframe into which events are grouped. For example: A granularity of hour will return data grouped by hour. The granularity must be one of the following strings:

  • all
  • hour
  • day
  • week
  • month

Note that a value of all yields no timeframe grouping.

Dimensions

The dimensions element defines returned data groups by dimension. The dimensions element must be represented as a comma-delimited array of strings referencing one or more dimensions. If no dimensions are specified in a query, then no grouping occurs and the raw events are returned.

In our example query we specify grouping by a single browser dimension:

{
    ...
    "dimensions": ["browser"],
    ...
}

You can also group data using more than one dimension. The example below groups results in unique combinations of browser and device_type:

{
    ...
    "dimensions": ["browser", "device_type"],
    ...
}

Limit

A limit is an integer that defines the maximum number of records the query will return. The maximum number of return values allowed by the service is 10,000.

In our example query we specify a limit of 4:

{
    ...
    "limit": 4,
    ...
}

Aggregations

The aggregations element is a comma-delimited array of objects that defines one or more operations to be applied to retrieved data. Each such object contains a type, name, and optional column field.

  • name : An arbitrary label for the aggregation which can be referenced in the order element.
  • column : Identifies the data column to which aggregation is applied.
  • type : Defines the type of aggregation to be applied, and can be one of the following values:
    • count : Yields a count of value instances.
    • count_distinct : Yields an estimated count of unique instances of a value.
    • max : Yields the maximum value, according to the natural ordering of the data type.
    • mean : Yields the mean value, according to the natural ordering of the data type.
    • min : Yields the minimum value, according to the natural ordering of the data type.
    • sum : Yields the sum of data values.
    • stddev: Yields the standard deviation of data values.
    • variance: Yields the variance of data values.

In our example query we request row counts, and define the name page_views to label such counts:

{
    ...
    "aggregations": [ {"type": "count", "name": "page_views"} ]
    ...
}

You can specify multiple aggregations. The following returns row counts as page_views, and returns counts of distinct referrer_domain values as distinct_referrer_domains:

{
    ...
    "aggregations": [
        { "type": "count", "name": "page_views" },
        { "type": "count_distinct", "name": "distinct_referrer_domains", "column": "referrer_domain"},
    ]
    ...
}

NOTE: A query with no "aggregations" element retrieves individual records, and thus may yield a very large result set and reduced performance.

Order

An order defines the order of returned results. The order is represented as a comma-delimited array of objects; multiple objects yield hierarchical ordering. Each such object must contain a column and direction field.

  • column : Identifies the data column to which ordering is applied, and can reference a dimension or an aggregation name.
  • direction : Defines the sort order of the returned data, using one of the following values:
    • ascending
    • descending

In our example query we specify that we want the results to be returned in descending order of page_views:

{
    ...
    "order": [
        {"column": "page_views", "direction": "descending"}
    ]
    ...
}

Here the order element references a dimension (referrer_domain), and aggregation names (page_views, distinct_referrer_domains):

{
    ...
    "dimensions": [ "referrer_domain", "session_id"],
    "aggregations": [
        { "type": "count_distinct", "name": "distinct_referrer_domains", "column": "referrer_domain"}
    ],
    "order": [
        {"column": "referrer_domain", "direction": "descending"},
        {"column": "page_views", "direction": "descending"},
        {"column": "distinct_referrer_domains", "direction": "descending"}
    ]
    ...
}

Filter

A filter includes or excludes matched data from the retrieved set by applying rules represented as objects. Each rules object must contain a type, column, and value field.

  • type : Defines a condition to be met; valid values are 'equal' and 'notequal'.
  • column : Identifies the data column to which the rule applies.
  • value : Defines the data value targeted by the rule.

In our example query we specify a single filter that includes only data whose browser column value is not null:

{
    ...
    "filter": { "type": "notequal", "column": "browser", "value": null }
    ...
}

Rules, and sets of rules, optionally can be concatenated with the conjunctions 'and' and 'or' to create complex logic. The following returns data wherein: browser is not null and category is not "shoes":

{
    ...
    "filter": {
        "and": [
          { "type": "notequal", "column": "browser", "value": null },
          { "type": "notequal", "column": "category", "value": "shoes" }
        ]
    }
    ...
}

The following returns data wherein: (browser is not null) and (category is "shoes" or category is "food"):

{
    ...
     "filter": {
        "and": [
          { "type": "notequal", "column": "browser", "value": null },
          { "or": [
            { "type": "equal", "column": "category", "value": "shoes" },
            { "type": "equal", "column": "category", "value": "food" }
          ]}
        ]
    }
    ...
}


Table Metadata Retrieval

The Real-time Reporting Service enables retrieval of metadata about existing data tables. Table metadata comprises:

  • name: The required value is either 'clickstream' or 'commerce'.
  • description
  • ingestionType: The required value is either 'clickstream' or 'commerce'.
  • columns: A List of objects representing columns, each comprising:
    • name
    • description
    • type: Valid column types are 'string', 'long', 'float', and 'timestamp'. Exactly one 'timestamp' column must exist.
    • path: An XPath representing the path to the column in the input JSON data. For example, '$.x.y' indicates that this column is associated with a 'y' element in the JSON that is embedded in an 'x' element. Note that the column name can differ from the path location: A column named 'color' can be associated with an element named 'y' in the JSON.
    • collection: 'false' if the column holds a single data value, and 'true' if it holds an array of values.


Table Creation and Replacement

The Real-time Reporting Service enables creation and replacement of data tables.

Note that the value of the 'tableId' URI path of the PUT request will become the table name. Once created, the table name cannot be altered.

The table schema defines table columns and parameters for column data. The basic schema for a 'clickstream' table is as follows:

{
  "description":"Profile clickstream events",
  "ingestionType":"clickstream",
  "columns": [
    {
      "name": "action_name",
      "type": "string",
      "description": "Identifier for the type of event; e.g. 'CartUpdate', 'ProductDetailPageViewEvent'"
    },
    {
      "name": "cookies_enabled",
      "type": "string",
      "description": "Indicates whether the source browser has cookies enabled",
      "path": "$.cookie"
    },
    {
      "name": "browser",
      "type": "string",
      "description": "A string representing the source browser, including major and minor version; e.g. 'Chrome_21_1_5'"
    },
    {
      "name": "browser_no_minor",
      "type": "string",
      "description": "A string representing the source browser, including major version only; e.g. 'Chrome_21'"
    },
    {
      "name": "browser_no_version",
      "type": "string",
      "description": "A string representing the source browser, excluding version information; e.g. 'Chrome'"
    },
    {
      "name": "browser_type",
      "type": "string",
      "description": "Indicates the type of the source browser; must be one of: 'browser', 'mobile_browser', 'email_client', 'application', 'unknown'"
    },
    {
      "name": "date",
      "type": "timestamp",
      "description": "Indicates the event timestamp, in ISO-8601 format"
    },
    {
      "name": "device_type",
      "type": "string",
      "description": "Indicates the type of the source device; must be one of: 'computer', 'mobile, 'tablet, 'game_console, 'digital_media_receiver, 'wearable_computer, 'unknown'"
    },
    {
      "name": "language",
      "type": "string",
      "description": "Represents the source language as an Accept-Language string; e.g. 'en', 'de'"
    },
    {
      "name": "locale",
      "type": "string",
      "description": "Represents the source language and locale as an Accept-Language string; e.g. 'en-US', 'de-CH'"
    },
    {
      "name": "operating_system",
      "type": "string",
      "description": "Indicates the source operating system and its version number; e.g. 'Windows_95'",
      "collection": false
    },
    {
      "name": "operating_system_no_version",
      "type": "string",
      "description": "Indicates the source operating system, excluding version number; e.g. 'Windows'"
    },
    {
      "name": "piwik_id",
      "type": "string",
      "description": "An internally-generated string identifying the event"
    },
    {
      "name": "referrer",
      "type": "string",
      "description": "Represents the full URL of the referring site; e.g. 'http://m.facebook.com/ad/12345'",
      "path": "$.urlref"
    },
    {
      "name": "referrer_domain",
      "type": "string",
      "description": "Represents the domain of the referring site; e.g. 'facebook.com'",
      "collection": false
    },
    {
      "name": "screen_resolution",
      "type": "string",
      "description": "Represents the raw resolution of the source display, as [width]x[height]; e.g. '1920x1200'",
      "path": "$.res"
    },
    {
      "name": "search",
      "type": "string",
      "description": "Contains the literal search string applied, if any; e.g. 'mineral content, density'"
    },
    {
      "name": "search_category",
      "type": "string",
      "description": "Represents the category of a search indicated by the 'search' term; e.g. 'Snow'",
      "path": "$.search_cat"
    },
    {
      "name": "session_id",
      "type": "string",
      "description": "An internally-generated string identifying the current session",
      "path": "$.id"
    },
    {
      "name": "url",
      "type": "string",
      "description": "Indicates the URL of the event source"
    },
    {
      "name": "profile_id",
      "type": "string",
      "description": "References the event's profile stored in the profile service"
    },
    {
      "name": "guid",
       "type": "string",
       "description": "Unique identifier for the event. Auto-generated at ingestion time"
    },
    {
       "name": "profile_trace_id",
       "type": "string",
       "description": "Profile Trace Id. Auto-populated at ingestion time"
    }
  ]
}

The basic schema for a 'commerce' table is as follows:

{
  "description":"ECP commerce events",
  "ingestionType":"commerce",
  "columns": [
    {
      "name": "date",
      "type": "timestamp",
      "description": "Indicates the event timestamp, in ISO-8601 format"
    },
    {
      "name": "event_type",
      "type": "string",
      "path": "$.type",
      "description": "Identifies the type of event: 'order', 'order shipment', 'return', 'partial return', 'cancellation', 'account registration', or 'login'"
    },
    {
      "name": "channel_ref",
      "type": "string",
      "path": "$.channelRef",
      "description": "References the channel for the event; e.g. 'webshop'"
    },
    {
      "name": "consumer_ref",
      "type": "string",
      "path": "$.consumer.ref",
      "description": "References the consumer that the event was issued for; e.g. 'someone@mail.com'"
    },
    {
      "name": "consumer_type",
      "type": "string",
      "path": "$.consumer.type",
      "description": "Indicates the consumer reference type; e.g. 'email'"
    },
    {
      "name": "item_index",
      "type": "expansion",
      "path": "$.body.lineItems[*]",
      "description": "The presence of this column indicates that the body.lineItems arrays in the event will be expanded, yielding a unique record for each array entry. The value of this column indicates the index of a given line item record. For example, an event with two line items will yield two unique line item records with 'item_index' equal to '0' and 'item_index' equal to '1', respectively",
    },
    {
      "name": "item_ref",
      "type": "string",
      "path": "$.body.lineItems[*].ref",
      "description": "Defines the path to a line item reference ID; e.g. 'ABCDEF1'"
    },
    {
      "name": "item_type",
      "type": "string",
      "path": "$.body.lineItems[*].type",
      "description": "Indicates the type of a particular line item; e.g. 'Product'"
    },
    {
      "name": "item_pos",
      "type": "long",
      "path": "$.body.lineItems[*].pos",
      "description": "Indicates the position of an item in the array of line items. Synonymous with the 'item_index' column; e.g. '1' represents the second entry in an array"
    },
    {
      "name": "item_quantity",
      "type": "long",
      "path": "$.body.lineItems[*].quantity",
      "description": "Indicates the quantity of a particular line item; e.g. '2'"
    },
    {
      "name": "item_unit",
      "type": "string",
      "path": "$.body.lineItems[*].unit",
      "description": "Indicates the unit applied to a particular line item; e.g. 'kg', 'pieces'"
    },
    {
      "name": "item_price_list",
      "type": "float",
      "path": "$.body.lineItems[*].price_list",
      "description": "Contains the original listed price for a particular line item; e.g. '17.00'"
    },
    {
      "name": "item_price_effective",
      "type": "float",
      "path": "$.body.lineItems[*].price_effective",
      "description": "Contains the effective price for a particular line item; e.g. '15.00'"
    },
    {
      "name": "item_currency",
      "type": "string",
      "path": "$.body.lineItems[*].currency",
      "description": "Indicates the currency of a particular line item; e.g. 'EUR'"
    },
    {
      "name": "item_tax_amount",
      "type": "float",
      "path": "$.body.lineItems[*].taxAmount",
      "description": "Contains the tax amount as a percentage for a particular line item; e.g. '17%'"
    },
    {
      "name": "item_status",
      "type": "string",
      "path": "$.body.lineItems[*].status",
      "description": "Indicates the status of a particular line item within the entire order; e.g. 'cancellation requested' or 'pre-order'"
    },
    {
      "name": "promotion_info_ref",
      "type": "string",
      "path": "$.body.promotionInfo.ref",
      "description": "References a particular promotion; e.g. 'xmas2017abc'"
    },
    {
      "name": "promotion_info_type",
      "type": "string",
      "path": "$.body.promotionInfo.type",
      "description": "Indicates the type of promotion; e.g. 'absolute reduction'"
    },
    {
      "name": "payment_info_payment_type",
      "type": "string",
      "path": "$.body.paymentInfo.paymentType",
      "description": "Indicates the type of payment used for an order; e.g. 'Master'"
    },
    {
      "name": "payment_info_status",
      "type": "string",
      "path": "$.body.paymentInfo.status",
      "description": "Indicates the status of an order payment; e.g. 'requested'"
    },
    {
      "name": "status",
      "type": "string",
      "path": "$.body.status",
      "description": "Indicates the status of an order; e.g. 'new'"
    },
    {
      "name": "order_id",
      "type": "string",
      "path": "$.body.orderId",
      "description": "Represents an individual order; e.g. '123abc'"
    },
    {
      "name": "shipment_info_address_first_name",
      "type": "string",
      "path": "$.body.shipmentInfo.address.firstName",
      "description": "Contains the first name of the addressee; e.g. 'Joe'"
    },
    {
      "name": "shipment_info_address_last_name",
      "type": "string",
      "path": "$.body.shipmentInfo.address.lastName",
      "description": "Contains the last name of the addressee; e.g. 'Smith'"
    },
    {
      "name": "shipment_info_address_addition",
      "type": "string",
      "path": "$.body.shipmentInfo.address.addition",
      "description": "Contains additional information for the shipment address; e.g. the company name"
    },
    {
      "name": "shipment_info_address_street",
      "type": "string",
      "path": "$.body.shipmentInfo.address.street",
      "description": "Contains the street name of the shipment address; e.g. 'Broadway'"
    },
    {
      "name": "shipment_info_address_number",
      "type": "string",
      "path": "$.body.shipmentInfo.address.number",
      "description": "Contains the address number of the shipment address; e.g. '1234'"
    },
    {
      "name": "shipment_info_address_zip",
      "type": "string",
      "path": "$.body.shipmentInfo.address.zip",
      "description": "Contains the zip code of the shipment address; e.g. '80302'"
    },
    {
      "name": "shipment_info_address_city",
      "type": "string",
      "path": "$.body.shipmentInfo.address.city",
      "description": "Contains the city of the shipment address; e.g. 'Boulder'"
    },
    {
      "name": "shipment_info_address_country",
      "type": "string",
      "path": "$.body.shipmentInfo.address.country",
      "description": "Contains the country of the shipment address; e.g. 'Germany'"
    },
    {
      "name": "shipment_info_carrier",
      "type": "string",
      "path": "$.body.shipmentInfo.carrier",
      "description": "Indicates the order shipment carrier; e.g. 'UPS'"
    },
    {
      "name": "shipment_info_tracking_ref",
      "type": "string",
      "path": "$.body.shipmentInfo.trackingRef",
      "description": "References the tracking information for the order shipment; e.g. 'abc123XYZ'"
    },
    {
      "name": "shipment_info_status",
      "type": "string",
      "path": "$.body.shipmentInfo.status",
      "description": "Indicates the status of the order shipment; e.g. 'not delivered'"
    },
    {
      "name": "order_value",
      "type": "float",
      "path": "$.body.orderValue",
      "description": "Contains the total order value; e.g. '114.00'"
    },
    {
      "name": "currency",
      "type": "string",
      "path": "$.body.currency",
      "description": "Indicates the currency of an order; e.g. 'EUR'"
    },
    {
      "name": "profile_id",
      "type": "string",
      "description": "References the event's profile stored in the profile service"
    },
    {
       "name": "guid",
       "type": "string",
       "description": "Unique identifier for the event. Auto-generated at ingestion time"
    },
    {
       "name": "profile_trace_id",
       "type": "string",
       "description": "Profile Trace Id. Auto-populated at ingestion time"
    },
  ]
}

If table replacement excludes a pre-existing column, the associated data will be maintained but will not be accessible through subsequent queries. Renaming columns is not supported at this time.

Supported column types are:

  • String
  • Long
  • Float
  • Timestamp

NOTE that only 'String'-type columns can be flagged with "collection":true; all other column types must be single-valued.

The 'path' attribute defines the XPath expression that will be used to populate the column during ingestion, based on the supplied event data in JSON format. If no 'path' is defined, the path value defaults to the 'name' of the column; for example, if 'name' is 'url', the 'path' value defaults to ''$.url'.

If a column is flagged as "collection", data ingested into this column must be in the form of a JSON array of comma-delimited values, at the XPath location indicated.

Mandatory Id Columns

RTR GUID In order to support GDPR requirements for record editing, a column named guid of type string must be included in the table spec. The path may be omitted. Values for this column will be generated by Real-time Reporting at ingestion time.

Profile Trace Id In order to track events throughout profile, a column named profile_trace_id of type string must be included in the table spec. The path may be omitted. Values for this column will be populated by Real-time Reporting at ingestion time.


Table Data Editing

The Real-time Reporting service allows you to delete and edit previously-ingested table data.

The Real-time Reporting system caches and executes data modification requests periodically. Thus, there might be a significant delay between the submission of the request and the update of the target data.

Delete data

A request to delete data requires a filter that specifies a column:value pair. The column:value pair defines all records to delete within a given tenant table. For example, this message deletes all data where profileId column takes the value abc.

{
  "type": "delete",
  "filter": {
    "type": "equal",
    "column": "profileId",
    "value": "abc"
  }
}

Edit data

A request to edit data requires a filter that specifies the guid of a single record in the database. In addition, the request provides a modifiedEvent key-value map of all record properties to reflect in the updated entry. The value map specifies the guid and the same timestamp as the record to modify. For example, this data edit message specifies that the record stored for the guid has two properties: productId with the value abc123, and productDescription with the value Tennis Shoes.

{
  "type": "edit",
  "filter": {
    "type": "equal",
    "column": "guid",
    "value": "61bc5b52-73e6-4dde-bb28-f361267b27ad"
  },
  "modifiedEvent": {
    "guid": "61bc5b52-73e6-4dde-bb28-f361267b27ad",
    "time": "2017-05-09T18:24:00.000Z",
    "productId": "abc123",
    "productDescription": "Tennis Shoes"
  }
}


Headers

Messages posted to the Real-time Reporting Service as a POST request have the following headers:

NameRequired?DescriptionExample
AuthorizationYesThe bearer token issued by yaas.ioBearer 020-943da2ff-0092-4228-bb4c-b44bfa0f4cb8
Content-TypeFor POST and PUT requests onlyIdentifies the data type of the submitted contentapplication/json


Array Expansion

The Realtime Reporting Service currently does not support a truly relational data design. Many events, however, contain compound objects in the form of array elements. This example represents an order in which the customer purchases two items: a Canon camera costing 99.99, and a Nikon flash costing 5.50:

Order Event

{
  "order_id" : 123,
  "items" : [
    { "name" : "camera", "brand" : "canon", "price" : 99.99 },
    { "name" : "flash", "brand" : "nikon", "price" : 5.50 }
  ]
}

In order to store these nested event details in columnar fashion, this single items array is expanded to yield two independent order events, one for each item purchased. Realtime Reporting offers an array-expansion feature to achieve this, yielding the following data. Note that the items_index column, which tracks the order of the expanded records, is defined in table metadata rather than in the order event:

Order Data

order_iditems_indexitem_nameitem_branditem_price
1230cameracanon99.99
1231flashnikon5.50

The table's schema definition enables the array expansion, and can target one or more arrays for expansion. For each expanded array, the schema must define an index column, for instance, items_index in Order Data above. The system auto-generates index values during array expansion, starting at zero.

Enabling Array Expansion

For each array you expand, define:

  • An index column that targets the array you expand. In the example, items_index is the index column.
  • A column for each of the array entry's attributes included in the expanded data. In the example, item_name is the data column label for the array entry's name attribute, and item_brand is the data column label for the array entry's brand attribute.

The XPath expression 'items[*]' indicates "select all elements of the array found at root element 'items'". This XPath format is used to target arrays that are to be expanded. Thus, the path for a given array's index column must end in the array selector [*].

Order Schema

{
  [...],
  "columns": [
    [...],
     {
       "name": "items_index",
       "path": "$.items[*]",
       "type": "expansion"
     },
     {
       "name": "item_name",
       "path": "$.items[*].name",
       "type": "string"
     },
     {
       "name": "item_brand",
       "path": "$.items[*].brand",
       "type": "string"
     },
     [...]
  ]
}

Index Column Schema Rules

  • You must set the column type to expansion to indicate that this column serves as the index column for array expansion
  • You must set the collection attribute to 'false' (property will default to false if omitted)
  • You must define the path attribute path to the array that the system expands

As best practice, form the index column name by concatenating the name of the expanded array with the suffix '_index'. For example, items_index is the index column name for an expanded items array.

Child Arrays and Peer Arrays

Child Arrays

An event array may contain one or more child arrays. You can define a child array as expandable if its parent array is a top-level expandable array. Here, the event's expandable items array contains an expandable child array of variantOptions.

Child Array Event

{
  "order_id": 123,
  "items": [
    {
      "product": "socks",
      "variantOptions": [
        {
          "name": "size",
          "value": "large"
        },
        {
          "name": "color",
          "value": "white"
        }
      ]
    },
    {
      "product": "Core Workout",
      "variantOptions": [
        {
          "name": "format",
          "value": "MPEG"
        },
        {
          "name": "length",
          "value": "90 min"
        },
        {
          "name": "publisher",
          "value": "Fit4Life"
        }
      ]
    }
  ]
}

The schema defines expandable child arrays using the same rules and format applied to expandable parent arrays:

Child Array Order Schema

{
  [...],
  "columns": [
    [...],
    {
       "name": "variantOptions_index",
       "path": "$.items[*].variantOptions[*]",
       "type": "expansion"
    },
    {
       "name": "variant_name",
       "path": "$.items[*].variantOptions[*].name",
       "type": "string"
    },
    {
       "name": "variant_value",
       "path": "$.items[*].variantOptions[*].value",
       "type": "string"
    },
    [...]
  ]
}

The Child Array Event and Child Array Order Schema examples above result in the following expanded data store:

order_iditems_indexproductvariantOptions_indexvariant_namevariant_value
1230socks0sizelarge
1230socks1colorwhite
1231Core Workout0formatMPEG
1231Core Workout1length90 min
1231Core Workout2publisherFit4Life

Peer Arrays

An event can contain multiple arrays at the same level, known as 'peer' arrays. This example shows the peer arrays, items and payment:

Peer Array Event

{
  "order_id": 123,
  "items": [
    {
      "product": "socks",
      "sku": "abc"
    },
    {
      "product": "Core Workout",
      "sku": "efg"
    }
  ],
  "payment": [
    {
      "type": "giftCard",
      "amount": 25.00
    },
    {
      "type": "creditCard",
      "amount": 30.98
    }
  ]
}

If the schema defines multiple peer arrays as expandable, the resulting data for one expanded array excludes the data from its peer arrays. In this example, the first two columns include expanded items data and exclude payment data, and the final two columns include expanded payment data and exclude items data:

Peer Array Data

order_iditems_indexproductskupay_indexpay_typepay_amount
1230socksabc
1231Core...efg
1230giftCard25.00
1231creditCard30.98

Combining Child and Peer Arrays

You can combine child array and peer array expansions. At this time, you can expand parent and direct child arrays, and no more than 3 peer arrays at any given hierarchy level.

Querying Expanded Arrays

Consider array expansion when defining your queries, because the resulting data rows can share duplicate information. In the following example, an event contains duplicated data in the order_id and order_total fields in the expanded data store:

{
  "order_id" : 123,
  "order_total": 105.49
  "items" : [
    { "name" : "camera", "price" : 99.99 },
    { "name" : "flash", "price" : 5.50 }
  ]
}
order_iditems_indexorder_totalitem_nameitem_price
1230105.49camera99.99
1231105.49flash5.50

To count the number of unique order_id values, use the 'count_distinct' aggregation:

  [...]
  "aggregations": [
    { "type": "count_distinct", "name": "distinct_order_ids", "column": "order_id"}
  ]
  [...]

To retrieve the unique order_total value, target only rows where the index value is '0':

  [...]
  "filter": { "type": "equal", "column": "items_index", "value": 0 }
  [...]

NOTE

Using the expansion feature may drastically increase the number of database records that are created from a single ingestion event. At this time, the system will discard any ingestion event that would result in more than 1,000 individual records.


Prerequisites

  • You must have a tenant
  • Your tenant must be subscribed to the Realtime Reporting package
  • Your tenant must have a Client with the scopes hybris.realtime_reporting_view and hybris.realtime_reporting_manage


Setup

Generate a Bearer Token as follows:

  • Browse to your tenant in BUILDER
  • Browse to Development > Clients > [Client with realtime_reporting scopes]
  • Under GENERATE CREDENTIALS, click GENERATE ACCESS TOKEN
  • Select the correct Requested Region
  • Under Select Scopes, choose hybris.realtime_reporting_view
  • Click GENERATE and copy the resulting string, including the prefix "Bearer".


Execution

In a separate browser tab, browse to the SERVICE URL indicated at the top of this tutorial screen.

VIEW TABLES

For this example, use a table whose name and ingestionType are both 'clickstream'.

To determine whether such a table pre-exists, attempt to view the 'clickstream' table:

  • In the /{tenant}/tables/{tableId} row, click GET
  • In the tableId field, enter the name of the target table; for instance: 'clickstream'
  • In the tenant and hybris-tenant fields, enter the name of your tenant
  • In the Authorization field, enter the copied value of your access token, including the prefix "Bearer"

If the table exists, the query will yield a 200 response and a representation of the table similar to the following:

{
    "name": "clickstream",
    "description": "yProfile clickstream events",
    "ingestionType": "clickstream",
    "status": "active",
    "columns": [
        {
            "name": "action_name",
            "type": "string",
            "path": "$.action_name",
            "collection": false
        },
        {
            "name": "date",
            "type": "timestamp",
            "path": "$.date",
            "collection": false
        },
        [etc.]
    ]
}

The status attribute of the returned table indicates that it is 'active': available for use. Other possible status values are 'deleting' (in the process of being deleted) and 'error'. If the returned table's status is 'error', please contact support.

CREATE/REPLACE A TABLE

If no 'clickstream' table exists, you can create a new one by submitting a JSON request with the structure described in the VIEW TABLES section above. If a 'clickstream' table pre-exists, you can replace it with one whose structure matches your needs. Both goals can be met with the same process, described here.

  • In the /{tenant}/tables/{tableId} row, click PUT
  • In the tableId field, enter 'clickstream' - the required table name
  • In the tenant and hybris-tenant fields, enter the name of your tenant
  • In the Authorization field, enter the copied value of your access token, including the prefix "Bearer"
  • In the BODY section, enter the JSON representation of your table. A default representation is included in this section. NOTE that a timestamp-type column is required, and that - if you are replacing an existing table - this column cannot be renamed.

The request should yield a 204 response, indicating that your table is being created or replaced. After a short waiting period, your table will be available.

QUERY FOR DATA

  • In the /{tenant}/tables/{tableId}/query row, click POST
  • Enter 'clickstream' in the tableId field
  • In the tenant and hybris-tenant fields, enter the name of your tenant
  • In the Authorization field, enter the copied value of your access token, including the prefix "Bearer"
  • In the BODY field, enter the following, replacing interval datetime values with appropriate values:
    {
      "granularity": "all",
      "interval": "2017-04-12T00:00:00.000/2017-04-13T00:00:00.000",
      "limit": 5
    }
    
    This query should yield a 200 response containing up to five JSON response elements. To retrieve more elements, increase the value of the limit attribute in the query.

You can now execute more complex queries, such as:

{
  "granularity": "day",
  "interval": "2017-04-12T19:00:00.000/2017-04-13T20:00:00.000",
  "dimensions": [ "url" ],
  "filter": {"type": "notequal", "column": "url", "value": null},
  "aggregations": [
    { "type": "count_distinct", "name": "distinct_sessions_by_url", "column": "session_id"}
  ],
  "limit": 5,
  "order": [ {"column": "distinct_sessions_by_url", "direction": "descending"} ]
}


Glossary

TermDescription
aggregationsAn optional query element that defines one or more operations to be applied to retrieved data. Valid types are "count", "count_distinct", "sum", "min", "max", "mean", "stddev", and "variance".
columnA string, numeric, or timestamp field representing a data category. A column can represent data stored as a category in the service (e.g. "url", "price"), or data derived upon retrieval from the service (e.g. "distinct_urls", "count").
dimensionA column that represents a data category explicitly stored in the service, rather than derived upon retrieval. A dimension can be filtered or grouped on.
filterAn optional query element that includes or excludes matched data from the retrieved set by applying logical operators; for example, data equals a specified value and does not equal a specified value.
granularityA query element that determines the scale of time groupings for retrieved data: "hour", "day", "week", "month", or "all" (which yields a single grouping).
intervalA query element that indicates the ISO-8601 date-time bounds for data targeted by a query. The interval is delimited by a forward slash. If the interval comprises two dates (for example, "2017-03-02T19:00:00.000/2017-03-02T22:00:00.000"), then the first date is inclusive and the second date is exclusive.
limitAn optional query element that defines the maximum number of results to return.
orderAn optional query element that enforces ordering of data by data type; for example, dimension X values are returned in numerically-ascending order.
packageA developer-defined bundle of microservices. A partner, team, tenant, or another package can subscribe to a particular package to access the functionality of those bundled microservices.
tableA collection of dimensions associated with a particular tenant.
tenantA registered entity with a shared commercial goal that subscribes to SAP Hybris Profile services and packages to reach that goal. A tenant can also develop and contribute enrichers and context adapters to the SAP Hybris Profile suite. Within YaaS, a tenant is a project.


  • Send feedback

    If you find any information that is unclear or incorrect, please let us know so that we can improve the Dev Portal content.

  • Get Help

    Use our private help channel. Receive updates over email and contact our specialists directly.

  • hybris Experts

    If you need more information about this topic, visit hybris Experts to post your own question and interact with our community and experts.