Overview
The Real-Time Reporting Service enables data analysis of data collected from multiple sources over time. The service provides:
- A means to retrieve metadata about existing data tables.
- A means to create, update, and delete data tables.
- A query mechanism for event data that allows users to discover and interpret meaningful patterns in the data.
The query mechanism excels at aggregating time-based data, and allows for multi-dimensional operations such as limiting, grouping, and filtering. Depending on the data that you have submitted to the system, you might use the Real-Time Reporting Service's query functionality to determine, during a specified time period:
- The two most popular browsers used by a particular type of user
- The most frequently-viewed product categories grouped by day for the last week
- The total number of successful checkout events in the last month
- The total number of site visitors for each hour during the previous day
NOTE: Data is automatically ingested into the Real-Time Reporting system via the Profile Edge service. If you create a table whose name and type are both 'clickstream' or are both 'commerce', relevant events submitted to the Profile Edge service will be ingested into this table.
NOTE: It is strongly recommended that PII (Personally Identifiable Information) be excluded from data submitted to the Real-Time Reporting Service.
API Reference
/{tenant}
/{tenant}/tables
Returns a description of each table defined for the tenant. Requires a hybris.realtime_reporting_view scope.
/{tenant}/tables/{tableId}
A table represents a logical grouping of dimension columns as well as an input specification.
Retrieves the table definition (metadata). Requires a hybris.realtime_reporting_view scope.
Creates or replaces a table definition, which defines table dimension columns and parameters for dimension data. If replacement excludes a pre-existing dimension column, the associated data will be maintained but will not be accessible through subsequent queries. Requires a hybris.realtime_reporting_manage scope.
/{tenant}/tables/{tableId}/query
Performs the provided query, targeting the specified tenant and table, and returns a JSON representation of the query results. Requires a hybris.realtime_reporting_view scope.
/{tenant}/tables/{tableId}/events
Posts a request to edit or delete data from a specified table. Such modification requests are executed periodically. Requires a hybris.realtime_reporting_manage scope.
Data Ingestion
The Real-time Reporting Service can automatically ingest event data submitted to the Edge Service. This includes the following event types:
- Clickstream and page-load data representing user interaction with your web interface.
- Commerce data about customer transactions.
Event data is ingested by the Real-time Reporting Service if a Real-time Reporting table associated with that data has been created.
Transformation of JSON Values to Target Column Type
In order to validate input data and guarantee ingestion, it is strongly recommended that event properties destined for numeric columns have an appropriate numeric JSON type. Properties targeting 'Long'-type columns should be expressed as an integral number in the input JSON; properties targeting 'Float'-type columns should be expressed as a JSON 'number' type.
The Real-time Reporting Service currently offers implicit conversion of String values to numeric columns, as long as the String presentation can be parsed into the target type. For instance, the String "1234.67" cannot be parsed into a Long column, because it represents a decimal value.
At this time, the service will strip any commas from Strings targeting numeric columns prior to attempting a conversion. The period symbol '.' will be treated as decimal point. This implicit conversion behavior may change in future releases and should be considered "experimental".
Messages containing String properties that cannot be successfully parsed will be rejected entirely at ingestion time.
Table Queries
The Real-time Reporting Service provides a rich query language to aggregate ingested data in a wide variety of ways.
As we explore the individual aspects of the query language, we will use some sample data to show how different elements of a query affect the data returned.
Example
The following represents a sample dataset ingested into the Real-time Reporting Service:
timestamp | id | browser | referrer_domain | category |
---|---|---|---|---|
2017-04-01T15:00:00Z | event1 | chrome | google.com | shoes |
2017-04-01T15:01:00Z | event2 | firefox | facebook.com | electronics |
2017-04-01T15:03:00Z | event3 | chrome | google.com | electronics |
2017-04-02T11:03:00Z | event4 | unknown | cnn.com | shoes |
2017-04-02T11:05:00Z | event5 | unknown | cnn.com | food |
2017-04-02T11:07:00Z | event6 | safari | cnn.com | food |
2017-04-02T11:09:00Z | event7 | unknown | cnn.com | electronics |
2017-04-02T11:11:00Z | event8 | null | cnn.com | shoes |
The following is an example query for this data that performs the following aggregation:
- Retrieves data within an interval between April 1 (inclusive) and April 3, 2017 (exclusive)
- Counts the number of page views per browser
- Orders results by page views, from highest to lowest
- Excludes results wherein the value of the "browser" column is null
- Limits results to four records maximum
{
"interval": "2017-04-01/2017-04-03",
"granularity": "all",
"dimensions": ["browser"],
"aggregations": [
{"type": "count", "name": "page_views"}
],
"order": [
{"column": "page_views", "direction": "descending"}
],
"limit": 4,
"filter": { "type": "notequal", "column": "browser", "value": null }
}
The above query produces the following results:
[
{
"page_views": 3,
"browser": "unknown"
},
{
"page_views": 2,
"browser": "chrome"
},
{
"page_views": 1,
"browser": "firefox"
},
{
"page_views": 1,
"browser": "safari"
}
]
Parameters
This section describes in detail the seven elements, or parameters, used in a Table Query.
Interval
REQUIRED An interval represents the time period to target for a query. Only events that fall within the provided interval are considered in the query. The interval must be a string representing an ISO-8601 interval. Each of the following examples represents an interval of two days, starting on April 1st, 2017:
- 2017-04-01T00:00:00Z/2017-04-03T00:00:00Z
- 2017-04-01/2017-04-03
- 2017-04-01/P2D
NOTE that if an interval specifies a start and end date (as in the first two examples above), then the first date is inclusive and the second date is exclusive.
Read more about ISO-8601 time intervals here.
Granularity
REQUIRED A granularity represents the timeframe into which events are grouped. For example: A granularity of hour
will return data grouped by hour. The granularity must be one of the following strings:
- all
- hour
- day
- week
- month
Note that a value of all
yields no timeframe grouping.
Dimensions
The dimensions element defines returned data groups by dimension. The dimensions element must be represented as a comma-delimited array of strings referencing one or more dimensions. If no dimensions are specified in a query, then no grouping occurs and the raw events are returned.
In our example query we specify grouping by a single browser
dimension:
{
...
"dimensions": ["browser"],
...
}
You can also group data using more than one dimension. The example below groups results in unique combinations of browser
and device_type
:
{
...
"dimensions": ["browser", "device_type"],
...
}
Limit
A limit is an integer that defines the maximum number of records the query will return. The maximum number of return values allowed by the service is 10,000.
In our example query we specify a limit of 4:
{
...
"limit": 4,
...
}
Aggregations
The aggregations element is a comma-delimited array of objects that defines one or more operations to be applied to retrieved data. Each such object contains a type
, name
, and optional column
field.
name
: An arbitrary label for the aggregation which can be referenced in the order element.column
: Identifies the data column to which aggregation is applied.type
: Defines the type of aggregation to be applied, and can be one of the following values:- count : Yields a count of value instances.
- count_distinct : Yields an estimated count of unique instances of a value.
- max : Yields the maximum value, according to the natural ordering of the data type.
- mean : Yields the mean value, according to the natural ordering of the data type.
- min : Yields the minimum value, according to the natural ordering of the data type.
- sum : Yields the sum of data values.
- stddev: Yields the standard deviation of data values.
- variance: Yields the variance of data values.
In our example query we request row counts, and define the name page_views
to label such counts:
{
...
"aggregations": [ {"type": "count", "name": "page_views"} ]
...
}
You can specify multiple aggregations. The following returns row counts as page_views
, and returns counts of distinct referrer_domain
values as distinct_referrer_domains
:
{
...
"aggregations": [
{ "type": "count", "name": "page_views" },
{ "type": "count_distinct", "name": "distinct_referrer_domains", "column": "referrer_domain"},
]
...
}
NOTE: A query with no "aggregations" element retrieves individual records, and thus may yield a very large result set and reduced performance.
Order
An order defines the order of returned results. The order is represented as a comma-delimited array of objects; multiple objects yield hierarchical ordering. Each such object must contain a column
and direction
field.
column
: Identifies the data column to which ordering is applied, and can reference a dimension or an aggregation name.direction
: Defines the sort order of the returned data, using one of the following values:- ascending
- descending
In our example query we specify that we want the results to be returned in descending order of page_views
:
{
...
"order": [
{"column": "page_views", "direction": "descending"}
]
...
}
Here the order element references a dimension (referrer_domain
), and aggregation names (page_views
, distinct_referrer_domains
):
{
...
"dimensions": [ "referrer_domain", "session_id"],
"aggregations": [
{ "type": "count_distinct", "name": "distinct_referrer_domains", "column": "referrer_domain"}
],
"order": [
{"column": "referrer_domain", "direction": "descending"},
{"column": "page_views", "direction": "descending"},
{"column": "distinct_referrer_domains", "direction": "descending"}
]
...
}
Filter
A filter includes or excludes matched data from the retrieved set by applying rules represented as objects. Each rules object must contain a type
, column
, and value
field.
type
: Defines a condition to be met; valid values are 'equal' and 'notequal'.column
: Identifies the data column to which the rule applies.value
: Defines the data value targeted by the rule.
In our example query we specify a single filter that includes only data whose browser
column value is not null:
{
...
"filter": { "type": "notequal", "column": "browser", "value": null }
...
}
Rules, and sets of rules, optionally can be concatenated with the conjunctions 'and' and 'or' to create complex logic. The following returns data wherein: browser
is not null and category
is not "shoes":
{
...
"filter": {
"and": [
{ "type": "notequal", "column": "browser", "value": null },
{ "type": "notequal", "column": "category", "value": "shoes" }
]
}
...
}
The following returns data wherein: (browser
is not null) and (category
is "shoes" or category
is "food"):
{
...
"filter": {
"and": [
{ "type": "notequal", "column": "browser", "value": null },
{ "or": [
{ "type": "equal", "column": "category", "value": "shoes" },
{ "type": "equal", "column": "category", "value": "food" }
]}
]
}
...
}
Table Metadata Retrieval
The Real-time Reporting Service enables retrieval of metadata about existing data tables. Table metadata comprises:
name
: The required value is either 'clickstream' or 'commerce'.description
ingestionType
: The required value is either 'clickstream' or 'commerce'.columns
: A List of objects representing columns, each comprising:name
description
type
: Valid column types are 'string', 'long', 'float', and 'timestamp'. Exactly one 'timestamp' column must exist.path
: An XPath representing the path to the column in the input JSON data. For example, '$.x.y' indicates that this column is associated with a 'y' element in the JSON that is embedded in an 'x' element. Note that the columnname
can differ from the path location: A column named 'color' can be associated with an element named 'y' in the JSON.collection
: 'false' if the column holds a single data value, and 'true' if it holds an array of values.
Table Creation and Replacement
The Real-time Reporting Service enables creation and replacement of data tables.
Note that the value of the 'tableId' URI path of the PUT request will become the table name. Once created, the table name cannot be altered.
The table schema defines table columns and parameters for column data. The basic schema for a 'clickstream' table is as follows:
{
"description":"Profile clickstream events",
"ingestionType":"clickstream",
"columns": [
{
"name": "action_name",
"type": "string",
"description": "Identifier for the type of event; e.g. 'CartUpdate', 'ProductDetailPageViewEvent'"
},
{
"name": "cookies_enabled",
"type": "string",
"description": "Indicates whether the source browser has cookies enabled",
"path": "$.cookie"
},
{
"name": "browser",
"type": "string",
"description": "A string representing the source browser, including major and minor version; e.g. 'Chrome_21_1_5'"
},
{
"name": "browser_no_minor",
"type": "string",
"description": "A string representing the source browser, including major version only; e.g. 'Chrome_21'"
},
{
"name": "browser_no_version",
"type": "string",
"description": "A string representing the source browser, excluding version information; e.g. 'Chrome'"
},
{
"name": "browser_type",
"type": "string",
"description": "Indicates the type of the source browser; must be one of: 'browser', 'mobile_browser', 'email_client', 'application', 'unknown'"
},
{
"name": "date",
"type": "timestamp",
"description": "Indicates the event timestamp, in ISO-8601 format"
},
{
"name": "device_type",
"type": "string",
"description": "Indicates the type of the source device; must be one of: 'computer', 'mobile, 'tablet, 'game_console, 'digital_media_receiver, 'wearable_computer, 'unknown'"
},
{
"name": "language",
"type": "string",
"description": "Represents the source language as an Accept-Language string; e.g. 'en', 'de'"
},
{
"name": "locale",
"type": "string",
"description": "Represents the source language and locale as an Accept-Language string; e.g. 'en-US', 'de-CH'"
},
{
"name": "operating_system",
"type": "string",
"description": "Indicates the source operating system and its version number; e.g. 'Windows_95'",
"collection": false
},
{
"name": "operating_system_no_version",
"type": "string",
"description": "Indicates the source operating system, excluding version number; e.g. 'Windows'"
},
{
"name": "piwik_id",
"type": "string",
"description": "An internally-generated string identifying the event"
},
{
"name": "referrer",
"type": "string",
"description": "Represents the full URL of the referring site; e.g. 'http://m.facebook.com/ad/12345'",
"path": "$.urlref"
},
{
"name": "referrer_domain",
"type": "string",
"description": "Represents the domain of the referring site; e.g. 'facebook.com'",
"collection": false
},
{
"name": "screen_resolution",
"type": "string",
"description": "Represents the raw resolution of the source display, as [width]x[height]; e.g. '1920x1200'",
"path": "$.res"
},
{
"name": "search",
"type": "string",
"description": "Contains the literal search string applied, if any; e.g. 'mineral content, density'"
},
{
"name": "search_category",
"type": "string",
"description": "Represents the category of a search indicated by the 'search' term; e.g. 'Snow'",
"path": "$.search_cat"
},
{
"name": "session_id",
"type": "string",
"description": "An internally-generated string identifying the current session",
"path": "$.id"
},
{
"name": "url",
"type": "string",
"description": "Indicates the URL of the event source"
},
{
"name": "profile_id",
"type": "string",
"description": "References the event's profile stored in the profile service"
},
{
"name": "guid",
"type": "string",
"description": "Unique identifier for the event. Auto-generated at ingestion time"
},
{
"name": "profile_trace_id",
"type": "string",
"description": "Profile Trace Id. Auto-populated at ingestion time"
}
]
}
The basic schema for a 'commerce' table is as follows:
{
"description":"ECP commerce events",
"ingestionType":"commerce",
"columns": [
{
"name": "date",
"type": "timestamp",
"description": "Indicates the event timestamp, in ISO-8601 format"
},
{
"name": "event_type",
"type": "string",
"path": "$.type",
"description": "Identifies the type of event: 'order', 'order shipment', 'return', 'partial return', 'cancellation', 'account registration', or 'login'"
},
{
"name": "channel_ref",
"type": "string",
"path": "$.channelRef",
"description": "References the channel for the event; e.g. 'webshop'"
},
{
"name": "consumer_ref",
"type": "string",
"path": "$.consumer.ref",
"description": "References the consumer that the event was issued for; e.g. 'someone@mail.com'"
},
{
"name": "consumer_type",
"type": "string",
"path": "$.consumer.type",
"description": "Indicates the consumer reference type; e.g. 'email'"
},
{
"name": "item_index",
"type": "expansion",
"path": "$.body.lineItems[*]",
"description": "The presence of this column indicates that the body.lineItems arrays in the event will be expanded, yielding a unique record for each array entry. The value of this column indicates the index of a given line item record. For example, an event with two line items will yield two unique line item records with 'item_index' equal to '0' and 'item_index' equal to '1', respectively",
},
{
"name": "item_ref",
"type": "string",
"path": "$.body.lineItems[*].ref",
"description": "Defines the path to a line item reference ID; e.g. 'ABCDEF1'"
},
{
"name": "item_type",
"type": "string",
"path": "$.body.lineItems[*].type",
"description": "Indicates the type of a particular line item; e.g. 'Product'"
},
{
"name": "item_pos",
"type": "long",
"path": "$.body.lineItems[*].pos",
"description": "Indicates the position of an item in the array of line items. Synonymous with the 'item_index' column; e.g. '1' represents the second entry in an array"
},
{
"name": "item_quantity",
"type": "long",
"path": "$.body.lineItems[*].quantity",
"description": "Indicates the quantity of a particular line item; e.g. '2'"
},
{
"name": "item_unit",
"type": "string",
"path": "$.body.lineItems[*].unit",
"description": "Indicates the unit applied to a particular line item; e.g. 'kg', 'pieces'"
},
{
"name": "item_price_list",
"type": "float",
"path": "$.body.lineItems[*].price_list",
"description": "Contains the original listed price for a particular line item; e.g. '17.00'"
},
{
"name": "item_price_effective",
"type": "float",
"path": "$.body.lineItems[*].price_effective",
"description": "Contains the effective price for a particular line item; e.g. '15.00'"
},
{
"name": "item_currency",
"type": "string",
"path": "$.body.lineItems[*].currency",
"description": "Indicates the currency of a particular line item; e.g. 'EUR'"
},
{
"name": "item_tax_amount",
"type": "float",
"path": "$.body.lineItems[*].taxAmount",
"description": "Contains the tax amount as a percentage for a particular line item; e.g. '17%'"
},
{
"name": "item_status",
"type": "string",
"path": "$.body.lineItems[*].status",
"description": "Indicates the status of a particular line item within the entire order; e.g. 'cancellation requested' or 'pre-order'"
},
{
"name": "promotion_info_ref",
"type": "string",
"path": "$.body.promotionInfo.ref",
"description": "References a particular promotion; e.g. 'xmas2017abc'"
},
{
"name": "promotion_info_type",
"type": "string",
"path": "$.body.promotionInfo.type",
"description": "Indicates the type of promotion; e.g. 'absolute reduction'"
},
{
"name": "payment_info_payment_type",
"type": "string",
"path": "$.body.paymentInfo.paymentType",
"description": "Indicates the type of payment used for an order; e.g. 'Master'"
},
{
"name": "payment_info_status",
"type": "string",
"path": "$.body.paymentInfo.status",
"description": "Indicates the status of an order payment; e.g. 'requested'"
},
{
"name": "status",
"type": "string",
"path": "$.body.status",
"description": "Indicates the status of an order; e.g. 'new'"
},
{
"name": "order_id",
"type": "string",
"path": "$.body.orderId",
"description": "Represents an individual order; e.g. '123abc'"
},
{
"name": "shipment_info_address_first_name",
"type": "string",
"path": "$.body.shipmentInfo.address.firstName",
"description": "Contains the first name of the addressee; e.g. 'Joe'"
},
{
"name": "shipment_info_address_last_name",
"type": "string",
"path": "$.body.shipmentInfo.address.lastName",
"description": "Contains the last name of the addressee; e.g. 'Smith'"
},
{
"name": "shipment_info_address_addition",
"type": "string",
"path": "$.body.shipmentInfo.address.addition",
"description": "Contains additional information for the shipment address; e.g. the company name"
},
{
"name": "shipment_info_address_street",
"type": "string",
"path": "$.body.shipmentInfo.address.street",
"description": "Contains the street name of the shipment address; e.g. 'Broadway'"
},
{
"name": "shipment_info_address_number",
"type": "string",
"path": "$.body.shipmentInfo.address.number",
"description": "Contains the address number of the shipment address; e.g. '1234'"
},
{
"name": "shipment_info_address_zip",
"type": "string",
"path": "$.body.shipmentInfo.address.zip",
"description": "Contains the zip code of the shipment address; e.g. '80302'"
},
{
"name": "shipment_info_address_city",
"type": "string",
"path": "$.body.shipmentInfo.address.city",
"description": "Contains the city of the shipment address; e.g. 'Boulder'"
},
{
"name": "shipment_info_address_country",
"type": "string",
"path": "$.body.shipmentInfo.address.country",
"description": "Contains the country of the shipment address; e.g. 'Germany'"
},
{
"name": "shipment_info_carrier",
"type": "string",
"path": "$.body.shipmentInfo.carrier",
"description": "Indicates the order shipment carrier; e.g. 'UPS'"
},
{
"name": "shipment_info_tracking_ref",
"type": "string",
"path": "$.body.shipmentInfo.trackingRef",
"description": "References the tracking information for the order shipment; e.g. 'abc123XYZ'"
},
{
"name": "shipment_info_status",
"type": "string",
"path": "$.body.shipmentInfo.status",
"description": "Indicates the status of the order shipment; e.g. 'not delivered'"
},
{
"name": "order_value",
"type": "float",
"path": "$.body.orderValue",
"description": "Contains the total order value; e.g. '114.00'"
},
{
"name": "currency",
"type": "string",
"path": "$.body.currency",
"description": "Indicates the currency of an order; e.g. 'EUR'"
},
{
"name": "profile_id",
"type": "string",
"description": "References the event's profile stored in the profile service"
},
{
"name": "guid",
"type": "string",
"description": "Unique identifier for the event. Auto-generated at ingestion time"
},
{
"name": "profile_trace_id",
"type": "string",
"description": "Profile Trace Id. Auto-populated at ingestion time"
},
]
}
If table replacement excludes a pre-existing column, the associated data will be maintained but will not be accessible through subsequent queries. Renaming columns is not supported at this time.
Supported column types are:
- String
- Long
- Float
- Timestamp
NOTE that only 'String'-type columns can be flagged with "collection":true; all other column types must be single-valued.
The 'path' attribute defines the XPath expression that will be used to populate the column during ingestion, based on the supplied event data in JSON format. If no 'path' is defined, the path value defaults to the 'name' of the column; for example, if 'name' is 'url', the 'path' value defaults to ''$.url'.
If a column is flagged as "collection", data ingested into this column must be in the form of a JSON array of comma-delimited values, at the XPath location indicated.
Mandatory Id Columns
RTR GUID In order to support GDPR requirements for record editing, a column named guid
of type string
must be included in the table spec. The path may be omitted. Values for this column will be generated by Real-time Reporting at ingestion time.
Profile Trace Id In order to track events throughout profile, a column named profile_trace_id
of type string
must be included in the table spec. The path may be omitted. Values for this column will be populated by Real-time Reporting at ingestion time.
Table Data Editing
The Real-time Reporting service allows you to delete and edit previously-ingested table data.
The Real-time Reporting system caches and executes data modification requests periodically. Thus, there might be a significant delay between the submission of the request and the update of the target data.
Delete data
A request to delete data requires a filter that specifies a column:value
pair. The column:value
pair defines all records to delete within a given tenant table. For example, this message deletes all data where profileId
column takes the value abc
.
{
"type": "delete",
"filter": {
"type": "equal",
"column": "profileId",
"value": "abc"
}
}
Edit data
A request to edit data requires a filter that specifies the guid
of a single record in the database. In addition, the request provides a modifiedEvent
key-value map of all record properties to reflect in the updated entry. The value map specifies the guid
and the same timestamp as the record to modify. For example, this data edit message specifies that the record stored for the guid
has two properties: productId with the value abc123
, and productDescription with the value Tennis Shoes
.
{
"type": "edit",
"filter": {
"type": "equal",
"column": "guid",
"value": "61bc5b52-73e6-4dde-bb28-f361267b27ad"
},
"modifiedEvent": {
"guid": "61bc5b52-73e6-4dde-bb28-f361267b27ad",
"time": "2017-05-09T18:24:00.000Z",
"productId": "abc123",
"productDescription": "Tennis Shoes"
}
}
Headers
Messages posted to the Real-time Reporting Service as a POST request have the following headers:
Name | Required? | Description | Example |
---|---|---|---|
Authorization | Yes | The bearer token issued by yaas.io | Bearer 020-943da2ff-0092-4228-bb4c-b44bfa0f4cb8 |
Content-Type | For POST and PUT requests only | Identifies the data type of the submitted content | application/json |
Array Expansion
The Realtime Reporting Service currently does not support a truly relational data design. Many events, however, contain compound objects in the form of array elements. This example represents an order in which the customer purchases two items: a Canon camera costing 99.99, and a Nikon flash costing 5.50:
Order Event
{
"order_id" : 123,
"items" : [
{ "name" : "camera", "brand" : "canon", "price" : 99.99 },
{ "name" : "flash", "brand" : "nikon", "price" : 5.50 }
]
}
In order to store these nested event details in columnar fashion, this single items
array is expanded to yield two independent order events, one for each item purchased. Realtime Reporting offers an array-expansion feature to achieve this, yielding the following data. Note that the items_index
column, which tracks the order of the expanded records, is defined in table metadata rather than in the order event:
Order Data
order_id | items_index | item_name | item_brand | item_price |
---|---|---|---|---|
123 | 0 | camera | canon | 99.99 |
123 | 1 | flash | nikon | 5.50 |
The table's schema definition enables the array expansion, and can target one or more arrays for expansion. For each expanded array, the schema must define an index column, for instance, items_index
in Order Data above. The system auto-generates index values during array expansion, starting at zero.
Enabling Array Expansion
For each array you expand, define:
- An index column that targets the array you expand. In the example,
items_index
is the index column. - A column for each of the array entry's attributes included in the expanded data. In the example,
item_name
is the data column label for the array entry'sname
attribute, anditem_brand
is the data column label for the array entry'sbrand
attribute.
The XPath expression 'items[*]' indicates "select all elements of the array found at root element 'items'". This XPath format is used to target arrays that are to be expanded. Thus, the path for a given array's index column must end in the array selector [*].
Order Schema
{
[...],
"columns": [
[...],
{
"name": "items_index",
"path": "$.items[*]",
"type": "expansion"
},
{
"name": "item_name",
"path": "$.items[*].name",
"type": "string"
},
{
"name": "item_brand",
"path": "$.items[*].brand",
"type": "string"
},
[...]
]
}
Index Column Schema Rules
- You must set the column type to
expansion
to indicate that this column serves as the index column for array expansion - You must set the
collection
attribute to 'false' (property will default to false if omitted) - You must define the
path
attribute path to the array that the system expands
As best practice, form the index column name by concatenating the name of the expanded array with the suffix '_index'. For example, items_index
is the index column name for an expanded items
array.
Child Arrays and Peer Arrays
Child Arrays
An event array may contain one or more child arrays. You can define a child array as expandable if its parent array is a top-level expandable array. Here, the event's expandable items
array contains an expandable child array of variantOptions
.
Child Array Event
{
"order_id": 123,
"items": [
{
"product": "socks",
"variantOptions": [
{
"name": "size",
"value": "large"
},
{
"name": "color",
"value": "white"
}
]
},
{
"product": "Core Workout",
"variantOptions": [
{
"name": "format",
"value": "MPEG"
},
{
"name": "length",
"value": "90 min"
},
{
"name": "publisher",
"value": "Fit4Life"
}
]
}
]
}
The schema defines expandable child arrays using the same rules and format applied to expandable parent arrays:
Child Array Order Schema
{
[...],
"columns": [
[...],
{
"name": "variantOptions_index",
"path": "$.items[*].variantOptions[*]",
"type": "expansion"
},
{
"name": "variant_name",
"path": "$.items[*].variantOptions[*].name",
"type": "string"
},
{
"name": "variant_value",
"path": "$.items[*].variantOptions[*].value",
"type": "string"
},
[...]
]
}
The Child Array Event and Child Array Order Schema examples above result in the following expanded data store:
order_id | items_index | product | variantOptions_index | variant_name | variant_value |
---|---|---|---|---|---|
123 | 0 | socks | 0 | size | large |
123 | 0 | socks | 1 | color | white |
123 | 1 | Core Workout | 0 | format | MPEG |
123 | 1 | Core Workout | 1 | length | 90 min |
123 | 1 | Core Workout | 2 | publisher | Fit4Life |
Peer Arrays
An event can contain multiple arrays at the same level, known as 'peer' arrays. This example shows the peer arrays, items
and payment
:
Peer Array Event
{
"order_id": 123,
"items": [
{
"product": "socks",
"sku": "abc"
},
{
"product": "Core Workout",
"sku": "efg"
}
],
"payment": [
{
"type": "giftCard",
"amount": 25.00
},
{
"type": "creditCard",
"amount": 30.98
}
]
}
If the schema defines multiple peer arrays as expandable, the resulting data for one expanded array excludes the data from its peer arrays. In this example, the first two columns include expanded items
data and exclude payment
data, and the final two columns include expanded payment
data and exclude items
data:
Peer Array Data
order_id | items_index | product | sku | pay_index | pay_type | pay_amount |
---|---|---|---|---|---|---|
123 | 0 | socks | abc | |||
123 | 1 | Core... | efg | |||
123 | 0 | giftCard | 25.00 | |||
123 | 1 | creditCard | 30.98 |
Combining Child and Peer Arrays
You can combine child array and peer array expansions. At this time, you can expand parent and direct child arrays, and no more than 3 peer arrays at any given hierarchy level.
Querying Expanded Arrays
Consider array expansion when defining your queries, because the resulting data rows can share duplicate information. In the following example, an event contains duplicated data in the order_id
and order_total
fields in the expanded data store:
{
"order_id" : 123,
"order_total": 105.49
"items" : [
{ "name" : "camera", "price" : 99.99 },
{ "name" : "flash", "price" : 5.50 }
]
}
order_id | items_index | order_total | item_name | item_price |
---|---|---|---|---|
123 | 0 | 105.49 | camera | 99.99 |
123 | 1 | 105.49 | flash | 5.50 |
To count the number of unique order_id
values, use the 'count_distinct' aggregation:
[...]
"aggregations": [
{ "type": "count_distinct", "name": "distinct_order_ids", "column": "order_id"}
]
[...]
To retrieve the unique order_total
value, target only rows where the index value is '0':
[...]
"filter": { "type": "equal", "column": "items_index", "value": 0 }
[...]
NOTE
Using the expansion feature may drastically increase the number of database records that are created from a single ingestion event. At this time, the system will discard any ingestion event that would result in more than 1,000 individual records.
Prerequisites
- You must have a tenant
- Your tenant must be subscribed to the Realtime Reporting package
- Your tenant must have a Client with the scopes
hybris.realtime_reporting_view
andhybris.realtime_reporting_manage
Setup
Generate a Bearer Token as follows:
- Browse to your tenant in BUILDER
- Browse to Development > Clients > [Client with realtime_reporting scopes]
- Under
GENERATE CREDENTIALS
, click GENERATE ACCESS TOKEN - Select the correct Requested Region
- Under Select Scopes, choose
hybris.realtime_reporting_view
- Click
GENERATE
and copy the resulting string, including the prefix "Bearer".
Execution
In a separate browser tab, browse to the SERVICE URL
indicated at the top of this tutorial screen.
VIEW TABLES
For this example, use a table whose name and ingestionType are both 'clickstream'.
To determine whether such a table pre-exists, attempt to view the 'clickstream' table:
- In the
/{tenant}/tables/{tableId}
row, click GET - In the
tableId
field, enter the name of the target table; for instance: 'clickstream' - In the
tenant
andhybris-tenant
fields, enter the name of your tenant - In the
Authorization
field, enter the copied value of your access token, including the prefix "Bearer"
If the table exists, the query will yield a 200
response and a representation of the table similar to the following:
{
"name": "clickstream",
"description": "yProfile clickstream events",
"ingestionType": "clickstream",
"status": "active",
"columns": [
{
"name": "action_name",
"type": "string",
"path": "$.action_name",
"collection": false
},
{
"name": "date",
"type": "timestamp",
"path": "$.date",
"collection": false
},
[etc.]
]
}
The status
attribute of the returned table indicates that it is 'active': available for use. Other possible status
values are 'deleting' (in the process of being deleted) and 'error'. If the returned table's status is 'error', please contact support.
CREATE/REPLACE A TABLE
If no 'clickstream' table exists, you can create a new one by submitting a JSON request with the structure described in the VIEW TABLES section above. If a 'clickstream' table pre-exists, you can replace it with one whose structure matches your needs. Both goals can be met with the same process, described here.
- In the
/{tenant}/tables/{tableId}
row, click PUT - In the
tableId
field, enter 'clickstream' - the required table name - In the
tenant
andhybris-tenant
fields, enter the name of your tenant - In the
Authorization
field, enter the copied value of your access token, including the prefix "Bearer" - In the BODY section, enter the JSON representation of your table. A default representation is included in this section. NOTE that a
timestamp
-type column is required, and that - if you are replacing an existing table - this column cannot be renamed.
The request should yield a 204
response, indicating that your table is being created or replaced. After a short waiting period, your table will be available.
QUERY FOR DATA
- In the
/{tenant}/tables/{tableId}/query
row, click POST - Enter 'clickstream' in the
tableId
field - In the
tenant
andhybris-tenant
fields, enter the name of your tenant - In the
Authorization
field, enter the copied value of your access token, including the prefix "Bearer" - In the
BODY
field, enter the following, replacing interval datetime values with appropriate values:
This query should yield a{ "granularity": "all", "interval": "2017-04-12T00:00:00.000/2017-04-13T00:00:00.000", "limit": 5 }
200
response containing up to five JSON response elements. To retrieve more elements, increase the value of thelimit
attribute in the query.
You can now execute more complex queries, such as:
{
"granularity": "day",
"interval": "2017-04-12T19:00:00.000/2017-04-13T20:00:00.000",
"dimensions": [ "url" ],
"filter": {"type": "notequal", "column": "url", "value": null},
"aggregations": [
{ "type": "count_distinct", "name": "distinct_sessions_by_url", "column": "session_id"}
],
"limit": 5,
"order": [ {"column": "distinct_sessions_by_url", "direction": "descending"} ]
}
Glossary
Term | Description |
---|---|
aggregations | An optional query element that defines one or more operations to be applied to retrieved data. Valid types are "count", "count_distinct", "sum", "min", "max", "mean", "stddev", and "variance". |
column | A string, numeric, or timestamp field representing a data category. A column can represent data stored as a category in the service (e.g. "url", "price"), or data derived upon retrieval from the service (e.g. "distinct_urls", "count"). |
dimension | A column that represents a data category explicitly stored in the service, rather than derived upon retrieval. A dimension can be filtered or grouped on. |
filter | An optional query element that includes or excludes matched data from the retrieved set by applying logical operators; for example, data equals a specified value and does not equal a specified value. |
granularity | A query element that determines the scale of time groupings for retrieved data: "hour", "day", "week", "month", or "all" (which yields a single grouping). |
interval | A query element that indicates the ISO-8601 date-time bounds for data targeted by a query. The interval is delimited by a forward slash. If the interval comprises two dates (for example, "2017-03-02T19:00:00.000/2017-03-02T22:00:00.000"), then the first date is inclusive and the second date is exclusive. |
limit | An optional query element that defines the maximum number of results to return. |
order | An optional query element that enforces ordering of data by data type; for example, dimension X values are returned in numerically-ascending order. |
package | A developer-defined bundle of microservices. A partner, team, tenant, or another package can subscribe to a particular package to access the functionality of those bundled microservices. |
table | A collection of dimensions associated with a particular tenant. |
tenant | A registered entity with a shared commercial goal that subscribes to SAP Hybris Profile services and packages to reach that goal. A tenant can also develop and contribute enrichers and context adapters to the SAP Hybris Profile suite. Within YaaS, a tenant is a project. |
If you find any information that is unclear or incorrect, please let us know so that we can improve the Dev Portal content.
Use our private help channel. Receive updates over email and contact our specialists directly.
If you need more information about this topic, visit hybris Experts to post your own question and interact with our community and experts.