Entity Extraction

Overview

The Entity Extraction service of the Text Analysis Entity Extraction package is part of a series of enterprise-grade, natural-language products that eliminate "noise" in unstructured text sources by highlighting salient information. The package's Entity Extraction service identifies an average of 36 entity types per language, including:

People

Name designator (“Ms.”)
Title (“President”)
Person (“Barack Obama”)
People (“Greeks”)
Language (“Greek”)

Places

Address 1 (“53 State Street Floor 16”)
Address 2 (“Boston, MA 02109”)
Locality (“Boston”)
Minor region (“Napa County”)
Major region (“Nevada”)
Country (“Brazil”)
Continent (“South America”)
Geographic feature (“Mount Fuji”)
Geographic area (“Scandinavia”)

Organizations and products

Commercial organization (“AT&T”)
Educational organization (“University of Washington”)
Other organization (“FBI”)
Product (“iPhone”)
Ticker (“NYSE:SAP”)

Times and dates

Day (“Monday”)
Date (“2/14/2016”)
Month (“August”)
Year (“1966”)
Time (“3:47 pm”)
Time period (“3 days”; “from 9 to 5 pm”)
Holiday (“Memorial Day”)

Numbers

Currency (“17 euros”)
Measure (“217 meters”)
Percent (“10%”)
Phone (“610-661-1000”)
National identification number (“555-12-3456”)

Internet

Email address (“john.doe@sap.com”)
IP address (“165.14.2.0”)
URL (“www.sap.com”)
Social media ID (“@SAP”)
Social media topic (“#SAPHANA”)

Use case

An example use of this service is an application that analyzes hashtag topics used in social media posts. You crawl social media sites and use simple string matching to save all words that begin with a pound sign (#). After analyzing your database, you realize that you erroneously saved CSS color codes found in HTML pages, for example, #FC3208, as topics. You switch to the Entity Extraction service because it automatically detects HTML pages, removes markup such as CSS syntax, and identifies social media topics. You replace your oversimplified string-matching code with a call to this service, look in the response for entities whose label attribute value is TOPIC_TWITTER, and store the value of each one's text attribute.
See the Python Tutorial section for an example of how you might code this in Python.
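The difference between the two approaches in this use case can be illustrated with a minimal sketch. The HTML fragment, the hand-written response dictionary, and the function name collect_topics are all hypothetical and exist only for illustration; a real response comes from the service.

```python
import re

# Hypothetical HTML fragment containing a CSS color code and a real topic.
html_fragment = '<p style="color: #FC3208">Love the motivation #strong</p>'

# Naive approach: keep every word that starts with a pound sign.
# This also captures the CSS color code.
naive_topics = re.findall(r"#\w+", html_fragment)


def collect_topics(response):
    """Return the text of every entity whose label is TOPIC_TWITTER."""
    entities = response.get("entities", [])
    return [e["text"] for e in entities if e.get("label") == "TOPIC_TWITTER"]


# Hand-written stand-in for a parsed service response (illustration only).
sample_response = {
    "entities": [
        {"id": 1, "label": "TOPIC_TWITTER", "text": "#strong"},
    ]
}

topics = collect_topics(sample_response)
print(naive_topics)
print(topics)
```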

This service supports 14 languages:

Arabic
Chinese (Simplified)
Chinese (Traditional)
Dutch
English
Farsi
French
German
Italian
Japanese
Korean
Portuguese
Russian
Spanish

The service accepts input in a wide variety of formats:

Adobe PDF
Generic email messages (.eml)
HTML
Microsoft Excel
Microsoft Outlook email messages (.msg)
Microsoft PowerPoint
Microsoft Word
Open Document Presentation
Open Document Spreadsheet
Open Document Text
Plain Text
Rich Text Format (RTF)
WordPerfect
XML

The size of each input file is limited to 1 MB.

The tenant parameter is not required because the service is stateless and no data is persisted.

API Reference

/

post

Identify entities such as people, places, and organizations contained within a document. For a detailed explanation of entity extraction, including which entities are available in which languages, see the Entity Extraction section of the SAP HANA Text Analysis Language Reference Guide.

Request
Response

Headers

Authorization: required (string)
Used to send a valid OAuth2 access token.
Example:
```
Bearer access_token
```

Query Parameters

languageCodes: (string - default: All language codes supported by entity extraction.)
Comma-separated list of 2-letter language codes. Include one or more languages in which the input might be written. If multiple languages are listed, languageIdentification is implicitly invoked, and entities in the document's dominant language are extracted.
Supported language codes for entity extraction:
- ar - Arabic
- de - German
- en - English
- es - Spanish
- fa - Persian
- fr - French
- it - Italian
- ja - Japanese
- ko - Korean
- nl - Dutch
- pt - Portuguese
- ru - Russian
- zh - Simplified Chinese
- zf - Traditional Chinese
Example:
```
en
```
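As a rough sketch of how a client might attach this parameter, the following builds a request URL with the standard library. The service URL is the one used in the Python tutorial in this document; the choice of candidate languages is an assumption made for illustration.

```python
from urllib.parse import urlencode

# Service URL as used in the Python tutorial in this document.
service_url = "https://api.beta.yaas.io/sap/ta-entities/v1/"

# Languages the input might be written in (hypothetical choice).
candidate_languages = ["en", "de", "fr"]

# languageCodes is one comma-separated string, not a repeated parameter.
# urlencode percent-encodes the commas as %2C.
query = urlencode({"languageCodes": ",".join(candidate_languages)})
request_url = "{}?{}".format(service_url, query)
print(request_url)
```

If you use the requests library, passing params={'languageCodes': 'en,de,fr'} to the post call should produce an equivalent URL.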

Body

Type: application/json

Example:

{
  "text": "Vitamin supplement FeddyFüüd® is produced by Fraglets Food Corporation. Their CEO is Mr. Harry Cadow."
}

Type: application/octet-stream

Example:

file-as-binary-stream
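The two body types could be handled by one small helper, sketched below. The function name build_request is hypothetical, and the sketch covers only the Content-Type header and the body; the Authorization header described above still has to be added.

```python
import json


def build_request(payload):
    """Return (headers, body) for plain text (str) or file content (bytes)."""
    if isinstance(payload, bytes):
        # Raw file content is sent unchanged as a binary stream.
        return {"Content-Type": "application/octet-stream"}, payload
    # Plain text is wrapped in a JSON object with a single "text" member.
    body = json.dumps({"text": payload}).encode("utf-8")
    return {"Content-Type": "application/json"}, body


headers, body = build_request("Their CEO is Mr. Harry Cadow.")
print(headers, body)
```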

HTTP status code 200

Body

Type: application/json

Schema:

{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "id": "http://www.sap.com/schemas/json/taaas",
    "title": "Text Analysis as a Service (TAaaS) results",
    "definitions": {
        "output_token": {
            "type": "object",
            "properties": {
                "normalizedToken": {
                    "description": "A normalized representation of the token. Normalization includes converting words to lower case, converting umlauts (ä to ae, for example), and removing diacritics. This value is empty when the partOfSpeech property is \"punctuation\".",
                    "type": "string"
                },
                "offset": {
                    "description": "The offset in characters relative to the beginning of the document. If the document's MIME type is other than text/plain, offset is relative to the document after text analysis converted it to plain text.",
                    "type": "number",
                    "minimum": 0
                },
                "paragraph": {
                    "description": "The relative paragraph number containing the token (indicates that the nth paragraph contains this token).",
                    "type": "number",
                    "minimum": 1
                },
                "partOfSpeech": {
                    "description": "",
                    "type": "string"
                },
                "sentence": {
                    "description": "The relative sentence number containing the token (indicates that the nth sentence contains this token).",
                    "type": "number",
                    "minimum": 1
                },
                "stems": {
                    "description": "The token's base form(s); i.e., the forms referenced in a dictionary. For example, the singular nominative for nouns or the infinitive for verbs. This property is empty unless the token has a stem that differs from the token.",
                    "type": "array",
                    "minItems": 0,
                    "items": [
                        {
                            "type": "string"
                        }
                    ]
                },
                "token": {
                    "description": "The original, non-normalized form of the word as it appeared in the input.",
                    "type": "string"
                }
            }
        },
        "entity": {
            "type": "object",
            "properties": {
                "id": {
                    "description": "The ordinal position of this entity among other entities found in the input.",
                    "type": "number",
                    "minimum": 1
                },
                "label": {
                    "description": "The linguistic or semantic type of the entity, for instance \"PERSON\" or \"StrongPositiveSentiment\".",
                    "type": "string"
                },
                "labelPath": {
                    "description": "Identical to the label property unless the type is hierarchical; e.g., \"SOCIAL_MEDIA/TOPIC_TWITTER\". In this example, the label property would be TOPIC_TWITTER.",
                    "type": "string"
                },
                "normalizedForm": {
                    "description": "A normalized representation of the entity. For more information, see the description of normalizedToken in this schema.",
                    "type": "string",
                    "minLength": 0
                },
                "offset": {
                    "description": "The offset in characters relative to the beginning of the document. If the document's MIME type is other than text/plain, offset is relative to the document after text analysis converts it to plain text.",
                    "type": "number",
                    "minimum": 0
                },
                "paragraph": {
                    "description": "The relative paragraph number containing the entity (indicates that the nth paragraph contains this entity).",
                    "type": "number",
                    "minimum": 1
                },
                "parent": {
                    "description": "The value of the parent entity's \"id\" property. This property is not included if the entity has no parent. Used to indicate that there is a linguistic relationship between two entities. For example, it is used by sentimentAnalysis to relate topics to their enclosing sentiments.",
                    "type": "number",
                    "minimum": 1
                },
                "sentence": {
                    "description": "The relative sentence number containing the entity (indicates that the nth sentence contains this entity).",
                    "type": "number",
                    "minimum": 1
                },
                "text": {
                    "description": "The original, non-normalized form of the entity as it appeared in the input.",
                    "type": "string"
                }
            }
        }
    },
    "type": "object",
    "properties": {
        "language": {
            "description": "2-letter code indicating the primary language of the input text.",
            "type": "string",
            "minLength": 2,
            "maxLength": 2
        },
        "mimeType": {
            "description": "MIME type of the input.",
            "type": "string"
        },
        "textSize": {
            "description": "Number of characters in the input (after conversion to plain text if mimeType other than text/plain).",
            "type": "number"
        },
        "tokens": {
            "description": "",
            "type": "array",
            "items": [
                {
                    "$ref": "#/definitions/output_token"
                }
            ]
        },
        "entities": {
            "description": "Entities extracted from the input when entityExtraction or a variety of factExtraction is invoked; otherwise empty.",
            "type": "array",
            "minItems": 0,
            "items": [
                {
                    "$ref": "#/definitions/entity"
                }
            ]
        }
    },
    "required": ["language","mimeType","textSize"]
}

Example:

{
  "entities": [
    {
      "id": 1,
      "label": "NOUN_GROUP",
      "labelPath": "NOUN_GROUP",
      "normalizedForm": "",
      "offset": 0,
      "paragraph": 1,
      "sentence": 1,
      "text": "Vitamin supplement"
    },
    {
      "id": 2,
      "label": "PRODUCT",
      "labelPath": "PRODUCT",
      "normalizedForm": "",
      "offset": 19,
      "paragraph": 1,
      "sentence": 1,
      "text": "FeddyF\u00fc\u00fcd\u00ae"
    },
    {
      "id": 3,
      "label": "COMMERCIAL",
      "labelPath": "ORGANIZATION/COMMERCIAL",
      "normalizedForm": "",
      "offset": 45,
      "paragraph": 1,
      "sentence": 1,
      "text": "Fraglets Food Corporation"
    },
    {
      "id": 4,
      "label": "TITLE",
      "labelPath": "TITLE",
      "normalizedForm": "",
      "offset": 78,
      "paragraph": 1,
      "sentence": 2,
      "text": "CEO"
    },
    {
      "id": 5,
      "label": "PERSON",
      "labelPath": "PERSON",
      "normalizedForm": "",
      "offset": 85,
      "paragraph": 1,
      "sentence": 2,
      "text": "Mr. Harry Cadow"
    }
  ],
  "language": "en",
  "mimeType": "text/plain",
  "textSize": 104
}
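The offset values in this example can be checked against the request text with a short sketch like the one below. It assumes that one character corresponds to one Python string index, which holds for this input but is not confirmed here for characters outside the Basic Multilingual Plane. The helper name entity_span is hypothetical.

```python
# Request text from the example request body.
text = ("Vitamin supplement FeddyF\u00fc\u00fcd\u00ae is produced by "
        "Fraglets Food Corporation. Their CEO is Mr. Harry Cadow.")

# (offset, text) pairs copied from the example response.
entities = [
    (0, "Vitamin supplement"),
    (19, "FeddyF\u00fc\u00fcd\u00ae"),
    (45, "Fraglets Food Corporation"),
    (78, "CEO"),
    (85, "Mr. Harry Cadow"),
]


def entity_span(document, offset, entity_text):
    """Slice the document at the reported offset for the entity's length."""
    return document[offset:offset + len(entity_text)]


# Collect any entity whose reported offset does not point at its text.
mismatches = [(o, t) for o, t in entities if entity_span(text, o, t) != t]
print(mismatches)
```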

HTTP status code 400

Request syntactically incorrect. Any details will be provided within the response payload.

Body

Type: application/json

Schema:

{
  "$schema":"http://json-schema.org/draft-04/schema#",
  "title":"error",
  "description":"Schema for API specified errors.",
  "type":"object",
  "properties":
  {
    "status":
    {
      "type":"integer",
      "description":"original HTTP error code, should be consistent with the response HTTP code",
      "minimum":100,
      "maximum":599
    },
    "type":
    {
      "type":"string",
      "description":"classification of the error type, lower case with underscore eg validation_failure",
      "pattern":"[a-z]+[a-z_]*[a-z]+"
    },
    "message":
    {
      "type":"string",
      "description":"descriptive error message for debugging"
    },
    "moreInfo":
    {
      "type":"string",
      "format":"uri",
      "description":"link to documentation to investigate further and finding support"
    },
    "details":
    {
      "type":"array",
      "description":"list of problems causing this error",
      "items":
      {
        "$schema":"http://json-schema.org/draft-04/schema#",
        "title":"errorDetail",
        "description":"schema for specific error cause",
        "type":"object",
        "properties":
        {
          "field":
          {
            "type":"string",
            "description":"a bean notation expression specifying the element in request data causing the error, eg product.variants[3].name, this can be empty if violation was not field specific"
          },
          "type":
          {
            "type":"string",
            "description":"classification of the error detail type, lower case with underscore eg missing_value, this value must be always interpreted in context of the general error type.",
            "pattern":"[a-z]+[a-z_]*[a-z]+"
          },
          "message":
          {
            "type":"string",
            "description":"descriptive error detail message for debugging"
          },
          "moreInfo":
          {
            "type":"string",
            "format":"uri",
            "description":"link to documentation to investigate further and finding support for error detail"
          }
        },
        "required":["type"]
      }
    }
  },
  "required":["status" , "type" ]
}

Example:

{
  "status": 400,
  "message": "There are validation problems, see details section for more information",
  "moreInfo": "https://api.yaas.io/patterns/errortypes.html",
  "type": "validation_violation",
  "details": [
    {
      "field": "hybris-tenant",
      "message": "size must be between 1 and 36",
      "type": "invalid_header"
    }
  ]
}
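A client might turn an error payload of this shape into a readable log message along the following lines. This is only a sketch: the function name summarize_error is hypothetical, and it relies on the schema above, in which status and type are required and everything else is optional.

```python
def summarize_error(error):
    """Format an error payload and its optional details as plain text."""
    lines = ["{} {}: {}".format(
        error["status"], error["type"], error.get("message", ""))]
    for detail in error.get("details", []):
        # "field" can be empty if the violation was not field specific.
        field = detail.get("field") or "(no field)"
        lines.append("  {} [{}]: {}".format(
            field, detail["type"], detail.get("message", "")))
    return "\n".join(lines)


# The example payload shown above.
example_error = {
    "status": 400,
    "message": "There are validation problems, see details section for more information",
    "moreInfo": "https://api.yaas.io/patterns/errortypes.html",
    "type": "validation_violation",
    "details": [
        {
            "field": "hybris-tenant",
            "message": "size must be between 1 and 36",
            "type": "invalid_header",
        }
    ],
}

summary = summarize_error(example_error)
print(summary)
```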

HTTP status code 401

Given request is unauthorized. Bad or expired token. Reauthenticate the user. Any details will be provided within the response payload.

Body

Type: application/json

Schema:

{
  "$schema":"http://json-schema.org/draft-04/schema#",
  "title":"error",
  "description":"Schema for API specified errors.",
  "type":"object",
  "properties":
  {
    "status":
    {
      "type":"integer",
      "description":"original HTTP error code, should be consistent with the response HTTP code",
      "minimum":100,
      "maximum":599
    },
    "type":
    {
      "type":"string",
      "description":"classification of the error type, lower case with underscore eg validation_failure",
      "pattern":"[a-z]+[a-z_]*[a-z]+"
    },
    "message":
    {
      "type":"string",
      "description":"descriptive error message for debugging"
    },
    "moreInfo":
    {
      "type":"string",
      "format":"uri",
      "description":"link to documentation to investigate further and finding support"
    },
    "details":
    {
      "type":"array",
      "description":"list of problems causing this error",
      "items":
      {
        "$schema":"http://json-schema.org/draft-04/schema#",
        "title":"errorDetail",
        "description":"schema for specific error cause",
        "type":"object",
        "properties":
        {
          "field":
          {
            "type":"string",
            "description":"a bean notation expression specifying the element in request data causing the error, eg product.variants[3].name, this can be empty if violation was not field specific"
          },
          "type":
          {
            "type":"string",
            "description":"classification of the error detail type, lower case with underscore eg missing_value, this value must be always interpreted in context of the general error type.",
            "pattern":"[a-z]+[a-z_]*[a-z]+"
          },
          "message":
          {
            "type":"string",
            "description":"descriptive error detail message for debugging"
          },
          "moreInfo":
          {
            "type":"string",
            "format":"uri",
            "description":"link to documentation to investigate further and finding support for error detail"
          }
        },
        "required":["type"]
      }
    }
  },
  "required":["status" , "type" ]
}

Example:

{
  "status" : 401,
  "message" : "Authorization: Unauthorized. Bearer TOKEN is invalid",
  "type" : "insufficient_credentials",
  "moreInfo" : "https://api.yaas.io/patterns/errortypes.html"
}

HTTP status code 403

Given authorization scopes are not sufficient and do not match required scopes.

Body

Type: application/json

Schema:

{
  "$schema":"http://json-schema.org/draft-04/schema#",
  "title":"error",
  "description":"Schema for API specified errors.",
  "type":"object",
  "properties":
  {
    "status":
    {
      "type":"integer",
      "description":"original HTTP error code, should be consistent with the response HTTP code",
      "minimum":100,
      "maximum":599
    },
    "type":
    {
      "type":"string",
      "description":"classification of the error type, lower case with underscore eg validation_failure",
      "pattern":"[a-z]+[a-z_]*[a-z]+"
    },
    "message":
    {
      "type":"string",
      "description":"descriptive error message for debugging"
    },
    "moreInfo":
    {
      "type":"string",
      "format":"uri",
      "description":"link to documentation to investigate further and finding support"
    },
    "details":
    {
      "type":"array",
      "description":"list of problems causing this error",
      "items":
      {
        "$schema":"http://json-schema.org/draft-04/schema#",
        "title":"errorDetail",
        "description":"schema for specific error cause",
        "type":"object",
        "properties":
        {
          "field":
          {
            "type":"string",
            "description":"a bean notation expression specifying the element in request data causing the error, eg product.variants[3].name, this can be empty if violation was not field specific"
          },
          "type":
          {
            "type":"string",
            "description":"classification of the error detail type, lower case with underscore eg missing_value, this value must be always interpreted in context of the general error type.",
            "pattern":"[a-z]+[a-z_]*[a-z]+"
          },
          "message":
          {
            "type":"string",
            "description":"descriptive error detail message for debugging"
          },
          "moreInfo":
          {
            "type":"string",
            "format":"uri",
            "description":"link to documentation to investigate further and finding support for error detail"
          }
        },
        "required":["type"]
      }
    }
  },
  "required":["status" , "type" ]
}

Example:

{
  "status": 403,
  "message": "Given request does not have required scopes. It is not authorized to perform this operation.",
  "type": "insufficient_permissions"
}

HTTP status code 500

Body

Type: application/json

Schema:

{
  "$schema":"http://json-schema.org/draft-04/schema#",
  "title":"error",
  "description":"Schema for API specified errors.",
  "type":"object",
  "properties":
  {
    "status":
    {
      "type":"integer",
      "description":"original HTTP error code, should be consistent with the response HTTP code",
      "minimum":100,
      "maximum":599
    },
    "type":
    {
      "type":"string",
      "description":"classification of the error type, lower case with underscore eg validation_failure",
      "pattern":"[a-z]+[a-z_]*[a-z]+"
    },
    "message":
    {
      "type":"string",
      "description":"descriptive error message for debugging"
    },
    "moreInfo":
    {
      "type":"string",
      "format":"uri",
      "description":"link to documentation to investigate further and finding support"
    },
    "details":
    {
      "type":"array",
      "description":"list of problems causing this error",
      "items":
      {
        "$schema":"http://json-schema.org/draft-04/schema#",
        "title":"errorDetail",
        "description":"schema for specific error cause",
        "type":"object",
        "properties":
        {
          "field":
          {
            "type":"string",
            "description":"a bean notation expression specifying the element in request data causing the error, eg product.variants[3].name, this can be empty if violation was not field specific"
          },
          "type":
          {
            "type":"string",
            "description":"classification of the error detail type, lower case with underscore eg missing_value, this value must be always interpreted in context of the general error type.",
            "pattern":"[a-z]+[a-z_]*[a-z]+"
          },
          "message":
          {
            "type":"string",
            "description":"descriptive error detail message for debugging"
          },
          "moreInfo":
          {
            "type":"string",
            "format":"uri",
            "description":"link to documentation to investigate further and finding support for error detail"
          }
        },
        "required":["type"]
      }
    }
  },
  "required":["status" , "type" ]
}

Example:

{
  "status" : 500,
  "message" : "Invalid server settings. Please contact administrator.",
  "type" : "internal_service_error",
  "moreInfo" : "https://api.yaas.io/patterns/errortypes.html"
}

HTTP status code 503

Occasionally, this error occurs during processing. Please consider implementing a retry mechanism in your client application for stable processing.
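One possible shape for such a retry mechanism is sketched below, using exponential backoff. The function name call_with_retry, the number of attempts, and the delays are assumptions chosen for illustration, not values recommended by the service. Status 504 is treated as retryable as well, because its description also suggests retrying.

```python
import time

# Status codes for which this document suggests retrying.
RETRYABLE_STATUS = {503, 504}


def call_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable that returns an object with a
    status_code attribute, such as a requests response.
    """
    for attempt in range(1, max_attempts + 1):
        response = send()
        if response.status_code not in RETRYABLE_STATUS or attempt == max_attempts:
            return response
        # Wait 1, 2, 4, ... times base_delay seconds before the next attempt.
        sleep(base_delay * 2 ** (attempt - 1))
```

With the names used in the Python tutorial, a call might look like call_with_retry(lambda: s.post(url=service_url, headers=req_headers, data=socialpost)).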

Body

Type: application/json

Schema:

{
  "$schema":"http://json-schema.org/draft-04/schema#",
  "title":"error",
  "description":"Schema for API specified errors.",
  "type":"object",
  "properties":
  {
    "status":
    {
      "type":"integer",
      "description":"original HTTP error code, should be consistent with the response HTTP code",
      "minimum":100,
      "maximum":599
    },
    "type":
    {
      "type":"string",
      "description":"classification of the error type, lower case with underscore eg validation_failure",
      "pattern":"[a-z]+[a-z_]*[a-z]+"
    },
    "message":
    {
      "type":"string",
      "description":"descriptive error message for debugging"
    },
    "moreInfo":
    {
      "type":"string",
      "format":"uri",
      "description":"link to documentation to investigate further and finding support"
    },
    "details":
    {
      "type":"array",
      "description":"list of problems causing this error",
      "items":
      {
        "$schema":"http://json-schema.org/draft-04/schema#",
        "title":"errorDetail",
        "description":"schema for specific error cause",
        "type":"object",
        "properties":
        {
          "field":
          {
            "type":"string",
            "description":"a bean notation expression specifying the element in request data causing the error, eg product.variants[3].name, this can be empty if violation was not field specific"
          },
          "type":
          {
            "type":"string",
            "description":"classification of the error detail type, lower case with underscore eg missing_value, this value must be always interpreted in context of the general error type.",
            "pattern":"[a-z]+[a-z_]*[a-z]+"
          },
          "message":
          {
            "type":"string",
            "description":"descriptive error detail message for debugging"
          },
          "moreInfo":
          {
            "type":"string",
            "format":"uri",
            "description":"link to documentation to investigate further and finding support for error detail"
          }
        },
        "required":["type"]
      }
    }
  },
  "required":["status" , "type" ]
}

Example:

{
  "status": 503,
  "message": "A temporary service unavailability was detected. Refer to the error details response for a re-attempt strategy.",
  "type": "service_temporarily_unavailable",
  "moreInfo" : "https://api.yaas.io/patterns/errortypes.html"
}

HTTP status code 504

This error occurs if text analysis takes longer than 20 seconds. There could be two reasons for this error:

Processing takes longer because the current service load is high. Try again later, or consider implementing a retry mechanism in your client application for more stable processing.
The text to be processed is too big or too complex. Sometimes, even small texts take a long time to process. Split your text into smaller chunks and send each chunk to the service separately.
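Splitting a text into smaller chunks could be done in many ways; the sketch below groups paragraphs greedily. The function name split_into_chunks, the blank-line paragraph separator, and the 2000-character default are assumptions made for illustration. Note that offsets, sentence numbers, and paragraph numbers in each response would then be relative to the chunk that was sent, not to the whole text.

```python
def split_into_chunks(text, max_chars=2000):
    """Greedily group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if current and len(candidate) > max_chars:
            # Adding this paragraph would exceed the limit: close the chunk.
            chunks.append(current)
            current = paragraph
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


print(split_into_chunks("aaaa\n\nbbbb\n\ncccc", max_chars=10))
```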

Body

Type: application/json

Schema:

{
  "$schema":"http://json-schema.org/draft-04/schema#",
  "title":"error",
  "description":"Schema for API specified errors.",
  "type":"object",
  "properties":
  {
    "status":
    {
      "type":"integer",
      "description":"original HTTP error code, should be consistent with the response HTTP code",
      "minimum":100,
      "maximum":599
    },
    "type":
    {
      "type":"string",
      "description":"classification of the error type, lower case with underscore eg validation_failure",
      "pattern":"[a-z]+[a-z_]*[a-z]+"
    },
    "message":
    {
      "type":"string",
      "description":"descriptive error message for debugging"
    },
    "moreInfo":
    {
      "type":"string",
      "format":"uri",
      "description":"link to documentation to investigate further and finding support"
    },
    "details":
    {
      "type":"array",
      "description":"list of problems causing this error",
      "items":
      {
        "$schema":"http://json-schema.org/draft-04/schema#",
        "title":"errorDetail",
        "description":"schema for specific error cause",
        "type":"object",
        "properties":
        {
          "field":
          {
            "type":"string",
            "description":"a bean notation expression specifying the element in request data causing the error, eg product.variants[3].name, this can be empty if violation was not field specific"
          },
          "type":
          {
            "type":"string",
            "description":"classification of the error detail type, lower case with underscore eg missing_value, this value must be always interpreted in context of the general error type.",
            "pattern":"[a-z]+[a-z_]*[a-z]+"
          },
          "message":
          {
            "type":"string",
            "description":"descriptive error detail message for debugging"
          },
          "moreInfo":
          {
            "type":"string",
            "format":"uri",
            "description":"link to documentation to investigate further and finding support for error detail"
          }
        },
        "required":["type"]
      }
    }
  },
  "required":["status" , "type" ]
}

Example:

{
  "status": 504,
  "message": "Service is not reachable: Upstream service connection timeout.",
  "type": "service_temporarily_unavailable",
  "moreInfo" : "https://api.yaas.io/patterns/errortypes.html"
}

An empty entities array is normal

It is not an error if the service returns an empty entities array. Not all text contains entities as defined and recognized by the service. For example, the English sentence "It's the end of the world as we know it, and I feel fine" contains no entities, nor do any of these translations:

Spanish: Es el fin del mundo tal como lo conocemos, y me siento bien.
German: Es ist das Ende der Welt, wie wir es kennen, und ich fühle mich gut.
Korean: 우리가 알고있는대로 그것은 세상의 종말이며, 나는 기분이 좋아집니다.
Russian: Это конец света, как мы его знаем, и я чувствую себя прекрасно.

The label and labelPath members

Some of the entity types that the service identifies are given general categories and then subdivided into more specific classifications. For example, URI is a general category with the subtypes EMAIL, IP, and URL. The label member of the entities array is the most specific entity type of an extracted entity. The labelPath member includes the general category and the subtype, separated by a forward slash ("/").
That means, if a document contains the web address "http://www.sap.com" in its text, that string is extracted as an entity with its label attribute's value set to URL and its labelPath set to URI/URL.
If the entity type does not have subtypes, for example, PERSON, the label and labelPath values are identical.
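The relationship between the two members can be sketched as a small helper that splits a labelPath value into its general category and its most specific label. The function name split_label_path is hypothetical. If a path had more than two levels, which this document does not confirm, everything before the last slash would be returned as the category.

```python
def split_label_path(label_path):
    """Return (category, label); category is None when there is no subtype."""
    category, separator, label = label_path.rpartition("/")
    return (category if separator else None), label


print(split_label_path("URI/URL"))
print(split_label_path("PERSON"))
```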

Default Language

The default language is either the first value listed in the languageCodes input parameter (see Setting a subset of languages in this topic) or English if the languageCodes input parameter is not specified.
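The rule above can be mirrored on the client side, for example to predict which language the service falls back to. This is an illustrative sketch of the stated rule only, not the service's implementation; the function name default_language is hypothetical.

```python
def default_language(language_codes=None):
    """Return the first listed code, or English if none were specified."""
    if not language_codes:
        return "en"
    return language_codes.split(",")[0].strip()


print(default_language())
print(default_language("de,nl"))
```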

Setting a subset of languages

You can instruct the service to choose from a specific, reduced set of languages by setting the languageCodes input parameter. This forces the service to choose from one of the languages you supply.
Use this setting with caution. If, for example, you set languageCodes to German and Dutch (de,nl) and the input text is in Russian, the service cannot identify the text as Russian. It must return the default language, which in this case is German, the first code listed.

Meaning of the textSize value

The returned textSize attribute is the number of characters in the input, not the number of bytes. If the input is a plain text file without accented characters, textSize equals the input file's size in bytes. However, if the input is a binary file such as a PDF or Microsoft Word document, textSize will probably be much smaller than the file size, especially if the file contains a lot of non-textual data such as embedded images.
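The difference between characters and bytes can be seen with the product name from the example request. The sketch assumes the text is stored as UTF-8; other encodings would give different byte counts.

```python
# Product name from the example request: 7 unaccented letters, "ü" twice, and "®".
product = "FeddyF\u00fc\u00fcd\u00ae"

char_count = len(product)                  # number of characters
byte_count = len(product.encode("utf-8"))  # number of bytes when UTF-8 encoded

print(char_count, byte_count)
```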

Annotated JSON schema

The JSON schema contains the descriptions of the objects and members of the JSON response that the Entity Extraction service returns. To read the schema, click the POST link in the API Reference, then click the RESPONSE tab.

Further references

You can find extensive details on the capabilities and behavior of SAP's entity extraction technology in the Entity Extraction chapter of the SAP HANA Text Analysis Language Reference Guide (PDF).

Python Tutorial

This tutorial mirrors the use case described in the Overview.

In this tutorial, you are using the Entity Extraction service to find topic tags, for example, "#Brexit", "#selfie", and "#InternationalWomensDay", in social media posts. You store the tags in a collection that you will analyze later.

Get an access token

To use the service, you must pass an access token in each call. Get the token from the OAuth2 service.

import requests
import json

# Replace the two following values with your client secret and client ID.
client_secret = 'clientSecretPlaceholder'
client_id = 'clientIDPlaceholder'

s = requests.Session()

# Get the access token from the OAuth2 service.
auth_url = 'https://api.beta.yaas.io/hybris/oauth2/v1/token'
r = s.post(auth_url, data={'client_secret': client_secret, 'client_id': client_id, 'grant_type': 'client_credentials'})
# Stop early with a clear error if the token request failed.
r.raise_for_status()
access_token = r.json()['access_token']

Call the service

The POST request body for this service includes a single value: the text upon which to perform entity extraction. Your variable socialpost contains a post that your application read from a social platform's API. In some cases, the post is in HTML; in others, it is plain text. You pass it as application/octet-stream and let the service automatically determine its format, remove markup if present, and return hashtags found in the remaining text.

# The Entity Extraction service's URL
service_url = 'https://api.beta.yaas.io/sap/ta-entities/v1/'

# HTTP request headers
req_headers = {}

# Set content-type to 'application/octet-stream' to pass the raw data and let the service detect its format.
req_headers['content-type'] = 'application/octet-stream'
req_headers['Cache-Control'] = 'no-cache'
req_headers['Connection'] = 'keep-alive'
req_headers['Accept-Encoding'] = 'gzip'
req_headers['Authorization'] = 'Bearer {}'.format(access_token)

# Make the REST call to the Entity Extraction service. Pass the binary data in raw form. Do not base64-encode the data.
response = s.post(url=service_url, headers=req_headers, data=socialpost)

Here is a sample, HTML-formatted post:

<!DOCTYPE html>
<!--[if gt IE 8]><!--> <html lang="en" class="no-js logged-in "> <!--<![endif]-->
  <head><meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <link href="https://www.foo.com/" rel="alternate" hreflang="x-default" />
    <link rel="mask-icon" href="//foo-a.akamaihd.net/images/ico/favicon.svg" color="#262626">
  </head>
<body class="">
  <b>@morsefit76</b>, <b>@emily.g.davies</b>, <b>@everydayrenee1</b> and <b>@katiaeloera</b> like this<br>
  <b>@super_dupes</b> Love the motivation <i>#strong</i> <i>#healthylife</i> <i>#takecareofyourself</i>
</body>
</html>

The first few lines of the JSON response this service would return for that post are:

{
    "mimeType": "text/html", 
    "entities": [
        {
            "sentence": 1, 
            "text": "@morsefit76", 
            "label": "ID_TWITTER", 
            "paragraph": 1, 
            "offset": 0, 
            "normalizedForm": "", 
            "id": 1, 
            "labelPath": "SOCIAL_MEDIA/ID_TWITTER"
        },

And the last few lines of the response are:

        {
            "sentence": 1, 
            "text": "#takecareofyourself", 
            "label": "TOPIC_TWITTER", 
            "paragraph": 1, 
            "offset": 127, 
            "normalizedForm": "", 
            "id": 8, 
            "labelPath": "SOCIAL_MEDIA/TOPIC_TWITTER"
        }
    ], 
    "textSize": 148, 
    "language": "en"
}

Each entity extracted from the input appears in the response in the "entities" array, in order of appearance. Every entity has eight attributes:

sentence
text
label
paragraph
offset
normalizedForm
id
labelPath

For a detailed description of each attribute in the response, see the link to the JSON schema in the Details section of this service.
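As a sketch of how these attributes can be used, the following groups the text of each entity by its label. The dictionary is a trimmed, hand-built copy of the sample response above (only the first and last entities), and group_by_label is a hypothetical helper name, not part of the service.

```python
from collections import defaultdict

# Trimmed copy of the sample response: only the first and last entities are kept.
sample_response = {
    "mimeType": "text/html",
    "entities": [
        {"sentence": 1, "text": "@morsefit76", "label": "ID_TWITTER",
         "paragraph": 1, "offset": 0, "normalizedForm": "", "id": 1,
         "labelPath": "SOCIAL_MEDIA/ID_TWITTER"},
        {"sentence": 1, "text": "#takecareofyourself", "label": "TOPIC_TWITTER",
         "paragraph": 1, "offset": 127, "normalizedForm": "", "id": 8,
         "labelPath": "SOCIAL_MEDIA/TOPIC_TWITTER"},
    ],
    "textSize": 148,
    "language": "en",
}


def group_by_label(response_dict):
    """Map each label to the list of entity texts carrying that label."""
    grouped = defaultdict(list)
    # A missing 'entities' member is treated like an empty list.
    for entity in response_dict.get("entities", []):
        grouped[entity["label"]].append(entity["text"])
    return dict(grouped)


print(group_by_label(sample_response))
```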

In your application, you are interested only in the value of the text attribute when the label attribute's value is TOPIC_TWITTER. The function save_hashtag is your sample application's way of storing extracted hashtags.

# Save the hashtags found in the response
if response.status_code == 200:
    # De-serialize the JSON reply and get the entities list.
    response_dict = json.loads(response.text)
    # If the service returns no entities, it's not an error. For example, "I am not a crook." contains
    # no entities as far as this service is concerned. Default to an empty list so that the
    # loop below is simply skipped in that case.
    entities = response_dict.get('entities', [])
    for e in entities:
        # Keep only the entities labeled as social media topics (hashtags).
        if e.get('label') == 'TOPIC_TWITTER':
            save_hashtag(e['text'])
else:
    print('Error', response.status_code)
    print(response.text)
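The tutorial leaves the body of save_hashtag to your application. One possible sketch, assuming an in-memory counter is enough for later analysis, is shown below. The hashtag_counts store and the choice to ignore case are assumptions made for illustration; a real application would more likely write to a database.

```python
from collections import Counter

# Hypothetical in-memory store of how often each hashtag was seen.
hashtag_counts = Counter()


def save_hashtag(hashtag):
    """Record one occurrence of a hashtag, treating upper and lower case alike."""
    hashtag_counts[hashtag.lower()] += 1


# Example: two spellings of the same tag are counted together.
for tag in ["#strong", "#healthylife", "#Strong"]:
    save_hashtag(tag)

print(hashtag_counts.most_common(1))
```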
