Recipe 5: Using Pre-Built Content (a glossary example)

If you've been following our recipes in the hopes that you can actually see Egeria work with some metadata, then this is the recipe you've been waiting for! But rather than injecting metadata by hand, we will take the easy path, and load in pre-existing content via archive files.

In this recipe we will:

Load a business glossary archive into our metadata server
Explore the server to find what glossaries exist
Query the metadata server to find terms

Concepts

Open Metadata Archives are an important feature - not only for loading pre-built content but also as a mechanism to create, well, archives of your metadata repositories for sharing or backup. More information can be found at Open-Metadata-Archives. All sorts of content can be loaded from archive files, including Egeria Open Metadata Types, Governance Definitions, Reference Data, and ... Glossaries.

This recipe will load and explore a metadata glossary to give you a taste of how we can use Egeria. As we proceed, you'll notice that our cURL commands are getting a bit more complex - so in subsequent recipes we will often switch to using the emerging Egeria Python Client available on PyPI. In the next recipe we will set up and start using the Egeria Python Client.

1. Load a business glossary archive into our metadata server

In this recipe we can use any Egeria Metadata Access Store - but since our previous recipe used active-metadata-store, our examples will assume that is the name of our metadata server. If you have a different one running then just use that name instead. If you don't have any server running then refer back to our previous recipes.

OK, now that we have established that you have active-metadata-store running on some platform (let's assume https://localhost:9443), we can first test that our platform is ready by doing our simple probe request like:

curl -k https://localhost:9443/open-metadata/platform-services/users/tom/server-platform/origin

We now get back the response we were expecting:

Egeria OMAG Server Platform (version 4.3)

If you don't get an output to your console similar to this then perhaps take a look at Recipe 3 - and if that doesn't help, feel free to drop me an email.
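
If you would rather script the probe than type it, here is a minimal Python sketch of the same request using only the standard library. The function names are ours (they are not part of Egeria), and the defaults simply mirror the cURL command above. Turning off certificate verification is the equivalent of cURL's -k flag, which is only sensible against a local platform with a self-signed certificate.

```python
import ssl
import urllib.request


def origin_url(platform_url: str, user_id: str) -> str:
    """Build the platform origin URL used by the cURL probe above."""
    return (
        f"{platform_url.rstrip('/')}"
        f"/open-metadata/platform-services/users/{user_id}/server-platform/origin"
    )


def probe_platform(platform_url: str = "https://localhost:9443",
                   user_id: str = "tom") -> str:
    """Return the origin text reported by the platform."""
    # Equivalent of curl -k: skip certificate checks (local testing only).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    url = origin_url(platform_url, user_id)
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return response.read().decode("utf-8").strip()
```

Calling probe_platform() with the platform running should return the same origin text that cURL printed.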

OK, right, now that Egeria is up and running and we have a metadata server available, we can load an archive file with the glossary. If you look in the Egeria Distribution folder you will see something like:

As you can see there are quite a number of content files listed - many useful pre-built utilities and samples. The one we will use in this recipe is nicely circled in green - the CocoSustainabilityArchive. This is a very small subset of a much larger glossary we built to help folks jumpstart their Sustainability initiatives; the small size also means it is a bit more manageable for us to review the output of queries.

The command to load an archive into a metadata server can be found in the generated Swagger for the Egeria platform, which we described more completely in Recipe 3. If you review the website at Swagger Site and look for addOpenMetadataArchiveFile in the section called Platform, you'll find the REST call to load an archive file. While we could execute the REST call with cURL, it might be easier to use the Swagger interface. Here is a screenshot showing the call filled out (you need to press the Try-It button):

Executing this command should load the archive into the active-metadata-store OMAG server repository. If it works correctly, the output should be similar to:

    {
        "class":"VoidResponse",
        "relatedHTTPCode":200
    }

If that's what you got then GREAT! If not, look at the provided exception and see if you can work out what it is telling you; they are usually pretty descriptive.
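
When we start scripting these calls, checking relatedHTTPCode is a handy first step before doing anything else with a response. Here is a minimal Python sketch of that check. The function name is ours, and the failing body is a made-up example for illustration, not real Egeria output.

```python
import json


def succeeded(response_text: str) -> bool:
    """Return True when the JSON body reports relatedHTTPCode 200."""
    body = json.loads(response_text)
    return body.get("relatedHTTPCode") == 200


# The success body shown above.
void_response = '{"class":"VoidResponse","relatedHTTPCode":200}'
print(succeeded(void_response))  # True

# Hypothetical failure body, invented for this sketch.
failed_response = '{"class":"VoidResponse","relatedHTTPCode":400}'
print(succeeded(failed_response))  # False
```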

2. Explore the server to find what glossaries exist

Now that we have a glossary, let's try some queries! First, let's do the obvious and ask Egeria to show us what glossaries we can find. Glossaries are a kind of metadata asset, and for these samples we will use APIs from the Asset Manager Open Metadata Access Service (OMAS), which is one of the services configured on the active-metadata-store OMAG server that we are using. The full list of services activated on active-metadata-store is displayed as messages on the console during startup (look in the terminal window where you started Egeria).

On the swagger page we can open up the Asset Manager OMAS section and look for the findGlossaries API which looks like the following:

This time let's use one of our other demo users from the mythical CocoPharmaceuticals company - erinoverview - she has the permissions that are needed to run this request (and garygeeke, as an administrator, may not). You'll note that in addition to the serverName and userId, we have two other parameters - startFrom and pageSize. When we have a lot of results to return, it can be critical to return them a batch (or page) at a time. So if we had 1000 objects that matched the query (unlikely for glossaries, but we could easily have 1000 glossary terms), we could ask Egeria to return them to us 50 at a time. In this case we would set pageSize to 50 and issue the request in a loop, starting with startFrom set to 0; the next call would set startFrom to 50, and so on, until a request returned no more results. So for now we will start from 0 and set pageSize to 100. We can safely ignore the last two parameters (forLineage and forDuplicateProcessing) for now and cover them another time.
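
The paging loop described above can be sketched in a few lines of Python. To keep the sketch independent of any particular REST endpoint, the actual request is passed in as a function: fetch_page is a hypothetical callable that you would write to issue one request with the given startFrom and pageSize and return that page's list of elements. Here it is replaced by an in-memory stand-in so the loop can be tried without a server.

```python
from typing import Callable, Iterator, List, Optional


def fetch_all(fetch_page: Callable[[int, int], Optional[List[dict]]],
              page_size: int = 50) -> Iterator[dict]:
    """Yield every element, requesting one page at a time."""
    start_from = 0
    while True:
        page = fetch_page(start_from, page_size)
        if not page:  # an empty (or missing) page means we are done
            break
        yield from page
        start_from += page_size


# In-memory stand-in for the real request (hypothetical data).
fake_terms = [{"name": f"term-{i}"} for i in range(120)]


def fake_fetch(start_from: int, page_size: int) -> List[dict]:
    return fake_terms[start_from:start_from + page_size]


all_terms = list(fetch_all(fake_fetch, page_size=50))
print(len(all_terms))  # 120
```

With 120 elements and a page size of 50, the loop makes four calls: three that return data (50, 50 and 20 elements) and a final one that comes back empty and ends the loop.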

Ok, so we issue the request and we get a response of:

{
  "class": "GlossaryElementsResponse",
  "relatedHTTPCode": 200,
  "elementList": [
    {
      "elementHeader": {
        "class": "ElementHeader",
        "headerVersion": 0,
        "status": "ACTIVE",
        "type": {
          "typeId": "36f66863-9726-4b41-97ee-714fd0dc6fe4",
          "typeName": "Glossary",
          "superTypeNames": [
            "Referenceable",
            "OpenMetadataRoot"
          ],
          "typeVersion": 1,
          "typeDescription": "A collection of related glossary terms."
        },
        "origin": {
          "sourceServer": "active-metadata-store",
          "originCategory": "CONTENT_PACK",
          "homeMetadataCollectionId": "be351568-97ec-4c34-bca5-aff93f326d9e",
          "homeMetadataCollectionName": "Coco Pharmaceuticals Sustainability Project",
          "license": "Apache 2.0"
        },
        "versions": {
          "createdBy": "Egeria Project",
          "createTime": "2023-07-20T10:48:16.784+00:00",
          "version": 1
        },
        "guid": "f9b78b26-6025-43fa-9299-a905cc6d1575",
        "classifications": [
          {
            "class": "ElementClassification",
            "headerVersion": 0,
            "status": "ACTIVE",
            "type": {
              "typeId": "33ad3da2-0910-47be-83f1-daee018a4c05",
              "typeName": "CanonicalVocabulary",
              "typeVersion": 1,
              "typeDescription": "Identifies a glossary that contains unique terms."
            },
            "origin": {
              "sourceServer": "active-metadata-store",
              "originCategory": "CONTENT_PACK",
              "homeMetadataCollectionId": "be351568-97ec-4c34-bca5-aff93f326d9e",
              "homeMetadataCollectionName": "Coco Pharmaceuticals Sustainability Project"
            },
            "versions": {
              "createdBy": "Egeria Project",
              "createTime": "2023-07-20T10:48:16.784+00:00",
              "version": 1
            },
            "classificationOrigin": "ASSIGNED",
            "classificationName": "CanonicalVocabulary",
            "classificationProperties": {
              "scope": "Across Coco Pharmaceuticals"
            }
          }
        ]
      },
      "glossaryProperties": {
        "class": "GlossaryProperties",
        "qualifiedName": "Glossary:Sustainability",
        "displayName": "Sustainability Glossary",
        "description": "Terminology associated with Coco Pharmaceutical's sustainability initiative.",
        "language": "English",
        "usage": "For all Coco Pharmaceutical employees wishing to understand more about sustainability and the organization's efforts to improve its operations."
      }
    }
  ]
}

This is what we know about the glossaries in active-metadata-store. There is a lot of information here - and though it might be useful to an advanced user or a UI, we probably don't need to understand all the details just now. Let's try to understand the overall anatomy - and highlight some key things that we should understand. First let's collapse sections of the JSON to show the structure:

Here is how to interpret each of the numbered sections:

1. class: this is the class name of the object that describes this response. Useful to make sure we have the response type we were expecting.
2. relatedHTTPCode: this tells us if processing of the request succeeded - a 200 means success (similar to HTTP codes).
3. elementList: this tells us that what follows is a list of result objects. Each of the result objects has:
   1. elementHeader - contains a lot of details about the glossary, such as its status (ACTIVE), where it came from, its version information and so on. It can be good information - but often the most vital piece for making further requests on the glossary and its terms is the GUID, one of the key properties within the elementHeader. If we look at the full JSON, within the elementHeader, we see that the GUID is "f9b78b26-6025-43fa-9299-a905cc6d1575" - keep this around because we'll use it again later.
   2. glossaryProperties - the meat of the glossary. This includes the displayName (the name humans see for the glossary), an internal, unique, semi-understandable string called the qualifiedName that is another important way of identifying the glossary, and a description of the glossary and its intended use. All good stuff.
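
To make the anatomy concrete, here is a small Python sketch that walks a trimmed copy of the response above and pulls out the pieces we care about most: the GUID, the qualifiedName and the displayName. The field names come straight from the JSON shown earlier; the function name and the decision to raise an error on a non-200 code are ours.

```python
import json

# Trimmed copy of the findGlossaries response shown above.
response_text = """
{
  "class": "GlossaryElementsResponse",
  "relatedHTTPCode": 200,
  "elementList": [
    {
      "elementHeader": {
        "status": "ACTIVE",
        "guid": "f9b78b26-6025-43fa-9299-a905cc6d1575"
      },
      "glossaryProperties": {
        "qualifiedName": "Glossary:Sustainability",
        "displayName": "Sustainability Glossary"
      }
    }
  ]
}
"""


def summarize_glossaries(text: str) -> list:
    """Return guid, qualifiedName and displayName for each glossary."""
    body = json.loads(text)
    if body.get("relatedHTTPCode") != 200:
        raise RuntimeError(f"Request failed: {body}")
    summaries = []
    # Treat a missing or null elementList as "no results".
    for element in body.get("elementList") or []:
        header = element["elementHeader"]
        properties = element["glossaryProperties"]
        summaries.append({
            "guid": header["guid"],
            "qualifiedName": properties["qualifiedName"],
            "displayName": properties["displayName"],
        })
    return summaries


glossaries = summarize_glossaries(response_text)
for glossary in glossaries:
    print(glossary["displayName"], glossary["qualifiedName"], glossary["guid"])
```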

Now, you will have noticed that in the body of the REST API there was a parameter named searchString with a value set to ".*" - as you would guess, the value of the searchString parameter is a regular expression, and so the value ".*" means match any name - or in this case, any glossary name - so, return all glossaries. As you would expect, if you had many glossaries and wanted to search for a particular subset by name, you could form a different search string. In our simple example here, unless you have snuck in another glossary, if you changed the searchString value to "z" you shouldn't get an interesting result - go ahead and give it a go.

3. Do a couple of queries to find glossary terms

Now that we've found a glossary to work with, and learned a bit about queries, let's take a look at the terms in the glossary. The first command that we'll run retrieves all the terms in the glossary. In Swagger, within the Asset Manager OMAS group, we will use the find-terms-by-search-string request, filling in the body like so:

Within the body of the request, we have three parameters:

effectiveTime: the time at which we want the query evaluated - Egeria has powerful features for handling time and events. This parameter is optional.
searchString: as with the glossary search command we used above, this can be a regular expression. To retrieve all terms we will specify ".*".
glossaryGUID: this specifies the glossary to search within. If not specified, then Egeria will search all glossaries within the Metadata Access Server.

There are additional ways we can filter and refine our search results. They are optional, and we will go into more detail in a future recipe dedicated to glossaries.
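
As a rough sketch, the request body can be assembled in Python from the three parameters listed above. The key names are taken from that list; the function name is ours, and leaving the optional keys out entirely when they are not supplied is an assumption. The body in Swagger may also carry other fields that are not covered here, so treat this as a starting point rather than the definitive payload.

```python
import json
from typing import Optional


def term_search_body(search_string: str,
                     glossary_guid: Optional[str] = None,
                     effective_time: Optional[str] = None) -> str:
    """Build a JSON body for the term search from the three parameters."""
    body = {"searchString": search_string}
    # Optional parameters are only included when supplied (an assumption).
    if glossary_guid is not None:
        body["glossaryGUID"] = glossary_guid
    if effective_time is not None:
        body["effectiveTime"] = effective_time
    return json.dumps(body, indent=2)


# All terms in the Sustainability Glossary found earlier.
print(term_search_body(".*", "f9b78b26-6025-43fa-9299-a905cc6d1575"))
```

Building the body with json.dumps, rather than by pasting strings together, means quotes and backslashes in a regular expression are escaped correctly for us.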

Ok, so we made our RESTful request - and we get back a big JSON document that lists all the terms. The structure of the JSON has similarities with the glossary JSON we explored above. Rather than try to read the JSON as plain text, I find it useful to cut and paste the JSON into a JSON-aware editor or reader so that I can more easily navigate through it using folding. As we saw with the glossary search result, there is an elementList, and within that element there is a section for each term that we find. Digging a bit deeper, we see a structure called glossaryTermProperties that contains the term information such as its qualifiedName, displayName, summary, description and abbreviation. There are often additional related pieces of information that we could navigate to by using the term GUID - but, again, we'll leave that to a more focused, future discussion on glossaries.

So now it's time to do a bit more experimenting on your own. Here are a few things that you can try by just changing the request body and re-issuing the request. I'll start with an example that retrieves a single term named Hydrofluorocarbon... give it a try:

Here are some other experiments to try:

Return all terms starting with "Hydro"
Return all terms starting with "H"
Return all terms ending with "carbon"
Return all terms that have "carbon" in their name
Return all terms that have the word "scope" in their name
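
If you want to try out candidate search strings before sending them, you can experiment locally with Python's re module. Treat this strictly as a sketch: Egeria evaluates the search string on the server, and this recipe does not spell out whether the match is anchored or case-sensitive, so the patterns below are suggestions to try rather than guaranteed answers. The term names are hypothetical (only Hydrofluorocarbon is mentioned in this recipe), and using a full-string match is our assumption.

```python
import re

# Hypothetical term names; only "Hydrofluorocarbon" comes from the recipe.
term_names = [
    "Hydrofluorocarbon",
    "Hydropower",
    "Halon",
    "Carbon Footprint",
    "Low-carbon",
    "Scope 1 Emissions",
]

# Candidate search strings for the experiments listed above.
candidate_patterns = {
    'starts with "Hydro"': r"Hydro.*",
    'starts with "H"': r"H.*",
    'ends with "carbon"': r".*carbon",
    'has "carbon" in the name': r".*carbon.*",
    'has the word "scope"': r".*\bscope\b.*",
}


def matching(pattern: str, names: list, ignore_case: bool = False) -> list:
    """Return the names that the whole pattern matches (local test only)."""
    flags = re.IGNORECASE if ignore_case else 0
    compiled = re.compile(pattern, flags)
    return [name for name in names if compiled.fullmatch(name)]


for label, pattern in candidate_patterns.items():
    print(label, pattern, matching(pattern, term_names))
```

Note how case matters in this local test: the pattern for "scope" only finds "Scope 1 Emissions" when ignore_case is set to True, which is worth keeping in mind if a search against Egeria returns fewer terms than you expected.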

As you can see, searching is pretty easy - and there are many other kinds of searches we can do as well. These concepts work equally well on assets such as databases, or schemas or other things that we might be interested in that we have cataloged with Egeria.

Recap

In this recipe we demonstrated loading Egeria with pre-built content that has been shared with us - in this case a tiny glossary of sustainability terms - and then showed how we can query Egeria to find glossaries and terms. We started to look at how we send HTTP POST requests to Egeria that contain a JSON or string payload, and made a start on understanding a JSON result. We also started to learn about working with Glossaries in Egeria - which we will cover in greater detail in a future recipe.

In doing this work, we mostly made use of the built-in Swagger portal; entering the REST commands and executing them with cURL is feasible but even more tedious. As we want to work with Egeria in more sophisticated ways, it's more convenient to start using a programming language - even if we are only writing simple scripts. So in our next recipe, we will start working with the new Egeria Python client that we've been building. Although it's still in development, it's stable enough to use for our next set of recipes!

Last modified: 02 October 2024