3. Tutorial for Import Configuration Files

Test Data Setup

The following section demonstrates how you can import data from a database into XplainData’s software. To this end, you can unzip the file that contains our test data set (consisting of four CSV files).

Import these CSV files into your relational database. We assume that you use the filenames as table name, e.g. the table citiescatalog stems from the file citiescatalog.csv. Make sure to remember the name of the database, the password and user you use to connect to this database.

After this, your database should consist of two tables which contain transaction data (patients and prescriptions) and two tables which contain catalog data (citiescatalog and atccatalog).

Please note: throughout the rest of this tutorial we will assume that the data has been imported into a MySQL database. We will use the following parameters to access this data within the next examples:

"databaseConnectionConfiguration": {
  "databaseType": "MYSQL",
  "url": "127.0.0.1",
  "portNumber": 3306,
  "user": "youruser",
  "password": "yourpassword",
  "databaseName": "xplain_training"
},

You will have to adopt these settings within the next examples with your parameters (user, password, name of the database, etc.)

Importing catalogs

Create a directory where all data and configuration files will later be found. E.g. create a directory my_xplain_dir. Create two subdirectories: config and data. Do NOT alter the name of these subdirectories.

Create a file import.xgenscript in the config folder. This file will later be used to build all catalogs and to import all relational data. Copy and paste the following into this file:

{
  "tasks": [{
    "name": "Generate ATC catalog",
    "maxTaskThreads": 3,
    "parallelTasks": [{
      "task": "buildAttribute",
      "attributeConfiguration": {
        "databaseConnectionConfiguration": {
          "databaseType": "MYSQL",
          "url": "127.0.0.1",
          "portNumber": 3306,
          "user": "youruser",
          "password": "yourpassword",
          "databaseName": "xplain_training"
        },
        "attributeTableName": "atccatalog",
        "attributeName": "pzn",
        "keyFieldNameInAttributeTable": "pzn",
        "hierarchyLevelColumnNames": ["atc5","atc4", "atc3",
                                      "atc2", "atc1", "pzn"],
        "hierarchyLevelNames": ["Level 1", "Level 2", "Level 3",
                                "Level 4", "Level 5", "PZN"]
      },
      "saveToFile": "atccatalog.xattribute"
    }]
  }]
}

As can be seen this configuration file consists of an array of task sets which will be executed later on. Currently there is only one task set defined, namely “Generate ATC catalog”. Each task set consists of an array of (parallel) tasks to be executed simultaneously. The only parallel task to be executed here is of the type “buildAttribute”.

Before we can start our first build run, we have to set the environment variable xplainpath. Set it to your my_xplain_dir path. For instance, when using Linux / MacOS this can be done via

export xplainpath=/yourPathTo/my_xplain_dir/

After this we can start our first build run. This can be done with the following command:

cd /yourPathTo/my_xplain_dir/config java -jar /pathto/xplain.jar -g import.xgenscript

(/pathto has to be modified to the path where your copy of xplain.jar can be found). The output of this command should be something like:

17:00:29.218 [main] INFO com.xplaindata.ui.generation.GenerationScript - Starting script ...
17:00:29.220 [main] INFO com.xplaindata.ui.generation.GenerationScript - Executing task set "Generate ATC Catalog" ...
17:00:29.225 [pool-1-thread-1] INFO com.xplaindata.xtable.config.AttributeConfiguration - Query string for building the hierarchy of attribute pzn: SELECT atc5, atc4, atc3, atc2, atc1, pzn, pzn FROM atccatalog
17:00:29.225 [pool-1-thread-1] DEBUG com.xplaindata.itable.CategorialAttribute - Query string: SELECT atc5, atc4, atc3, atc2, atc1, pzn, pzn FROM atccatalog
17:00:30.567 [main] INFO com.xplaindata.ui.generation.GenerationScript - Task set "Generate ATC catalog" executed in [ms]: 1347
17:00:30.567 [main] INFO com.xplaindata.ui.generation.GenerationScript - ... script executed in [ms]: 1347

The second catalog will be build accordingly. Copy and paste the following after line 26 into your import.xgenscript file:

,"tasks": [{
  "name": "Generate city catalog",
  "maxTaskThreads": 3,
  "parallelTasks": [{
    "task": "buildAttribute",
    "attributeConfiguration": {
      "databaseConnectionConfiguration": {
        "databaseType": "MYSQL",
        "url": "127.0.0.1",
        "portNumber": 3306,
        "user": "youruser",
        "password": "yourpassword",
        "databaseName": "xplain_training"
      },
      "attributeTableName": "citiescatalog",
      "attributeName": "city",
      "keyFieldNameInAttributeTable": "id",
      "hierarchyLevelColumnNames": ["state","city", "id"],
      "hierarchyLevelNames": ["State", "City", "CityID"]
    },
    "saveToFile": "citycatalog.xattribute"
  }]
}]

Again, execute this script:

java -jar /pathto/xplain.jar -g import.xgenscript

Now, if you check your my_xplain_dir/data folder you should find two files: atccatalog.xattribute and citycatalog.xattribute

Import transaction data

Now that we imported our catalogs we will import our transaction data. Let’s start with the main object: the patient table. In your config folder generate a file patients.xtableconfig and copy the following content:

{
  "databaseConnectionConfiguration" : {
          "databaseType": "MYSQL",
          "url": "127.0.0.1",
          "portNumber": 3306,
          "user": "youruser",
          "password": "yourpassword",
          "databaseName": "xplain_training"
  },
  "dbTableName" : "patients",
  "objectName" : "Patients",
  "dbKey" : "id",
  "dbForeignKeys" : [ ],
  "dimensionConfigurations" : [
    {
      "dbColumnName" : "id",
      "dimensionName" : "Patient",
      "xplainDataType" : "LONG"
    }, {
      "dbColumnName" : "gender",
      "dimensionName" : "Gender",
      "xplainDataType" : "CATEGORIAL"
    }, {
      "dbColumnName" : "Age",
      "dimensionName" : "Age",
      "xplainDataType" : "INT"
    }, {
      "dbColumnName" : "cityID",
      "dimensionName" : "City",
      "xplainDataType" : "CATEGORIAL"
    }
  ]
}

This will later on import all patient data from our data source. Now, generate a file prescriptions.xtableconfig (again in your \config folder) and copy the following content:

{
  "databaseConnectionConfiguration" : {
          "databaseType": "MYSQL",
          "url": "127.0.0.1",
          "portNumber": 3306,
          "user": "youruser",
          "password": "yourpassword",
          "databaseName": "xplain_training"
  },
  "dbTableName" : "prescriptions",
  "objectName" : "Prescriptions",
  "dbKey" : "id",
  "dbForeignKeys" : ["patId"],
  "dimensionConfigurations" : [
    {
      "dbColumnName" : "patId",
      "dimensionName" : "patId",
      "xplainDataType" : "LONG"
    },{
      "dbColumnName" : "id",
      "dimensionName" : "id",
      "xplainDataType" : "LONG"
    },
    {
      "dbColumnName" : "date",
      "dimensionName" : "Date",
      "xplainDataType" : "LONG"
    }, {
      "dbColumnName" : "costs",
      "dimensionName" : "Costs",
      "xplainDataType" : "LONG"
    }, {
      "dbColumnName" : "pzn",
      "dimensionName" : "PZN",
      "xplainDataType" : "CATEGORIAL"
    }
  ]
}

These to .xtableconfig files describe which table attributes will be imported from which tables. Now, the next step is to adopt your import.xgenscript file in order to define how to import these data. Add the following lines to this file:

"tasks" : [
    {
      "name" : "Generate Patient XTable",
      "maxTaskThreads" : 3,
      "parallelTasks" : [
        {
          "task" : "buildXTable",
          "configurationFile" : "patients.xtableconfig",
          "saveToFile" : "Patients.xtable"
        }
      ]
    }
  ],
    "tasks" : [
    {
      "name" : "Generate Prescriptions XTable",
      "maxTaskThreads" : 3,
      "parallelTasks" : [
        {
          "task" : "buildXTable",
          "configurationFile" : "prescriptions.xtableconfig",
          "saveToFile" : "Prescriptions.xtable"
        }
      ]
    }
  ]

Again, execute this script:

java -jar /pathto/xplain.jar -g import.xgenscript

Now, if you check your my_xplain_dir/data folder you should find four files: atccatalog.xattribute, citycatalog.xattribute, patients.xtable and prescriptions.xtable. The first two files contain data stemming from the ATC catalog and the City catalog. The last two files contain the corresponding transactional data, i.e. all patient data and prescription data.