Oracle Data Science - Accelerated Data Science SDK Configuration

How to configure Oracle Data Science's Accelerated Data Science SDK to access Oracle Cloud Infrastructure Object Storage

In my last post, I introduced Oracle Data Science, Oracle's new tool for data science collaboration, which includes an AutoML Python SDK named Accelerated Data Science. The SDK speeds up and automates various tasks in the ML pipeline, from feature engineering through model and feature selection to model explainability. It is a handy tool for both newcomers and experienced practitioners facing data science problems.

First version, First Hurdle

In the first version of Oracle Data Science, there is a preliminary step to follow before using the Accelerated Data Science SDK. This step is needed to connect to Oracle Cloud Infrastructure Object Storage, so that we can save our models in the model catalog.

Please note that all the other steps within the SDK are still available without this setting. You will still be able to call the feature engineering, model and feature selection, and model explainability functions, but you won't be able to save the model in the catalog.

If we want to accomplish this last step, we need to create a private/public key pair and set up a configuration file. Luckily, we can run the whole process within the Oracle Data Science notebook! Let's go through the steps.

ADS SDK Configuration

First of all, let's log in to the notebook and open a terminal session.

We can then create a folder named .oci under /home/datascience:

mkdir ~/.oci

In the next step, we need to generate an API signing key:

openssl genrsa -out ~/.oci/oci_api_key.pem -aes128 2048 

The command will ask for a passphrase, which will secure the key.

Now it's time to generate the public key with:

openssl rsa -pubout -in ~/.oci/oci_api_key.pem -out ~/.oci/oci_api_key_public.pem

Again, the command will ask for the passphrase and then generate the oci_api_key_public.pem file.

Another item we need is the key's fingerprint, which can be generated with:

openssl rsa -pubout -outform DER -in ~/.oci/oci_api_key.pem | openssl md5 -c

The command will ask for the key's passphrase and then output the fingerprint.
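To make clear what the fingerprint actually is, here is a minimal Python sketch of the same computation: the MD5 digest of the DER-encoded public key, written as colon-separated hex pairs. The helper name `fingerprint_from_der` is hypothetical, and the sketch assumes you already have the DER bytes of the public key; it does not read or decrypt the PEM file itself.

```python
import hashlib


def fingerprint_from_der(der_bytes: bytes) -> str:
    """Return the MD5 digest of der_bytes as colon-separated hex pairs."""
    digest = hashlib.md5(der_bytes).hexdigest()
    # Split the 32 hex characters into 16 two-character groups.
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

The result has the same shape as the value that goes into the fingerprint entry of the config file: 16 two-character hex groups joined by colons.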

The next pieces of information needed are the Tenancy OCID and the User OCID. The first one can be obtained by navigating to the Governance and Administration section and then selecting Administration -> Tenancy Details.

The OCID is shown in the main Tenancy Information section

The User OCID can be found by selecting Identity -> Users.

After selecting the User we want to connect with, the OCID is visible

The next step is to upload the public key generated a few steps before. Navigate again to the Identity -> Users -> Username screen in the console. Under the global User info, there is an API Keys section.

We can click on Add Public Key and paste the content of the oci_api_key_public.pem file generated before.

Now it's time to use all the information collected so far to create a config file. The file needs to reside in the ~/.oci/ folder and must be named config, with the following entries:

[DEFAULT]
user=<OCID of the user>
fingerprint=<Fingerprint of the Key>
key_file=<Path to the private Key>
pass_phrase=<Passphrase used to Encrypt>
tenancy=<OCID of the Tenancy>
region=<Region where the Tenancy is hosted>

An example is:

[DEFAULT]
user=ocid1.user.oc1........
fingerprint=c8:24:75:00:00....
key_file=~/.oci/oci_api_key.pem
pass_phrase=oracle123
tenancy=ocid1.tenancy.oc1.......
region=eu-frankfurt-1
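Before trying to save a model, it can help to sanity-check the file. The following is a minimal sketch using only Python's standard library, not the SDK's own validation: it parses the config text and reports entries that are missing or a fingerprint that doesn't look like 16 colon-separated hex pairs. The function name `check_config_text` is hypothetical, and the sketch assumes that pass_phrase is only needed when the private key is encrypted, so it is not treated as mandatory here.

```python
import configparser
import re

# Entries this sketch treats as mandatory (pass_phrase is assumed optional).
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")
FINGERPRINT_RE = re.compile(r"^([0-9a-f]{2}:){15}[0-9a-f]{2}$")


def check_config_text(text: str) -> list:
    """Return a list of problems found in the [DEFAULT] profile of text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser["DEFAULT"]

    # Report every mandatory entry that is absent or empty.
    problems = [
        f"missing entry: {key}" for key in REQUIRED_KEYS if not section.get(key)
    ]

    # Check only the shape of the fingerprint, not whether it matches the key.
    fingerprint = section.get("fingerprint", "")
    if fingerprint and not FINGERPRINT_RE.match(fingerprint.lower()):
        problems.append("fingerprint is not 16 colon-separated hex pairs")
    return problems
```

In the notebook this could be called as `check_config_text(open(os.path.expanduser("~/.oci/config")).read())` after importing `os`; an empty list means the sketch found nothing wrong, which does not guarantee that the credentials themselves are valid.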

If the settings are not correct, you'll get an error when trying to save the model.

If the settings are correct, the save model step (described in my previous blog post) will execute correctly. Enjoy your trials with Oracle Data Science!