Introduction to Oracle Data Science Service Part 2

Using oracle-ads Python library to create a secret within an OCI vault. Use this secret to connect to an ADW and read and write data to the AWD.

Introduction to Oracle Data Science Service Part 2

This is part 2 of my blog about the Oracle Data Science Service, if you missed part 1 you can find it here, where we created a project, notebook and custom environment. In this post we will look at using the oracle-ads Python library to create a secret within an OCI Vault, use this secret to connect to an ADW and read and write data to the ADW. We will also use the ocifs Python library to read and write to the OCI Storage Bucket should you require a static dataset as opposed to a database table.

The following will only be possible if your dynamic group (i.e. the 'compute node behind the notebook') is given access to manipulate the secrets in the vault. This can be done with policies to use the vault, use keys, and manage secret-family. The user group your user is a member of will also be required to use the vault and keys.

Since we are creating the secret from a notebook, and the ADW I would like to connect to is on a private network, my notebook should be set up with custom network settings. This allows us to set a VCN (virtual cloud network) with: NAT gateway configured and a private subnet. (Custom networking is not required if the ADW is public/ available through a public network). The created notebook will also require the oracle-ads package to be at least version 2.65. This could be saved onto a custom Conda environment, as in the previous blog post, or installed with pip on any notebook session.

Oracle ADS SDK

Before we start I think it's important to talk about the Accelerated Data Science Software Development Kit. ADS is a Python library that is included as part of OCI Data Science, it has many functions and objects that automate or simplify steps in the data science workflow. In this blog post we are going to use it to connect to various data sources, and to store and retrieve credentials from the OCI Vault, however it also has functions to help with exploring and visualising data, as well as; training, evaluating, and explaining models.

Creating a Vault & Key

The OCI Vault is a service to centrally store encryption keys and credentials for various applications. Keys and Secrets stored in the vault offers you greater security and ensures passwords are not stored in plain text in code or configuration files.

A Vault can be created from within the compartment you’re using, by navigating to Identity & Security → Vault in the OCI Console. Depending on your security/ policy settings creating a Vault might be the role of an admin on your OCI tenancy.

(To create a vault the user would require: manage policies on vault, keys, and secret-family for the compartment you are using.)

This Vault can also store the wallet needed to connect to any ADB (ADW or ATP databases) alongside credentials. Once the secret has been created, it is the only thing required by a notebook to be able to connect to your ADB. Using the Secret from within the notebook ensures the password is never shared with anyone or ever written in the notebook itself.

(This is why the notebook computer/ dynamic group requires the rights to create new secrets and extract existing secrets, in the policy requirements mentioned above.)

From within the Vault you (or an admin) will also need to create a Key. The Key is used to encrypt and read the Secret. All the default settings can be kept and you will just need to name your Key.

Creating a Secret from a notebook

You can create a secret directly from your Data Science Service Notebook. (The following instructions are for an ADW, though the steps are identical for an ATP database). To create this secret from within a notebook you will require:

  • Database wallet;
  • Valid database credentials;
  • ADW service name;
  • Vault OCID;
  • Key OCID;
  • Compartment OCID;
  • ADS SK version >=2.65.

The wallet is a ZIP file that contains connection information and the encryption keys that are needed to establish a secure connection to the database. The wallet can be obtained from your database administrator, Oracle Cloud Infrastructure API, Oracle CLI, or from the Console.

From your Notebook Session you can then upload this zip file either by dragging and dropping it into the file browser, or clicking upload: (for how to create a notebook session please see Part 1 here).

A notebook can then be used to configure the connection. First we need to ensure that the conda environment we’re using has an oracle ADS version of at least 2.65, to do this we can import ads and run: ads.hello()

The output will tell us which ads version we have, here we are using 2.8.1.

We have also imported ADBSecretKeeper from ads, this will allow us to read and write secrets from the OCI Vault. However, we will also need to ensure that we can authenticate ourselves in OCI to be able to use OCI resources. Here we will do this using the resource principal by running: ads.set_auth(auth='resource_principal')

We can then set the ADW connection parameters that we wish to save to the Vault.

# Connection details to ADW
connection_parameters = {
    "user_name": "", # Add username 
    "password": "", # Add password 
    "service_name": "", # Add service name
    "wallet_location": "/home/datascience/Wallet.zip"
    # This requires the wallet to be already uploaded to this notebook session in the location specified here
}

Before saving these credentials to the Vault you can test them by trying to read from the ADW directly:

We can then use ADBSecretKeeper to create a secret, to do this we need to already have created, or the use of, an OCI Vault. You will also need to set the Vault, Key, and Compartment OCIDs related to your tenancy and OCI Objects you wish to use.

## Set required variables
ocid_vault = 'ocid1.vault.' 
ocid_master_key = 'ocid1.key.'
ocid_mycompartment = 'ocid1.compartment.'

# Create secret 
adw_keeper = ADBSecretKeeper(vault_id=ocid_vault,
                            key_id=ocid_master_key,
                            compartment_id=ocid_mycompartment,
                            **connection_parameters)

We can then save this Secret to the Vault, and print out the created secret OCID as follows:

# Save secret with secret name, names must all be unique in vault
adw_keeper.save("my_creds",
    "My DB credentials",
    freeform_tags={"schema":"emp"}, #oci tags
    save_wallet=True #  zip file will also be uploaded to vault
)

print(adw_keeper.secret_id)

Its worth noting that while here we use ADBSecretKeeper to store and retrieve credentials for an Autonomous Database, ADS has classes to store and retrieve other kinds of credentials: BDSSecretKeeper for the OCI Big Data Service; MySQLDBSecretKeeper for Oracle MySQL Database; and AuthTokenSecretKeeper for various kinds of Auth Tokens (for example git tokens).

Using the Secret to connect to an ADW

Once the Secret is stored in the vault, the secret OCID can then be used to connect to the ADW, the_secret is the secret OCID containing our ADW connection details.

import pandas as pd

with ADBSecretKeeper.load_secret(
            the_secret
        ) as adw_creds:  
    df = pd.DataFrame.ads.read_sql("select * from SH.COUNTRIES", connection_parameters=adw_creds)
    print(df.head(2))

Once the secret has been saved to the vault; the wallet file, and the ADW username and password no longer need to be stored on this notebook session or in the code. Only the Secret OCID is required to connect to the ADW.

You can write back to the ADW using ads.to_sql:

# Example Write to ADW using Secret
with ADBSecretKeeper.load_secret(
            the_secret
        ) as adw_creds: 
    df.ads.to_sql(
        "EXAMPLE_WRITE_FROM_DS_SERVICE", # Table name
        connection_parameters=adw_creds, 
        if_exists="replace"
    )

Reading and Writing to the OCI Storage Bucket

These Notebooks can also read and write to an OCI Storage bucket, for this you will need to use the ocifs Python package. Below is an example of reading a csv file into the notebook directly from the bucket. Folder structures within the bucket can be used like any file path between the bucket name and the file name in question.

import pandas as pd
import ocifs # file storage 

bucket_name = "DataScience_Bucket" # Bucket name 
object_name = "wine.csv"  # Data filename as required 

# Read csv file from the Bucket 
df = pd.read_csv("oci://"+bucket_name+"/"+object_name)

Similarly a csv file can be written out in a similar way, using the bucket name and object name.

# Set output filename
output_object_name = "data_writeback.csv"

# Writing to the storage bucket 
df.to_csv("oci://"+bucket_name+"/"+output_object_name)

Summary

In this post we've covered storing credentials in the OCI Vault; using a Secret stored in the Vault to connect to an ADW; as well as creating and writing data to an ADW or to the OCI Storage Bucket.

Coming up in Part 3:

Now we can connect to various data sources, in Part 3 we will look at some of the other features of the Oracle ADS SK, which can help with data exploration, as well as creating, saving, and deploying models from within Oracle Data Science Notebook Sessions.