Oracle Autonomous Database Select AI

Oracle Autonomous Database Select AI

In this article, I am going to provide a quick overview of Autonomous Database Select AI, a new tool available on Oracle ADB that enables users to query the database using natural language.

Select AI can:

  • Take a natural language prompt and execute SQL returning the required data;
  • Take a natural language prompt and return the generated SQL statement;
  • Take a natural language prompt and return a narrative response from the result of a SQL query;
  • Create Chatbot style responses to inputs. 

How does Select AI work?

Select AI combines the capabilities of Large Language Models (LLM) with metadata from your ADB schemas to enable database specific SQL to be generated. 

The pre-trained LLMs, such as those by OpenAI and Cohere, have been trained on specially curated data, and are therefore very effective at interpreting natural language requests, and generating efficient SQL queries. However, these queries would be unlikely to do anything without knowing the names of the schemas, tables, and columns in your data base. LLMs would “hallucinate” these details in the structure of a valid SQL query, and the query would therefore require editing to make it work on your specific schemas and tables. 

The combination of both LLM models and ADB specific metadata means SQL specific to your database can be generated, and then executed on your data, with inherited security and authentication features of the database.

The package that enables integration between a LLM and the Autonomous Database is DBMS_CLOUD_AI. Currently, this package works with OpenAI and Cohere, and will require users to have an account associated with one of these AI providers. 

Pre-reqs

In order to use Autonomous Database Select AI you will require:

  • Access to Oracle ADB (Obviously);
  • An OpenAI or Cohere account with credit. (There is an associated cost when calling these services through an API, information on OpenAI’s pricing can be found here);
  • An ADB admin user to grant user access to DBMS_CLOUD_AI package.

Set up

To get started with Select AI, an administrator on your Autonomous Database instance is required to add your ADB user to the Access Control List (ACL) and grant access to the DBMS_CLOUD_AI package. This can be achieved using the following PL/SQL, where 'MY_USER' is the ADB user you wish to be able to use Select AI. 

BEGIN  
    DBMS_NETWORK_ACL_ADMIN.APPEND_HOST_ACE(
         host => 'api.openai.com',
         ace  => xs$ace_type(privilege_list => xs$name_list('http'),
                             principal_name => 'MY_USER',
                             principal_type => xs_acl.ptype_db)
   );
END;

grant execute on DBMS_CLOUD_AI to MY_USER;

This ADB user then needs to provide credentials to one of the AI providers supported (currently these are OpenAI and Cohere). In the case of OpenAI, you will need to generate a token for API usage, which should be put in the password section of the PL/SQL below. The username will be the email you used to sign-up to, or into, OpenAI, and the credential name could be anything, but it will be required for the next section.

BEGIN
  DBMS_CLOUD.DROP_CREDENTIAL (
   credential_name  => 'OPENAI_CRED');

  DBMS_CLOUD.CREATE_CREDENTIAL(
    credential_name => 'OPENAI_CRED',
    username => 'OPENAI_Username',
    password => 'token' );
END;

You then need to create your DBMS_CLOUD_AI profile, I've named mine RITTMANMEAD, but it could be anything you like. Here, you can list the specific schemas and table names whose metadata will be supplied to the AI provider you are connecting to. This will limit the tables that the generated query can use, since the provider has no knowledge of additional schemas or tables, and therefore this will need to be recreated if you wish to use Select AI with further schemas or tables. 

BEGIN
  DBMS_CLOUD_AI.drop_profile(profile_name => 'RITTMANMEAD');
  DBMS_CLOUD_AI.create_profile(
      profile_name => 'RITTMANMEAD',
      attributes => '{"provider": "openai",
                      "credential_name": "OPENAI_CRED",
                      "object_list": [{"owner": "SH", "name": "customers"},  
                                      {"owner": "SH", "name": "sales"},
                                      {"owner": "SH", "name": "products"},
                                      {"owner": "SH", "name": "countries"}]
       }');
END;

For my examples below I have used the Oracle example schema “SH”, as well as some additional tables in my own schema, which include a breast cancer, and wine data set from Kaggle. 

When you wish to use Select AI, you must set the DBMS_CLOUD_AI at the start of each database session. 

BEGIN
  DBMS_CLOUD_AI.SET_PROFILE(
     profile_name => 'RITTMANMEAD'
  );
END;

More details on these steps can be found on oracle documentation here.

Using Oracle Select AI

Select AI is integrated with Oracle SQL, so you can use it via any SQL IDE, however, if you are using it though a stateless environment, such as SQL Developer Web or Oracle Application Express (APEX), you can only use the DBMS_CLOUD_AI.GENERATE function. As opposed to setting your profile and using the “Select AI” syntax. 

To access the different capabilities you can use the following parameters: 

  • runsql: This is the default action, and specifying it is optional. It runs the sql command created from the natural language prompt and returns the result of the sql query;
  • showsql: This returns the generated SQL query;
  • narrate: This reruns a natural language response from running the generated sql query;
  • chat: AI Generated conversational responses.

Examples: 

Here are a few examples of using Select AI in various IDEs. 

OML Notebooks: 

The majority of my initial tests were done using OML notebooks. (For more on OML notebooks please read my previous blog post which can be found here).

First you must set your profile to the one created above. From here I have tested the showsql and the default runsql actions, the generated query uses the SH.SALES table and the SH.PRODUCTS table.

From the table metadata the LLM has correctly identified and used: 

  • The PRODUCT_NAME column
  • The year extracted from the TIME_ID column
  • The join condition on PROD_ID

The narrate action returns a natural language response to the question “What are the product categories?”. Creating a sentence that lists the 5 values in the product category column. 

The chat action creates chat bot style responses, here I asked it to define SQL for me. 

Since I added my own datasets: BREASTCANCER and WINE, to the tables listed in my profile creation statement, I can also query them. Select AI identifies the correct table to use and that the column “Target” represents people with breast cancer. 

However, there are some instances that may cause misleading results, in the example below, I ask “How many Cameras were sold in 1998?” and the response is 0. We know at least 767 digital cameras were sold in 1998 according to one of our previous questions above. Looking at the query, the issue is that “Camera” is not a value in the PROD_CATEGORY column as the query assumes, but the PROD_SUBCATEGORY column. Since the metadata sent to the LLM is only table and column names, not the content of tables (for security reasons), there is no way for the LLM to identify which column to use. It therefore makes a reasonable, if in this case incorrect, guess. This is a general concern with this kind of model, in that it can confidently state an answer with limited information to base this on, and create syntactically correct queries that don't accurately answer the question.

This can impose limitations around use cases. The tables you wish to be able to query must have clearly named, as must the columns, there should be checks and tests in place to ensure that the result returned is correct, and domain knowledge available to check queries, or structure your natural language prompts with additional information. 

If we have some knowledge of the tables and columns that we are querying, we can structure the prompt in a way that helps the LLM generate a functioning query. For example, here we explicitly specify that the camera value would be found in the sub category column, which is enough information for the generated query to now return the expected result using the PROD_SUBCATEGORY column. 

As well as correctly figuring out joins between supplied tables, and extracting the years from dates joins, I also found it to work with: 

  • Between conditions
  • Ordering of results
  • Unique/ distinct results

SQL Developer Desktop:

Select AI can be used in the same way from SQL developer desktop. You must first set the DBMS_CLOUD_AI at the start of each database session.

SQL Developer Web:

Using SQL Developer Web you can only use the features via the DBMS_CLOUD_AI.GENERATE function as shown in the examples below. You cannot use the runsql action directly but can use the showsql and then execute the sql yourself, or use the narrate action. 

Oracle Application Express (APEX): 

From APEX you can only use the features via the DBMS_CLOUD_AI.GENERATE function as shown in the examples below. From within APEX these can easily be made into a web application, allowing anyone to query the tables you specified in the profile. 

Conclusion

The ability to query your database tables with natural language queries could drastically change who could extract information from your data. Whether that's people without specific SQL skills, who would otherwise not have access to the information contained within the ADB, or users who have competent SQL skills but want to query the data faster or more efficiently, or who may not be familiar with the structure of tables in your ADB.

The actions associated with this are integrated with the database, meaning they can be used in a number of different places as we explored above, and the security and authentication is all defined in the ADB. 

The actions or generated queries do not always produce perfect results, for obvious security reasons the AI provider LLMs only have access to the table structures, and makes, for the most part, very smart decisions on which tables and columns are most related to your prompt. However, they do occasionally create incorrect or misleading queries/results. A lot of the capabilities of these kind of tools depend on having useful, descriptive, and human readable table and column names. Otherwise, your prompts will have to be specific enough to enable the LLM to know which columns or tables to use. 

Select AI does work well on Views and you could spend some time creating some Views with descriptive columns on data you wish to query in this way. I would still advise having the results and queries sense checked to ensure users would not be reporting on incorrect data.