Functions for Structured data with OCI

Functions for Structured data with OCI

For data engineers working in the cloud, Oracle Cloud Infrastructure (OCI) offers a suite of powerful functions designed to simplify data manipulation and analysis. Amongst these are three functions that stand out for their ability to handle structured data in different formats: json_table, xml_table, and the lesser-known unnest.

Let us take a look into each of these functions and explore how they can aid in any cloud-based data workflows.

json_table: JSON to Relational Tables

The json_table function allows you to convert JSON data into a relational table format by extracting specified JSON data elements and returning them as rows and columns. This is particularly useful when working with data extracted from APIs or NoSQL databases, which often come in JSON format.

The Syntax of Oracle JSON_TABLE:

JSON_TABLE(
	json_data
    , path_expression COLUMNS (column_name data_type PATH json_path_exp) [, ...])
  • json_data: The JSON document or column from which to extract data.
  • path_expression: The JSON path expression specifies the elements to extract.
  • column_name: The name to assign to the extracted JSON element.
  • data_type: The data type of the extracted JSON element.
  • json_path_exp: The JSON path expression to access the specific element within the JSON structure.

a working example:

SELECT *
FROM json_table(
  '{"items": [{"id": 1, "name": "Product A"}, {"id": 2, "name": "Product B"}]}'
  COLUMNS (
    id INT PATH '$.items[*].id',
    name VARCHAR2(100) PATH '$.items[*].name'
  )
);

In this example, the JSON string is parsed and two columns id and name are extracted from each item within the items array. The resulting output is as a relational table making it easier to work with and interrogate with the usual SQL queries.

xml_table: Working with XML Data

Similar to json_table, the xml_table function allows you to transform XML data into a table format making it valuable when dealing with data stored in XML files or retrieved from XML-based APIs.

The syntax for xml_table

XMLTABLE
    (‘<XQuery>’ PASSING <xml Table_column >
    COLUMNS <New_Table_Column_Name> <Column Type> PATH <XQuery path>)

XMLTable maps the result of an XQuery evaluation into relational rows and columns. You can query the result returned by the function as a virtual relational table using SQL.

  • <XQuery> (optional): An XQuery expression can be specified here to manipulate the XML data before processing. It's optional and defaults to the identity function (/).
  • PASSING <xml_table_column>: This clause indicates the column containing the XML data you want to process. The column data type should be XMLType.
  • COLUMNS: This keyword introduces the definition of columns for the resulting virtual table.
  • <new_table_column_name>: This defines the name you want to assign to the new column in the relational table.
  • <column_type>: This specifies the data type of the new column.
  • PATH <XQuery_path>: This clause uses an XPath expression to navigate within the XML structure and extract the desired data for the corresponding column.

Here's an example:

SELECT *
FROM xml_table(
  '<products><product id="1"><name>Product X</name></product><product id="2"><name>Product Y</name></product></products>'
  COLUMNS (
    id INT PATH '/products/product/@id',
    name VARCHAR2(100) PATH '/products/product/name/text()'
  )
);

Here, the XML string is parsed, and two columns, id and name, are extracted from each product element. The resulting table structure simplifies further analysis and manipulation using SQL queries.

unnest: Flattening Arrays for Simplified Processing

While not directly related to parsing structured data formats, the unnest function plays a crucial role in working with these functions as it allows you to "unflatten" nested arrays (an array within an array) , making them easier to work with within relational tables.

Imagine you have a table containing customer information, and one column stores a list of customers' previous purchases where this list might be stored as an array within the table. If you were to ask the question “what are the the individual purchases for every customer”, with the data in the current format this analysis would be quite tricky.

The unnest function becomes very useful here as it takes a nested array and "un-nests" it, essentially creating a new row for each element in the original array making it much easier to work with and analyse.

A simple example:

suppose we have a table named customers with the following columns:

customer_id (integer)

name (varchar)

previous_purchases (array of varchar)

Which has the following rows of data:

customer_id

name

previous_purchases

1

Andy Inches

["Shirt", "Jeans", "Hat"]

2

Rene Young

["Dress", "Shoes"]

3

Little Feather

["Jeans", "Shoes"]

If we want to find out how many of each item were purchased overall, working directly with the nested array would be tricky. But with unnest, we can achieve this easily with the following code:

SELECT item, COUNT(*) AS total_purchases
FROM customers
UNNEST(previous_purchases) AS item
GROUP BY item;

which would create the following results:

item

total_purchases

Shirt

1

Jeans

2

Hat

1

Dress

1

Shoes

2

By leveraging these functions, cloud-based data engineers can effectively manage and manipulate structured data in various formats, ultimately simplifying complex data engineering tasks within the OCI environment.