The widely used SQLAlchemy “Python SQL Toolkit and Object Relational Mapper” now supports the Oracle Database 23ai VECTOR data type, enabling your AI applications.
The new support for vectors in SQLAlchemy is courtesy of a code contribution from Suraj Shaw, a member of Oracle Database’s driver development group. A big thanks to Mike Bayer and Federico Caselli from the
SQLAlchemy team for merging the enhancement into SQLAlchemy 2.0.41.
Photo by Clay Banks on Unsplash
Vector data in Oracle Database 23ai
Oracle Database 23ai introduced Oracle AI Vector Search as part of the database at no additional charge. Vectors are commonly used in AI to represent the semantics of unstructured data such as images, documents, video, and audio. With the new database feature set, a VECTOR data type was added, being represented as an homogeneous array of
8-bit unsigned integers, 8-bit signed integers, 32-bit floating point numbers, or 64-bit floating point numbers.
Vector columns in Oracle Database 23ai can be created as type:
VECTOR(<vectorDimensions>, <vectorDimensionFormat>)
where the attributes are:
- vectorDimensions: the number of dimensions for the vector data. For example, a point in 3D space is defined by vector data of 3 dimensions, i.e., the (x,y,z) coordinates. For the BINARY vector format, the number of dimensions should be a multiple of 8.
- vectorDimensionFormat: one of the keywords BINARY, INT8, FLOAT32, or FLOAT64 to define the storage format for each dimension value in the vector. The INT8, FLOAT32, or FLOAT64 formats are supported with Oracle Database 23.4. The BINARY format was introduced in Oracle Database 23.5.
Both the number of dimension and format can be “flexible”, allowing vectors in a column to have different shapes.
For example, the SQL to create a table T that includes a VECTOR column called EMBEDDING using FLOAT64 storage is:
CREATE TABLE t (id INTEGER NOT NULL,
name VARCHAR2(20 CHAR),
embedding VECTOR(3,FLOAT64),
PRIMARY KEY (id));
Oracle Database 23.7 introduced the ability to store sparse vectors as an efficient way to represent VECTOR data where most dimensions have zero values. These are not yet supported through SQLAlchemy.
For more information about vectors in the database, refer to the Oracle AI Vector Search User’s Guide.
Requirements
To use VECTORs, you need Oracle Database 23ai. Information at oracle.com/database/free will show you how to access a free cloud database or alternatively install one also for free. Vector support is being improved in each database “release update” so use the latest available version.
Next, you need at least version 2.0.41 of sqlalchemy. You should also update oracledb, which itself has been keeping pace with the database's vector improvements:
python -m pip install sqlalchemy oracledb --upgrade
Creating VECTORs in SQLAlchemy
[The full code of example used in this post is available on GitHub here].
In SQLAlchemy, the following creates the same table schema as earlier shown in SQL*Plus:
from sqlalchemy import MetaData, Table, Column, Integer, String
from sqlalchemy.dialects.oracle import VECTOR, VectorStorageFormat
metadata = MetaData()
t = Table(
"t",
metadata,
Column("id", Integer, primary_key=True),
Column("name", String(20)),
Column(
"embedding",
VECTOR(dim=3, storage_format=VectorStorageFormat.FLOAT64),
)
)
Indexes on VECTOR columns can aid performance, see the SQLAlchemy documentation VECTOR Indexes. An example “Inverted File Flat (IVF)”
index is:
from sqlalchemy.dialects.oracle import VectorIndexConfig, VectorIndexType, VectorDistanceType
Index(
"iv
Truncated by Planet PHP, read more at the original (another 5671 bytes)