[SOL-539] How to install Row Level Security via Virtual Schema? Created: 17.03.2017  Updated: 06.04.2020

Status: Published
Project: Solution Center
Component/s: None
Affects Version/s: EXASOL 6.0.0, Exasol 6.1.0
Fix Version/s: None

Type: Explanation
Reporter: Captain EXASOL Assignee: Captain EXASOL
Labels: None

Issue Links:
Causing
Related
relates SOL-149 Multi-Tenancy: It is all about trust Obsolete
Solution:

Background

Problem Statement

Row level security is the ability to decide for each row of a table or view if it is visible for certain users. There is no standard approach for row level security among database vendors and the actual requirements may quite differ for different customers.

In EXASOL it is not necessary to implement a specific concept for row level security. Instead, the general concept of virtual schemas allows for a straightforward implementation of row level security that is tailored to actual user needs.

Let's consider the following scenario:

  • Each table in the schema DATA_SCHEMA shall be secured.
  • Users u1 and u2 should only see "their" rows of the tables. In order to make the decision, we assume that every table in DATA_SCHEMA has a column USERNAME which contains the necessary information.
  • In addition, user u1 has the preference that among all of her rows, she should only be allowed to read rows where the attribute ACTIVE is true. User u2 has no such limitation and should be able to see all of her rows.

Remember, that this is just an example to outline the general concept, your scenario could be quite different.

General Approach

  • Create a virtual schema RLS_SCHEMA which "wraps" each table in the DATA_SCHEMA.
  • Users are not given access to tables in DATA_SCHEMA but only to their counterparts in RLS_SCHEMA.
  • When accessing tables in RLS_SCHEMA, the virtual schema adapter adds necessary filters to the queries in order to implement the desired "secured" behavior.

Prerequisites

  • If necessary, read up on the concept of virtual schemas (e.g., Section 3.7 of the User Manual or watch the introduction on Youtube)
  • In this SOLUTION, we assume an instance of EXASOL (at least version 6.0) running on a machine with the IP address 192.168.56.101
  • You need docker in order to follow Step 1
  • You need to configure BucketFS (in EXAoperation). We assume that the default BucketFS is running on port 2580 (see SOL-503)
  • Create a new bucket called py (here we assume the write-password w)

How to install Row Level Security via Virtual Schema?

Virtual schemas are a kind of read-only link to an (external) source and contain virtual tables which look like regular tables except that the actual data is directly retrieved from the provided source. Such a virtual schema needs an adapter script that implements the data access logic. The two main tasks of such a script are to read metadata information of objects included in the schema and to define what parts of an EXASOL SQL query can be pushed down into an appropriate query on the source.

In the specific example, our virtual schema RLS_SCHEMA reads from the original DATA_SCHEMA. In order to obtain corresponding metadata of tables, we need to access EXASOL from inside the virtual schema adapter scripts. Currently, the best way to do so is by using EXASOL's websocket api, which allows for very fast connections to EXASOL (even from inside of user defined functions and adapter scripts).

Step 1: Install EXASOL's websocket API via BucketFS.

For EXASOL 6, the websocket api is not pre-installed. By using BucketFS, one can easily install any packages and libraries.

In order to install the websocket api for Python, please perform the following commands. Your ScriptLanguages file from step 1 may be different depending on which version you are using.

  • Download the script language container
    $ curl 192.168.56.101:2580/default/EXAClusterOS/ScriptLanguages-2018-05-07.tar.gz > sl6.tar.gz
    
  • Import into Docker
    $ docker import sl6.tar.gz sl6
    
  • Start the container with a shared folder ws (mounted at /ws inside the container)
    $ mkdir ws
    $ docker run -v `pwd`/ws:/ws --name=sl6i -it sl6 /bin/bash
    
  • Inside the container, first install git
    $ apt-get install git
    

    Python's pip should already be installed in the container.
    Then install EXASOL's websocket api in the target directory /ws

    $ pip install --target=/ws 'git+http://github.com/EXASOL/websocket-api.git#egg=exasol-ws-api&subdirectory=python'
    

    Exit docker and create a package of the folder ws

    $ tar zcf ws.tar.gz ws
    

    Install it into the bucket:

    $ curl -vX PUT -T ws.tar.gz http://w:w@192.168.56.101:2580/py/ws.tar.gz
    

Step 2: Setup the scenario described above:

  • Create users u1 and u2 and grant rights
  • Create schemas adapter_schema and data_schema
  • Create table user_pref in adapter_schema that defines which users have a preference on the ACTIVE attribute
  • Create sample table t in data_schema
connect sys/exasol;

drop user if exists u1;
drop user if exists u2;

create user u1 identified by "u1";
create user u2 identified by "u2";

grant create session to u1;
grant create session to u2;

drop schema if exists adapter_schema cascade;
drop schema if exists data_schema cascade;

create schema adapter_schema;
create schema data_schema;


create table adapter_schema.user_pref(username varchar(100), wants_only_active bool);

insert into
	adapter_schema.user_pref
values
	('SYS', false),
	('U1', true),
	('U2', false);

CREATE TABLE data_schema.t(a1 varchar(100), a2 varchar(100), userName varchar(100), active bool);
INSERT INTO
	data_schema.t
values
	('a', 'b', 'SYS', true),
	('c', 'd', 'SYS', false),
	('e', 'f', 'U2', true),
	('g', 'h', 'U2', false),
	('i', 'j', 'U1', true),
	('k', 'l', 'U1', false);

Step 3: Create a connection object for the meta data access from inside of the adapter script.

As described above, the virtual schema adapter script needs to be able to connect to the database in order to access metadata. The credentials for this connection could simply be contained inside of the adapter script (as string literals), but this is not a good idea. When defining the adpater script via SQL, the passwords would also end up in SQL logfiles. A better approach is to hide sensitive information in connection objects as the part prefixed with IDENTIFIED BY never appears in logfiles! As an example, we use the users SYS with the standard password:

create connection sys_connection to 'ws://localhost:8563' user 'SYS' identified by 'exasol';

Step 4: Create the virtual schema adapter script.

The adapter script in adapter_schema reads the metadata of tables of a protected schema, in our case DATA_SCHEMA. In the function get_meta_for_schema we extend Python (sys.path.extend...) by the websocket api uploaded into the bucket py which is then used to read metadata. We follow the same procedure in the user_wants_only_active function to determine preferences on the ACTIVE attribute. These two functions are then called in the virtual schema adapter callback adapter_call that implements the desired row level access control behavior for RLS_SCHEMA.

CREATE OR REPLACE PYTHON ADAPTER SCRIPT adapter_schema.rls_adapter AS
import json
import string

# For this example, we only support the data types VARCHAR and BOOLEAN in the wrapped tables
def get_datatype(name, maxsize, prec, scale):
    if name.startswith("VARCHAR"):
        return {"type":"VARCHAR", "size":maxsize}
    if name == "BOOLEAN":
        return {"type":"BOOLEAN"}
    raise ValueError("Datatype '"+name+"' yet not supported in RLS virtual schema")    

# This function reads all the meta data for all the tables in the wrapped schema
def get_meta_for_schema(cn, s):
    import sys
    import glob
    sys.path.extend(glob.glob('/buckets/bfsdefault/py/ws/*'))
    import EXASOL
    c = exa.get_connection(cn)

    tabs = []
    with EXASOL.connect(c.address, c.user, c.password) as connection:
        with connection.cursor() as cursor:
            cursor.execute("select table_name from EXA_ALL_TABLES where table_schema='"+s+"';")
            for row in cursor:
                tabs.append(row[0])

        rtabs = []
        for t in tabs:
            with connection.cursor() as cursor:
                cursor.execute("select column_name, column_type, column_maxsize, column_num_prec, column_num_scale from EXA_ALL_COLUMNS where column_schema='"+s+"' and column_table='"+t+"' order by column_ordinal_position;")
                cols=[]
                for row in cursor:
                    cols.append({"name":row[0], "dataType": get_datatype(row[1],row[2],row[3],row[4])})
                rtabs.append({"name":t, "columns": cols})
        return rtabs

# This function gets the user preference regarding the ACTIVE column
def user_wants_only_active(username,cn):
    import sys
    import glob
    sys.path.extend(glob.glob('/buckets/bfsdefault/py/ws/*'))
    import EXASOL
    c = exa.get_connection(cn)

    with EXASOL.connect(c.address, c.user, c.password) as connection:
        with connection.cursor() as cursor:
            cursor.execute("select wants_only_active from adapter_schema.user_pref where username='"+username+"'")
            if cursor.rowcount == 0: raise ValueError('Config error: no preference for user '+username+' defined in adapter_schema.user_pref')
            if cursor.rowcount > 1: raise ValueError('Config error: more than one preference row for user '+username+' in dapter_schema.user_pref')
            row = cursor.fetchone()
            return row[0]

# This function implements the virtual schema adapter callback
def adapter_call(request):
    # database expects utf-8 encoded string of type str. unicode not yet supported
    root = json.loads(request)
    if root["type"] == "createVirtualSchema":
        if not "properties" in root["schemaMetadataInfo"]: raise ValueError('Config error: required properties: "TABLE_SCHEMA" and "META_CONNECTION" not given')
        if not "TABLE_SCHEMA" in root["schemaMetadataInfo"]["properties"]: raise ValueError('Config error: required property "TABLE_SCHEMA" not given')
        if not "META_CONNECTION" in root["schemaMetadataInfo"]["properties"]: raise ValueError('Config error: required property "META_CONNECTION" not given')
        sn = root["schemaMetadataInfo"]["properties"]["TABLE_SCHEMA"]
        cn = root["schemaMetadataInfo"]["properties"]["META_CONNECTION"]
        res = {
            "type": "createVirtualSchema",
            "schemaMetadata": {"tables":get_meta_for_schema(cn, sn)}
        }
        return json.dumps(res).encode('utf-8')
    elif root["type"] == "dropVirtualSchema":
        return json.dumps({"type": "dropVirtualSchema"}).encode('utf-8')
    elif root["type"] == "setProperties":
        return json.dumps({"type": "setProperties"}).encode('utf-8')
    elif root["type"] == "refresh":
        sn = root["schemaMetadataInfo"]["properties"]["TABLE_SCHEMA"]
        cn = root["schemaMetadataInfo"]["properties"]["META_CONNECTION"]
        return json.dumps({"type": "refresh",
                           "schemaMetadata": {"tables":get_meta_for_schema(cn, sn)}}).encode('utf-8')
    if root["type"] == "getCapabilities":
        return json.dumps({
            "type": "getCapabilities",
            "capabilities": []
            }).encode('utf-8') # database expects utf-8 encoded string of type str. unicode not yet supported.
    elif root["type"] == "pushdown":
        req = root["pushdownRequest"]
        if req["type"] != "select": raise ValueError('Unsupported pushdown type: '+req["type"])
        from_ = req["from"]
        if from_["type"] != "table": raise ValueError('Unsupported pushdown from: '+from_["type"])
        table_ = from_["name"]
        active_filter = ""
        if user_wants_only_active(exa.meta.current_user,root["schemaMetadataInfo"]["properties"]["META_CONNECTION"]):
            active_filter = "and active=True"
        res = {
            "type": "pushdown",
            "sql": "SELECT * FROM "+root["schemaMetadataInfo"]["properties"]["TABLE_SCHEMA"]+"."+table_+" WHERE (userName = current_user or current_user = 'SYS')"+active_filter
        }
        return json.dumps(res).encode('utf-8')
    else:
        raise ValueError('Unsupported callback')
/

Step 5: Create the virtual schema

DROP FORCE VIRTUAL SCHEMA if exists RLS_SCHEMA cascade;
CREATE VIRTUAL SCHEMA RLS_SCHEMA USING adapter_schema.rls_adapter with table_schema='DATA_SCHEMA' META_CONNECTION='SYS_CONNECTION';

Step 6: Explore

SELECT * FROM RLS_SCHEMA.T;

Allow users u1 and u2 to access the information in the virtual schema:

grant select on rls_schema to u1;
grant select on rls_schema to u2;

Take u1's point of view

connect u1/u1;

The original table in DATA_SCHEMA remains inaccessible to U1:

-- [42500] insufficient privileges: SELECT on table T (Session: 1562079848924722784)
select * from DATA_SCHEMA.t;

Only the row with username='U1' and active='TRUE'

select * from RLS_SCHEMA.t;

Now take u2's point of view

connect u2/u2;

Both rows with username='U2' with active either TRUE or FALSE

SELECT * FROM RLS_SCHEMA.T;

Step 7: Metadata to be updated explicitly

Caution: Metadata in virtual tables is cached and needs to be updated explicitly;

connect sys/exasol;

CREATE TABLE data_schema.t2(a1 varchar(100), a2 varchar(100), userName varchar(100), active bool);
INSERT INTO
	data_schema.t2
values
	('a', 'b', 'SYS', true),
	('c', 'd', 'SYS', false),
	('e', 'f', 'U2', true),
	('g', 'h', 'U2', false),
	('i', 'j', 'U1', true),
	('k', 'l', 'U1', false);

The new table is not visible in the RLS_SCHEMA yet

select * from exa_all_tables;

Do the refresh of the meta data

alter virtual schema RLS_SCHEMA REFRESH;

now it is available:

select * from exa_all_tables;

Also joins work as expected

SELECT * FROM RLS_SCHEMA.T join rls_schema.t2 on t.active = t2.active;
Category 1: UDFs and In-Database Analytics - python
Category 2: ETL / Data Integration - Virtual Schema

 Comments   
Comment by Stefan Reich [ 29.08.2017 ]

Updated solution to include creation of the outside-container ws folder.

Generated at Sun Jul 12 14:59:56 CEST 2020 using Jira 7.13.13#713013-sha1:4c0f5f5e3383570393fbbf3d2fd5de1eb4057c36.