Data Validator

DV Supports

  1. Database to Database comparison
  2. CSV to CSV comparison
  3. CSV to Database comparison

Supported databases in DV

  1. MongoDB
  2. Snowflake
  3. MySql
  4. PostqreSql

DV Tags

Tag Name Type Field Type Values Description
<db_config_path> OS path Mandatory if connection details provided in .conf file Path of Database credentials file.
<source_db> String, Single Value, Dictionary Mandatory if DB Comparison Custom, Database name like mysql Type of Source Database
<target_db> String, Single Value, Dictionary Mandatory if DB Comparison Custom, Database name like mysql Type of Target Database
<source_sql> String Mandatory if DB Comparison Source Database Query
<target_sql> String Mandatory if DB Comparison Target Database Query
<keys> String, Comma Seperated Values Mandatory Columns name to be treated as primary keys
<tolerance> Decimal Number Optional Ignores the difference between 2 data up to the value provided in this field
<source_conn> String, Dictionary Mandatory if database comparison Credentials Required for Source Database Connection
<target_conn> String, Dictionary Mandatory if database comparison Credentials Required for Target Database Connection
<source_csv> OS Path, Git Path, S3 Path Mandatory if CSV Comparison Source CSV File Path
<target_csv> OS Path, Git Path, S3 Path Mandatory if CSV Comparison Target CSV File Path
<before_file> String Optional

It contains path of python file having code to modify request object. Path of the file can be relative, absolute, module, GIT url and S3 file.
For Example, path=C:/home/user/gempyp_project/beforeFile.py,class=Before,method=beforeClass and Method are optional in before-file tag

<after_file> String Optional

It contains path of python file having code to modify response object. Path of the file can be relative, absolute, module, GIT url and S3 file.
For Example, path=C:/home/user/gempyp_project/afterFile.py,class=After,method=afterClass and Method are optional in before-file tag

<column_map> Dictionary Optional

This field uses the value of the key as column name while comparison of data. For Example: {”column1”: “col1”, “column2”: “col2”}
While comparing the data of two tables, the “column1” will be renamed as “col1” and “column2” will be renamed as “col2”

<match_case> String, Comma Separated Values Optional It enables case insensitivity while comparing data only for the provided columns.
<cut_out> Number Optional It facilitates termination of execution once the mismatch count exceeds the provided value.
<compare_column> String, Comma Separated values Optional This tag define columns to be compared, other columns are ignored
<round_off> Number Optional It truncates the decimal value after the given number of digits.
<skip_column> String, Comma Separated values Optional This tag define columns to be ignored while comparing
<aside> 🗒️ All the tags are case insensitive. </aside>

Different ways to connect to database server

Scenario 1: Using conf file

db_config_path -- To define connection parameters in the config file(Extension should be .conf)
source_db -- Database name(eg: mysql)
target_db -- Database name(eg: mysql)
source_conn -- source
target_conn -- target

Dummy Data:

<db_config_path>C:/user/dbconnection.conf</db_config_path>
<source_db>mysql<source_db>
<source_conn>mysql connection</source_conn>
<target_db>oracle</target_db>
<target_conn>oracle connection</target_conn>

dbconnection.conf

[mysql connection]
Host=
databaseName=
user=
password=

[oracle connection]
Host=
databaseName=
user=
password=

Scenario 2: Using connection string

source_db: custom
target_db: custom
source_conn: connection string to the database
target_conn: connection string to the database

Dummy Data:
<source_db>custom<source_db>
<source_conn>mysql.connector.connect(host='xxxxxxxxx',user='xxxxxxx', password='xxxxxxxxx',database='xxxxxxxxxxx',port=3306)</source_conn>
<target_db>custom</target_db>
<target_conn>mysql.connector.connect(host='xxxxxxxxx',user='xxxxxxx', password='xxxxxxxxx',database='xxxxxxxxxxx',port=3306)</target_conn>

Scenario 3: Giving connection parameters

source_db: database name(eg: mysql)
target_db: database name(eg: mysql)
source_conn: dict of credentials
target_conn: dict of credentials

Dummy Data:
<source_db>mysql<source_db>
<source_conn>{host='xxxxxxxxx',
user='xxxxxxx',
password='xxxxxxxxx',
database='xxxxxxxxxxx',
port=3306}</source_conn>
<target_db>snowflake</target_db>
<target_conn>{host='xxxxxxxxx',
user='xxxxxxx', password='xxxxxxxxx',
database='xxxxxxxxxxx',
port=3306}</target_conn>

Connection String Samples

  1. Snowflake - snowflake.connector.connect(user="xxxxxxxxxx",password="xxxxxxxxxx",account="xxxx.ap-south-1",warehouse='COMPUTE_WH',database="SNOWFLAKE_SAMPLE_DATA",schema="TPCH_SF100")

  2. MongoDB -
    pymongo.MongoClient(host=['mongodb+srv://xxxxxxxx:xxxxxxxxxxx@moto.bexcm.mongodb.net/test'], connect=True)

  3. MySQL -
    mysql.connector.connect(host='xxxxxxxxx',user='xxxxxxx', password='xxxxxxxxx',database='xxxxxxxxxxx',port=3306)

  4. Postgresql
    pg8000.connect(host='xxxxxxxxx',user='xxxxxxx', password='xxxxxxxxx',database='xxxxxxxxxxx',port=3306)

Deep diving into source and target connection tags

<source_conn> and <target_conn> are defined for database connection .

  • Custom connection string

    Used when we do not use .conf file for defining database connection details

    Value of <source_db>,<target_db> or both tags can be defined as “custom” depending upon the usage.

  • Accepting Dictionary values

    <source_conn>{”host”: “dummyhost”, “username”: “dummyuser”, “password”: “dummy”, “databasename”: “dummyProdDatabase”}</source_conn>

  • Section Name
    In db_config_file, we define connection credentials in sections.

Configuration file for <db_config_path> tag

Configuration file(.conf) contains the credentials for database connection. Multiple database connection details can be provided in one config file. Some

  1. Host- Host name of the database
  2. DatabaseName- Database name for connection establishment
  3. User- User details for connection
  4. Password- Password of user for database connection
[Connection_1]
Host=localhost
databaseName=dummy_database
user=dummy_user
password=dummy_password

[Connection_2]
Host=localhost
databaseName=dummy_database_for_database_2
user=dummy_user_for_database_2
password=dummy_password_for_database_2