Creating Data Quality
This section explains how to create data quality checks in Infoveave, either manually or with AI-generated rules.
Creating Data Quality Manually
- The process begins by clicking the New Data Quality button.
- This action opens a dialog where you can select a specific data quality type (e.g., Data Quality using AI or Data Quality) before proceeding to the next step.
Selecting Database and Table
- After selecting the data quality type and clicking “Continue,” the next screen allows you to choose the database and table to be used for the data quality checks.
- In case the desired database is not available, there is an option to add a new connection.
Adding New Connection
- Clicking the “Add new connection” button opens a window that lists the available databases, such as SQLite, DuckDB, Microsoft SQL, Oracle, MySQL, PostgreSQL, MariaDB, and more.
- You can select the database to connect to and configure the connection accordingly.
To learn more about connections, see Introducing Connections.
- After choosing the database, you can select the relevant table to perform the data quality checks. Once the database and table are selected, clicking “Next” moves you to the next step.
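Conceptually, choosing a database and a table means listing the tables behind a connection and then the columns of the chosen table. The sketch below illustrates this with Python's built-in sqlite3 module. It is not how Infoveave implements the step; the `products` and `orders` tables and the helper names `list_tables` and `list_columns` are hypothetical.

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list:
    """Return the names of user tables in a SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def list_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of one table, in declaration order."""
    # PRAGMA statements do not accept bound parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
    return [row[1] for row in rows]  # row[1] holds the column name


# Hypothetical in-memory database standing in for a real connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT, price REAL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product_id INTEGER)")

tables = list_tables(conn)
columns = list_columns(conn, "products")
print(tables)
print(columns)
```

The column list returned for the chosen table corresponds to what the Column Panel shows in the next step.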
Data Quality Configuration
The Infoveave Data Quality configuration screen provides a structured layout that guides you through the process. The layout consists of three main panels: the Column Panel, the Rule Designer, and the Setup Panel.
Column Panel
Search
The search bar at the top-left corner allows users to quickly find specific columns from the list. By entering the name or part of the name of a column, users can efficiently locate and select the columns they wish to apply data quality rules to.
List of columns
This panel displays a comprehensive list of all columns available within the selected dataset. It includes various attributes from the dataset, which can be selected to apply relevant data quality checks. These fields represent the core data elements that can be validated for consistency, accuracy, and completeness.
Rule Designer
Add Rule
To learn more about data quality checks, see Data Quality Dimensions.
Fit view
Fit view adjusts your data quality grids to fit within the screen's width and height.
Delete
To delete a data quality, first select it and then click the Delete button.
Rule view
Enabling the rule view displays the cards according to data quality checks.
Setup Panel
Setup
In the Setup section, enter the name and description.
Configuration
Rule Type
This section allows the user to select the type of rule to be applied. In this case, it’s set to a “Lookup static values check,” which is used to validate whether a field contains values that match a predefined list of valid values.
Rule Name
Here, the user can provide a name for the rule. This name helps identify and distinguish the rule for future reference or reporting. For example, “Valid Product Names” is the name of the rule in this case.
Static Values
This section lists the predefined values that the data will be validated against. Users can input or select the acceptable values, and the data will be checked to ensure it matches any of these options. This ensures that the data adheres to a set of known valid values.
Case Sensitive
This toggle enables or disables case sensitivity when validating values against the static list. When activated, the rule will differentiate between upper and lower case characters; if deactivated, it will treat values as case-insensitive.
Success Criteria
This section defines the threshold for the rule to pass. The user can specify a condition, such as “Greater than,” and set a numerical value (e.g., 50). This helps determine when the rule is considered successful based on the input data.
Allow Nulls
This toggle option allows the user to decide whether null or empty values should be allowed during the validation process. When enabled, the rule will accept null values as valid; when disabled, null values will trigger a validation failure.
Description
Here, the user can provide a description for the rule. This is used to explain the purpose or the logic behind the rule, helping users understand its intent. In this case, it explains that the rule checks if the product name column only contains valid product names.
Each section is designed to customize the rule’s behavior to ensure the data conforms to specific quality standards and requirements.
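To make the interaction between these settings concrete, here is a minimal Python sketch of a lookup static values check. It is an illustration under stated assumptions, not Infoveave's implementation: it assumes the Success Criteria value is compared against the percentage of rows that pass, that null and empty values are treated alike, and that operators other than “Greater than” exist under the names shown. The function names are hypothetical.

```python
import operator
from typing import Iterable, Optional


def lookup_static_values_check(
    values: Iterable[Optional[str]],
    static_values: Iterable[str],
    case_sensitive: bool = True,
    allow_nulls: bool = False,
) -> float:
    """Return the percentage of values found in the static list."""

    def normalize(text: str) -> str:
        # With case sensitivity off, compare case-folded strings.
        return text if case_sensitive else text.casefold()

    allowed = {normalize(v) for v in static_values}
    total = 0
    passed = 0
    for value in values:
        total += 1
        if value is None or value == "":
            # Assumption: null and empty values are handled the same way.
            if allow_nulls:
                passed += 1
        elif normalize(value) in allowed:
            passed += 1
    # Arbitrary choice for this sketch: an empty column scores 0%.
    return 100.0 * passed / total if total else 0.0


# Only "Greater than" appears in the text above; the other names are assumed.
CONDITIONS = {
    "Greater than": operator.gt,
    "Greater than or equal to": operator.ge,
    "Less than": operator.lt,
}


def meets_success_criteria(success_pct: float, condition: str, threshold: float) -> bool:
    """Apply a condition such as "Greater than" 50 to the success percentage."""
    return CONDITIONS[condition](success_pct, threshold)


product_names = ["Widget", "widget", "Gadget", None, "Gizmo"]
valid_names = ["Widget", "Gadget"]

lenient = lookup_static_values_check(
    product_names, valid_names, case_sensitive=False, allow_nulls=True
)
strict = lookup_static_values_check(
    product_names, valid_names, case_sensitive=True, allow_nulls=False
)
print(lenient, meets_success_criteria(lenient, "Greater than", 50))
print(strict, meets_success_criteria(strict, "Greater than", 50))
```

With the lenient settings, four of the five sample values pass, so the rule clears a “Greater than 50” threshold. With the strict settings, only two pass and the rule fails.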
Schedule
Current Schedules
This section displays any existing schedules for tasks, giving the user a clear overview of previously set schedules. It serves as a reference point to view or edit current configurations.
Add Schedule
This is where the user can configure a new schedule. It includes multiple options to define when and how frequently the task will be executed.
Time Zone
This setting allows the user to specify the time zone in which the scheduled task will be executed. The example shows “Asia/Kolkata,” indicating that the task will follow Indian Standard Time (IST).
Pick Time
This field lets the user define the specific time of day when the task should run. In the example, it is set to 06:00, which means the task will execute at 6:00 AM in the selected time zone.
Days of the Week
This option allows the user to select which days of the week the task should run. “Every” is selected here, indicating that the task will run every day of the week, though the user can choose specific days if needed.
Days of the Month
Similar to the days of the week, this setting allows the user to specify on which days of the month the task should run. In this case, it is set to “Every,” meaning it will run on all days of the month.
Months
This option lets the user set in which months the task should run. Again, “Every” is selected, meaning the task is scheduled to run every month.
Do Not Run on Holidays
When this checkbox is selected, the scheduled task does not run on holidays. The system recognizes holidays and skips them to avoid running on non-working days.
At (specific time)
This field confirms the exact time of execution for the scheduled task (in this case, 06:00 AM).
Execute on Sample Data
This checkbox option allows the user to run the task on sample data first, before executing it on the full dataset. This feature is useful for testing the task’s behavior in a controlled environment.
Add
Once the parameters above are set, click this button to save and apply the schedule configuration and activate the scheduled task.
This scheduling section provides a comprehensive way to set up automated tasks with flexible timing and customization options to fit various operational needs.
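The way these fields combine can be sketched in Python as a single "is the task due now?" check. This is a hedged illustration, not Infoveave's scheduler: the `Schedule` class, its field names, and the holiday list are hypothetical, and `None` stands in for the “Every” option.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import Optional, Set

try:
    from zoneinfo import ZoneInfo

    IST: tzinfo = ZoneInfo("Asia/Kolkata")
except Exception:
    # Fallback when no time zone database is installed; IST has no DST.
    IST = timezone(timedelta(hours=5, minutes=30), "IST")


@dataclass
class Schedule:
    tz: tzinfo = IST                            # Time Zone
    run_at: time = time(6, 0)                   # Pick Time
    days_of_week: Optional[Set[int]] = None     # 0 = Monday ... 6 = Sunday
    days_of_month: Optional[Set[int]] = None    # 1 ... 31
    months: Optional[Set[int]] = None           # 1 ... 12
    skip_holidays: bool = False                 # Do Not Run on Holidays
    holidays: Set[date] = field(default_factory=set)

    def is_due(self, moment: datetime) -> bool:
        """Return True if the task should run at this timezone-aware moment."""
        if moment.tzinfo is None:
            raise ValueError("moment must be timezone-aware")
        local = moment.astimezone(self.tz)
        if (local.hour, local.minute) != (self.run_at.hour, self.run_at.minute):
            return False
        if self.days_of_week is not None and local.weekday() not in self.days_of_week:
            return False
        if self.days_of_month is not None and local.day not in self.days_of_month:
            return False
        if self.months is not None and local.month not in self.months:
            return False
        if self.skip_holidays and local.date() in self.holidays:
            return False
        return True


daily = Schedule()
weekdays_only = Schedule(days_of_week={0, 1, 2, 3, 4})
no_holidays = Schedule(skip_holidays=True, holidays={date(2024, 1, 26)})

# 00:30 UTC is 06:00 in Asia/Kolkata.
monday = datetime(2024, 1, 15, 0, 30, tzinfo=timezone.utc)
saturday = datetime(2024, 1, 20, 0, 30, tzinfo=timezone.utc)
holiday = datetime(2024, 1, 26, 0, 30, tzinfo=timezone.utc)

print(daily.is_due(monday))            # every day at 06:00
print(weekdays_only.is_due(saturday))  # Saturday is excluded
print(no_holidays.is_due(holiday))     # listed holiday is skipped
```

Because the comparison is made after converting to the schedule's time zone, the same schedule behaves identically no matter which time zone the server clock uses.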
Validation Results
- Once the validation is complete, you will see the results in a green table showing the success percentage and other details about the rule configurations. If needed, you can tweak the rules and validate again.
Saving and Executing
- After validating the rules, you can click Save to store the configuration.
- The final step is clicking Execute, which runs the configured data quality checks on the selected database and table.
- The execution of the data quality check displays the results in a table, showing various details such as success rates for different rules. The user can monitor the progress and ensure that the data quality checks are successfully completed.
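As a rough picture of what an execution produces, the sketch below runs a few rules against a table and reports a success rate for each. It is an assumption-laden illustration using Python's sqlite3 module: the `products` table, the rule names, and the idea of expressing each rule as a SQL condition are all hypothetical and are not taken from Infoveave.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, product_name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "Widget", 9.5), (2, "Gadget", None), (3, None, 4.0), (4, "Gizmo", -1.0)],
)

# Hypothetical rules: a name plus a SQL condition that is true for a passing row.
rules = [
    ("Product name present", "product_name IS NOT NULL"),
    ("Price is non-negative", "price IS NOT NULL AND price >= 0"),
    ("ID present", "id IS NOT NULL"),
]


def execute_rules(conn, table, rules):
    """Return one result row per rule with its success rate in percent."""
    # Table and conditions are trusted constants here; never interpolate user input.
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    results = []
    for name, condition in rules:
        passed = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {condition}"
        ).fetchone()[0]
        rate = 100.0 * passed / total if total else 0.0
        results.append(
            {"rule": name, "passed": passed, "total": total, "success_rate": rate}
        )
    return results


results = execute_rules(conn, "products", rules)
for row in results:
    print(f"{row['rule']:<24} {row['passed']}/{row['total']}  {row['success_rate']:.1f}%")
```

Each result row pairs a rule with its share of passing rows, which is the kind of per-rule success rate the results table displays.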
Creating Data Quality Using AI
- The process begins by clicking the New Data Quality button.
- This action opens a dialog where you can select a specific data quality type (e.g., Data Quality using AI or Data Quality) before proceeding to the next step.
Selecting Database and Table
- After selecting the data quality type and clicking “Continue,” the next screen allows you to choose the database and table to be used for the data quality checks.
- In case the desired database is not available, there is an option to add a new connection.
Adding New Connection
- Clicking the “Add new connection” button opens a window that lists the available databases, such as SQLite, DuckDB, Microsoft SQL, Oracle, MySQL, PostgreSQL, MariaDB, and more.
- You can select the database to connect to and configure the connection accordingly.
- After choosing the database, you can select the relevant table to perform the data quality checks. Once the database and table are selected, clicking “Next” moves you to the next step.
Data Quality Rules
- After you click “Next,” a popup appears with an AI-generated description. Here you can choose to share data with AI and to include Catalogue Information with AI. You can also select documents that give the AI additional information.
- After filling in the fields, click the Generate Data Quality Rules button.
- The AI generates data quality rules for the selected table, focusing on validity, consistency, and relationships between columns.
- Click the Generate Data Quality button.
AI generated Rules
- The AI generates data quality rules for the selected dataset so that the data can be checked against the specified standards before further analysis. Each rule is accompanied by a citation that explains its purpose, such as ensuring uniqueness, non-null values, or the correct format for columns like IDs or dates. Each rule also specifies the column it applies to and is assigned a unique name and a rule type. Once you have reviewed the rules, select the Generate Data Quality option to apply the specified checks across the dataset.
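The attributes described above (name, rule type, column, citation) suggest a simple record per rule, and the review step can be thought of as a few sanity checks over those records. The following Python sketch is purely illustrative: the `GeneratedRule` class, the rule type names, and the `review_rules` helper are hypothetical and do not reflect the format Infoveave actually uses.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedRule:
    name: str        # expected to be unique
    rule_type: str   # hypothetical type names, e.g. "uniqueness"
    column: str      # column the rule applies to
    citation: str    # explanation of why the rule exists


def review_rules(rules, table_columns):
    """Return a list of problems worth fixing before applying the rules."""
    problems = []
    name_counts = Counter(rule.name for rule in rules)
    for name, count in name_counts.items():
        if count > 1:
            problems.append(f"duplicate rule name: {name}")
    for rule in rules:
        if rule.column not in table_columns:
            problems.append(f"{rule.name}: unknown column {rule.column}")
        if not rule.citation.strip():
            problems.append(f"{rule.name}: missing citation")
    return problems


table_columns = {"id", "order_date"}
rules = [
    GeneratedRule("Unique ID", "uniqueness", "id", "IDs identify rows, so each must occur once."),
    GeneratedRule("Order date format", "format", "order_date", "Dates must follow one format."),
    GeneratedRule("Unique ID", "not_null", "customer_id", ""),
]

problems = review_rules(rules, table_columns)
for problem in problems:
    print(problem)
```

An empty problem list would mean every rule has a unique name, targets an existing column, and carries a citation.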
Validation Results
- The name and description are generated automatically by AI.
Saving and Executing
- After validating the rules, you can click Save to store the configuration.
- The final step is clicking Execute, which runs the configured data quality checks on the selected database and table.
- The execution of the data quality check displays the results in a table, showing various details such as success rates for different rules. The user can monitor the progress and ensure that the data quality checks are successfully completed.