Data Mining
Statistical analysis is the process of collecting data samples and examining them to uncover patterns and relationships. It lets you explore and analyze datasets efficiently. With Infoveave, you can use the Data Mining tool to perform statistical analysis and dig deeper into your data.
The section covers details on
Statistical Analysis
Using the Infoveave Data Mining tool for statistical analysis
- To set up and perform statistical analysis on your dataset, based on the measures and dimensions defined in the datasource, click Analysis > Data Mining.
- All What-If formulas, whether created by you or shared with you, are displayed.
- To create a new formula, click New. The What-If dialog box is displayed.

- Provide a name for the statistical analysis.
- Select the query from the drop-down list and click Save.
- To learn about creating Queries, see here.

You can find the newly added statistical analysis under the tab My Data Mining.
- To execute the analysis, click on the icon.

- The Add Filter dialog box is displayed. Define the date range and set the conditional filters.
- To proceed with the chosen configuration, click Apply.
The filter option appears only if your SQL queries have filters applied.
Categories in Statistical Analysis
The following sections explain the four categories of statistical analysis outcomes. The analysis is organized into four tabs: Statistical Summary, Correlation, Clustering, and Test.
Statistical Summary
- Infoveave analyzes the dataset and categorizes the summary information into two sections.
- Numeric
- Under the Numeric section, only numerical variables are displayed. You can visualize these statistics using the column chart and box-plot.
- Categorical
- Under the Categorical section, variable information regarding frequency and proportions is displayed.

- To apply dimensional filters, click Filters.
- To reconfigure the dimensions and measures, click Select Category and then click Apply.
- Click to view variable information regarding frequency and proportions. The observations are displayed in tabular format and as a column chart based on frequency.
- In the Numeric section, click if you want to visualize as a column chart. The Visualize dialog box displays the Skewness chart.

- Select the binning method from the drop-down list to visualize the column chart.
- Infoveave offers four binning methods for different scenarios: Sturges, Square Root, Freedman Diaconis, and Scott.
- Click Save to save the histogram or the summary as a widget.
Binning is a way to group a number of more or less continuous values into a smaller number of bins.
- In the Numeric section, click box-plot if you want to visualize as a box-plot chart. The Visualize dialog box is displayed.
- Select the dimension from the drop-down list to visualize the box-plot chart.
- Click Save to save the visualization as a widget.
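The four binning methods can be illustrated outside the tool. The sketch below uses the textbook formulas for each rule and only the Python standard library. It is not Infoveave's implementation: the function name `bin_counts` and the sample data are made up for illustration, and the quartile convention used by `statistics.quantiles` may differ from the one the product applies.

```python
import math
import statistics


def bin_counts(values):
    """Return the number of histogram bins suggested by four common rules."""
    n = len(values)
    spread = max(values) - min(values)

    # Sturges: k = ceil(log2(n)) + 1
    sturges = math.ceil(math.log2(n)) + 1

    # Square root: k = ceil(sqrt(n))
    square_root = math.ceil(math.sqrt(n))

    # Scott: bin width h = 3.49 * s / n^(1/3), s = sample standard deviation
    scott_width = 3.49 * statistics.stdev(values) / n ** (1 / 3)

    # Freedman-Diaconis: bin width h = 2 * IQR / n^(1/3)
    q1, _, q3 = statistics.quantiles(values, n=4)
    fd_width = 2 * (q3 - q1) / n ** (1 / 3)

    def width_to_count(width):
        # Convert a bin width into a bin count over the data range.
        return max(1, math.ceil(spread / width)) if width > 0 else 1

    return {
        "sturges": sturges,
        "square_root": square_root,
        "scott": width_to_count(scott_width),
        "freedman_diaconis": width_to_count(fd_width),
    }


sample = [float(x) for x in range(1, 65)]  # 64 made-up observations
counts = bin_counts(sample)
print(counts)
```

Sturges and Square Root depend only on the number of observations, while Scott and Freedman Diaconis also react to how spread out the values are, which is why the same data can produce differently shaped histograms.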
Terminologies in Statistical Info.
- Unique: A record that occurs only once in the given dataset sample.
- Min: Smallest value in a sample dataset.
- 1st Quartile: Middle number between the smallest number and the median of the dataset.
- Median: Middle number in a sorted (ascending or descending) list of numbers.
- Mode: Most frequently occurring value in a dataset.
- Mean: Average value of a dataset.
- 3rd Quartile: Middle number between the median and the largest number of the dataset.
- Max: Largest value in a sample dataset.
- Standard Deviation: Measure of the amount of variation or dispersion of a set of values in a dataset.
- Variance: Expectation of the squared deviation of a random variable from its mean.
- Skewness: Measure of the degree of lopsidedness in the frequency distribution. It indicates how asymmetric the two sides of the curve are.
- Kurtosis: Measure of the degree of tailedness in the frequency distribution. It indicates the amount of probability in the tails.
- Variable Name: Defines the variables available to categorize in a dataset.
- Unique: Number of unique values available in a dataset.
- Frequency: Number of times a value occurs.
- Proportion: Fraction of total that possesses the same attribute.
- Sturges: Rule for determining the number of groups that the data should be separated into, aiming for an optimum bin size.
- Square Root: Sets the number of bins to the square root of the number of observations. Changing the bin width or number of bins can radically change the shape of the histogram.
- Freedman Diaconis: The Freedman–Diaconis rule selects the bin width of a histogram from the interquartile range and the number of observations.
- Scott: Scott rule is optimal for random samples of normally distributed data, in the sense that it minimizes the integrated mean squared error of the density estimate.
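The numeric terms defined above (minimum, quartiles, median, mode, mean, standard deviation, variance, skewness, and kurtosis) can be reproduced for a small sample with the Python standard library. This is only an illustrative sketch: `numeric_summary` and the sample values are hypothetical, quartile conventions vary between tools, and skewness and kurtosis are computed here from population moments (kurtosis is not reduced by 3), which may not match the figures Infoveave reports.

```python
import statistics


def numeric_summary(values):
    """Compute the numeric summary terms for a list of numbers."""
    n = len(values)
    mean = statistics.fmean(values)
    q1, median, q3 = statistics.quantiles(values, n=4)

    # Central moments used for skewness and kurtosis.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n

    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "mode": statistics.mode(values),
        "mean": mean,
        "q3": q3,
        "max": max(values),
        "std_dev": statistics.stdev(values),      # sample standard deviation
        "variance": statistics.variance(values),  # sample variance
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]  # made-up observations
summary = numeric_summary(data)
print(summary)
```

For this sample the mean (6) lies above the median (5) and the skewness is positive, which is the usual signature of a longer right tail.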
Correlation
Correlation is a statistical technique that explains the relationship between two or more variables present in the dataset. It identifies the relationship between measures and dimensions. The relationship between variables can be positive or negative.
- Click the Correlation tab to check the correlation between the measures through a Scatter plot and Bubble chart.
- Infoveave analyzes the dataset and visualizes it on the chart based on the correlation method. Based on your requirements, you can add or remove columns.
- Select the correlation method from the drop-down list:
- Pearson correlation: Measures the linear correlation between two variables x and y. It has a value between +1 and -1.
- Spearman: Assesses how well the relationship between two variables can be described using a monotonic function.
- Click Save to save as a Scatter Chart or Heat Map.
Correlation values range from -1 to +1.
- If the correlation value is positive, there is a positive relationship between the first measure and the second measure: as one measure increases, the other also increases, and vice versa.
- If the correlation value is negative, there is a negative relationship between the first measure and the second measure: as one measure increases, the other decreases, and vice versa.
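The difference between the two correlation methods is easiest to see on a relationship that is monotonic but not linear. The following sketch implements the standard Pearson formula and computes Spearman as the Pearson correlation of ranks. The function names and the `units` and `revenue` values are invented for illustration and say nothing about how Infoveave computes these values internally.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranked values."""
    return pearson(ranks(x), ranks(y))


units = [1, 2, 3, 4, 5]
revenue = [1, 8, 27, 64, 125]  # rises monotonically, but not in a straight line
r_pearson = pearson(units, revenue)
r_spearman = spearman(units, revenue)
print(r_pearson, r_spearman)
```

Here Spearman is essentially 1 because the ordering of the two measures agrees perfectly, while Pearson is somewhat lower because the points do not fall on a straight line.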
Clustering
The classification of items into relative clusters that exhibit comparable traits or patterns in a dataset is known as clustering. When conducting a clustering analysis, you can choose any two measures and also determine the number of clusters.
- Click the Clustering tab to divide the population or data points into groups, known as clusters, whose members share similar characteristics. The result is shown in a cluster chart.
- Select measures for the X-axis and Y-axis of the chart from the drop-down list. Select the number of clusters for the cluster chart. Based on your selection, Infoveave analyzes the dataset and forms that number of clusters in the chart.
- The analysis shows details of the number of clusters, X-axis, Y-axis, and Number of Observations.
By default, Infoveave sets the number of clusters to 4. You can change the number of clusters based on your needs.
- Click in tabular data to get the pivot view of the respective cluster data. This pivot table shows details about all records involved in forming that cluster.
- Click Save as widget to save as a Cluster chart or Table. Select the widget type, assign a widget name, and click Save to save it as a widget.
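The guide does not state which clustering algorithm Infoveave uses. As an illustration of the general idea (two measures, a chosen number of clusters, and a count of observations per cluster), the sketch below assumes a basic k-means loop. The function `k_means`, its deterministic starting centroids, and the sample points are all hypothetical.

```python
import math


def k_means(points, k, iterations=20):
    """Group 2-D points (x, y) into k clusters with a basic k-means loop."""
    # Deterministic start: pick k points spread evenly through the input.
    step = len(points) / k
    centroids = [points[int(i * step)] for i in range(k)]
    labels = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


# Made-up (X-axis measure, Y-axis measure) pairs
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
labels, centroids = k_means(points, k=2)
observations = [labels.count(c) for c in range(2)]
print(labels, centroids, observations)
```

Real k-means implementations usually start from random or specially seeded centroids and stop when assignments no longer change, so results can vary between runs and between tools.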
Test
- Click the Test tab and select the test from the drop-down list.
- Infoveave supports eight different data mining tests:
- Chi-square test: to compare two categorical variables.
- Equal or given proportion test: to check whether the proportions in two or more populations are equal.
- One sample T-test: to determine whether a sample of observations could have been generated by a process with a specific mean.
- Two sample T-test: to compare two independent groups to see if their means are different.
- One sample Z-test: to determine whether a sample mean differs from a specified value when the variance is known and the sample size is large.
- Two sample Z-test: to compare two sample groups to determine if they have originated from the same population.
- Two sample F-test: to assess whether the variances of two populations are equal or not.
- Two sample K-S test: to test whether two samples come from the same distribution.
Chi Square Test
The Chi-Square Test tests for the strength of the association between two categorical variables. The null hypothesis of the Chi-Square test is that no relationship exists between the categorical variables in the population; they are independent.
After selecting a test, select dimensions as category 1 and category 2, and a measure as frequency, from their respective drop-down lists. Infoveave displays tabular and cluster charts based on the measure and dimensions you have selected.
By default, Infoveave displays the observed value in tabular and cluster charts. There are four different values to display in the visualization:
- Observed: Actual value of the dataset.
- Expected: Value expected for the dataset if the two categories were independent.
- Residuals: Difference between the observed and expected values, typically scaled by the square root of the expected value.
- Contribution: Share of the X-squared value contributed by each cell.
Select the value based on your requirements.
Infoveave calculates values such as X-squared, degrees of freedom (df), and p-value, which help you decide whether the two categories are dependent.
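To make the Observed, Expected, Residuals, and Contribution values concrete, here is a minimal sketch of the textbook Pearson chi-square calculation on a small contingency table. It assumes Pearson residuals and percentage contributions, applies no continuity correction, and uses a made-up table, so the numbers may differ from what Infoveave shows. The p-value is omitted because the chi-square distribution is not available in the Python standard library.

```python
import math


def chi_square(observed):
    """observed: contingency table as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]

    # Pearson residual: (observed - expected) / sqrt(expected)
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(o_row, e_row)]
        for o_row, e_row in zip(observed, expected)
    ]

    # X-squared is the sum of the squared residuals.
    x_squared = sum(r ** 2 for row in residuals for r in row)

    # Contribution: each cell's percentage share of X-squared
    # (undefined when X-squared is exactly zero).
    contribution = [[100 * r ** 2 / x_squared for r in row] for row in residuals]

    df = (len(observed) - 1) * (len(col_totals) - 1)
    return expected, residuals, contribution, x_squared, df


# Made-up frequencies: rows are category 1 items, columns are category 2 items
observed = [[30, 10], [20, 40]]
expected, residuals, contribution, x_squared, df = chi_square(observed)
print(expected, x_squared, df)
```

Cells with large positive or negative residuals are the ones where the observed frequency departs most from what independence would predict.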
Equal or Given Proportions Test
The Equal or Given Proportions test measures whether the proportion of a specific characteristic differs between populations. Select the Equal or Given Proportions test to check whether the proportions in two or more populations are equal.
After selecting a test, select dimensions as category 1 and category 2 from their respective drop-down lists. Select the confidence value; by default, Infoveave sets it to 0.95. Infoveave displays tabular and cluster charts based on the dimensions and confidence value you have selected.
Infoveave calculates values such as X-squared, degrees of freedom (df), and p-value, which help you decide whether the two categories differ.
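A test of equal proportions can be sketched as a chi-square comparison of each group's successes and failures against a pooled rate. The example below is an assumption-laden illustration: `equal_proportions` is a hypothetical helper, the counts are invented, and no continuity correction is applied, whereas some statistical packages apply one by default when comparing two groups.

```python
def equal_proportions(successes, trials):
    """Chi-square statistic for H0: all groups share one success proportion."""
    pooled = sum(successes) / sum(trials)
    x_squared = 0.0
    for s, n in zip(successes, trials):
        # Compare observed successes and failures with the pooled-rate prediction.
        expected_success = n * pooled
        expected_failure = n * (1 - pooled)
        x_squared += (s - expected_success) ** 2 / expected_success
        x_squared += ((n - s) - expected_failure) ** 2 / expected_failure
    df = len(trials) - 1
    proportions = [s / n for s, n in zip(successes, trials)]
    return proportions, x_squared, df


# Made-up data: 30 of 40 in group A and 20 of 60 in group B have the characteristic
proportions, x_squared, df = equal_proportions([30, 20], [40, 60])
print(proportions, x_squared, df)
```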
One Sample T-Test
One-Sample T-Test helps to determine whether a sample of observations could have been generated by a process with a specific mean.
After selecting a test, select the measure from the drop-down list. Select the alternative (Two-Sided, Less, or Greater) with respect to the μ (mu) value. Select the confidence value; by default, Infoveave sets it to 0.95.
Click Run test. Infoveave displays tabular and box-plot charts based on the measure, dimensions, and confidence value you have selected.
Infoveave calculates values such as T-score and p-value, which help you decide whether to reject the null hypothesis.
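The T-score of a one-sample test is the distance between the sample mean and μ, measured in standard errors. A minimal sketch with the standard library follows. The helper name and the sample are made up, and the p-value is left out because it requires the Student's t distribution, which the standard library does not provide.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return the T-score and degrees of freedom for H0: mean == mu."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean, using the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    t_score = (mean - mu) / std_err
    return t_score, n - 1


sample = [5.1, 4.9, 5.3, 5.2, 5.0]  # made-up measure values
t_score, df = one_sample_t(sample, mu=5.0)
print(t_score, df)
```

A T-score near zero means the sample mean is close to μ relative to the variation in the data. The alternative (Two-Sided, Less, or Greater) decides which tail of the t distribution is used to turn the T-score into a p-value.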
Two Sample T-Test
The Two Sample T-Test is used when you want to compare two independent groups to see if their means are different. The larger the T-score, the larger the difference between the groups you are testing.
- Infoveave supports three ways to create samples to execute the test. These are:
- Category: used to create samples based on two different categories.
- Measure: used to create samples based on two different measures.
- Date: used to create samples based on your selected date range and intervals.
- Select the Paired checkbox when a one-to-one relationship exists between the values in the two data samples.
- If you have selected category to create two samples category-wise, then follow these steps to perform the T-test:
- Select measure from the drop-down on which the mean is calculated.
- Select one dimension as a category and two dimension-items which are used to compare mean between them.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test to run the analysis based on your configuration.
- After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
- The box-plot chart displays a comparison between the two samples based on the common measure.
- If the p-value is greater than 0.05 (or 5 percent), the test finds no statistically significant difference between the means.
- If you have selected measure to create two samples measure-wise, then follow these steps to perform the T-test:
- Select measures from the drop-down to compare the mean of two measures.
- Select one dimension as a category and two dimension-items which are used to compare mean between them.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test to run the analysis based on your configuration.
- After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
- The box-plot chart displays a comparison between the two samples based on the common measure.
- If the p-value is less than 0.05, it can be concluded that there is a difference between the means. The larger the T-score, the larger the difference between the samples you are testing.
- If you have selected date to create two samples date-wise, then follow these steps to perform the T-test:
- Select measure and dimension from the respective drop-down list.
- Select day, month, or year as the interval for the date range.
- Select range 1 and range 2 to calculate the mean for each date range.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test if you want to run an analysis on the basis of your configuration.
- After running the test, Infoveave will display observations and means of dimensions based on the measure you have selected, in a tabular format.
- The box-plot chart will display a comparison between two samples based on the common measure.
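The effect of the Paired checkbox can be illustrated by computing the T-score both ways. The sketch below assumes Welch's formula (no equal-variance assumption) for independent samples and a one-sample test on the differences for paired samples. Which variant Infoveave applies is not stated in this guide, and the helper names and data are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """T-score for two independent samples without assuming equal variances."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t_score = (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_score, df


def paired_t(a, b):
    """T-score for paired samples: a one-sample test on the differences."""
    diffs = [x - y for x, y in zip(a, b)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / std_err, len(diffs) - 1


sample_1 = [10.0, 12.0, 14.0, 16.0, 18.0]  # made-up values for the first sample
sample_2 = [11.0, 12.0, 13.0, 14.0, 15.0]  # made-up values for the second sample
t_independent, df_independent = welch_t(sample_1, sample_2)
t_paired, df_paired = paired_t(sample_1, sample_2)
print(t_independent, df_independent, t_paired, df_paired)
```

The same two samples give a larger T-score when treated as paired, because pairing removes the variation shared by each pair of observations. This is only valid when the one-to-one relationship genuinely exists.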
One Sample Z-Test
The One Sample Z-Test determines whether the mean of a sample differs from a specified value (μ) when the variance is known and the sample size is large.
- After selecting a test, select the measure from the drop-down list. Select the alternative (Two-Sided, Less, or Greater) with respect to the μ (mu) value.
- Select the confidence value; by default, Infoveave sets it to 0.95.
- Click Run test. Infoveave displays tabular and box-plot charts based on the measure, dimensions, and confidence value you have selected.
- Infoveave calculates values such as Z-score and p-value, which help you decide whether to reject the null hypothesis.
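Because the Z-test relies on the normal distribution, both the Z-score and the p-value can be sketched with the standard library's `statistics.NormalDist`. The helper below is hypothetical, treats the population standard deviation `sigma` as a known input, and maps the three alternatives onto the tails of the standard normal distribution. The sample values are invented.

```python
import math
from statistics import NormalDist, fmean


def one_sample_z(sample, mu, sigma, alternative="two-sided"):
    """Z-score and p-value for H0: mean == mu, with known std deviation sigma."""
    z_score = (fmean(sample) - mu) / (sigma / math.sqrt(len(sample)))
    cdf = NormalDist().cdf(z_score)
    if alternative == "less":
        p_value = cdf                      # lower tail
    elif alternative == "greater":
        p_value = 1 - cdf                  # upper tail
    else:
        p_value = 2 * min(cdf, 1 - cdf)    # both tails
    return z_score, p_value


sample = [51.0, 49.0, 53.0, 51.0] * 9  # 36 made-up observations with mean 51
z_score, p_value = one_sample_z(sample, mu=50.0, sigma=3.0)
confidence = 0.95
reject_null = p_value < 1 - confidence
print(z_score, p_value, reject_null)
```

With a confidence value of 0.95, the null hypothesis is rejected when the p-value falls below 0.05, as it does for this sample.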
Two Sample Z-Test
The Two Sample Z-Test lets you compare two sample groups to determine if they have originated from the same population.
- Infoveave supports three ways to create samples to execute the test. These are:
- Category: used to create samples based on two different categories.
- Measure: used to create samples based on two different measures.
- Date: used to create samples based on your selected date range and intervals.
- Select the Paired checkbox when a one-to-one relationship exists between the values in the two data samples.
- If you have selected category to create two samples category-wise, then follow these steps to perform the Z-test:
- Select measure from the drop-down on which the mean is calculated.
- Select one dimension as a category and two dimension-items which are used to compare mean between them.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test to run the analysis based on your configuration.
- After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
- The box-plot chart displays a comparison between the two samples based on the common measure.
- If the p-value is greater than 0.05 (or 5 percent), the test finds no statistically significant difference between the means.
- If you have selected measure to create two samples measure-wise, then follow these steps to perform the Z-test:
- Select measures from the drop-down to compare the mean of two measures.
- Select one dimension as a category and two dimension-items which are used to compare mean between them.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test to run the analysis based on your configuration.
- After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
- The box-plot chart displays a comparison between the two samples based on the common measure.
- If the p-value is less than 0.05, it can be concluded that there is a difference between the means. The larger the Z-score, the larger the difference between the samples you are testing.
- If you have selected date to create two samples date-wise, then follow these steps to perform the Z-test:
- Select measure and dimension from the respective drop-down list.
- Select day, month, or year as the interval for the date range.
- Select range 1 and range 2 to calculate the mean for each date range.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test if you want to run an analysis on the basis of your configuration.
- After running the test, Infoveave will display observations and means of dimensions based on the measure you have selected, in a tabular format.
- The box-plot chart will display a comparison between two samples based on the common measure.
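For two samples, the Z-score compares the difference between the sample means with the combined standard error. The sketch below assumes the population standard deviations are known and supplied by the caller, and it computes a two-sided p-value only. The helper name, the data, and the standard deviations are illustrative.

```python
import math
from statistics import NormalDist, fmean


def two_sample_z(a, b, sigma_a, sigma_b):
    """Z-score and two-sided p-value for H0: both samples share one mean."""
    std_err = math.sqrt(sigma_a ** 2 / len(a) + sigma_b ** 2 / len(b))
    z_score = (fmean(a) - fmean(b)) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z_score)))
    return z_score, p_value


sample_1 = [52.0, 48.0] * 18  # 36 made-up observations with mean 50
sample_2 = [49.0, 47.0] * 18  # 36 made-up observations with mean 48
z_score, p_value = two_sample_z(sample_1, sample_2, sigma_a=3.0, sigma_b=3.0)
print(z_score, p_value)
```

Here the p-value is well below 0.05, so at a confidence level of 0.95 the two means would be judged different.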
Two Sample F-Test
Select the Two Sample F-test to assess whether the variances of two populations (A and B) are equal. The test reports an F-score, which is the ratio of the two sample variances, and the degrees of freedom, which are the number of values in a study that are free to vary.
- Infoveave supports three ways to create samples to execute the test. These are:
- Category: used to create samples based on two different categories.
- Measure: used to create samples based on two different measures.
- Date: used to create samples based on your selected date range and intervals.
- Select the Paired checkbox when a one-to-one relationship exists between the values in the two data samples.
- If you have selected category to create two samples category-wise, then follow these steps to perform the F-test:
- Select the measure from the drop-down list on which the variance is calculated.
- Select one dimension as a category and two dimension-items whose variances are compared.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test to run the analysis based on your configuration.
- After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
- The box-plot chart displays a comparison between the two samples based on the common measure.
- If the p-value is greater than 0.05 (or 5 percent), the test finds no statistically significant difference between the variances.
- If you have selected measure to create two samples measure-wise, then follow these steps to perform the F-test:
- Select the measures from the drop-down list to compare the variances of two measures.
- Select one dimension as a category and two dimension-items whose variances are compared.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test to run the analysis based on your configuration.
- After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
- The box-plot chart displays a comparison between the two samples based on the common measure.
- If the p-value is less than 0.05, it can be concluded that there is a difference between the variances. The further the F-score is from 1, the larger the difference between the variances of the samples you are testing.
- If you have selected date to create two samples date-wise, then follow these steps to perform the F-test:
- Select measure and dimension from the respective drop-down list.
- Select day, month, or year as the interval for the date range.
- Select range 1 and range 2 to calculate the variance for each date range.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test if you want to run an analysis on the basis of your configuration.
- After running the test, Infoveave will display observations and means of dimensions based on the measure you have selected, in a tabular format.
- The box-plot chart will display a comparison between two samples based on the common measure.
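The F-score itself is simple to compute: it is the ratio of the two sample variances, with one set of degrees of freedom per sample. The sketch below uses a hypothetical helper and made-up data. Converting the F-score into a p-value requires the F distribution, which is not part of the Python standard library, so that step is omitted.

```python
import statistics


def f_score(a, b):
    """Ratio of sample variances with numerator and denominator df."""
    ratio = statistics.variance(a) / statistics.variance(b)
    return ratio, len(a) - 1, len(b) - 1


sample_a = [10.0, 12.0, 14.0, 16.0, 18.0]  # made-up values, variance 10
sample_b = [11.0, 12.0, 13.0, 14.0, 15.0]  # made-up values, variance 2.5
ratio, df_a, df_b = f_score(sample_a, sample_b)
print(ratio, df_a, df_b)
```

An F-score close to 1 suggests similar variances. In this example the first sample is four times as variable as the second.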
Two Sample K-S Test
Select the Two Sample K-S test to check whether two samples come from the same distribution.
- Infoveave supports three ways to create samples to execute the test. These are:
- Category: used to create samples based on two different categories.
- Measure: used to create samples based on two different measures.
- Date: used to create samples based on your selected date range and intervals.
- Select the Paired checkbox when a one-to-one relationship exists between the values in the two data samples.
- If you have selected category to create two samples category-wise, then follow these steps to perform the K-S test:
- Select the measure from the drop-down list on which the test is run.
- Select one dimension as a category and two dimension-items whose distributions are compared.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test to run the analysis based on your configuration.
- After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
- The box-plot chart displays a comparison between the two samples based on the common measure.
- If the p-value is greater than 0.05 (or 5 percent), the test finds no statistically significant difference between the distributions.
- If you have selected measure to create two samples measure-wise, then follow these steps to perform the K-S test:
- Select the measures from the drop-down list to compare the distributions of two measures.
- Select one dimension as a category and two dimension-items whose distributions are compared.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test to run the analysis based on your configuration.
- After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
- The box-plot chart displays a comparison between the two samples based on the common measure.
- If the p-value is less than 0.05, it can be concluded that there is a difference between the distributions. The larger the K-S score, the larger the difference between the samples you are testing.
- If you have selected date to create two samples date-wise, then follow these steps to perform the K-S test:
- Select measure and dimension from the respective drop-down list.
- Select day, month, or year as the interval for the date range.
- Select range 1 and range 2 to define the two date ranges to compare.
- Select the alternative for computing the statistical significance of a parameter inferred from a data set.
- Select the confidence level and the Mu value. By default, Infoveave sets the confidence level to 0.95.
- Click Run Test if you want to run an analysis on the basis of your configuration.
- After running the test, Infoveave will display observations and means of dimensions based on the measure you have selected, in a tabular format.
- The box-plot chart will display a comparison between two samples based on the common measure.
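The two-sample K-S statistic is the largest vertical gap between the empirical cumulative distribution functions of the two samples. The sketch below computes only that statistic, using a hypothetical helper and made-up data. The p-value depends on the Kolmogorov distribution and is not computed here.

```python
import bisect


def ks_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    sorted_a, sorted_b = sorted(a), sorted(b)
    largest_gap = 0.0
    for x in sorted_a + sorted_b:
        # Empirical CDF: share of observations less than or equal to x
        cdf_a = bisect.bisect_right(sorted_a, x) / len(sorted_a)
        cdf_b = bisect.bisect_right(sorted_b, x) / len(sorted_b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


sample_1 = [1, 2, 3, 4, 5]  # made-up values
sample_2 = [4, 5, 6, 7, 8]  # made-up values, shifted upward
d_statistic = ks_statistic(sample_1, sample_2)
print(d_statistic)
```

The statistic ranges from 0 (identical empirical distributions) to 1 (no overlap at all). Unlike the T-test and Z-test, it is sensitive to differences in shape and spread, not only in the mean.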
Managing the Statistical Analysis
Statistical analysis can be managed with the options to share, edit, move to a folder, add a description, and delete.
- Share the Statistical analysis with a team or role using the Share option.
- Choose between the options
- Share with user
- Share with role
- To share with users, select the “share with users” tab. In the Share dialog box, pick the user(s) with whom the Statistical analysis needs to be shared.
- To make a Statistical analysis available to a certain work role, select the “share with role” tab and choose a role from the dialog box.
- Select the users from the presented list of users.
- If you wish to share your Statistical analysis with all, then select Share With Everyone.
- Select the user from the available list and click to share the Statistical analysis.
- To remove a particular user/role from the shared list, select the user/role and click .
Click Save to share your Statistical analysis with the selected users.
- Users with whom the Statistical analysis is shared receive a notification after a successful share.
- To unshare a Statistical analysis, deselect the users or clear Share With Everyone.
- Edit any of your Statistical analysis with the Edit option.
- To move to a folder, create a folder by clicking the New Folder option, then drag the Statistical analysis into the desired folder using the drag-and-drop feature.
- You may also choose a folder from a drop-down menu by clicking the folder symbol on the Statistical analysis.
- Click “Add Description” and fill out the description details to add a description to the Statistical analysis.
- Select the Editor Type.
- Add necessary Tags (if required).
- Add Description and Content.
- Click Ok.
- If you want to delete a Statistical analysis, click Delete.
- Type “delete” in the text box.
- Click Yes.