Data Mining
Statistical analysis is the process of collecting and examining data samples to uncover patterns and trends. It lets you explore and analyze datasets efficiently. With Infoveave, you can use the Data Mining tool to perform statistical analysis and dig deeper into your dataset.
The section covers details on
Statistical Analysis
Using the Infoveave Data Mining tool for statistical analysis
 To set up and perform statistical analysis on your dataset, based on the measures and dimensions defined in the datasource, click Analysis > Data Mining.

 All statistical analyses, both those created by you and those shared with you, are displayed.
 To create a new analysis, click New. A dialog box is displayed.
 Provide a name for the statistical analysis.
 Select the query from the dropdown list and click Save.

 To learn about creating Queries, see here.
You can find the newly added statistical analysis under the tab My Data Mining.
 To execute the analysis, click on the icon.
 You will now be directed to the Add Filter dialog box. Define the date range and set the conditional filters.
 To proceed with the chosen configuration, click Apply.
The filter option appears only if your SQL queries have filters applied.
Categories in Statistical Analysis
The following sections explain the four categories of statistical analysis outcomes. The analysis is categorized into four tabs: Statistical Summary, Correlation, Clustering, and Test.
Statistical Summary
 Infoveave analyzes the dataset and categorizes the summary information into two sections.

 Numeric
 Under the Numeric section, only numerical variables are displayed. You can visualize these stats by using the column chart and boxplot.
 Categorical
 Under the Categorical section, variable information regarding frequency and proportions is displayed.
 Numeric
 To apply dimensional filters click on Filters.
 To reconfigure the dimension and measures, click on Select Category and click Apply.
 Click to view variable information regarding frequency and proportions according to observations which are displayed in the tabular format and column chart based on frequency.
 In the Numeric section, click the column chart option if you want to visualize the data as a column chart. The visualize dialog box displays the Skewness chart.
 Select the binning method from the dropdown list to visualize the Column Chart.
 Infoveave offers four binning methods for different scenarios: Sturges, Square Root, Freedman-Diaconis, and Scott.
 Click Save to save the histogram or the summary as a widget.
Binning is a way to group a number of more or less continuous values into a smaller number of bins.
 In the Numeric section, click on boxplot if you want to visualize the data as a boxplot chart. The visualize dialog box is displayed.
 Select the dimension from the dropdown list to visualize the boxplot chart.
 Click Save to save the visualization as a widget.
Terminologies in Statistical Info.
 Unique: Number of distinct values in the given dataset sample.
 Min: Smallest value in the sample dataset.
 1st Quartile: Middle number between the smallest number and the median of the dataset.
 Median: Middle number in a sorted (ascending or descending) list of numbers.
 Mode: Most frequently occurring value in the dataset.
 Mean: Average value of the dataset.
 3rd Quartile: Middle number between the largest number and the median of the dataset.
 Max: Largest value in the sample dataset.
 Standard Deviation: Measure of the amount of variation or dispersion of the values in a dataset.
 Variance: Expectation of the squared deviation of a variable from its mean. It is the square of the standard deviation.
 Skewness: Measure of the degree of lopsidedness (asymmetry) in the frequency distribution. A value of zero indicates that both sides of the curve are symmetric.
 Kurtosis: Measure of the degree of tailedness in the frequency distribution. It indicates the amount of probability in the tails.
 Variable Name: Defines the variables available to categorize in a dataset.
 Unique: Number of unique values available in the variable.
 Frequency: Number of times a value occurs.
 Proportion: Fraction of the total observations that share the same value.
 Sturges: Rule that sets the number of bins from the number of observations n, as ceil(log2(n)) + 1.
 Square Root: Rule that sets the number of bins to the square root of the number of observations. Changing the bin width or the number of bins can radically change the shape of the histogram.
 Freedman-Diaconis: Rule that sets the bin width from the interquartile range (IQR), as 2 × IQR / n^(1/3).
 Scott: Rule that sets the bin width from the standard deviation s, as 3.49 × s / n^(1/3). It is optimal for random samples of normally distributed data, in the sense that it minimizes the integrated mean squared error of the density estimate.
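The four binning rules can be compared on the same data with a short sketch. The code below uses the textbook formulas for each rule and only the Python standard library; the function name `suggested_bins` is hypothetical, and the exact formulas and rounding that Infoveave applies internally are not documented here, so treat the numbers as illustrative.

```python
import math
import statistics


def suggested_bins(values):
    """Return the bin count suggested by each of the four rules."""
    n = len(values)
    spread = max(values) - min(values)

    def count_from_width(width):
        # Convert a bin width into a bin count covering the full data range.
        return max(1, math.ceil(spread / width)) if width > 0 else 1

    # Quartiles for the Freedman-Diaconis rule (IQR = Q3 - Q1).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        # Sturges: k = ceil(log2(n)) + 1
        "sturges": math.ceil(math.log2(n)) + 1,
        # Square root: k = ceil(sqrt(n))
        "square_root": math.ceil(math.sqrt(n)),
        # Freedman-Diaconis: width = 2 * IQR / n^(1/3)
        "freedman_diaconis": count_from_width(2 * (q3 - q1) / n ** (1 / 3)),
        # Scott: width = 3.49 * s / n^(1/3), s = sample standard deviation
        "scott": count_from_width(3.49 * statistics.stdev(values) / n ** (1 / 3)),
    }


print(suggested_bins(list(range(1, 101))))
```

Rules based on the number of observations alone (Sturges, Square Root) ignore how spread out the data is, while the width-based rules (Freedman-Diaconis, Scott) adapt to it, which is why they can suggest different bin counts for the same measure.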
Correlation
Correlation is a statistical technique that explains the relationship between two or more variables present in the dataset. It identifies the relationship between measures and dimensions. The relationship between variables can be positive or negative.
 Click the Correlation tab to check the correlation between the measures through a Scatter plot and a Bubble chart.

 Infoveave analyzes the dataset and visualizes on the chart based on correlation methods. Based on your requirements, you can add or remove the columns.
 Select the correlation method from the dropdown list:

 Pearson correlation: Measures the linear correlation between two variables x and y. It has a value between -1 and +1.
 Spearman: Assesses how well the relationship between two variables can be described using a monotonic function.
 Click Save to save as a Scatter Chart or Heat Map.
Correlation values range from -1 to +1.
 If the correlation value is positive, there is a positive relationship between the first measure and the second measure, i.e., as one measure increases, the other measure also increases.
 If the correlation value is negative, there is a negative relationship between the first measure and the second measure, i.e., as one measure increases, the other measure decreases.
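The difference between the two correlation methods can be seen in a minimal sketch. The code below computes the Pearson coefficient from its definition and the Spearman coefficient as the Pearson coefficient of the ranks. The measure names `sales` and `profit` and their values are made up for illustration, and this is not a description of how Infoveave computes the values internally.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranked values."""
    return pearson(ranks(x), ranks(y))


sales = [1, 2, 3, 4, 5]          # hypothetical measure
profit = [1, 4, 9, 16, 25]       # grows monotonically, but not linearly
print(round(pearson(sales, profit), 4), spearman(sales, profit))
```

Because `profit` always rises with `sales`, the Spearman value is exactly 1, while the Pearson value is slightly below 1 since the relationship is not a straight line.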
Clustering
The classification of items into relative clusters that exhibit comparable traits or patterns in a dataset is known as clustering. When conducting a clustering analysis, you can choose any two measures and also determine the number of clusters.
 Click the Clustering tab to divide the population or data points into groups, known as clusters, whose members share similar characteristics. The result is shown in a cluster chart.
 Select the measures for the X-axis and Y-axis of the chart from the dropdown list. Select the number of clusters for the cluster chart. Based on your selection, Infoveave analyzes the dataset and forms that number of clusters in the chart.
 The analysis shows the number of clusters, the X-axis, the Y-axis, and the Number of Observations.
By default, Infoveave sets the number of clusters to 4. You can change the number of clusters based on your needs.
 Click in the tabular data to get the pivot view of the respective cluster data. This pivot table shows details about all records involved in forming that cluster.
 Click Save as widget to save as a Cluster chart or Table. Select the widget type, assign a widget name, and click Save to save it as a widget.
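To illustrate what grouping two measures into a chosen number of clusters involves, here is a minimal sketch using k-means (Lloyd's algorithm). This document does not state which clustering algorithm Infoveave uses, so k-means is an assumption made purely for illustration, and the function name `kmeans` and the sample points are hypothetical.

```python
import math
from collections import Counter


def kmeans(points, k=4, iterations=100):
    """Group 2-D (x, y) points into k clusters with Lloyd's k-means algorithm."""
    # Deterministic start: pick k points spread evenly through the sorted input.
    ordered = sorted(points)
    step = len(ordered) / k
    centroids = [ordered[int(i * step)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the clusters are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels, centroids


points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, Counter(labels))  # cluster label per point, observations per cluster
```

The count of points per label corresponds to the Number of Observations reported for each cluster, and the default `k=4` mirrors the default number of clusters mentioned above.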
Test
 Click on the Test tab and select the test from the dropdown.
 Infoveave supports eight different data mining tests:
 Chi-square test: to test whether two categorical variables are associated.
 Equal or given proportion test: to test whether the proportions in two or more populations are equal.
 One sample T-test: to determine whether a sample of observations could have been generated by a process with a specific mean.
 Two sample T-test: to compare two independent groups to see if their means are different.
 One sample Z-test: to determine whether a sample mean differs from a specified population mean when the variance is known and the sample size is large.
 Two sample Z-test: to compare two sample groups to determine if they have originated from the same population.
 Two sample F-test: to assess whether the variances of two populations are equal.
 Two sample KS test: to test whether two samples come from the same distribution.
Chi-Square Test
The Chi-Square test is used to test for an association between two categorical variables. The null hypothesis of the Chi-Square test is that no relationship exists between the categorical variables in the population; they are independent.
After selecting a test, select dimensions as category 1 and category 2, and a measure as frequency, from their respective dropdown lists. Infoveave displays tabular and cluster charts based on the measure and dimensions you have selected.
By default, Infoveave displays the observed value in the tabular and cluster charts. There are four different values to display in the visualization:
 Observed: Actual count in the dataset.
 Expected: Count that would be expected if the two categories were independent.
 Residuals: Deviation of the observed value from the expected value.
 Contribution: Share of the Chi-Square statistic contributed by each cell.
Select based on your requirements.
Infoveave calculates values such as X-squared, degrees of freedom (df), and p-value, which help you decide whether the two categories are dependent.
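The four displayed values and the X-squared statistic are related, as the following sketch shows for a small contingency table. It assumes Pearson residuals, (observed - expected) / sqrt(expected), and contributions expressed as a percentage of X-squared; these are common conventions rather than a confirmed description of Infoveave's output. The p-value is omitted because the Python standard library has no chi-square distribution, and no continuity correction is applied.

```python
import math


def chi_square_table(observed):
    """Return expected counts, residuals, contributions (%), X-squared and df."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    # Pearson residual for each cell (assumed convention).
    residuals = [
        [(o - e) / math.sqrt(e) for o, e in zip(obs_row, exp_row)]
        for obs_row, exp_row in zip(observed, expected)
    ]
    # X-squared is the sum of the squared residuals.
    x_squared = sum(r * r for row in residuals for r in row)
    # Contribution of each cell as a percentage of X-squared.
    contribution = [
        [100 * r * r / x_squared if x_squared else 0.0 for r in row]
        for row in residuals
    ]
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, residuals, contribution, x_squared, df


# Hypothetical frequencies: rows are category 1 items, columns are category 2 items.
expected, residuals, contribution, x_squared, df = chi_square_table([[10, 20], [20, 10]])
print(expected, round(x_squared, 4), df)
```

A large X-squared relative to the degrees of freedom gives a small p-value, which suggests that the two categories are not independent.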
Equal or Given Proportions Test
The Equal or Given Proportions test is used to measure whether the proportion of a specific characteristic differs between populations. Select the Equal or Given Proportions test to check whether the proportions in two or more populations are equal.
After selecting a test, select dimensions as category 1 and category 2 from their respective dropdown lists. Select the confidence value; Infoveave sets the confidence value to 0.95 by default. Infoveave displays tabular and cluster charts based on the dimensions and confidence value you have selected.
Infoveave calculates values such as X-squared, degrees of freedom (df), and p-value, which help you decide whether the proportions of the two categories differ.
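For the two-population case, the test can be sketched with the pooled two-proportion z statistic, whose square equals the X-squared value with one degree of freedom. The function name `two_proportion_test`, its parameters, and the sample counts are hypothetical, and no continuity correction is applied, so results from a tool that applies one may differ slightly.

```python
import math
from statistics import NormalDist


def two_proportion_test(successes_a, n_a, successes_b, n_b, confidence=0.95):
    """Test whether two proportions are equal; returns X-squared, df, p-value, decision."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled proportion under the null hypothesis that both proportions are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    x_squared = z * z  # chi-square statistic with df = 1
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Reject equality when the p-value falls below 1 - confidence.
    reject = p_value < 1 - confidence
    return x_squared, 1, p_value, reject


print(two_proportion_test(80, 100, 50, 100))
```

With the default confidence of 0.95, the null hypothesis of equal proportions is rejected when the p-value is below 0.05.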
One Sample T-Test
The One-Sample T-Test helps to determine whether a sample of observations could have been generated by a process with a specific mean.
After selecting a test, select a measure from the dropdown list. Select the alternative value (Two-Sided, Less, or Greater) with respect to the μ (mu) value. Select the confidence value; Infoveave sets the confidence value to 0.95 by default.
Click Run Test. Infoveave displays tabular and boxplot charts based on the measure, dimensions and confidence value you have selected.
Infoveave calculates values such as T-score and p-value, which help you decide whether to reject the null hypothesis.
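The T-score for a one-sample test follows directly from the sample mean, the sample standard deviation, and the μ value. The sketch below computes the T-score and the degrees of freedom only; the p-value is left out because the Python standard library does not include the t-distribution. The function name and sample values are hypothetical.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return the T-score and degrees of freedom for testing mean == mu."""
    n = len(sample)
    # Standard error of the mean uses the sample standard deviation.
    std_err = statistics.stdev(sample) / math.sqrt(n)
    t_score = (statistics.mean(sample) - mu) / std_err
    return t_score, n - 1


print(one_sample_t([2, 4, 6, 8, 10], mu=3))
```

A T-score far from zero means the sample mean lies many standard errors away from μ, which corresponds to a small p-value.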
Two Sample T-Test
The Two Sample T-Test is used when you want to compare two independent groups to see if their means are different. The larger the T-score, the larger the difference between the groups you are testing.
 Infoveave supports three ways to create samples to execute the test. These are:
 Category: used to create samples based on two different categories.
 Measure: used to create samples based on two different measures.
 Date: used to create samples based on your selected date range and intervals.
 Select the Paired checkbox when a one-to-one relationship exists between the values in the two data samples.
 If you have selected category to create two samples category-wise, follow these steps to perform the T-test:
 Select the measure on which the mean is calculated from the dropdown.
 Select one dimension as a category and two dimension items whose means are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
 If the p-value is greater than 0.05 (or 5 percent), there is not enough evidence to conclude that the means differ.
 If you have selected measure to create two samples measure-wise, follow these steps to perform the T-test:
 Select the two measures whose means are compared from the dropdown.
 Select one dimension as a category and two dimension items whose means are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
 If the p-value is less than 0.05, it can be concluded that there is a statistically significant difference between the means. The larger the absolute T-score, the larger the difference between the samples you are testing.
 If you have selected date to create two samples date-wise, follow these steps to perform the T-test:
 Select the measure and dimension from the respective dropdown lists.
 Select day, month, or year as the interval for the date range.
 Select range 1 and range 2, the two date ranges whose means are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
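The effect of the Paired checkbox can be illustrated by computing the T-score both ways. The sketch below assumes Welch's unequal-variance form for the unpaired case, which is one common choice and not necessarily the variant Infoveave applies; the paired case is a one-sample test on the pairwise differences. Function names and sample values are hypothetical, and the p-value is omitted because the standard library has no t-distribution.

```python
import math
import statistics


def welch_t(a, b, mu=0.0):
    """Unpaired T-score (Welch) and approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t_score = (statistics.mean(a) - statistics.mean(b) - mu) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_score, df


def paired_t(a, b, mu=0.0):
    """Paired T-score: a one-sample t-test on the pairwise differences."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return (statistics.mean(diffs) - mu) / std_err, n - 1


before = [1, 2, 3, 4, 5]     # hypothetical sample 1
after = [2, 4, 6, 8, 10]     # hypothetical sample 2, matched one-to-one
print(welch_t(before, after), paired_t(before, after))
```

On matched data the paired form usually gives a T-score further from zero, because it removes the variation shared by each pair.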
One Sample Z-Test
The One Sample Z-Test determines whether a sample mean differs from a specified population mean when the variance is known and the sample size is large.
 After selecting a test, select a measure from the dropdown list. Select the alternative value (Two-Sided, Less, or Greater) with respect to the μ (mu) value.
 Select the confidence value; Infoveave sets the confidence value to 0.95 by default.
 Click Run Test. Infoveave displays tabular and boxplot charts based on the measure, dimensions and confidence value you have selected.
 Infoveave calculates values such as Z-score and p-value, which help you decide whether to reject the null hypothesis.
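The three alternative values change only how the p-value is read from the normal distribution, as the sketch below shows. It assumes the population standard deviation `sigma` is known and supplied by the caller; the function name, parameter names, and sample values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def one_sample_z(sample, mu, sigma, alternative="two-sided"):
    """Return the Z-score and p-value for testing the sample mean against mu."""
    n = len(sample)
    z = (statistics.mean(sample) - mu) / (sigma / math.sqrt(n))
    cdf = NormalDist().cdf(z)
    if alternative == "less":       # alternative: true mean is below mu
        p_value = cdf
    elif alternative == "greater":  # alternative: true mean is above mu
        p_value = 1 - cdf
    else:                           # two-sided: true mean differs from mu
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


sample = [50, 52, 54, 56, 58]  # hypothetical observations
for alt in ("two-sided", "less", "greater"):
    print(alt, one_sample_z(sample, mu=50, sigma=4, alternative=alt))
```

With a confidence value of 0.95, the null hypothesis is rejected when the p-value for the chosen alternative is below 0.05.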
Two Sample Z-Test
The Two Sample Z-Test lets you compare two sample groups to determine if they have originated from the same population.
 Infoveave supports three ways to create samples to execute the test. These are:
 Category: used to create samples based on two different categories.
 Measure: used to create samples based on two different measures.
 Date: used to create samples based on your selected date range and intervals.
 Select the Paired checkbox when a one-to-one relationship exists between the values in the two data samples.
 If you have selected category to create two samples category-wise, follow these steps to perform the Z-test:
 Select the measure on which the mean is calculated from the dropdown.
 Select one dimension as a category and two dimension items whose means are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
 If the p-value is greater than 0.05 (or 5 percent), there is not enough evidence to conclude that the means differ.
 If you have selected measure to create two samples measure-wise, follow these steps to perform the Z-test:
 Select the two measures whose means are compared from the dropdown.
 Select one dimension as a category and two dimension items whose means are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
 If the p-value is less than 0.05, it can be concluded that there is a statistically significant difference between the means. The larger the absolute Z-score, the larger the difference between the samples you are testing.
 If you have selected date to create two samples date-wise, follow these steps to perform the Z-test:
 Select the measure and dimension from the respective dropdown lists.
 Select day, month, or year as the interval for the date range.
 Select range 1 and range 2, the two date ranges whose means are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to compare with the estimated mean. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
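The role of the confidence level in a two-sample Z-test can be sketched by building the confidence interval for the difference of the means alongside the Z-score. The code assumes both population standard deviations are known and passed in; the function name, parameters, and sample values are hypothetical.

```python
import math
import statistics
from statistics import NormalDist


def two_sample_z(a, b, sigma_a, sigma_b, mu=0.0, confidence=0.95):
    """Return the Z-score, two-sided p-value, and confidence interval for mean(a) - mean(b)."""
    diff = statistics.mean(a) - statistics.mean(b)
    std_err = math.sqrt(sigma_a ** 2 / len(a) + sigma_b ** 2 / len(b))
    z = (diff - mu) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Critical value for the chosen confidence level (about 1.96 for 0.95).
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    interval = (diff - z_crit * std_err, diff + z_crit * std_err)
    return z, p_value, interval


group_a = [10, 12, 14, 16, 18]  # hypothetical sample 1
group_b = [11, 13, 15, 17, 19]  # hypothetical sample 2
print(two_sample_z(group_a, group_b, sigma_a=3, sigma_b=3))
```

When the interval contains the Mu value (0 here), the two-sided p-value is above 1 - confidence, so the data do not show a significant difference between the means.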
Two Sample F-Test
Select the Two Sample F-test to assess whether the variances of two populations (A and B) are equal. The test gives an F-score, which is the ratio of the two sample variances, and the degrees of freedom, which are the number of values in a study that have the freedom to vary.
 Infoveave supports three ways to create samples to execute the test. These are:
 Category: used to create samples based on two different categories.
 Measure: used to create samples based on two different measures.
 Date: used to create samples based on your selected date range and intervals.
 Select the Paired checkbox when a one-to-one relationship exists between the values in the two data samples.
 If you have selected category to create two samples category-wise, follow these steps to perform the F-test:
 Select the measure on which the variance is calculated from the dropdown.
 Select one dimension as a category and two dimension items whose variances are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to test against. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
 If the p-value is greater than 0.05 (or 5 percent), there is not enough evidence to conclude that the variances differ.
 If you have selected measure to create two samples measure-wise, follow these steps to perform the F-test:
 Select the two measures whose variances are compared from the dropdown.
 Select one dimension as a category and two dimension items whose variances are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to test against. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
 If the p-value is less than 0.05, it can be concluded that there is a statistically significant difference between the variances. The further the F-score is from 1, the larger the difference between the variances of the samples you are testing.
 If you have selected date to create two samples date-wise, follow these steps to perform the F-test:
 Select the measure and dimension from the respective dropdown lists.
 Select day, month, or year as the interval for the date range.
 Select range 1 and range 2, the two date ranges that form the samples.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to test against. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
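The F-score itself is simple to compute: it is the ratio of the two sample variances, with one degrees-of-freedom value per sample. The sketch below shows only that calculation; the p-value is omitted because the Python standard library has no F-distribution, and the function name and sample values are hypothetical.

```python
import statistics


def f_test(a, b):
    """Return the F-score (variance ratio) and the two degrees of freedom."""
    f_score = statistics.variance(a) / statistics.variance(b)
    return f_score, len(a) - 1, len(b) - 1


sample_a = [1, 2, 3, 4, 5]    # hypothetical sample A
sample_b = [2, 4, 6, 8, 10]   # hypothetical sample B, twice as spread out
print(f_test(sample_a, sample_b))
```

An F-score close to 1 is consistent with equal variances, while a value far above or below 1 points to a difference in spread.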
Two Sample KS Test
Select the Two Sample KS test to test whether two samples come from the same distribution.
 Infoveave supports three ways to create samples to execute the test. These are:

 Category: used to create samples based on two different categories.
 Measure: used to create samples based on two different measures.
 Date: used to create samples based on your selected date range and intervals.
 Select the Paired checkbox when a one-to-one relationship exists between the values in the two data samples.
 If you have selected category to create two samples category-wise, follow these steps to perform the KS test:

 Select the measure whose distribution is compared from the dropdown.
 Select one dimension as a category and two dimension items whose distributions are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to test against. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
 If the p-value is greater than 0.05 (or 5 percent), there is not enough evidence to conclude that the distributions differ.
 If you have selected measure to create two samples measure-wise, follow these steps to perform the KS test:

 Select the two measures whose distributions are compared from the dropdown.
 Select one dimension as a category and two dimension items whose distributions are compared.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to test against. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
 If the p-value is less than 0.05, it can be concluded that there is a statistically significant difference between the distributions. The larger the KS score, the larger the difference between the samples you are testing.
 If you have selected date to create two samples date-wise, follow these steps to perform the KS test:

 Select the measure and dimension from the respective dropdown lists.
 Select day, month, or year as the interval for the date range.
 Select range 1 and range 2, the two date ranges that form the samples.
 Select the alternative hypothesis used to compute the statistical significance.
 Select the confidence level and the Mu value that you want to test against. By default, Infoveave sets the confidence level to 0.95.
 Click Run Test to run the analysis with your configuration.
 After running the test, Infoveave displays the observations and means of the dimensions, based on the measure you have selected, in a tabular format.
 The boxplot chart displays a comparison between the two samples based on the common measure.
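The KS score measures the largest gap between the empirical cumulative distribution functions of the two samples. The following sketch computes that statistic only, without the p-value; the function name `ks_statistic` and the sample values are hypothetical.

```python
import bisect


def ks_statistic(a, b):
    """Largest absolute gap between the two empirical cumulative distributions."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    # The gap can only change at observed values, so check each of them.
    for x in a_sorted + b_sorted:
        cdf_a = bisect.bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect.bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

The statistic ranges from 0, when the two samples have identical empirical distributions, to 1, when the samples do not overlap at all.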
Managing the Statistical Analysis
Statistical analysis can be managed with the options to share, edit, move to a folder, add a description, and delete.
 Share the Statistical analysis with the team/role from the share option.
 Choose between the options
 Share with user
 Share with role
 To share with users, select the “Share with user” tab. Pick the user(s) with whom the Statistical analysis needs to be shared from the share dialog box.
 Choose the “share with role” tab to make a Statistical analysis available to a certain work role. From the dialog box, choose a role.
 Select the users from the presented list of users.
 If you wish to share your Statistical analysis with all, then select Share With Everyone.
 Select the user from the available list and click to share the Statistical analysis.
 To remove a particular user/role from the shared list, select the user/role and click .
Click Save to share your Statistical analysis with the selected users.
 Users with whom the Statistical analysis is shared receive a notification after a successful share.
 To unshare a Statistical analysis, unselect the users or uncheck Share With Everyone.
 Edit any of your Statistical analyses with the Edit option.
 To move to a folder, create a folder by clicking the New Folder option, then drag the Statistical analysis into the desired folder using the drag-and-drop feature.
 You may also choose a folder from a dropdown menu by clicking the folder symbol on the Statistical analysis.
 Click “Add Description” and fill out the description details to add a description to the Statistical analysis.
 Select the Editor Type.
 Add necessary Tags (if required).
 Add Description and Content.
 Click OK.
 If you want to delete a Statistical analysis, click Delete.
 Type “delete” in the text box.
 Click Yes.