Data Testing
Before proceeding to data analysis, it is imperative to verify your measuring instrument's reliability and validity, not only to support the generalisability of your results, but also to check whether the data can support the application of advanced multivariate techniques such as exploratory factor analysis and multiple regression analysis.
This post will mainly focus on:
- Reliability
- Validity
- Sampling adequacy
- Normality
Reliability
The reliability of a measure is the degree to which a measurement technique can be depended upon to secure consistent results upon repeated application (Weiner et al., 2017). Reliability also captures the internal consistency of a measuring instrument, reflected in the degree of intercorrelation among a set of items. There are four general classes of reliability estimates, namely inter-rater, test-retest, parallel-forms and internal consistency reliability.
At dissertation level, a reliability test is most often used to determine the internal consistency of a measuring instrument administered to a group of people on a single occasion. The reliability of the instrument is judged by estimating how well the items that reflect the same construct yield similar results, that is, by the degree of intercorrelation among a set of items or variables.
There is a variety of internal consistency measures that can be used, including average inter-item correlation, average item-total correlation and split-half reliability. In split-half reliability, the items that purport to measure the same construct are randomly divided into two sets. The entire instrument is administered to a sample of people and a total score is calculated for each randomly divided half. The split-half reliability estimate is simply the correlation between these two total scores.
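To make the procedure concrete, here is a minimal sketch of a split-half estimate in Python, assuming item responses are held in a NumPy array with one row per respondent and one column per item (the data are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated Likert responses: one row per respondent, one column per item
responses = rng.integers(1, 6, size=(100, 10)).astype(float)

# Randomly divide the ten items into two halves
order = rng.permutation(responses.shape[1])
half_a = responses[:, order[:5]].sum(axis=1)
half_b = responses[:, order[5:]].sum(axis=1)

# Split-half reliability: the correlation between the two total scores
split_half_r = np.corrcoef(half_a, half_b)[0, 1]
print(f"Split-half reliability estimate: {split_half_r:.3f}")
```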
Cronbach's Alpha is mathematically equivalent to the average of all possible split-half estimates. When there are multiple Likert questions in a measuring instrument (as in most quantitatively-oriented questionnaires used in dissertations), the Cronbach alpha coefficient is the most commonly used measure of internal consistency (Laerd Dissertation, 2012). The purpose of such a test is to verify whether the items show any form of dimensionality or underlying constructs (Ahmad and Sabri, 2013) in view of applying multivariate techniques. However, it should be noted that Cronbach's alpha does not establish the unidimensionality of a measurement procedure; that is, it cannot confirm that the procedure measures only one construct rather than several constructs at once (Laerd Dissertation, 2012).
Although there is no distinct cut-off point for a reliability coefficient, a Cronbach Alpha value of at least 0.7 is generally accepted as evidence of reliability (Abraham and Barker, 2014), and in some cases values as low as 0.6 may be acceptable (Malhotra, 2019). At the upper end, Tavakol and Dennick (2011) argued that a coefficient exceeding 0.95 might mean that some items in the measuring instrument are redundant.
Note
In SPSS, the Cronbach's Alpha coefficient for a set of items may be computed using the Analyze > Scale > Reliability Analysis... functionality.
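Outside SPSS, Cronbach's alpha can also be computed directly from its definition, alpha = k/(k-1) × (1 − Σ item variances / variance of total scores). A minimal Python sketch, assuming the same kind of respondents-by-items array as in the split-half example (function name and data are illustrative):

```python
import numpy as np

def cronbach_alpha(responses: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items array."""
    k = responses.shape[1]                         # number of items
    item_vars = responses.var(axis=0, ddof=1)      # variance of each item
    total_var = responses.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative data: 100 respondents, 10 Likert items
rng = np.random.default_rng(0)
responses = rng.integers(1, 6, size=(100, 10)).astype(float)
print(f"Cronbach's alpha: {cronbach_alpha(responses):.3f}")
```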
Validity
The validity of a measure is the degree to which any measurement approach or instrument succeeds in describing or quantifying what it is designed to measure (Weiner et al., 2017). It reflects those errors in measurement that are systematic or constant.
The question of validity is raised in the context of three aspects: the form of the test, the purpose of the test and the population for whom it is intended. It would therefore be meaningless to simply ask "Is the test valid?". Instead, it is more appropriate to ask "How valid is this test for the decision that needs to be made?"
There are several forms of validity, namely content, face, criterion, concurrent and construct validity.
In their seminal study, Haynes et al. (1995, p. 238) defined content validity as "the degree to which elements of an assessment instrument are relevant to and representative of the targeted construct for a particular assessment purpose." In other words, do the questions really assess the construct in question, or are the responses by the person answering the questions influenced by other factors?
Face validity could easily be called surface or appearance validity, since it is merely a subjective, superficial assessment of whether the measurement procedure used in a study appears to be a valid measure of a given variable or construct. It would not be a surprise if the majority of dissertations relied heavily on face validity (also known as logical validity), typically because it is the easiest form of validity to apply. Face validity is arguably the weakest form of validity, and many would suggest that it is not a form of validity in the strictest sense of the word.
Criterion validity reflects the use of a criterion (a well-established measurement procedure) to create a new measurement procedure to measure the construct you are interested in. Note that the criterion and the new measurement procedure must be theoretically related.
The concurrent validity of a measurement procedure is assessed when two different measurement procedures are carried out at the same time. Concurrent validity is established when the scores from a new measurement procedure are directly related to the scores from a well-established measurement procedure for the same construct; that is, there is a consistent relationship between the scores from the two measurement procedures.
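In practice, that consistent relationship is often checked with a simple correlation between the two sets of scores. A minimal sketch using SciPy, with both score arrays simulated purely for illustration:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
established = rng.normal(50, 10, size=80)              # well-established measure
new_measure = established + rng.normal(0, 5, size=80)  # new measure, same construct

# A strong, significant correlation supports concurrent validity
r, p_value = pearsonr(new_measure, established)
print(f"r = {r:.3f}, p = {p_value:.4f}")
```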
Construct validity refers to how well a measure actually measures the construct it intends to measure and is the ultimate goal when developing an assessment instrument (Utvaer and Haugan, 2016). It is for this reason that construct validity is viewed as a process that assesses the validity of a measurement procedure.
Face and content validity are usually tested, though very subjectively, by piloting your questionnaire and gathering feedback from respondents.
Nako and Barnard (2012) suggested a more scientific and objective way of measuring construct validity in the form of factor validity. A factor analysis of the items is conducted and the significance of Bartlett's test of Sphericity is observed. Factor validity simultaneously measures construct validity and checks for dimensionality of variables in order to confirm internal consistency (Ahmad and Sabri, 2013).
The measuring instrument is deemed to have passed the construct validity test if the Sig. value (p-value) is less than 0.05 for Bartlett's test of Sphericity (Field, 2016).
Note
In SPSS, the output for Bartlett's test of Sphericity may be obtained from the Analyze > Dimension Reduction > Factor... functionality. You may leave the Extraction method as Principal Components and use Varimax Rotation.
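For readers working outside SPSS, here is a minimal Python sketch of Bartlett's test of Sphericity, computing the chi-square statistic from its textbook definition; the function name and the simulated data are illustrative assumptions, not part of any particular library:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data: np.ndarray):
    """Bartlett's test of sphericity on a respondents-by-items array."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)  # item correlation matrix
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
    df = p * (p - 1) / 2                    # degrees of freedom
    p_value = chi2.sf(statistic, df)
    return statistic, p_value

rng = np.random.default_rng(2)
data = rng.normal(size=(200, 8))
data[:, 1] += 0.5 * data[:, 0]  # induce some correlation between items
stat, p = bartlett_sphericity(data)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")  # p < 0.05 supports factorability
```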
Sampling Adequacy
The adequacy of a sample is essential for obtaining accurate and reliable findings. Sample adequacy may be verified by conducting a factor analysis of the items, which, as a common rule of thumb, requires at least 10-15 participants per item. In order to conduct a reliable factor analysis, the sample size needs to be large enough (Costello and Osborne, 2005; Tabachnick and Fidell, 2013; Field, 2016). The smaller the sample, the greater the chance that the correlation coefficients between items differ from the correlation coefficients between items in other samples (Field, 2016). Thankfully, there is no need to actually conduct a factor analysis in order to determine whether the sample size is adequate. The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy can signal in advance whether the sample size is large enough to reliably extract factors. A minimum KMO value of 0.5 indicates that a sample is adequate (Field, 2016).
Note
In SPSS, the output for the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is obtained together with that of Bartlett's test of Sphericity from the Analyze > Dimension Reduction > Factor... functionality (see above).
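The KMO measure can also be computed by hand from the correlation and partial correlation matrices: it is the ratio of the sum of squared correlations to the sum of squared correlations plus squared partial correlations. A minimal Python sketch, assuming the same respondents-by-items layout as the Bartlett example (the function name and data are illustrative):

```python
import numpy as np

def kmo(data: np.ndarray) -> float:
    """Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations derived from the inverse correlation matrix
    d = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / d
    # Keep only off-diagonal elements in both matrices
    np.fill_diagonal(corr, 0)
    np.fill_diagonal(partial, 0)
    return (corr ** 2).sum() / ((corr ** 2).sum() + (partial ** 2).sum())

rng = np.random.default_rng(3)
data = rng.normal(size=(200, 8))
data[:, 1] += 0.5 * data[:, 0]  # induce some correlation between items
print(f"KMO = {kmo(data):.3f}")  # values of at least 0.5 indicate adequacy
```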
Normality
Normality testing should be carried out to determine whether parametric methods and tests may be used to analyse the data. According to Laerd Statistics (2018), for sample sizes of less than 2000, the Shapiro-Wilk test should be used instead of the Kolmogorov-Smirnov test.
Note
In SPSS, normality test results (both the Kolmogorov-Smirnov and Shapiro-Wilk) may be obtained by using the Analyze > Descriptive Statistics > Explore... functionality. Insert the variable for which you want to test the normality in the Dependent List box and check Normality plots with tests under Plots...
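Outside SPSS, the Shapiro-Wilk test is available in SciPy. A minimal sketch, with the variable simulated purely for illustration:

```python
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(4)
scores = rng.normal(loc=3.5, scale=0.8, size=120)  # illustrative variable

# p > 0.05: no evidence against normality, so parametric tests may be used
stat, p_value = shapiro(scores)
print(f"W = {stat:.3f}, p = {p_value:.4f}")
```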
References
Abraham, J and Barker, K (2014) “Exploring gender difference in motivation, engagement and enrolment behaviour of senior secondary physics students in New South Wales”, Research in Science Education, Vol. 45, No. 1, pp. 59-73.
Ahmad, NS and Sabri, A (2013) “Assessing the unidimensionality, reliability, validity and fitness of influential factors of 8th grades student's Mathematics achievement in Malaysia”, International Journal of Advance Research, Vol. 1, No. 2, pp. 1-7.
Costello, AB and Osborne, JW (2005) “Best Practices in Exploratory Factor Analysis: Four Recommendations for Getting the Most from Your Analysis”, Practical Assessment, Research and Evaluation, Vol. 10, No. 1, pp. 1-9.
Field, A (2016) Discovering Statistics Using IBM SPSS Statistics (4th edn), Sage Publications Ltd, London.
Haynes, SN, Richard, DCS and Kubany, ES (1995) “Content validity in psychological assessment: A functional approach to concepts and methods”, Psychological Assessment, Vol. 7, No. 3, pp. 238-247.
Laerd Dissertation (2012) “Reliability in Research” [online] Available from http://dissertation.laerd.com/reliability-in-research-p3.php
Laerd Statistics (2018) “Testing for Normality using SPSS Statistics” [online] Available from https://statistics.laerd.com/spss-tutorials/testing-for-normality-using-spss-statistics.php
Malhotra, NK (2019) Marketing research: An applied orientation (7th edn), Pearson/Prentice Hall, Upper Saddle River, NJ.
Nako, Z and Barnard, A (2012) “Construct validity of competency dimensions in a leadership assessment and development centre”, African Journal of Business Management, Vol. 6, No. 34, pp. 9730-9737.
Tabachnick, BG and Fidell, LS (2013) Using Multivariate Statistics (6th edn), Pearson Education, Boston.
Tavakol, M and Dennick, R (2011) “Making sense of Cronbach's Alpha”, International Journal of Medical Education, Vol. 2, pp. 53-55.
Utvaer, BKS and Haugan, G (2016) “The Academic Motivation Scale: Dimensionality, Reliability, and Construct Validity Among Vocational Students”, Nordic Journal of Vocational Education and Training, Vol. 6, No. 2, pp. 17-45.
Weiner, BJ, Lewis, CC, Stanick, C, Powell, BJ, Dorsey, CN, Clary, AS, Boynton, MH and Halko, H (2017) “Psychometric assessment of three newly developed implementation outcome measures”, Implementation Science, Vol. 12, No. 1, pp. 1-12.