The validity and reliability of the multiple mini-interview in assessing the capabilities of nursing education PhD candidates: A methodological study

Objective: For a long time, the Single-Station Personal Interview (SSPI) used for nursing PhD admissions evaluated only the candidates’ cognitive skills. The introduction of evaluation methods such as the Multiple Mini-Interview (MMI), with their emphasis on non-cognitive skills, marked a new development in examinations. The aim of this study was to determine the validity and reliability of the MMI in evaluating PhD candidates’ competences. Materials and methods: This methodological study was conducted in September 2015 on PhD candidates at Golestan University of Medical Sciences, Iran. Thirty-eight nursing PhD candidates who had passed the first stage of the PhD exam and intended to attend the interview were recruited through census sampling for the second stage. The data were gathered using checklists based on the headings proposed by the Ministry of Health and Medical Education, in six stations with topics on ethical judgment, psychomotor skills, communication skills, rational reasoning and critical thinking. The face and content validity of the tool were assessed by an expert panel. The criterion validity was determined by measuring the correlation between the mean scores of the first and second stages. The construct validity was evaluated by determining the relationship between the mean score of each station and the mean MMI score. The raters’ agreement was used to assess the reliability. Results: The face and content validity of the tool were approved by the panel of experts. The highest mean score pertained to the thesis station (74.25±9.10). The criterion validity assessment showed no significant correlation between the first-stage written exam scores and the MMI scores (r=0.22, P=0.18). In the evaluation of the construct validity, there was a significant relationship between the mean score of all stations and the MMI score (P<0.001). The rater reliability showed an inter-rater agreement at five stations (P<0.05).
Conclusions: The validity and reliability of the MMI were approved for nursing PhD exams, and PhD interviews held through this method can ensure the consistency and accuracy of postgraduate admissions.


Introduction
Internationally, there has been a gradual increase in the number of students enrolling in postgraduate programs 1 . For many years, the PhD exam held for medical admissions assessed only the knowledge and cognitive capabilities of candidates through a theoretical approach. The interview for candidates who had passed the written exam was held through the Panel Interview (PI), also called the board interview, or the Single-Station Personal Interview (SSPI). Although this traditional method of interviewing was very fast-paced and led to immediate results, it did not examine non-cognitive issues. Studies showed that panel or single-station interviews cannot examine candidates' competency and non-cognitive skills such as critical thinking, communication, and the ability to search. Additionally, in this type of interview the exam blueprint is not clear, and this lack of clarity is associated with cognitive bias in decisions about some candidates. Biases such as the halo effect and the similar-to-me effect can distort the selection of eligible candidates [1][2] . Recent research in the US has shown that the measurement of subjects such as integrity, empathy, ethical judgment and professionalism is fundamental in health and medical programs. Traditional interviews will neither be realistic nor lead to professionalization without taking these issues into account 3 . Moreover, relying solely on the evaluation of cognitive skills in traditional methods for admitting postgraduate students, such as PhD students, will cause serious bias 4 . As a result, more efficient and modern approaches were introduced into the field of education, including the Multiple Mini-Interview (MMI). The MMI was designed in 2001 at McMaster University Medical School in Canada 5 and was then further developed in other institutions, including the University of Calgary 6 .
The MMI consists of three components, namely 'multiple', 'mini' and 'interview'. 'Multiple' indicates the multiplicity of structured interviews for assessing the expected capabilities, 'mini' indicates the brevity of the interviews (which last seven to ten minutes), and 'interview' indicates that the multiple interviews, held instead of a long, traditional, single interview, together comprise a whole. In this method, the reported number of stations ranges from 6-12, 5-10 or 7-15 7 and the stations are concerned with subjects such as problem-solving, reasoning, critical thinking, clinical decision-making, communication skills, ethics and related psychomotor skills, such as searching scientific resources [8][9] . This new method is based on the principles of the Objective Structured Clinical Examination (OSCE), and while maintaining the attributes of a structured interview, it examines non-cognitive capabilities. In the MMI, abilities such as critical thinking and overcoming ethical dilemmas, competences and professionalization ability are assessed to gain a more precise depiction of the candidates. This type of interview evaluates the students' thought processes, thinking abilities and answers to questions from different aspects. The benefits of this method include accuracy and repeatability at each station, which give it superiority over the previous common methods. The MMI measures a greater number of topics; in addition, in terms of scoring, it has clear and independent criteria for each station. Better coordination among the raters, the coherence of the exam content, the allocation of exactly the same amount of time to each candidate at each station and reduced rater bias are other significant advantages of this method 10 . Although the MMI was first introduced to measure non-cognitive traits, it later became apparent that this method also reduces the bias that was frequently observed in panel interviews 11 .
Over time, the MMI became a common, valid and reliable tool in universities across the US, Canada, Australia, the UK 2,12 and other areas of the world in the fields of medicine, dentistry, nursing, pharmacy 8 , occupational therapy and herbal medicine 3,5 . In the US, the MMI has increasingly become an alternative method of admitting postgraduate medical students 13,14 . Based on academic needs assessments in medical education, the MMI has recently replaced traditional methods for selecting postgraduate students in South Korea 1 . The use of the MMI is also a predictor of future career success and academic performance [15][16][17][18] . A study by Shulruf et al. in New Zealand and a study by Lancia et al. showed that the use of the MMI for admitting medical students is a predictor of future academic performance in schools 19,20 .
To overcome the challenges of assessing competency and non-cognitive skills, nursing became a pioneer field in the use of the MMI 3 . Nursing is a complex profession that directly deals with patients and is therefore constantly faced with different challenges [21][22] . The MMI, as a value-based method, is helpful for selecting competent candidates in view of their social, rational and ethical skills 23 . The findings of a study by Callwood revealed that the MMI is an appropriate approach for interviewing nursing candidates in Canada 24 . Candidates therefore need to develop their teamwork skills, logical thinking and critical reasoning 25 .
With the introduction of efficient methods of assessment into postgraduate evaluation, some methods such as the MMI attracted great interest; however, their permanent establishment requires the assessment of their psychometric properties 25,27 .
The widespread use of the MMI as a valid and reliable tool in the world shows that, in most countries, the old PI method has been excluded and replaced by the MMI 10 . In Iran, however, the MMI is still a new approach that was first introduced into some medical universities in 2015. There is not enough evidence to show that the MMI is the best way to choose PhD exam candidates; as a result, for the first time in the country, this study was carried out to assess the psychometric properties of the MMI in evaluating the capabilities of PhD exam candidates who had successfully passed the first-stage written exam, so as to gather evidence that encourages the continuation of PhD exam interviews held through this method.

Materials and Methods
This methodological study was conducted in September 2015 on PhD exam candidates at Golestan University of Medical Sciences, Iran. Sampling was performed through the census method. The 47 nursing PhD exam candidates who had successfully passed the first stage of the written exam were considered potential participants; of these, the 38 candidates who intended to attend the interview were recruited for the second stage.
The data were gathered using checklists based on the headings proposed by the Ministry of Health and Medical Education, in six stations with topics on non-cognitive skills such as ethical judgment, communication skills, rational reasoning and critical thinking.
The interviewers were nursing faculty members from Golestan University of Medical Sciences and some other medical universities in Iran. The interviewers had previously participated in a workshop held by the examination committee at the Ministry of Health and Medical Education. Before the interviews, the selected faculty members assessed the face and content validity through frequent meetings. They compiled a blueprint of the topics at each station by examining the ambiguities of the issues and discussing how to resolve them, and prepared a checklist of the test materials for each station. The blueprint covered the design of the questions and scenarios, the guidance for each station, the number of the interviews and observers, the facilities, and the checklists, with 10 minutes scheduled at each station.
On the day of the interview, the interviewers were again reminded about coordination, observing the time schedule, keeping the environment calm and protecting the candidates' rights, while a representative of the nursing board of Iran also attended the interview to observe its processes. The interviewers were then settled at the six stations, which assessed the candidates' required skills: mastery in analyzing the MSc thesis, presenting a scientific topic using critical skills, searching in databases, English language proficiency, professional competence in dealing with ethical challenges in clinical settings, and a portfolio in the fields of education, research and clinical practice.
The interviewees were then assigned to groups of six and were evaluated at each station for ten minutes. The total time spent per person was 60 minutes. After the test was completed, all the checklists and scores were collected from the stations. The criterion validity was evaluated by determining the correlation between the score of the first stage of the exam and the total MMI score. The construct validity was evaluated by determining the correlation between the score of each station and the total MMI score. The internal consistency and raters' agreement were used to assess the reliability. Data were analyzed using descriptive and inferential statistics in SPSS-16 software. The Shapiro-Wilk test was used to check the normality of the data and Pearson's correlation coefficient was used to evaluate the criterion and construct validity. The rater reliability was examined by the Intraclass Correlation Coefficient (ICC). The significance level was set at 0.05.
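The text does not state which form of the ICC was computed in SPSS. As a minimal sketch, assuming a one-way random-effects, single-rater ICC, the agreement between raters at one station could be computed as follows. The function name `icc_one_way` and the example scores are hypothetical and are not part of the original analysis.

```python
from statistics import mean

def icc_one_way(ratings):
    """One-way random-effects ICC(1,1) for an n-candidates x k-raters table.

    Assumption: the study does not specify the ICC form; this is one common choice.
    """
    n = len(ratings)          # number of candidates
    k = len(ratings[0])       # number of raters per candidate
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    # Between-candidate and within-candidate sums of squares
    ss_between = k * sum((m - grand) ** 2 for m in row_means)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, row_means) for x in row)
    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical scores: three candidates rated by two raters at one station
example = [[1, 2], [3, 4], [5, 6]]
print(round(icc_one_way(example), 3))
```

Values close to 1 indicate that most of the score variance comes from differences between candidates rather than from disagreement between raters.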

Ethical considerations
This article has been derived from a research project approved by Golestan University of Medical Sciences, under the ethics code IR.GOUMS.REC.1395.303.

Results
Of the 47 nursing PhD exam candidates who had successfully passed the first stage of the written exam, 38 were interviewed and nine were absent. The mean age of the participants was 26±3.2 years and the majority of them (76%) were female.
The ambiguities of the checklists were resolved by the interviewers who were invited to determine the face validity. Also, based on the proposed headings of the Ministry of Health and Medical Education, the professors approved the content of the checklists after discussions and debates in the expert panel meeting.
The highest mean score at the stations pertained to the thesis station (74.95±9.10) and the lowest to the searching databases station (44.80±15.60) (Table 1). The findings showed that the mean score of the first-stage exam was higher than the score obtained in the interviews (MMI) (Table 2). Assuming that the first-stage exam scores accurately assess the candidates, the criterion validity analysis of the correlation between the first-stage exam scores and the MMI scores revealed a coefficient of 0.22, which indicates a direct but statistically non-significant relationship (P=0.18) (Figure 1).
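The reported coefficient can be checked against the usual t test for a Pearson correlation. The sketch below uses only the values stated in the text (r = 0.22, n = 38); the helper name `t_from_r` is illustrative.

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Values reported in the text: r = 0.22 for 38 candidates
t = t_from_r(0.22, 38)
print(f"t = {t:.3f}, df = {38 - 2}")
```

With 36 degrees of freedom the two-sided 5% critical value is roughly 2.03, so a t of about 1.35 falls short of significance, in line with the reported P=0.18.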

Construct validity
In order to determine the internal structure of the MMI constructs, the relationship between each station's score and the total MMI score was assessed (Table 3). Because the stations were not scored on the same scale, two sets of correlations were calculated: one between the raw score of each station and the total MMI score, and one between the standardized score of each station and the standardized MMI score, reported as Adjusted R. The findings showed that the relationship between the score of all the stations and the total MMI score was significant.
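As a minimal sketch of this adjustment, assuming that each station is converted to z-scores and that the standardized MMI score is the sum of those z-scores (the text does not spell out the exact procedure), the raw and adjusted correlations could be computed as below. The station names and scores are hypothetical.

```python
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def zscores(x):
    """Standardize scores to mean 0 and sample standard deviation 1."""
    m, s = mean(x), stdev(x)
    return [(a - m) / s for a in x]

# Hypothetical scores for four candidates; the stations use different scales
stations = {
    "thesis": [70, 80, 75, 85],  # scored out of 100
    "ethics": [4, 9, 5, 6],      # scored out of 10
}

raw_total = [sum(vals) for vals in zip(*stations.values())]
z = {name: zscores(vals) for name, vals in stations.items()}
z_total = [sum(vals) for vals in zip(*z.values())]

for name in stations:
    raw_r = pearson(stations[name], raw_total)
    adj_r = pearson(z[name], z_total)
    print(f"{name}: raw r = {raw_r:.2f}, adjusted r = {adj_r:.2f}")
```

Pearson's r is unchanged when a single variable is rescaled, so under this reading the adjustment matters only through the total: a raw total is dominated by the stations with the widest score range, whereas a total built from z-scores weights every station equally.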

Reliability
The overall Cronbach's alpha of the MMI was 0.694, which is slightly below the commonly used threshold of 0.70 but was considered acceptable. The details of the reliability assessment of the MMI according to the raters' agreement are presented in Table 4, except for station 6 (the educational, research and clinical portfolio), which had a single rater. The results of the raters' reliability test showed inter-rater agreement in the scores at most of the stations; only the searching databases station lacked homogeneity between the raters' assessments.
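For reference, Cronbach's alpha treats each station as an item and compares the sum of the station variances with the variance of the total score. The sketch below is a plain implementation of the standard formula on hypothetical data; it is not the SPSS procedure used in the study.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of candidate scores per station (all the same length)."""
    k = len(items)
    # Total score of each candidate across all stations
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical scores: three stations, four candidates
scores = [
    [2, 4, 6, 8],
    [1, 3, 7, 9],
    [3, 4, 6, 7],
]
print(round(cronbach_alpha(scores), 3))
```

Alpha rises when stations rank candidates similarly, so stations that deliberately measure different skills can yield a modest alpha even when each station is scored consistently.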

Discussion
The validity and reliability of a six-station MMI were measured in this study, and the results add to the previous evidence on the validity of the MMI method throughout the world. In the present study, the highest mean score pertained to the first station, which concerned the candidates' MSc thesis. The candidates appear to have had greater mastery of their thesis project, which may explain the higher mean scores. The searching databases station, however, had the lowest mean score. By contrast, in a study by Andrades et al., the ethical challenges station had the lowest score 15 . This difference may be due to the fact that mastering database search skills depends on the candidates' amount of practice, their psychomotor skills and personal differences.
In evaluating the criterion validity, the results of the test showed a low and non-significant correlation between the written exam as the criterion and the MMI, which could be because the PhD written exam is not a valid criterion for interviews. That is, the cognitive topics incorporated into the written exam cannot serve as a criterion for examining the non-cognitive topics in the interviews, and another indicator may be required for assessing criterion validity; however, the researchers were not able to find such a criterion.
In evaluating the construct validity, before adjusting the scores, only the thesis station did not have a significant relationship with the total MMI score; however, after adjusting the station scores and determining the relationship between the score of each station and the total score, the relationship between all the stations and the total MMI score became significant. The different conditions at each station for assessing their competencies and the variation in non-cognitive skills required the use of a holistic approach and the adjustment of scores (i.e. fair scores).
In evaluating the reliability, the Cronbach's alpha of the MMI was low, but it was consistent with the results reported in other studies 8 , which could be due to the different design of the questions at different stations. Nevertheless, since rater reliability is more important in objective structured tests, the raters' reliability and their consensus of opinion at each station were assessed in this study, and showed that, except for the searching databases station, there was agreement between the three raters at most of the stations.
The limitations of this study include the small sample size and the lower number of stations compared to similar studies, which was due to the limited interview space and the small number of faculty members. Although these limitations do not pose methodological challenges for structured tests, it is recommended that the MMI be implemented and assessed with a larger sample, in other disciplines, and with more specialized professional stations and interviewers.

Conclusion
The psychometric properties of the MMI, including face validity, content validity, construct validity and rater reliability, indicate that the MMI is a suitable approach for admitting PhD students in the field of nursing. Using the MMI in PhD interviews can lead to consistent and accurate postgraduate admissions, since the total MMI score and the construct validity score can also predict future academic performance.