wtfumean
Member — offline — Registered: 14.04.2021 — Posts: 226 — Reputation: 20
Final recommendations: Based on the analysis of the scale, we recommend removing item 49, which was found to be problematic, and combining categories 0 and 1 to avoid confusion among respondents. Removing individuals with evidence of misfit also improves the scale's functioning. The scale demonstrates good reliability: most items function well and show good separation and discrimination between persons. The category analysis shows that the scoring categories are used effectively, except that the "0" category label should be eliminated by combining it with the "1" category.
- Reliability Analysis:
- The table contains the results of the reliability analysis: the mean score, standard deviations (P.SD and S.SD), maximum and minimum scores, and the person reliability (.68), based on the item responses of 330 measured (non-extreme) persons. Person reliability indicates the extent to which the scale reliably separates persons on the measured construct.
- The spread of item fit is quite large: the mean mean-square (MNSQ) is 1.0, which is good, but the maximum is 3.81, which indicates underfit, and the minimum is 0.02, which indicates overfit.
- Descriptive Analysis:
- The table also provides descriptive statistics for each item: total score, count, measure, and standard error of measurement (SEM), along with the mean-square (MNSQ) and standardized (ZSTD) fit statistics for both INFIT and OUTFIT. The SEM indicates the precision of the measure estimates. MNSQ and ZSTD quantify the deviation of the observed data from the data expected under the model.
- The table also shows the maximum and minimum scores in the data, the maximum extreme score (20) and the percentage of persons who obtained it (5.7%), as well as the real root-mean-square error (RMSE), model RMSE, true SD, and separation, which indicate how well the model fits the data.
- Separation in this context refers to the extent to which the questionnaire can distinguish between different levels of the construct being measured. A higher separation value indicates a greater ability of the questionnaire to differentiate between groups of persons with different levels of the construct. The table shows that the model has a separation value of 1.74, indicating that the questionnaire can discriminate fairly well between different levels of the construct.
- Overall, the table provides a comprehensive descriptive analysis of the item responses of 330 measured persons and the reliability of the construct measured by the items.
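Under the standard Rasch definitions, separation and reliability are linked by a simple pair of formulas. The sketch below uses illustrative True SD and RMSE values (not the exact table entries) chosen to reproduce the separation of 1.74 reported above:

```python
# How separation and reliability relate under standard Rasch definitions.
# The True SD and RMSE values below are illustrative, chosen to reproduce
# the separation of 1.74 reported in the table.
def separation(true_sd, rmse):
    # Separation: "true" spread of person measures relative to their error.
    return true_sd / rmse

def reliability(sep):
    # Reliability: true variance as a share of observed variance.
    return sep**2 / (1 + sep**2)

sep = separation(3.48, 2.0)
print(round(sep, 2))               # 1.74
print(round(reliability(sep), 2))  # 0.75
```

The reported person reliability (.68) is somewhat lower than the .75 implied by this formula, which is expected: Winsteps distinguishes "real" and "model" error estimates, and the real (more conservative) estimate yields a lower reliability.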
- Mean: 16.1
- Standard Error of the Mean (SEM): .2
- Population Standard Deviation (P.SD): 4.4
- Sample Standard Deviation (S.SD): 4.4
- Maximum Value: 27.0
- Minimum Value: 4.0
- Dimensionality analysis (Table 23)
- The dimensionality analysis shows that the total raw variance in the observations is 11.8857 Eigenvalue units, and this accounts for 100% of the variance. Of this, 4.8857 Eigenvalue units (41.1%) is explained by measures, which can be further broken down into 1.7361 Eigenvalue units (14.6%) explained by persons and 3.1496 Eigenvalue units (26.5%) explained by items. The remaining raw variance (7.0000 Eigenvalue units, 58.9%) is unexplained.
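The percentage breakdown above is simply each component in eigenvalue units divided by the total raw variance; a quick check with the values copied from Table 23:

```python
# Quick check of the percentage breakdown: each component in eigenvalue
# units divided by the total raw variance (values copied from Table 23).
TOTAL = 11.8857

def pct(eigenvalue_units):
    return round(100 * eigenvalue_units / TOTAL, 1)

print(pct(4.8857))  # 41.1  explained by measures
print(pct(1.7361))  # 14.6  explained by persons
print(pct(3.1496))  # 26.5  explained by items
print(pct(7.0000))  # 58.9  unexplained
```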
The standardized residual variance scree plot and variance component scree plot both show a steep drop-off in variance explained after the second component. This suggests that a two-dimensional solution may be appropriate for the data.
The approximate relationships between person measures across the contrasts are as follows. In the first contrast, there are negative correlations between clusters 1 and 3 and positive correlations between clusters 2 and 3. In the second contrast, there are positive correlations between clusters 1-2 and 2-3. In the third contrast, there are positive correlations between clusters 1-2 and negative correlations between clusters 1-3 and 2-3. In the fourth and fifth contrasts, there are positive correlations between clusters 1-3 and 2-3, and a weaker positive correlation between clusters 1-2.
Item 49 stands out strongly: it loads in the opposite direction from the other items in this scale. Perhaps this item needs to be recoded. Attention should also be paid to other items whose positive correlation is less than 0.02, as this suggests they do not measure the latent trait well.
Overall, these results suggest that the questionnaire may have two underlying dimensions, and the clusters of items may be related to these dimensions. However, further analysis would be needed to confirm this and fully explore the structure of the data.
- Item continuum (Item Person (Thresholds) Map) (Table 12 or 16)
- This map shows how evenly the items are distributed in accordance with item difficulty and person ability.
The initial item bank of 7 items was reviewed by an expert panel, which included two master's degree students. The participants were asked to grade items (0 = Strongly Disagree, 1 = Disagree, 2 = Somewhat Disagree, 3 = Agree, 4 = Strongly Agree) according to their relevance to stress. The goal of the study is to analyze and improve the current stress rating scale.
These step difficulties, or "thresholds," that must be crossed to move from one category to the next are assumed to be consistent across items. The analysis included 350 complete surveys.
In Table 12.2, the full item names are shown at their calibrations, along with the person distribution. On the map, items above the M marker tend to represent harder content. Items that are explicitly ethical in nature were theorized and designed to be easier to answer, whereas items that are less clearly ethical in nature were theorized and designed to be more difficult to answer correctly [Miliken].
We hypothesize that respondents find it easier to give negative answers, and we therefore combine the "strongly disagree" and "disagree" categories. After examining this hypothesis, we conclude that people located at lower levels of estimated ethical awareness should be able to answer the easier items. The "M" represents the mean logit: the "M" on the left side of the vertical axis is the mean person ability, and the one on the right is the mean item difficulty (constrained to zero for statistical estimation purposes).
Finally, the variable map demonstrates a good progression of persons (on the left side). Most scorers fall towards the upper side of the scale, with a spread of persons towards both tails and relatively few at the extremes. This indicates that most test takers were in the lower range of stress awareness.
The Andrich threshold (Fk) is approximately the log-ratio of the frequencies of adjacent categories. When a category's frequency is low, its Andrich threshold is poorly estimated and unstable. Ordered thresholds are useful for inference and for confirming the construct validity of the rating scale, and most users of the findings will assume they are ordered. This holds when the observed average measures for each category approximate their expected values. For a 5-category rating scale, we expect a monotonic increase in logits from one category to the next. This scale does not show that pattern: the observed average measures for the lowest categories are too low and too close together (.40 and .60 logits). As a solution, we propose combining categories "0" and "1".
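The log-ratio approximation mentioned above can be sketched as follows; the category counts here are hypothetical, chosen to mirror a scale where category 0 is rarely used:

```python
# The log-ratio approximation for Andrich thresholds, with hypothetical
# category counts: the threshold F_k between categories k-1 and k is
# roughly ln(n_{k-1} / n_k).
import math

counts = {0: 12, 1: 80, 2: 150, 3: 200, 4: 90}  # hypothetical frequencies

def approx_threshold(k):
    return math.log(counts[k - 1] / counts[k])

for k in range(1, 5):
    print(k, round(approx_threshold(k), 2))
```

Note how the rarely used category 0 produces an extreme first threshold (about -1.9 logits with these counts); with so few observations, that estimate is exactly the kind of unstable threshold the text describes.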
Model Fit Analysis (Table 10 or 14)
To assess the "fit" between observed and expected values, the so-called INFIT and OUTFIT statistics produced by Winsteps were evaluated for both persons and items. Misfit analyses assess person-by-item residuals, the differences between observed and expected responses. Person-level fit statistics can indicate potential outliers, while item-level fit statistics provide information about how well items are functioning. A mean-square (MNSQ) cutoff of 1.5 was used to identify problematic items. One item, item 49, had both INFIT and OUTFIT MNSQ values above 1.5, suggesting that this item is not functioning well. We suggest omitting item 49 to reduce its disturbance of the scale. After removing the item, all remaining items function well, with MNSQ values below 1.5.
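The misfit screen described above can be sketched as follows; the fit statistics are illustrative placeholders, except that item 49 is the one the analysis actually flagged:

```python
# Sketch of the misfit screen: flag items whose INFIT or OUTFIT
# mean-square (MNSQ) exceeds the 1.5 cutoff. Values are illustrative,
# except item 49, which the analysis identified as misfitting.
CUTOFF = 1.5

fit_stats = {
    47: (0.95, 1.02),  # item: (infit_mnsq, outfit_mnsq)
    48: (1.10, 1.21),
    49: (1.62, 1.88),  # exceeds the cutoff on both statistics
}

misfits = [item for item, (infit, outfit) in fit_stats.items()
           if infit > CUTOFF or outfit > CUTOFF]
print(misfits)  # [49]
```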
The final analysis achieved an item separation of 6.46 (compared with a suggested goal of >3) and an item reliability of 0.98 (goal >0.9). This indicates sufficient item separation (there are statistically discernible levels of items) and reinforces confidence in the construct validity of the instrument: the quantitative results match the hierarchical construct continuum developed in the initial item-development phase.
In terms of "person fit," the typical pattern observed in statistically unusual responses was high scorers who unexpectedly did not endorse an item. This produced large residuals when individuals, for example, selected "disagree" where "agree" was expected. Thus, although some individuals showed evidence of misfit, there appears to be justification for removing them from the final analysis.
Finally, the achieved person separation is 1.73 (goal >2), with a reliability of .73 (goal >0.8). Both values fall slightly short of their goals, indicating that the scale differentiates between low and high performers only moderately well.
Category analysis (scale categories’ functioning) (Table 3.2)
Category characteristic curves (CCCs) indicate how well the instrument's scoring categories are being used.
The curves in this table cross at the Andrich thresholds, which represent a 50% probability of a response falling in one category or the next. In other words, a low scorer has a higher probability of scoring a 1 (Disagree) on a difficult item, while a high scorer has a higher probability of scoring a 4 (Strongly Agree) on an easy item. The CCCs show that the extreme scoring categories are used well: the thresholds are ordered (from 1 to 4), with approximately one logit of progression between them. Categories 2 and 3, however, are not useful: their response probabilities peak below .4, and they cross at nearly equal thresholds with even spacing between them.
The "0" category label shows an INFIT mean-square of 2.51, indicating that this category has been used in contexts where the expected category is far different: respondents appear to use "0" unexpectedly. This supports the recommendation to combine category "0" with category "1".
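The recommended recoding (merging "0" into "1" and renumbering the remaining categories) can be sketched as follows; the data layout, a list of per-person category codes 0-4, is an assumption:

```python
# Hypothetical sketch of collapsing rating-scale categories "0" and "1"
# before re-running the Rasch analysis. Assumes responses are stored
# as a list of per-person lists of category codes 0-4.
def collapse_01(responses):
    """Merge category 0 into category 1, then shift all codes down
    so the collapsed scale runs 0-3."""
    return [[max(r, 1) - 1 for r in person] for person in responses]

data = [[0, 1, 2, 4], [3, 0, 1, 2]]
print(collapse_01(data))  # [[0, 0, 1, 3], [2, 0, 0, 1]]
```

After recoding, the category analysis (Table 3.2) would need to be rerun to confirm that the collapsed scale shows ordered thresholds.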