Research Reports
AA Guccione, PT, DPT, PhD, FAPTA, is Senior Vice President, Practice and Research Division, American Physical Therapy Association, 1111 N Fairfax St, Alexandria, VA 22314-1488 (USA) (andrewguccione{at}apta.org)
TJ Mielenz, PT, PhD, OCS, is Research Faculty, Thurston Arthritis Research Center, and Assistant Professor, Division of Physical Therapy, School of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, NC
RF DeVellis, PhD, is Research Professor, Department of Health Behavior & Health Education, School of Public Health, and Adjunct Professor, Department of Psychology, College of Arts and Sciences, University of North Carolina at Chapel Hill
MS Goldstein, EdD, is Director of Research Services, Practice and Research Division, American Physical Therapy Association
JK Freburger, PT, PhD, is Research Associate and Fellow, Cecil G Sheps Center for Health Services Research, and Assistant Professor, Division of Physical Therapy, School of Medicine, University of North Carolina at Chapel Hill
R Pietrobon, MD, PhD, is Assistant Research Professor, Center for Excellence in Surgical Outcomes, Duke University Medical Center, Durham, NC
SC Miller is Assistant Director of Research Services, Practice and Research Division, American Physical Therapy Association
LF Callahan, PhD, is Associate Professor, Departments of Medicine, Orthopaedics and Social Medicine, School of Medicine; Adjunct Associate Professor, Department of Epidemiology, School of Public Health; and Research Fellow, Cecil G Sheps Center for Health Services Research, University of North Carolina at Chapel Hill
K Harwood, PT, PhD, CIE, is Director of Practice, Practice and Research Division, American Physical Therapy Association
TS Carey, MD, MPH, is Director of the Cecil G Sheps Center for Health Services Research; Professor, Internal Medicine and Social Medicine, School of Medicine; and Adjunct Professor, Department of Epidemiology, School of Public Health, University of North Carolina at Chapel Hill
Address all correspondence to Dr Guccione
Submitted July 8, 2004;
Accepted January 28, 2005
Key Words: Actions, Movement, Outcomes, Physical therapy, Reliability, Responsiveness, Validity
Physical therapy is a health profession whose primary purpose is the promotion of optimal health and function. This purpose is accomplished through the application of scientific principles to the processes of examination, evaluation, diagnosis, prognosis and intervention to prevent or remediate impairments, functional limitations, and disabilities as related to movement and health.3
This declaration focused the description of physical therapist practice, perhaps for the first time, away from the specific interventions physical therapists use as its chief defining characteristic and centered instead on its overall process and purpose: Physical therapists manage and prevent movement dysfunctions for the purpose of promoting optimal health and function as defined by the individual receiving services. Function is defined as those activities identified by the individual as essential to support physical, social, and psychological well-being.3 Yet, the degree to which the "promotion of optimal function" may be linked to underlying movement dysfunction has not been fully determined. For example, a movement dysfunction may be caused by an impairment of strength (ie, the force exerted by a muscle to overcome a resistance) that might be measured with a dynamometer and addressed by implementation of a therapeutic exercise program. Yet, we are not assured that an improvement in the strength measurement will directly lead to increased independence in dressing, bathing, or performing home chores. Jette and Keysor4 have previously noted that the currently available evidence regarding the association of impairment with activities of daily living (ADL) is relatively weak.
In response to this weak evidence, researchers have recommended the use of multiple and different measures in determining the effectiveness of physical therapy interventions. In addition to impairment-related measures, physical therapists are urged to use measures of functional ability and patients' participation in their desired social roles. However, as noted in the Institute of Medicine's expansion of the Nagi model of disability,5 a person's level of function and ability to actively participate in life is not entirely dependent on the person's physical capabilities but involves a negotiated interaction between the person and the specific physical and social environments. Outcome instruments that solely concentrate on traditional functional abilities and participation in social roles (eg, bathing, dressing, cooking, shopping, work) may be too broadly dependent upon these negotiated interactions and underplay an important and more immediate association between physical therapy interventions and patient outcomes. Thus, other instruments that measure outcomes that are most nearly directly affected by physical therapy intervention are needed.
The newly revised International Classification of Impairments, Disabilities, and Handicaps (ICIDH), now known as the International Classification of Functioning, Disability and Health (ICF), suggests the theoretical underpinning for an instrument that could be up to the task of measuring the most proximate outcomes of physical therapy.6 The ICF provides a framework for describing health and health-related conditions. One component of the ICF is "Activities and Participation," which includes the domains of self-care and mobility. The ICF defines activity (ie, functional ability) as the execution of a task or action by an individual. The ICF system implicitly suggests that function comprises actions (eg, squatting, kneeling, bending) and tasks such as self-care, personal hygiene, and taking care of one's health that are themselves complex movements (eg, drying oneself, putting on footwear, eating). In general, the "coordinated actions and tasks" described in this scheme most closely resemble items included in traditional measures of ADL. The underlying premise of this hierarchical ordering is that basic ADL and instrumental activities of daily living (IADL) are predicated upon the successful accomplishment of actions or movements. The ICF rendering is consistent with the Guide to Physical Therapist Practice's conceptualization of function that postulates that sensorimotor performance "underlie(s) ... daily, fundamental organized patterns of behaviors."2(p30) Based on our reading of the ICF and in concert with the Guide to Physical Therapist Practice, we identified an individual's ability to perform actions as an essential construct related to movement and health around which an outcome instrument might be designed to best capture what was, and was not, achieved by physical therapy intervention before the intervening and moderating effects of physical and social environments on the performance of basic ADL and IADL. 
Therefore, we considered that the terms "actions," "mobility actions," and "movements" could be used interchangeably for our purposes.
Conceptualizing function in terms of actions and tasks to develop an outcome measure is not novel. There are a number of tests and measures of physical performance that quantify by direct observation the complex integration of systems that permit an individual to maintain a posture, transition to other postures, or sustain safe and efficient movement,7–11 as suggested by the description of "mobility" in the ICF.6(p138) However, it is critical to appreciate that such tests typically characterize a person's performance limitations under controlled conditions from the observer's frame of reference. Although each of these performance tests can contribute to an overall understanding of a person's functional limitations by identifying the movement dysfunction that may underlie physical disability, they generally do not capture function as it actually occurs in the patient's natural environment, even when a test contains elements that mimic everyday life. Furthermore, they do not always account for factors that may positively or negatively modify a person's function, which also is influenced by cognition, motivation, social support, and physical environment. Physical performance measures also can be time-consuming for the therapist, a key concern when the time available to spend with patients is constrained. Therefore, we conceptualized that a new instrument should capture patients' function in their own environments from their own perspectives and pose low respondent and therapist burden through the use of self-report.
Self-report approaches have become well accepted in research and have increasingly been integrated into clinical practice. Self-report is now considered to be the most feasible and cost-effective means of gathering standardized functional status data on large numbers of individuals and is preferable to observation-based methods in some circumstances.12,13 Therefore, we determined that a self-report measure was most appropriate to our aims of developing a short, clinically relevant instrument that would further the diagnostic process used by the clinician, serve as a valid outcome measure of care, and pose as little burden as possible on both the patient and the therapist. Arguably, self-assessment of function is most consistent with Sackett and colleagues' tenets of evidence-based practice, requiring that a patient's values be conjoined to best clinical practice and clinically relevant research.14 For this reason, we believed that a new instrument should contain a specific mechanism to allow patients to self-identify those movements that they wanted most to change as a result of physical therapist care. By allowing self-identification of the patient's problems, the instrument also would serve to facilitate communication between patient and therapist regarding what each patient values as an outcome of care.
Difficulty with movement and symptomatology are common outcome measures relevant to physical therapists. However, the psychological aspect of rehabilitation is acknowledged less often in physical therapy research. Williams and Myers15 spoke to a respondent's level of confidence concerning various movements and postures affected by low back pain. However, the instrument being developed potentially expands upon the work done by these researchers. Williams and Myers' instrument was used among patients with low back pain. Thus, it was restricted to patients with a specific condition, and it is difficult to determine whether the instrument could be cross-validated among a more diverse set of patients. Confidence, as was the case in the study by Williams and Myers, was assessed based on the psychological construct of self-efficacy. Derived from the social psychological literature and developed as an attempt to add to an explanation of motivation, the construct was introduced by Bandura,16 whose theory states that people's beliefs about their capabilities to produce designated levels of performance will exercise control over events that affect their lives.
Bandura's self-efficacy theory distinguishes between, for example, belief that exercise can make a difference and belief that an individual can perform the exercise. A strong sense of self-efficacy stimulates an individual to approach difficult tasks as challenges to be mastered. Conversely, a person with a lower sense of self-efficacy perceives these same challenges as threats to be avoided. Therefore, improvement in any movement disorder may be attributed to a person's belief system, as well as the change attributed to the physical therapist's intervention. We concluded that an outcome measure that could capture the impact of a person's sense of mastery over the ability to perform actions would add an important dimension to our understanding of the relationship between physical therapy intervention and function.
Although many physical therapy interventions are not unique to physical therapists and therefore are not its chief defining characteristic, physical therapist practice is distinctive in its contribution to health care because it specifically targets movement dysfunction through these interventions. If movement dysfunction universally underlies physical therapist practice, then a person could presume that general measures of movement dysfunction would be commonly available in physical therapist practice and research. Yet even a cursory review of the Catalog of Tests and Measures that was included with the CD-ROM version of the Guide to Physical Therapist Practice, second edition,2 indicates that the profession generally lacks the instruments to measure a change in movement across a very wide spectrum of patients as an outcome of physical therapist practice, except with respect to normal childhood development as well as certain medical, and primarily neurological, conditions. Having identified movement as a construct universal to physical therapist practice and having identified difficulty and confidence as critical factors affecting the ability to perform actions, our overall research goal was to develop a clinically relevant outcomes instrument that would capture a patient's experience, including the behavioral dimension of self-efficacy, with minimal burden. Our specific aims in this study were: (1) to develop a self-report instrument that could be used to assess the ability to perform actions or movements across the spectrum of patients receiving physical therapy in adult, outpatient settings and (2) to assess the psychometric properties of the instrument in adult, outpatient settings that primarily provided services to patients with musculoskeletal conditions. 
Our analyses focused on the musculoskeletal conditions only to perform the known-groups validation and to calculate the frequencies of the 3 activities that the subjects would most like to be able to do without any difficulty.
The ICF's implied hierarchy of function6 was adopted as the conceptual starting point for the group. The primary goals of the focus group were: (1) to come up with a comprehensive list of actions, (2) to identify the key dimensions of ability to perform mobility actions that could be assessed from the patient's perspective, and (3) to develop a scale to measure these key dimensions. The group identified 24 movements. Identification of these movements or actions was based on discussion among the group as to which potential actions were most likely to result in the types of movement dysfunction typically seen by a physical therapist practicing in an outpatient setting.
Additionally, the member consultants of the focus group shared the view that confidence or self-efficacy was a major contributor to the outcome of physical therapy intervention. Capturing information that explicitly introduces this behavioral dimension introduces a component to physical therapist practice that potentially explains substantive variance in effectiveness of physical therapy interventions. The group identified the following 3 dimensions of ability to perform the action: difficulty performing the action, pain and symptoms experienced during the action, and confidence or self-efficacy in performing the action.
Five-point Likert scales were developed to measure the patients' perceptions of difficulty, pain and symptoms, and confidence for each of the actions. The Difficulty Scale ranged from 1 ("able to do without any difficulty") to 5 ("unable to do"). The Pain and Symptoms Scale ranged from 1 ("no pain or symptoms") to 5 ("extreme pain or symptoms"). The Confidence Scale ranged from 1 ("fully confident in my ability to perform") to 5 ("not confident in my ability to perform"). Likert scales were chosen because they are easy to understand, facilitate survey completion, and are easy to evaluate from the administrator's perspective. Furthermore, the group developed the name for the instrument. The instrument was given the name Outpatient Physical Therapy Improvement in Movement Assessment Log (OPTIMAL). This name described the purpose for which the instrument was created, as well as lending itself to an acronym that could easily be remembered by users of the instrument. In addition to the instrument itself, the focus group developed a patient intake form to gather demographic and diagnostic data. The form was based on information included in APTA's Guide to Physical Therapist Practice.2 The group decided which of the available data elements were most important to use as part of the instrument.
The OPTIMAL instrument then was reviewed by a second group of researchers (both physical therapists and nonphysical therapists) with backgrounds in musculoskeletal disorders and disease and with knowledge of outcomes and effectiveness research. Based on the feedback from this group, the dimension of pain and symptoms was dropped from the instrument for 2 reasons. First, pain and symptoms represent more than one dimension, making responses to this question difficult to answer from the patient's perspective and difficult to interpret from the clinician's perspective. For example, an individual may have moderate pain, severe weakness, and mild numbness or paresthesia. Second, although pain and symptoms are most often correlated with difficulty (ie, increased pain and symptoms increase difficulty in the task), there could be instances when this is not the case. An individual may have severe leg pain, but may still be able to walk without difficulty. Conversely, an individual may have only minimal leg pain, but have extreme difficulty walking. Because the focus of OPTIMAL was on the ability to perform mobility actions, the instrument was designed to capture difficulty in performing the action and confidence in performing the action. Other minor changes also were made with wording and information gathered on the patient intake form. A final question was added to the baseline instrument, asking the respondent to identify the 3 activities he or she would most like to be able to do without any difficulty. This question was added to help with therapist goal setting. The pilot survey questionnaire is presented in the Appendix.
Pilot Testing of the Survey Instrument
The primary purpose of pilot testing the OPTIMAL instrument was to assess the internal consistency reliability, validity, and responsiveness (at 2- or 4-week intervals) of data obtained with the survey instrument on an adult, outpatient population. To assess the discriminant validity of the survey data, 3 additional nonmobility-related actions were added to the OPTIMAL instrument. These actions were reading, managing a checkbook, and making decisions. Because we also were interested in assessing convergent-related validity of data obtained with the instrument, study participants completed the PF-10. The PF-10 includes the 10 items on the Medical Outcomes Study 36-Item Short-Form Health Survey questionnaire (SF-36)17 that are used to calculate the physical function scale score. The 10 items ask about the level of difficulty with ADL, vigorous activities, and moderate activities. Along with the PF-10 at baseline, participants answered 2 general items about difficulty with actions and confidence with actions. The difficulty item was: "Thinking about all of the activities you would like to do, please mark an X at the point on the line that best describes your overall level of difficulty with these activities today." Below the item was a 100-mm visual analog scale (VAS) anchored on the left with "I have extreme difficulty doing any of the activities that I would want to do" and anchored on the right with "I have no difficulty doing any of the activities that I would like to do." The confidence item was: "Thinking about all the activities you like to do, please mark an X at the point on the line that best describes your overall level of confidence in performing these activities today." Below the item was a 100-mm VAS anchored on the left with "I have no confidence that I can do activities that I would want to do" and anchored on the right with "I have complete confidence that I can do activities that I would want to do." 
These latter 2 items also were used to assess convergent validity.
Outpatient Sites and Subjects
Four different outpatient systems or facilities, located in the northeastern and midwestern parts of the country, participated in data collection. These settings included 2 urban-based hospital outpatient clinics and 2 rural networks of smaller clinics. The institutional review board (IRB) at each participating site approved participation in the data collection process. If a site did not have an IRB, then IRB approval was provided through the coordinating institution, and each participating clinician signed an unaffiliated investigator agreement.
Each site or network that participated in the study designated an individual as the site coordinator. The coordinator was sent a packet containing patient consent forms, baseline questionnaires, and follow-up questionnaires, as well as therapist instructions on who was eligible to participate in the study and how to collect the data. Any new patient who was being seen for either an initial examination or treatment was eligible to participate in the study if the patient: (1) was 18 years of age or older, (2) spoke or read English, and (3) had the cognitive ability to complete the questionnaire independently. Therapists were specifically instructed to have the patients complete the forms independently. The follow-up forms were completed at approximately 2 or 4 weeks following intake into the clinical setting. The order in which the Difficulty Scale and Confidence Scale were presented on the questionnaires was alternated. The assignment into 1 of the 2 time frames (either 2 or 4 weeks) and the order of the Difficulty Scale and Confidence Scale were based on randomization. Completed questionnaires were stored in a secure area, returned to APTA, and forwarded from APTA to the coordinating institution for data entry and analysis.
The physical therapists' diagnoses were divided into the following 4 categories: (1) upper extremity (22%), (2) lower extremity (24%), (3) trunk (33%), and (4) general (21%). The upper-extremity, lower-extremity, and trunk diagnoses were all related to the musculoskeletal system, with the trunk diagnoses being all spinal pain or dysfunction. The general category consisted of diagnoses that covered multiple body regions or that were not directly related to the musculoskeletal system. Chronic obstructive pulmonary disease, dizziness, brain injury, multiple sclerosis, and cerebrovascular accident are some examples of the diversity of diagnoses in the general category. No one diagnostic group within this general category was large enough to perform some of the specific analyses. Data of patients within this general diagnoses category (n=81) were not used in the known-groups validation analyses.
Data Analysis
Descriptive statistics were calculated to assess the demographic and clinical characteristics of study participants.
Item selection and discriminant validity.
Exploratory principal components factor analyses (PCFAs) were conducted to determine the underlying factor structure (ie, constructs) of the difficulty items and confidence items. Four separate analyses were conducted: difficulty items at baseline, difficulty items at follow-up (either 2 or 4 weeks), confidence items at baseline, and confidence items at follow-up. Analyses were conducted for the baseline and follow-up data to determine whether the factor structure remained the same for the first and second administrations of the test. The eigenvalue-greater-than-1 rule and scree tests were used to determine the number of factors present. If more than one factor was present, an oblique rotation was used to allow for correlation between the factors.18,19 Items were dropped if they loaded weakly (ie, a factor loading of <0.3 on all factors) or ambiguously (ie, a factor loading of >0.3 on more than one factor). In addition to identifying the underlying factor structure of the OPTIMAL instrument, the results of these analyses were used to assess the discriminant validity of data obtained with the instrument by examining the loadings of items that were added to the instrument (ie, managing a checkbook, making decisions, reading). Discriminant validity would be demonstrated if these items did not load with the mobility items. That is, the items would be shown to discriminate between physical activities and more cognitive everyday tasks.
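The item-retention rule described above can be illustrated with a minimal sketch. It assumes the rotated loadings are already available as a list of numbers per item; the function name `classify_item`, the use of absolute values, and the example loadings are illustrative assumptions, not values from the study's tables.

```python
def classify_item(loadings, threshold=0.3):
    """Classify one item from its loadings on each factor.

    'weak'      -> no loading exceeds the threshold on any factor
    'ambiguous' -> loadings exceed the threshold on more than one factor
    'keep'      -> exactly one loading exceeds the threshold
    Absolute values are used (an assumption), and a loading of exactly
    0.3 is treated as not exceeding the threshold.
    """
    n_salient = sum(1 for x in loadings if abs(x) > threshold)
    if n_salient == 0:
        return "weak"
    if n_salient > 1:
        return "ambiguous"
    return "keep"


# Hypothetical loadings on 3 factors, for illustration only.
example_loadings = {
    "item_a": [0.75, 0.10, 0.05],
    "item_b": [0.20, 0.25, 0.10],
    "item_c": [0.45, 0.40, 0.05],
}
decisions = {name: classify_item(vals) for name, vals in example_loadings.items()}
print(decisions)
```

Under this rule, only items classified as "keep" would be assigned to a subscale.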
Based on the factor structures identified, average difficulty and confidence subscale scores were created for each subject using the baseline data. This was done by calculating an average score for items that loaded on each of the factors. If more than 25% of the items were missing, the average score was not calculated and was recorded as missing. Average scores were used instead of summary scores for 2 reasons. First, average scores allow direct comparison across subscales with different numbers of items. Second, the average score can be interpreted directly from the response options. Two additional PCFAs were done with the average scores for each subscale (ie, higher-order PCFAs): one on the average difficulty subscale scores and one on the average confidence subscale scores. These analyses were performed to determine whether the difficulty items and the confidence items were each measuring a more general, one-dimensional construct (ie, global difficulty and global confidence, respectively).
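The average-score rule with its missing-data threshold might be sketched as follows. This is a minimal illustration, assuming each subject's responses to one subscale are held in a list with `None` marking an unanswered item; the function name `subscale_average` and the example responses are hypothetical.

```python
def subscale_average(responses, max_missing_fraction=0.25):
    """Average the answered items of one subscale for one subject.

    Returns None (recorded as missing) when more than 25% of the
    subscale's items are unanswered.
    """
    n_items = len(responses)
    answered = [r for r in responses if r is not None]
    n_missing = n_items - len(answered)
    if n_items == 0 or n_missing / n_items > max_missing_fraction:
        return None
    return sum(answered) / len(answered)


# Hypothetical responses to a 4-item subscale scored 1 to 5.
print(subscale_average([2, 3, None, 3]))     # 1 of 4 missing (25%): still scored
print(subscale_average([2, None, None, 3]))  # 2 of 4 missing (50%): missing
```

Because the result stays on the 1-to-5 response scale, it can be read directly against the response options and compared across subscales with different numbers of items.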
Internal consistency reliability.
Cronbach alpha reliability coefficients were calculated after the exploratory PCFA for items corresponding to each factor present for the Difficulty Scale and Confidence Scale at baseline. The order of the Difficulty Scale and Confidence Scale was randomly alternated to test whether reliability changed because of order. Cronbach alpha reliability coefficients were calculated for the items corresponding to each factor present for the respective order of the Difficulty Scale and Confidence Scale at baseline or follow-up. If the quality of the responses diminished because of respondent fatigue, then the Cronbach alphas also would decrease because the responses would have more random error.
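For readers unfamiliar with the statistic, the standard Cronbach alpha formula can be sketched as below. The sketch assumes complete data arranged as one row per respondent and one column per item, and it uses sample variances; the function name `cronbach_alpha` and the response data are illustrative and are not taken from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach alpha for complete data.

    rows: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(rows[0])
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of 5 subjects to a 3-item subscale (1 to 5).
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [4, 5, 4],
    [3, 3, 2],
    [5, 4, 5],
]
print(round(cronbach_alpha(responses), 2))
```

Comparing the alpha obtained when a scale is presented first with the alpha obtained when it is presented second is one way to look for the fatigue effect described above.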
Construct and convergent validity.
Factorial validity is a specific type of construct validity in which a person seeks confirmation of the hypothesis that items will form aggregates in accordance with prespecified constructs.19 To establish the construct validity of data obtained with the OPTIMAL instrument, 2 separate exploratory PCFAs were conducted on the 2-week and 4-week follow-up data. The results were compared to determine whether the same number of factors were present and to determine whether the content of the factors was the same. Cronbach alpha reliability coefficients also were calculated on items corresponding to each of the factors for the 2- and 4-week follow-up data. The values of the coefficients then were compared. Similar alpha coefficients would indicate stable reliability for the 2-week and 4-week follow-up data. The baseline Difficulty Scale and the baseline Confidence Scale were related to the PF-10 and the VAS by Pearson correlation and scatter plots.
We categorized subjects a posteriori into the same body regions as those of the subscales of the Difficulty Scale and Confidence Scale (upper extremity, lower extremity, and trunk) to test known-groups validation. Known-groups validation uses membership in a group as an attribute to differentiate members of one group from those of another group based on their scale scores and, in our case, demonstrates construct validity.19 This categorization was based on the physical therapists' diagnoses. Analyses of variance (ANOVAs) and t tests were then conducted for known-groups validation, comparing the mean score for each subscale of the Difficulty Scale and Confidence Scale by body region (upper extremity, lower extremity, and trunk). The origin of the subscales of the Difficulty Scale and Confidence Scale is discussed in further detail in the "Results" section.
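A known-groups comparison of this kind can be illustrated with a one-way ANOVA F statistic. The sketch below is a textbook computation under the assumption of independent groups; it is not the authors' analysis code, the function name `one_way_anova_f` is hypothetical, and the subscale scores are invented for illustration. The P value, which requires the F distribution, is not computed here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    groups: list of lists, one list of subscale scores per diagnostic group.
    F = (between-group sum of squares / (k - 1))
        / (within-group sum of squares / (N - k))
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical upper-extremity Difficulty subscale scores by diagnosis group.
upper_extremity_dx = [3.5, 4.0, 3.0, 4.5]
lower_extremity_dx = [1.5, 2.0, 1.0, 1.5]
trunk_dx = [2.5, 3.0, 2.0, 3.5]
f_stat = one_way_anova_f([upper_extremity_dx, lower_extremity_dx, trunk_dx])
print(round(f_stat, 2))
```

A large F would indicate that mean subscale scores differ by diagnostic group, which is the pattern known-groups validation looks for; pairwise t tests would then identify which groups differ.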
Range of measurement.
Floor and ceiling effects were evaluated by the proportion of scores at the extremes of the 5-point Likert scales for subjects who were categorized into the same body regions as the subscales of the Difficulty Scale and Confidence Scale (upper extremity, lower extremity, and trunk).20 The floor effect was the proportion of scores reported as 5 ("unable to do" or "not confident in my ability to perform"). The ceiling effect was the proportion of scores reported as 1 ("able to do without any difficulty" or "fully confident in my ability to perform"). Baseline and follow-up scores were used to evaluate the floor and ceiling effects.
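The floor and ceiling proportions defined above can be computed as in this minimal sketch. The function name `floor_ceiling` and the example scores are hypothetical; the scoring direction (1 is the best response, 5 is the worst) follows the scale anchors described in the text.

```python
def floor_ceiling(scores, best=1, worst=5):
    """Return (floor, ceiling) proportions for a list of 1-to-5 scores.

    floor   = proportion of scores at the worst response (5)
    ceiling = proportion of scores at the best response (1)
    """
    n = len(scores)
    floor = sum(1 for s in scores if s == worst) / n
    ceiling = sum(1 for s in scores if s == best) / n
    return floor, ceiling


# Hypothetical item scores for 10 subjects.
scores = [1, 1, 2, 3, 5, 4, 1, 2, 5, 3]
floor, ceiling = floor_ceiling(scores)
print(floor, ceiling)  # 0.2 0.3
```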
Responsiveness.
The responsiveness of the OPTIMAL instrument was measured by effect size using the Cohen d formula (ie, the difference of the baseline mean scores and the follow-up mean scores divided by the standard deviation of the baseline scores).21 The effect size was calculated for the 4-week follow-up period only because this is a reasonable time period to expect clinical change. Paired t tests were conducted to determine significant differences between baseline and 4-week follow-up scores. Using both effect size and t-test results is more complete than considering the effect size alone, because the t tests provide assurance of significance. Paired t tests give a sense of how large a sample size needs to be for effects of the reported size to be statistically reliable.
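The effect size as defined above, together with a paired t statistic, can be sketched as follows. The sketch assumes the sample (n − 1) standard deviation, which the text does not specify; the function names and the scores are hypothetical. Because lower OPTIMAL scores indicate less difficulty, a positive effect size here corresponds to improvement.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """(baseline mean - follow-up mean) / SD of baseline scores."""
    return (mean(baseline) - mean(follow_up)) / stdev(baseline)


def paired_t_statistic(baseline, follow_up):
    """t statistic for paired scores: mean difference / its standard error."""
    diffs = [b - f for b, f in zip(baseline, follow_up)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


# Hypothetical average subscale scores for 5 subjects.
baseline = [3.0, 2.5, 4.0, 3.5, 2.0]
follow_up = [2.0, 2.0, 3.0, 2.5, 1.5]
print(round(effect_size(baseline, follow_up), 2))
print(round(paired_t_statistic(baseline, follow_up), 2))
```

The t statistic would be compared against a t distribution with n − 1 degrees of freedom to obtain a P value; that step is omitted from this sketch.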
Table 1. Outpatient Physical Therapy Improvement in Movement Assessment Log (OPTIMAL) Patient Cohort: Clinical, Work, and Demographic Characteristics at Baseline (N=391)
Table 2. Frequency of the 3 Activities Subjects Would Most Like to Be Able to Do Without Any Difficulty
Items 1 through 24 for the Difficulty Scale loaded on 3 factors that explained 72% of the total variance and that were rotated to an oblique solution (Tab. 3). For each item, the primary loading was >0.69 and the secondary and tertiary loadings were never >0.21 (Tab. 4). The factors, loadings, and percentages of variance for the Difficulty Scale at follow-up, the Confidence Scale at baseline, and the Confidence Scale at follow-up were similar (Tabs. 3–6). The 3 factors appear to represent the construct of difficulty with upper-extremity mobility (items 19–24), trunk mobility (items 1–4), and lower-extremity mobility (items 7, 8, and 10–18). Items 5 (moving – sitting to standing), 6 (standing), and 9 (turning/twisting) were dropped because these items loaded weakly on more than one factor. The same 3 factors were found and the same 3 items were dropped for the Confidence Scale at baseline and follow-up.
Table 3. Results of Exploratory Principal Components Factor Analyses for the Difficulty Scale (Baseline and 2- or 4-Week Follow-up)
Table 4. Factor Loadings After Oblique Rotation for the Difficulty Scale (Baseline and 2- or 4-Week Follow-up)
Table 5. Results of Exploratory Principal Components Factor Analyses for the Confidence Scale (Baseline and 2- or 4-Week Follow-up)
Table 6. Factor Loadings After Oblique Rotation for the Confidence Scale (Baseline and Follow-up)
Table 7. Average Scores for the Subscales of the Difficulty and Confidence Scales at Baseline
Test order did not appear to play a role in the overall factor structure of the subscales of the Difficulty Scale and Confidence Scale. The Cronbach alphas for the 3 subscales of the Difficulty Scale and Confidence Scale were almost identical regardless of whether the Difficulty Scale was first or the Confidence Scale was first. If respondent fatigue had been present, then the Cronbach alphas would not have been the same but would have decreased according to order.
Construct and Convergent Validity
The number and the content of factors were the same for the 2- and 4-week follow-up time intervals for both the Difficulty Scale and Confidence Scale. The Cronbach alphas for the subscales of the Difficulty Scale at 2- and 4-week follow-ups, respectively, were: trunk (.82, .87), lower extremity (.95, .96), and upper extremity (.93, .94). For the subscales of the Confidence Scale, the Cronbach alphas for the 2- and 4-week follow-ups, respectively, were: trunk (.87, .87), lower extremity (.95, .95) and upper extremity (.94, .95).
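As an illustration of how a Cronbach alpha such as those reported above can be computed, the following is a minimal sketch using the standard formula alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the row-per-respondent data layout are assumptions made for this example; the study's own software and procedures are not described here.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Return Cronbach's alpha for a set of item responses.

    `responses` is a list of rows, one row per respondent, where each row
    holds that respondent's scores on the k items of a single subscale.
    (This layout is an assumption made for illustration.)
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("At least 2 items are needed to compute alpha.")

    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(k)
    ]
    # Sample variance of each respondent's summed subscale score.
    total_variance = variance([sum(row) for row in responses])

    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: 3 respondents answering a 2-item subscale.
example = [[1, 2], [2, 1], [3, 3]]
alpha = cronbach_alpha(example)
```

Items that vary together across respondents inflate the variance of the total score relative to the sum of the item variances, which pushes alpha toward 1.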
The baseline Difficulty Scale scores had strong correlations with PF-10 scores (.80) and moderate correlations with VAS scores for overall difficulty (.65). The baseline Confidence Scale scores had strong correlations with PF-10 scores (.72) and moderate correlations with VAS scores for overall confidence (.60).
The results of the ANOVAs and the t tests for known-groups validation included: (1) subjects with upper-extremity diagnoses scored higher (meaning having more difficulty) on the upper-extremity subscale of the Difficulty Scale (P<.001) compared with subjects with lower-extremity diagnoses and subjects with trunk diagnoses, but the differences with the subjects with trunk diagnoses were not statistically significant (P=.116), (2) subjects with lower-extremity diagnoses scored higher on the lower-extremity subscale of the Difficulty Scale compared with subjects with upper-extremity diagnoses (P<.001) and trunk diagnoses (P<.001), and (3) subjects with trunk diagnoses scored higher on the trunk subscale of the Difficulty Scale compared with subjects with lower-extremity diagnoses (P<.001) and upper-extremity diagnoses (P=.009). These results are graphically represented in Figure 1.
Figure 1. Comparing subscales of Difficulty Scale with physical therapists' diagnoses by body region.
Figure 2. Comparing subscales of Confidence Scale with physical therapists' diagnoses by body region.
Responsiveness
A common way to classify effect size is by using Cohen's definitions of small (≤0.2), medium (>0.2 and ≤0.5), and large (>0.5) effect sizes.21 The effect sizes for the lower-extremity subscales of the Difficulty Scale and Confidence Scale by lower-extremity diagnoses were in the medium range (0.35 and 0.44, respectively). The effect sizes for the trunk subscales of the Difficulty Scale and Confidence Scale by trunk diagnoses also were in the medium range (0.21 and 0.36, respectively). The effect size at the 4-week follow-up for the upper-extremity subscale of the Difficulty Scale by upper-extremity diagnoses was very small (0.09). The effect size for the upper-extremity subscale of the Confidence Scale by upper-extremity diagnoses was medium (0.32), but not in the expected direction. All of the paired t tests between baseline and 4-week follow-up were significant except for the trunk subscale of the Confidence Scale.
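The classification of effect sizes can be sketched in a few lines of code. The example below assumes cutoffs of 0.2 and 0.5 for the small, medium, and large categories, and it assumes one common effect size formula (mean baseline score minus mean follow-up score, divided by the standard deviation of the baseline scores). The formula actually used in this study is not stated in this section, so both function names and the formula are illustrative only.

```python
from statistics import mean, stdev


def standardized_effect_size(baseline, follow_up):
    """Mean change divided by the SD of baseline scores.

    Assumed formula for illustration; a positive value means that scores
    were lower at follow-up than at baseline.
    """
    return (mean(baseline) - mean(follow_up)) / stdev(baseline)


def classify_effect_size(effect_size):
    """Label an effect size by magnitude using assumed cutoffs of 0.2 and 0.5."""
    magnitude = abs(effect_size)  # direction is reported separately
    if magnitude <= 0.2:
        return "small"
    if magnitude <= 0.5:
        return "medium"
    return "large"


# Hypothetical subscale scores for 4 patients at baseline and follow-up.
baseline_scores = [4, 3, 5, 2]
follow_up_scores = [3, 3, 4, 2]
es = standardized_effect_size(baseline_scores, follow_up_scores)
label = classify_effect_size(es)
```

Because the label is based on magnitude alone, an effect in the unexpected direction (such as the one reported for the upper-extremity subscale of the Confidence Scale) still needs its sign inspected separately.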
Overall, the psychometric properties of the 21-item OPTIMAL instrument were strong. The Cronbach alpha coefficients for the 3 subscales for the difficulty items and the confidence items ranged from .85 to .95, indicating excellent reliability. Respondent fatigue also did not affect reliability, because the Cronbach alpha coefficients were similar regardless of order of testing. There were minimal to moderate ceiling effects for some of the OPTIMAL subscales. Future work should extend the range of items to avoid these ceiling effects. There also was evidence for the discriminant validity of data for the instrument based on the results of the factor analysis. The 3 nonmobility items added to the scale loaded on a separate factor. Evidence for the construct validity of data for the OPTIMAL instrument was found in that it performed differentially across known groups. Known-groups validation was done by comparing the mean score for each subscale of the Difficulty Scale and Confidence Scale by diagnostic subgroups created from the physical therapists' diagnoses. Subjects in each diagnostic subgroup scored higher (meaning having more difficulty) on each appropriate subscale. For example, patients with upper-extremity diagnoses scored higher on the upper-extremity subscale for both the Difficulty Scale and Confidence Scale compared with subjects with other diagnoses. The OPTIMAL instrument demonstrated convergent validity by correlating extremely well with the PF-10 (baseline Difficulty Scale: .80; baseline Confidence Scale: .72) and correlating moderately well with the overall VAS scales (baseline Difficulty Scale: .65; baseline Confidence Scale: .60). Construct validity was supported by the fact that the results of the factor analyses on the data from 2 follow-up periods were similar.
The responsiveness of OPTIMAL was assessed by effect size. Four of the 6 subscales had effect sizes ranging from 0.21 to 0.44 at the 4-week follow-up period, which was the most sensitive to change. These effect sizes were primarily in the medium range and indicate that the OPTIMAL instrument is responsive over time. Two of the 6 subscales were less responsive to change over time. The upper-extremity subscale of the Difficulty Scale had a very small effect size and the upper-extremity subscale of the Confidence Scale had a negative effect size, which indicates that the participants became less confident with mobility over time. This finding is probably due to the small number of items in the upper-extremity subscale. The smaller the number of items, the smaller the range of scores, and thus the smaller the sensitivity to change. Nevertheless, all of the paired t tests comparing the baseline scores and the follow-up scores were significant, indicating that the OPTIMAL instrument can detect small changes. Patients generally improve with the passage of time, and the OPTIMAL instrument is expected to reflect this outcome. However, the change of each subscale corresponded to the appropriate diagnoses, providing evidence that the OPTIMAL instrument is responsive over time.
Internal consistency reliability was used in this study instead of test-retest reliability for several reasons. Test-retest reliability is useful only when one can conclude that the phenomenon being measured is stable, and most constructs, including difficulty or confidence with mobility, are not stable over time. Therefore, we believe that internal consistency reliability was the correct method to compute reliability. Test-retest reliability is most useful when its value is close to coefficient alpha, but when the values are different, the interpretation of test-retest reliability is difficult. Test-retest reliability may be higher than coefficient alpha (internal consistency) because the subjects have remembered their responses.18 Test-retest reliability may be lower than coefficient alpha because the test-retest confounds change in the phenomenon with measurement error in the tool.
There are several limitations in the design of this study. The subjects in this study were selected using a nonprobability sampling design, more specifically, convenience sampling. Willing patients at the physical therapy adult outpatient clinics volunteered to be in the study. The number and characteristics of the nonresponders are not known. Patients who offer to participate may introduce bias because they may be somewhat different from the entire adult outpatient physical therapy population. The 4 clinics were diverse and scattered across the United States, but neither the patients nor the clinics were randomly selected. For purposes of psychometric testing, random selection of the participants is less important because instruments are validated on specific populations. The subjects in this study were well educated and had to be able to read English to be included in the study; therefore, low literacy was not an issue. However, other adult outpatient physical therapy clinics may have a greater proportion of patients with lower literacy. In these clinics, similar outcome questionnaires have been administered orally; therefore, OPTIMAL also could be administered orally. The population in this study was a generalizable sample of adult outpatient physical therapy patients. However, future work could look more specifically at the diagnoses of the study participants, including those subjects who were classified as having "general" diagnoses. The sample size in this study was more than sufficient for factor analysis, but may not be large enough to conduct item-response theory analyses. Future research also should be conducted to strengthen the upper-extremity subscale and to further establish the psychometric properties of this instrument.
To use OPTIMAL clinically, the instrument can be administered at the initial evaluation and then either 4 weeks later or at discharge (if sooner than 4 weeks). Although it is preferable to administer the entire OPTIMAL instrument in order to compare populations, administering questions for a specific subscale may be sufficient for some patients if pressed for time in the clinic. For some patients, the entire OPTIMAL instrument is the only appropriate clinical option based on the patient's diagnosis.
The OPTIMAL instrument is an efficient way for the physical therapist to further goal setting from the patient's perspective. The physical therapist would have to decide if the goals chosen by the patient were appropriate for the patient in the given time frames, but if the patient completes the instrument while waiting to be seen, this would save the physical therapist time from asking these questions. The goals frequently chosen by the patient with each regional diagnostic subgroup would certainly be appropriate goals for a patient being seen in physical therapy. To be even more objective, goals could be written to reflect a certain amount of change in data obtained with the OPTIMAL instrument in the time frame from baseline to follow-up.
Appendix. Outpatient Physical Therapy Improvement in Movement Assessment Log (OPTIMAL) Instrument