TRACY L. WALLWORK, MPhil 1
JULIE A. HIDES, PhD 2
WARREN R. STANTON, PhD 3

Intrarater and Interrater Reliability of Assessment of Lumbar Multifidus Muscle Thickness Using Rehabilitative Ultrasound Imaging

STUDY DESIGN: Within-session intrarater and interrater reliability study.

OBJECTIVE: To establish the intrarater and interrater reliability of thickness measurements of the multifidus muscle in a parasagittal plane, conducted by an experienced ultrasound operator and a novice assessor.

BACKGROUND: There is considerable evidence for the important role of the multifidus muscle in segmental stabilization of the lumbar spine. The cross-sectional area of the multifidus muscle has been assessed in healthy subjects and patients with low back pain using real-time ultrasound imaging. However, few studies have measured the thickness of the multifidus muscle using a parasagittal view.

METHODS AND MEASURES: The thickness of the multifidus muscle was measured at rest, using real-time ultrasound imaging, in 10 subjects without a history of low back pain, at the levels of the L2-3 and L4-5 zygapophyseal joints. The measure was carried out 3 times at each level by 2 assessors (1 experienced, 1 novice). Intrarater (model 3) and interrater (model 2) reliability was assessed by calculation of an F statistic (analysis of variance), the intraclass correlation coefficient (ICC), and the standard error of measurement (SEM).

RESULTS: On the basis of an average of 3 trials, the 2 operators showed very high interrater agreement on the measurement of thicknesses at the L2-3 level (ICC(2,3) = 0.96; 95% CI: 0.84 to 0.99) and the L4-5 vertebral level (ICC(2,3) = 0.97; 95% CI: 0.87 to 0.99), with no systematic differences in muscle size across operators (P>.05). Interrater reliability was relatively lower for the L2-3 level (ICC(2,1) = 0.85; 95% CI: 0.51 to 0.96) than the L4-5 level (ICC(2,1) = 0.87; 95% CI: 0.52 to 0.97) when a single trial per rater was used, but these values still indicated a high level of agreement. In addition, the novice and experienced operator produced reliable intrarater measurements at L2-3 (ICC(3,1) = 0.89; 95% CI: 0.72 to 0.97 and 0.94; 95% CI: 0.86 to 0.99) and at L4-5 (ICC(3,1) = 0.88; 95% CI: 0.68 to 0.97 and 0.95; 95% CI: 0.86 to 0.99), with no systematic differences in muscle size across trials (P>.05). The consistently low SEM values also indicate low measurement error.

CONCLUSION: A novice and an experienced assessor were both able to reliably perform this measure at rest for 2 vertebral levels using real-time ultrasound imaging. An average of 3 trials produced higher interrater reliability scores, though using a single trial per rater was also reliable. J Orthop Sports Phys Ther 2007;37(10):608-612. doi:10.2519/jospt.2007.2418

KEY WORDS: back muscles, lumbar spine, muscle assessment, repeatability, ultrasonography

1 Private practitioner, Perth, Western Australia.
2 Senior Lecturer, Division of Physiotherapy, School of Health and Rehabilitation Sciences, The University of Queensland, Brisbane, Australia; Clinical Director, UQ/Mater Back Stability Clinic, Mater Health Services, South Brisbane, Australia.
3 Psychologist and Biostatistician, UQ/Mater Back Stability Clinic, Mater Health Services, South Brisbane, Australia.

This study was approved by the Medical Research Ethics Committee at the University of Queensland, Australia and the Mater Adult Hospital Research Ethics Committee, Mater Health Services, South Brisbane, Australia. Address correspondence to J.A. Hides, Division of Physiotherapy, School of Health and Rehabilitation Sciences, The University of Queensland, Brisbane Queensland 4072, Australia. Email: j.hides@shrs.uq.edu.au

608 october 2007 volume 37 number 10 journal of orthopaedic & sports physical therapy

Ultrasound imaging has been used previously to characterize the dimensions of the multifidus muscle in healthy populations. 1,4,6,9,11 Measurements have most commonly been performed in the transverse plane, reporting cross-sectional area (CSA) and linear values for the muscle at various lumbar vertebral levels. Reliability of performing these measures has been previously demonstrated. 1,4,9,11 Between-side symmetry in size has also been demonstrated in healthy populations, 1,6,11 and, when linear measurements (anteroposterior distance [thickness] × mediolateral distance [width]) are multiplied, the results correlated well with CSA at the L4 and L5 vertebral levels (range, r = 0.92-0.98). 1,6,11 Validity of CSA measurements has been established by comparing measurements of the lumbar multifidus muscle established using ultrasound imaging with those obtained by magnetic resonance imaging (MRI). 4

An alternative to imaging the multifidus muscle in transverse section is to adopt a parasagittal (longitudinal) orientation of the transducer. In this plane, the zygapophyseal joints, the overlying multifidus muscle bulk at 2 to 3 vertebral levels, and the thoracolumbar fascia can be visualized. 3,7,12 Apart from the advantage of being able to visualize more than 1 vertebral level at a time, this orientation allows measurement of the thickness of the muscle, and muscle contraction can be observed more easily than in the transverse plane. Muscle contraction is seen on the ultrasound image as an increase in thickness of the muscle as it shortens along its length. The parasagittal view
has already been successfully used to provide feedback of multifidus muscle recruitment during 2 randomized controlled trials. 2,5 Hides et al 2,5 showed that subjects with first-episode low back pain had difficulty recruiting the multifidus muscle and first described the use of ultrasound imaging of the multifidus in the parasagittal plane to provide direct feedback to the subjects. In a separate study using the parasagittal view as a form of visual feedback, a group of subjects without low back pain demonstrated greater improvements in both the performance and retention of a voluntary isometric contraction of the multifidus muscle than a group who did not receive the ultrasound biofeedback training. 12

While it appears that imaging and measuring the multifidus muscle in parasagittal section may be useful in clinical practice, it is necessary to examine the validity of the measure and the reliability of assessors performing the thickness measure. With regard to validity, the relationship between multifidus muscle thickness change on ultrasound imaging and electromyographic (EMG) signal amplitude has been examined in healthy subjects. 7 The authors showed that muscle thickness change was associated with EMG signal amplitude at the L4 vertebral level (r = 0.79, P<.001), for a range from 19% to 34% of maximal voluntary isometric contraction.

Two studies have previously reported on the reliability of raters performing thickness measurements of the multifidus muscles in parasagittal section, 7,12 and a similar reliability study has been performed measuring the thickness of the lumbar erector spinae muscles. 13 Kiesel et al 7 conducted on-screen measurements of 8 healthy subjects and reported intrarater reliability values using intraclass correlation coefficients (ICC(3,1)) of 0.85.
Images of the multifidus were also saved for later measurement by 2 raters, and high interrater reliability was also demonstrated (ICC(3,1) = 0.95) when measuring the same image from the display unit. Van et al 12 also reported high reliability for 2 raters who measured 6 healthy subjects (intrarater ICC(1,1) = 0.97 to 0.98, interrater ICC(2,3) = 0.98). For the lumbar erector spinae muscles, Watanabe et al 13 reported high intrarater (r = 0.938-0.962) and interrater (r = 0.9-0.962) reliability estimates.

The aim of this investigation was to examine intrarater and interrater reliability of 2 raters (1 experienced and 1 novice tester) performing assessment of lumbar multifidus muscle thickness in parasagittal section in healthy subjects at rest, using real-time ultrasound imaging. The measurements were conducted at rest to allow establishment of standardized measurement protocols in relation to variables, such as location of anatomical landmarks, subject positioning, cursor location, and transducer pressure, without the added variable of muscle contraction. To avoid the confounding influences of pain and muscle inhibition, a healthy subject population was selected.

METHODS

FIGURE 1. The patient was positioned in prone lying with a pillow placed under the hips to minimize the lumbar lordosis. The multifidus muscle was imaged in parasagittal (longitudinal) section, allowing visualization of the zygapophyseal joints, muscle bulk, and thoracolumbar fascia.

Subjects

Ten healthy subjects (2 males, 8 females) participated in the study. Their mean ± SD age was 30.8 ± 18.1 years, height was 165 ± 6.1 cm, and body mass was 58.2 ± 7.4 kg. The exclusion criteria for the study were pregnancy, a history of low back pain, spinal abnormality, spinal or abdominal surgery or severe trauma, neuromuscular or joint disease, and training involving the back muscles within the previous 3 months.
This study was approved by the Medical Research Ethics Committee at The University of Queensland, Australia. Informed consent was obtained and the rights of human subjects were protected.

Procedure

Ultrasound imaging assessment of multifidus muscle thickness was conducted using a Diasonics Synergy ultrasound imaging apparatus equipped with a 5-MHz curvilinear transducer (GE Diasonics Ultrasound Inc, Santa Clara, CA). The ultrasound measurements were performed by 1 experienced and 1 novice assessor. The experienced operator had 17 years of experience in assessing muscles using ultrasound imaging. Training of the novice rater included basic education in ultrasound imaging techniques (3 hours) and observation of the experienced operator with instruction in the measurement technique (3 hours). In addition, the novice rater and the experienced operator measured 5 subjects prior to commencing the study. Both raters were experienced physical therapists familiar with clinical assessment of the lumbar spine and location of bony landmarks in the region.

For assessment by ultrasound imaging, the subject was positioned in prone lying, with a pillow placed under the abdomen to minimize the lumbar lordosis (FIGURE 1). Each rater manually palpated the lumbar vertebral levels and marked the location of the spinous processes on the skin with a pen. Raters were allowed to confirm their own skin markings using ultrasound imaging in parasagittal section (verified by identification of the sacrum and counting spinous processes on progression in a cephalad direction). All skin markings were removed with alcohol after the first rater conducted measurements, and the subject stood up and moved around. This ensured that each rater independently established anatomical landmarks and subject positioning. Raters were not allowed to observe each other conducting measurements.

TABLE 1. Multifidus Muscle Thicknesses (Mean ± SD) of the Average Scores Across 3 Trials

Vertebral Level    Experienced Operator (cm)    Novice Operator (cm)
L2-3               2.22 ± 0.35                  2.29 ± 0.28
L4-5               2.86 ± 0.26                  2.82 ± 0.25

Abbreviations: L2-3, level of the L2-3 zygapophyseal joint; L4-5, level of the L4-5 zygapophyseal joint.

The multifidus muscle was imaged in parasagittal (longitudinal) section, 1,3,4 allowing visualization of the zygapophyseal joints, multifidus muscle bulk, and thoracolumbar fascia (FIGURE 2). Examiners were careful to maintain the transducer in the plane of the zygapophyseal joints. With this technique, if the transducer is placed too far laterally, the appearance is similar, but it is in fact the lumbar erector spinae muscles and the transverse processes that are imaged rather than the multifidus muscle and the zygapophyseal joints. The thickness of the multifidus muscle was measured at the levels of the L2-3 and L4-5 zygapophyseal joints using on-screen calipers. Two vertebral levels were selected, as the morphology of the multifidus muscle varies according to vertebral level. 4 Linear measurements were conducted from the tip of the target zygapophyseal joint to the inside edge of the superior border of the multifidus muscle (FIGURE 2). The measure was carried out 3 times by each assessor for each of the 2 vertebral levels assessed.

Reliability analyses were conducted using the SPSS statistical package Version 14 (SPSS Inc, Chicago, IL). This analysis produces an ICC that represents consistency in the rank order of scores and an F statistic, which represents systematic change in size of scores. Separate analyses were conducted for data at the L2-3 and L4-5 vertebral levels to assess consistency between the 2 raters for 1 trial (ICC(2,1)) and for 3 trials (ICC(2,3)), and across the 3 trials for each rater (ICC(3,1)).
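As an illustration of the quantities named in the analysis above, the following minimal sketch computes ICC(2,1), ICC(3,1), and the F statistic for systematic differences between columns from two-way analysis-of-variance mean squares, using the Shrout and Fleiss forms of these coefficients. It is not the SPSS procedure used in the study; the function names are assumptions chosen for this example, and the thickness values are hypothetical, not study data. Rows are subjects and columns are raters (for interrater analysis) or trials (for intrarater analysis).

```python
from statistics import mean


def two_way_mean_squares(scores):
    """Mean squares for a table with subjects as rows and raters (or trials) as columns."""
    n, k = len(scores), len(scores[0])
    grand = mean(value for row in scores for value in row)
    row_means = [mean(row) for row in scores]
    col_means = [mean(row[j] for row in scores) for j in range(k)]
    ss_total = sum((value - grand) ** 2 for row in scores for value in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)                # between-subjects mean square
    jms = ss_cols / (k - 1)                # between-columns (raters or trials) mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return bms, jms, ems


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, single measure."""
    n, k = len(scores), len(scores[0])
    bms, jms, ems = two_way_mean_squares(scores)
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


def icc_3_1(scores):
    """ICC(3,1): two-way mixed effects, single measure."""
    k = len(scores[0])
    bms, _, ems = two_way_mean_squares(scores)
    return (bms - ems) / (bms + (k - 1) * ems)


def f_columns(scores):
    """F ratio testing for a systematic difference between columns."""
    _, jms, ems = two_way_mean_squares(scores)
    return jms / ems


# HYPOTHETICAL thickness values in cm (5 subjects x 2 raters), for illustration only.
thickness_cm = [
    [2.85, 2.80],
    [3.10, 3.05],
    [2.60, 2.68],
    [2.95, 2.90],
    [2.70, 2.75],
]

print(f"ICC(2,1) = {icc_2_1(thickness_cm):.2f}")
print(f"ICC(3,1) = {icc_3_1(thickness_cm):.2f}")
print(f"F (between raters) = {f_columns(thickness_cm):.2f}")
```

Passing a subjects-by-trials table for one rater to `icc_3_1` mirrors the intrarater analysis, while a subjects-by-raters table passed to `icc_2_1` mirrors the single-trial interrater analysis.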
In addition, consistency of measurement was examined across operators and trials by calculation of the standard error of the measurement (SEM). SEM is the difference between the actual measured score across trials and an estimated true score, and was calculated as $\mathrm{SEM} = SD_{\mathrm{pooled}} \times \sqrt{1 - \mathrm{ICC}}$.

RESULTS

Descriptive data (mean ± SD) of the multifidus muscle thickness at rest for both raters at the 2 vertebral levels measured, averaged across the 3 trials, are shown in TABLE 1. The results of the interrater and intrarater reliability analyses are shown in TABLES 2 to 4. The novice rater and experienced rater demonstrated a high level of agreement between them for measurement of multifidus thickness from a single trial (TABLE 2) and from the average of 3 trials (TABLE 3). In addition, each rater was reliable across trials (TABLE 4). Nonsignificant F statistics indicate no systematic difference across rater and trial. SEM values suggest stability of measuring thickness of the multifidus muscle in a parasagittal plane.

FIGURE 2. Ultrasound image of the left multifidus muscle in parasagittal section. Thickness measurements of the multifidus muscle were conducted from the tip of the zygapophyseal joint to the inferior edge of the superior border of the multifidus (measurement for L4-5 level is shown). Abbreviations: L45, L4-5 zygapophyseal joint; L5S1, L5-S1 zygapophyseal joint; LM, lumbar multifidus; S, sacrum; TLF, thoracolumbar fascia.

DISCUSSION

Results from this study showed that a novice and an experienced assessor were both able to reliably measure lumbar multifidus muscle thickness at 2 vertebral levels using real-time ultrasound imaging. Using an average of 3 trials per rater produced high interrater reliability scores (ICC(2,3) = 0.96 for L2-3 and ICC(2,3) = 0.97 for L4-5).
Actual differences between mean values obtained by the 2 raters were very small in terms of the actual thickness of the muscle (0.07 cm for L2-3 and 0.04 cm for L4-5 [TABLE 1]). In addition, reliability of measurements for both vertebral levels was similar. The novice and experienced operators exhibited high intrarater reliability (ICC(3,1) = 0.89 and 0.94 for L2-3 and ICC(3,1) = 0.88 and 0.95 for L4-5) and high interrater reliability (ICC(2,1) = 0.85 for L2-3 and ICC(2,1) = 0.87 for L4-5) when a single trial per rater was used. The overall results are comparable with those of Watanabe et al 13 for similar thickness measurements performed on the lumbar erector spinae muscles. In their study, interoperator reliability ranged from 0.9 to 0.96 for measurements made in 3 different trunk positions, and intraobserver repeatability ranged from 0.94 to 0.96. 13

TABLE 2. Interrater Reliability Based on a Single Measure per Rater

Vertebral Level    ICC(2,1) (95% CI)    SEM (cm)    F (P)*
L2-3               0.85 (0.51-0.96)     0.13        1.12 (.32)
L4-5               0.87 (0.52-0.97)     0.10        0.84 (.39)

* P>.05 indicates no systematic difference in muscle thickness values between raters.

TABLE 3. Interrater Reliability Based on the Average of 3 Measures per Rater

Vertebral Level    ICC(2,3) (95% CI)    SEM (cm)    F (P)*
L2-3               0.96 (0.84-0.99)     0.06        2.47 (.15)
L4-5               0.97 (0.87-0.99)     0.05        1.27 (.29)

* P>.05 indicates no systematic difference in muscle thickness values between raters.

TABLE 4. Intrarater Reliability Based on 3 Repeated Measures Taken on the Same Day

Vertebral Level/Tester    ICC(3,1) (95% CI)    SEM (cm)    F (P)*
L2-3
  Novice                  0.89 (0.72-0.97)     0.11        1.07 (.36)
  Experienced             0.94 (0.86-0.99)     0.09        0.14 (.87)
L4-5
  Novice                  0.88 (0.68-0.97)     0.09        0.37 (.70)
  Experienced             0.95 (0.86-0.99)     0.06        2.05 (.16)

* P>.05 indicates no systematic difference in muscle thickness measurements across trials.

The results of the current study are also comparable with the results obtained by Van et al 12 and Kiesel et al 7 for intrarater reliability (ICC(1,1) = 0.97-0.98 and ICC(3,1) = 0.85, respectively) and interrater reliability (ICC(2,3) = 0.98 and ICC(3,1) = 0.95, respectively). However, there are differences in the methodologies of these studies.
Van et al 12 used a similar protocol to that of the current study, while Kiesel et al 7 saved ultrasound images of the multifidus for later measurement by 2 raters to establish interrater reliability of making measurements off the same image. In the current study and that of Van et al, 12 each subject was imaged 3 times (requiring accurate repositioning of the transducer and reproducible pressure on the transducer), and each rater performed his/her own images after positioning the subject and identifying the bony landmarks. A difference is that in the current study 2 vertebral levels were measured.

In addition to the findings of high intrarater and interrater reliability for both the experienced and the novice assessors, the reported SEM values were low when compared to the resting thickness value of the muscle. This is important, as the most likely clinical application of the measurement is to assess changes in thickness due to muscle contraction. A low SEM value relative to the resting value would suggest that the ability to detect a real change (exceeding measurement error) would be likely. The smallest real difference (SRD) can be calculated to indicate the magnitude of change that would exceed the expected trial-to-trial variability. 8 Based on the SEM of 0.06 cm for the experienced operator at L4-5, the SRD (in cm) can be calculated by the following formula: $\mathrm{SRD} = \mathrm{SEM} \times \sqrt{2} \times 2.26$, where 2.26 represents the value of the t distribution for a 95% CI (df = 9). If this value is divided by the thickness of the muscle at the L4-5 vertebral level (2.86 cm), a change in thickness of 6.6% would be required to be 95% confident that a real change has occurred. The study of Van et al 12 reported a mean increase of 11.6% in thickness of the multifidus due to voluntary isometric contractions in a group that received ultrasound biofeedback.
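The SRD arithmetic in the preceding paragraph can be retraced with a short sketch. The function names below are assumptions made for this example; the inputs are the rounded values quoted in the text (SEM of 0.06 cm, t value of 2.26, resting thickness of 2.86 cm, and the 11.6% increase reported by Van et al). With these rounded inputs the SRD comes out at roughly 0.19 cm, or about 6.7% of resting thickness, close to the reported 6.6%; the small gap may simply reflect rounding of the published SEM.

```python
import math


def sem_from_icc(pooled_sd, icc):
    """SEM as defined in the Methods: pooled SD times the square root of (1 - ICC)."""
    return pooled_sd * math.sqrt(1.0 - icc)


def smallest_real_difference(sem, t_critical=2.26):
    """SRD = SEM x sqrt(2) x t, with t = 2.26 for a 95% CI and df = 9 as in the text."""
    return sem * math.sqrt(2.0) * t_critical


sem_cm = 0.06                 # experienced operator at L4-5 (TABLE 4)
resting_thickness_cm = 2.86   # experienced operator at L4-5 (TABLE 1)
reported_increase_pct = 11.6  # mean increase reported by Van et al

srd_cm = smallest_real_difference(sem_cm)
srd_pct = 100.0 * srd_cm / resting_thickness_cm

print(f"SRD = {srd_cm:.3f} cm ({srd_pct:.1f}% of resting thickness)")
print("Reported change exceeds SRD:", reported_increase_pct > srd_pct)
```

Expressing the SRD as a percentage of resting thickness makes it directly comparable with contraction-related thickness changes, which the cited studies report as percentages.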
Kiesel et al 7 measured multifidus thickness in subjects who performed 4 levels of graded resistance, and percentage increases in muscle thickness ranged from 32% to 48%. Furthermore, similar to the results reported by Springer et al, 10 who measured the thickness of the lateral abdominal muscles, the SEM was lower when an average of 3 measurements was used (TABLES 2 and 3). Although the SEM would need to be assessed in each experimental situation, these results would suggest that thickness measurements of multifidus may be clinically useful, with potential to detect changes that exceed measurement error. Before the measurement of multifidus
muscle thickness can be adopted in clinical and research situations, it is important for potential assessors to establish that they can perform the measurement reliably. This requires assessors to standardize measurement protocols. In this study, great care was taken with subject positioning and accurate location of anatomical landmarks. Two vertebral levels were selected for measurement as the morphology of the multifidus muscle is known to vary over vertebral levels, and increase significantly in size at each vertebral level on progression caudally. 4 This anatomical feature allowed for a further check of the reliability of the assessors, as accurate relocation of vertebral levels was therefore required. A further consideration in measurement was to define the precise location of the cursors. In this study, the inside border of the superficial fascia of the multifidus muscle was selected. Another important factor that can influence the measurement is transducer pressure. If the assessor uses too much pressure, the thickness of the muscle will be decreased.

A limitation of this study is that the measurements were only conducted at rest. Future studies could examine the reliability of performing the thickness measurements on voluntary contraction and in different subject populations. Also, the subject number was small and the reliability of measurements across days was not tested. If a measurement is to be repeated to monitor recovery, it is important that the error related to measurements on different days is known, so that any changes attributed to signs of recovery are shown to be outside the margin of measurement error.

CONCLUSION

The thickness of the multifidus muscle was measured in parasagittal section by an experienced assessor and a novice assessor. When a standardized protocol was followed, high interrater reliability, as well as high intrarater reliability, for both raters was demonstrated.
This thickness measure may be useful in physiotherapy practice and research, as other studies have shown that change in the thickness of the multifidus muscle occurs when the muscle contracts. Future studies could use a similar measurement protocol and investigate the reliability of performing the thickness measurement in the contracted state, in different subject populations, and across days.

ACKNOWLEDGEMENTS

The authors wish to thank the subjects studied, the staff at the UQ/Mater Back Stability Clinic, and its Director, Ms Linda Blackwell, for their assistance.

REFERENCES

1. Hides JA, Cooper DH, Stokes MJ. Diagnostic ultrasound imaging for measurement of the lumbar multifidus muscle in normal young adults. Physiother Theory Pract. 1992;8(1):19-26.
2. Hides JA, Jull GA, Richardson CA. Long-term effects of specific stabilizing exercises for first-episode low back pain. Spine. 2001;26:E243-248.
3. Hides JA, Richardson CA, Jull GA, Davies S. Ultrasound imaging in rehabilitation. Aust J Physiother. 1995;41:187-193.
4. Hides JA, Richardson CA, Jull GA. Magnetic resonance imaging and ultrasonography of the lumbar multifidus muscle. Comparison of two different modalities. Spine. 1995;20:54-58.
5. Hides JA, Richardson CA, Jull GA. Multifidus muscle recovery is not automatic after resolution of acute, first-episode low back pain. Spine. 1996;21:2763-2769.
6. Hides JA, Stokes MJ, Saide M, Jull GA, Cooper DH. Evidence of lumbar multifidus muscle wasting ipsilateral to symptoms in patients with acute/subacute low back pain. Spine. 1994;19:165-172.
7. Kiesel KB, Uhl TL, Underwood FB, Rodd DW, Nitz AJ. Measurement of lumbar multifidus muscle contraction with rehabilitative ultrasound imaging. Man Ther. 2007;12:161-166.
8. Ota S, Ward SR, Chen YJ, Tsai YJ, Powers CM. Concurrent criterion-related validity and reliability of a clinical device used to assess lateral patellar displacement. J Orthop Sports Phys Ther. 2006;36:645-652.
9. Pressler JF, Heiss DG, Buford JA, Chidley JV. Between-day repeatability and symmetry of multifidus cross-sectional area measured using ultrasound imaging. J Orthop Sports Phys Ther. 2006;36:10-18.
10. Springer BA, Mielcarek BJ, Nesfield TK, Teyhen DS. Relationships among lateral abdominal muscles, gender, body mass index, and hand dominance. J Orthop Sports Phys Ther. 2006;36:289-297.
11. Stokes M, Rankin G, Newham DJ. Ultrasound imaging of lumbar multifidus muscle: normal reference ranges for measurements and practical guidance on the technique. Man Ther. 2005;10:116-126.
12. Van K, Hides JA, Richardson CA. The use of real-time ultrasound imaging for biofeedback of lumbar multifidus muscle contraction in healthy subjects. J Orthop Sports Phys Ther. 2006;36:920-925.
13. Watanabe K, Miyamoto K, Masuda T, Shimizu K. Use of ultrasonography to evaluate thickness of the erector spinae muscle in maximum flexion and extension of the lumbar spine. Spine. 2004;29:1472-1477.