Comparing Performance Category Criteria for U.S. Navy Alternate Physical Readiness Tests

The current Navy Physical Readiness Test (PRT) allows for several alternative methods, in lieu of the 1.5-mile run, with which to assess aerobic fitness. Two of these methods (elliptical trainer and stationary bike) and two additional devices (2-Km rower and 5-Km bike) were evaluated to determine if maximal effort on all devices produced the same performance category as the 1.5-mile run. One hundred thirty-two active-duty military and midshipmen were recruited from the United States Naval Academy and Naval Support Activity Annapolis. Subjects participated in six testing sessions over a six-week period. Subjects performed a 1.5-mile run (n = 118), 12-minute elliptical trainer test (n = 108), 12-minute stationary bike test (n = 115), 2-Km rower test (n = 115), and 5-Km bike test (n = 114). Each performance category attained on an alternate aerobic test device was compared to the performance category attained on the 1.5-mile run. None of the aerobic testing devices' performance categories matched well to the 1.5-mile run. The results of the testing sessions support the mandated use of the 1.5-mile run as the sole method of assessing aerobic fitness. Additionally, if an alternate method must be used, the 12-minute stationary bike test using the revised Naval Health Research Center (NHRC) performance categories might be the best option because it has the largest number of matching performance categories. Finally, the results do not support the use of the 12-minute elliptical test as an authorized alternative, due to its minimal number of matching categories.


Introduction
The Department of Defense (DoD) mandates that each branch of military service performs a semi-annual Physical Fitness Assessment (PFA) on its personnel (DoD, 2004). The information obtained from the PFA provides service members with information regarding their physical fitness level while ensuring personnel possess the minimum requirements necessary to support the mission (Department of the Navy [DON], 2011). Per the DoD instruction, each branch of service is required to assess body composition, muscular strength, muscular endurance, and aerobic fitness (DoD, 2004). However, the DoD allows each service to determine the tests used to assess each of these components of fitness (DoD, 2004).
Due to the large number of personnel in each branch, all four services have opted to implement distance runs to assess aerobic fitness. Although the gold standard of measuring aerobic fitness is the VO2max test, it is not feasible for testing large populations. In addition, multiple research studies have determined that distance runs are good indicators of aerobic fitness and can easily be used to test large populations (Buono, 1988; George, Vehrs, Allsen, Fellingham, & Fisher, 1992; Latour, Peterson, Rittenhouse, & Riner, 2017; Weiglein, Herrick, Kirk, & Kirk, 2011). The Navy and Air Force employ the 1.5-mile run, and the Army and Marine Corps employ the 2.0-mile run and 3.0-mile run, respectively.
All four services offer alternate aerobic tests in lieu of their service-specific distance run. For example, the Navy offers a 450-meter swim, 500-yard swim, 12-minute elliptical trainer, and 12-minute stationary bike (DON, 2011; Naval Administrative Message [NAVADMIN] 293/06). The Army authorizes an 800-yd swim test, 6.2-mile bicycle ride, and 2.5-mile walk test (Department of the Army, 1998). The Air Force offers a 2.0-mile walk test (Department of the Air Force, 2013). The Marine Corps has recently added a 5-Km rower test as an alternative testing method (Marines.mil, 2016). Although each service offers an alternate aerobic test, only the Navy allows service members to participate in alternate tests without a medical waiver (DON, 2011). For the other services, service members must be medically approved in order to participate in an alternate aerobic test.
DOI: 10.29011/JSIMD-115.100015 Citation: Latour A, Peterson D, Rittenhouse M, Riner D (2017)
Recent research has shown that some of the Navy's alternate aerobic tests are not equivalent to the 1.5-mile run and do not correlate well to VO2max (Latour et al., 2017). In fact, none of the Navy's alternate aerobic tests actually correlate well to the 1.5-mile run (Latour et al., 2017). Moreover, current findings indicate that the elliptical trainer has a poor correlation to VO2max (r = .20) and to the 1.5-mile run (r = -.34), calling into question its use as a valid test for assessing a service member's level of aerobic fitness (Latour et al., 2017).
Accurately reporting Physical Readiness Test (PRT) event scores is paramount, as results can directly impact a sailor's retention and promotion (NAVADMIN 061/16, 2016; NAVADMIN 178/15, 2015). Therefore, in order to ensure the minimum physical fitness levels required to support mission requirements as well as prevent inaccurate reporting of service member PRT performance, it is imperative that services implement and employ the most accurate methods of assessing aerobic fitness.

Background
Prior to 2006, the only aerobic tests employed by the Navy were the 1.5-mile run, 450-meter swim, and 500-yard swim. However, Navy leadership expressed the desire to employ additional alternate aerobic tests for those service members who could not or preferred not to run or swim. As a result, the Naval Health Research Center (NHRC) conducted research on the elliptical trainer and stationary bike as additional alternate aerobic test options. At the time, performance standards for the elliptical trainer and stationary bike did not exist, thereby requiring NHRC to develop regression equations that could predict 1.5-mile run times based on calorie expenditure (Hodgdon, Hervig, Griswold, Terry, Le, Sausen, & Miller, 2006; Parker, Griswold, & Vickers, 2006). These regression equations have been incorporated in the Physical Readiness Information Management System (PRIMS) and are used to determine a predicted 1.5-mile run time and the corresponding performance category. However, further research has shown that these regression equations can over- or under-predict actual 1.5-mile run times by two minutes or more (Peterson, 2015a; Peterson, 2015b; Schilling, 2015).
Despite these concerns, it is unlikely that Navy leadership will eliminate all of the different alternate aerobic fitness tests currently employed. In fact, much of the current PRT has been reviewed previously without any substantial changes to the program (Myers, 2015a; Myers, 2015b; Whitehead, Schilling, Peterson, & Weiss, 2012). The stationary bike has become increasingly popular over the last several years (Table 1). In contrast, the other alternate cardio-testing methods have decreased in usage over the past several PFA cycles (Table 1). For example, current elliptical trainer participation has decreased to just over four percent of total usage. Similarly, current swim usage (which includes both the 450-m and 500-yd swim tests) equates to just over two percent of total usage. The low usage of these tests, coupled with their increased administrative burden, may not justify their continued implementation and employment. Even so, since the Navy will likely continue employing alternate aerobic fitness tests, it is imperative that the performance category used to score the aerobic fitness portion of the PRT is equitable regardless of the aerobic fitness modality used for testing. The purpose of this study was to identify the performance categories attained on the 1.5-mile run and compare them to the performance categories attained on the alternate devices to see if they were equitable. This study compared the performance categories attained by each participant on the 1.5-mile run, 12-minute elliptical test (using two sets of performance standards), 12-minute stationary bike (using two sets of performance standards), 5-Km bike, and 2-Km rower. If each aerobic test accurately measures aerobic fitness, then the performance category attained from each test should be identical.
If the performance categories attained on the 1.5-mile run are not the same as those attained on the alternative tests, then the alternative tests may not be reporting aerobic fitness levels accurately or equivalently. In order to ensure PRT standardization and fairness, it is imperative that all aerobic tests produce the same performance category results.

Methods
This report is a follow-on study to a previous study by Latour et al. (2017). The complete methods used and description of each test protocol are provided in the previous study.

Subjects
Active duty military (n = 38) and Naval Academy midshipmen (n = 94) were recruited to participate in the testing (n = 132). Participant descriptive information is provided in Table 2. All subjects completed the Navy's standardized PRT warm-up prior to each test and were encouraged to participate in six maximal-effort aerobic tests. These tests included the 12-minute stationary bike, 12-minute elliptical trainer, 5-Km bike, 2-Km rower, 1.5-mile run, and a VO2max test.
For the 12-minute stationary bike and elliptical trainer tests, only PRT-approved devices were used. Each test was 12 minutes in length with an additional two-minute cool-down period. The calories burned at the end of 12 minutes were recorded and used for calculating each individual performance category. The 5-Km bike test was performed on a Monark Ergomedic 874E cycle ergometer. Resistance was added based on the participant's body weight. The participant was instructed to cycle as fast as possible until he/she reached a distance of 5 Km. The time was recorded. The 2-Km rower test was performed on a Concept 2 rower. Each participant was instructed to row as fast as possible until 2,000 meters was attained. The time was recorded. The 1.5-mile run was conducted on a 200-meter indoor track. The participant was instructed to run as fast as possible until 1.5 miles was reached. The time was recorded. Each participant performed a VO2max test on a ParvoMedics TrueOne 2400. The participant was asked to run on the treadmill as long as possible and to provide a maximal effort.
Body weight, height, and age were measured and recorded before the first test. To limit the chance of fatigue and ensure recovery, subjects were required to wait 48 hours between each test; however, all tests needed to be completed in a six-week period to reduce the effect of increased or decreased aerobic performance. All subjects were asked to provide a maximal effort for each test.
For the 12-minute stationary bike, two scoring methods were used. The current PRIMS equation was used to determine a 1.5-mile run time (min:sec). This equation predicted a 1.5-mile run time based on the calories burned (indicated on the device) during the 12-minute stationary bike test. A new NHRC-developed scoring system was also used to calculate a performance score based on Calories (Cal) and Body Weight (BW) in pounds (Cal/BW) (See Tables). For the PRIMS equation, the calories burned were entered into the official run time calculator located on the PRIMS website. The same two methods were used for determining predicted 1.5-mile run times and scores for the elliptical trainer.

Category Differences
Each performance category obtained (i.e., maximum, outstanding, excellent, good, satisfactory, failure) from an individual's 1.5-mile run was compared with the performance categories he or she earned on the alternative aerobic tests. Tables 1, 2, 3, and 4 depict the performance categories, associated points, and run times or criterion measure.
When aerobic tests are completed, the performance categories can be compared to each other. If the performance category is not the same for both aerobic tests (e.g., 1.5-mile run and elliptical trainer), then there is a performance category difference. Because there are six categories, a performance category difference can be one, two, three, four, or five categories.
For example, if a 17-year-old male acquired a nine-minute and 45-second 1.5-mile run time, his performance category would be "Excellent" (Table 1). If the same individual performed the elliptical trainer alternate aerobic test and PRIMS predicted his run time at nine minutes and 50 seconds (based on calories burned during the test and the PRIMS regression equation), then he would have acquired a performance category of "Good" (Table 1). This performance data would result in a one-category difference. If the same individual were to perform the stationary bike alternate aerobic test and PRIMS predicted his run time as 12:40 and a performance category of "Failure," then this performance would be a three-category difference (Table 1). Ideally, there should be no difference in categories between an approved device and the 1.5-mile run if a maximal effort is performed on each test and performance category scoring methods are equivalent.
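The category-difference calculation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `category_difference` and the sign convention (positive when the alternate test yields a poorer category than the run) are assumptions introduced here, not part of the study's procedure. The category order follows the list given in this section.

```python
# Navy PRT performance categories, ordered best to worst (as listed in the text).
CATEGORIES = ["Maximum", "Outstanding", "Excellent", "Good", "Satisfactory", "Failure"]


def category_difference(run_category, alternate_category):
    """Signed number of categories separating the two results.

    Positive: the alternate test gave a poorer category than the run.
    Negative: the alternate test gave a better category than the run.
    (Hypothetical helper; the sign convention is an assumption.)
    """
    return CATEGORIES.index(alternate_category) - CATEGORIES.index(run_category)


# Worked example from the text: the 1.5-mile run earns "Excellent".
elliptical_diff = category_difference("Excellent", "Good")   # one-category difference
bike_diff = category_difference("Excellent", "Failure")      # three-category difference
print(elliptical_diff, bike_diff)
```

With six categories, the absolute value of this difference can never exceed five, which matches the range stated above.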

Results
For each device tested, the performance category attained by each individual was compared to his/her respective performance category attained during the 1.5-mile run. For the stationary bike and elliptical trainer, two sets of performance norms were evaluated. The first set (PRIMS bike) uses a prediction equation that converts the number of calories burned in 12 minutes into a predicted 1.5-mile run time. This calculation was completed using the official Navy PRT calculator located on the secure PRIMS website. Once converted, the performance norms for the 1.5-mile run were used to assign a respective performance category (See Tables). The other set (NHRC bike) uses standalone performance norms based on calories burned and body weight. For the 5-Km bike, performance categories were acquired from Buono's (1988) research (Tables 3 and 4). The performance standards developed by Buono (1988) included only the following performance categories: Outstanding, Excellent, Good, and Satisfactory. Based on the regression equation in Buono's (1988) paper, the researchers expanded the table to include the Navy's additional performance categories (i.e., Maximum and Failure). The rower performance category standards are based on the work performed by Peterson (2015a).

Category Prediction
None of the alternate tests matched the 1.5-mile run category by 40% or more (Figure 1). When an alternate test does not match the run category, the test can either over- or under-predict actual 1.5-mile run times. When an alternate test under-predicts, it estimates run times as slower than actual 1.5-mile run times, and individuals receive a poorer performance category than if they had performed the 1.5-mile run. When an alternate test over-predicts, it estimates run times as faster than they actually are, and individuals attain a better performance category than they would have earned on the 1.5-mile run.
Volume 2017; Issue 04

Stationary Bike and Elliptical Trainer
The results indicate that the NHRC bike performance categories matched best with the 1.5-mile run performance categories (39%). The PRIMS bike matched in 15% of cases. The PRIMS elliptical and NHRC elliptical had the fewest matching categories, with 14% and 12%, respectively.
Overall, the PRIMS bike and PRIMS elliptical scores primarily under-predicted (80% and 83% respectively). The NHRC elliptical trainer primarily under-predicted (88%). In contrast, the NHRC bike was more balanced and under-predicted by 28% while over-predicting by 33%.
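As a sketch of how the match, under-prediction, and over-prediction percentages above could be tallied from paired results, the following Python snippet classifies each participant's pair of categories and reports the share of each outcome. The helper names and the four sample pairs are hypothetical and are not the study's data; the definitions of "under" and "over" follow the Category Prediction section.

```python
from collections import Counter

# Performance categories ordered best to worst (as listed in the text).
CATEGORIES = ["Maximum", "Outstanding", "Excellent", "Good", "Satisfactory", "Failure"]


def classify(run_category, alternate_category):
    """Label one participant's pair as 'match', 'under', or 'over'.

    'under': the alternate test gives a poorer category than the run.
    'over':  the alternate test gives a better category than the run.
    """
    run_rank = CATEGORIES.index(run_category)
    alt_rank = CATEGORIES.index(alternate_category)
    if alt_rank == run_rank:
        return "match"
    return "under" if alt_rank > run_rank else "over"


def prediction_shares(pairs):
    """Percentage of pairs falling in each outcome (hypothetical helper)."""
    counts = Counter(classify(run, alt) for run, alt in pairs)
    return {key: 100.0 * counts[key] / len(pairs) for key in ("match", "under", "over")}


# Hypothetical (run, alternate) category pairs, for illustration only.
sample_pairs = [
    ("Excellent", "Good"),
    ("Good", "Good"),
    ("Outstanding", "Excellent"),
    ("Good", "Excellent"),
]
shares = prediction_shares(sample_pairs)
print(shares)
```

Because every pair falls into exactly one of the three outcomes, the three shares always sum to 100%, as the NHRC bike figures do (39% matching, 28% under, 33% over).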

Run Category Differences
Category differences were reviewed next in order to determine which devices produced results that were closest to the performance categories for the 1.5-mile run. However, because the PRIMS bike and PRIMS elliptical tests are estimating 1.5-mile run times (instead of directly measuring them), a level of error can be expected. Therefore, it is useful to evaluate the alternate tests not only based on how many categories match but also based on the magnitude of the matching errors. As described previously, category differences can range from a one-category difference to a five-category difference. A device with no category differences and/or one-category differences may be acceptable for testing purposes when compared to the 1.5-mile run. If a device has a majority of two- or three-category differences and four- or five-category differences, it may not be acceptable for testing purposes (Figure 2).

Stationary Bike and Elliptical Trainer
Although the PRIMS bike contained few matching categories (15%), it also had the most performance categories that were a one-category difference (49%) (Figure 2). Combined, it had 64% matching and one-category differences. The PRIMS bike also contained two-category differences (31%) and three-category differences (5%). The NHRC bike produced the highest percentage of matching and one-category differences combined (83%), making it the alternate device that differed least from the 1.5-mile run scoring categories. Because of this, it may be the most acceptable device and scoring method to use. It also produced 14% two-category differences and 3% three-category differences, the fewest two- and three-category differences of all the devices tested. In addition, the NHRC bike had no four- or five-category differences. Overall, based on the category differences alone, the NHRC bike performed the best.
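The bike percentages in the preceding paragraph can be checked for internal consistency with a short Python sketch. The PRIMS bike distribution is taken directly from the text. The NHRC bike's one-category share is not stated separately, so the 44% used here is derived as 83% minus 39% and should be read as an inference rather than a reported figure. The helper name `within_one` is hypothetical.

```python
# Percentage of participants at each absolute category difference from the run.
# Keys: size of the category difference; values: percent of participants.
prims_bike = {0: 15, 1: 49, 2: 31, 3: 5}          # as reported in the text
nhrc_bike = {0: 39, 1: 83 - 39, 2: 14, 3: 3}      # one-category share derived (assumption)


def within_one(distribution):
    """Combined share of matching and one-category differences (hypothetical helper)."""
    return distribution.get(0, 0) + distribution.get(1, 0)


for name, dist in (("PRIMS bike", prims_bike), ("NHRC bike", nhrc_bike)):
    # Each distribution should account for all participants.
    print(name, "within one category:", within_one(dist), "total:", sum(dist.values()))
```

Both distributions sum to 100%, and the combined shares reproduce the 64% and 83% figures quoted above.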
Both the PRIMS elliptical and the NHRC elliptical performed worst in regard to matching categories. Both of the elliptical scoring methods contained the most three- and four-category differences. The PRIMS elliptical contained 41% two- and three-category differences, while the NHRC elliptical contained 43%. Both elliptical scoring methods also contained four- and five-category differences, which produced poor matching results.

5-Km Bike and 2-Km Rower
The only other device that did not have four- or five-category differences was the 5-Km bike. Overall, the 5-Km bike performed reasonably well, with 70% matching and one-category differences combined (Figure 2). The 2-Km rower contained 62% matching and one-category differences combined. The remainder of its differences were two-, three-, four-, and five-category differences. The 2-Km rower contained the highest percentage of four- and five-category differences, so by this measure it matched the 1.5-mile run poorly compared to the other devices.
Table 5 shows that the NHRC bike (m = 4.54) and 2-Km rower (m = 4.21) had the closest mean category scores to that of the 1.5-mile run (m = 4.48), while the elliptical scoring methods (PRIMS, m = 3.02; NHRC, m = 2.85) had the highest mean category differences from the run. A Friedman's test was conducted, and the results indicate that there was a significant difference in the categories participants achieved using the different exercise devices (χ2(6) = 306.248, p < 0.001). Post hoc analyses were conducted to compare each device with the 1.5-mile run using Wilcoxon signed-rank tests with a Bonferroni family-wise error correction, which set the significance level at p < .008. Significant category differences were found between the 1.5-mile run and the PRIMS bike (Z = -7.800, p < 0.001), PRIMS elliptical (Z = -7.932, p < 0.001), NHRC elliptical (Z = -8.422, p < 0.001), and 5-Km bike (Z = -7.210, p < 0.001). No significant category differences were found between the 1.5-mile run and the NHRC bike (Z = -0.614, p = 0.539) or the 2-Km rower (Z = -0.907, p = 0.364). These results indicate that the NHRC bike and the 2-Km rower are the best devices, among those tested, to minimize the category difference between an alternate device and the 1.5-mile run.
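The Bonferroni threshold used in the post hoc comparisons can be reproduced with simple arithmetic. The sketch below assumes a family-wise alpha of 0.05, which the text does not state explicitly but which is consistent with the reported level of .008 across six device-versus-run comparisons. The four p-values that the text gives only as effectively zero are entered as 0.001, a conservative stand-in chosen for illustration.

```python
FAMILY_WISE_ALPHA = 0.05   # assumption: not stated in the text, implied by .008
N_COMPARISONS = 6          # six scoring methods, each compared with the 1.5-mile run

# Bonferroni correction: divide the family-wise alpha by the number of comparisons.
threshold = FAMILY_WISE_ALPHA / N_COMPARISONS

# p-values for each comparison with the 1.5-mile run.
# 0.001 is a stand-in upper bound for values reported as effectively zero.
reported_p = {
    "PRIMS bike": 0.001,
    "PRIMS elliptical": 0.001,
    "NHRC elliptical": 0.001,
    "5-Km bike": 0.001,
    "NHRC bike": 0.539,
    "2-Km rower": 0.364,
}

significant = {device: p < threshold for device, p in reported_p.items()}
not_significant = sorted(device for device, flag in significant.items() if not flag)
print(round(threshold, 4), not_significant)
```

Under these assumptions, only the NHRC bike and the 2-Km rower fail to reach significance, in line with the results reported above.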

Comparisons of the NHRC Bike and 2-Km Rower
Because the NHRC bike and the 2-Km rower were the only alternate devices that did not have significant category differences from the 1.5-mile run, they were compared to determine if there was any difference in the categories the two devices produced. A Wilcoxon signed-rank test failed to find a significant difference in the mean categories achieved on the NHRC bike and the 2-Km rower (Z = -1.716, p = 0.086). Although no difference was found in the categories these devices produced, it is possible that one device has more error in predicting the 1.5-mile run category than the other. To explore this, difference scores were computed for each participant for these devices based on how many category differences there were between their device and 1.5-mile run categories. A Wilcoxon signed-rank test found that the 2-Km rower (m = 1.37) had a significantly higher difference score than the NHRC bike (m = 0.82) (Z = -3.958, p < 0.001).

Current Navy Alternate Tests
Our results show that the PRIMS bike and PRIMS elliptical primarily under-predicted performance categories when compared to the 1.5-mile run. As a result, the run times computed from the regression equation in PRIMS were much slower than the actual 1.5-mile run times achieved in this study. Participants in this study would have attained a better performance category performing the 1.5-mile run test rather than testing on the stationary bike or elliptical trainer. These findings suggest that the current stationary bike and elliptical trainer performance categories are not equal to the 1.5-mile run. In order to ensure PRT standardization, the alternate aerobic tests should produce the same outcome (performance category) as the 1.5-mile run. If the alternate cardio tests do not produce the same performance category as the 1.5-mile run, then they should not be used interchangeably.
NHRC has developed an alternate scoring method based on Calories burned and Body Weight (Cal/BW). It was proposed that this revised method would: 1) be easier to compute, 2) not require the use of run prediction equations, and 3) use standalone norms (instead of a prediction equation) similar to the other PRT modalities. This change would significantly reduce the administrative burden on the respective command as well as the Physical Readiness Program Office. Our data showed that the proposed NHRC bike performance categories improved the matching categories to 39%. It produced the highest percentage of matching and one-category differences of all the devices tested. Our statistical analysis also demonstrated that the NHRC bike categories did not differ significantly from the categories achieved in the 1.5-mile run and had significantly lower category difference scores than the 2-Km rower (the only other device that had categories that did not differ from the run). However, in terms of the elliptical trainer, the NHRC elliptical actually produced fewer matching categories than did the PRIMS elliptical.

Additional Alternate Aerobic Tests
The 2-Km rower and 5-Km bike were also reviewed in this study. In a previous study, the 2-Km rower was recommended as an alternate aerobic fitness test for the Navy because of its relatively high correlation to VO2max and the 1.5-mile run, thereby performing just as well as the stationary bike. Additionally, the 2-Km rower is readily available in most Navy facilities, and its cost is the lowest of all devices tested (Latour et al., 2017). Even though the 5-Km bike has a moderate correlation to VO2max, it was not recommended for use as an alternate aerobic test because of its higher cost and more complicated testing procedures (Latour et al., 2017).
Using the performance categories created by Peterson (2015a), the 2-Km rower had a similar number of matching and one-category differences as the 5-Km bike and PRIMS bike. Because it was more balanced between under- and over-prediction, it was the only device besides the NHRC bike not to differ significantly from the 1.5-mile run categories. Further analyses showed that the 2-Km rower did have a higher average category difference than the NHRC bike. The 5-Km bike tested better than most of the other devices and maintained the second most matching categories. However, it primarily over-predicted performance categories, which means the performance category attained on the 5-Km bike was better than the one that would have been achieved on the 1.5-mile run. Both the 2-Km rower and the 5-Km bike still have the potential to be adequate aerobic tests, although our results indicate that the NHRC bike is the better option.

Limitations
The primary limitation that existed for this study was the lack of failures due to the testing population used. This was likely due to the different (stricter) PRT standards employed by USNA as compared to the Navy. As a result, none of the participants failed the 1.5-mile run. A few participants did fail some of the alternate aerobic devices, but those failures are believed to be a result of their unfamiliarity with the equipment. Although the participants were provided the opportunity to familiarize themselves with the equipment prior to testing, many failed to do so.

Conclusion
The authors conclude that the 1.5-mile run is the most accurate option for testing the aerobic fitness of Navy personnel. As a result, the 1.5-mile run should be the Navy's standard method of assessment, and it should be mandatory for all service members to participate unless they are on a medical waiver. If the Navy desires the continued implementation of alternate aerobic capacity tests, this research recommends the use of the NHRC bike. Overall, the NHRC bike appears to be the best alternate scoring method because it had the most matching categories and one-category differences when compared to the 1.5-mile run. The 2-Km rower is another possible option for alternate aerobic testing, but future research is needed in order to develop revised performance categories. Although the 5-Km bike performed well, it is not recommended due to its cost and the complexity of its testing procedures (Latour et al., 2017). Finally, neither the PRIMS elliptical nor the NHRC elliptical is recommended as an alternate aerobic test due to their poor correlation to VO2max and limited number of matching performance categories (Latour et al., 2017).