
1 Introduction

The purpose of this research is to compare the HTC Vive and Oculus Rift Virtual Reality (VR) systems from a human factors perspective. The research focuses on assessing subjective and objective usability differences between the two Head-Mounted Displays (HMDs), and their respective controllers, through teleportation scenarios within a Virtual Environment (VE). Specifically, this research aims to investigate the usability of each device through statistical analysis of a usability survey and performance data.

1.1 Virtual Reality, Augmented Reality, and Mixed Reality

The Reality-Virtuality (RV) Continuum [1] serves as a guide for referencing Mixed Reality (MR), Augmented Reality (AR), Augmented Virtuality (AV), and all positions that fall between the Real Environment (RE) and the fully Virtual Environment (VE) (see Fig. 1). In the RV Continuum, MR is defined as an environment in which aspects of the real and virtual worlds are combined within a single display. AR falls toward the left of the RV Continuum, with the RE composing the majority of the environment. AV, although similar to AR, falls further to the right of the RV Continuum, as the VE is supplemented with real-world objects, such as people or things. The AR/AV distinction therefore stems from the proportions of real and synthetic objects in the environment. VR, defined as immersion within an entirely synthetic VE, is the focus of this paper. The term VR is often used interchangeably with VE, and VR lies outside the range of MR because of its complete visual virtuality. The VE may mimic components of the real world, such as the constraints of gravity and time, or be entirely fabricated with no governing physical or temporal laws at play.

Fig. 1. Depiction of the Reality-Virtuality (RV) Continuum (Milgram, P., Takemura, H., Utsumi, A., & Kishino, F. (1994). Augmented reality: A class of displays on the reality-virtuality continuum. Proceedings of SPIE, 2351, 282–292).

1.2 Societal Applications

The use of VR has branched out into the world of entertainment, as well as more serious circles involving training and rehabilitation [2, 3]. A VR-based training system grants subjects the ability to navigate critical, sometimes dangerous situations without the real-world risk associated with the task: for example, VR-based training was reported to be effective at portraying real job hazards in a safe and controlled way for South African miners [4]. A study by Bouchard et al. [5] concluded that VR may also be used as a preventative measure for Posttraumatic Stress Disorder (PTSD) in military occupations, as it allows individuals to better train and prepare for stressful circumstances in the real world. The controlled nature of VR lends itself to rehabilitation, because highly realistic VEs can evoke therapeutic levels of anxiety while offering complete control over the anxiety-inducing stimulus [3, 6]. Furthermore, VR has been promoted as an economical choice in comparison to traditional training and rehabilitation methods; a company may invest less than $2,500 USD and use the VR technology with any number of people for years to come [6].

1.3 Related Work

VR technology has created interest in assessing different areas of performance with HMD systems. In a prior usability study, an HMD system was shown to underperform when compared to a desktop, mostly due to a lack of familiarity; however, the HMD system was not compared against another HMD system [7]. An additional study examined various locomotion methods (a joystick, physical walking, and a point-and-teleport method) but did not test the same method on two different devices [8]. These studies illustrate the overall lack of research comparing HMD-to-HMD system usability.

Existing HMD system comparison studies have focused on assessing technological functions, as opposed to usability. Although one study found the Vive performed slightly better than the Oculus in terms of quality of experience, that research used a small participant group in a sorting task [9]. The researchers attributed the Vive's success to its strong sensor system. Another study cited the Vive's tracking system as a positive attribute over the Oculus; yet both devices presented low visual jitter and minimal jitter in tracked head movements [10]. In terms of visual performance, the Oculus was shown to outperform the Vive with lower distance compression, where HMD weight, field of view, and lens type were thought to be contributing factors [11].

1.4 Usability

Usability is defined as a system’s level of “effectiveness, efficiency and satisfaction” [12]. In this study, the subjective usability subscales assessed by the participants were ease of use, comfort, visual quality, and effectiveness. For the purpose of this study, the following operational definitions are given:

  • Ease of use typically refers to “the degree to which a person believes that using a particular system would be free of effort” [13].

  • Comfort relates to physical comfort while wearing the HMD, such as how the HMD fits on the head or if it causes visual stress.

  • Visual quality relates to how clearly the image is presented through the HMD.

  • Effectiveness relates to whether or not the participant is able to achieve the experimental goal [14].

Objective usability data consist of numerical performance measures acquired from the virtual scenarios. Within the simulation, several different forms of data were logged. Overall, the performance data focused on how well an experimental task was completed, in terms of completion, time-to-completion, and efficiency (i.e., a metric combining completion and time-to-completion). The specific performance measurements are detailed in the method section: per scenario effectiveness, per scenario time duration, total time duration, completion rate, and total time-based efficiency.

1.5 Research Questions

Below are the Research Questions (RQs) of interest, based on outcomes from the three virtual scenarios (i.e., an object collection game controlled via teleportation locomotion). There are two main research avenues: determining if differences exist between the Rift and Vive, in terms of usability; and determining if measures of objective usability (i.e., performance data) relate to measures of subjective usability (i.e., survey data).

  • RQ1: Is there a statistically significant difference in objective usability (i.e., per scenario effectiveness, per scenario time duration, total time duration, completion rate, and total time-based efficiency) between the Vive and Rift?

  • RQ2: Is there a statistically significant difference in subjective usability survey subscales (i.e., ease of use, comfort, visual quality, and effectiveness) between the Vive and Rift?

  • RQ3: Is there a correlational relationship between the subjective usability survey subscales and objective usability for the Vive and Rift?

2 Method

2.1 Participants

Participants were recruited from the University of Central Florida (UCF). To be considered for the study, an individual had to be a U.S. citizen, be at least 18 years of age, have normal or corrected-to-normal vision, have no previous history of seizures, and not be colorblind. There were 40 participants with an age range of 18 to 30; 14 participants were male (age: M = 20.93, SD = 3.36) and 26 were female (age: M = 21.38, SD = 1.55). After study completion, each participant was monetarily compensated up to $10 USD for his or her time and travel.

2.2 Experimental Design

A between-subjects design, with one independent variable, was used to measure user differences in the teleportation tasks. The independent variable was the type of HMD system, with two conditions (i.e., Vive and Rift). The dependent variables were the objective and subjective usability measures from the teleportation task scenarios.

Teleportation Task Battery.

All three virtual scenarios required the participant to use a handheld controller to navigate the virtual environments: either the Vive controller or the Oculus Touch controller was used exclusively for all scenarios, with its respective headset. Although the teleportation locomotion was the same for each system, the buttons used for teleportation differed between controllers. With the Vive, one held down the trackpad to initiate the laser, tilted or laterally moved the controller to aim the laser, and released the trackpad to jump to the laser's projected location. The Rift's Oculus Touch controller used a joystick instead of a trackpad, but kept the same control scheme, as sketched below.
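
To make the shared control scheme concrete, the following is a minimal, illustrative sketch of the hold-aim-release teleport interaction. The scenarios themselves were built in Unity; the Python function and field names here are hypothetical and only mirror the logic described above.

```python
# Illustrative sketch of the shared teleport control scheme; the actual
# scenarios were implemented in Unity, so all names here are hypothetical.
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float, float]

@dataclass
class TeleportState:
    aiming: bool = False            # True while the trackpad/joystick is held
    target: Optional[Point] = None  # last point projected by the laser

def update_teleport(state: TeleportState, button_held: bool,
                    laser_hit: Optional[Point]) -> Optional[Point]:
    """Advance the teleport interaction by one frame.

    Holding the button shows the laser and updates the aimed target
    (tilting or moving the controller changes laser_hit); releasing the
    button returns the destination to jump to, or None otherwise.
    """
    if button_held:
        state.aiming = True
        if laser_hit is not None:
            state.target = laser_hit
        return None                       # still aiming, no teleport yet
    if state.aiming and state.target is not None:
        destination, state.aiming, state.target = state.target, False, None
        return destination                # jump on release
    return None
```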

In each scenario, a usability game was given: the participant's goal was to collect a set number of objects within a five-minute time limit. An object was collected by teleporting into it. The user viewed the VE from a first-person perspective. In the first and second scenarios, a new collectable object appeared only after the current collectable was retrieved. The first scenario involved a flat room with a red-tile floor, in which thirty blue tiles lit up to be collected. The second scenario involved a lightly populated forest with hills, in which thirty blue circles appeared as collectables. In the third scenario, which involved a small village, twenty bright-blue spheres were presented simultaneously as collectables (see Fig. 2). As illustrated, the scenarios increased in complexity.

Fig. 2. The virtual environment of the third scenario, as seen from the experimenter's monitor. The user's goal was to collect all bright-blue spheres within five minutes.

2.3 Testbed

The testbed used to run the experiment and collect data for this task was one desktop computer (see Table 1 for the desktop specifications). The task scenarios were developed in the Unity game engine. Unity was selected for its high-quality graphics, user-friendly graphical interface, and capability to support different software development kits for the Vive and Rift.

Table 1. Desktop specifications.

2.4 Data Logging

As the participant completed each scenario, objective usability data were tracked and logged into an Excel file. Each individual scenario recorded two measurements: per scenario effectiveness and per scenario time duration. Per scenario effectiveness was calculated by dividing the number of objects collected by the total number of objects one could collect in that scenario. Per scenario time duration, expressed in minutes, was measured as the time it took the participant to complete the scenario, starting from the beginning of the scenario and ending either after collection of all objects or once the maximum allotted time of five minutes had passed.
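
As a concrete illustration of this logging, the sketch below computes per scenario effectiveness and time duration and writes them to an Excel file with pandas. The column names and file layout are assumptions; this is not the study's actual logging code.

```python
# Minimal per-scenario logging sketch; column names and file layout are
# illustrative, not the study's actual code.
import pandas as pd

def per_scenario_effectiveness(objects_collected: int, objects_total: int) -> float:
    """Number of objects collected divided by the total collectable objects."""
    return objects_collected / objects_total

def log_scenario(rows, scenario, collected, total, minutes):
    """Append one scenario's objective usability measures to an in-memory log."""
    rows.append({
        "scenario": scenario,
        "effectiveness": per_scenario_effectiveness(collected, total),
        "time_duration_min": min(minutes, 5.0),  # capped at the 5-minute limit
    })

rows = []
log_scenario(rows, scenario=1, collected=30, total=30, minutes=3.2)
log_scenario(rows, scenario=2, collected=24, total=30, minutes=5.0)
pd.DataFrame(rows).to_excel("usability_log.xlsx", index=False)  # one file per participant
```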

After all scenarios were completed, three measurements were tracked and logged into the Excel file: total time duration, completion rate, and total time-based efficiency. Total time duration, expressed in seconds, was measured by combining all scenario time durations. The completion rate was found by dividing the number of successful scenarios by the total three scenarios and multiplying the quotient by 100% [14]:

$$ \text{Completion rate} = \frac{\text{Number of scenarios completed successfully}}{\text{Total number of scenarios undertaken}} \times 100\% $$
(1)

A scenario was considered complete if the participant was able to collect all objects before the allotted time ran out. Total time-based efficiency, expressed in objects collected per second, incorporated per scenario time duration and completion rate:

$$ \text{Time-based efficiency} = \frac{\sum\limits_{j=1}^{R} \sum\limits_{i=1}^{N} \frac{n_{ij}}{t_{ij}}}{NR} $$
(2)

For the purpose of this study, N represents the number of scenarios (three), R represents the number of participants in the computation (always one, since efficiency was computed per participant), n_ij is the result of scenario i for user j, indicating whether object collection was successful (cf. completion rate), and t_ij represents the time user j spent on scenario i (i.e., per scenario time duration) [14].
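
A minimal sketch of Eqs. (1) and (2) for a single participant (R = 1), following the definitions above; the function names and example values are illustrative.

```python
# Sketch of Eq. (1) and Eq. (2) for one participant (R = 1); names and
# example values are illustrative.
def completion_rate(completed):
    """Eq. (1): percentage of scenarios completed before the time limit."""
    return 100.0 * sum(completed) / len(completed)

def time_based_efficiency(completed, durations_s):
    """Eq. (2) with R = 1: mean of n_ij / t_ij over the N scenarios, where
    n_ij is 1 if scenario i was completed and 0 otherwise."""
    N, R = len(completed), 1
    total = sum((1.0 if done else 0.0) / t for done, t in zip(completed, durations_s))
    return total / (N * R)

# Example: scenarios 1 and 3 completed; scenario 2 timed out at 300 s.
print(completion_rate([True, False, True]))                        # 66.67 (%)
print(time_based_efficiency([True, False, True], [192.0, 300.0, 251.0]))
```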

Further, participants reported their responses to a subjective usability survey at the end of the final scenario. The usability survey was developed in-house and assessed the system's utility. The survey comprised 14 statements, each rated from strongly disagree (1) to strongly agree (5). There were also three additional open-ended comment sections for positive, negative, and general thoughts about the participant's assigned device.
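
For illustration, the snippet below averages the 14 Likert ratings into subscale scores. The assignment of items to subscales shown here is hypothetical, since the paper does not list which statements belong to which subscale.

```python
# Illustrative scoring of the 14 Likert items into subscale means; the
# item-to-subscale mapping below is hypothetical.
import numpy as np

SUBSCALE_ITEMS = {                    # hypothetical item indices per subscale
    "ease_of_use": [0, 1, 2, 3],
    "comfort": [4, 5, 6],
    "visual_quality": [7, 8, 9, 10],
    "effectiveness": [11, 12, 13],
}

def subscale_scores(responses):
    """Average the 1-5 ratings belonging to each subscale."""
    r = np.asarray(responses, dtype=float)
    return {name: float(r[idx].mean()) for name, idx in SUBSCALE_ITEMS.items()}

print(subscale_scores([4, 5, 4, 4, 3, 4, 4, 5, 4, 4, 5, 4, 4, 5]))
```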

2.5 Procedure

Prior to experimentation, each participant was randomly assigned to a condition (i.e., either the Vive or Rift). The procedure, from the participant's perspective, was as follows:

  1. Signed informed consent
  2. Completed color blindness test
  3. Completed demographics questionnaire
  4. Read interface PowerPoint training
  5. Presented with scenario instructions
  6. Completed the first scenario
  7. Received 1-minute mandatory break
  8. Presented with scenario instructions
  9. Completed the second scenario
  10. Received 1-minute mandatory break
  11. Presented with scenario instructions
  12. Completed the third scenario
  13. Received 1-minute mandatory break
  14. Completed usability survey
  15. Completed receipt details
  16. Received dismissal from the study

3 Results

3.1 Preliminary Data Analysis

Tests for normality, homogeneity of variance, and outliers were conducted on the usability data to determine the data distribution. Results from the Kolmogorov-Smirnov test indicated a violation of the assumption of a normal distribution. Tests for homogeneity of variance and outliers indicated no data points required removal. Consequently, non-parametric tests were used for data analysis.
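
The following sketch shows how such preliminary checks could be run in Python with SciPy (Kolmogorov-Smirnov for normality, Levene's test for homogeneity of variance); the data are random placeholders, and the study's actual analysis software is not specified here.

```python
# Hedged sketch of the preliminary checks with SciPy; the arrays are random
# placeholders, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
vive = rng.gamma(2.0, 100.0, size=17)   # e.g., total time durations, Vive group
rift = rng.gamma(2.0, 110.0, size=23)   # e.g., total time durations, Rift group

# Kolmogorov-Smirnov test against a normal distribution fitted to each sample
for name, x in [("Vive", vive), ("Rift", rift)]:
    d, p = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
    print(f"{name}: KS D = {d:.3f}, p = {p:.3f}")   # p < .05 suggests non-normality

# Levene's test for homogeneity of variance between the two groups
print(stats.levene(vive, rift))
```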

The usability survey was tested for reliability using Cronbach's alpha. The Cronbach's alpha reported for the usability survey was .80, which is considered acceptable [15]. As a result, the survey was included for data analysis. In total, there were 17 participants in the Vive condition and 23 participants in the Rift condition.
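
For reference, Cronbach's alpha can be computed directly from the item-response matrix, as in the hedged sketch below; the responses shown are placeholders.

```python
# Sketch of Cronbach's alpha for a k-item survey; responses are placeholders.
import numpy as np

def cronbach_alpha(items):
    """items: n_respondents x k_items matrix of Likert responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
    return (k / (k - 1)) * (1.0 - item_var_sum / total_var)

rng = np.random.default_rng(1)
demo = rng.integers(1, 6, size=(40, 14))             # 40 respondents, 14 items
print(round(cronbach_alpha(demo), 2))
```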

3.2 Inferential Statistics

In terms of RQ1, a statistically significant difference was found between the Vive (Md = 95, n = 17) and Oculus (Md = 90, n = 23) within scenario 3 for per scenario effectiveness. The Mann-Whitney U Test reported U = 127.5, z = −1.194, p = .056, r = −0.302. There were no other reported statistically significant differences between the Vive and Oculus for usability performance.
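
The sketch below illustrates how a Mann-Whitney U test and the effect size r = z / √N could be computed with SciPy, mirroring the statistics reported here; the data are placeholders, not the study's data.

```python
# Sketch of a Mann-Whitney U comparison with effect size r = z / sqrt(N);
# data are placeholders, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
vive_eff = rng.uniform(70, 100, size=17)   # per scenario effectiveness, Vive
rift_eff = rng.uniform(60, 100, size=23)   # per scenario effectiveness, Rift

u, p = stats.mannwhitneyu(vive_eff, rift_eff, alternative="two-sided")
n1, n2 = len(vive_eff), len(rift_eff)
z = (u - n1 * n2 / 2.0) / np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)  # normal approx., no tie correction
r = z / np.sqrt(n1 + n2)                                           # effect size
print(f"U = {u:.1f}, z = {z:.3f}, p = {p:.3f}, r = {r:.3f}")
```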

For RQ2, there was a statistically significant difference between the Vive and Oculus Rift HMD systems on the usability survey. A Mann-Whitney U test showed a significant difference between the Oculus (Md = 4, n = 23) and the Vive (Md = 4.33, n = 17) in the effectiveness subscale, U = 100.5, z = −2.663, p = .008, r = −0.421. Further analysis of the individual effectiveness items also showed statistically significant differences. Regarding the system's ability to support learning real-world skills (i.e., "I could use this device to learn real-world skills"), differences were found between the Vive (Md = 4, SE = .550, n = 17) and Oculus (Md = 4, SE = .935, n = 23), U = 105.5, z = −2.721, p = .007, r = −0.430. Regarding the system's suitability for a range of applications (i.e., "This device would be beneficial for a broad range of applications"), differences were found between the Vive (Md = 4, SE = .550, n = 17) and Oculus (Md = 4, SE = .935, n = 23), U = 119, z = −2.314, p = .021, r = −0.366.

For RQ3, results indicated both positive and negative relationships between the usability survey and the usability performance measures for the Vive and Rift conditions. Table 2 illustrates the Spearman’s rho correlation results for the conditions.
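
A brief sketch of computing one such Spearman's rho correlation within a condition is shown below; the paired measures and values are placeholders.

```python
# Sketch of one Spearman's rho correlation between a survey subscale and a
# performance measure within a condition; values are placeholders.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
comfort = rng.uniform(1, 5, size=17)        # subjective: comfort subscale (Vive)
completion = rng.uniform(0, 100, size=17)   # objective: completion rate (Vive)

rho, p = stats.spearmanr(comfort, completion)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```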

Table 2. Correlations between the subjective and objective usability data.

4 Discussion

4.1 HMD Differences in Usability: Research Questions 1 and 2

The usability results overall favor the Vive. In terms of performance measures, participants in the Vive condition scored significantly higher in the third scenario. The per scenario effectiveness, or how many objects were collected with the Vive, speaks to the fluidity of the device, allowing one to visually search and interact with a scenario effectively. Although simpler tasks (i.e., the first and second scenarios) may not produce performance differences, more complex tasks could benefit from the Vive. Further, users completing complex tasks with the Rift may need performance aids to reach the level of Vive users. Therefore, matching a device to the complexity of a task should be considered.

The significant usability survey results were also grounded in effectiveness. The effectiveness subscale was made up of three statements, two of which were significant when treated independently: "I could use this device to learn real-world skills" dealt with transference, and "This device would be beneficial for a broad range of applications" dealt with generalizability. Based on the generalizability item, it may save time and money to select the Vive if a user intends to run a broad array of applications on one device. Similarly, the Vive was rated as better suited to helping users learn real-world skills. Given how often VR is used for training, the Vive appears to be the preferred system for learning. Further analysis may help determine whether this preference translates into demonstrable learning measures.

Overall, the Vive was better in terms of effectiveness. If a stakeholder were to choose the better, or more usable, device for a locomotion-by-teleportation task, the Vive would be recommended. This recommendation rests on the fact that all other usability aspects were equivalent. That is, if one were seeking the device with the best comfort, one could choose either the Vive or the Rift without any comfort distinction (at least for short sessions). With all other measurements equal, effectiveness matters the most: it held the only distinction between the devices.

Although more research could elaborate on the factors leading to these improved ratings, immediate differences may relate to the controller scheme or the HMD specifications. Given that the scenarios were identical between conditions, this points to the naturalness of the controller interface and the fidelity of the HMD. Since the Vive already appears preferable in usability, a future analysis could examine whether the Vive's effectiveness advantage stems from the controller or from other technical aspects.

4.2 HMD Correlations in Usability: Research Question 3

At a practical level, if one is interested in an HMD's role in facilitating a high completion rate with the Vive, a key indicator of performance may be comfort. Although this is only a correlation, a causal direction running from the system's comfort to performance, rather than the reverse, is plausible.

Total time-based efficiency had many correlations with subjective usability (Table 2). Note that this measure does not merely indicate completion, but completion at a quick rate. This level of efficient performance may be relevant to domains such as first response, bomb disposal, or surgery. At a practical level, one should consider how different layers of usability relate to performance in these domains. All aspects of subjective usability correlated significantly with efficiency in the Vive condition, whereas only two aspects correlated significantly in the Oculus condition. This overall trend shows how different aspects of usability matter with respect to performance tasks.

Two significant correlations suggested a reversal effect: visual quality was negatively related to time duration in the first scenario (with the Oculus), whereas in the second scenario, visual quality was positively related to time duration (with the Vive). In other words, Oculus participants completed the first task more quickly as perceived visual quality increased. This may be due to a minimal environment, with little clutter helping the user learn the initial task; the better the visual quality, the quicker one could perform the new task. By the second scenario (here, with the Vive), the user's attention may have shifted toward the compelling visuals rather than completing the task. The environment may have been a curious distraction, especially since the interface would have been learned by that point. However, the rationale behind the effect being device-specific remains unclear.

5 Limitations

The limitations of this experiment centered on controllers and scenario instructions. The Vive controllers lacked replaceable batteries; when both controllers lost power, participants had to use a controller tethered to the computer. As a result, participant arm movement was restricted. When the Vive and Oculus controllers started to lose power, positional tracking was interrupted, which may have caused participants to accidentally teleport out of bounds. Further, although there were scenario instructions on how to complete the task objective, there were no practice trials to ensure participants had mastered the task skills. Future experiments may consider addressing the aforementioned limitations to improve the experimental design.

6 Conclusion

The present research examined the HTC Vive and Oculus Rift systems from a usability perspective. Overall, the results suggest the Vive was the stronger candidate, at least within the given scenario tasks. Objectively, the most complex task was easier with the Vive. However, more research is needed to determine whether and how the Vive is preferable for different applications (e.g., other locomotion types), and especially for learning tasks, to confirm the subjective results.