Usability Testing: How Many People Do You Need?
If you want a single number, the answer is simple: test 5 users in a usability study. Testing with 5 people lets you find almost as many usability problems as you'd find using many more test participants. This answer has been the same since I started promoting "discount usability engineering," and it doesn't matter whether you test websites, intranets, PC applications, or mobile apps.
With 5 users, you almost always get close to user testing's maximum benefit-cost ratio. There are exceptions, but they shouldn't worry you much: the vast majority of your user research should be qualitative, that is, aimed at collecting insights to drive your design, not numbers to impress people in PowerPoint. The main argument for small tests is simply return on investment: testing costs increase with each additional study participant, yet the number of new findings quickly reaches the point of diminishing returns.
There's little additional benefit to running more than 5 people through the same study; ROI drops like a stone with a bigger N. Sadly, most companies insist on running bigger tests.
During the UX Conference, I surveyed participants about the practices at their companies. The average response was that they used 11 test participants per round of user testing, more than twice the recommended size.
Clearly, I need to better explain the benefits of small-N usability testing. An opinion poll needs the same number of respondents whether it aims to find out who will be elected mayor of Pittsburgh or president of France: the variance in statistical sampling is determined by the sample size, not by the size of the full population from which the sample is drawn. In user testing, we focus on a website's functionality to see which design elements are easy or difficult to use.
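To make the polling point concrete, here is a minimal sketch (my own illustration, not from the original article) showing why sample size, not population size, drives precision. The standard margin of error for an estimated proportion depends only on n:

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion estimated from n respondents."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# The population size never appears in the formula: a 1,000-respondent poll
# is equally precise for a city mayor's race or a national presidential race.
for n in (100, 400, 1000):
    print(f"n = {n:4d}  ->  margin of error = +/-{margin_of_error(0.5, n):.1%}")
```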
The probability of an event is the ratio of the number of cases favorable to it, m, to the number of all possible cases, n, when nothing leads us to expect that any one of these cases should occur more than any other, which renders them, for us, equally possible. Speaking in the frame of a formula based on the binomial distribution, the probability of detecting a problem at least once during a test with n users is:

P = 1 - (1 - p)^n

where p is the probability of detecting the problem with a single user.
Multiply both sides of this formula by the total number of problems N. The left side then becomes the expected number of problems found:

N · P = N(1 - (1 - p)^n)

Now back to the Nielsen formula (Nielsen and Landauer, 1993):

found(n) = N(1 - (1 - λ)^n)

We get two identical formulas; within this frame, both designations are equivalent. As Nielsen's publication explains, Nielsen and Landauer used lambda (λ) rather than p, but the two concepts are essentially equivalent.
Throughout this article, we will use p. Therefore, you can write the formula in the form:

P(n) = 1 - (1 - p)^n

From here on, I will not distinguish between the Lewis-Virzi and Nielsen formulas. In general, this formula does not claim any uniqueness: it would be hard to find even one textbook on probability theory in which this formula is not mentioned in the section on multiplying probabilities.
The probability of the occurrence of at least one of the independent events A1, A2, …, An equals one minus the product of the probabilities of the opposite events:

P = 1 - (1 - p1)(1 - p2)…(1 - pn)

For us, A1, A2, …, An are the events of detecting a problem during the testing of the first, second, …, n-th user. If the events A1, A2, …, An all have the same probability, which means that the probability of detecting a problem remains the same from test to test, then the formula takes the simple form:

P = 1 - (1 - p)^n
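As a quick numeric check of the formula (a sketch of mine; the 31% single-user detection rate is the average Nielsen and Landauer reported for their projects), the diminishing returns behind the ROI argument fall straight out of 1 - (1 - p)^n:

```python
def p_found(p: float, n: int) -> float:
    """Probability that a problem with single-user detection rate p is seen at least once by n users."""
    return 1 - (1 - p) ** n

# With p = 0.31, each extra user adds less than the one before:
# 31%, 52%, 67%, 77%, 84%.
for n in range(1, 6):
    print(f"{n} user(s): {p_found(0.31, n):.0%} of such problems found")
```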
Let's check this with dice, taking p = 1/3: our event A happens when we roll one of 2 of the 6 faces, say a five or a six. To demonstrate testing involving 5 users, we need 5 dice.
Throw the dice. In this throw, event A happened 2 times out of 5. Repeating the experiment, our event occurred in 17 of 20 tests, close to the predicted 1 - (2/3)^5 ≈ 0.87. You can try it yourself. So far, though, we have been talking about one event (one problem), while the 5-users rule makes an assumption about all problems. Now let the event mean that the 5 users were able to find an issue, and assume that in each experiment we can detect any number of problems.
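The dice experiment is easy to reproduce in code. Here is a minimal simulation (my sketch, using p = 1/3 as in the dice example): each "test" rolls five dice and checks whether the event occurred at least once.

```python
import random

def one_test(p: float = 1/3, users: int = 5) -> bool:
    """One usability test: does at least one of `users` trigger the event?"""
    return any(random.random() < p for _ in range(users))

random.seed(1)

# 20 repetitions, as in the dice experiment above...
print(sum(one_test() for _ in range(20)), "of 20 tests saw the event")

# ...and many more, to watch the frequency converge to 1 - (2/3)**5 ~= 0.868.
runs = 100_000
print(f"long-run frequency: {sum(one_test() for _ in range(runs)) / runs:.3f}")
```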
So, if we have 6 problems in the interface, each with a probability of about 1/3 of being found in a single test, then testing 5 users will find about 5 of the 6 interface issues (6 × 0.87 ≈ 5.2). If there are 20 problems, the same applies: throw 20 sets of dice and we will find about 17 problems. But in reality, all problems are different. This means that the probability of detecting each problem in a single test may differ. If, for example, we have 6 problems in the interface, then in the general case we will have 6 formulas, one per problem:

Pi = 1 - (1 - pi)^n,  i = 1, …, 6
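Summing those six formulas gives the expected number of problems found. A short sketch (the per-problem detection rates below are hypothetical, chosen only to illustrate the spread):

```python
def expected_found(rates: list[float], n: int) -> float:
    """Expected number of problems found by n users, one single-test detection rate per problem."""
    return sum(1 - (1 - p) ** n for p in rates)

# Hypothetical rates: two easy-to-spot problems, two medium, two hard.
rates = [0.60, 0.50, 0.30, 0.25, 0.10, 0.05]
for n in (5, 10, 15):
    print(f"{n:2d} users -> expect {expected_found(rates, n):.1f} of {len(rates)} problems")
```

Note how the hard-to-spot problems (p = 0.10 and 0.05) are exactly the ones a 5-user test is likely to miss.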
Small iterative tests depend on two commitments: incorporating fixes as soon as possible, even if after just one participant; and running follow-up tests with as many participants as needed to confirm the fixes.

Found a problem! So is it real?

Medium and large tests: Risk management

Small iterative usability tests reduce certain kinds of risks: timeliness of feedback, impact of feedback, and time to market.
Small, medium, large: What size of test fits you?

Assess these three factors to determine whether five participants is enough for you:

1. Are problems hard to find? Are the participants expert, the usability engineer less familiar with the domain, or the system complex and flexible?
2. Is your organization equipped to do iterative small-n tests? Are you set up for ongoing recruitment-and-test operations, committed to iteration, and able to deliver software changes rapidly?
3. What are the safety and business risks of missing uncommon problems in your design?

References

Faulkner, L. (2003). Beyond the five-user assumption: Benefits of increased sample sizes in usability testing. Behavior Research Methods, Instruments, & Computers, 35(3).

FDA, June.

Medlock, M., Wixon, D., McGee, M., & Welsh, D. (2005). The rapid iterative test and evaluation method: Better products in less time. In Bias, R., & Mayhew, D. (Eds.), Cost-Justifying Usability: An Update for the Internet Age. Elsevier: Amsterdam, Netherlands.

Nielsen, J. (2012). How many test users in a usability study?

Nielsen, J., & Landauer, T. K. (1993). A mathematical model of the finding of usability problems. Proceedings of ACM INTERCHI'93.

Perfetti, C., & Landesman, L. (2001). Eight is not enough.

Sauro, J., & Lewis, J. R. (2012). Quantifying the User Experience: Practical Statistics for User Research. Waltham, MA: Morgan Kaufmann.

Schrag, J. Using formative usability testing as a fast UI design tool.

Spool, J., & Schroeder, W. (2001). Testing web sites: Five users is nowhere near enough.

Virzi, R. (1992). Refining the test phase of usability evaluation: How many subjects is enough? Human Factors, 34(4), 457–468.

Wiklund, M. Usability testing: Validating user interface design.

Message from the CEO, Dr. Eric Schaffer, The Pragmatic Ergonomist

I really think the idea that a 5-person test is appropriate has substantially hurt the industry.
Two important issues for problem-discovery studies are that it can be hard both to define a problem and to rank the importance of discovered problems.
According to D. Caulton, problems are often a factor of the interaction between a user and the product, not necessarily a static feature of a user interface. Not only might a specific problem exist for just some participants; it might exist for a single participant on one day, but not on another. Therefore, it can be very difficult to agree on what actually constitutes a problem, and the ranking of problems is highly subjective. When setting up a usability study, you need to consider both the average percentage of problems you hope to find and the minimal level of problem discovery that is acceptable.
As Table 1 shows, going from 5 to 10 participants greatly increases the expected level of problem discovery, but going from 15 to 20 participants has far less impact. These numbers are very important to understand.
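The pattern behind those numbers can be recomputed from the formula above (a sketch of mine; the detection rates are illustrative, with 0.31 being Nielsen and Landauer's reported average and 0.10 standing in for harder-to-find problems):

```python
def discovery(p: float, n: int) -> float:
    """Expected share of problems with single-test detection rate p found by n participants."""
    return 1 - (1 - p) ** n

# Going 5 -> 10 participants adds ~13 points of coverage at p = 0.31;
# going 15 -> 20 adds a fraction of a point. Only rare problems keep paying off.
for n in (5, 10, 15, 20):
    print(f"{n:2d} participants: {discovery(0.31, n):5.1%} (p=0.31)   {discovery(0.10, n):5.1%} (p=0.10)")
```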
We can have fewer participants in a study if undiscovered problems would have a small impact on users, but we must have more participants if the stakes are high—for example, in life- or enterprise-critical situations. We can also have fewer participants if there will be opportunities to find important problems during a later round of testing.
Another important consideration is the complexity of the study itself. Nielsen is often criticized for citing research from simple studies with well-defined tasks. Clearly, the more complicated the tasks, the more complex the study should be, and the more participants may be necessary.
In such a case, it is also necessary to consider training issues. If the entire target user population will receive exactly the same training, this effectively reduces the study complexity, so you can use fewer participants. Studies to evaluate a prototype of a novel user-interface design often concern the discovery of severe show-stopper problems.
Testing usually reveals such severe errors relatively quickly, so these tests often require fewer participants. So, contrary to popular thinking, there is no one-size-fits-all solution for problem-discovery studies. Context and complexity have a big impact on the appropriate number of participants necessary for such a study. This reality is probably a factor in the diversity of widely cited advice. For example, both Virzi and Perfetti and Landesman found that the appropriate number of participants for many studies ranges between three and twenty.
Turner believes seven participants may be ideal, even for complex studies. In summary, research suggests that from three to twenty participants can provide valid results, and a good baseline is between five and ten participants.
In general, there should be more participants for more complex, highly critical projects, while fewer participants are necessary when testing more novel designs. Comparative studies are a different case: often, a new design B is tested against the currently implemented design A. Comparative studies generally use usability metrics such as task-completion rate and time on task.
These metrics are highly objective, so usability professionals often want to present statistically significant results from such studies. These results can be compelling, but they have some common pitfalls. To reject the null hypothesis, we must be reasonably sure that the different groups of participants are equally skillful and equally representative of the target audience; that is, that design B did not get a better result simply because group B was more skillful.
This can be difficult to determine, so it is a reason for caution when quoting such results.
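For illustration, here is a minimal sketch of the kind of significance test often applied to comparative task-completion rates (a two-proportion z-test; the completion counts are hypothetical). Note that a small p-value only addresses sampling noise; it cannot rule out the skill imbalance between groups discussed above:

```python
import math

def two_proportion_p_value(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """Two-sided p-value for H0: designs A and B have equal task-completion rates."""
    rate_a, rate_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return math.erfc(abs(z) / math.sqrt(2))  # normal approximation

# Hypothetical study: design A completed by 14 of 20 participants, design B by 18 of 20.
print(f"p-value = {two_proportion_p_value(14, 20, 18, 20):.3f}")  # ~0.11: not significant at 0.05
```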