The Timing Mega-study: Comparing a Range of Experiment Generators, Both Lab-based and Online
Many researchers in the behavioral sciences depend on research software that presents stimuli, and records response times, with sub-millisecond precision. There are a large number of software packages with which to conduct these behavioral experiments and measure response times and performance of participants. Very little information is available, however, on what timing performance they achieve in practice. Here we report a wide-ranging study looking at the precision and accuracy of visual and auditory stimulus timing and response times, measured with a Black Box Toolkit. We compared a range of popular packages: PsychoPy, E-Prime®, NBS Presentation®, Psychophysics Toolbox, OpenSesame, Expyriment, Gorilla, jsPsych, Lab.js and Testable. Where possible, the packages were tested on Windows, macOS, and Ubuntu, and in a range of browsers for the online studies, to try to identify common patterns in performance. Among the lab-based packages, Psychtoolbox, PsychoPy, Presentation and E-Prime provided the best timing, all with mean precision under 1 millisecond across the visual, audio and response measures. OpenSesame had slightly less precision across the board, most notably in audio stimuli, and Expyriment had rather poor precision. Across operating systems, the pattern was that precision was generally very slightly better under Ubuntu than Windows, and that macOS was the worst, at least for visual stimuli, for all packages. Online packages did not deliver the same level of precision as lab-based systems, with slightly more variability in all measurements. That said, PsychoPy and Gorilla, broadly the best performers, were achieving very close to millisecond precision on several browser/operating system combinations. For response times (measured using a high-performance button box), most of the packages achieved precision at least under 10 ms in all browsers, with PsychoPy achieving a precision under 3.5 ms in all.
There was considerable variability between OS/browser combinations, especially in audio-visual synchrony, which is the least precise aspect of the browser-based experiments. Nonetheless, the data indicate that online methods can be suitable for a wide range of studies, with due thought about the sources of variability that result. The results, from over 110,000 trials, highlight the wide range of timing qualities that can occur even in software packages dedicated to the task. We stress the importance of scientists making their own timing validation measurements for their own stimuli and computer configuration.
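As an illustration of what such a validation measurement might be reduced to, the following is a minimal sketch in Python. It assumes that accuracy is summarized as the mean difference between measured and intended timings and that precision is summarized as the standard deviation of those differences; the abstract does not state the exact definitions used in the study. The function name `summarize_timing` and the example values are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def summarize_timing(intended_ms, measured_ms):
    """Summarize timing errors for a set of validation trials.

    Assumption (not taken from the study): accuracy is the mean error and
    precision is the sample standard deviation of the error, both in ms.
    """
    if len(intended_ms) != len(measured_ms):
        raise ValueError("intended_ms and measured_ms must have the same length")
    if len(intended_ms) < 2:
        raise ValueError("at least two trials are needed to estimate variability")

    # Per-trial error: positive values mean the event occurred later than intended.
    errors = [m - i for i, m in zip(intended_ms, measured_ms)]

    return {
        "n_trials": len(errors),
        "accuracy_ms": mean(errors),    # constant lag or lead
        "precision_ms": stdev(errors),  # trial-to-trial variability
    }


# Hypothetical example: a 200 ms stimulus measured on four trials.
intended = [200.0, 200.0, 200.0, 200.0]
measured = [201.0, 203.0, 201.0, 203.0]
summary = summarize_timing(intended, measured)
print(summary)
```

Under these assumptions, a constant lag shows up only in `accuracy_ms` and can in principle be corrected after the fact, whereas a large `precision_ms` reflects trial-to-trial variability that cannot.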