Abundant prior research has compared the effects of physical and virtual manipulatives on students’ conceptual learning. However, most of this research has been based on conceptual salience theory; that is, it has explained mode effects by the manipulative’s capability to draw students’ attention to conceptually relevant (visual or haptic) features. Yet research based on embodied schema theory suggests that other mechanisms, which do not rely on students’ explicit attention to specific features, also affect students’ learning from manipulatives. This paper presents a study that contrasts the predictions of these two theoretical perspectives by comparing multiple versions of physical and virtual manipulatives. Specifically, we conducted a lab experiment with 119 undergraduate students who learned about 3 concepts related to atomic structure using 1 of 4 versions of energy diagram manipulatives. The 4 versions varied the representation mode (i.e., physical vs. virtual) and the actions students used to manipulate the representation (i.e., actions that draw attention to relevant features vs. actions that activate embodied schemas). We assessed students’ learning via reproduction and transfer posttests and via interviews that measured the quality of students’ explanations and the gestures they used while explaining the concepts. Our results suggest that embodied schema theory accounts for effects on the reproduction posttest, whereas conceptual salience theory accounts for effects on the transfer posttest. Further, when physical manipulatives offered relevant haptic cues, we found an advantage of physical manipulatives on transfer. We interpret these results in terms of the complexity of the embodied schema and conceptual salience learning mechanisms and the complexity of the assessment tasks. (PsycInfo Database Record (c) 2021 APA, all rights reserved)