Producing output has been claimed to be crucial for developing language proficiency; however, the effects of output modality on the learning process and on learning outcomes have scarcely been investigated together and thus require further research. This action research examined the effects of task modality on language-related episode (LRE) type, LRE resolution, and the degree of gain in form knowledge. Two groups of Japanese secondary school students performed word- and picture-cued sentence reconstruction tasks that required the use of the passive form, conducted either in speaking or in speaking + writing. Task interactions were analysed in terms of LREs, and knowledge of the passive form was measured using error correction tests and written and oral storytelling tests. The results showed that the speaking + writing tasks elicited more LREs, and that lexis-based LREs were produced more often than the other types of LREs in both groups. Moreover, nearly 80% of all LREs were correctly resolved. As for knowledge of the passive form, the speaking group outperformed the speaking + writing group on the error correction test in the delayed post-test and on the written storytelling test; however, these differences had small effect sizes. This study provides practical suggestions for choosing a task modality according to the aim of task-based learning.