These triangles typically cover only a couple of tiles, so processing them involves deciding whether to render them & dispatching to the appropriate tiles, one at a time.
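To make that concrete, here's a rough sketch of the cull-then-dispatch step. Everything in it is an assumption for illustration: the 16-pixel `TILE` size, the `tiles_for_triangle` name, and the use of a clipped bounding box to pick tiles (which can over-include a tile the triangle doesn't actually touch).

```python
TILE = 16  # assumed tile size in pixels; purely illustrative


def tiles_for_triangle(tri, width, height, tile=TILE):
    """Return the (tx, ty) tiles overlapped by a triangle's bounding box.

    `tri` is three (x, y) screen-space points. An empty list means the
    triangle is culled and needs no further processing.
    """
    (x0, y0), (x1, y1), (x2, y2) = tri

    # Zero-area (degenerate) triangles produce no pixels: cull them.
    area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if area2 == 0:
        return []

    # Clip the bounding box to the screen.
    min_x = max(min(x0, x1, x2), 0)
    max_x = min(max(x0, x1, x2), width - 1)
    min_y = max(min(y0, y1, y2), 0)
    max_y = min(max(y0, y1, y2), height - 1)
    if min_x > max_x or min_y > max_y:
        return []  # entirely off-screen: cull

    # Convert the pixel range to a tile range (values are >= 0 here).
    tx0, tx1 = int(min_x) // tile, int(max_x) // tile
    ty0, ty1 = int(min_y) // tile, int(max_y) // tile
    return [(tx, ty)
            for ty in range(ty0, ty1 + 1)
            for tx in range(tx0, tx1 + 1)]
```

A small triangle straddling a tile corner, e.g. `tiles_for_triangle([(10, 10), (20, 10), (10, 20)], 64, 64)`, lands in four tiles, which matches the "only a couple of tiles" case.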
After completing interpolation, possibly subsampling the registered texture (if any), & maybe running a fragment shader, we need to merge the results.
An output-tile cache could significantly reduce reads/writes. Clustering each tile's processing could help, as could loosely sorting the tile processing order.
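A toy model of why ordering matters for such a cache. This is only a sketch under assumptions: the `TileCache` class, its LRU eviction policy, and the two-tile capacity are all made up for illustration, and a full `sorted()` stands in for whatever looser ordering would be used in practice.

```python
from collections import OrderedDict


class TileCache:
    """Hypothetical LRU cache of output tiles that counts memory traffic."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tiles = OrderedDict()  # tile id -> tile contents (placeholder)
        self.loads = 0              # tiles read in from memory
        self.writebacks = 0         # tiles written back on eviction

    def touch(self, tile_id):
        if tile_id in self.tiles:
            # Hit: mark as most recently used, no memory traffic.
            self.tiles.move_to_end(tile_id)
            return
        if len(self.tiles) >= self.capacity:
            # Miss with a full cache: evict the least recently used tile.
            self.tiles.popitem(last=False)
            self.writebacks += 1
        self.tiles[tile_id] = None
        self.loads += 1


def count_loads(order, capacity=2):
    """Run a sequence of tile ids through the cache and count loads."""
    cache = TileCache(capacity)
    for tile_id in order:
        cache.touch(tile_id)
    return cache.loads


# Three tiles, each touched three times, interleaved vs. clustered.
work = [0, 5, 9, 0, 5, 9, 0, 5, 9]
interleaved_loads = count_loads(work)        # every access misses
clustered_loads = count_loads(sorted(work))  # one load per tile
```

With the interleaved order every access evicts a tile that is about to be needed again; once work for the same tile is clustered, each tile is loaded only once.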
4/4 Fin for now!