OneTwoVLA and pi

The reasoning step matters. Incorporating the reasoning trace into a small model, and training it together with the DiT policy, can yield both planning ability and action ability. One reason is that a small model (around 3B parameters) cannot plan very well on its own, so co-training makes sense here. Things may be different for a model that starts from a larger prior. Either way, incorporating the reasoning trace is what matters.
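
To make the co-training idea concrete, here is a minimal sketch of a joint objective, assuming the reasoning trace is supervised with token-level cross-entropy and the DiT policy with a mean-squared denoising error. The function names, the `action_weight` parameter, and the choice of these two losses are illustrative assumptions, not the actual recipe of either system discussed in this note.

```python
import math


def token_cross_entropy(logits, targets):
    """Mean cross-entropy over reasoning-trace tokens.

    logits: list of per-token score lists (one list per position).
    targets: list of integer token ids, same length as logits.
    """
    total = 0.0
    for scores, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the vocabulary.
        peak = max(scores)
        log_z = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_z - scores[target]
    return total / len(targets)


def action_mse(predicted, target):
    """Mean squared error, used here as a stand-in for a DiT denoising loss."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def cotraining_loss(logits, trace_ids, predicted_noise, true_noise, action_weight=1.0):
    """Hypothetical joint loss: reasoning-trace term plus weighted action term."""
    reasoning_term = token_cross_entropy(logits, trace_ids)
    action_term = action_mse(predicted_noise, true_noise)
    return reasoning_term + action_weight * action_term
```

Because both terms are summed into one objective, gradients from the reasoning trace and from the action head would reach the same shared backbone, which is the sense in which a small model could pick up planning and acting together. How to set `action_weight` is left open here.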