Optimizer parallelism, also known as the Zero Redundancy Optimizer (ZeRO) [37], partitions optimizer states, gradients, and parameters across devices to reduce memory consumption while keeping communication costs as low as possible. Hence, the architectural details are the same as those of the baselines. Also, optim