XGBoost uses additive training: each new tree is fit to the residual (more precisely, the gradient of the loss) left over by the ensemble built so far.
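In symbols (this is the standard gradient-boosting update, notation mine rather than from this thread): the prediction after round t is

    \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)

and for squared-error loss, fitting the new tree f_t to the negative gradient is exactly fitting it to the residuals y_i - \hat{y}_i^{(t-1)}.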
This is sequential though, so how does it do parallel computing?
XGBoost doesn't run multiple trees in parallel, as you noted: you need the predictions after each tree to update the gradients before the next one can be fit.
Rather, it does the parallelization WITHIN a single tree, using OpenMP to evaluate candidate splits across features in parallel during the split search.
To observe this, build a giant dataset and train a single boosting round (num_boost_round=1 in xgb.train, or n_estimators=1 in the sklearn wrapper). You will see all your cores firing on one tree. This is part of why it's so fast: it's well engineered.
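A minimal sketch of that experiment, assuming the xgboost and numpy packages are installed (dataset size and parameters are arbitrary, chosen just to make the single-tree build slow enough to watch in top/htop):

    import numpy as np
    import xgboost as xgb

    # Large synthetic dataset so one tree takes a visible amount of time.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1_000_000, 50))
    y = (X[:, 0] + rng.standard_normal(1_000_000) > 0).astype(int)

    dtrain = xgb.DMatrix(X, label=y)

    # One boosting round = one tree; nthread sets the OpenMP thread pool.
    params = {"objective": "binary:logistic", "max_depth": 8, "nthread": 8}
    bst = xgb.train(params, dtrain, num_boost_round=1)

While xgb.train runs, a process monitor should show all 8 threads busy even though only a single tree is being built.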