The "procedural tree structure" in my code is literally for trees, the kind with leaves and acorns and such. It's just an example of a complex CPU-side algorithm that runs every frame.
The draw call itself will be very fast in the grand scheme of things. It just adds some data to a list in the driver. It doesn't represent the rendering time, and rendering generally hasn't started by the time the call returns. To see how long the drawing actually takes, you have to use GPU timer queries, like http://www.lighthouse3d.com/tutorials/opengl-timer-query/ describes.
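To make the "it just adds some data to a list" point concrete, here is a toy model in Python. It is not real driver code, and CommandList, record_draw and gpu_execute are hypothetical names made up for the sketch. Recording a draw costs the same no matter how big the mesh is; the real work only happens later, when the simulated GPU works through the list. That's why timing the draw call on the CPU tells you almost nothing about rendering time.

```python
from dataclasses import dataclass, field

# Toy model only: a pretend "driver" that records draw calls and runs them later.
# CommandList, record_draw and gpu_execute are hypothetical, not a real API.


@dataclass
class CommandList:
    commands: list = field(default_factory=list)  # recorded vertex counts
    executed: int = 0            # commands the "GPU" has actually run
    vertices_processed: int = 0  # stand-in for real rendering work

    def record_draw(self, vertex_count: int) -> None:
        # The "draw call": append a tiny record and return immediately.
        # Same cost whether the mesh has 36 or 3,000,000 vertices.
        self.commands.append(vertex_count)

    def gpu_execute(self) -> None:
        # The "GPU": later on, work through everything recorded so far.
        # This is where the real rendering time would be spent.
        while self.executed < len(self.commands):
            self.vertices_processed += self.commands[self.executed]
            self.executed += 1


if __name__ == "__main__":
    cl = CommandList()
    cl.record_draw(3_000_000)  # a big tree mesh
    cl.record_draw(36)         # a tiny cube
    print(cl.executed)         # nothing has been "rendered" yet
    cl.gpu_execute()
    print(cl.executed, cl.vertices_processed)
```

In real OpenGL, the GPU-side time is what a query like glBeginQuery(GL_TIME_ELAPSED, id) / glEndQuery(GL_TIME_ELAPSED) measures. You read the result back with glGetQueryObjectui64v, preferably a frame or two later (or after checking GL_QUERY_RESULT_AVAILABLE) so the readback itself doesn't stall.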
Basic rendering has roughly three steps: update a bunch of buffers, bind the shader, and issue draw commands. Binding the shader and issuing the draw commands are asynchronous: they merely add entries to an internal command buffer in the driver (in Vulkan, you record that command buffer yourself).

Updating buffers is the tricky part. You can't just write directly into GPU memory whenever you like. The driver has to coordinate the writes so that they don't conflict with ongoing asynchronous jobs, such as the previous frame, which may still be in progress on the GPU. If you try to update a buffer that is still being used by another render job (such as the previous frame), the driver may have to stall your CPU thread and wait until the GPU is finished with the buffer before doing the copy. This is called a pipeline stall, and it's only one of many ways to cause one.

As such, any buffer that is updated every frame should cycle through three different GPU buffers so as not to stall rendering. Yes, three: you can realistically expect to have three frames in flight, at various stages of progress, at the same time. You can also partition one buffer into three sections and render from an offset, as long as you use the correct flags when creating the buffer and the correct functions for updating a region of it.
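Here is a rough simulation of the "one buffer, three sections" idea, again in Python with no real GPU involved (FakeGpu, RingBuffer and simulate are hypothetical names). Each frame writes into section frame % sections and draws from that byte offset. Each section remembers the last frame that read it, a stand-in for a fence. If the GPU hasn't finished that frame yet, the write has to wait, which is the stall. gpu_lag=3 means the GPU has only finished work up to three frames ago. In OpenGL 4.4+, one common way to do this for real (my assumption about what "the correct flags" means, not the only option) is glBufferStorage with GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT, a single glMapBufferRange, and a glFenceSync per frame that you wait on with glClientWaitSync before reusing that section.

```python
from dataclasses import dataclass, field

# Simulation only: no real GPU or OpenGL calls. FakeGpu, RingBuffer and
# simulate() are hypothetical names made up for this sketch.


@dataclass
class FakeGpu:
    completed_frame: int = -1  # highest frame the "GPU" has fully finished

    def finish_up_to(self, frame: int) -> None:
        self.completed_frame = max(self.completed_frame, frame)


@dataclass
class RingBuffer:
    """One buffer split into `sections` equal regions, one per frame in flight."""
    section_size: int
    sections: int
    gpu: FakeGpu
    data: bytearray = field(init=False)
    fence: list = field(init=False)  # fence[i] = last frame whose draws read section i
    stalls: int = 0

    def __post_init__(self) -> None:
        self.data = bytearray(self.section_size * self.sections)
        self.fence = [-1] * self.sections

    def write_frame(self, frame: int, payload: bytes) -> int:
        """Copy this frame's data into its section; return the byte offset to draw from."""
        if len(payload) > self.section_size:
            raise ValueError("payload does not fit in one section")
        section = frame % self.sections
        if self.fence[section] > self.gpu.completed_frame:
            # The GPU may still be reading this region. A real renderer would
            # block on that frame's fence here: this is the pipeline stall.
            self.stalls += 1
            self.gpu.finish_up_to(self.fence[section])
        offset = section * self.section_size
        self.data[offset:offset + len(payload)] = payload
        self.fence[section] = frame  # this frame's draws will read the section
        return offset


def simulate(sections: int, gpu_lag: int, frames: int = 100) -> int:
    """Count stalls when the GPU has only finished frames up to `gpu_lag` frames ago."""
    gpu = FakeGpu()
    ring = RingBuffer(section_size=256, sections=sections, gpu=gpu)
    for frame in range(frames):
        gpu.finish_up_to(frame - gpu_lag)
        ring.write_frame(frame, b"per-frame uniforms")
    return ring.stalls


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} section(s): {simulate(sections=n, gpu_lag=3)} stalls")
```

With the GPU three frames behind, one or two sections stall on nearly every frame, while three sections never stall. The lag is an assumption; if your driver queues fewer frames, fewer sections may be enough.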