Isn’t this more or less what they’re doing now? The difference is that the ads come from a different server and have an overlay on top with a timer and a skip button. As long as the ads come from a different server, they are detectable, and as long as they have overlays, they are detectable too. To avoid that, they would need to serve the ads from the same server that serves the video and get rid of the overlays.
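To make the detection point concrete, here is a rough sketch of the kind of check a blocker could do. The host name, the function name and the `has_overlay` flag are all made up for illustration; this is not how any particular blocker actually works.

```python
from urllib.parse import urlparse

# Hypothetical host name, used only for this illustration.
VIDEO_HOST = "video.example.com"


def looks_like_ad(segment_url: str, has_overlay: bool) -> bool:
    """Flag a segment that comes from another host or carries an overlay."""
    host = urlparse(segment_url).hostname
    # Either signal alone is enough to tell the ad apart from the video.
    return host != VIDEO_HOST or has_overlay
```

If the ad came from the same host and had no overlay, both signals would be gone and this kind of check would have nothing to work with.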
The reason they are not doing it is that the ads are personalized. If they wanted to bake an ad into a video, they would end up with countless copies of each video, each one with its own unique ads, which is not viable logistically. So they can only do it on the fly. But re-encoding each video on the fly for each user is also a logistical nightmare, if not outright impossible.
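A back-of-the-envelope way to see why pre-baked copies blow up, assuming every ad slot in a video can independently hold any ad. The function name and the numbers in the usage line are invented purely to show the growth, not real figures from any service.

```python
def baked_copies(videos: int, ad_variants: int, slots_per_video: int = 1) -> int:
    """Count the pre-encoded files needed if every ad combination is baked in."""
    # Each slot can hold any ad, so combinations multiply per slot.
    return videos * ad_variants ** slots_per_video


# Invented example: 1,000 videos, 50 possible ads, 2 ad slots per video.
total = baked_copies(1_000, 50, 2)
```

Even with these small made-up inputs the count is in the millions, and it grows exponentially with every extra ad slot.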
Or get an open-source, free and privacy-friendly one from F-Droid, in case you haven’t tied your hands with an iPhone.