We could follow the fashion of solution 12.1 and 12.2, which is when square < x, assign mid to res and do left = mid + 1. The readability is higher as in every iteration we are trying to keep the local optimum and the rest is just the classic binary search, v.s. the current solution which has special logic with the while condition, mid inclusion and right*right check at the end. Thoughts?
btw if you run the code on Windows, where int = long, it could overflow. Use double for mid and square_m instead.