Windows does indeed use BGR as the default byte ordering for bitmap pixel data. If you're using the RGB(), GetRValue(), GetGValue(), and GetBValue() macros, then it's a non-issue from a programming standpoint, as long as you only pass the result to GDI calls: those macros work on COLORREF values, which are laid out as 0x00BBGGRR, so writing one straight into a DIB's pixel memory swaps red and blue. If you care about alpha, there are analogous macros for working with RGBA and alpha values (I think they were defined in the DirectX SDK). Pixels with alpha are usually stored as BGRA, but some accelerators might use ABGR or another format; sticking with DIB sections will ensure BGRA, though.
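To make the two layouts concrete, here is a small portable sketch in plain C (no windows.h). The function names pack_colorref, pack_bgra, and colorref_to_bgra are made up for illustration and are not Windows APIs; the code also assumes a little-endian machine, which is what makes 0xAARRGGBB land in memory as B, G, R, A.

```c
#include <stdint.h>

/* Same bit layout the RGB() macro produces for a COLORREF: 0x00BBGGRR. */
static uint32_t pack_colorref(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint32_t)r | ((uint32_t)g << 8) | ((uint32_t)b << 16);
}

/* 32-bit DIB pixel: 0xAARRGGBB as an integer, which is stored as the
   bytes B, G, R, A on a little-endian machine (assumed here). */
static uint32_t pack_bgra(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint32_t)b | ((uint32_t)g << 8) |
           ((uint32_t)r << 16) | ((uint32_t)a << 24);
}

/* Convert a COLORREF-style value to a DIB pixel by swapping the red
   and blue bytes and putting the given alpha in the top byte. */
static uint32_t colorref_to_bgra(uint32_t c, uint8_t a)
{
    return ((c & 0x000000FFu) << 16) |   /* red moves to bits 16..23 */
            (c & 0x0000FF00u)        |   /* green stays in place     */
           ((c & 0x00FF0000u) >> 16) |   /* blue moves to bits 0..7  */
           ((uint32_t)a << 24);
}
```

So a value built the RGB() way needs something like colorref_to_bgra() before it goes into the DIB's pixel buffer; values handed to GDI functions can stay as they are.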

Depending on the processor, it may or may not be faster to make an array of pointers to the first pixel of each line and then address pixels as framebuf[y][x], rather than computing y * width + x every time. For block clears, consider filling 32 bits at a time instead of going pixel by pixel; note that the standard memset() writes a single byte value, so on its own it only works for colors whose four bytes are identical, such as all-zero black. If you know your instruction set, you can get even more performance by using MMX, SSE, or AVX instructions. If you're looking at that level of optimization, though, you probably ought to switch algorithms...
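A rough sketch of the row-pointer table and a 32-bit fill, again in portable C. The names make_row_table and fill32 are hypothetical helpers, and the code assumes 32-bit pixels in a top-down buffer whose row stride is given in pixels; a bottom-up DIB (positive biHeight) would need the rows assigned in reverse order.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Build a table of pointers to the first pixel of each row, so pixels
   can be addressed as rows[y][x]. stride_px is the distance between
   rows in pixels (assumed; for 32-bit pixels this is usually the width).
   Returns NULL if allocation fails; the caller frees the table. */
static uint32_t **make_row_table(uint32_t *pixels, size_t height,
                                 size_t stride_px)
{
    uint32_t **rows = malloc(height * sizeof *rows);
    if (rows == NULL)
        return NULL;
    for (size_t y = 0; y < height; ++y)
        rows[y] = pixels + y * stride_px;
    return rows;
}

/* Fill count pixels with one 32-bit value. Unlike memset(), this works
   for any color, not only ones whose four bytes are equal. */
static void fill32(uint32_t *dst, size_t count, uint32_t value)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = value;
}
```

Compilers will often vectorize a loop like fill32() on their own, so it's worth measuring before reaching for hand-written SSE or AVX.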