One of the more famous snippets of assembly is from Michael Abrash's texture mapping loop (expained in detail here):
add edx,[DeltaVFrac] ; add in dVFrac
sbb ebp,ebp ; store carry
mov [edi],al ; write pixel n
mov al,[esi] ; fetch pixel n+1
add ecx,ebx ; add in dUFrac
adc esi,[4*ebp + UVStepVCarry]; add in steps
Nowadays most compilers express advanced CPU specific instructions as intrinsics, i.e., functions that get compiled down to the actual instruction. MS Visual C++ supports intrinsics for MMX, SSE, SSE2, SSE3, and SSE4, so you have to worry less about dropping down to assembly to take advantage of platform specific instructions. Visual C++ can also take advantage of the actual architecture you are targetting with the appropriate /ARCH setting.

