
* Implement matrix multiplication
* 1 warp independence
* automatic warp detection
* finish Python compiler
* print buffers
* Support for compressed