

* implement matrix multiplication
* print buffers
* 1 warp independence
* finish Python compiler
* change from dynamic to static