Forge MCP Server - Turn PyTorch into fast CUDA/Triton kernels on real | MCP Marketplace