Performance/Accuracy Trade-offs of Floating-point Arithmetic on Nvidia GPUs: From a Characterization to an Auto-tuner