The 4080 is a good 10x slower than the V100 in double precision, so this doesn't surprise me. GeForce cards run FP64 at 1/64 of their FP32 rate, versus 1/2 on the V100, because they are designed for graphics rather than HPC. If you're sure that the 3060 also massively outperforms it, then you might have found something; share some code and we can take a look.
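To see where the "good 10x" comes from, here is a rough back-of-envelope sketch of theoretical peak throughput. It assumes one FMA (2 FLOPs) per FP32 core per clock at the boost clock, and the core counts and clocks below are approximate spec-sheet figures assumed for illustration. The `Gpu` class is a hypothetical helper, not part of any library. Real kernels rarely reach these peaks, so treat the numbers as upper bounds.

```python
from dataclasses import dataclass


# Hypothetical helper; spec figures passed in below are approximate, assumed values.
@dataclass(frozen=True)
class Gpu:
    name: str
    fp32_cores: int      # FP32 "CUDA cores"
    boost_ghz: float     # boost clock in GHz
    fp32_per_fp64: int   # FP32:FP64 throughput ratio (2 on V100, 64 on GeForce)

    def peak_fp32_tflops(self) -> float:
        # cores * 2 FLOPs per FMA * clock (GHz) gives GFLOPS; divide by 1000 for TFLOPS
        return self.fp32_cores * 2 * self.boost_ghz / 1000

    def peak_fp64_tflops(self) -> float:
        # FP64 peak is the FP32 peak scaled down by the architecture's ratio
        return self.peak_fp32_tflops() / self.fp32_per_fp64


V100 = Gpu("Tesla V100 (SXM2)", 5120, 1.530, 2)
RTX_4080 = Gpu("GeForce RTX 4080", 9728, 2.505, 64)
RTX_3060 = Gpu("GeForce RTX 3060", 3584, 1.777, 64)

for gpu in (V100, RTX_4080, RTX_3060):
    print(f"{gpu.name:<20} FP32 {gpu.peak_fp32_tflops():6.2f} TFLOPS  "
          f"FP64 {gpu.peak_fp64_tflops():5.2f} TFLOPS")

ratio = V100.peak_fp64_tflops() / RTX_4080.peak_fp64_tflops()
print(f"V100 / RTX 4080 FP64 ratio: {ratio:.1f}x")
```

Under these assumptions the V100 comes out at about 7.8 TFLOPS FP64 against roughly 0.76 for the 4080, which is where the ~10x gap comes from. The 3060 has the same 1/64 ratio but far fewer cores, so on paper it should trail the 4080 in FP64. That is why a 3060 that really does beat it would point to something other than raw throughput.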
The 4080 is fully supported by CUDA 11.8. Optimizations specific to Ada hadn't been completed when that toolkit was released, so it may be executing somewhat sub-optimal code, but that usually means the card falls short of its full potential rather than significantly under-performing.