Update ssim(): add ssim_optimized(), an optimized version of ssim() that reduces computational complexity. With this change, create_window() only needs to be called once, in the main function of train.py, instead of inside ssim() on every iteration.
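The idea of building the window once and reusing it can be sketched as follows. This is a minimal pure-Python illustration, not the repository's tensor-based implementation: the `gaussian` and `weighted_mean` helpers, the single-argument `create_window` signature, the window size of 11, and the sigma of 1.5 are all assumptions made for the example.

```python
import math

def gaussian(window_size, sigma=1.5):
    # 1D Gaussian weights, normalized to sum to 1 (sigma is an assumed value).
    center = window_size // 2
    weights = [math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
               for i in range(window_size)]
    total = sum(weights)
    return [w / total for w in weights]

def create_window(window_size):
    # 2D window as the outer product of the 1D Gaussian with itself.
    g = gaussian(window_size)
    return [[a * b for b in g] for a in g]

def weighted_mean(patch, window):
    # Hypothetical stand-in for the filtering step of an SSIM computation:
    # it takes the precomputed window as an argument instead of rebuilding it.
    return sum(p * w
               for patch_row, window_row in zip(patch, window)
               for p, w in zip(patch_row, window_row))

# Build the window once, before the loop.
window = create_window(11)

# Reuse the same window on every iteration.
patch = [[0.5] * 11 for _ in range(11)]
means = [weighted_mean(patch, window) for _ in range(3)]
```

The window depends only on its size and sigma, so recomputing it per iteration repeats identical work; passing it in keeps the per-iteration cost limited to the filtering itself.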
* Provide a --data_on_cpu option to save VRAM during training
When there are many training images, such as in a large scene, most of the VRAM is used to store training data. Using --data_on_cpu reduces VRAM usage and makes it possible to train on GPUs with less VRAM.
* Fix the effect of data_on_cpu on the default mask
* Rename --data_on_cpu to --data_device
* Update README
* Format warning messages
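As a rough illustration of how a --data_device option might be wired up, here is a small argparse sketch. The `build_parser` and `needs_transfer` helpers, the `cuda` default, and the restriction to the two choices `cuda` and `cpu` are assumptions for this example and may not match the actual training script.

```python
import argparse

def build_parser():
    # Hypothetical parser; only the --data_device flag name comes from the changelog.
    parser = argparse.ArgumentParser(description="training options (sketch)")
    parser.add_argument(
        "--data_device",
        default="cuda",            # assumed default: keep training images in VRAM
        choices=["cuda", "cpu"],   # assumed set of accepted values
        help="device on which the training images are stored",
    )
    return parser

def needs_transfer(data_device, compute_device="cuda"):
    # If images are stored off the compute device, each one has to be
    # copied over when it is used, trading transfer time for lower VRAM use.
    return data_device != compute_device

# Example: keep the training images in host memory.
args = build_parser().parse_args(["--data_device", "cpu"])
transfer = needs_transfer(args.data_device)
```

A device name is more general than a boolean flag such as --data_on_cpu, since it states where the data lives rather than toggling a single alternative.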