Images are now loaded onto the target device as uint8 tensors.
They are then converted to the target data type (e.g. fp32 or fp16).
This speeds up image loading.
Users can also choose to store the images either as uint8 or in the target data type.
Storing them as uint8 further reduces memory usage.
Users may want to reduce memory consumption by using fp16.
However, in my tests, doing so resulted in lower-quality renders.
Some data type conversions had no effect, so I removed them entirely.
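A minimal sketch of the load-then-convert idea described above, assuming PyTorch and an image already decoded into an HxWx3 uint8 NumPy array. The names `load_image`, `to_float` and `keep_uint8` are hypothetical and are not taken from this change. The host-to-device copy moves uint8 data (1 byte per channel, a quarter of the fp32 size), and the conversion to fp32/fp16 happens afterwards on the target device.

```python
import numpy as np
import torch


def to_float(img: torch.Tensor, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Convert a uint8 CHW tensor to `dtype` in [0, 1] on its current device."""
    return img.to(dtype) / 255.0


def load_image(pixels: np.ndarray, device: str = "cpu",
               dtype: torch.dtype = torch.float32,
               keep_uint8: bool = False) -> torch.Tensor:
    """Hypothetical loader: copy a uint8 image to `device`, then convert it.

    Assumption: `pixels` is an already-decoded HxWx3 uint8 array.
    """
    if pixels.dtype != np.uint8:
        raise TypeError("expected a uint8 image")
    # The transfer happens while the data is still uint8, then HWC -> CHW.
    img = torch.from_numpy(pixels).to(device).permute(2, 0, 1).contiguous()
    if keep_uint8:
        # Store compactly; call to_float() when the image is actually used.
        return img
    return to_float(img, dtype)
```

Keeping `keep_uint8=True` trades a per-use conversion for a 4x smaller stored image compared with fp32.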
* Provide a --data_on_cpu option to save VRAM during training
When there are many training images, as in a large scene, most of the VRAM is used to store the training data. Using --data_on_cpu reduces VRAM usage and makes it possible to train on GPUs with less VRAM.
* Fix the effect of --data_on_cpu on the default mask
* Rename --data_on_cpu to --data_device
* Update README
* Format warning messages
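To see why the training images can dominate VRAM, here is a back-of-the-envelope estimate in plain Python. It is a sketch under stated assumptions: 3-channel images, and, when the data device is the CPU, a training loop that copies only the current image to the GPU at each step. The function names and the 300-image 1080p scene are hypothetical and chosen only for illustration.

```python
BYTES_PER_CHANNEL = {"uint8": 1, "fp16": 2, "fp32": 4}


def gpu_image_bytes(num_images: int, height: int, width: int,
                    dtype: str = "fp32", data_device: str = "cuda",
                    channels: int = 3) -> int:
    """Estimate the VRAM taken by training images (hypothetical helper)."""
    per_image = height * width * channels * BYTES_PER_CHANNEL[dtype]
    if data_device == "cuda":
        # The whole training set stays resident on the GPU.
        return num_images * per_image
    # Assumption: images live in host RAM and one is copied per iteration.
    return per_image


def gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 2**30


if __name__ == "__main__":
    print(f"{gib(gpu_image_bytes(300, 1080, 1920)):.2f} GiB")                      # 6.95 GiB
    print(f"{gib(gpu_image_bytes(300, 1080, 1920, dtype='uint8')):.2f} GiB")       # 1.74 GiB
    print(f"{gib(gpu_image_bytes(300, 1080, 1920, data_device='cpu')):.2f} GiB")   # 0.02 GiB
```

Under these assumptions, moving the data to the CPU shrinks the image footprint on the GPU to a single image, at the cost of a host-to-device copy every iteration; storing images as uint8 instead reduces the resident footprint by 4x relative to fp32.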