Which GPUs to Get for Deep Learning

Deep learning is a field with intense computational requirements, and your choice of GPU will fundamentally determine your deep learning experience. Without a GPU, this might mean months of waiting for an experiment to finish, or running an experiment for a day or more only to see that the chosen parameters were off. With a good, solid GPU, you can iterate quickly over deep learning networks and run experiments in days instead of months, hours instead of days, minutes instead of hours. So making the right choice when buying a GPU is critical. How do you select the GPU that is right for you? This blog post delves into that question and offers advice to help you make the choice that is right for you.

TL;DR

Having a fast GPU is very important when you begin to learn deep learning, because it lets you gain practical experience rapidly, which is key to building the expertise to apply deep learning to new problems. Without this rapid feedback, it just takes too much time to learn from one's mistakes, and it can be discouraging and frustrating to go on with deep learning. With GPUs, I quickly learned how to apply deep learning in a range of Kaggle competitions, and I earned second place in the Partly Sunny with a Chance of Hashtags Kaggle competition using a deep learning approach, where the task was to predict weather ratings for a given tweet. In the competition I used a rather large two-layer deep neural network with rectified linear units and dropout for regularization, and this deep net barely fit into my 6 GB of GPU memory.

Should I get multiple GPUs?

Excited by what deep learning can do with GPUs, I plunged into multi-GPU territory by assembling a small GPU cluster with a 40 Gbit/s InfiniBand interconnect. I was eager to see whether even better results could be obtained with multiple GPUs.
I quickly found that it is not only very difficult to parallelize neural networks on multiple GPUs efficiently, but also that the speedup was only mediocre for dense neural networks. Small neural networks could be parallelized rather efficiently using data parallelism, but larger neural networks like the one I used in the Partly Sunny with a Chance of Hashtags Kaggle competition received almost no speedup.

Later I ventured further down the road and developed a new 8-bit compression technique which enables you to parallelize dense or fully connected layers much more efficiently with model parallelism compared to 3. However, I also found that parallelization can be horribly frustrating. I naively optimized parallel algorithms for a range of problems, only to find that even with optimized custom code, parallelism on multiple GPUs does not work well given the effort that you have to put in. You need to be very aware of your hardware and how it interacts with deep learning algorithms to gauge whether you can benefit from parallelization in the first place.

Setup in my main computer: You can see three GTX Titans and an InfiniBand card. Is this a good setup for doing deep learning?

Since then, parallelism support for GPUs has become more common, but it is still far from universally available and efficient.
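
To make the data parallelism mentioned above a bit more concrete, here is a minimal sketch in plain Python. It is not the code I used on the cluster: the scalar model, the squared-error loss, and the function names mean_gradient and data_parallel_gradient are assumptions chosen purely for illustration. Each "worker" stands in for one GPU that computes a gradient on its own shard of the mini-batch, and the final averaging stands in for the communication step between GPUs.

```python
def mean_gradient(w, batch):
    """Mean gradient of the loss 0.5 * (w * x - y) ** 2 with respect to w."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_gradient(w, batch, num_workers):
    """Illustrative data parallelism: shard the batch, then average gradients."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    shard_size = len(batch) // num_workers
    # Each shard would be sent to a different GPU in a real setup.
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(num_workers)]
    # Every worker computes a gradient on its own shard only.
    local_grads = [mean_gradient(w, shard) for shard in shards]
    # Averaging the local gradients is the step that needs communication.
    return sum(local_grads) / num_workers


batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
single = mean_gradient(1.0, batch)
parallel = data_parallel_gradient(1.0, batch, num_workers=2)
```

With equally sized shards, the averaged gradient matches the gradient computed on the whole batch, so the result of training is unchanged. What does change is that every step has to exchange a gradient as large as the model itself, which is one plausible reason why large dense networks benefit much less than small ones.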
The only deep learning library which currently implements efficient algorithms across GPUs and across computers is CNTK, which uses Microsoft's special parallelization algorithms of 1-bit quantization (efficient) and block momentum (very efficient). With CNTK and a cluster of 9. GPUs you can expect a near-linear speedup of about 9. PyTorch might be the next library to support efficient parallelism across machines, but the library is not there yet. If you want to parallelize on one machine, then your options are mainly CNTK, Torch, and PyTorch. These libraries yield good speedups 3. GPUs. There are other libraries which support parallelism, but these are either slow (like TensorFlow with 2x-3x), difficult to use for multiple GPUs (Theano), or both. If you put value on parallelism, I recommend using either PyTorch or CNTK.

Using Multiple GPUs Without Parallelism

Another advantage of using multiple GPUs, even if you do not parallelize algorithms, is that you can run multiple algorithms or experiments separately on each GPU. You gain no speedup, but you get more information about performance by trying different algorithms or parameters at once. This is highly useful if your main goal is to gain deep learning experience as quickly as possible, and it is also very useful for researchers who want to try multiple versions of a new algorithm at the same time.

This is psychologically important if you want to learn deep learning. The shorter the interval between performing a task and receiving feedback on it, the better the brain is able to integrate the relevant memory pieces for that task into a coherent picture. If you train two convolutional nets on separate GPUs on small datasets, you will more quickly get a feel for what is important to perform well; you will more readily be able to detect patterns in the cross-validation error and interpret them correctly. You will be able to detect patterns which give you hints as to what parameter or layer needs to be added, removed, or adjusted.
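
Running separate experiments on separate GPUs, as described above, usually comes down to restricting each process to one device. The sketch below shows one way to do this with the CUDA_VISIBLE_DEVICES environment variable, which CUDA applications use to decide which GPUs they can see. The script name train.py, its --learning-rate flag, and the helper names build_jobs and launch are hypothetical and only serve the illustration.

```python
import os
import subprocess
import sys


def build_jobs(configs, num_gpus):
    """Pair each experiment config with one GPU index, round-robin."""
    jobs = []
    for i, config in enumerate(configs):
        env = dict(os.environ)
        # The child process will only see this one GPU.
        env["CUDA_VISIBLE_DEVICES"] = str(i % num_gpus)
        # train.py and --learning-rate are hypothetical placeholders.
        cmd = [sys.executable, "train.py",
               "--learning-rate", str(config["learning_rate"])]
        jobs.append((cmd, env))
    return jobs


def launch(jobs):
    """Start all jobs at once and wait for them; returns the exit codes."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in jobs]
    return [proc.wait() for proc in procs]


jobs = build_jobs([{"learning_rate": 0.1}, {"learning_rate": 0.01}],
                  num_gpus=2)
```

Calling launch(jobs) would then start both runs side by side, one per GPU. Note that with more configs than GPUs, the round-robin assignment places several runs on the same GPU at once, so in that case you would probably want to queue the extra runs instead.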
So overall, one GPU should be sufficient for almost any task, but multiple GPUs are becoming more and more important for accelerating your deep learning models. Multiple cheap GPUs are also excellent if you want to learn deep learning quickly. I personally would rather have many small GPUs than one big one, even for my research experiments.

So what kind of accelerator should I get? NVIDIA GPU, AMD GPU, or Intel Xeon Phi?

NVIDIA's standard libraries made it very easy to establish the first deep learning libraries in CUDA, while there were no such powerful standard libraries for AMD's OpenCL. Right now, there are just no good deep learning libraries for AMD cards, so NVIDIA it is. Even if some OpenCL libraries become available in the future, I would stick with NVIDIA: the GPU computing or GPGPU community is very large for CUDA and rather small for OpenCL. Thus, in the CUDA community, good open source solutions and solid advice for your programming are readily available.

Additionally, NVIDIA went all-in on deep learning even when deep learning was just in its infancy. This bet paid off. While other companies now put money and effort behind deep learning, they are still far behind due to their late start. Currently, using any software-hardware combination for deep learning other than NVIDIA-CUDA will lead to major frustrations.

In the case of Intel's Xeon Phi, it is advertised that you will be able to use standard C code and transform that code easily into accelerated Xeon Phi code. This feature might sound quite interesting because you might think that you can rely on the vast resources of C code. However, in reality only very small portions of C code are supported, so this feature is not really useful, and most portions of C that you will be able to run will be slow.

I worked on a Xeon Phi cluster with over 5. Xeon Phis and the frustrations with it were endless.
I could not run my unit tests because Xeon Phi MKL is not compatible with Python NumPy; I had to refactor large portions of code because the Intel Xeon Phi compiler is unable to make proper reductions for templates, for example for switch statements; I had to change my C interface because some C1. Intel Xeon Phi compiler. All this led to frustrating refactorings which I had to perform without unit tests. It took ages. It was hell.

And then, when my code finally executed, everything ran very slowly. There are bugs or just problems in the thread scheduler which cripple performance if the tensor sizes on which you operate change in succession. For example, if you have differently sized fully connected layers or dropout layers, the Xeon Phi is slower than the CPU.