UK-based technology company Omnitek has recently teamed up with Oxford’s Department of Engineering Science to sponsor a DPhil student in artificial intelligence to investigate novel techniques for implementing deep learning acceleration on silicon chips.
Omnitek’s CEO Roger Fawcett explains how the partnership will push boundaries in neural network development.
Omnitek is a world leader in the design of intelligent video and vision systems based on programmable logic chips called FPGAs and SoCs. There has been a rapid rise recently in the application of artificial intelligence (AI) to video and vision applications such as face recognition, autonomous driving, 4K to 8K upscaling, security, and handwriting recognition – to name but a few.
To support this trend, Omnitek has applied its skills to develop the world’s highest performance convolutional neural network running on an FPGA and intends to use the results of the Oxford DPhil research to remain at the cutting edge of this very fast-moving field.
As a leading establishment in AI, Oxford University and its Active Vision Laboratory were a natural choice for Omnitek. We have employed many top Oxford graduates and postgraduates at Omnitek.
The emergence of new compute engines
Following the seminal 2012 paper by Krizhevsky et al., ‘ImageNet Classification with Deep Convolutional Neural Networks’, the use of deep neural networks has driven considerable progress in areas such as image classification, Go-playing computers, handwriting recognition, natural language processing, financial services and speech recognition.
Neural networks rely heavily on massively parallel multiply-accumulate and other operations. The intense compute power required has meant that GPUs have been preferable to CPUs, due to their intrinsic parallelism. This choice of GPUs has also been promoted by the development of general-purpose programming languages such as CUDA and machine learning frameworks such as Caffe and TensorFlow.
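To see why multiply-accumulate (MAC) operations dominate, consider a minimal sketch of a 2D convolution, the core operation of a convolutional neural network. This is an illustrative example, not Omnitek's implementation: every output pixel is simply an accumulated sum of products, and it is these independent MACs that GPUs and FPGAs parallelise.

```python
def conv2d(image, kernel):
    """Naive 2D convolution: each output value is a sum of products (MACs)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1  # valid (no-padding) output size
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    # the multiply-accumulate at the heart of CNN inference
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
            out[y][x] = acc
    return out
```

A 3×3 image convolved with a 2×2 kernel already requires 16 MACs; a single layer of a modern network performs billions, which is why the choice of compute engine matters so much.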
GPUs are highly versatile and very capable, but they do not necessarily provide the optimum architecture for machine learning applications, as the overall architecture is still highly influenced by the needs of 3D graphics rendering. This is compensated to some degree by the sheer size of GPUs, but at the expense of power consumption and latency, both of which are incompatible with many embedded applications.
Improved accuracy of neural networks has come at the expense of higher complexity. There is also a need to create both low-power embedded implementations and high-performance reconfigurable data centre compute engines. Recent research has shown that compute efficiency can be achieved by a variety of techniques, some of which cannot be exploited by GPUs.
These trends have led to significant research into alternative compute engines, including FPGAs, to capitalise on these algorithmic innovations and achieve the highest performance per $ and per watt.
Many companies are developing bespoke silicon compute engines in the form of ASICs or ASSPs. While these have the potential to deliver the optimum performance per $ and per watt, it takes a long time to bring such products to market, and they are inflexible. FPGAs, on the other hand, can be bought off the shelf and configured in the field to implement any logic function a user requires. They can also be reconfigured in a matter of milliseconds in situ in an appliance or data centre. This means they can be configured specifically for an application and adapted on the fly to optimise for each workload. Further, as research reveals new neural network architectures and optimisation techniques, these FPGAs can be reconfigured in the field to benefit the user immediately, contrasting dramatically with the many months it would take to deliver replacement equipment if ASICs or ASSPs were used instead. Thus, FPGAs have many advantages when it comes to delivering leading edge technology that’s evolving as rapidly as deep neural networks.
The research
The research will seek to find ways to meet industry demand for ever higher performance in neural networks, primarily by improving Omnitek’s current ‘DPU’ product for FPGAs, through areas such as:
- Mathematical study of optimum trade-offs between accuracy, bit depth, fixed- and floating-point arithmetic, and use of the Winograd transform for various neural network topologies in FPGA implementation.
- Extension of the DPU architecture to include RNN/LSTM and other (potentially novel) neural network topologies.
- Training acceleration.
- Integration of video and vision processing functions into machine learning algorithms to optimise complete solutions.
- Review of the latest machine learning research from the perspective of optimum computation hardware architectures.
- Development of machine learning algorithms and their implementation within Omnitek’s DPU architecture.
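The accuracy/bit-depth trade-off in the first research area can be illustrated with a small, hypothetical sketch (not the DPU's actual quantisation scheme): reducing the bit depth of weights makes FPGA arithmetic cheaper, at the cost of rounding error.

```python
def quantise(weights, bits, scale):
    """Round weights to signed fixed-point at the given bit depth,
    then return the dequantised values for comparison."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = [max(qmin, min(qmax, round(w * scale))) for w in weights]
    return [v / scale for v in q]

def max_error(weights, bits, scale):
    """Worst-case rounding error introduced by quantisation."""
    deq = quantise(weights, bits, scale)
    return max(abs(w - d) for w, d in zip(weights, deq))

weights = [0.73, -0.41, 0.05, -0.98]  # toy weight values, range roughly [-1, 1)
err4 = max_error(weights, 4, scale=2 ** 3)  # 4-bit fixed point
err8 = max_error(weights, 8, scale=2 ** 7)  # 8-bit fixed point
```

Each extra bit roughly halves the worst-case error, so the research question is how few bits (and which number format) a given network topology can tolerate before classification accuracy degrades.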
Conclusion
Omnitek anticipates that the combination of its own skills and expertise, Oxford University and the Active Vision Laboratory’s leadership in AI, and the choice of FPGAs as the delivery platform for the DPU will enable developers of equipment based on machine learning to benefit rapidly from the latest research and deliver enhanced capabilities in applications ranging from automotive to medical, security to industrial and defence to entertainment.