Decision-making through artificial neural networks with minimal latency is critical for numerous applications such as navigation, tracking, and real-time machine action systems. This requires machine learning hardware that processes multidimensional data at high throughput. Unfortunately, convolution operations, the primary computational tool for data classification tasks, scale poorly in runtime complexity. However, homomorphically implementing the convolution theorem in a Fourier optics display light processor can achieve a non-iterative O(1) runtime complexity for data inputs larger than 1,000 × 1,000 matrices. Following this approach, we demonstrate data-streaming multi-kernel image batching using a Fourier Convolutional Neural Network (FCNN) accelerator. We show image batch processing of large-scale matrices, with 2 million dot-product multiplications performed by a digital light processing module in the Fourier domain. We further parallelize this optical FCNN system by exploiting multiple spatially parallel diffraction orders, achieving a 98× throughput improvement over state-of-the-art FCNN accelerators. A comprehensive discussion of the practical challenges of working at the edge of system capabilities highlights the problems of crosstalk and resolution scaling laws in the Fourier domain. Accelerating convolution by exploiting the massive parallelism of display technology enables non-von Neumann machine learning acceleration.
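
The convolution theorem referred to above states that a convolution in the spatial domain equals an element-wise product in the Fourier domain. The following is a minimal numerical sketch of that identity, not the optical implementation: it assumes circular (wrap-around) boundary conditions, a small 8 × 8 test image, and a naive software DFT in place of the lens-based transform. All function names are hypothetical and chosen for illustration.

```python
import cmath
import random

def dft(x, inverse=False):
    """Naive 1-D discrete Fourier transform (O(n^2)), for illustration only."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[k] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for k in range(n))
        for m in range(n)
    ]
    return [v / n for v in out] if inverse else out

def dft2(a, inverse=False):
    """2-D DFT as row transforms followed by column transforms."""
    rows = [dft(r, inverse) for r in a]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def fourier_conv2d(image, kernel):
    """Circular convolution via the convolution theorem."""
    h, w = len(image), len(image[0])
    # Zero-pad the kernel to the image size so both spectra share one grid.
    padded = [[0.0] * w for _ in range(h)]
    for i, row in enumerate(kernel):
        for j, v in enumerate(row):
            padded[i][j] = v
    fi, fk = dft2(image), dft2(padded)
    # Element-wise product in the Fourier domain.
    prod = [[fi[r][c] * fk[r][c] for c in range(w)] for r in range(h)]
    return [[v.real for v in row] for row in dft2(prod, inverse=True)]

def direct_conv2d(image, kernel):
    """Reference circular convolution by explicit summation."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(
                kernel[i][j] * image[(r - i) % h][(c - j) % w]
                for i in range(len(kernel))
                for j in range(len(kernel[0]))
            )
            for c in range(w)
        ]
        for r in range(h)
    ]

random.seed(0)
image = [[random.random() for _ in range(8)] for _ in range(8)]
kernel = [[random.random() for _ in range(3)] for _ in range(3)]
fast = fourier_conv2d(image, kernel)
slow = direct_conv2d(image, kernel)
# The two results should agree up to floating-point rounding.
max_abs_diff = max(abs(f - s) for fr, sr in zip(fast, slow) for f, s in zip(fr, sr))
```

In software the transforms dominate the cost; the O(1) claim in the text rests on the optical system performing the transform passively, which this sketch does not model.
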
In recent years, heterogeneous machine learning accelerators have attracted significant interest in science, engineering,
and industry. At the same time, demand for data security has increased significantly, especially with the post-quantum
encryption era looming. From a hardware processing point of view, both are challenged by electronic capacitive
interconnect delay and energy and, in the case of heterogeneous systems such as electronic-photonic accelerators,
by parasitic domain crossings. Although analog optical AI accelerators have demonstrated high throughput potential
(TOPS to even POPS) and high operation efficiency (TOPS/W), they have not demonstrated the ability to perform AI
classification tasks on encrypted data.
Here, we present an optical hashing and compression scheme based on SWIFFT, a family of post-quantum hashing
algorithms. A high degree of optical hardware-to-algorithm homomorphism allows the well-understood strengths of
free-space processing to be harvested optimally: innate parallelism, low-latency element-wise tensor multiplication, and
the Fourier transform. The algorithm can provide an increase in processing speed of several orders of magnitude by
replacing slow high-resolution CMOS cameras with ultra-fast, signal-triggered CMOS detector arrays. Additionally, the
information acquired in this way will require much lower transmission throughput, less in silico processing power, and
less storage, and will be pre-hashed, facilitating cheap optical information security. This technology has the potential to
allow heterogeneous convolutional 4f classifiers to get closer in performance to their fully electronic counterparts.
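
To illustrate why a SWIFFT-style hash maps naturally onto a Fourier transform followed by element-wise multiplication, here is a toy software sketch of that structure. It is not the optical scheme described above. The parameters (modulus 257, 16 blocks of 64 bits) follow the commonly cited SWIFFT parameter set and are assumptions here, as are the random keys and all function names; the transform is written naively rather than as a fast transform.

```python
import random

P = 257   # prime modulus (assumed, commonly cited SWIFFT parameter)
N = 64    # length of each input block, i.e. the transform size (assumed)
M = 16    # number of input blocks (assumed)

def root_of_order_2n(n, p):
    """Find w with w^n = -1 (mod p); for n a power of two, w has order 2n."""
    for w in range(2, p):
        if pow(w, n, p) == p - 1:
            return w
    raise ValueError("no suitable root of unity found")

OMEGA = root_of_order_2n(N, P)

def transform(x):
    """Number-theoretic Fourier transform: evaluate x at odd powers of OMEGA."""
    return [
        sum(x[j] * pow(OMEGA, (2 * i + 1) * j, P) for j in range(N)) % P
        for i in range(N)
    ]

def toy_swifft(blocks, keys):
    """Transform each block, multiply element-wise by its key, sum mod P."""
    out = [0] * N
    for x, a in zip(blocks, keys):
        y = transform(x)
        out = [(o + ai * yi) % P for o, ai, yi in zip(out, a, y)]
    return out

random.seed(1)
# Hypothetical fixed keys; a real instantiation would publish these once.
keys = [[random.randrange(P) for _ in range(N)] for _ in range(M)]
# 16 x 64 = 1024 input bits are compressed to 64 values below 257.
blocks = [[random.randrange(2) for _ in range(N)] for _ in range(M)]
digest = toy_swifft(blocks, keys)
```

Each stage is either a Fourier-type transform or an element-wise product followed by a sum, which are the operations the text attributes to free-space optics. The sketch omits the output encoding and any security analysis.
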