Multispectral imagery is instrumental across diverse domains, including remote sensing, environmental monitoring, agriculture, and healthcare, as it offers a treasure trove of data across various spectral bands, enabling profound insights into our environment. However, with the ever-expanding volume of multispectral data, the need for efficient compression methods is becoming increasingly critical. Enhanced compression not only conserves precious storage space but also facilitates rapid data transmission and analysis, ensuring the accessibility of vital information. In applications such as satellite imaging in particular, where bandwidth constraints and storage limitations are prevalent, superior compression techniques are essential to minimize costs and maximize resource utilization.
Neural network-based compression methods are emerging as a solution to this escalating challenge. While autoencoders have become a common neural network approach to image compression, they typically rely on feature extraction alone and cannot generate quantization maps customized to the training images. Integrating bespoke quantization maps alongside feature extraction, however, can substantially elevate compression performance. End-to-end image compression that encompasses both quantization maps and feature extraction offers a comprehensive approach to representing an image in its simplest form.
The proposed method considers not only the compression ratio and image quality but also the substantial computational costs associated with current approaches. Designed to capitalize on similarities within and across spectral channels, it ensures accurate reproduction of the original source information, promising a more efficient and effective solution for multispectral image compression.
Edge computing in remote sensing often necessitates on-device learning due to bandwidth and latency constraints. However, the limited memory and computational power of edge devices pose challenges for traditional machine learning approaches that rely on large datasets and complex models. Continual learning offers a potential solution for these scenarios by enabling models to adapt to evolving data streams. This paper explores leveraging a strategically selected subset of archival training data to improve performance in continual learning. We introduce a feedback-based intelligent data sampling method that uses a log-normal distribution to prioritize informative data points from the original training set, focusing on samples the model struggled with during initial training. This simulation-based exploration investigates the trade-off between accuracy gains and resource utilization at different data inclusion rates, paving the way for deployment of this approach on real-world edge devices. The approach promises better decision-making in the field, improved operational efficiency through reduced reliance on cloud resources, and greater system autonomy for remote sensing tasks, laying the groundwork for robust and efficient edge-based learning systems that enable real-time, autonomous, data-driven decisions for critical tasks in remote locations.
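To make the sampling idea concrete, the sketch below is one plausible realization (the function and parameter names are hypothetical, and the paper's exact weighting scheme may differ): archived samples are ranked by their loss from initial training, and a replay subset is drawn with log-normally decaying selection weights so harder samples are favored without excluding easy ones entirely.

```python
import numpy as np

def sample_replay_subset(losses, inclusion_rate, sigma=1.0, seed=0):
    """Pick a replay subset biased toward hard examples.

    Samples are ranked by their loss from initial training; selection
    weights follow a log-normal density over rank, concentrating mass
    on high-loss samples.
    """
    rng = np.random.default_rng(seed)
    losses = np.asarray(losses, dtype=float)
    n = losses.size
    k = max(1, int(inclusion_rate * n))
    ranks = np.argsort(np.argsort(-losses))  # rank 0 = highest loss
    x = ranks + 1.0
    # Unnormalized log-normal density over rank; sigma sets how sharply
    # the weights decay toward easier samples
    weights = np.exp(-(np.log(x) ** 2) / (2.0 * sigma**2)) / x
    weights /= weights.sum()
    return rng.choice(n, size=k, replace=False, p=weights)

# Example: retain 10% of a 1000-sample archive, favoring hard samples
archive_losses = np.random.rand(1000)
subset_idx = sample_replay_subset(archive_losses, inclusion_rate=0.10)
```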
The objective of image compression is to reduce the irrelevance and redundancy of image data so that it can be stored or transmitted efficiently, minimizing the number of bits required to represent an image accurately. JPEG can achieve a compression ratio of 10:1 with little perceptible loss in quality under standard metrics, and it has become the most widely used image compression standard in the world since its release. Traditionally, compression techniques have relied on linear transforms to approximate 2-D signals (images), and the omission of specific constituent vectors has been largely arbitrary. These techniques can save substantial amounts of memory while retaining image integrity. Recently, techniques have been developed that use neural networks to approximate these signals. Such networks offer the advantage of decorrelating image data to find a series of vectors that represents an image more compactly than traditional techniques, using gradient descent to approach the minimum number of bits required. These architectures are developing rapidly through informed design that draws on related fields of growing interest, such as computer vision and image analysis. A novel, efficient neural network is proposed in this work to compress infrared images at state-of-the-art levels while preserving overall image quality, handling demands that span from the daily commute to combat environments.
K-means is a popular unsupervised ML algorithm for analyzing and recognizing naturally occurring patterns by clustering similar points together. When applied to the color space of an image, it can recognize segments of the image where more meaningful clustering can be applied. Color quantization has been employed for decades to optimize the memory usage of saved images. Typical images are composed of red, green, and blue channels, each represented by a byte in memory. Each pixel therefore occupies 24 bits, allowing around 16.8 million unique colors. However, the human visual system is not sensitive enough to require this full color space, so it is beneficial to reduce the number of colors closer to what the eye can actually distinguish. This results in more efficient use of memory while preserving detail and color separation in the image. The key issue is determining how much a picture can be quantized before it degrades to the point that a human can discern the difference. Currently, no algorithm exists to aptly determine where this point occurs or whether each color channel should be treated identically. This research applies K-means color clustering to each color channel of the image separately to optimize compression. Replacing randomly seeded K-means with principal component analysis (PCA)-informed K-means on each color channel further improves performance.
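As a minimal illustration of per-channel quantization, the sketch below uses off-the-shelf scikit-learn K-means with default random seeding; the PCA-informed seeding described above would replace that initialization, and the per-channel color budgets are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def quantize_channel(channel, n_colors, seed=0):
    """Cluster one channel's pixel intensities into n_colors levels."""
    values = channel.reshape(-1, 1).astype(np.float64)
    km = KMeans(n_clusters=n_colors, n_init=10, random_state=seed).fit(values)
    # Replace each pixel with the centroid of its cluster
    quantized = km.cluster_centers_[km.labels_]
    return quantized.reshape(channel.shape).astype(np.uint8)

def quantize_per_channel(image, colors_per_channel=(8, 8, 8)):
    """Quantize each RGB channel independently, allowing unequal budgets."""
    return np.dstack([
        quantize_channel(image[..., c], k)
        for c, k in enumerate(colors_per_channel)
    ])

# Example: 8 levels per channel leaves 8**3 = 512 possible colors
img = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
compressed = quantize_per_channel(img)
```

Treating the channels separately is what allows unequal budgets, e.g. fewer blue levels than green, matching the observation that the channels need not be treated identically.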
Classification of one-dimensional (1D) data is important for a variety of complex problems. From the finance industry to audio processing to the medical field, many industries rely on 1D data. Machine learning techniques have excelled at these classification problems, but there remains room for improvement. This paper proposes a novel architecture called the Multi-Head Augmented Temporal Transformer (MHATT) for 1D classification of time-series data. Heavily modified vision transformers are used to improve performance while keeping the network exceptionally efficient. To showcase its efficacy, the network is applied to heartbeat classification using the MIT-BIH OSCAR dataset, which was ethically split to ensure a fair and demanding test for networks. The novel architecture is 94.6% more efficient and achieves a peak accuracy of 91.79%, a 13.6% reduction in error over a recent state-of-the-art network. The performance and efficiency of the MHATT architecture can be exploited by edge devices for unmatched performance and flexibility of deployment.
As one of the classic fields within computer vision, image classification and segmentation have expanded exponentially in terms of accuracy and ease of use. On Mars, atmospheric and surface conditions can lead to the sudden onset of a dust storm, or the more common dust devil, causing a multitude of issues for both equipment and crew. The ability to identify and locate areas that should be avoided during these storms is necessary for mission safety. Many current techniques are impractical: they are too large and computationally expensive for tasks that demand swift deployment onto systems with stringent constraints. This paper proposes a novel approach to segmentation that marries an efficient yet powerful Vision Transformer-based model with traditional signal processing techniques to ensure peak performance. With the National Aeronautics and Space Administration (NASA) looking to land a team on Mars, this paper takes on the real-time challenge of classifying and segmenting dust storms in remote equatorial satellite photos, using a model designed to be integrated into future systems and increase overall mission success.
Ethical data splitting is of paramount importance for ensuring the validity of any data-driven solution. If a split is biased, it will not accurately represent how the solution performs on the underlying problem. To split data ethically, the overall variance of the data must be fairly represented in both the training and testing sets. This requires identifying the outliers of the data so they can be accounted for when splitting. Finding the principal components of the data using the L2-norm has been shown to be an effective way to identify outliers and build a robust, outlier-resistant dataset. Because the L1-norm is more resistant to outliers than the L2-norm, it can make the resulting dataset more robust still. Therefore, utilizing L1-norm principal components when determining ethical data splits will result in more robust datasets.
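A minimal sketch of how principal components can drive an outlier-aware split is shown below. It uses standard L2 PCA reconstruction error for illustration; the L1-norm principal components advocated above would replace the PCA step, and the function name, outlier quantile, and stratification scheme are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

def outlier_aware_split(X, y, n_components=2, test_size=0.2, seed=0):
    """Split data so identified outliers appear in both partitions.

    Samples are scored by reconstruction error under a PCA subspace;
    the top 5% are flagged as outliers, and the split is stratified
    on (class, outlier-flag) pairs so outliers are shared fairly.
    """
    pca = PCA(n_components=n_components).fit(X)
    recon = pca.inverse_transform(pca.transform(X))
    errors = np.linalg.norm(X - recon, axis=1)
    is_outlier = errors > np.quantile(errors, 0.95)
    strata = [f"{label}_{flag}" for label, flag in zip(y, is_outlier)]
    return train_test_split(X, y, test_size=test_size,
                            stratify=strata, random_state=seed)

# Example usage with synthetic two-class data
X = np.random.randn(1000, 10)
y = np.random.randint(0, 2, 1000)
X_train, X_test, y_train, y_test = outlier_aware_split(X, y)
```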
The detection and recognition of targets in imagery and video analysis is vital for military and commercial applications. The development of infrared sensor devices for tactical aviation systems has increased the performance of target detection. Due to advancements in infrared sensor capabilities, their use in field operations, such as visual operations (visops) or reconnaissance missions across a variety of operational environments, has become paramount. Many of the techniques in use stretch back to the 1970s but were limited by the computational power of the time. The AI industry has recently been able to bridge the gap between traditional signal processing tools and machine learning. Current state-of-the-art target detection and recognition algorithms are too bloated to be applied to on-ground or aerial mission reconnaissance. Therefore, this paper proposes the Edge IR Vision Transformer (EIR-ViT), a novel algorithm for automatic target detection on infrared images that is lightweight and operates on the edge for easier deployability.
Chest X-rays can quickly assess the COVID-19 status of test subjects and address the problem of inadequate medical resources in emergency departments and centers. Image classification models built with deep learning can help doctors make better judgments about patients with COVID-19 and related lung diseases. We compared and analyzed the currently popular deep learning image classification methods VGGNet, GoogLeNet, and ResNet using publicly available chest X-ray datasets on COVID-19 from different organizations. Based on the characteristics of chest X-ray images and the classification results of these deep learning algorithms, a novel image classification algorithm, CovidXNet, is proposed. Building on the ResNet model, CovidXNet introduces a hard sample memory pool method to improve the accuracy and generalization of the algorithm. CovidXNet categorizes chest X-ray images more efficiently and accurately than other popular image classification algorithms, allowing doctors to quickly confirm a patient's diagnosis.
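The abstract does not spell out the memory pool mechanics, so the following is only one plausible reading (all class and method names are hypothetical): keep a bounded pool of recently misclassified samples and mix a fraction of them into subsequent training batches so the model revisits its hardest cases.

```python
import numpy as np

class HardSampleMemoryPool:
    """Hypothetical sketch: replay recently misclassified samples.

    After each epoch, misclassified sample indices are added to a
    bounded pool; each new batch is augmented with a fraction of
    pool samples.
    """

    def __init__(self, capacity=512, replay_fraction=0.25, seed=0):
        self.capacity = capacity
        self.replay_fraction = replay_fraction
        self.pool = []
        self.rng = np.random.default_rng(seed)

    def update(self, indices, predictions, labels):
        # Remember samples the model got wrong this epoch
        wrong = [i for i, p, y in zip(indices, predictions, labels) if p != y]
        self.pool = (self.pool + wrong)[-self.capacity:]

    def augment_batch(self, batch_indices):
        # Mix a fraction of hard samples into the incoming batch
        if not self.pool:
            return np.asarray(batch_indices)
        k = int(len(batch_indices) * self.replay_fraction)
        replay = self.rng.choice(self.pool, size=min(k, len(self.pool)),
                                 replace=False)
        return np.concatenate([batch_indices, replay])
```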
In recent years, the computational power of handheld devices has increased rapidly, reaching parity with computers of only a generation ago. The multiple tools integrated into these devices and the steady expansion of cloud storage have created a need for novel compression techniques for both storage and transmission. In this work, a novel L1 principal component analysis (PCA)-informed K-means approach is proposed. This technique seeks to preserve the color definition of images through the application of K-means clustering algorithms. Efficacy is assessed using the structural similarity index (SSIM).
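For reference, SSIM-based assessment of a quantized image can be as simple as the following scikit-image sketch; the blocky quantizer here is a crude stand-in for the proposed K-means method, not the paper's pipeline.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

# Compare an original image against its color-quantized version.
original = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
quantized = (original // 32) * 32  # crude stand-in for K-means quantization

score = ssim(original, quantized, channel_axis=-1)
print(f"SSIM: {score:.3f}")  # 1.0 would mean identical images
```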
Over the past four decades, the MIT-BIH dataset has become the industry standard for comparative evaluation of signal processing and machine learning techniques, largely because medical data is difficult to collect and is rarely widely available or open-source. A standardized benchmark is still needed for comparative purposes. This paper proposes a set of datasets targeted at specific tasks currently under investigation in state-of-the-art works. Openly sharing these datasets in multiple formats will allow the benchmark data to be applied to a broad range of advanced classification algorithms. Published methods will be profiled on the new datasets, building the foundation for their merit. A series of datasets is identified along with applicable usage criteria, such as TinyML for health monitoring and the detection of heart disease.
Several classical statistical methods are commonly used for forecasting time-series data. However, due to a number of nonlinear characteristics, forecasting time-series data remains a challenge. Machine learning methods are better able to solve problems with high nonlinearity. Recurrent neural networks (RNNs) are frequently used for time-series forecasting because their internal state, or memory, allows them to process a sequence of inputs. Specifically, long short-term memory (LSTM), a type of RNN, is particularly useful, as it has both long-term and short-term components. Due to their feedback connections, ability to process sequences of varying lengths, and ability to reset their own state, LSTMs are less sensitive to outliers and more forgiving of varying lags in time. Consequently, LSTMs can extract vital information and learn trends to forecast time-series data with high accuracy. We propose a novel neural network architecture combining long short-term memory and convolutional layers to predict time-series energy data with higher accuracy than comparable networks.
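A minimal Keras sketch of one plausible convolution-plus-LSTM arrangement appears below; the window length, layer counts, and sizes are illustrative assumptions, not the proposed network's configuration.

```python
from tensorflow.keras import layers, models

def build_conv_lstm_forecaster(window=48, n_features=1):
    """Convolutions capture local patterns; LSTMs capture longer trends."""
    model = models.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu"),
        layers.Conv1D(32, kernel_size=3, padding="causal", activation="relu"),
        layers.LSTM(64, return_sequences=True),
        layers.LSTM(32),
        layers.Dense(1),  # next-step energy value
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

model = build_conv_lstm_forecaster()
model.summary()
```

Causal padding keeps the convolutions from looking ahead in time, which matters when the target is the next step of the same series.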
As one of the classic fields of computer vision, image classification is a topic that has expanded exponentially in usability and accuracy in recent years. With the rapid progression of deep learning, as well as the introduction and advancement of techniques such as convolutional neural networks and vision transformers, image classification has been elevated to levels that were only theoretical until recently. This paper presents an improved method of object classification using a combination of vision transformers and multilayer convolutional neural networks, with specific application to underwater environments. In comparison to previous underwater object classification algorithms, the proposed network classifies images with higher accuracy, fewer training iterations, and a deployable parameter count.
There has been a sharp rise in the amount of data available for analysis in many professional fields in recent years. In the medical sector, this significant increase in data can help detect and confirm underlying symptoms in patients that would otherwise remain undetected. Machine learning techniques have been applied in the medical sector to help diagnose irregularities when data is provided for the specific area on which the system has been trained. Leveraging this newfound abundance of big data and advanced diagnostic techniques, higher-dimensional data features can be extracted and analyzed more effectively. The algorithm presented in this paper utilizes a convolutional neural network to categorize electrocardiogram (ECG) data, preprocessing the original data with the fast Fourier transform (FFT) and principal component analysis (PCA) to reduce dimensionality while maintaining performance. The paper proposes three intelligent identification algorithms whose outputs can be fed into another specialized machine learning system or analyzed using traditional diagnostic procedures.
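The FFT-plus-PCA preprocessing step might look like the following sketch; the window length, sampling rate, and component count are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from sklearn.decomposition import PCA

def fft_pca_features(ecg_windows, n_components=16):
    """Map raw ECG windows to compact frequency-domain features.

    Takes the magnitude spectrum of each fixed-length window (FFT),
    then projects the spectra onto their leading principal components
    to reduce dimensionality before classification.
    """
    spectra = np.abs(np.fft.rfft(ecg_windows, axis=1))
    pca = PCA(n_components=n_components)
    return pca.fit_transform(spectra)

# Example: 500 windows of 360 samples (one second at 360 Hz)
windows = np.random.randn(500, 360)
features = fft_pca_features(windows)  # shape (500, 16)
```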
Living in a constant news cycle creates the need for automated tracking of events as they happen. This can be achieved by analyzing broadcast overlay textual content. A great amount of information can be deciphered via these means before further processing, with applications spanning from politics to sports. We utilize image processing to create mean cropping masks based on binary slice clustering from intelligent retrieval to identify areas of interest. This data is handed off to CEIR, which builds on the connectionist text proposal network (CTPN) to fine-tune the text locations and an advanced convolutional recurrent neural network (CRNN) system to recognize the text strings. To improve accuracy and reduce processing time, this novel approach utilizes a preprocessing mask identification and cropping module to reduce the amount of data processed by the more finely tuned neural network.
Fusing multispectral sensor data that contains complementary information about the subject of observation yields visualizations more easily interpreted by both humans and algorithms. Many applications of feature-level fusion seek to combine edges and textures across the bandwidth of the sensory spectrum. Visualization techniques can be skewed by corruption and by redundancies induced by harmonics. A majority of image fusion techniques rely on intensity-hue-saturation (IHS) transforms, principal component analysis (PCA), and Gram-Schmidt methods. PCA's ability to remove redundancy from a set of correlated data while preserving variance, together with its resistance to color distortion, lends itself to this application. PCA also exhibits lower spectral distortion than IHS and has been found to produce superior image fusion. The application of neural network techniques has been shown to more accurately recreate results similar to those found by human inference. Over the years, increased computational power has allowed neural networks to spread into roles previously carried out by humans, and select advanced image processing techniques have benefited greatly from their implementation. We propose a novel method utilizing PCA in conjunction with a neural network to achieve higher-quality image fusion. Implementing an autoencoder neural network to fuse this information creates a higher level of data visualization compared to traditional weighted fusion techniques.
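For context, the classic PCA fusion baseline that such autoencoder methods are compared against can be sketched as follows; this is a minimal two-band illustration, not the proposed network.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_fuse(band_a, band_b):
    """Fuse two co-registered single-band images along their first PC.

    Each pixel becomes a 2-vector (one value per band); projecting
    onto the first principal component keeps the direction of maximum
    shared variance as the fused image.
    """
    stacked = np.stack([band_a.ravel(), band_b.ravel()], axis=1).astype(float)
    pc1 = PCA(n_components=1).fit_transform(stacked).reshape(band_a.shape)
    # Rescale to 8-bit range for display
    pc1 = (pc1 - pc1.min()) / (np.ptp(pc1) + 1e-9) * 255.0
    return pc1.astype(np.uint8)

# Example with synthetic infrared and visible bands
ir = np.random.rand(128, 128)
vis = np.random.rand(128, 128)
fused = pca_fuse(ir, vis)
```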
KEYWORDS: Principal component analysis, Satellites, Data fusion, Sensors, Visualization, Signal to noise ratio, Infrared sensors, Infrared radiation, Visible radiation, Data processing
Satellites are equipped with an array of diversified sensors capable of relaying multiple types of optical data about the Earth's surface. Different sensors capture varying levels of detail for a particular area of interest, and combining information gathered across sensors, from the infrared to the visible spectrum, can enhance the visualization and depth of the data. The application of principal component analysis (PCA) to data fusion has traditionally been processed with a weighted reliability matrix. This paper presents a novel PCA-based sensor fusion algorithm using weighted reliability with rejection control to improve data fusion quality, creating a more robust visualization of the composite information obtained from satellites. The proposed algorithm can be applied using both L2 and L1 PCA. Simulation studies validate the proposed controlled weighted fusion method, even under high levels of corruption.
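The paper's rejection-control algorithm is not reproduced here; the sketch below shows a generic reliability-weighted fusion with robust rejection to illustrate the general idea (the threshold rule and all names are assumptions).

```python
import numpy as np

def weighted_fusion_with_rejection(observations, reliabilities, k=2.0):
    """Reliability-weighted fusion with pixel-wise outlier rejection.

    observations: (n_sensors, H, W) co-registered readings
    reliabilities: (n_sensors,) positive sensor weights
    Readings deviating more than k robust standard deviations from
    the per-pixel median are rejected before the weighted average.
    """
    obs = np.asarray(observations, dtype=float)
    w = np.asarray(reliabilities, dtype=float)[:, None, None]
    med = np.median(obs, axis=0)
    mad = np.median(np.abs(obs - med), axis=0) + 1e-9  # robust scale
    keep = np.abs(obs - med) <= k * 1.4826 * mad
    weights = w * keep
    return (weights * obs).sum(axis=0) / (weights.sum(axis=0) + 1e-9)

# Example: fuse four noisy sensors, one badly corrupted
readings = np.random.rand(4, 32, 32)
readings[3] += 5.0  # corrupted sensor
fused = weighted_fusion_with_rejection(readings, [1.0, 1.0, 0.8, 0.5])
```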
Meteorological modeling takes data captured from multiple sources and processes it with data mining techniques to predict environmental changes. The machine learning techniques most commonly used for processing meteorological data are decision trees, rule-based methods, neural networks, naive Bayes, Bayesian belief networks, and support vector machines. These techniques require accurate data to simulate effective models. Meteorological datasets can contain outliers and errors that significantly skew the accuracy of the generated models relied upon by many sectors of society, including agriculture, natural disaster response, and weather forecasting. This paper proposes a method to eliminate outliers from meteorological data and enhance model accuracy by applying a blind thresholding algorithm to the principal components (PCs) obtained from L1- and L2-norm principal component analysis (PCA), identifying and discarding outliers in the dataset.
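A minimal sketch of blind thresholding on PC scores follows, using standard L2 PCA and a median/MAD threshold as illustrative stand-ins for the paper's L1/L2 variants and its specific thresholding rule.

```python
import numpy as np
from sklearn.decomposition import PCA

def remove_pc_outliers(X, n_components=2, k=3.0):
    """Discard samples whose PC scores exceed a blind robust threshold.

    Data is projected onto its leading principal components; rows
    lying more than k robust standard deviations from the median
    along any component are dropped.
    """
    scores = PCA(n_components=n_components).fit_transform(X)
    med = np.median(scores, axis=0)
    mad = np.median(np.abs(scores - med), axis=0) + 1e-9
    keep = (np.abs(scores - med) <= k * 1.4826 * mad).all(axis=1)
    return X[keep]

# Example: clean a synthetic meteorological table with injected outliers
X = np.random.randn(500, 6)
X[:10] += 15.0  # corrupted rows
X_clean = remove_pc_outliers(X)
```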
Big data has been driving professional sports over the last decade. In our data-driven world, it is important to find additional methods for analyzing both games and athletes. An abundance of video is captured in professional and amateur sports, and player datasets can be created from it using computer vision techniques. We propose a novel autonomous masking algorithm that can receive live or previously recorded video footage of sporting events and identify graphical overlays, optimizing further processing by tracking and text recognition algorithms for real-time analysis.