Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 657001 (2007) https://doi.org/10.1117/12.731816
This PDF file contains the front matter associated with SPIE Proceedings Volume 6570, including the Title Page, Copyright information, Table of Contents, Introduction, and the Conference Committee listing.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 657002 (2007) https://doi.org/10.1117/12.724246
"Botnets", or "bot armies", are large groups of remotely controlled malicious software. Bot armies pose one of the
most serious security threats to all networks. Botnets, remotely controlled and operated by botmasters or
botherders, can launch massive denial of service attacks, multiple penetration attacks, or any other malicious
network activity on a massive scale. While bot army activity has, in the past, been limited to fraud, blackmail, and
other forms of criminal activity, their potential for causing large-scale damage to the entire internet; for launching
large-scale, coordinated attacks on government computers and networks; and for large-scale, coordinated data
gathering from thousands of users and computers on any network has been underestimated. This paper does not
discuss how to build bots; rather, it examines the threats they pose. In a "botnet" or "bot army", computers can be used to spread
spam, launch denial-of-service attacks against Web sites, conduct fraudulent activities, and prevent authorized
network traffic from traversing the network.
In this paper we discuss botnets and the technologies that underlie this threat to network and computer security.
The first section motivates the need for improved protection against botnets, their technologies, and for further
research about botnets. The second contains background information about bot armies and their key underlying
technologies. The third section presents a discussion of the types of attacks that botnets can conduct and potential
defenses against them. The fourth section contains a summary and suggestions for future research and
development.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 657003 (2007) https://doi.org/10.1117/12.719026
Enterprise networks are facing ever-increasing security threats from Distributed Denial of Service (DDoS) attacks, worms, viruses, intrusions, Trojans, port scans, and network misuses, and thus effective monitoring approaches to quickly detect these activities are greatly needed. In this paper, we employ chaos theory and propose an interesting phase space method to detect Internet worms.
An Internet worm is a self-propagating program that automatically replicates itself to vulnerable systems and spreads across the Internet. Most deployed worm-detection systems are signature-based. They look for specific byte sequences (called attack signatures) that are known to appear in the attack traffic. Conventionally, the
signatures are manually identified by human experts through careful analysis of the byte sequence from captured attack traffic. We propose to embed the traffic sequence to a high-dimensional phase space using chaos theory. We have observed that the signature sequence of a specific worm will occupy specific regions in the phase space, which may be appropriately called the invariant subspace of the worm. The invariant subspace of the worm separates itself widely from the subspace of the normal traffic. This separation allows us to construct three simple metrics, each of which completely separates 100 normal traffic streams from 200 worm traffic streams, without training in the conventional sense. Therefore, the method is at least as accurate as any existing methods. More importantly, our
method is much faster than existing methods, such as those based on expectation maximization and hidden Markov models.
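As a rough, hedged illustration of the phase-space idea only (not the authors' construction or their three metrics), the sketch below embeds byte sequences into a phase space with delay coordinates and compares the regions occupied by a worm-like stream and a random stream; the toy signature bytes, embedding dimension, and centroid-based separation score are all assumptions.

```python
import numpy as np

def phase_space_embed(byte_seq, dim=3, delay=1):
    """Embed a 1-D byte sequence into a `dim`-dimensional phase space using
    delay coordinates: x_i = (s_i, s_{i+delay}, ..., s_{i+(dim-1)*delay})."""
    s = np.asarray(byte_seq, dtype=float)
    n = len(s) - (dim - 1) * delay
    if n <= 0:
        raise ValueError("sequence too short for this embedding")
    return np.stack([s[i * delay : i * delay + n] for i in range(dim)], axis=1)

def subspace_separation(worm_points, normal_points):
    """A crude separation score: distance between the centroids of the two
    point clouds relative to their spread (not the paper's three metrics)."""
    d = np.linalg.norm(worm_points.mean(axis=0) - normal_points.mean(axis=0))
    spread = worm_points.std() + normal_points.std()
    return d / (spread + 1e-12)

# Toy traffic: a repeated worm-like signature vs. random "normal" bytes.
rng = np.random.default_rng(0)
worm = np.tile(np.frombuffer(b"\x90\x90\xeb\x1f", dtype=np.uint8), 64)
normal = rng.integers(0, 256, size=256)
print(subspace_separation(phase_space_embed(worm), phase_space_embed(normal)))
```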
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 657004 (2007) https://doi.org/10.1117/12.720035
In this paper, we have proposed a comprehensive and innovative approach for analysis and visualization of large
complex multi-step cyber attack graphs. As an automated tool for cyber attack detection, prediction, and visualization,
the newly proposed method transforms large quantities of network security data into real-time actionable intelligence,
which can be used to (1) provide guidance on network hardening to prevent attacks, (2) perform real-time attack event
correlation during active attacks, and (3) formulate post-attack responses. We show that it is possible to visualize the
complex graphs, including all possible network attack paths while still keeping complexity manageable. The proposed
analysis and visualization tool provides an efficient and effective solution for predicting potential attacks upon observed
intrusion evidence, as well as interactive multi-resolution views such that an analyst can first obtain high-level overviews
quickly, and then drill down to specific details.
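As a hedged sketch of the underlying attack-graph analysis (not the authors' tool), the example below enumerates multi-step attack paths in a small, hypothetical exploit-dependency graph with networkx and highlights edges common to every path as candidate hardening points.

```python
import networkx as nx

# Hypothetical exploit-dependency graph: an edge means "compromising A enables B".
G = nx.DiGraph()
G.add_edges_from([
    ("internet", "dmz_web"),        # exposed web server
    ("dmz_web", "app_server"),      # pivot via an application exploit
    ("dmz_web", "mail_server"),
    ("app_server", "db_server"),    # lateral movement to the database
    ("mail_server", "db_server"),
])

# All multi-step attack paths from the attacker's entry point to the target.
paths = list(nx.all_simple_paths(G, source="internet", target="db_server"))
for p in paths:
    print(" -> ".join(p))

# A coarse hardening hint: edges appearing on every attack path are natural
# cut points, since removing one of them breaks all enumerated paths.
common = set(zip(paths[0], paths[0][1:]))
for p in paths[1:]:
    common &= set(zip(p, p[1:]))
print("edges on every path:", common)
```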
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 657005 (2007) https://doi.org/10.1117/12.719139
We consider the problem of placing cameras so that every point on a
perimeter, which is not necessarily planar, is covered by at least
one camera while using the smallest number of cameras.
This is accomplished by aligning the edges of the cameras' fields
of view with points on the boundary under surveillance.
Taken into consideration are
visibility concerns, where features such as mountains must not be
allowed to come between a camera and a boundary point that would
otherwise be in the camera's field of view. We provide a general
algorithm that determines optimal camera placements and orientations.
Additionally, we consider double coverings, where every boundary point
is seen by at least two cameras, with selected boundary points
and cameras situated such that the average calibration error between
adjacent cameras is minimized. We describe an iterative algorithm
that accomplishes these tasks. We also consider a joint optimization
algorithm, which strikes a balance between minimizing calibration
error and the number of cameras required to cover the boundary.
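The paper describes an optimal placement algorithm; as an illustrative stand-in only, the sketch below solves the underlying covering problem with a greedy set-cover heuristic, assuming visibility has already been resolved into per-camera sets of visible boundary samples.

```python
def greedy_cover(boundary_points, candidate_views):
    """Greedy set cover: candidate_views maps a camera placement (id) to the
    set of boundary point indices it can see (visibility already resolved).
    Returns a list of chosen placements covering all points, if possible."""
    uncovered = set(boundary_points)
    chosen = []
    while uncovered:
        best = max(candidate_views, key=lambda c: len(candidate_views[c] & uncovered))
        gain = candidate_views[best] & uncovered
        if not gain:
            raise ValueError("some boundary points cannot be covered")
        chosen.append(best)
        uncovered -= gain
    return chosen

# Toy example: 8 boundary samples, 4 candidate camera poses (all hypothetical).
views = {
    "cam_A": {0, 1, 2},
    "cam_B": {2, 3, 4},
    "cam_C": {4, 5, 6, 7},
    "cam_D": {1, 3, 5},
}
print(greedy_cover(range(8), views))
```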
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 657006 (2007) https://doi.org/10.1117/12.719023
As the amount and types of remote network services increase, the analysis of their logs has become a
very difficult and time consuming task. There are several ways to filter relevant information and
provide a reduced log set for analysis, such as whitelisting and intrusion detection tools, but all of
them require too much fine-tuning work and human expertise. Nowadays, researchers are evaluating
data mining approaches for intrusion detection in network logs, using techniques such as genetic
algorithms, neural networks, clustering algorithms, etc. Some of those techniques yield good results,
yet they require a very large number of attributes gathered from network traffic to detect useful
information. In this work we apply and evaluate some data mining techniques (K-Nearest Neighbors,
Artificial Neural Networks and Decision Trees) in a reduced number of attributes on some log data
sets acquired from a real network and a honeypot, in order to classify traffic logs as normal or
suspicious. The results obtained allow us to identify unlabeled logs and to describe which attributes
were used for the decision. This approach provides a very reduced amount of logs to the network
administrator, improving the analysis task and aiding in discovering new kinds of attacks against
their networks.
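As a hedged illustration of the classification step (not the authors' logs or attribute set), the sketch below trains the three classifier families named above on a small synthetic table of log attributes with scikit-learn; the features and labeling rule are invented for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a reduced set of log attributes
# (e.g. bytes transferred, duration, distinct ports); 0 = normal, 1 = suspicious.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 2 * X[:, 2] > 1).astype(int)   # hypothetical labeling rule
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [
    ("k-NN", KNeighborsClassifier(n_neighbors=5)),
    ("ANN", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)),
    ("Decision tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
]:
    clf.fit(Xtr, ytr)
    print(name, "accuracy:", round(clf.score(Xte, yte), 3))
```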
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 657007 (2007) https://doi.org/10.1117/12.719295
The motivation behind the fusion of Intrusion Detection Systems was the realization that, with increasing traffic and increasing complexity of attacks, none of the present-day stand-alone Intrusion Detection Systems can meet the demand for a very high detection rate and an extremely low false positive rate. Multi-sensor fusion can be used to meet these requirements by refining the combined response of different Intrusion Detection Systems. In this paper, we show a design technique for sensor fusion that best utilizes the useful response from multiple sensors by an appropriate adjustment of the fusion threshold. The threshold is generally chosen according to past experience or by an expert system. In this paper, we show that choosing threshold bounds according to the Chebyshev inequality principle performs better. This approach also helps to solve the problem of scalability and has the advantage of failsafe capability. This paper theoretically models the fusion of Intrusion Detection Systems for the purpose of proving the improvement in performance, supplemented with an empirical evaluation. The combination of complementary sensors is shown to detect more attacks than the individual components. Since the individual sensors chosen detect sufficiently different attacks, their results can be merged for improved performance. The combination is done in different ways: (i) taking all the alarms from each system and avoiding duplications, (ii) taking alarms from each system by fixing threshold bounds, and (iii) rule-based fusion with a priori knowledge of the individual sensor performance. A number of evaluation metrics are used, and the results indicate an overall enhancement in the performance of the combined detector using sensor fusion incorporating the threshold bounds, and significantly better performance using simple rule-based fusion.
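As a minimal sketch of a Chebyshev-style fusion threshold (the fusion rule, score distribution, and alpha value are assumptions, not the paper's formulation), the example below bounds the false-positive rate of a summed alert score under normal traffic using P(X ≥ μ + kσ) ≤ 1/k².

```python
import numpy as np

def chebyshev_threshold(scores, alpha=0.05):
    """Upper threshold on a fused sensor score such that, by Chebyshev's
    inequality, P(score >= mu + k*sigma) <= 1/k**2 <= alpha under normal
    traffic, regardless of the score distribution."""
    mu, sigma = np.mean(scores), np.std(scores)
    k = 1.0 / np.sqrt(alpha)
    return mu + k * sigma

# Hypothetical fused scores: sum of three IDS alert confidences under normal traffic.
rng = np.random.default_rng(1)
normal_scores = rng.beta(2, 8, size=(1000, 3)).sum(axis=1)
t = chebyshev_threshold(normal_scores, alpha=0.02)
print("fusion threshold:", round(t, 3))
print("empirical false-positive rate:", np.mean(normal_scores >= t))
```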
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 657008 (2007) https://doi.org/10.1117/12.719423
In this paper, a previously introduced data mining technique, utilizing the Mean Field Bayesian Data Reduction
Algorithm (BDRA), is extended for use in finding unknown data clusters in a fused multidimensional feature
space. In extending the BDRA for this application its built-in dimensionality reduction aspects are exploited for
isolating and automatically mining all points contained in each unknown cluster. In previous work, this approach
was shown to have comparable performance to the classifier that knows all cluster information when mining up
to two features containing multiple unknown clusters. However, unlike results shown in previous work based on
lower dimensional feature spaces, the results in this paper are based on utilizing up to twenty fused features.
This is due to improvements in the training algorithm that now mines for candidate data clusters by processing
all points in a quantized cell simultaneously, in contrast to the previous method, which processed all points
sequentially. This improvement has resulted in a substantial reduction in the run time of the
algorithm. Finally, performance is illustrated and compared with simulated data containing multiple clusters,
and where the relevant feature space contains both correlated and uncorrelated classification information.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 657009 (2007) https://doi.org/10.1117/12.719917
In general, reflectance and spatial patterns characterize geospatial data. Current semantic-enabled framework image
retrieval systems for geospatial data extract primitive features based on color, texture (Spatial Gray Level Dependency -
SGLD matrices), and shape from the segmented homogeneous region. However, this form of extracting textural
information is computationally expensive. The state-of-the-art image mining system for multimedia image archives uses
the wavelet transform for feature extraction to quickly and efficiently capture color and texture information. Since an
image consists of three bands, color information is captured by converting the RGB space into HSV space. Thus, a new
approach is required to capture the complete reflectance pattern, an important characteristic of geospatial data. This work
proposes a new method to perform fast coarse image segmentation using descriptors obtained by combining the 2D wavelet
transform along the spatial axis and the Fourier transform along the spectral axis to capture color and texture
information for segmentation. These features are later on used for region-based retrieval in Earth observation data
archives. Compared to traditional techniques, results show that the proposed method provides good retrieval accuracy in
terms of F-measure for land cover classes.
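As a hedged sketch of combining a spatial wavelet transform with a spectral Fourier transform (not the authors' descriptor), the example below computes coarse features for a toy multispectral patch using pywt and numpy; the band count and summary statistics are assumptions.

```python
import numpy as np
import pywt

def spectral_spatial_features(cube, wavelet="haar"):
    """cube: (H, W, B) multispectral patch. Apply a single-level 2D wavelet
    transform over the spatial axes of each band, and an FFT magnitude over
    the spectral axis of each pixel, then concatenate coarse summaries."""
    H, W, B = cube.shape
    spatial = []
    for b in range(B):
        cA, (cH, cV, cD) = pywt.dwt2(cube[:, :, b], wavelet)
        spatial.append([cA.mean(), cH.std(), cV.std(), cD.std()])
    spectral = np.abs(np.fft.rfft(cube.reshape(-1, B), axis=1)).mean(axis=0)
    return np.concatenate([np.ravel(spatial), spectral])

rng = np.random.default_rng(0)
cube = rng.random((32, 32, 6))       # toy 6-band image patch
print(spectral_spatial_features(cube).shape)
```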
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700A (2007) https://doi.org/10.1117/12.716973
A data mining procedure for automatic determination of fuzzy decision tree structure using a genetic program (GP) is
discussed. A GP is an algorithm that evolves other algorithms or mathematical expressions. Innovative methods for
accelerating convergence of the data mining procedure and reducing bloat are given. In genetic programming, bloat
refers to excessive tree growth. It has been observed that the trees in the evolving GP population will grow by a factor
of three every 50 generations. When evolving mathematical expressions much of the bloat is due to the expressions not
being in algebraically simplest form. So a bloat reduction method based on automated computer algebra has been
introduced. The effectiveness of this procedure is discussed. Also, rules based on fuzzy logic have been introduced into
the GP to accelerate convergence, reduce bloat and produce a solution more readily understood by the human user.
These rules are discussed as well as other techniques for convergence improvement and bloat control. Comparisons
between trees created using a genetic program and those constructed solely by interviewing experts are made. A new
co-evolutionary method that improves the control logic evolved by the GP by having a genetic algorithm evolve
pathological scenarios is discussed. The effect on the control logic is considered. Finally, additional methods that have
been used to validate the data mining algorithm are referenced.
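As a hedged illustration of algebra-based bloat reduction (the expression is hypothetical and sympy stands in for whatever computer-algebra system the authors used), the sketch below simplifies a bloated subtree and compares operation counts.

```python
import sympy as sp

x, y = sp.symbols("x y")

# A hypothetical bloated subtree such as a GP might evolve: it collapses to
# x + y + 1 once put in algebraically simplest form.
bloated = sp.sin(x)**2 + sp.cos(x)**2 + (x**2 + 2*x*y + y**2) / (x + y)

simplified = sp.simplify(bloated)
print("bloated operation count:   ", sp.count_ops(bloated))
print("simplified expression:     ", simplified)
print("simplified operation count:", sp.count_ops(simplified))
```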
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700B (2007) https://doi.org/10.1117/12.724256
Increasing the accuracy of the portrayal of all of the elements of a simulation environment has long been a prime
goal of the modeling and simulation community; a goal that has remained far out of reach for many reasons. One
of the greatest hurdles facing simulation developers in the effort to increase simulation accuracy is the need to
segregate information across the entire simulation environment according to access restrictions in order to ensure
the integrity, security, and reliability requirements imposed on the data. However, this need for segregation does
not mean that those with the highest access permissions should be forced to use multiple computers and displays to
integrate the information that they need or that intelligent agents should be restricted in their access to the
information that they need in order to adequately assist their human operators. In this paper, we present a potential
solution to the problem of integrating and segregating data, which is the use of virtual machine and virtual private
network technology in order to maintain segregation of data, control access, and control intercommunication.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700C (2007) https://doi.org/10.1117/12.717799
The central nervous system encodes information in sequences of asynchronously generated voltage spikes, but
the precise details of this encoding are not well understood. Thorpe proposed rank-order codes as an explanation
of the observed speed of information processing in the human visual system. The work described in this paper
is inspired by the performance of SpikeNET, a biologically inspired neural architecture using rank-order codes
for information processing, and is based on the retinal model developed by VanRullen and Thorpe. This model
mimics retinal information processing by passing an input image through a bank of Difference of Gaussian (DoG)
filters and then encoding the resulting coefficients in rank-order. To test the effectiveness of this encoding in
capturing the information content of an image, the rank-order representation is decoded to reconstruct an image
that can be compared with the original. The reconstruction uses a look-up table to infer the filter coefficients
from their rank in the encoded image. Since the DoG filters are approximately orthogonal functions, they are
treated as their own inverses in the reconstruction process.
We obtained a quantitative measure of the perceptually important information retained in the reconstructed
image relative to the original using a slightly modified version of an objective metric proposed by Petrovic. It is
observed that around 75% of the perceptually important information is retained in the reconstruction.
In the present work we reconstruct the input using a pseudo-inverse of the DoG filter-bank with the aim of
improving the reconstruction and thereby extracting more information from the rank-order encoded stimulus.
We observe that there is an increase of 10 - 15% in the information retrieved from a reconstructed stimulus as
a result of inverting the filter-bank.
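As a hedged, one-dimensional sketch of the pipeline (a toy stand-in, not the VanRullen-Thorpe retinal model), the example below builds a small DoG filter bank, rank-order encodes the coefficients, decodes them with a lookup table of rank-averaged magnitudes, and reconstructs via the pseudo-inverse of the filter bank.

```python
import numpy as np

rng = np.random.default_rng(0)

def dog(size, sigma1, sigma2):
    """1-D Difference-of-Gaussians filter (toy stand-in for the 2-D bank)."""
    x = np.arange(size) - size // 2
    g = lambda s: np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
    return g(sigma1) - g(sigma2)

# Filter bank W: each row is a DoG at a different position and scale.
n = 64
W = np.stack([np.roll(dog(n, s, 2 * s), k) for s in (1, 2, 4) for k in range(0, n, 4)])

signal = rng.random(n)
coeffs = W @ signal

# Rank-order encoding: transmit only the order of coefficient magnitudes.
order = np.argsort(-np.abs(coeffs))

# Decoding with a lookup table: replace each coefficient by the average
# magnitude observed at that rank (estimated here from random signals),
# keeping only the transmitted sign and rank.
lut = np.mean(np.sort(np.abs(W @ rng.random((n, 500))), axis=0)[::-1], axis=1)
est = np.zeros_like(coeffs)
est[order] = lut * np.sign(coeffs[order])

# Reconstruction via the pseudo-inverse of the filter bank (the improvement
# over treating the filters as their own inverses).
recon = np.linalg.pinv(W) @ est
print("correlation with original:", np.corrcoef(signal, recon)[0, 1])
```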
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700D (2007) https://doi.org/10.1117/12.720343
An analytical performance model for a generic secure messaging system is formulated as a multi-class
queuing network. The model includes assessment of the impact of security features such as secret key
encryption/decryption, signature generation/verification, and certificate validation, on overall performance.
Findings of a sensitivity analysis with respect to message rate, WAN transmission link, SSL encryption,
message size, and distance between servers are also presented. Finally, how the model can
be adopted for evaluating performance-based architectural design options is outlined.
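As a hedged, single-class stand-in for the multi-class queuing network (the M/M/1 formula and per-operation service demands below are assumptions), the sketch shows how security operations add to service demand and inflate mean response time as the message rate grows.

```python
def mm1_response_time(arrival_rate, service_demands_ms):
    """Mean response time (ms) of an M/M/1 server whose total service demand
    is the sum of per-message security operations. arrival_rate in msgs/ms."""
    s = sum(service_demands_ms)          # total mean service time per message
    rho = arrival_rate * s               # utilisation
    if rho >= 1:
        raise ValueError("server saturated (utilisation >= 1)")
    return s / (1 - rho)                 # M/M/1: R = S / (1 - rho)

# Hypothetical per-message costs (ms): decryption, signature check, cert validation.
demands = {"decrypt": 0.8, "verify_signature": 1.2, "validate_certificate": 0.5}
for rate in (0.1, 0.2, 0.3):             # messages per millisecond
    r = mm1_response_time(rate, demands.values())
    print(f"lambda={rate}/ms -> mean response {r:.2f} ms")
```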
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700E (2007) https://doi.org/10.1117/12.720168
This paper proposes a new approach to temporal trajectory analysis for clinical laboratory examinations. When
we select m laboratory examinations, their temporal evolution for one patient can be viewed as a trajectory
in m-dimensional space. A multiscale comparison technique can be applied for segmentation and calculation of
structural similarities of such trajectories. Then, clustering can be applied to the calculated similarities for
classification of these trajectories. The proposed method was evaluated on hepatitis datasets, and the results show
that the clustering captured several interesting patterns for severe chronic hepatitis.
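As a hedged sketch of the clustering step only (the placeholder resampled Euclidean distance below is not the paper's multiscale structural similarity), the example clusters toy multidimensional trajectories from a pairwise dissimilarity matrix with scipy hierarchical clustering.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def trajectory_distance(a, b, n=20):
    """Placeholder dissimilarity: resample both m-dimensional trajectories to
    n time points and take the mean Euclidean distance."""
    def resample(t):
        idx = np.linspace(0, len(t) - 1, n)
        return np.array([np.interp(idx, np.arange(len(t)), t[:, d]) for d in range(t.shape[1])]).T
    ra, rb = resample(np.asarray(a, float)), resample(np.asarray(b, float))
    return float(np.linalg.norm(ra - rb, axis=1).mean())

# Toy "laboratory examination" trajectories in 2-D (two examinations over time).
rng = np.random.default_rng(0)
trajs = [np.cumsum(rng.normal(size=(rng.integers(30, 60), 2)), axis=0) for _ in range(12)]

D = np.array([[trajectory_distance(a, b) for b in trajs] for a in trajs])
labels = fcluster(linkage(squareform(D, checks=False), method="average"), t=3, criterion="maxclust")
print("cluster labels:", labels)
```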
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700F (2007) https://doi.org/10.1117/12.720169
This paper reports the results of temporal analysis of platelet (PLT) data in chronic hepatitis dataset. First
we briefly introduce a cluster analysis system for temporal data that we have developed. Second, we show the
results of cluster analysis of PLT sequences. Third, we show the results of PLT value-based temporal analysis
aiming at finding years for reaching F4, years elapsed between stages, and their relationships with virus types
and fibrotic stages. The results of cluster analysis indicate that the temporal courses of PLT can be grouped
into several patterns each of which presents similarity in average PLT level and increase/decrease trends. The
results of the value-based analysis suggest that liver fibrosis may proceed faster in exacerbating cases.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700G (2007) https://doi.org/10.1117/12.723321
This paper presents a study on Web-based learning support systems that is enhanced with two major subsystems: a Web-based
learning game and a learning-oriented Web search. The Internet and the Web may be considered a first resource for
students seeking information and help. However, much of the information available online is not related to the course
contents or, in the worst case, is wrong. The search subsystem aims to provide students with precise, relevant, and adaptable
documents about certain courses or classes. Therefore, students do not have to spend time verifying the relevance of
documents to the class. The learning game subsystem stimulates students to study, enables students to review their studies
and to perform self-evaluation through a Web-based learning game such as a treasure hunt game. During this challenging
and entertaining learning and evaluation process, it is hoped that students will eventually understand and master the course
concepts easily. The goal of developing such a system is to provide students with an efficient and effective learning
environment.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700H (2007) https://doi.org/10.1117/12.707922
As a follow-up to our earlier model Autocorrel I, we have implemented a two-stage event correlation approach
with improved performance. Like Autocorrel I, the new model correlates intrusion detection system (IDS) alerts
to automate alert and incidents management, and reduce the workload on an IDS analyst. We achieve this
correlation by clustering similar alerts, thus allowing the analyst to only consider a few clusters rather than
hundreds or thousands of alerts. The first stage uses an artificial neural network (ANN)-based autoassociator
(AA). The AA's objective is to attempt to reproduce each alert at its output. In the process, it uses an
error metric, the reconstruction error (RE), between its input and output to cluster similar alerts. In order to
improve the accuracy of the system we add another machine-learning stage which takes into account the RE as
well as raw attribute information from the input alerts. This stage uses the Expectation-Maximisation (EM)
clustering algorithm. The performance of this approach is tested with intrusion alerts generated by a Snort IDS
on DARPA's 1999 IDS evaluation data as well as incidents.org alerts.
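As a hedged sketch of the two-stage idea (synthetic alerts, an MLP regressor standing in for the autoassociator, and scikit-learn's Gaussian mixture standing in for EM clustering), the example below derives a reconstruction error per alert and clusters on RE plus the raw attributes.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric IDS alert attributes (e.g. port, priority, size).
rng = np.random.default_rng(0)
alerts = np.vstack([
    rng.normal(loc=[0, 0, 0], scale=0.5, size=(200, 3)),
    rng.normal(loc=[4, 4, 1], scale=0.5, size=(100, 3)),
])
X = StandardScaler().fit_transform(alerts)

# Stage 1: autoassociator (an MLP trained to reproduce its input); the
# per-alert reconstruction error (RE) becomes an extra feature.
aa = MLPRegressor(hidden_layer_sizes=(2,), max_iter=5000, random_state=0).fit(X, X)
re = np.linalg.norm(aa.predict(X) - X, axis=1, keepdims=True)

# Stage 2: EM clustering (Gaussian mixture) on RE plus the raw attributes.
features = np.hstack([X, re])
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(features)
print("cluster sizes:", np.bincount(labels))
```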
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700I (2007) https://doi.org/10.1117/12.720454
Blogs represent an important new arena for knowledge discovery in open source intelligence gathering. Bloggers are a
vast network of human (and sometimes non-human) information sources monitoring important local and global events,
and other blogs, for items of interest upon which they comment. Increasingly, issues erupt from the blog world and into
the real world. In order to monitor blogging about important events, we must develop models and metrics that represent
blogs correctly. The structure of blogs requires new techniques for evaluating such metrics as the relevance, specificity,
credibility and timeliness of blog entries. Techniques that have been developed for standard information retrieval
purposes (e.g. Google's PageRank) are suboptimal when applied to blogs because of their high degree of exophoricity,
quotation, brevity, and rapidity of update. In this paper, we offer new metrics for blog entry relevance,
specificity, timeliness, and credibility that we are implementing in a blog search and analysis tool for international blogs.
This tool utilizes new blog-specific metrics and techniques for extracting the necessary information from blog entries
automatically, using some shallow natural language processing techniques supported by background knowledge
captured in domain-specific ontologies.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700J (2007) https://doi.org/10.1117/12.720891
One of the primary reasons for using the PSDF is the inherent data dimension reduction that is achieved through the
use of the projection-slice theorem (PST). The use of the PST allows for a powerful technique to segment
informational content into lower dimensional spans while simultaneously providing a complete and naturally linked
data set. Shannon's formula for the maximum information capacity of a channel shows that the most efficient
informational coding uses independent data samples, as this spreads the spectrum evenly across the channel
band. This implies that independent data representing a correlated set contain maximal information if the
"key" that captures the correlatedness of the data is known; otherwise the independent data are purely random.
By using a novel Adaptive Gram-Schmidt (AGS) procedure we form a method that identifies patterns in correlated
data for the removal of the inter-dependence and thereby maximization of information content per data sample. In
this work we subject the lower-dimensional data sets in the PSDF to the AGS to maximize the information content
of the PSDF and share some of our findings, results, and deductions.
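The Adaptive Gram-Schmidt procedure itself is not reproduced here; as a hedged illustration of the underlying idea, the sketch below applies classical Gram-Schmidt orthogonalization to remove the inter-dependence among correlated data rows.

```python
import numpy as np

def gram_schmidt(rows):
    """Classical Gram-Schmidt: return an orthonormal basis for the row space,
    i.e. the correlated rows with their mutual dependence removed."""
    basis = []
    for v in np.asarray(rows, dtype=float):
        w = v - sum(np.dot(v, b) * b for b in basis)
        norm = np.linalg.norm(w)
        if norm > 1e-10:                 # skip rows already spanned by the basis
            basis.append(w / norm)
    return np.array(basis)

# Correlated data: each row is a mixture of two underlying signals.
rng = np.random.default_rng(0)
s = rng.normal(size=(2, 100))
data = np.array([s[0], 0.9 * s[0] + 0.1 * s[1], s[1] + 0.05 * s[0]])

Q = gram_schmidt(data)
print("pairwise inner products after orthogonalisation:")
print(np.round(Q @ Q.T, 6))
```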
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700K (2007) https://doi.org/10.1117/12.720039
Annealing, in metallurgy and materials science, is a heat treatment wherein the microstructure of a material is
altered, causing changes in its properties such as strength and hardness. We define concept annealing as a lexical,
syntactic, and semantic expansion capability (the removal of defects and the internal stresses that cause term- and
phrase-based search failure) coupled with a directed contraction capability (semantically-related terms, queries, and
concepts nucleate and grow to replace those originally deformed by internal stresses). These two capabilities are
tied together in a control loop mediated by the information retrieval precision and recall metrics coupled with
intuition provided by the operator. The specific representations developed have been targeted at facilitating highly
efficient and effective semantic indexing and searching. This new generation of Find capability enables additional
processing (i.e. all-source tracking, relationship extraction, and total system resource management) at rates,
precisions, and accuracies previously considered infeasible. In a recent experiment, an order-of-magnitude reduction in
time to actionable intelligence and nearly a three-orders-of-magnitude reduction in false alarm rate were achieved.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700L (2007) https://doi.org/10.1117/12.720153
The goal of this work is to investigate the applicability of two probabilistic approaches, namely Three-Way Aspect
Model (TWAPM) and Co-training, to retrieve images represented by multiple feature types from large image
collections. We test these approaches in a learning context via relevance feedback from the user. Our experiments
show that Co-training may be a better choice than TWAPM for a Web-based Image Retrieval System.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700M (2007) https://doi.org/10.1117/12.718325
Granular Computing (GrC) is an emerging theory which simulates the process by which the human brain understands and solves
problems. Rough set theory is a tool for dealing with the uncertainty and vagueness aspects of a knowledge model. The SMLGrC
algorithm introduces GrC to classical rough set algorithms and makes the rules relatively short, but it cannot
process massive data sets. In order to solve this problem, based on an analysis of the hierarchical granular model of the
information table, the method of the Granular Distribution List (GDL) is introduced to generate granules, and an improved granular
computing algorithm (SLMGrC) is presented. A Sample Covered Factor (SCF) is also introduced to control the generation
of rules when the algorithm generates conflicting rules. The improved algorithm can process massive data sets directly
without affecting the validity of SLMGrC. Experiments demonstrated the validity and flexibility of our method.
Proceedings Volume Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security 2007, 65700N (2007) https://doi.org/10.1117/12.716612
The data warehouse is a repository of information collected from multiple possibly heterogeneous autonomous
distributed databases. The information stored at the data warehouse is in the form of views referred to as materialized
views. The selection of the materialized views is one of the most important decisions in designing a data warehouse.
Materialized views are stored in the data warehouse for the purpose of efficiently implementing on-line analytical
processing queries. The first issue for the user to consider is query response time. So in this paper, we develop
algorithms to select a set of views to materialize in a data warehouse in order to minimize the total view maintenance cost
under the constraint of a given query response time. We call this the query_cost view_selection problem.
First, the cost graph and cost model of the query_cost view_selection problem are presented. Second, methods for
selecting materialized views using randomized algorithms are presented. The genetic algorithm is applied to the
materialized view selection problem. However, as the genetic process develops, producing legal solutions becomes
more and more difficult, so many solutions are eliminated and the time to produce solutions grows in the genetic
algorithm. Therefore, an improved algorithm is presented in this paper, which combines the simulated
annealing algorithm and the genetic algorithm to solve the query_cost view_selection problem. Finally, in
order to test the functionality and efficiency of our algorithms, simulation experiments are conducted. The experiments show that
the given methods can provide near-optimal solutions in limited time and work better in practical cases. Randomized
algorithms will become invaluable tools for data warehouse evolution.
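As a hedged sketch of the simulated-annealing component (the cost model, penalty, and cooling schedule are placeholders, not the paper's), the example below searches for a view selection that minimizes maintenance cost subject to a query-response-time constraint.

```python
import math
import random

random.seed(0)

# Hypothetical candidate views: (maintenance_cost, query_time_saved_if_materialized).
views = [(random.randint(1, 20), random.randint(5, 40)) for _ in range(15)]
BASE_QUERY_TIME = 400          # response time with nothing materialized
QUERY_TIME_LIMIT = 250         # the constraint from the problem statement

def maintenance(sel):
    return sum(views[i][0] for i in sel)

def query_time(sel):
    return BASE_QUERY_TIME - sum(views[i][1] for i in sel)

def cost(sel):
    # Penalize selections that violate the query-response-time constraint.
    penalty = 10_000 if query_time(sel) > QUERY_TIME_LIMIT else 0
    return maintenance(sel) + penalty

current = set(random.sample(range(len(views)), 8))
best, temp = set(current), 50.0
while temp > 0.1:
    candidate = set(current)
    candidate ^= {random.randrange(len(views))}      # flip one view in or out
    delta = cost(candidate) - cost(current)
    if delta < 0 or random.random() < math.exp(-delta / temp):
        current = candidate
        if cost(current) < cost(best):
            best = set(current)
    temp *= 0.99                                      # geometric cooling

print("selected views:", sorted(best))
print("maintenance cost:", maintenance(best), "query time:", query_time(best))
```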