With the development of information technology and artificial intelligence, speech synthesis plays a significant role in human-computer interaction. However, a main shortcoming of current speech synthesis techniques is a lack of naturalness and expressiveness, so synthetic speech does not yet approach the standard of natural speech. Another problem is that human-computer interaction based on speech synthesis is too monotonous to support a mechanism driven by the user's subjective state. This thesis introduces the historical development of speech synthesis, summarizes the general processing pipeline, and points out that prosody generation is an important module within that pipeline. On the basis of further research, using the regularities of eye activity during reading to control and drive prosody generation is introduced as a new human-computer interaction method that enriches the forms of synthesis. The present state of speech synthesis technology is reviewed in detail. On the premise that eye-gaze data can be extracted, a speech synthesis method driven in real time by the eye-movement signal is proposed that can express the speaker's real speech rhythm: while the reader silently reads a corpus, reading information such as the eye-gaze duration per prosodic unit is captured, and a hierarchical prosodic duration model is established to determine the duration parameters of the synthesized speech. Finally, analysis verifies the feasibility of the proposed method.
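To make the duration-parameter idea concrete, the sketch below maps per-prosodic-unit gaze durations onto synthesis durations by simple proportional scaling against a baseline. The data structures, baseline values, clamping bounds, and the proportional rule are illustrative assumptions, not the paper's exact hierarchical model.

```python
# Illustrative sketch: map per-prosodic-unit gaze durations onto
# synthesized-speech duration parameters by proportional scaling.
# Units, baselines, and the scaling rule are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class ProsodicUnit:
    text: str
    baseline_ms: float   # default synthesis duration for this unit
    gaze_ms: float       # measured eye-gaze duration while silently reading it

def duration_parameters(units, min_scale=0.5, max_scale=2.0):
    """Return one duration (ms) per unit, scaled by the gaze/baseline ratio."""
    durations = []
    for u in units:
        scale = u.gaze_ms / u.baseline_ms
        scale = max(min_scale, min(max_scale, scale))  # clamp outliers
        durations.append(u.baseline_ms * scale)
    return durations

units = [
    ProsodicUnit("今天", 300.0, 360.0),   # prosodic word read slowly
    ProsodicUnit("天气", 300.0, 240.0),   # read quickly
    ProsodicUnit("很好", 320.0, 330.0),
]
print(duration_parameters(units))  # [360.0, 240.0, 330.0]
```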
Georeferencing is one of the major tasks of satellite-borne remote sensing. Compared with traditional indirect methods, direct georeferencing through a Global Positioning System/inertial navigation system requires fewer and simpler steps to obtain the exterior orientation parameters of remotely sensed images. However, the pixel shift caused by geographic positioning error, which generally derives from boresight angle error as well as terrain topography variation, can greatly affect georeferencing precision. The distribution of pixel shifts introduced by positioning error on a satellite linear push-broom image is quantitatively analyzed. We vary the object-space coordinates to simulate different kinds of positioning errors and terrain topography. A total differential method is then applied to a rigorous sensor model in order to obtain, mathematically, the relationship between pixel shift and positioning error. Finally, two simulation experiments are conducted using the imaging parameters of the Chang'E-1 satellite to evaluate two different kinds of positioning errors. The experimental results show that, with the experimental parameters, the maximum pixel shift can reach 1.74 pixels. The proposed approach can be extended to generic imaging-error modeling in remote sensing over varying terrain.
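The total-differential idea can be sketched numerically: perturb a simplified cross-track pinhole projection by a small boresight angle error or a terrain height error and measure the resulting pixel shift. The orbit height, focal length, pixel size, and small-angle model below are invented example values, not Chang'E-1's actual parameters.

```python
# Illustrative sketch: sensitivity of image position to small boresight and
# terrain-height errors, via a numerical total differential of a simplified
# pinhole model. All parameters are example values, not real mission data.

import numpy as np

H = 200e3        # assumed orbit height above terrain (m)
f = 0.5          # assumed focal length (m)
pixel = 1e-5     # assumed detector pixel size (m)

def image_coord(cross_track_m, height_error_m, boresight_rad):
    """Cross-track image coordinate (pixels) of a ground point.

    A small boresight rotation tilts the look direction; a terrain-height
    error changes the effective camera-to-ground distance.
    """
    z = H - height_error_m
    angle = np.arctan2(cross_track_m, z) + boresight_rad
    return f * np.tan(angle) / pixel

def pixel_shift(cross_track_m, d_boresight=0.0, d_height=0.0):
    """Total pixel shift caused by the given small errors."""
    return (image_coord(cross_track_m, d_height, d_boresight)
            - image_coord(cross_track_m, 0.0, 0.0))

# a 0.005-degree boresight error and a 100 m terrain error, 20 km off nadir
print(f"boresight: {pixel_shift(20e3, d_boresight=np.deg2rad(0.005)):.2f} px")
print(f"terrain:   {pixel_shift(20e3, d_height=100.0):.2f} px")
```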
KEYWORDS: Radar, Visual process modeling, Data modeling, Imaging systems, Cameras, Calibration, Error analysis, Surveillance, Data processing, Global Positioning System
A test environment is established to obtain experimental data for verifying a positioning model derived previously from the pinhole imaging model and the theory of binocular stereo vision measurement. The model requires that the optical axes of the two cameras meet at one point, defined as the origin of the world coordinate system, which simplifies and optimizes the positioning model. The experimental data are processed, and tables and charts compare object positions measured with DGPS (with a measurement accuracy of 10 centimeters), taken as the reference, against those produced by the positioning model. Error sources of the visual measurement model are analyzed, and the effects of camera and system parameter errors on the accuracy of the positioning model are examined based on error-propagation and synthesis rules. It is concluded that the measurement accuracy of surface surveillance based on binocular stereo vision is better than that of surface movement radar, ADS-B (Automatic Dependent Surveillance-Broadcast), and MLAT (multilateration).
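A minimal sketch of the underlying triangulation follows, assuming two ideal pinhole cameras whose optical axes both pass through the world origin, as the model requires; the camera positions, intrinsics, and pixel detections below are invented example values.

```python
# Illustrative sketch: binocular triangulation as a least-squares
# intersection of two viewing rays. The geometry (both optical axes through
# the world origin) follows the stated constraint; all numbers are examples.

import numpy as np

def look_at_rotation(cam_pos):
    """Rotation whose +z axis points from cam_pos toward the world origin."""
    z = -cam_pos / np.linalg.norm(cam_pos)
    x = np.cross([0.0, 0.0, 1.0], z)
    x /= np.linalg.norm(x)
    y = np.cross(z, x)
    return np.stack([x, y, z])          # rows are camera axes in world frame

def ray(cam_pos, R, pix, f=1000.0, cx=640.0, cy=480.0):
    """World-frame viewing ray through pixel (u, v) of an ideal pinhole camera."""
    d_cam = np.array([(pix[0] - cx) / f, (pix[1] - cy) / f, 1.0])
    d = R.T @ d_cam                     # camera frame -> world frame
    return cam_pos, d / np.linalg.norm(d)

def triangulate(o1, d1, o2, d2):
    """Midpoint method: point closest to both rays in the least-squares sense."""
    A = np.stack([d1, -d2], axis=1)     # solve d1*t1 - d2*t2 = o2 - o1
    t = np.linalg.lstsq(A, o2 - o1, rcond=None)[0]
    return ((o1 + t[0] * d1) + (o2 + t[1] * d2)) / 2.0

c1 = np.array([-300.0, -400.0, 50.0])   # assumed camera positions (m)
c2 = np.array([ 300.0, -400.0, 50.0])
R1, R2 = look_at_rotation(c1), look_at_rotation(c2)

# pixels where a target was detected in each image (example values)
o1, d1 = ray(c1, R1, (655.0, 470.0))
o2, d2 = ray(c2, R2, (622.0, 470.0))
print(triangulate(o1, d1, o2, d2))      # estimated world position (m)
```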
KEYWORDS: Signal to noise ratio, Detection and tracking algorithms, Data modeling, Cameras, Sensors, Calibration, 3D modeling, Image registration, Coded apertures, Expectation maximization algorithms
A method is proposed to estimate the fundamental matrix of a positioning and monitoring binocular vision system with a long working distance and a large field of view. Because of the long working distance and large field of view, images grabbed by this system are severely blurred and therefore lack local features. Edge points are first acquired with the Canny algorithm; pre-matched points are then obtained by a GMM-based point-set registration algorithm; and finally the fundamental matrix is estimated with the RANSAC algorithm. In an actual application with the two cameras 2 km from the object, the fundamental matrix was computed, and the distance between each point and its corresponding epipolar line was less than 0.8 pixel. Repeated experiments indicate that the average distances between points and their corresponding epipolar lines are all within 0.3 pixel, and the deviations of those distances are also within 0.3 pixel. The method takes full advantage of edges in the environment, needs no extra control points, and, what is more, works well on low-SNR images.
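The pipeline could be sketched with OpenCV as below. The GMM-based point-set registration step is stood in for by a hypothetical gmm_prematch helper (a Coherent Point Drift implementation could fill that role); the image paths, thresholds, and that helper are assumptions, not the paper's implementation.

```python
# Illustrative sketch of the pipeline: Canny edge points -> pre-matching ->
# RANSAC fundamental matrix -> point-to-epipolar-line distances.
# File names, thresholds, and gmm_prematch are assumptions for illustration.

import cv2
import numpy as np

def edge_points(gray, lo=50, hi=150):
    """Edge pixels as an (N, 2) float32 array of (x, y) coordinates."""
    edges = cv2.Canny(gray, lo, hi)
    ys, xs = np.nonzero(edges)
    return np.column_stack([xs, ys]).astype(np.float32)

def gmm_prematch(pts1, pts2):
    """Placeholder for the paper's GMM-based point-set registration.
    Naively pairing the first points is only enough to make the sketch run;
    a real system needs an actual registration step here."""
    n = min(len(pts1), len(pts2), 500)
    return pts1[:n], pts2[:n]

img1 = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)
p1, p2 = gmm_prematch(edge_points(img1), edge_points(img2))

# robust estimation: reject pre-match outliers with RANSAC
F, mask = cv2.findFundamentalMat(p1, p2, cv2.FM_RANSAC, 0.8, 0.99)
inl1, inl2 = p1[mask.ravel() == 1], p2[mask.ravel() == 1]

# evaluate: distance of each right-image inlier to its epipolar line
# (computeCorrespondEpilines returns lines normalized so a^2 + b^2 = 1)
lines = cv2.computeCorrespondEpilines(inl1.reshape(-1, 1, 2), 1, F).reshape(-1, 3)
d = np.abs(np.sum(lines[:, :2] * inl2, axis=1) + lines[:, 2])
print(f"mean epipolar distance: {d.mean():.3f} px")
```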
Camera and system parameter calibration is an essential process for a distance-measuring system using binocular stereo vision. In usual calibration procedures, brightly colored objects are placed at the center of the surveillance area to avoid the large measurement errors that arise when targets are far from the center. When calibration objects are not allowed inside the sensing area of interest, they must be placed somewhere outside it. A calibration scheme is proposed for correcting system parameters in a super-size two-dimensional event sensing and positioning system using binocular stereo vision. A reference point is specified at the center of the surveillance area of about 200,000 square meters, and the physical-world coordinates of that center are known. The coordinates of the cameras and of the calibration objects outside the surveillance area are measured in the physical-world coordinate system. From the measured longitude and latitude values it is convenient to compute the angle, at a camera, between the line connecting that camera with the calibration object and the line connecting the same camera with the reference point. During calibration, the orientation of the camera is obtained and the position of the object on the imaging plane is read out in pixels. After rotating the camera by the angle computed above, the reference point would lie on the optical axis of the camera in the ideal case. The accuracy of the angle-measuring device contributes to the error in aligning the optical axis with the reference point. In the experiment, a calibration object placed at the reference point lets its position on the imaging plane be read out in pixels; comparing the pixel difference between the two camera orientations determines the error caused by rotating the camera. When another camera is added to form a binocular stereo vision system, its parameters are calibrated in the same way. Theoretical analysis shows that the error caused by adjusting the two cameras is bounded by a region that approximates a quadrilateral, whose area is determined by both the accuracy of the angle-measuring device and the distances between the cameras and the reference point. Theoretical and experimental results are compared, indicating the effectiveness of this scheme.
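As a sketch of the angle computation described above, the snippet below converts the measured longitude/latitude of the camera, calibration object, and reference point into a local planar frame (a flat-earth approximation valid over a small area) and takes the signed angle at the camera between the two lines. The coordinates and the approximation are illustrative assumptions.

```python
# Illustrative sketch: angle at the camera between the line to the
# calibration object and the line to the reference point, from measured
# longitude/latitude. Coordinates below are invented example values.

import numpy as np

R_EARTH = 6_371_000.0  # mean Earth radius (m)

def to_local_xy(lon_deg, lat_deg, lon0_deg, lat0_deg):
    """Flat-earth approximation around (lon0, lat0), valid for small areas."""
    x = np.deg2rad(lon_deg - lon0_deg) * R_EARTH * np.cos(np.deg2rad(lat0_deg))
    y = np.deg2rad(lat_deg - lat0_deg) * R_EARTH
    return np.array([x, y])

def rotation_angle(camera, calib_obj, reference):
    """Signed angle (deg) to rotate the camera from the calibration object
    toward the reference point; all inputs are (lon, lat) in degrees."""
    a = to_local_xy(*calib_obj, *camera)   # camera -> calibration object
    b = to_local_xy(*reference, *camera)   # camera -> reference point
    ang = np.arctan2(b[1], b[0]) - np.arctan2(a[1], a[0])
    return np.rad2deg((ang + np.pi) % (2 * np.pi) - np.pi)  # wrap to (-180, 180]

camera    = (116.0000, 40.0000)   # example (lon, lat) in degrees
calib_obj = (116.0060, 40.0010)
reference = (116.0040, 40.0040)
print(f"rotate camera by {rotation_angle(camera, calib_obj, reference):.3f} deg")
```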
KEYWORDS: Eye, Visual process modeling, Cognitive modeling, Control systems, Signal processing, Mining, Human-computer interaction, Eye models, Systems modeling, Motion controllers
Eye-tracker technology has become a principal method for analyzing recognition issues in human-computer interaction, and capturing images of the human eye is the key problem in eye tracking. Based on further research, a new human-computer interaction method is introduced to enrich the forms of speech synthesis. We propose a method of Implicit Prosody mining based on human-eye image capture: parameters are extracted from images of the eyes during reading to control and drive prosody generation in speech synthesis, and a prosodic model with high simulation accuracy is established. The duration model is a key issue in prosody generation. For the duration model, this paper puts forward a new idea: obtain the gaze duration of the eyes during reading from captured eye images, and synchronously control this duration and the pronunciation duration in speech synthesis. Eye movement during reading is a comprehensive, multi-factor interactive process involving fixations, saccades, and regressions, so how to extract the appropriate information from eye images must be considered, and the gaze regularities of the eyes must be obtained as references for modeling; a sketch of one such extraction step follows this abstract. Based on analysis of three current eye-movement control models and the characteristics of Implicit Prosody reading, the relative independence between the text speech-processing system and the eye-movement control system is discussed. It is shown that, under the same text-familiarity condition, the gaze duration during reading and the internal voice pronunciation duration are synchronous. An eye-gaze duration model based on the hierarchical prosodic structure of Chinese is presented to replace previous methods of machine learning and probability forecasting, to obtain the reader's real internal reading rhythm, and to synthesize speech with personalized rhythm. This research enriches the forms of human-computer interaction and has practical significance and application prospects for assisted speech interaction for the disabled. Experiments show that Implicit Prosody mining based on human-eye image capture gives the synthesized speech more flexible expression.
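One common way to separate fixations from saccades, which could supply the gaze durations such a model needs, is a dispersion-threshold (I-DT) pass over raw gaze samples. The sampling rate, thresholds, and synthetic data below are illustrative assumptions rather than the paper's procedure.

```python
# Illustrative sketch: dispersion-threshold (I-DT) fixation detection on raw
# gaze samples, yielding per-fixation durations that a gaze-duration model
# could consume. Sampling rate and thresholds are assumed example values.

import numpy as np

def idt_fixations(x, y, hz=250.0, max_disp=30.0, min_ms=80.0):
    """Return (start_ms, duration_ms) for each detected fixation.

    A window of samples counts as a fixation while its dispersion
    (x range plus y range, in pixels) stays under max_disp.
    """
    def disp(a, b):
        return (x[a:b].max() - x[a:b].min()) + (y[a:b].max() - y[a:b].min())
    min_len = int(min_ms * hz / 1000.0)
    fixations, i, n = [], 0, len(x)
    while i + min_len <= n:
        j = i + min_len
        if disp(i, j) <= max_disp:
            while j < n and disp(i, j + 1) <= max_disp:
                j += 1                      # grow the window while compact
            fixations.append((i / hz * 1000.0, (j - i) / hz * 1000.0))
            i = j
        else:
            i += 1                          # slide past saccade samples
    return fixations

rng = np.random.default_rng(0)
# synthetic gaze: 120 samples on one word, a saccade, 80 samples on the next
x = np.concatenate([400 + rng.normal(0, 2, 120), 560 + rng.normal(0, 2, 80)])
y = 300 + rng.normal(0, 2, 200)
for start, dur in idt_fixations(x, y):
    print(f"fixation at {start:6.1f} ms, duration {dur:5.1f} ms")
```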
To establish a measurement basis in a non-cooperative environment, this paper proposes an autonomous position-and-posture servo tracking method based on laser-light-guided monocular vision. The line of a linear laser projected onto a plane serves as a simulated horizon basis, while the laser light modulated by projection onto the reference plane is taken as the servo target. The position and posture change information of the modulated laser light is obtained by a monocular vision system, from which the attitude angle of the laser line can be calculated. The attitude angle is transmitted to a parallel tracking platform in real time and controls the movement of the platform so that it follows the laser light. The tracking angle parameters of the parallel tracking platform under different positions and postures were verified with an inclinometer, proving the validity and effectiveness of the method. For the remaining measurement errors, the paper analyzes possible causes and provides feasible suggestions to further improve the precision of the system.
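As a sketch of one step in such a pipeline, the code below extracts a laser stripe from a monocular frame by intensity thresholding, fits a line to the stripe pixels, and reports its in-image attitude angle. The threshold, the red-channel heuristic, and the angle convention are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch: in-image attitude angle of a projected laser line,
# via thresholding and cv2.fitLine. The threshold and channel choice are
# assumptions; a real system would calibrate these for its laser.

import cv2
import numpy as np

def laser_line_angle(bgr, thresh=200):
    """In-image angle (deg) of the laser stripe relative to the x-axis."""
    red = bgr[:, :, 2]                      # assume a red line laser
    ys, xs = np.nonzero(red > thresh)       # stripe pixels
    pts = np.column_stack([xs, ys]).astype(np.float32)
    vx, vy, _, _ = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).ravel()
    if vx < 0:                              # fitLine's direction sign is arbitrary
        vx, vy = -vx, -vy
    return float(np.degrees(np.arctan2(vy, vx)))

# synthetic test frame: a tilted bright-red stripe on a dark background
frame = np.zeros((480, 640, 3), np.uint8)
cv2.line(frame, (50, 400), (600, 150), (0, 0, 255), 3)
# about -24 deg (the image y-axis points down)
print(f"attitude angle: {laser_line_angle(frame):.1f} deg")
```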