A new set of gratings with medium resolution (R ∼ 7500) has been mounted on the LAMOST spectrographs; the wavelength windows range over 490 ∼ 540 nm and 640 ∼ 690 nm for the blue and red spectrograph arms, respectively. Commissioning observations have been conducted to test the survey based on 16 spectrographs and 4000 fibers. Meanwhile, a spectral analysis pipeline is being developed to obtain more precise stellar parameters, radial velocities and chemical element abundances. Instrument profiles are calculated for each fiber at each exposure from the arc-lamp emission lines. A grid of template spectra with R ∼ 7500 covering the fundamental parameters (Teff, log g, and [Fe/H]) is selected from the ELODIE library. During the commissioning observations, each star has been visited several times, and a fraction of the targets are APOGEE, Kepler and PASTEL objects with precisely measured parameters. With the commissioning spectra, we can assess the instrument performance, the intrinsic precision of repeat observations, and the accuracy of the pipeline.
LAMOST is a 4 m reflecting Schmidt telescope specially designed for conducting a multifiber spectroscopic survey with 4000 fibers. Fiber position errors greatly impact the signal-to-noise ratio of the spectral data. Three groups of sources contribute to fiber position errors: errors orthogonal to the optical axis of the telescope, errors parallel to the optical axis, and the fiber tilt with respect to the telescope optical axis. These errors are difficult to measure, especially during observation. In this poster, we propose an indirect method to calculate the total and systematic position errors of each individual fiber from the spectral data, by constructing a model of the magnitude loss that a point source suffers due to the fiber position error.
LAMOST is a special reflecting Schmidt telescope. It breaks through the bottleneck of large-scale spectroscopic survey observation by combining a large aperture (effective aperture of 3.6 - 4.9 m) with a wide field of view (5 degrees). Its innovative active reflecting Schmidt configuration is achieved by changing the mirror surface continuously, realizing a series of different reflecting Schmidt systems at different moments. By using the parallel controllable fiber positioning technique, the focal surface of 1.75 meters in diameter accommodates 4000 optical fibers. LAMOST also has 16 spectrographs with 32 CCD cameras, making it the telescope with the highest spectrum acquiring rate. As a national large scientific project, the LAMOST project was formally proposed in 1996. Construction started in 2001 and was completed in 2008. After the commissioning period, the LAMOST pilot survey started in October 2011 and the spectroscopic survey began in September 2012. From October 2011 to June 2013, LAMOST obtained more than 2 million spectra of celestial objects, among them 1.7 million spectra of stars, from which the stellar parameters (effective temperature, surface gravity, metallicity and radial velocity) of more than 1 million stars were derived. In the first period of the LAMOST spectroscopic survey, 5 million stellar spectra will be obtained, making a substantial contribution to the study of stellar astrophysics and the structure of the Galaxy, such as the spheroid substructure of the Galaxy, the Galactic gravitational potential and the distribution of dark matter in the Galaxy, extremely metal-poor stars and hypervelocity stars, the 3D extinction in the Galaxy, the structure of the thin and thick disks of the Galaxy, and so on.
The study of quasars, especially high-redshift quasars, is of great importance for understanding the formation
and evolution of galaxies and the early history of the universe. With the development and deployment of large
spectroscopic sky survey projects (e.g. 2dF, SDSS), the number of known quasars has increased to more than
200,000. To improve the efficiency of high-cost telescopes, careful selection of observational targets is necessary.
Therefore various quasar targeting algorithms have been developed on the basis of different data, and we review
them in detail. Some statistical approaches are based on photometric color, variability, UV excess, BRX, radio
properties, color-color cuts and so on. Automated methods include support vector machines (SVMs), kernel
density estimation (KDE), artificial neural networks (ANNs), the extreme-deconvolution method, probabilistic
principal surfaces (PPS) and negative entropy clustering (NEC), etc. In addition, we touch upon some quasar
candidate catalogues created by different groups.
Astronomy has stepped into a full-wavelength, data-avalanche era. Astronomical data are measured in terabytes,
even petabytes. How to store, manage and analyze such massive data is an important issue in astronomy. In order
to free astronomers from the data-processing burden so that they can concentrate on science, various valuable
and convenient tools (e.g. Aladin, VOSpec, VOPlot) have been developed by VO projects. To meet this requirement, we develop a toolkit to
realize automated database creation, automated database index creation and cross-match. The toolkit provides
a convenient interface for users. The cross-match task may be implemented between local databases, remote
databases, or a local database and a remote database. Large-scale cross-matching is also easily achieved. Moreover,
the speed of large-scale cross-matching is quite satisfactory.
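As a sketch of the core operation such a toolkit performs, the following pure-Python fragment matches each source in one catalog to its nearest neighbour in another within a small radius. The coordinates and the 2-arcsecond radius are invented for illustration, not taken from the actual toolkit:

```python
import math

def ang_sep_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees (haversine formula, robust at small angles)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(a)))

def cross_match(cat_a, cat_b, radius_arcsec=2.0):
    """For each (ra, dec) in cat_a, return (i, j, sep_arcsec) for its nearest
    cat_b neighbour within the radius.  Brute force, O(n*m); a real toolkit
    would index the catalogs first."""
    matches = []
    for i, (ra1, dec1) in enumerate(cat_a):
        best_j, best_sep = None, radius_arcsec
        for j, (ra2, dec2) in enumerate(cat_b):
            sep = ang_sep_deg(ra1, dec1, ra2, dec2) * 3600.0
            if sep <= best_sep:
                best_j, best_sep = j, sep
        if best_j is not None:
            matches.append((i, best_j, best_sep))
    return matches

# Invented coordinates: only the first "optical" source has a counterpart.
optical = [(150.0000, 2.2000), (150.1000, 2.3000)]
radio = [(150.0002, 2.2001), (151.0000, 3.0000)]
print(cross_match(optical, radio))
```

A database-backed implementation would replace the inner loop with an indexed spatial query, which is what makes the large-scale case tractable.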
The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) is an innovative reflecting Schmidt
telescope, promising a very high spectrum acquiring rate of tens of thousands of spectra per night. By using the
parallel controllable fiber positioning technique, LAMOST reconfigures its fibers accurately according to
the positions of the objects within minutes and then fine-tunes them. High-precision position detection
of the LAMOST fiber positioning units is a key problem that has always been highly regarded, and several detection
schemes have been proposed. Among these, the active detection method, which determines the final accurate position
of the fiber end with the help of lighting the fiber, has been the most widely researched, but this kind of method
cannot be applied during real-time LAMOST observations because it requires projecting light into the fiber. A novel
detection idea exploiting the technique of template matching is presented in this paper. The final position of a
specific fiber end can be easily inferred from the corresponding revolving angles of its central revolving axle and
bias revolving axle in the double-revolving style, so the key point of this problem is converted to the accurate
determination of these revolving angles. Template matching techniques are explored to acquire the matching
parameters from the real-time collected imagery, and thus determine the corresponding revolving angles of the
central revolving axle and bias revolving axle, respectively. Experimental results obtained with data acquired
at the LAMOST site verify the feasibility and effectiveness of this novel method.
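The geometric relation the method relies on, fiber-end position determined by the two revolving angles, can be sketched as follows. The arm lengths and the brute-force angle search below are illustrative assumptions, not the actual LAMOST geometry or matching algorithm:

```python
import math

def fiber_end_position(theta_c, theta_b, r_c=1.0, r_b=1.0):
    """Toy forward model of a double-revolving fiber positioner: the central
    axle rotates the arm by theta_c and the bias axle adds theta_b.  The arm
    lengths r_c and r_b are illustrative, not the LAMOST values."""
    x = r_c * math.cos(theta_c) + r_b * math.cos(theta_c + theta_b)
    y = r_c * math.sin(theta_c) + r_b * math.sin(theta_c + theta_b)
    return x, y

def recover_angles(x_obs, y_obs, steps=360):
    """Brute-force search for the angle pair whose predicted fiber-end position
    is closest to the observed one (a crude stand-in for the image-based
    template matching described in the text)."""
    best = None
    for i in range(steps):
        for j in range(steps):
            tc = 2 * math.pi * i / steps
            tb = 2 * math.pi * j / steps
            x, y = fiber_end_position(tc, tb)
            d2 = (x - x_obs) ** 2 + (y - y_obs) ** 2
            if best is None or d2 < best[0]:
                best = (d2, tc, tb)
    return best[1], best[2]
```

Note that the inverse problem generally has two mirror-symmetric angle solutions for the same endpoint, so one verifies the recovered position rather than the angles themselves.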
The all-sky spectroscopic survey is very important for both extragalactic and Galactic studies. The Large-Sky-Area
Multi-object Fiber Spectroscopic Telescope (LAMOST) successfully completed its engineering work and was
inaugurated in October 2008. It is now in the commissioning stage. In pursuit of an all-sky spectroscopic survey, a
southern LAMOST is proposed. Technically, the Southern LAMOST will be mainly a copy of the present LAMOST in
Xinglong, China, which is located at a latitude of about +40 degrees. Modifications are to be made for much better
image quality, with thinner optical fibers to match the better seeing conditions at the southern site. There will be
6000 or 8000 optical fibers on the focal surface to reach the highest spectrum acquiring rate, and the telescope will
be equipped with about 12 to 16 spectrographs with 24 to 32 CCD cameras. The Southern LAMOST is to be built by
international collaboration.
We present a comparative study of supervised classification algorithms applied to the classification of
celestial objects. Three different algorithms, Linear Discriminant Analysis (LDA), K-Dimensional Tree
(KD-tree) and Support Vector Machines (SVMs), are used for the classification of point sources from the Sloan
Digital Sky Survey (SDSS) Data Release Seven. All of them have been applied and tested on SDSS photometric
data filtered by stringent conditions so that they achieve their best performance. All six performance
metrics of the SVMs reach very high values (99.00%). The performance of the KD-tree is also very good,
with all six metrics over 97.00%. Although five metrics exceed 90.00%, the performance of LDA
is relatively poor because its positive prediction accuracy only reaches 85.98%. Moreover, we discuss which
input pattern is the best combination of parameters for the effectiveness of each of these methods.
Based on survey databases from different bands, we first employed the random forest approach for feature selection
and feature weighting, and investigated support vector machines (SVMs) to classify quasars and stars.
Two data sets were used, one from SDSS, USNO-B1.0 and FIRST (the FIRST sample for short), and another
from SDSS, USNO-B1.0 and ROSAT (the ROSAT sample for short). The classification results with the different
data sets were compared, and the SVM performance with different features is presented. The experimental results
showed that the accuracy on the FIRST sample was superior to that on the ROSAT sample; in addition,
compared to the result with the original features, the performance with selected features improved, while that with
weighted features decreased. We therefore consider that when SVMs are applied for classification, feature
selection is necessary, since it not only improves the performance but also reduces the dimensionality. The
good performance of SVMs indicates that they are an effective method to preselect quasar candidates from
multiwavelength data.
We investigate two methods, kernel regression and the nearest neighbor algorithm, for photometric redshift estimation
with quasar samples from the SDSS (the Sloan Digital Sky Survey) and UKIDSS (the UKIRT Infrared Deep Sky
Survey) databases. Both kernel regression and the nearest neighbor algorithm belong to the family of instance-based
learning algorithms, which store all the training examples and "delay learning" until prediction time. The major
difference between the two algorithms is that kernel regression is a weighted average of spectral redshifts of the
neighbors for a query point while nearest neighbor algorithm utilizes the spectral redshift of the nearest neighbor
for a query point. Each algorithm has its own advantages and disadvantages. Our experimental results show that
kernel regression obtains more accurate predictions, while the nearest neighbor algorithm shows its superiority
especially for more thinly spread data, e.g. high-redshift quasars.
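The difference between the two estimators can be made concrete with a small sketch; the training pairs of colors and spectroscopic redshifts below are invented for illustration:

```python
import math

# Invented (colors, spectroscopic z) training pairs, not real survey data.
train = [((0.20, 0.10), 0.5), ((0.30, 0.20), 0.8), ((0.25, 0.15), 0.6),
         ((1.50, 1.20), 2.1), ((1.60, 1.10), 2.3)]

def dist(a, b):
    """Euclidean distance in color space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor_z(query):
    """Photometric z = spectroscopic z of the single nearest neighbour."""
    return min(train, key=lambda t: dist(t[0], query))[1]

def kernel_regression_z(query, h=0.5):
    """Photometric z = Gaussian-weighted average of all neighbours' z."""
    ws = [math.exp(-dist(c, query) ** 2 / (2 * h * h)) for c, _ in train]
    return sum(w * z for w, (_, z) in zip(ws, train)) / sum(ws)
```

For a query point in a sparsely populated region, the weighted average pulls the estimate toward distant neighbours, which is why the plain nearest neighbor can win for thinly spread high-redshift objects.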
The K-Nearest Neighbor (kNN) algorithm is one of the simplest, most flexible and most effective classification
algorithms, and it has been widely used in many fields. Using multi-band samples extracted from the large surveys
SDSS DR7 and UKIDSS DR3, we investigate the performance of kNN with different combinations of colors to
select quasar candidates. The color histograms of quasars and stars are helpful in selecting the optimal input
pattern for the kNN classifier. The best input pattern is (u-g, g-r, r-i, i-z, z-Y, Y-J, J-H, H-K, Y-K, g-z).
In our case, the performance of kNN is assessed by several performance metrics, which indicate that kNN has rather
high performance for discriminating quasars from stars. As a result, kNN is an applicable and effective method
to select quasar candidates for large sky survey projects.
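A minimal sketch of the classifier, using made-up magnitudes and only a subset of the quoted input pattern:

```python
import math
from collections import Counter

def colors(u, g, r, i, z):
    """A subset of the input pattern above: (u-g, g-r, r-i, i-z, g-z)."""
    return (u - g, g - r, r - i, i - z, g - z)

# Invented magnitudes, not real SDSS/UKIDSS photometry.
train = [(colors(19.2, 19.0, 18.9, 18.8, 18.7), "quasar"),
         (colors(19.1, 18.9, 18.85, 18.8, 18.75), "quasar"),
         (colors(20.5, 19.0, 18.2, 17.9, 17.7), "star"),
         (colors(21.0, 19.4, 18.5, 18.1, 17.9), "star")]

def knn_classify(query, k=3):
    """Majority vote among the k nearest training points in color space."""
    d = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(train, key=lambda t: d(t[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_classify(colors(19.0, 18.8, 18.7, 18.65, 18.6)))  # → quasar
```

Working in color indices rather than raw magnitudes removes the overall brightness of the object, which is what makes the histograms of the two classes separable.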
The k Nearest Neighbor (kNN) algorithm is an effective classification approach in the statistical methods of
pattern recognition. But it could be a rather time-consuming approach when applied on massive data, especially
facing large survey projects in astronomy. NVIDIA CUDA is a general purpose parallel computing architecture
that leverages the parallel compute engine in NVIDIA graphics processing units (GPUs) to solve many complex
computational problems in a fraction of the time required on a CPU. In this paper, we implement a CUDA-based
kNN algorithm and compare its performance with a CPU-only kNN algorithm using single-precision and
double-precision data types for classifying celestial objects. The results demonstrate that CUDA can speed up
the kNN algorithm effectively and could be useful in astronomical applications.
The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) started its test observations in the
past year (2009). The spectroscopic reduction and analysis software is available for the determination of spectral
classifications, redshifts and the fundamental stellar atmospheric parameters (effective temperature, surface
gravity, and metallicity). The analysis results show some systematic errors that need to be calibrated, and we
present the results of these calibrations in this paper. By comparing with known objects observed by the Sloan
Digital Sky Survey (SDSS), we calibrate the redshifts of LAMOST galaxy spectra. Results from external spectral
analysis software, the Sloan Extension for Galactic Exploration and Understanding (SEGUE) Stellar Parameter
Pipeline (SSPP), are applied to check the accuracy of the radial velocity (RV) measurement. Meanwhile,
the atmospheric parameters of LAMOST stellar spectra are compared with those of known objects to calibrate
our pipeline.
We employ the k-nearest neighbor (KNN) algorithm for photometric redshift measurement of quasars with the Fifth
Data Release (DR5) of the Sloan Digital Sky Survey (SDSS). KNN is an instance-based learning algorithm in which
the result for a new query instance is predicted from the closest training samples. The regressor does not
fit any model and is purely memory-based. Given a query quasar, we find the known quasars (training
points) closest to the query point, and its redshift value is simply assigned the average of the values of its k
nearest neighbors. Three different kinds of colors (PSF, Model or Fiber) together with spectroscopic redshifts
are used as input parameters, separately. The combination of the three kinds of colors is also taken as input. The
experimental results indicate that the best input pattern is PSF + Model + Fiber colors in all experiments. With
this pattern, 59.24%, 77.34% and 84.68% of photometric redshifts are obtained within Δz < 0.1, 0.2 and 0.3,
respectively. If only one kind of colors is used as input, the Model colors achieve the best performance, whereas
with two kinds of colors the best result is achieved by PSF + Fiber colors. In addition, the nearest neighbor
method (k = 1) shows its superiority over KNN with k ≠ 1 for the given sample.
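The quoted percentages are fractions of objects whose photometric redshift falls within a given |Δz| of the spectroscopic value; a minimal sketch of that figure of merit, with invented numbers rather than the DR5 sample:

```python
def frac_within(pred, spec, dz):
    """Fraction of photometric redshifts within |Δz| < dz of the
    spectroscopic values (the figure of merit quoted in the text)."""
    hits = sum(1 for p, s in zip(pred, spec) if abs(p - s) < dz)
    return hits / len(pred)

# Invented redshift pairs for illustration, not the DR5 quasar sample.
spec = [0.5, 1.2, 2.0, 2.8, 3.5]
pred = [0.55, 1.05, 2.25, 2.75, 3.0]
for dz in (0.1, 0.2, 0.3):
    print(dz, frac_within(pred, spec, dz))
```

With the k nearest neighbors found as in the kNN classifiers above, the photometric redshift is simply the mean of their spectroscopic redshifts, and this metric is evaluated over a held-out test set.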
We apply an automated method, Support Vector Machines (SVMs), to quasar selection in order to
compile an input catalogue for the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST)
and improve the efficiency of its 4000 fibers. The data are taken from the Sloan Digital Sky Survey (SDSS)
Data Release Seven (DR7), the latest data release at the time of writing. We carefully study the discrimination of
quasars from stars by finding the hyperplane in high-dimensional space of colors with different combinations
of model parameters in SVMs and give a clear way to find the optimal combination (C<sub>-+</sub> = 2, C<sub>+-</sub> = 2,
kernel = RBF, gamma = 3.2). Furthermore, we investigate the performance of SVMs for
predicting the photometric redshifts of quasar candidates and obtain optimal model parameters of (w = 0.001,
C<sub>-+</sub> = 1, C<sub>+-</sub> = 2, kernel = RBF, gamma = 7.5) for SVMs. Finally, the experimental results show that the
precision and the recall of SVMs for separating quasars from stars both can be over 95%. Using the optimal
model parameters, we estimate the photometric redshifts of 39353 identified quasars, and find that 72.99% of
them are consistent with the spectroscopic redshifts within |Δz| < 0.2. This approach is effective and applicable
for our problem.
Facing very large and often high-dimensional data in astronomy, the effectiveness and efficiency of algorithms
are always a hot issue. Excellent algorithms must avoid the curse of dimensionality while remaining
computationally efficient. Adopting survey data from the optical bands (SDSS, USNO-B1.0) and the radio band
(FIRST), we investigate feature weighting and feature selection by means of the random forest algorithm. We
then employ a kd-tree based k-nearest neighbor method (KD-KNN) to discriminate quasars from stars, and
the performance of this approach with all features, weighted features and selected features is compared.
The experimental results show that the accuracy improves when using weighted or selected features.
KD-KNN is a simple and efficient approach to nonparametric classification, and combined with random
forests it is clearly more effective for separating quasars from stars with multiwavelength data.
With the large-scale multicolor photometry and fiber-based spectroscopy projects carried out, millions of uniform
samples are available to the astronomers. Based on this situation, we have developed an automatic system to
estimate photometric redshifts for both galaxies and quasars. In this paper we give an exhaustive introduction
of the system. We first describe a series of methods integrated in this system, such as template fitting, the color-magnitude-redshift relation, polynomial regression, support vector machines and kernel regression. The merits
and demerits of these approaches are indicated, so that users can choose a suitable algorithm to
estimate photometric redshifts according to the data characteristics and science requirements. Then we present
a case study to illustrate how the system works. In order to build a more robust system and to increase the
accuracy and speed of photometric redshift estimation, we pay special attention to algorithm choice and data
preparation. From the user's viewpoint, an easy-to-use interface will be provided. Finally, we point out
promising techniques for measuring photometric redshifts and the application prospects of this system. In the
future, the system will become an essential tool for automatically determining photometric redshifts in the study
of the large-scale structure of the Universe and the formation and evolution of galaxies.
The Sloan Digital Sky Survey (SDSS) is an ambitious photometry and spectra project, providing huge and
abundant samples for photometric redshift estimation. We employ polynomial regression to estimate photometric
redshifts using 330,000 galaxies with known spectroscopic redshifts from SDSS Release Four spectroscopic catalog,
and compare three polynomial regression methods, i.e. linear, quadratic and cubic regression,
on different samples. This technique converges in a finite number of steps, achieves a good fit with
few coefficients and yields the result as an explicit mathematical expression, which makes it much easier for
astronomers to use and understand than other empirical methods. Our results indicate that equal or better
accuracy is provided; the best r.m.s. dispersion of this approach is 0.0256. In addition, a
comparison between our results and other works is presented.
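A sketch of the underlying fit: least-squares polynomial regression via the normal equations, here with a single synthetic "color" variable rather than the real SDSS inputs:

```python
def polyfit(xs, ys, deg):
    """Least-squares polynomial fit via the normal equations, solved with
    plain Gaussian elimination.  Illustrative only: the real fit regresses
    redshift on several colors and magnitudes at once."""
    n = deg + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y*x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    for col in range(n):                       # forward elimination with pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * n                         # back substitution
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (b[r] - s) / A[r][r]
    return coeffs                              # z ~ c0 + c1*x + c2*x^2 + ...

# Synthetic data lying exactly on z = 0.1 + 0.3*x + 0.4*x^2.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.1, 0.35, 0.8, 1.45, 2.3]
print(polyfit(xs, ys, 2))
```

The returned coefficient list is exactly the "mathematical expression" the abstract mentions: the fitted relation can be written down and evaluated without keeping the training data.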
With the construction and development of ground-based and space-based observatories, astronomical data
have grown to the terascale, even petascale. How to extract knowledge from such a huge data volume with
automated methods is a big challenge for astronomers. In this situation, many researchers have studied various
approaches and developed different software packages to address this issue. Depending on the specific data
mining task, we need to select an appropriate technique suited to the characteristics of the data; moreover, all
algorithms have their own pros and cons. We introduce the characteristics of astronomical data, present a
taxonomy of knowledge discovery, and describe the functionalities of knowledge discovery in detail. The methods
of knowledge discovery are then touched upon. Finally, successful applications of data mining techniques in
astronomy are summarized and reviewed. Facing the data avalanche in astronomy, knowledge discovery in
databases (KDD) shows its superiority.
Facing a data avalanche, astronomical data now cover the radio, infrared, optical, X-ray and even gamma-ray
bands, and astronomy has entered an all-sky-survey era. Transforming data into knowledge depends on data
mining techniques, and how to effectively and efficiently extract knowledge from databases is an important
issue; in particular, mining knowledge from different bands, or from multiband data, is of great significance. In
this paper, we design a system which includes four fundamental blocks: the first creates databases; the second
cross-matches objects from different bands; the third mines knowledge from the large data volume; and the last
one performs the final result evaluation. The functionalities of the four blocks are described. The cross-match
results are divided into categories, and the analysis mode for each of them is touched upon. Moreover, the
schemes of classification, regression, clustering analysis and outlier detection are demonstrated.
Being able to measure redshifts accurately from photometric data is of great importance
for studying cosmology, the large-scale structure of the Universe, the determination of fundamental astrophysical
quantities and so on, because photometric redshifts provide approximate distances to enormous sets of
objects. At present, various algorithms for photometric redshifts have been investigated. This induced us
to develop a software platform that integrates different algorithms for estimating photometric redshifts, such
as the color-magnitude-redshift relation (CMR), Support Vector Machines (SVMs), HyperZ and Artificial Neural
Networks (ANNs). The requirements of the software platform and its architectural issues are addressed, and the
implemented framework design is discussed. It provides a user-friendly interface by which users can choose the
method they prefer, upload their own data, and then obtain the result they need with a mouse click. The
framework is flexible and extensible enough to measure photometric redshifts.
The federation of data from distributed locations, different archives and different wavelengths can lead to new discoveries; moreover, it is an important part of the functionality of the Virtual Observatory. We review the technical challenges involved in this issue and develop a system whose main purpose is to provide a robust framework for efficiently extracting data from different sources into science-grade data for the convenient use of astronomers. The system consists of several tasks wrapped together into an integrated framework: the automated creation of databases, the rapid query of catalogs, cross-match queries and the visualization of the query results. For the cross-matching service in particular, many choices are provided to users, such as one-to-one, one-to-many,
one-to-none and none-to-one entries. Meanwhile, the probability of each cross-match is given. In addition, users may select the attributes and the range of attributes according to their requirements. We will further improve the system in various respects according to the standards of the IVOA.
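The match-multiplicity categories offered by the service can be sketched as follows; the match list and catalog sizes are hypothetical:

```python
from collections import defaultdict

def match_categories(pairs, n_a):
    """Classify each entry of catalogue A by its match multiplicity in
    catalogue B: one-to-one, one-to-many or one-to-none.  The symmetric
    none-to-one case would be found by scanning catalogue B the same way."""
    counts = defaultdict(int)
    for i, _j in pairs:
        counts[i] += 1
    label = lambda c: ("one-to-none" if c == 0
                       else "one-to-one" if c == 1
                       else "one-to-many")
    return {i: label(counts[i]) for i in range(n_a)}

# Hypothetical (A index, B index) match list from a positional cross-match.
pairs = [(0, 3), (1, 4), (1, 5)]
print(match_categories(pairs, n_a=3))
```

In a real service the match probability mentioned above would additionally weight each pair, e.g. by angular separation and local source density.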
A new application framework for the Virtual Observatory (VO) is designed for discovering unknown knowledge from the thousands of astronomical catalogs which have already been released and are accessible through VO services. The framework consists of two new technologies that seamlessly associate data queried from SkyNode-supported databases with data mining (DM) algorithms, which either come from third-party software or are developed directly on top of the framework. The first is a high-level programming language, called Job Description Language (JDL), for describing jobs for data access and numerical computation based on web services. The second is a computation component standard with both local and web-service invocation interfaces, named CompuCell. It is a universal solution for integrating arbitrary third-party DM software into the framework so that it can be invoked directly from JDL programs. We implement a prototype with a JDL-supported portal and realize a clustering algorithm as CompuCell components. We combine a series of data mining procedures with a data access procedure by programming in JDL on the portal. A scientific case, recognizing OB associations in the 2MASS catalog, serves as a demonstration of the prototype and confirms the feasibility of the application framework.
The Large-Sky-Area Multi-object Fiber Spectroscopic Telescope (LAMOST), put forward by Shou-guan Wang and Ding-qiang Su, is a special reflecting Schmidt telescope in which the spherical primary mirror is fixed while the correcting plate acts as both corrector and tracker. The correcting plate is installed on an alt-azimuth mounting and its aspherical figure is variable, in order to eliminate the spherical aberration of the spherical primary mirror at the varying orientations taken during an observation and for different sky areas. With LAMOST, both a large aperture and a large field of view can be obtained. Benefiting from the LAMOST design and practice, a LAMOST-type telescope for a full-sky survey is conceived for the Antarctic. Because of the favorable seeing conditions and all-winter continuous observation there, a telescope with an aperture of 2 m could be equivalent to the 4 m LAMOST. We preliminarily consider a 2 m telescope with a primary focus and a Cassegrain focus: an f-ratio of 5 and a 3-degree field of view for the primary focus, and an f-ratio of 15 and an 8-arcminute field of view with diffraction-limited images for the Cassegrain focus. In this paper, the scientific goals, the optical system of the telescope, and the particular materials and techniques applicable under the extremely low temperatures of the Antarctic are described.
From 2006 to 2008, the sub-mirrors and instruments of LAMOST will be installed gradually until full completion. Before all sub-mirrors and instruments are installed, the LAMOST team has planned a temporary scheme in order to carry out some test observations. The plan will start at the beginning of 2007, when the partial LAMOST will have 3×3 mirrors (3 sub-Ma and 3 sub-Mb) with a 1.25 degree field and 250 fibers on its focal plane. We are planning a set of observations during the engineering process which includes a small number of stars. The spectral resolution will be 10000 and 2000, and the number of spectra in the data set will reach several thousand. With these data, we can improve our techniques of automated reduction and analysis. For example, in order to test our software, the physical parameters of a small proportion of stars, such as Vr, Teff, log g, [Fe/H] and [α/Fe], should be compared with the Sloan Digital Sky Survey (SDSS). If the results are precise enough, the parameters of more stars could be used for research such as searching for stellar streams, studying star clusters in our Galaxy, and searching for metal-poor stars, etc.
Atmospheric dispersion and differential refraction lead to a non-negligible light loss, varying with wavelength across the field of view, during the integration time of a multi-fiber spectral observation.
These effects are more severe in telescopes with a large field of view such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST). We have calculated the light loss due to atmospheric refraction for LAMOST. To improve the efficiency, each individual fiber could be moved during the observation to correct its monochromatic displacement at 5000 Å; the best number of such tracking corrections must balance the displacement correction against the fiber positioner error. We have calculated the best number of corrections by
Monte Carlo simulation for fields at different declinations and for different positions within the field, and we have also derived the fiber positioner error tolerance from the simulation.
Astronomical data sets have experienced an unprecedented and
continuing growth in the volume, quality, and complexity over the
past few years, driven by the advances in telescope, detector, and
computer technology. Like many other fields, astronomy has become
a very data-rich science. Information content is measured in multiple
terabytes, and even larger, multi-petabyte data sets are on the
horizon. To cope with this data flood, the Virtual Observatory (VO)
federates data archives and services, representing a new
information infrastructure for astronomy in the 21st century, and
provides a platform for scientific discovery. Data mining promises
to both make the scientific utilization of these data sets more
effective and more complete, and to open completely new avenues of
astronomical research. Technological problems range from the
issues of database design and federation, to data mining and
advanced visualization, leading to a new toolkit for astronomical
research. This is similar to challenges encountered in other data
intensive fields today. Outlier detection is of great importance
as one of the four knowledge discovery tasks. The identification of
outliers can often lead to the discovery of truly unexpected
knowledge in various fields. Especially in astronomy, the great
interest of astronomers is to discover unusual, rare or unknown
types of astronomical objects or phenomena. Outlier detection
approaches for large datasets meet exactly this need of
astronomers. In this paper we provide an overview of some
techniques for automated identification of outliers in
multivariate data. Outliers often provide useful information.
Their identification is important not only for improving the
analysis but also for indicating anomalies which may require
further investigation. The technique may be used in
data preprocessing and also for preselecting special objects.
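As a minimal univariate example of the techniques surveyed here, the modified z-score based on the median and the median absolute deviation (MAD) flags points far from the bulk of the data; the measurements below are invented:

```python
import statistics

def mad_outliers(xs, thresh=3.5):
    """Flag points whose modified z-score, 0.6745*(x - median)/MAD, exceeds
    the threshold.  The 3.5 cutoff is a common rule of thumb; multivariate
    methods generalize this with a covariance-based distance."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    if mad == 0:
        return []
    return [x for x in xs if abs(0.6745 * (x - med) / mad) > thresh]

# Invented measurements with one planted outlier.
data = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0]
print(mad_outliers(data))  # → [25.0]
```

Using the median and MAD rather than the mean and standard deviation keeps the detection threshold itself from being dragged around by the very outliers one is trying to find.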
The Virtual Observatory (VO) is a collection of interoperating data archives and software tools. Taking advantage of the latest information technologies, it aims to provide a data-intensive online research environment for astronomers all around the world.
A large number of high-quality astronomical software packages and libraries are powerful and easy to use, and have been widely used by astronomers for many years. Integrating those toolkits into the VO system is a necessary and important task for the VO developers.
The VO architecture depends greatly on Grid and Web services, so the general VO integration route is "Java Ready – Grid Ready – VO Ready". In this paper, we discuss the importance of VO integration for existing toolkits and possible solutions. We introduce two efforts in this field from the China-VO project, "gImageMagick" and "Galactic abundance gradients statistical research under grid environment". We also discuss what additional work should be done to convert a Grid service into a VO service.
An important step of data preprocessing in data mining is feature
selection, which is used to improve the performance of
data mining algorithms by removing irrelevant and redundant
features. By positional cross-identification, multi-wavelength
data of 1656 active galactic nuclei (AGNs), 3718 stars, and 173
galaxies are obtained from the optical (USNO-A2.0), X-ray (ROSAT), and
infrared (Two Micron All-Sky Survey) bands. In this paper we
apply a filter approach named ReliefF to select features
from the multi-wavelength data. We then use the naive
Bayes classifier to classify the objects with the feature subsets
and compare the results with and without feature selection, and
with and without adding weights to features. The results
show that the naive Bayes classifier based on the ReliefF algorithm
is robust and efficient for preselecting AGN candidates.
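A self-contained sketch of the naive Bayes step, fitting per-class Gaussians under the naive independence assumption; the two-feature training samples are invented, not the actual multi-wavelength data:

```python
import math
from collections import defaultdict

def train_gnb(samples):
    """Fit a per-class, per-feature Gaussian (the naive independence
    assumption); samples are (feature_tuple, label) pairs."""
    by_cls = defaultdict(list)
    for feats, label in samples:
        by_cls[label].append(feats)
    model = {}
    for label, rows in by_cls.items():
        n = len(rows)
        stats = []
        for column in zip(*rows):
            mu = sum(column) / n
            var = sum((x - mu) ** 2 for x in column) / n + 1e-9  # avoid var=0
            stats.append((mu, var))
        model[label] = (n, stats)
    return model

def predict_gnb(model, feats):
    """Return the class with the highest log-posterior."""
    total = sum(n for n, _ in model.values())
    best, best_lp = None, None
    for label, (n, stats) in model.items():
        lp = math.log(n / total)
        for x, (mu, var) in zip(feats, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        if best_lp is None or lp > best_lp:
            best, best_lp = label, lp
    return best

# Two invented features per object (e.g. two colors), for illustration only.
samples = [((0.5, 1.2), "AGN"), ((0.6, 1.1), "AGN"), ((0.4, 1.3), "AGN"),
           ((2.0, 0.1), "star"), ((2.1, 0.2), "star"), ((1.9, 0.0), "star")]
model = train_gnb(samples)
print(predict_gnb(model, (0.55, 1.15)))  # → AGN
```

Feature weighting as produced by ReliefF could be incorporated by scaling each feature's log-likelihood contribution by its weight.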
The Large Sky Area Multi-Object Fibre Spectroscopic Telescope (LAMOST)
will yield 10 million spectra of a wide variety of objects, including
QSOs, galaxies and stars. The data archive of one-dimensional
spectra, which will be released gradually during the survey, is
expected to exceed 1 terabyte in size. This archive will enable
astronomers to explore the data interactively through a friendly
user interface. Users will be able to access information related
to the original observations as well as spectral parameters
computed by means of an automated data-reduction pipeline. Data
mining tools will enable detailed clustering, characterization and
classification analyses. The LAMOST data archive will be made
publicly available in the standard data format for Virtual
Observatories and in a form that will be fully compatible with
future Grid technologies.
The Large Sky Area Multi-Object Fibre Spectroscopic Telescope
(LAMOST) will be set up and tested. A fully automated software
system for reducing and analyzing the spectra has to be developed
before the telescope is completed. Requirements analysis has been carried out
and a data model has been designed. The software design outline is
given in this paper, including data design, architectural and
component design and user interface design, as well as the
database for this system. This paper also presents an example
algorithm, PCAZ, for redshift determination.
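The abstract does not detail PCAZ itself; as a minimal stand-in for template-based redshift determination, the sketch below correlates an observed spectrum against a rest-frame template on a uniform log-wavelength grid, where a redshift z becomes a rigid shift of log(1 + z). The emission-line list and grid are hypothetical:

```python
import numpy as np

# Uniform log-wavelength grid: redshift is then a rigid shift of
# log(1 + z), which a single correlation scan can locate.
loglam = np.linspace(np.log(3800.0), np.log(9200.0), 4000)
step = loglam[1] - loglam[0]

def make_template(loglam, lines=(4861.0, 5007.0, 6563.0)):
    """Toy emission-line template (hypothetical line list)."""
    lam = np.exp(loglam)
    return sum(np.exp(-0.5 * ((lam - l0) / 5.0) ** 2) for l0 in lines)

def estimate_z(spec, template, step, max_shift=800):
    """Return the redshift whose log-wavelength shift maximises the
    correlation between spectrum and template."""
    n = len(template)
    scores = [np.dot(spec[s:], template[:n - s]) for s in range(max_shift)]
    return np.exp(np.argmax(scores) * step) - 1.0

template = make_template(loglam)
observed = make_template(loglam - np.log(1.05))   # same lines at z = 0.05
z_est = estimate_z(observed, template, step)
```

A production pipeline such as PCAZ would replace the single template with a PCA-compressed template set, but the shift-and-compare idea is the same.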
It remains difficult to identify different kinds of celestial bodies from their spectra, because doing so requires a great deal of manual measuring, marking and identifying by astronomers, which is generally very hard and time-consuming. With spectral data exploding from all kinds of telescopes, it is becoming more and more urgent to find a fully automatic way to deal with this kind of problem. In fact, from a different viewpoint, the whole process of dealing with spectral signals (filtering noise, extracting features, constructing classifiers, etc.) is a traditional problem in the pattern recognition field.
The main purpose of automatic classification and recognition of spectra in the LAMOST (Large Sky Area Multi-Object Fibre Spectroscopic Telescope) project is to identify a celestial body's type based only on its spectrum. For this purpose, one of the key steps is to establish a good model that describes all kinds of spectra, which then makes it possible to construct good classifiers.
In this paper, we present a novel description language to represent spectra. Based on this language, we use several algorithms to extract classification rules from raw spectral datasets and then construct classifiers to identify spectra using the rough set method. Compared with other methods, our technique is closer to the way humans reason and is, to some extent, more efficient.
In order to explore the spectral energy distribution of various objects in a multidimensional parameter space, multiwavelength data for quasars, BL Lacs, active galaxies, stars and normal galaxies are obtained by positional cross-identification from the optical (USNO-A2.0), X-ray (ROSAT) and infrared (2MASS) bands. Different classes of X-ray emitters populate distinct regions of a multidimensional parameter space. In this paper, an automatic classification technique called Support Vector Machines (SVMs) is put forward to classify them using 7 and 10 parameters respectively. The results show that SVMs are an effective method for separating AGNs from stars and normal galaxies, both with data from the optical and X-ray bands and with data from the optical, X-ray and infrared bands. Furthermore, we conclude that classification is influenced not only by the method but also by the chosen wavelengths. Moreover, it is evident that the more wavelength bands we choose, the higher the accuracy.
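The SVM classification described above can be illustrated with a self-contained sketch. Rather than a full kernel SVM, this is a tiny linear SVM trained by subgradient descent on the hinge loss, on synthetic two-parameter data standing in for the optical/X-ray parameters; all names and numbers are assumptions:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=300):
    """Tiny linear SVM fitted by subgradient descent on the regularised
    hinge loss; labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                          # points inside the margin
        w -= lr * (lam * w - (y[viol, None] * X[viol]).sum(0) / n)
        b += lr * y[viol].sum() / n
    return w, b

# Toy two-class sample standing in for (optical, X-ray) parameters:
# "AGN" around (+2, +2), "star/galaxy" around (-2, -2).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (100, 2)),
               rng.normal(-2.0, 0.5, (100, 2))])
y = np.repeat([1.0, -1.0], 100)
w, b = train_linear_svm(X, y)
accuracy = (np.sign(X @ w + b) == y).mean()
```

Adding more wavelength bands simply adds columns to `X`, which is how the abstract's 7-parameter vs 10-parameter comparison is set up.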
In this paper, we investigate the Principal Component Analysis-Optimal Discrimination Plane (PCA-ODP) approach on a data set of galaxy spectra including eleven standard subtypes with redshift values ranging from 0 to 1.2 in steps of 0.001. These eleven subtypes are E, S0, Sa, Sb, Sc, SB1, SB2, SB3, SB4, SB5 and SB6, respectively, according to the Hubble sequence. Among them, the first four subtypes belong to the class of normal galaxies (NGs); the remaining seven belong to active galaxies (AGs). We apply the PCA approach to extract the features of galaxy spectra, project the samples onto the PCs, and apply the ODP method in this feature space to find the optimal discrimination plane of the two main classes. The ODP approach was developed from Fisher's linear discriminant method. The difference between them is that Fisher's method uses only Fisher's vector, while ODP uses two orthogonal vectors: Fisher's vector and a second vector orthogonal to it. Besides the data set above, we also use the Sloan Digital Sky Survey (SDSS) galaxy spectra and Kennicutt (1992) galaxy data to test the ODP classifier. The experimental results show that our proposed technique is both robust and efficient. The correct rate reaches as high as 99.95% for the first data set, 96% for the SDSS data and 98% for the Kennicutt data.
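The two-vector construction behind ODP can be sketched as follows: compute Fisher's vector, deflate the data onto its orthogonal complement, and run Fisher again to obtain the second, orthogonal discriminant direction. The toy 3-D features standing in for PCA-compressed spectra are an assumption:

```python
import numpy as np

def fisher_vector(X1, X2):
    """Fisher's linear discriminant direction w = Sw^(-1) (mu1 - mu2)."""
    m1, m2 = X1.mean(0), X2.mean(0)
    Sw = (np.cov(X1.T, bias=True) * len(X1)
          + np.cov(X2.T, bias=True) * len(X2)
          + 1e-6 * np.eye(X1.shape[1]))           # regularise
    return np.linalg.solve(Sw, m1 - m2)

def odp_vectors(X1, X2):
    """Fisher's vector plus a second discriminant direction orthogonal
    to it: together they span the optimal discrimination plane."""
    w1 = fisher_vector(X1, X2)
    w1 = w1 / np.linalg.norm(w1)
    P = np.eye(len(w1)) - np.outer(w1, w1)        # projector onto w1's complement
    w2 = fisher_vector(X1 @ P, X2 @ P)            # Fisher again, deflated
    w2 = w2 - w1 * (w2 @ w1)                      # enforce exact orthogonality
    return w1, w2 / np.linalg.norm(w2)

# Toy 3-D features standing in for PCA-compressed galaxy spectra.
rng = np.random.default_rng(2)
A = rng.normal([0, 0, 0], 0.5, (150, 3))          # e.g. "normal galaxies"
B = rng.normal([2, 1, 0], 0.5, (150, 3))          # e.g. "active galaxies"
w1, w2 = odp_vectors(A, B)
```

Samples are then classified by their coordinates in the (w1, w2) plane, which is the extra degree of freedom ODP has over plain Fisher discrimination.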
In this paper, we present a novel technique for redshift identification. Redshift is a key parameter of celestial spectra. There are few reports on redshift identification in the literature, perhaps because few people work on this problem or because of industrial confidentiality. Our technique is a pseudo-triangle technique consisting of three major steps. First, the 3 wavelengths corresponding to the 3 highest intensity values of an unknown spectrum are selected to construct a pseudo-triangle, and the largest angle of this triangle, which is independent of the redshift value, is calculated. Second, the obtained angle is used as an index to retrieve the corresponding 3 model wavelengths via a pre-calculated look-up table composed of all the combinations of the feature wavelengths of the model spectrum. Finally, based on the 3 corresponding wavelengths, the redshift value is derived. The main characteristics of our technique are its simplicity and efficiency, which are demonstrated by experiments on simulated data as well as on real celestial spectra. It is shown that the correct identification rate can reach as high as 86.7%. Taking into account the highly noisy nature of celestial spectra, such a result is considered a good one.
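The abstract does not spell out how the pseudo-triangle is built. One construction consistent with the claimed redshift invariance is a triangle whose side lengths are the three wavelengths themselves: under redshift every wavelength scales by (1 + z), the triangle stays similar, and its angles are unchanged. The line list below is hypothetical and this construction is an assumption:

```python
import math

def largest_angle(l1, l2, l3):
    """Largest interior angle (radians) of the triangle whose side
    lengths are the three wavelengths. Scaling all sides by (1 + z)
    leaves the angles unchanged, so the angle is redshift-invariant."""
    a, b, c = sorted((l1, l2, l3))        # c is the longest side
    # law of cosines: c^2 = a^2 + b^2 - 2ab*cos(C)
    return math.acos((a * a + b * b - c * c) / (2 * a * b))

rest = (4861.0, 5007.0, 6563.0)           # hypothetical feature wavelengths
z = 0.1
shifted = tuple(l * (1 + z) for l in rest)
angle_rest = largest_angle(*rest)
angle_shifted = largest_angle(*shifted)   # identical, up to rounding
```

This invariance is what lets the angle serve as a redshift-independent index into the pre-calculated look-up table.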
Stellar spectra classification is an indispensable part of any workable automated recognition system of celestial bodies. Like other celestial spectra, stellar spectra are extremely noisy and voluminous; consequently, any acceptable classification technique must be both computationally efficient and robust to structural noise. In this paper, we propose a practical stellar spectral classification technique composed of three steps. In the first step, the Haar wavelet transform is used to extract spectral lines, followed by de-noising via hard thresholding in the wavelet domain. As a result, in the subsequent steps only the extracted spectral lines are used for classification, because spectral lines are far more reliable than the continuum. In the second step, Principal Component Analysis (PCA) is employed for optimal data compression. More specifically, we use 165 well-selected samples from 7 spectral classes of stellar spectra to construct the 'eigen-line spectra' by PCA. In the third step, unknown spectra are projected onto the eigen-subspace defined by these eigen-line spectra, and a fuzzy c-means algorithm is used for the final classification. The experiments show that our new technique is both robust and efficient.
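The Haar-transform-plus-hard-thresholding step can be illustrated with a one-level decomposition in plain numpy. This is a minimal stand-in for the paper's wavelet processing, not the authors' implementation:

```python
import numpy as np

def haar_denoise(signal, thresh):
    """One-level Haar wavelet decomposition followed by hard
    thresholding of the detail coefficients, then reconstruction."""
    x = np.asarray(signal, float)
    if len(x) % 2:                              # pad to even length
        x = np.append(x, x[-1])
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)  # low-pass pairs
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)  # high-pass pairs
    detail[np.abs(detail) < thresh] = 0.0        # hard threshold
    out = np.empty_like(x)
    out[0::2] = (approx + detail) / np.sqrt(2.0)  # inverse transform
    out[1::2] = (approx - detail) / np.sqrt(2.0)
    return out[:len(signal)]

# Small-amplitude alternating noise on a flat continuum: the detail
# coefficients fall below the threshold and are zeroed out.
noisy = np.ones(16) + np.array([0.1, -0.1] * 8)
smoothed = haar_denoise(noisy, thresh=1.0)
```

With `thresh=0.0` the transform round-trips exactly; raising the threshold removes small-scale noise while the coarse structure, including strong spectral lines, survives in the approximation coefficients.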
In this paper we introduce the OCS (Observatory Control System) of LAMOST (the Large Sky Area Multi-Object Fiber Spectroscopic Telescope), which will survey more than ten million galaxies and stars to obtain their spectra after 2004. The OCS will operate the TCS (Telescope Control System) and the ICS (Instrument Control System) in real time to accomplish spectroscopic observations. Each observation can obtain spectra of about 4000 objects simultaneously, and the amount of raw data per night is 2 - 3 gigabytes. The OCS will also handle the observational schedules and the data processing, which is called the DHS (Data Handling System). The purpose of the OCS is to automate the whole observation activity (including object selection, observational scheduling, observation at the telescope, data processing, data archiving and so on) and to gain scientific return more efficiently. The OCS is a software system connected to the TCS, ICS and DHS via computer networks. It involves many advanced information technologies, such as networking, communication, databases, the web and GPS.