Big Research Data and Data Science

Jianhui Li

CODATA-China was very pleased to organize the 1st Scientific Data Conference – Scientific Research Big Data and Data Science, which was held in Beijing, China on 24–25 February 2014. More than 400 domestic experts, scholars and students from universities, research institutions and industry took part.

Group photo of the 1st Scientific Data Conference.

The Conference aimed to improve understanding of the central issues in the era of Big Data, to promote multidisciplinary communication and collaboration, to help the development of young data scientists, to encourage the revitalization of traditional research approaches and to contribute to and support the Chinese national strategy to promote innovation. Around the world, there is talk of a ‘data revolution’ – this conference aimed to place China at the forefront of this revolution, providing the communication, skills and training to help Chinese scientists seize the opportunities of Big Data and ‘ride the wave’ of increasing data volumes, velocity and variety. Spanning two days, the conference featured two plenary sessions and fourteen breakout sessions. There were four major keynotes, three invited reports and three reports on projects and initiatives. The keynote lectures focused on the hot issues in the Big Data era, including integration and notation of Big Data, development opportunities and major challenges for science and technology. The breakout sessions also included technical sessions and open forums, with topics including:

Big data applications in earth and spatial science
Big data analysis and processing technology
Life science and medicine data and applications
Materials science data and applications
Big data technologies and applications in agriculture and rural informatization
Linked data and information recommendation
Scientific data visualization
Physics and chemistry data and applications
Cloud computing and data discovery
DOI registration and release
Open forum: data science and data scientists

Academician GUO Huadong delivering a keynote lecture.

Plenary Session.

Breakout Session.

At the closing ceremony Prof. LI Jianhui, Secretary-General of CODATA-China, gave a concluding speech in which he thanked the participants and described CODATA-China’s aspirations for the future. He also announced the forthcoming book Scientific Discovery in Big Data Era (now published) and that the 2nd Scientific Data Conference – Data, Science and the Silk Road Economic Belt would be held in Lanzhou in August 2015. The conference was one of the most successful academic activities convened in the 30 years since CODATA-China was established.

This collection of the Data Science Journal brings together the papers presented at this conference, providing insight into the current state of the art in data science in China. The papers were selected and reviewed by the conference organizers as a fitting and representative collection of the discussions and insights presented at the conference. The organizers would particularly like to thank Diane Rumble for her assiduous editorial work on this special conference collection.

The selected papers consist of research work in three categories: 1) common technologies in data science, 2) data science research and 3) applications of data science in different domains.

Seven of the selected papers are concerned with the critical need for common technologies in data science:

construct a dynamic assessment process of data quality in the big data era;
propose search strategies that use synchronous and asynchronous search to overcome differences in search speed and to combine the two separate sets of search results;
introduce OpenCSDB, a solution for applying Linked Data in the Scientific Database that will promote the sharing of scientific data and play a greater role in the “Twelfth Five-Year” program;
propose an invocation sequence of web service composition and its related invocation policies based on Petri net and analysis of structural relationships;
propose an approach to increase the accuracy and efficiency of seeding algorithms for magnetic flux lines in magnetic field visualization;
propose an ontology-based agricultural knowledge fusion method that improves the identification and fusion of new and existing data sets, making big data analytics more feasible;
introduce the new discipline of Data Science, which provides a novel type of research method (the data-intensive method) for the natural and social sciences and takes research on data largely beyond computer science.

Data science has attracted the attention of researchers from different domains:

propose a computational method to integrate both biomedical scientific data and literature for drug discovery and new uses of existing drugs;
describe DarwinTree, an integrated bioinformatics platform that supports all phases of the analytical pathway for phylogenetic study, from data collection, phylogenetic tree construction, visualization of the tree of life and web-based rendering to specific application services and data mining;
introduce data mining software and tools applied to big data problems in astrostatistics and astroinformatics;
introduce the National Rural Comprehensive Information Service Platform (NRCISP), supported by the National Science and Technology Support Program.

There are novel applications in data science and related research areas:

review recent advances of geophysical data and geophysical informatics developed in China;
provide methods for recommending comfortable bus routes to passengers, using an approach that combines multi-objective programming and a genetic algorithm in personalized information recommendation services;
propose a novel trigger model with data mining techniques for sales prediction;
offer an information filtering method and an aggregation model to provide a solution for how to choose appropriate experts for peer review;
introduce a novel personalization-oriented academic literature recommendation method to meet the user’s preference in multiple dimensions simultaneously;
propose an analysis architecture to make full use of data on the natural environment corrosion of materials; the approach includes grey relational analysis, artificial neural networks and fracture mechanics calculations;
propose a knowledge model for literature data mining and apply it to analyze the correlation between earthquake events and multidisciplinary data types.

These papers introduce the development of data science from a range of different perspectives. They help us get an overall understanding of the common or the most popular techniques used in data science research and applications. I hope you find them interesting and illuminating.

Prof. Li Jianhui

Secretary-General, CODATA-China

Director, Scientific Data Center, Computer Network Information Center, Chinese Academy of Sciences

Data Science Journal
