Professional Experience
• Executive Director of BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, China, 2016 - Present
• Professor, “100-Talent” Program of the Chinese Academy of Sciences (CAS), Beijing Institute of Genomics, CAS, China, 2011 - Present
• Research Scientist, King Abdullah University of Science and Technology, Kingdom of Saudi Arabia, 2009 - 2011
• Postdoctoral Associate, Yale University, United States of America, 2007 - 2009


Education
• PhD in Computer Science, Institute of Computing Technology, Chinese Academy of Sciences, China, 2007
• MS in Computer Science, Nanjing University of Science and Technology, China, 2004
• BS in Computer Science, Ningxia University, China, 2002


Research Interests
• Big Data Integration and Analytics
• Computational Molecular Evolution


Projects & Resources
• IC4R
• MethBank
• LncRNAWiki
• Database Commons
• KaKs_Calculator


Academic Activities
• Editorial Board Member: Biology Direct (2013—)
• Academic Editor: PLoS ONE (2012—)
• Associate Editor-in-Chief: Genomics, Proteomics & Bioinformatics (2012—)
• Journal Reviewer: Bioinformatics, Biology Direct, BioSystems, BMC Bioinformatics, BMC Evolutionary Biology, BMC Genomics, BMC Plant Biology, BMC Systems Biology, Briefings in Bioinformatics, Chinese Bulletin of Life Sciences, Current Bioinformatics, Database, Evolutionary Bioinformatics, Gene, Genome Biology, Genomics, Proteomics & Bioinformatics, In Silico Biology, Integrative Zoology, Journal of Bioinformatics and Computational Biology, Journal of Molecular Evolution, Molecular Biology and Evolution, PLoS ONE, PLoS Pathogens, RNA
• Grant Referee: UK BBSRC
• Executive Committee Member: International Society for Biocuration (2016—)
• Memberships: Genetics Society of China, International Society for Biocuration



Big Data Integration and Analytics

Data Integration
Rapid advances in high-throughput experimental technologies have caused biological data to grow at an unprecedented, exponential rate. Answering the most important and complex biological questions very often requires integrating diverse data from multiple sources, which in turn requires harnessing collective contributions and building bioinformatics Web APIs for large-scale data integration.

Data Analysis
The fast-growing volume of biological data makes it imperative to develop time-efficient applications for large-scale data analysis. This requires using high-performance computing technologies (e.g., cloud and parallel computing) and establishing lightweight programming environments that make full use of both computing and storage resources.

Data Sharing
Data in the broad sense, including raw data, algorithms, results, pipelines, publications, knowledge, and even connections among people, are growing at an unparalleled pace. Efficient and effective data sharing therefore requires linking researchers around the world and building scientific social networks.

Computational Molecular Evolution

Modeling Compositional Dynamics
Sequence composition at different levels (e.g., the codon level) reflects the interplay between mutation and selection. Studying sequence composition is therefore fundamental to understanding sequence evolution, because composition is closely related to gene expression, translation speed and/or accuracy, gene function, protein structure, the intrinsic nature of the genetic code, and more.

Detecting Mutation and Selection
A number of models have been proposed for the evolution of protein-coding sequences. It would be desirable to model sequence evolution and detect selective pressure not only in protein-coding sequences but also in non-coding sequences.

Simulating Evolutionary Processes
Simulating the evolution of molecular sequences over time is essential for a broad range of evolutionary studies. Biologically realistic simulations must take full account of many parameters, such as mutation rate, functional and structural constraints, site substitution patterns, co-evolving sites, and site-specific evolutionary constraints.


Group Members

HAO Lili, LI Cuiping, LI Rujiao, LIANG Fang, MA Lina, SANG Jian, SONG Shuhui, TIAN Dongmei, ZOU Dong

Graduate Students:
SUN Shixiang, YIN Hongyan, WANG Guangyu, XU Xingjian, XIA Lin, YU Chunlei, LI Mengwei, LIU Lin