Research

A Neural Embedding Project Revealing Gender Differences in Research Topics

We measured gender differences across subjects and races along three axes (interdisciplinarity, soft-hard, and nature-social), based on the research topics of researchers' papers. The left picture shows that men's research lies closer to the hard end of the soft-hard axis in mathematics, computer science, and all of the social science fields. In Health Professions and Nursing, men's research is significantly harder and more cross-disciplinary. The right picture shows that races form some clusters in terms of their gender gap on these three axes.
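One common way to place a topic on an axis such as soft-hard is to project its embedding onto the direction running between two pole vectors. The sketch below illustrates that idea only; it assumes this style of axis projection, which the description above does not spell out, and the function name `axis_projection` and the 2-d vectors are hypothetical.

```python
import math

def axis_projection(vec, pole_a, pole_b):
    """Cosine similarity between vec and the axis pointing from pole_a to pole_b.

    Positive values lean toward pole_b, negative values toward pole_a.
    """
    axis = [b - a for a, b in zip(pole_a, pole_b)]
    dot = sum(v * x for v, x in zip(vec, axis))
    norm = math.sqrt(sum(v * v for v in vec)) * math.sqrt(sum(x * x for x in axis))
    return dot / norm if norm else 0.0

# Hypothetical 2-d embeddings, for illustration only (not real data).
soft_pole = [0.0, 1.0]
hard_pole = [1.0, 0.0]
topic_vec = [1.0, 0.0]

# Score of one topic on the assumed soft-hard axis.
score = axis_projection(topic_vec, soft_pole, hard_pole)
```

Averaging such scores over the papers of each group would give one group-level position per axis, which is the kind of quantity a gender gap could then be computed from.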

Multidimensional Feature Analysis, Validation, and Early Identification of Original Science

We constructed a dataset from award-winning scientific achievements and a comprehensive literature database, and used it to develop machine learning frameworks for early identification, to analyze feature importances, and to examine both the individual and the interactive effects of originality metrics. The following pictures illustrate the interactive effects on originality of reference-based variety and diversity (left), and of diversity and disruption in the early period.

Knowledge Graph Construction and Inference Based on Literature Data

We employed natural language processing methods to extract entities (such as models, algorithms, medicines, and diseases) and their relations from the abstracts of journal articles. The upper figure presents the evolution of the knowledge graph of models and algorithms, in which each link indicates a based_on relation between two models or algorithms. The table below describes one of the datasets in our knowledge graph of drug indications.

Column name        Description
drug_id            The unique ID of the drug in the DrugBank database.
drug_name          The displayed name of the drug.
disease_dict       A dict-type variable whose values give the occurrence frequencies of the drug's indications in the OpenAlex dataset.
disease_type_num   The length of disease_dict, i.e. how many types of indications occur in the OpenAlex dataset for the drug.
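As a rough illustration of how the four columns relate, the sketch below builds one row from a list of indication mentions. It assumes that the keys of disease_dict are indication names; the helper `build_row`, the ID `DB_EXAMPLE`, and the disease labels are hypothetical placeholders rather than real DrugBank or OpenAlex values.

```python
from collections import Counter

def build_row(drug_id, drug_name, indication_mentions):
    """Assemble one dataset row from a drug's indication mentions."""
    # Count how often each indication is mentioned for this drug.
    disease_dict = dict(Counter(indication_mentions))
    return {
        "drug_id": drug_id,
        "drug_name": drug_name,
        "disease_dict": disease_dict,
        # disease_type_num is defined as the length of disease_dict.
        "disease_type_num": len(disease_dict),
    }

# Hypothetical input, for illustration only.
row = build_row("DB_EXAMPLE", "example_drug", ["disease_a", "disease_b", "disease_a"])
```

Keeping disease_type_num as a derived column is redundant with disease_dict, but it lets drugs be sorted or filtered by number of indications without parsing the dictionary.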

Semantic-Based Study on the Mobility of Researchers and the Shift of Research Interests

This project is my master's thesis. My major task in it is to measure the impact of scientific mobility on the extent of change in scientists' research interests. In one chapter, we apply embedding techniques to represent institutions in a low-dimensional space based on the scientists' trajectories extracted from their publications. The following pictures reveal the geographical stratification of China's provinces. This content also appears in my publication: He, Y., Huang, Y., Tian, C., **ang, S., & Ma, Y. (2024). Neural embeddings of scientific mobility reveal the stratification of institutions in China. Information Processing & Management, 61(3), 103702.
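To give a sense of how trajectories can feed an embedding model, the sketch below turns each scientist's sequence of institutions into (target, context) pairs, the usual training input for a skip-gram style model. It assumes a word2vec-like setup, which the description above does not specify; `context_pairs`, the window size, and the institution labels are hypothetical.

```python
def context_pairs(trajectory, window=1):
    """Return (target, context) pairs for institutions within `window` steps."""
    pairs = []
    for i, target in enumerate(trajectory):
        lo = max(0, i - window)
        hi = min(len(trajectory), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip pairing an institution with its own position
                pairs.append((target, trajectory[j]))
    return pairs

# Hypothetical trajectories: each list is one scientist's affiliation sequence.
trajectories = [
    ["inst_a", "inst_b", "inst_c"],
    ["inst_b", "inst_c"],
]

# Flatten the pairs from all trajectories into one training set.
pairs = [p for t in trajectories for p in context_pairs(t)]
```

Institutions that frequently exchange scientists appear together in many pairs, so a model trained on them would tend to place those institutions close to each other in the embedding space.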