Data
Undergraduate students generate large quantities of behavioral data every day through various university systems. These behavioral data can be analyzed to assess academic performance, study habits, and other aspects of lifestyle. The data examined in this study come from the campus system database of a university in northeast China, which records students’ basic information and academic performance, as well as detailed logs of daily transactions and library visits. This study extracted four data types from this database. First, students’ basic information includes each student’s name, sex, student ID, college, major, year of enrollment, hometown, and ethnicity. Second, academic performance data consist of course names, grades, and rankings. Third, lifestyle transaction data include expense categories, locations, service counters, payment methods, timestamps, transaction amounts, remaining balances, and top-up amounts. The expense categories include cafeteria dining and bathing. Fourth, library access logs provide precise entry and exit times, as well as visit frequencies. Data use complied with confidentiality requirements and standards, and all student data were de-identified. All data were preprocessed to remove duplicate records and standardize formatting. Student performance data were also standardized to account for score variations between disciplines and majors.
The data set used in this study contains 3,123,840 behavioral data records on 3,499 on-campus undergraduates, collected between 2018 and 2022. About 78% of students were men, and the remaining 22% were women. This study was carried out within the College of Engineering, where the gender ratio of students is clearly unbalanced. The research sample was made up of 1,676 students aged 22 and 1,445 students aged 23 (48% and 41% of the data set, respectively). A subset of 398 undergraduate students belonged to ethnic minorities, constituting 11% of the sample. Most students (71%) were members of the Communist Party of China (CPC). The mean grade point average (GPA) was 80.095 points, with a standard deviation of 9.373. The GPA distribution closely approximated a normal distribution, reflecting typical patterns of student academic performance. The sample therefore had a reasonable structure, approximated real-world proportions, and was highly representative. In addition, the data collection period overlaps with the campus closures that followed the outbreak of COVID-19 in January 2020. During this period, universities in China strictly controlled campus access and prohibited food deliveries. Data on cafeteria dining, bathing, and library study therefore largely reflect students’ on-campus behavior with reduced external distractions. These factors collectively support the credibility and reliability of the study.
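The text does not state how duplicates were removed or how scores were standardized. As a minimal sketch only, the Python below assumes exact-duplicate removal and a z-score computed within each major; the record layout, field names, and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (student_id, major, raw_score). Not taken from the study.
records = [
    ("s1", "mechanical", 80.0),
    ("s2", "mechanical", 90.0),
    ("s2", "mechanical", 90.0),  # exact duplicate, dropped below
    ("s3", "civil", 60.0),
    ("s4", "civil", 70.0),
]


def deduplicate(rows):
    """Drop exact duplicate rows while preserving the original order."""
    return list(dict.fromkeys(rows))


def standardize_by_major(rows):
    """Return {student_id: z-score}, with mean and spread computed per major."""
    by_major = defaultdict(list)
    for _, major, score in rows:
        by_major[major].append(score)
    stats = {m: (mean(v), pstdev(v)) for m, v in by_major.items()}

    z = {}
    for sid, major, score in rows:
        mu, sigma = stats[major]
        # A group with zero spread gets z = 0 to avoid division by zero.
        z[sid] = 0.0 if sigma == 0 else (score - mu) / sigma
    return z


clean = deduplicate(records)
z_scores = standardize_by_major(clean)
```

After this step a score of 90 in one major and 70 in another can both map to the same standardized value, which is the sense in which between-major score variation is removed.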
Measures
Academic performance is a key metric of educational quality and a quantifiable outcome of student success. In higher education, it is typically measured by grade point average (GPA), which is calculated by weighting individual course grades by their credit hours (Wang et al. 2015; Zeek et al. 2015).
This study draws several independent variables from the dining model of Shu et al. (2020). Variables of interest included students’ Meal stability coefficient, Early growing coefficient, Restaurant counter selection, Restaurant consumption level, and Stability of restaurant consumption. These variables were used to define a range of dietary habit indicators, facilitating a more systematic analysis of students’ consumption behavior. Several other factors (for example, Average bathing frequency, Average bathing time, Average frequency of library arrival, and Average study duration) were evaluated according to their relative intensity in applied statistics. Specific calculations for all independent variables are given in Table 1.
Control variables included Gender (female = 1, 0 otherwise), Ethnicity (Han = 1, 0 otherwise), Age, and Political affiliation (CPC membership = 1, 0 otherwise). These characteristics were included to control for potential confounders that could affect academic performance, allowing a more precise evaluation of how students’ lifestyle habits influence their academic success.
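The credit-weighted calculation described above can be illustrated with a short Python sketch. The function name and the sample transcript are hypothetical, and the 100-point grade scale is assumed from the GPA figures reported in the Data section.

```python
def weighted_gpa(courses):
    """Credit-weighted mean of course grades.

    `courses` is a list of (grade, credit_hours) pairs.
    """
    total_credits = sum(credits for _, credits in courses)
    if total_credits <= 0:
        raise ValueError("total credit hours must be positive")
    return sum(grade * credits for grade, credits in courses) / total_credits


# Hypothetical transcript: grades on a 100-point scale with credit hours.
transcript = [(90.0, 4), (80.0, 2), (70.0, 2)]
gpa = weighted_gpa(transcript)  # (90*4 + 80*2 + 70*2) / 8 = 82.5
```

The four-credit course pulls the result above the unweighted mean of 80, which is the intended effect of weighting by credit hours.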
Model training
In this study, we use a long short-term memory (LSTM) model to process and calculate the indicators of students’ historical eating habits. The model is implemented in Python using a custom script based on the Torch library, a framework widely used for deep learning. The Torch library offers flexibility in designing and training custom LSTM models adapted to specific tasks. As a type of recurrent neural network (RNN), the LSTM is particularly well suited to handling time-series data. Compared with the traditional RNN, the LSTM introduces input gates, forget gates, output gates, and a cell state, which allow it to better manage long-term dependencies in sequences (Graves and Schmidhuber, 2005). The specific process is illustrated in Fig. 1.
At each time step \(t\), the LSTM receives an input variable \(X_t\), which includes behavioral features such as meal time and location at that time step. To make these features suitable for model processing, one-hot encoding is applied, which represents the meal-time categories (breakfast, brunch, lunch, afternoon tea, dinner, and late evening) and the meal locations as distinct dimensions. Consequently, the dimension of the input variable expands to 404, encompassing the categorical features for all time points. One-hot encoding is defined here as follows: categorical features such as meal locations could be represented by numbers (1 to N), but because there is no ordinal relationship or inherent magnitude between these categories, they are converted into one-hot variables to separate them effectively. Each category is represented by a bit register in which only one bit is active at any time, which not only removes the implicit ordering problem in categorical data but also enriches the feature space. This additional dimensionality helps the model to better differentiate distinct behavioral patterns.
The LSTM controls the flow of information through three internal gates (the forget gate, the input gate, and the output gate) to effectively manage long-term and short-term dependencies in a sequence. The forget gate determines what information from the previous hidden state \(h_{t-1}\) should be kept or discarded, filtering the historical information relevant to the current time step. The input gate receives the current input \(X_t\) and decides what new information should be stored in the LSTM cell. Finally, the output gate generates the current hidden state \(h_t\) based on the information in the cell and produces the model’s prediction \(y_t\) at that time step.
In supervised learning, the loss value serves as the evaluation metric, quantifying the error of the model. In this study, the LSTM model was used to calculate the variables related to eating habits, and the training loss for selected variables is presented in Fig. 2. The training and validation losses of the two models drop sharply at the beginning and then stabilize at lower levels. The lowest validation loss occurs around the 45th and 34th iterations for the two models, indicating that the LSTM model achieved satisfactory performance at these points.
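As a minimal sketch of the one-hot scheme described above, the following Python encodes one meal event. The six meal periods come from the text; the location list, the identifiers, and the concatenation order are assumptions for illustration, and the example does not attempt to reproduce the 404-dimensional input.

```python
# Meal periods named in the text; identifiers are illustrative.
MEAL_PERIODS = ["breakfast", "brunch", "lunch", "afternoon_tea", "dinner", "late_evening"]
# Hypothetical meal locations, not taken from the study.
LOCATIONS = ["cafeteria_a", "cafeteria_b", "cafeteria_c"]


def one_hot(value, categories):
    """Return a 0/1 vector with a single 1 at the position of `value`."""
    vec = [0] * len(categories)
    vec[categories.index(value)] = 1  # raises ValueError for unknown values
    return vec


def encode_meal(period, location):
    """Concatenate the one-hot codes for meal period and meal location."""
    return one_hot(period, MEAL_PERIODS) + one_hot(location, LOCATIONS)


x_t = encode_meal("lunch", "cafeteria_b")
# x_t == [0, 0, 1, 0, 0, 0, 0, 1, 0]
```

Because every category occupies its own dimension, no location is numerically "larger" than another, which is the ordering problem that integer codes would introduce.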
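For reference, the gate mechanism can be written in the standard LSTM formulation. The weight matrices \(W\), biases \(b\), candidate state \(\tilde{c}_t\), and cell state \(c_t\) are notation introduced here and are not defined in the study; in this standard form the forget gate is computed from \(h_{t-1}\) and \(X_t\) and is applied to the previous cell state \(c_{t-1}\).

$$\begin{array}{l} f_t = \sigma\left(W_f\,[h_{t-1}, X_t] + b_f\right) \\ i_t = \sigma\left(W_i\,[h_{t-1}, X_t] + b_i\right) \\ \tilde{c}_t = \tanh\left(W_c\,[h_{t-1}, X_t] + b_c\right) \\ c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\ o_t = \sigma\left(W_o\,[h_{t-1}, X_t] + b_o\right) \\ h_t = o_t \odot \tanh\left(c_t\right) \end{array}$$

Here \(\sigma\) is the logistic sigmoid, \(\odot\) is element-wise multiplication, and \([h_{t-1}, X_t]\) denotes concatenation of the previous hidden state with the current input.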
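Locating the iteration with the lowest validation loss can be sketched as follows. The loss values are hypothetical and chosen only to mimic a sharp initial drop followed by a plateau; the study’s actual selection procedure is not described in the text.

```python
def best_iteration(val_losses):
    """Return (1-based iteration, loss) at the minimum validation loss."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]


# Hypothetical validation-loss curve, one value per iteration.
val_losses = [0.90, 0.55, 0.40, 0.33, 0.31, 0.30, 0.32, 0.34]
iteration, loss = best_iteration(val_losses)
```

In practice the model parameters saved at that iteration would be the ones retained, since later iterations no longer improve validation loss.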
Analytical approach
A multiple regression model was constructed to test H1 to H3 concerning the lifestyle of undergraduate students and its effects on academic performance. The following regression equation was applied:
$$\begin{array}{c} {y}_{GPA} = {\beta}_{0} + {\beta}_{1}{x}_{Habit} + {\beta}_{2}\,Gender + {\beta}_{3}\,Ethnicity \\ +\, {\beta}_{4}\,Age + {\beta}_{5}\,Political\ affiliation + \varepsilon \end{array}$$
(1)
where \(y_{GPA}\) is the student’s academic performance (GPA); \(x_{Habit}\) represents the student’s eating, hygiene, and study habits; and Gender, Ethnicity, Age, and Political affiliation are control variables. \(\varepsilon\) is the random disturbance term, and \(\beta_0\) through \(\beta_5\) are the parameters to be estimated. The regression analyses were implemented using custom code written with the Statsmodels library, a widely used Python module that facilitates the estimation and analysis of various statistical models.
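To make the estimation step concrete, the sketch below fits Eq. (1) by ordinary least squares using only the Python standard library. It is a stand-in for the Statsmodels code used in the study, not a reproduction of it, and it assumes an OLS estimator; the data rows and the "true" coefficients are synthetic and purely illustrative.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y.

    An intercept column of ones is prepended to each row.
    """
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic rows: (habit, gender, ethnicity, age, political_affiliation).
rows = [
    (0.2, 1, 1, 22, 1), (0.5, 0, 1, 23, 0), (0.8, 0, 0, 22, 1), (0.4, 1, 0, 23, 1),
    (0.9, 0, 1, 22, 0), (0.1, 1, 1, 23, 0), (0.6, 0, 0, 23, 1), (0.3, 1, 0, 22, 0),
]
# Illustrative coefficients beta_0..beta_5 used to generate noise-free GPA values.
true_beta = [60.0, 10.0, 2.0, 1.0, 0.5, 1.5]
y = [sum(b * v for b, v in zip(true_beta, [1.0] + list(r))) for r in rows]

beta_hat = ols(rows, y)  # recovers true_beta up to rounding error
```

With Statsmodels, an equivalent fit would typically be obtained with `sm.OLS(y, sm.add_constant(X)).fit()`, which additionally reports standard errors and significance tests for each coefficient.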