Article
Open access
Published: 23 May 2024

Characterization of clinical data for patient stratification in moderate osteoarthritis with support vector machines, regulatory network models, and verification against osteoarthritis Initiative data

Scientific Reports volume 14, Article number: 11797 (2024)


Abstract

Knee osteoarthritis (OA) diagnosis is based on symptoms, assessed through questionnaires such as the WOMAC. However, the inconsistency of pain recording and the discrepancy between joint phenotype and symptoms highlight the need for objective biomarkers in knee OA diagnosis. To this end, we study relationships among clinical and molecular data in a cohort of women (n = 51) with Kellgren–Lawrence grade 2–3 knee OA through a Support Vector Machine (SVM) and a regulation network model. Clinical descriptors (i.e., pain catastrophism, depression, functionality, joint pain, rigidity, sensitization and synovitis) are used to classify patients. A Youden’s test is performed for each classifier to determine optimal binarization thresholds for the descriptors. Thresholds are tested against patient stratification according to baseline WOMAC data from the Osteoarthritis Initiative, and the mean accuracy is 0.97. For our cohort, the data used as SVM inputs are knee OA descriptors, synovial fluid proteomic measurements (n = 25), and transcription factor activation obtained from a regulatory network model stimulated with the synovial fluid measurements. The relative weights after classification reflect input importance. The performance of each classifier is evaluated through ROC-AUC analysis. The best classifier with clinical data is pain catastrophism (AUC = 0.9), highly influenced by functionality and pain sensitization, suggesting that kinesiophobia is involved in pain perception. With synovial fluid proteins used as input, leptin strongly influences every classifier, suggesting the importance of low-grade inflammation. When transcription factors are used, the mean AUC is limited to 0.608, which can be related to the pleomorphic behaviour of osteoarthritic chondrocytes. Nevertheless, functionality has an AUC of 0.7 with a decisive importance of FOXO downregulation. Though larger and longitudinal cohorts are needed, this unique combination of SVM and regulatory network model shall help to stratify knee OA patients more objectively.


Introduction

Synovial joints (e.g., knee, hip and hand) allow smooth movements between adjacent bones and are surrounded by an articular capsule that defines a cavity filled with synovial fluid. Bone extremities are also covered by a layer of hyaline articular cartilage that prevents bone-to-bone contact, cushions possible noxious impacts, and provides a low-friction surface for joint articulation¹. Epidemiological data from the Institute for Health Metrics and Evaluation show that knee osteoarthritis (OA) affected 22% of men and 31% of women over the age of 55 in 2019. Such prevalence is expected to rise due to the increase in life expectancy and in the overall body mass index. Currently, treatments are conservative (i.e., weight loss, low-impact exercises and analgesics) and knee OA-modifying drugs are not available because knee OA pathophysiology is not fully understood. Eventually, total knee replacement may be necessary. Predictions suggest that the need for joint replacement will continue to rise because of both changes in patients’ demography and expanding indications for surgery². Consequently, the medical costs associated with knee OA are increasing, making this disease one of the world’s leading health problems³.

Knee OA diagnosis occurs mainly during its moderate to severe/late stage, when the articular cartilage has become irreversibly damaged. At this point, the decision to propose a total knee replacement or conservative treatment to the patient considers the patient’s age and the extent of joint structural alterations, but the eventual decision is largely conditioned by the symptoms, pain being the critical one. Every person has a unique perception of pain, as it is affected by biological, psychological and social factors⁴. Pain is typically assessed through questionnaires such as the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC). However, the inconsistency and poor reporting of questionnaires, along with the poor agreement between physical joint damage, as assessed through radiological biomarkers, and pain symptoms, motivate the search for more objective measures that could be used as new biomarkers in knee OA diagnosis^5,6.

To find such biomarkers, gait data have been used to develop prediction models for the progression of knee OA⁷. Also, serum and urine biochemical markers have been explored to establish knee OA phenotype classification systems: urinary C-terminal crosslinked telopeptide of type II collagen and serum levels of degrading enzymes, cartilage oligomeric matrix protein and hyaluronan have some predictive value for knee OA classification⁵. However, as pain, at least in its nociceptive dimension, might be correlated with the levels of synovial inflammation⁸, there is a growing interest in exploring synovial fluid molecules as possible biomarkers^9,10. For example, Haraden et al.¹¹ predicted the development of an inflammatory knee OA endotype based on synovial fluid molecules. Donnenfield et al.¹⁰ also combined synovial fluid data with gait data to build support vector machine classifiers that predict cartilage damage.

However, patient stratification systems that help to differentiate the sensory and affective pain dimensions from the nociceptive one have not been fully explored⁵. Therefore, objectively and biologically characterizing current knee OA clinical descriptors (i.e., WOMAC domains, catastrophism, or sensitization) may provide valuable insight into patient stratification, which might eventually help in clinical decision-making. Machine learning models can link seemingly unrelated features by finding patterns in training data. Specifically, this cross-sectional study aims to describe a cohort of women (n = 51) by mining the relationships among typical knee OA descriptors and molecular data through Support Vector Machine (SVM) classifiers to provide objectivity in current knee OA diagnostic systems. We further intend to add interpretability to the results by enriching real-world proteomic data: using the proteomic data as initial conditions, we personalized simulations of a previously developed regulation network model¹². The transcription factor results from the regulatory model were used as the third set of input features for the SVM, allowing us to characterize the cohort with intracellular molecular information.

Results

We explore a total of 21 classification tasks (seven clinical descriptors, each classified with three types of input features) to classify a cohort of 51 women regarding individual levels of functionality (FU), rigidity (RI), joint pain (JP), pain catastrophism (CA), pain sensitization (SE), depression (DE) and synovitis (SY). Three different types of datasets are used as input features: (i) sets of knee OA-clinical features; (ii) synovial proteomics data; (iii) in-silico data regarding the activation of transcription factors from a regulatory network-based model.

Figure 1A shows the relative importance of each clinical variable when target knee OA descriptors (those that can be binarized with an $accuracy > 0.75$ on the train set) are classified: the closer the weight is to 1 (or yellow according to the colour bar employed), the more relevant the feature is for the classification. The classifications of the WOMAC domains (i.e., JP, RI and FU) are mostly influenced by CA, FU and SE, respectively ($AUC > 0.8$). DE is mostly influenced by FU, but in this case, the AUC score is lower ($AUC = 0.79$). The CA classifier has the highest AUC score ($AUC = 0.9$) and is highly influenced by FU. The inflammatory classifier (SY) is mostly influenced by RI.

Figure 2A shows the relative importance of synovial fluid (SF) molecules to classify knee OA clinical features. The classifications of the WOMAC domains (i.e., JP, RI and FU) are highly influenced by MCP1, IL-1RA and IL-6, respectively ($AUC > 0.8$). With a lower score ($AUC = 0.71$), DE categorization is equally influenced by leptin and VEGF-A. When classifying patients against CA and SE, IL-1RA and leptin are the parameters that have a higher influence, but the AUC score decreases below 0.6, reaching 0.41 for the SE classifier. SY is relatively well classified ($AUC = 0.67$), but in this case, the most influential parameter is MCP1.

Figure 3 shows the classification of knee OA clinical descriptors for n = 25 when the simulated activities of the transcription factors (TF) of the regulatory network-based model of a chondrocyte¹² are used as input features. Generally, FOXO and Sox9 are the most influential TF (see Fig. 3A). Figure 3B reveals the robust effect of FOXO in JP and FU classification, as it discriminates either the false positives ($AUC = 0.3$) or the true positives ($AUC = 0.7$) of the classifiers. The other classifications lead to AUC scores lower than 0.65: RI ($AUC = 0.62$) is greatly influenced by Sox9. The SE classifier ($AUC = 0.64$) is the only one mostly controlled by CREB. The SY classifier ($AUC = 0.5$) is mostly influenced by a set of TF that includes Sox9, CREB and FOXO among the most relevant, while the contributions of Hif2a, NF-$\kappa$B and Runx2 are largely irrelevant.

Table 1 summarizes the results of the classification of the WOMAC domain data collected in the Osteoarthritis Initiative (OAI) after binarizing them with the thresholds found with our data. The average accuracy of the six explored classifiers is 0.9794.

Table 1 Accuracy of SVM classification models using the Osteoarthritis Initiative (OAI) WOMAC data as targeted outputs. Before SVM classification, WOMAC OAI results were binarized according to the thresholds found with our cohort. The input features used were selected according to the protocol described in the Supplementary Information section “Selected features for OAI data set classification”, and listed in this same section in Additional Figures A2–A7.


Discussion

Knee OA diagnosis is influenced by subjective parameters (i.e., WOMAC domains)¹³, which impacts treatment decisions. To reduce the subjectivity in knee OA diagnosis, the present study aimed to characterize seven commonly used clinical knee OA descriptors by examining the contributions of synovial and chondrocyte-derived information. Specifically, a cohort of 51 women with Kellgren–Lawrence grades 2–3 was classified with SVM. Logistic regression would be another plausible choice and would likely yield results similar to those of the SVM, as the SVM and logistic regression loss functions are similar. However, SVM is more robust to outliers and overfitting, thanks to the tuning of parameter C. Accordingly, SVM was used to evaluate the value of biological data to inform objectively about the clinical stratification of patients³⁹.

First, we assessed how the descriptors that play a role in the knee OA clinical manifestations can be classified based on a combination of these very same descriptors. Remarkably, pain catastrophism (CA) was the leading input for the classification of the joint pain dimension (JP) of the WOMAC questionnaire (Fig. 1). Pain catastrophizing is characterized by a patient’s tendency to magnify pain stimuli and feel helpless in the context of pain⁵. It has been argued that the assessment of CA in response to a specific stimulus might account for most of the variance in pain reports^14,15,16. Accordingly, the influence of CA in our classification of knee OA descriptors suggests that JP does not accurately assess the pain due to noxious stimuli, highlighting potential bias in current diagnosis systems^17,18. At the same time, DE was the second leading feature for JP classification. A previous study demonstrated a high correlation between high WOMAC pain scores and depression¹⁹. Furthermore, knee OA patients were found to have increased negative beliefs and mental health issues²⁰. These results might emphasise a bidirectional relation between the feeling of pain and mental health: patients who develop depression might be more prone to develop low tolerance to pain (a phenomenon called central sensitization, SE) and reduce their physical activity (i.e., sedentarism or static postures). In other words, our classifications support the idea that pain perception contributes to depression and emotional distress in knee OA patients, which, in turn, feed back into pain reporting. Accordingly, SE was the leading feature for DE classification, which is consistent with previous findings: in individuals with knee OA, the presence of enlarged pain areas was associated with more persistent and severe pain, as well as higher anxiety levels. These findings were interpreted as a sign of altered central pain processing²¹. Beyond its influence on DE, SE played an important role in classifying the WOMAC domains within our cohort in general, since it proved to be one of the leading influential features (Fig. 1A). Accordingly, SE might increase WOMAC scores, indicating secondary influences in the outcome of these questionnaires.

FU also had a clear influence on the classification of RI, which matches previous findings by Wolfe et al.¹⁹. In turn, FU classification was greatly affected by SE and CA. These results suggest that the fear of movement (kinesiophobia) might impose circular interactions among cognitive factors and pain perception, as Wong et al.²² reported previously for the case of CA²³. There is a growing consensus that disability symptoms in knee OA patients rest upon various factors, including central pain processing mechanisms²⁰. As a result, a person may distance herself from activities and social situations to avoid the appearance of pain, increasing the risk of developing an unhealthy lifestyle and depression²⁴. In fact, a previous study predicted that static postures might decrease chondrocyte anabolic activity, which decreases the deposition of extracellular matrix¹².

Figure 1 further suggests that subjective clinical descriptors influence the classification of inflammation (i.e., SY). Although knee pain (JP) has been linked to joint inflammation²⁵, our findings, as illustrated in Fig. 1A, do not reflect this behaviour: RI led the way in classifying SY with AUC = 0.86. The lack of mechanistic significance of these inputs highlights the need for new objective biological markers in knee OA clinical decision-making. In this sense, serum and urine biochemical markers (in combination with magnetic resonance imaging) have been used to establish knee OA phenotype classification systems⁹. As pain might be correlated with the intensity of articular cartilage degradation and a pro-inflammatory environment²⁶, there is a growing interest in exploring the synovial fluid inflammatory molecules^10,11,27. However, to the best of our knowledge, no study has specifically explored the capacity of synovial fluid markers to increase our understanding of existing knee OA diagnosis descriptors. Hence, we explored the potential of measurable molecules in synovial fluid to provide insights into knee OA clinical features. Figure 2A reveals a high potential of leptin to explain the classification of knee OA patients according to clinical descriptors. Specifically, the relatively leading role of leptin in the good discrimination of patients regarding the SY levels ($AUC = 0.67$) suggests that the endocrine action of leptin might have an important role in joint inflammation stratification, especially in joints not affected by over-compression. Leptin is a hormone produced by white adipocytes, and it is overexpressed in individuals with metabolic syndrome. This syndrome is characterized by the presence of systemic low-grade inflammation, which has the potential to induce pathological processes in several tissues. In the specific case of chondrocytes, leptin has been linked with increased production of iNOS (the main producer of NO)²⁸. Moreover, leptin shows a synergistic effect with IL-1$\beta$ on the production of NO in human articular cartilage chondrocytes²⁹. Besides, leptin has been associated with increased production of matrix-degrading enzymes (i.e., MMP1 and MMP3)³⁰. Other studies demonstrated that leptin alone can induce the synthesis of PGE2 (a molecule associated with the modulation of inflammatory pain), IL-6, IL-8, MMP1, MMP3 and MMP13 in cartilage explants^31,32.

However, the main influencing parameter for SY and JP classification was MCP1 (Fig. 2A). This suggests that MCP1 might be used as a new biomarker to evaluate inflammation, as it efficiently classifies SY ($AUC = 0.67$, Fig. 2B). Interestingly, a previous study related structural progression and pain to the abundance of activated macrophages³³. For SY classification, VEGF-A emerges as the second most influential parameter, which is a reasonable inference: once macrophages are recruited, they stimulate VEGF production, which increases angiogenesis and synovitis. This creates a positive feedback loop where macrophage recruitment contributes to the perpetuation of joint inflammation, thereby leading to more severe forms of knee OA³⁴. Thus, our results suggest that patients with higher levels of MCP1 might be more prone to have painful forms of knee OA. MCP1 might be used as an objective factor to differentiate pain induced by sensitization from pain induced by nociception and adjust the clinical decision accordingly.

We used SF information as initial conditions for a previous regulatory network model (RNM) of articular cartilage chondrocytes to obtain patient-specific information about the TF proteomic profile. Figure 3A illustrates that the TF that are usually activated in healthy chondrocytes (i.e., FOXO, Sox9, and Cited2) contribute more to the classification tasks, while the hypertrophic (i.e., Runx2 and Hif2a) and acute inflammatory (i.e., NF-$\kappa$B) TF do not greatly affect the stratification of patients. However, AUC scores in Fig. 3B (with an overall mean of 0.56) suggest that there is not a clear pattern within knee OA patients. This fact might reflect the pleomorphic behaviour of osteoarthritic chondrocytes associated with unstructured activation of intracellular signalling pathways, which leads to a microheterogeneity of cellular reaction patterns within subjects³⁵. However, the AUC analysis supports the use of TF for the classification of JP because it discriminates false positives ($AUC = 0.3$). Previous studies with animal models demonstrate that FOXO has protective functions in the response of cartilage to joint trauma and mechanical overload, and is downregulated by pro-inflammatory mediators (i.e., TNF-$\alpha$ and IL-1$\beta$)^36,37. This and the poor influence of NF-$\kappa$B might help understand why clinical manifestations are insufficiently explained by acute nociceptive pain mechanisms (i.e., tissue damage and acute inflammatory responses). Our results highlight an important role of systemic low-grade inflammation in knee OA (overall leading role of leptin in Fig. 2). However, the relatively low AUC scores point out the need for further corroboration of this hypothesis in larger (longitudinal) cohorts.

Before performing the classification tasks, the clinical descriptors underwent a binarization process with a mean accuracy of 0.83 (see Supplementary Material 1). To validate the thresholds, we investigated their application in the baseline OAI dataset. Our findings revealed that the accuracy was above 0.95 for six of the explored binarized outputs. We also examined whether the sensitization threshold matched the threshold recommended by Pujol et al.³⁸, and both thresholds were the same (mean $accuracy = 0.67$ on the test sets).

As with any study, our investigation has limitations: the most notable is the need for confirmation of our results in a larger cohort. Besides, given the size of our data set, we implemented a nested leave-one-out validation. When applying this method, the train set remained highly consistent across folds. Arguably, this particular form of cross-validation may introduce higher variance in the error estimate, yet it represents one of the optimal strategies to take the most advantage of small data sets³⁹. Another weak point is the elevated number of features used when SF and TF are explored as inputs, which prevents the algorithm from working optimally. Besides, TF data come from a model that was not validated using in-vitro/in-vivo data. Also, we should keep in mind that we used a linear classifier, which cannot model nonlinear relationships in the data. Furthermore, the current study has a cross-sectional design, which makes it difficult to draw conclusions regarding causality between biomarkers and current knee OA diagnostic systems. However, our findings can guide the design of future studies based on the most influential associations identified.

In conclusion, to the best of our knowledge, this study is the first to explore potential biomarkers that could be related to common diagnostic criteria for knee OA, such as WOMAC scores, pain catastrophism, sensitization, and depression, to potentially increase objectivity in knee OA patient stratification. The unique application of Support Vector Machine-based classifiers in combination with a regulatory network-based model allows for a more objective understanding of knee OA based on synovial fluid biomarkers. However, it is important to acknowledge that limitations exist, and larger cohorts and longitudinal studies are needed to fully map out objective descriptors for knee OA diagnosis.

Methods

We explored how to classify subjects in a cohort of women (n = 51) with Kellgren–Lawrence grade 2–3 knee OA, described extensively in terms of inclusion and exclusion criteria by Tassani et al.⁴⁰. From those patients who presented effusion (n = 25), synovial liquid (SL) was extracted and measured through Luminex® to obtain information related to inflammatory soluble factors. Such inflammatory mediators were further used as input data for a qualitative dynamical system of a regulatory network-based model (RNM) that summarizes chondrocyte mechanobiological and intracellular activity¹². From the RNM we obtained personalized virtual information regarding the activation level of seven transcription factors (TF). An overview of the methodology followed is depicted in Fig. 4.

The primary contribution of this manuscript is the characterization of patients regarding the level of seven clinical descriptors (catastrophism (CA), depression (DE), functionality (FU), joint pain (JP), rigidity (RI), sensitization (SE), and synovitis (SY)), which were binarized for classification purposes with a Youden test. Three sets of data were used as input for the SVM-based classifiers: (i) the most appropriate set of knee OA descriptors; (ii) SL inflammatory data (n = 25); and (iii) in-silico TF activation levels. All in all, this led to the exploration of 21 classification tasks. The binary classification tasks were validated using receiver operating characteristic curve (ROC-AUC) analysis. The binarization thresholds were further validated in the baseline of the Osteoarthritis Initiative (OAI) data set.

Cohort description

At the Rheumatology/Orthopaedic Surgery Department of the Hospital del Mar (Barcelona, Spain), a cohort was built by reviewing the clinical history of patients diagnosed with knee OA. The study followed the Good Clinical Practice guidelines and the Declaration of Helsinki, and the Clinical Research Ethical Committee approved the protocol (2016/6747/I). All participants signed an approved informed consent and agreed to a washout period of 3 months for intra-articular hyaluronic acid infiltrations, 1 month for oral or intra-articular corticoids, and 1 week for nonsteroidal anti-inflammatory or opioid drugs. A total of 51 women were selected who presented radiographic signs of moderate knee OA (Kellgren–Lawrence grade of 2/3) and symptomatology (pain, dysfunction and/or effusion) in the last 3 months. Patients who presented any sign of secondary OA were excluded (meniscectomy, inflammatory or connective tissue diseases, overuse of the joint). Women with effusion (n = 25) were eligible for synovial liquid extraction⁴⁰.

Description of the output labels

The targeted OA descriptors used for the stratification of patients were:

Joint Pain (JP): it is assessed using the WOMAC pain scale, where patients rate the level of pain experienced during five daily activities (e.g., walking, sitting or standing). For each of these activities, the level of pain felt is reported on a 5-point Likert scale ranging from “none” to “extreme”⁴¹.
The functionality of the joint (FU): it contains the results of the WOMAC functional impairment questionnaire, which summarizes 17 items with 1–5 Likert scale responses.
The rigidity of the joint (RI): it is the WOMAC stiffness subtest, which contains two categories and is used to rank the rigidity of the joint.
Central pain sensitization or pain hypersensitivity (SE): defined by the International Association for the Study of Pain as “increased responsiveness of nociceptive neurons in the central nervous system to either normal or subthreshold afferent input”⁴². SE was evaluated following the protocol described in Pujol et al.³⁸.
Effusion (EF): it is a radiographic factor (echography), post-analysed and graded by an expert³⁸. It is a measure of the liquid accumulated inside the synovial joint.
Depression and anxiety (DE): it is obtained from the Hospital Anxiety and Depression Scale (HAD), which measures core symptoms of anxiety and depression without including physical symptoms⁴³.
Catastrophism (CA): based on the Pain Catastrophism Scale, it assesses catastrophic thinking related to pain with or without chronic pain. CA refers to the tendency to focus on, and magnify, pain sensations, and to feel helpless in the face of pain¹⁶.

Originally, these descriptors were collected in a discrete non-binary form, but when used as targets to predict, we converted them into binary categorical variables (i.e. targeted labels). This was done by selecting a threshold that best divided the cohort into two groups. The optimal threshold for each variable was determined through a grid search (see Supplementary Material 1) to maximise the Youden’s index:

$$\begin{aligned} J=\frac{\text {TP}}{\text {TP} + \text {FN}}+\frac{\text {TN}}{\text {TN} + \text {FP}}-1 \end{aligned}$$

(1)

TP refers to true positives, FN to false negatives, TN to true negatives, and FP to false positives. Maximizing Youden’s index is equivalent to achieving the highest value of the sum of sensitivity (first term on the right side of the equation) and specificity (second term). However, if the accuracy across the folds of the leave-one-out validation was less than 0.75 in the train set, we omitted the descriptor as an output label because we assumed that it could not be binarized properly.
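The threshold search of Eq. (1) can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the function names, the toy scores and the binary reference labels are hypothetical, and the sketch assumes that a score at or above the threshold is labelled positive.

```python
def youden_index(scores, labels, threshold):
    """Youden's J for one threshold (Eq. 1): sensitivity + specificity - 1."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score >= threshold else 0  # assumption: >= means positive
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity + specificity - 1.0


def best_threshold(scores, labels):
    """Grid search over the observed score values; ties keep the lowest threshold."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: youden_index(scores, labels, t))


# Hypothetical toy data: questionnaire scores and a binary reference label.
scores = [2, 3, 5, 6, 8, 9, 11, 12]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold = best_threshold(scores, labels)
print(threshold, youden_index(scores, labels, threshold))
```

In this toy example two thresholds reach the same J, and the sketch keeps the lower one; a real analysis would need an explicit tie-breaking rule.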

Input features selection

Three types of input features were used for the classification of 7 clinical OA descriptors. A total of 21 classification tasks were explored. The first set of input features was the continuous form of the same clinical OA characteristics. For this case, feature selection was done ahead of binarization, based on the numerical values of the input and output variables. Hence, both Pearson and Spearman correlations are good options. Yet, the Pearson correlation test was preferred because it measures linear relationships among variables, in accordance with the linearity of the kernel of the SVM. We aimed to reduce any possible redundancy of descriptors by avoiding high linear dependency relationships among the input and output sets. Specifically, we excluded those cases with Pearson correlation coefficient $(r) > 0.7$ (see Fig. A1). This ensured that the retained features had no strong linear relationship with the targeted output. To this end, a correlation matrix was generated using Python 7.29.0 (corr method from the pandas library), which helped identify the features that were highly linearly correlated with the output label.
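The correlation-based exclusion can be illustrated with a short sketch. It is written in plain Python so that it runs without dependencies, whereas the study reports using the corr method of pandas; the feature names and values are hypothetical, and applying the cutoff to the absolute value of r is an assumption of this sketch.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def select_features(features, target, cutoff=0.7):
    """Keep candidate inputs whose |r| with the target does not exceed `cutoff`."""
    kept = {}
    for name, values in features.items():
        r = pearson_r(values, target)
        if abs(r) <= cutoff:  # assumption: the cutoff applies to |r|
            kept[name] = r
    return kept


# Hypothetical continuous descriptors; `target` is the output before binarization.
target = [1, 2, 3, 4, 5, 6]
features = {
    "feature_a": [2, 4, 6, 8, 10, 12],  # perfectly correlated, so excluded
    "feature_b": [5, 1, 4, 2, 6, 3],    # weakly correlated, so kept
}
kept = select_features(features, target)
print(sorted(kept))
```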

Synovial fluid (SF) samples from 25 patients were obtained through aspiration and analysed with Luminex® technology to measure the amounts of:

Pro-inflammatory cytokines: Interleukin 6 (IL-6), Interleukin 8 (IL-8), Interleukin 4 (IL-4), Tumor necrosis factor alpha (TNF-$\alpha$), Interleukin 18 (IL-18), Interferon-gamma (IFN-$\gamma$), Interleukin 17 (IL-17). These molecules contribute to the degeneration of the articular cartilage and inflammation⁴⁴.
Interleukin-1 receptor antagonist protein (IL-1RA) inhibits the activity of the Interleukin-1$\beta$ (IL-1$\beta$) receptor.
Monocyte chemoattractant protein 1 (MCP1) recruits monocytes to inflammation sites produced by tissue injury, propagating inflammation and tissue damage⁴⁵.
Vascular endothelial growth factor A (VEGF-A) increases angiogenesis⁴⁶.
Leptin modulates the inflammatory processes and articular cartilage remodelling⁴⁷.

Articular cartilage (AC) is nourished through the SF. Hence, diffusion of the aforementioned molecules to articular cartilage influences the chondrocyte’s metabolism. Thus, the transcription factor proteomic profile might provide valuable data to characterize patients biologically. Accordingly, and because articular cartilage samples cannot be collected from the patients without causing serious damage, information from a chondrocyte RNM was used¹². The RNM models the concentration of molecules (i.e., proteins) by time-dependent variables. The synthesis of each network node (i) is regulated by rate equations of the form $\frac{\textrm{d}x_i}{\textrm{d}t}=f_i(x_n)$, which depend on the regulation nodes (n). The RNM can capture the intracellular channelling of external signals into identifiable cellular behaviours⁴⁸. In this work, we have used the patient SF as an external signal of a previously developed chondrocyte RNM¹². From the simulations, we generated personalized synthetic information about 8 TF: Activator Protein 1 (AP1); cAMP response element-binding protein (CREB); forkhead box O (FOXO); Nuclear factor kappa-light-chain-enhancer (NF-$\kappa$B); transcription factor Sox9 (Sox9); Cbp/p300-interacting transactivator 2 (CITED2); Runt-related transcription factor 2 (Runx2); Hypoxia-inducible factor 2-alpha (HIF2a). Detailed integration can be found in Additional Information and is summarized in Fig. 5.
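To make the rate-equation form $\frac{\textrm{d}x_i}{\textrm{d}t}=f_i(x_n)$ concrete, the toy sketch below integrates a two-node network with a forward-Euler scheme. Everything in it is hypothetical and chosen only for illustration: the node names, the sigmoid activation, the decay term, the weights and the stimulus values are assumptions of this sketch and do not reproduce the chondrocyte RNM of ref. 12.

```python
import math


def sigmoid(z):
    """Logistic function, bounded between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(weights, stimuli, steps=2000, dt=0.01, decay=1.0):
    """Forward-Euler integration of dx_i/dt = sigmoid(input_i) - decay * x_i.

    `weights[i][j]` is the (hypothetical) influence of node j on node i;
    `stimuli[i]` is an external signal applied to node i.
    """
    nodes = list(weights)
    x = {node: 0.0 for node in nodes}
    for _ in range(steps):
        dx = {}
        for node in nodes:
            total = stimuli.get(node, 0.0)
            total += sum(w * x[src] for src, w in weights[node].items())
            dx[node] = sigmoid(total) - decay * x[node]
        for node in nodes:  # update all nodes together after computing every rate
            x[node] += dt * dx[node]
    return x


# Hypothetical network: an external signal drives "signal", which inhibits "tf".
weights = {"signal": {}, "tf": {"signal": -4.0}}
high = simulate(weights, {"signal": 3.0})   # strong external stimulus
low = simulate(weights, {"signal": -3.0})   # weak external stimulus
print(round(high["tf"], 3), round(low["tf"], 3))
```

Under these assumptions the inhibited node settles at a lower activation when the external stimulus is strong, which is the kind of patient-specific steady-state read-out that the text describes for the TF.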

Mathematical formulation

SVM is a supervised approach for classifying⁴⁹. It divides the samples into two groups by identifying the separating hyperplane with the largest distance (margin) to the nearest training data points, called the support vectors. In this work, a binary linear SVM was used to predict a class $y_n\in \{0,1\}$ (encoded as $\pm 1$ in the margin constraints), based on the input training set $x_n\in \mathbb {R}^M,n=1,\ldots ,N,$ where M was the number of features and N the number of samples. A linear hyperplane (a line if M = 2 or a plane if M = 3) can be written as the set of points x satisfying

$$\begin{aligned} \omega ^Tx+b=0 \end{aligned}$$

(2)

where $b\in \mathbb {R}$ was the intercept (bias) and $\omega \in \mathbb {R}^M$ was the normal vector to the hyperplane. The algorithm found a prediction function that categorized most samples correctly by minimizing:

$$\begin{aligned} \min _{\omega ,b,\zeta } \frac{1}{2}\omega ^T\omega + C\sum _{n=1}^N{\zeta _n} \end{aligned}$$

(3)

subject to

$$\begin{aligned} y_n(\omega ^Tx_n + b) \ge 1-\zeta _n, \quad \zeta _n\ge 0, \quad n=1,\ldots ,N \end{aligned}$$

(4)

Here, $y_n$ was the known label corresponding to the sample $x_n$, $\zeta _n$ was the extent to which the margin constraint on $x_n$ could be violated, and C was the penalty term that controlled the number and severity of the margin violations. Given a sample x, the algorithm classified it by determining on which side of the hyperplane it lay (i.e., whether $\omega ^Tx+b$ was positive or negative).
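
To make the roles of $\zeta _n$ and C concrete, the short sketch below evaluates the soft-margin cost of Eq. (3) for a fixed hyperplane, using the smallest slack values allowed by Eq. (4). The toy data, the weights and the helper names are illustrative assumptions, and the labels are encoded as $\pm 1$; the sketch evaluates the cost only and does not solve the optimization.

```python
def decision_value(w, b, x):
    """Signed score w^T x + b; its sign gives the side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def slack(w, b, x, y):
    """Smallest zeta >= 0 satisfying y * (w^T x + b) >= 1 - zeta, with y in {-1, +1}."""
    return max(0.0, 1.0 - y * decision_value(w, b, x))

def objective(w, b, X, Y, C):
    """Soft-margin cost: 0.5 * w^T w + C * (sum of slacks)."""
    margin_term = 0.5 * sum(wi * wi for wi in w)
    return margin_term + C * sum(slack(w, b, x, y) for x, y in zip(X, Y))

# Toy data with two features; the last sample lies inside the margin
X = [[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [1.5, 1.0]]
Y = [1, 1, -1, -1]
w, b = [1.0, 1.0], -3.0  # hypothetical hyperplane, not a fitted one

slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]
cost = objective(w, b, X, Y, C=1.0)
predicted = [1 if decision_value(w, b, x) >= 0 else -1 for x in X]
```

Only the fourth sample contributes a non-zero slack: it is still classified correctly but sits closer to the hyperplane than the margin allows, so a larger C would penalize it more heavily.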

To maximize the performance of the classifier, hyperparameter tuning was done before the training step using k-fold cross-validation ($k=5$). Once the best hyperparameters were found, the SVM was built in Python (Version 3.9.7)⁵⁰. Furthermore, given the small size of the data set, leave-one-out validation was applied. Thus, we trained $n=50$ models when clinical data were used as input features, and 24 when synovial fluid and transcription factor data were used as inputs. With the linear SVM, the size of each weight $\omega _m\in \mathbb {R}$ relative to the others indicated how important the m-th feature was for the separation. With the leave-one-out validation, the mean weight $\bar{\omega }_m=\frac{\sum ^{n-1}\omega _m}{n-1}$ was obtained. The $\bar{\omega }$ values were normalized between 0 and 1 across the M features of $x_n$ for visualization purposes. Finally, the performance of each classifier was evaluated using a nested receiver operating characteristic curve (ROC-AUC) analysis, and mean AUC scores were reported⁵⁰.
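
The averaging, normalization and AUC steps can be sketched as follows. This is a minimal illustration under stated assumptions: the per-model weight vectors are made-up numbers rather than fitted values, taking the absolute value of the mean weight before min-max scaling is our assumption about how "size" is measured, and the AUC helper is a plain pairwise implementation rather than the scikit-learn routine.

```python
def mean_weights(fold_weights):
    """Average each feature's weight over the leave-one-out models."""
    n_models = len(fold_weights)
    n_features = len(fold_weights[0])
    return [sum(w[m] for w in fold_weights) / n_models for m in range(n_features)]

def min_max(values):
    """Rescale values linearly to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((1.0 if p > q else 0.5 if p == q else 0.0) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical weights of three features from three leave-one-out models
fold_weights = [[0.8, -0.2, 0.1], [1.0, -0.4, 0.1], [0.9, -0.3, 0.1]]
mean_w = mean_weights(fold_weights)
importance = min_max([abs(w) for w in mean_w])  # assumption: magnitude, not sign

# Hypothetical held-out labels and decision scores
auc = roc_auc([1, 1, 0, 0, 1, 0], [0.9, 0.8, 0.3, 0.85, 0.4, 0.2])
```

With these toy numbers the first feature receives importance 1, the third receives 0, and the AUC is 7/9 because seven of the nine positive-negative pairs are ordered correctly.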

Thresholds validation

We validated the binarization of features in the Osteoarthritis Initiative (OAI) baseline dataset using the thresholds found for the WOMAC domains (see Table A1). The OAI recruited 4796 individuals in the United States. To minimize the noise in the data, we excluded individuals with a history of surgery in either knee (i.e., arthroscopy, ligament repair, meniscectomy) and participants with fewer than 80% of variables completed, resulting in 3706 subjects. Pearson correlation was used for dimensionality reduction, removing highly correlated variables ($r>0.7$) and leaving 785 features for analysis. Finally, information gain was used to identify the 50 most important features (see Figs. A2, A3, A4, A5, A6 and A7). The information gain of each feature was calculated as the reduction in entropy of the dataset obtained by splitting on that feature⁵¹. Hyperparameter tuning was then performed, followed by SVM classification for six targeted outputs: WOMAC pain, functionality and rigidity of the right and left knee. A detailed description of this process can be found in Additional Information.
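
The two feature-reduction steps can be sketched in a few lines. The example below is a hedged illustration, not the pipeline used on the OAI data: the greedy keep-first strategy, the use of the absolute correlation, the column names and the toy values are all assumptions, and the information gain helper expects discrete feature values (continuous variables would need binning first).

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def drop_correlated(columns, threshold=0.7):
    """Keep a column only if |r| <= threshold with every column kept so far."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def information_gain(feature, labels):
    """Entropy of the labels minus the entropy remaining after splitting on the feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for f, y in zip(feature, labels) if f == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical columns: "b" is a rescaled copy of "a" and should be dropped
columns = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 1, 4, 2, 3]}
kept = drop_correlated(columns)

labels = [1, 1, 0, 0]
gain_perfect = information_gain([1, 1, 0, 0], labels)  # feature mirrors the label
gain_noise = information_gain([1, 0, 1, 0], labels)    # feature carries no information
```

Ranking features by `information_gain` and keeping the top 50 would mirror the selection described above; the greedy filter depends on column order, which is one reason this should be read as a sketch only.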

Data availability

The data that support the findings of this study are available from the Hospital del Mar, but restrictions apply to the availability of these data, which were used under the terms of the informed consent signed by the patients for the current project, and so are not publicly available. Data are, however, available from the corresponding author, Jérôme Noailly, upon reasonable request and with permission of the Ethical Committee of the Hospital del Mar. Data for the thresholds validation are available at https://nda.nih.gov/data_structure.html?short_name=clinoq01 (OAI web page).

References

Tamer, T. M. Hyaluronan and synovial joint: Function, distribution and healing. Interdiscip. Toxicol. 6, 111. https://doi.org/10.2478/INTOX-2013-0019 (2013).
Jeffery, A. E., Wylde, V., Blom, A. W. & Horwood, J. P. “it’s there and i’m stuck with it”: Patients’ experiences of chronic pain following total knee replacement surgery. Arth. Care Res. 63, 286–292. https://doi.org/10.1002/ACR.20360 (2011).
Mobasheri, A. & Batt, M. An update on the pathophysiology of osteoarthritis. Ann. Phys. Rehabil. Med. 59, 333–339. https://doi.org/10.1016/J.REHAB.2016.07.004 (2016).
Trouvin, A. P. & Perrot, S. Pain in osteoarthritis: Implications for optimal management. Joint Bone Spine 85, 429–434. https://doi.org/10.1016/J.JBSPIN.2017.08.002 (2018).
Lotz, M., Martel-Pelletier, J. & Christiansen, C. Value of biomarkers in osteoarthritis: Current status and perspectives. Ann. Rheum. Dis. 72, 1756–1763. https://doi.org/10.1136/annrheumdis-2013-203726 (2013).
Copsey, B. et al. Problems persist in reporting of methods and results for the womac measure in hip and knee osteoarthritis trials. Qual. Life Res. 28, 335–343. https://doi.org/10.1007/S11136-018-1978-1/TABLES/6 (2019).
Long, M. J., Papi, E., Duffell, L. D. & McGregor, A. H. Predicting knee osteoarthritis risk in injured populations. Clin. Biomech. 47, 87–95. https://doi.org/10.1016/J.CLINBIOMECH.2017.06.001 (2017).
Schaible, H. G. Mechanisms of chronic pain in osteoarthritis. Curr. Rheumatol. Rep. 14, 549–556. https://doi.org/10.1007/S11926-012-0279-X (2012).
Nelson, A. E. et al. A machine learning approach to knee osteoarthritis phenotyping: Data from the fnih biomarkers consortium. Osteoarth. Cartil. 27, 994–1001. https://doi.org/10.1016/J.JOCA.2018.12.027 (2019).
Donnenfield, J. I. et al. Predicting severity of cartilage damage in a post-traumatic porcine model: Synovial fluid and gait in a support vector machine. PLoS ONE 17, e0268198. https://doi.org/10.1371/JOURNAL.PONE.0268198 (2022).
Haraden, C. A., Huebner, J. L., Hsueh, M. F., Li, Y. J. & Kraus, V. B. Synovial fluid biomarkers associated with osteoarthritis severity reflect macrophage and neutrophil related inflammation. Arth. Res. Ther. 21, 1–9. https://doi.org/10.1186/S13075-019-1923-X/FIGURES/2 (2019).
Segarra-Queralt, M., Piella, G. & Noailly, J. Network-based modelling of mechano-inflammatory chondrocyte regulation in early osteoarthritis. Front. Bioeng. Biotechnol. 11, 87. https://doi.org/10.3389/FBIOE.2023.1006066 (2023).
Zuiderbaan, H. A. et al. Predictors of subjective outcome after medial unicompartmental knee arthroplasty. J. Arthroplast. 31, 1453–1458. https://doi.org/10.1016/J.ARTH.2015.12.038 (2016).
Sullivan, M. J., Stanish, W., Waite, H., Sullivan, M. & Tripp, D. A. Catastrophizing, pain, and disability in patients with soft-tissue injuries. Pain 77, 253–260. https://doi.org/10.1016/S0304-3959(98)00097-9 (1998).
Keefe, F. J. et al. The relationship of gender to pain, pain behavior, and disability in osteoarthritis patients: The role of catastrophizing. Pain 87, 325–334. https://doi.org/10.1016/S0304-3959(00)00296-7 (2000).
Quartana, P. J., Campbell, C. M. & Edwards, R. R. Pain catastrophizing: A critical review. Expert Rev. Neurother. 9, 745. https://doi.org/10.1586/ERN.09.34 (2009).
Goodin, B. R. et al. The association of greater dispositional optimism with less endogenous pain facilitation is indirectly transmitted through lower levels of pain catastrophizing. J. Pain 14, 126–135. https://doi.org/10.1016/j.jpain.2012.10.007 (2013).
Stratford, P., Kennedy, D. & Clarke, H. Confounding pain and function: The womac’s failure to accurately predict lower extremity function. Arthroplast. Today 4, 488. https://doi.org/10.1016/J.ARTD.2018.09.003 (2018).
Wolfe, F. Determinants of womac function, pain and stiffness scores: Evidence for the role of low back pain, symptom counts, fatigue and depression in osteoarthritis, rheumatoid arthritis and fibromyalgia. Rheumatology 38, 355–361. https://doi.org/10.1093/RHEUMATOLOGY/38.4.355 (1999).
López-Ruiz, M. et al. Central sensitization in knee osteoarthritis and fibromyalgia: Beyond depression and anxiety. PLoS ONE https://doi.org/10.1371/JOURNAL.PONE.0225836 (2019).
Wood, L. R., Peat, G., Thomas, E. & Duncan, R. Knee osteoarthritis in community-dwelling older adults: Are there characteristic patterns of pain location?. Osteoarth. Cartil. 15, 615–623. https://doi.org/10.1016/J.JOCA.2006.12.001 (2007).
Wong, W. S. et al. The fear-avoidance model of chronic pain: Assessing the role of neuroticism and negative affect in pain catastrophizing using structural equation modeling. Int. J. Behav. Med. 22, 118–131. https://doi.org/10.1007/S12529-014-9413-7/TABLES/5 (2015).
Helminen, E. E., Sinikallio, S. H., Valjakka, A. L., Väisänen-Rouvali, R. H. & Arokoski, J. P. Determinants of pain and functioning in knee osteoarthritis: A one-year prospective study. Clin. Rehabil. 30, 890–900. https://doi.org/10.1177/0269215515619660/FORMAT/EPUB (2016).
de Oliveira Vargas e Silva, N. C., da Silva-Gusmão-Cardoso, T., de Andrade, E. A., Battistella, L. R. & Alfieri, F. M. Pain disability and catastrophizing in individuals with knee osteoarthritis. BrJP 3, 322–327. https://doi.org/10.5935/2595-0118.20200193 (2020).
Syx, D., Tran, P. B., Miller, R. E. & Malfait, A. M. Peripheral mechanisms contributing to osteoarthritis pain. Curr. Rheumatol. Rep. 20, 1–11. https://doi.org/10.1007/S11926-018-0716-6/METRICS (2018).
Neogi, T. The epidemiology and impact of pain in osteoarthritis. Osteoarth. Cartil. 21, 1145–1153. https://doi.org/10.1016/J.JOCA.2013.03.018 (2013).
Neuman, P., Dahlberg, L. E., Englund, M. & Struglics, A. Concentrations of synovial fluid biomarkers and the prediction of knee osteoarthritis 16 years after anterior cruciate ligament injury. Osteoarth. Cartil. 25, 492–498. https://doi.org/10.1016/J.JOCA.2016.09.008 (2017).
Otero, M., Lago, R., Lago, F., Reino, J. J. G. & Gualillo, O. Signalling pathway involved in nitric oxide synthase type ii activation in chondrocytes: Synergistic effect of leptin with interleukin-1. Arth. Res. Ther. https://doi.org/10.1186/AR1708 (2005).
Otero, M., Reino, J. J. G. & Gualillo, O. Synergistic induction of nitric oxide synthase type ii: In vitro effect of leptin and interferon-gamma in human chondrocytes and atdc5 chondrogenic cells. Arth. Rheum. 48, 404–409. https://doi.org/10.1002/ART.10811 (2003).
Hui, W. et al. Leptin produced by joint white adipose tissue induces cartilage degradation via upregulation and activation of matrix metalloproteinases. Ann. Rheum. Dis. 71, 455–462. https://doi.org/10.1136/ANNRHEUMDIS-2011-200372 (2012).
Moilanen, E. et al. Leptin enhances synthesis of proinflammatory mediators in human osteoarthritic cartilage-mediator role of no in leptin-induced pge 2, il-6, and il-8 production. Med. Inflamm. https://doi.org/10.1155/2009/345838 (2009).
Koskinen, A., Vuolteenaho, K., Nieminen, R., Moilanen, T. & Moilanen, E. Leptin enhances mmp-1, mmp-3 and mmp-13 production in human osteoarthritic cartilage and correlates with mmp-1 and mmp-3 in synovial fluid from oa patients. Clin. Exp. Rheumatol. 29, 57–64 (2011).
Daghestani, H. N., Pieper, C. F. & Kraus, V. B. Soluble macrophage biomarkers indicate inflammatory phenotypes in patients with knee osteoarthritis. Arth. Rheumatol. 67, 956–965. https://doi.org/10.1002/ART.39006 (2015).
Scanzello, C. R. & Goldring, S. R. The role of synovitis in osteoarthritis pathogenesis. Bone 51, 249–257. https://doi.org/10.1016/j.bone.2012.02.012 (2012).
Aigner, T., Söder, S., Gebhard, P. M., McAlinden, A. & Haag, J. Mechanisms of disease: Role of chondrocytes in the pathogenesis of osteoarthritis-structure, chaos and senescence. Nat. Clin. Pract. Rheumatol. 3, 391–399. https://doi.org/10.1038/ncprheum0534 (2007).
Ludikhuize, J. et al. Inhibition of forkhead box class o family member transcription factors in rheumatoid synovial tissue. Arth. Rheum. 56, 2180–2191. https://doi.org/10.1002/ART.22653 (2007).
Matsuzaki, T. et al. Foxo transcription factors influence cartilage maturation, homeostasis and osteoarthritis pathogenesis by modulating autophagy and proteoglycan 4. Sci. Transl. Med. https://doi.org/10.1126/scitranslmed.aan0746 (2018).
Pujol, J. et al. Brain imaging of pain sensitization in patients with knee osteoarthritis. Pain 158, 1831–1838. https://doi.org/10.1097/J.PAIN.0000000000000985 (2017).
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer, 2001).
Tassani, S. et al. Relationship between the choice of clinical treatment, gait functionality and kinetics in patients with comparable knee osteoarthritis. Front. Bioeng. Biotechnol. 10, 202. https://doi.org/10.3389/FBIOE.2022.820186/BIBTEX (2022).
Riddle, D. L. & Perera, R. A. The womac pain scale and crosstalk from co-occurring pain sites in people with knee pain: A causal modeling study. Phys. Ther. 100, 1872. https://doi.org/10.1093/PTJ/PZAA098 (2020).
Adams, G. R. et al. Do “central sensitization” questionnaires reflect measures of nociceptive sensitization or psychological constructs? protocol for a systematic review. Pain Reports 6, e962. https://doi.org/10.1097/PR9.0000000000000962 (2021).
Stern, A. F. The hospital anxiety and depression scale. Occup. Med. 64, 393–394. https://doi.org/10.1093/OCCMED/KQU024 (2014).
Kapoor, M., Martel-Pelletier, J., Lajeunesse, D., Pelletier, J. P. & Fahmi, H. Role of proinflammatory cytokines in the pathophysiology of osteoarthritis. Nat. Rev. Rheumatol. 7, 33–42. https://doi.org/10.1038/nrrheum.2010.196 (2011).
Raghu, H. et al. Ccl2/ccr2, but not ccl5/ccr5, mediates monocyte recruitment, inflammation and cartilage destruction in osteoarthritis. Ann. Rheum. Dis. 76, 914–922. https://doi.org/10.1136/ANNRHEUMDIS-2016-210426 (2017).
Hamilton, J. L. et al. Targeting vegf and its receptors for the treatment of osteoarthritis and associated pain. J. Bone Miner. Res. 31, 911–924. https://doi.org/10.1002/JBMR.2828 (2016).
Eldjoudi, D. A. et al. Leptin in osteoarthritis and rheumatoid arthritis: Player or bystander?. Int. J. Mol. Sci. 23, 2859. https://doi.org/10.3390/ijms23052859 (2022).
Mendoza, L. & Xenarios, I. A method for the generation of standardized qualitative dynamical systems of regulatory networks. Theor. Biol. Med. Model. 3, 13. https://doi.org/10.1186/1742-4682-3-13 (2006).
Cortes, C., Vapnik, V. & Saitta, L. Support-vector networks. Mach. Learn. 20, 273–297. https://doi.org/10.1007/BF00994018 (1995).
Pedregosa, F. et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Azhagusundari, B. & Thanamani, A. S. Feature selection based on information gain. Int. J. Innov. Technol. Explor. Eng. https://doi.org/10.1016/j.asoc.2008.05.006 (2013).

Acknowledgements

This work was supported by the Catalan and Spanish Governments (2020FI b00680; STRATO PID2021126469ob-C21-2) and the European Commission (MSCA-TN-ETN-2020-Disc4All-955735, ERC-2021-CoG-O-Health-101044828). G. Piella is supported by the ICREA Academia programme.

Author information

Authors and Affiliations

BCN MedTech, Department of Engineering, Universitat Pompeu Fabra, 08018, Barcelona, Spain
Maria Segarra-Queralt, Mar Galofré, Gemma Piella & Jérôme Noailly
IMIM (Hospital del Mar Medical Research Institute), Hospital del Mar, 08003, Barcelona, Spain
Laura Tio & Jordi Monfort
Rheumatology Department, Hospital del Mar, 08003, Barcelona, Spain
Jordi Monfort & Joan Carlos Monllau
Orthopedic Surgery and Traumatology Department, Hospital del Mar, 08003, Barcelona, Spain
Joan Carlos Monllau

Contributions

J.N. conceived and designed the study. L.T. and J.M. contributed to the acquisition of data. M.S.Q. wrote the manuscript and, together with G.P., M.G. and J.N., analysed and interpreted the data. All authors contributed to the final version to be submitted and revised it for important intellectual content.

Corresponding author

Correspondence to Jérôme Noailly.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Segarra-Queralt, M., Galofré, M., Tio, L. et al. Characterization of clinical data for patient stratification in moderate osteoarthritis with support vector machines, regulatory network models, and verification against osteoarthritis Initiative data. Sci Rep 14, 11797 (2024). https://doi.org/10.1038/s41598-024-62212-x

Received: 24 October 2023
Accepted: 14 May 2024
Published: 23 May 2024
DOI: https://doi.org/10.1038/s41598-024-62212-x
