Pre-Trained Language Models Based Sequence Prediction of Wnt-Sclerostin Protein Sequences in Alveolar Bone Formation
Background and Introduction: Osteocytes, the most abundant bone cells, produce sclerostin. A predictive model of the sclerostin protein sequence can support the development of novel medications and the regeneration of alveolar bone in periodontitis and other oral bone diseases, including osteoporosis. Neural networks examine protein variants for protein engineering and predict the impact of variants on structure and function. Large language models (LLMs) and convolutional neural networks (CNNs) have been used to engineer proteins with improved function and stability. Sequence-based models, especially protein LLMs, predict variant effects, fitness, post-translational modifications, biophysical properties, and protein structure, while CNNs trained on structural data have also improved enzyme function. Whether these model families differ in their predictions or forecast similarly remains unknown. This study applies pre-trained language models to predict Wnt-sclerostin protein sequences in alveolar bone formation.

Methods: Sclerostin and related proteins were identified by UniProt ID (Q9BQB4, Q9BQB4-1, Q9BQB4-2, Q6X4U4, O75197) and quality-checked. FASTA sequences were analyzed with DeepBIO, a one-stop web service that allows researchers to build biological deep-learning architectures and uses deep learning to improve and visualize biological sequencing data. LLM-based Reformer, AAPNP, TEXTRGNN, VDCNN, and RNN\_CNN models were applied to sequence-based datasets split into training and test sets: each dataset was randomly partitioned into 1000 training and 200 testing examples to tune hyperparameters and measure performance.

Results: Reformer, AAPNP, TEXTRGNN, VDCNN, and RNN\_CNN achieved 93\%, 64\%, 51\%, 91\%, and 64\% accuracy, respectively.

Conclusion: Protein sequence-based large language models are growing rapidly, and ongoing R\&D is applying them to complex challenges.
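The random 1000/200 train-test partition described in the Methods can be sketched as below; this is a minimal illustration, not the study's actual pipeline, and the function name, seed, and synthetic sequence IDs are assumptions for demonstration only:

```python
import random

def split_sequences(sequences, n_train=1000, n_test=200, seed=0):
    """Randomly partition a list of sequences into disjoint train/test sets.

    Sizes default to the 1000 training / 200 testing split used in the study.
    The seed is illustrative, chosen only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(sequences)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]

# Toy example with synthetic sequence identifiers (not real UniProt entries).
seqs = [f"seq_{i}" for i in range(1200)]
train, test = split_sequences(seqs)
print(len(train), len(test))  # 1000 200
```

In practice the split would be applied to the FASTA records before uploading them to a service such as DeepBIO, so that hyperparameter tuning and the reported accuracies are computed on disjoint data.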