UIU Institutional Repository

    Towards aligning Motion, IMU, EMG and Text

    Final Thesis (6.198Mb)
    Date
    2026-05-11
    Author
    Aadeeb, Md Shadman
    Abstract
    Multimodal alignment and multimodal generation are two widely used techniques in Artificial Intelligence (AI). Multimodal alignment lets a model measure the similarity between data from two different modalities by mapping both into a shared representation of the same dimensionality. Multimodal generation, in turn, lets a model generate one type of data from another through this common representation. Together, these techniques make it possible to compare information across modalities and to translate data from one modality to another. In this research, multimodal alignment and multimodal generation were performed across four modalities: three-dimensional keypoint sequences, inertial measurement unit (IMU) signals, electromyography (EMG) signals, and text. For this purpose, four encoders and four decoders were developed so that each modality could be mapped into a shared embedding space and then reconstructed or generated across modalities. To overcome the lack of fully paired datasets, an unpaired cross-modal training method was used. This approach led to an Explainable AI-based system that provides meaningful insights into a person's movement by connecting information from multiple sources. The results showed that the proposed models performed well, especially in preserving motion-related information across modalities. With the developed encoders and decoders, the models matched or surpassed several previous benchmarks. The text generation model achieved a BERTScore of 40.57, higher than previous models. For EMG-to-pose generation, the 3D keypoint position RMSE was 0.0873, very close to the performance reported by the authors of the dataset paper. Similarly, for IMU-to-pose generation, the average rotation error was approximately 17.6589°, also close to the dataset authors' result. Further analysis revealed interesting patterns in how movement-related information is carried across modalities. In addition, embedding arithmetic showed that two embeddings could be combined to produce mixed results, suggesting that the shared embedding space learned meaningful relationships between modalities. Overall, these findings show that the proposed multimodal framework can effectively capture,
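    The comparison and embedding arithmetic described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the function names, the use of cosine similarity, the weighted-average mixing rule, and the toy three-dimensional vectors are all assumptions chosen for illustration. It only shows how two embeddings in a shared space might be compared and combined.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so only its direction matters."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Similarity of two embeddings in the shared space, in [-1, 1]."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def mix_embeddings(a, b, weight=0.5):
    """Hypothetical embedding arithmetic: weighted average, renormalized.

    The thesis may combine embeddings differently; this is an assumed rule.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    mixed = [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]
    return l2_normalize(mixed)


# Toy stand-ins for encoder outputs (e.g. one from IMU, one from text).
emb_a = [1.0, 0.0, 0.0]
emb_b = [0.0, 1.0, 0.0]

mixed = mix_embeddings(emb_a, emb_b)
print(cosine_similarity(mixed, emb_a), cosine_similarity(mixed, emb_b))
```

    With equal weights, the mixed embedding is equally similar to both inputs, which is the behaviour one would expect if a decoder were to produce a blend of the two.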
    URI
    http://dspace.uiu.ac.bd/handle/52243/3470
    Collections
    • M.Sc Thesis/Project [167]

    Copyright 2003-2017 United International University
    Contact Us | Send Feedback
    Developed by UIU CITS
     

     
