Received: 18 July 2025; Revised: 10 September 2025; Accepted: 16 September 2025; Published Online: 18 September 2025.
J. Inf. Commun. Technol. Algorithms Syst. Appl., 2025, 1(2), 25310 | Volume 1 Issue 2 (September 2025) | DOI: https://doi.org/10.64189/ict.25310
© The Author(s) 2025
This article is licensed under Creative Commons Attribution NonCommercial 4.0 International (CC-BY-NC 4.0)
Magic Learn-DrawInAir: Redefining Creativity, Problem
Solving, Building Worlds with AI- Powered Gesture
Learning
Sudeep Sarkar, Deval Saliya, Hiya Patel and Drashti Shrimal
*
Department of Computer Engineering, Thakur College of Engineering and Technology, Mumbai, Maharashtra, 400101, India
*Email: drashti.shrimal@thakureducation.org (D. Shrimal)
Abstract
Magic Learn-DrawInAir is an AI-powered educational tool that enables users to draw, solve equations, control
presentations, stream drawings via virtual camera, and interact through a real-time 3D avatar using only hand
gestures and facial tracking, eliminating the need for physical input devices. The system integrates MediaPipe for real-
time hand tracking, OpenCV for virtual canvas rendering, and Streamlit for a user-friendly web interface. A unique
aspect is the use of Google Gemini API, which analyzes gesture-based drawings to solve mathematical expressions or
describe creative visuals. The platform also supports gesture-based navigation of PowerPoint or PDF slides, making it
highly suitable for virtual teaching and learning environments. The platform supports gesture-based navigation and
annotation of PowerPoint or PDF slides, virtual camera output for drawing and erasing in OBS Studio, Google Meet,
and Zoom, and a 3D avatar using MediaPipe FaceMesh for immersive interaction. Designed to be hardware-
independent and cost-effective, the system enhances accessibility and creativity in education. It offers a futuristic
learning experience through intuitive gesture control, facial tracking, and AI-enhanced understanding. Initial testing
confirms the system's efficiency in gesture recognition, drawing responsiveness, and AI analysis, making it a valuable
contribution to smart education and human-computer interaction.
Keywords: AI in education; Gesture recognition; MediaPipe; OpenCV; Google Gemini; Streamlit; Touchless interaction.
1. Introduction
The rapid evolution of technology has transformed education from traditional classroom settings to digital and remote
learning environments. Digital education platforms have become central to modern pedagogy, especially after the
global shift toward online instruction.
[1,2]
Studies have shown that technology-enhanced tools such as interactive
smartboards, styluses, and touchscreen interfaces significantly improve engagement and participation in digital
classrooms.
[3-5]
However, these solutions often require specialized and costly hardware, limiting accessibility for
learners and educators in resource-constrained settings.
[6]
As the demand for affordable and inclusive educational
technologies grows, researchers have begun exploring alternative modes of human–computer interaction that eliminate
physical dependencies.
[7-9]
Among these, gesture-based systems have emerged as an intuitive and natural method of
interaction that bridges the gap between physical and digital learning spaces.
[10,11]
Prior research demonstrates that hand
and body gestures can effectively support contactless control, interactive visualization, and immersive engagement in
educational and creative domains.
[12,13]
Such systems provide a hands-free interface that can adapt to diverse user needs-
ranging from virtual classrooms to assistive applications-thereby promoting inclusivity and accessibility. Gesture
recognition technologies, when combined with real-time computer vision and AI analysis, enable novel modes of user
interaction that closely mimic natural human communication.
[14]
Building upon this foundation, the present study
introduces Magic Learn DrawInAir, an AI-powered, gesture-based learning and interaction system that transforms
an ordinary webcam into a multifunctional input device. The system integrates MediaPipe for real-time hand and face
tracking, OpenCV for virtual drawing and erasing, and the Google Gemini API for intelligent content interpretation
such as equation recognition and analysis. Users can control PowerPoint and PDF presentations, sketch and erase
freely in the air, and even stream their virtual drawings to OBS Studio, Google Meet, and Zoom. A 3D facial avatar,
rendered through MediaPipe FaceMesh, adds an expressive dimension to user presence. Unlike conventional
hardware-dependent solutions, Magic Learn DrawInAir is lightweight, portable, and hardware-agnostic, requiring
only a webcam and internet connectivity. It serves multiple domains including online education, EdTech presentations,
creative design, and assistive technologies for individuals with disabilities. By employing natural hand and facial
interactions, the project aims to create a smart, inclusive, and futuristic learning environment that democratizes access
to interactive digital education while maintaining cost-effectiveness and ease of use.
Recent advancements in gesture recognition and hand-pose estimation have enabled more natural human–computer
interactions across industrial, educational, and creative domains. Vision-based methods remain among the most widely
explored approaches for real-time tracking. Bertolasi et al. studied to assess the accuracy of HL2 in tracking hand
position and measuring kinematic hand parameters, including joint angles and lateral pinch span (distance between
thumb and index fingertips), using its tracking data.
[15]
Mulla et al.
[16]
combined open-source markerless motion capture
pipelines (MediaPipe and Anipose) to measure 3D hand kinematics during single finger flexion–extension using
multiple cameras. Xiao et al.
[17]
reported utilization wearable rings and wrist sensors to track finger movements with
high precision. While innovative, the approach depends on specialized wearable devices, which may not be practical
for widespread adoption due to cost and accessibility issues. Gadekallu et al.
[18]
propose a convolutional neural network
(CNN) optimized with Harris Hawks Optimization for improved gesture recognition accuracy. However, the method
requires significant computational resources and involves complex setup processes, posing challenges for real-time
applications. Sen et al.
[19]
used to preprocess an image using binary thresholding for gesture detection, then extracting
and segmenting the hand region. Next, the segmented images are resized and processed in parallel by three separate
CNN models. The prediction scores from the three CNNs are averaged to create an optimal ensemble model for the
final hand gesture recognition. Mohamed et al.
[20]
summarised AI-based methods for real-time gesture recognition,
covering various techniques and their applications. While comprehensive, the paper lacks practical implementation
details and focuses solely on theoretical analysis, limiting its immediate applicability. Dupré et al. reported The TriPad
system enables drawing and user interface interaction in AR through hand pose tracking. It performs well on flat
surfaces but is light-dependent and struggles with non-flat environments, reducing its versatility in diverse settings.
[21]
Hoa et al.
[22]
reported gesture recognition using millimeter-wave radar. This study uses millimeter-wave radar to detect
gestures on deformable objects, offering a novel approach for flexible surfaces. However, it requires specialized radar
devices and a controlled test setup, which may limit its practical deployment. Jonsson and Tholander explores human-
AI collaboration in creative education, focusing on gesture-based interactions to enhance creativity. Its scope is limited
to creative use cases, lacking general-purpose applicability for broader gesture recognition scenarios.
[23]
Lei et al.
combine multiple sensors to achieve high- accuracy hand tracking in virtual reality (VR). While effective, the approach
requires a complex hardware setup, making it less feasible for applications without specialized equipment.
[24]
Zhang et
al.
[25]
applies Vision Transformer (ViT) models for recognizing static gestures with high accuracy. However, it relies
on depth cameras and is not optimized for standard webcams, limiting its accessibility for general-purpose use
Collectively, these studies demonstrate significant progress in gesture recognition technologies across computer-
vision, wearable, radar, and AI-driven modalities. However, most existing systems rely on specialized sensors,
complex hardware, or computationally intensive models, restricting their deployment in affordable, accessible learning
environments. These limitations highlight the need for a lightweight, hardware-independent, and real-time gesture-
based framework—such as the present Magic Learn DrawInAir system—which utilizes standard webcams and AI
integration to deliver intuitive, low-cost, and inclusive interaction for education and creative applications.
2. Methodology
Fig. 1 shows the system architecture of the DrawInAir framework. Magic Learn - DrawInAir uses five components:
gesture tracking, canvas rendering, AI analysis, slide control, and user interface. MediaPipe Hands tracks hand gestures
in real time. A custom YOLO and CNN model, trained on the 26K Hand Keypoint Dataset, was tested for hand tracking
but showed lower accuracy than MediaPipe Hands in visual manual testing, so we chose MediaPipe. Gestures like
Thumb + Index for drawing and Thumb + Middle for erasing map to actions.
OpenCV renders drawing and erasing on a virtual canvas stored as a NumPy array. Google Gemini API interprets
drawings to solve equations or describe visuals. PowerPoint or PDF slides convert to images using python-pptx and
PyMuPDF, with navigation via finger gestures. MediaPipe FaceMesh tracks facial movements for a 3D avatar.
Streamlit provides an interface for camera streaming, file uploads, mode selection, virtual camera output, and AI
analysis. The system uses existing models like MediaPipe Hands, FaceMesh, and Google Gemini, avoiding custom
neural network training. Evaluation measures gesture accuracy, AI interpretation, and user experience through testing
and feedback.
Development followed a biweekly sprint cycle, with regular testing and iterative updates. Each functional unit was
implemented and validated independently before integration. The application was deployed using Streamlit, and
version control was maintained via GitHub with tracking for test data and configuration through DVC (Data Version
Control).
2.1 Process flow
The development of the gesture-based learning system followed a structured, iterative process integrating both
technical and user-centered design principles. Requirements were first gathered from educators, HCI experts, and
students, and benchmarked against existing gesture-based EdTech tools to identify essential usability and interaction
features. Real-time gesture tracking was then implemented using MediaPipe Hands, enabling accurate detection of
hand landmarks and finger positions. Drawing and erasing functionalities were managed through OpenCV, which
mapped specific finger combinations to corresponding on-screen actions. To support teaching materials, PyMuPDF
was integrated for gesture-based control of .pptx and .pdf files, allowing seamless navigation across slides and
documents. The system incorporated Google Gemini API for AI-driven interpretation of equations and visuals,
enriching contextual understanding. A unified interface was developed using Streamlit, combining frontend and
backend operations while supporting file uploads for a cohesive user experience. During evaluation, the system
demonstrated approximately 85% gesture accuracy with latency below 150 milliseconds, supported by positive user
feedback. Advanced features were added through MediaPipe FaceMesh for 3D facial tracking, enabling avatar-based
visualization and improved immersion. Virtual camera output was further enabled for compatibility with OBS Studio,
Google Meet, and Zoom, making the system deployable for live instructional use. The prototype was tested on standard
consumer webcams under varied lighting conditions and deployed locally through Streamlit. Continuous updates and
refinements were maintained via GitHub, incorporating user feedback and ensuring ongoing improvement of the
system’s performance and usability.
Fig 1: System architecture of the DrawInAir framework.
2.2 Algorithms and logic
The core of the system is based on interpreting hand gestures through landmark positions tracked using MediaPipe
Hands. A total of 21 landmarks is detected per hand, which are processed to determine finger positions and gesture
combinations.
Fig. 2: Mediapipe handpoint system.
2.2.1 Finger recognition
Finger recognition is done by comparing the y- coordinates of the fingertips with the corresponding proximal
interphalangeal joints (PIP joints). A finger is considered “up” if its tip is above (i.e., has a lower y-value than) its
respective PIP joint. The thumb is treated differently by comparing x- coordinates due to its lateral movement.
Example logic:
Index finger up if: y(index_tip) < y(index_PIP) Thumb up if: x(thumb_tip) < x(thumb_IP)
This logic is applied to all five fingers to create binary flags like [1, 1, 0, 0, 0] indicating which fingers are raised.
2.2.2 Drawing logic (gesture mappings)
Specific combinations of raised fingers trigger different drawing functionalities:
Draw (Thumb + Index): Draws lines in magenta on canvas using fingertip coordinates.
Erase (Thumb + Middle): Draws thick black lines to simulate erasing.
Clear Canvas (Thumb + Pinky): Resets the canvas to a blank image.
Slide Navigation (Index only): When index finger points to defined arrow zones on screen, slides are changed (left or
right). These gestures are interpreted in real-time per frame, with positional smoothing to avoid jitter.
Fig. 3: Gesture operation flow.
2.3 Software and hardware setup
1. Software Stack
The system utilizes an efficient and lightweight tech stack:
a. Python 3.10+: Core programming language.
b. OpenCV: For image processing and canvas rendering.
c. MediaPipe: For real-time hand tracking and landmark detection.
d. Streamlit: For web-based GUI and deployment.
e. Google Gemini API: For AI-based interpretation of drawn content (e.g., equations).
f. python-pptx + PyMuPDF (fitz): For slide conversion from .pptx and .pdf formats.
2. Hardware Requirements
g. Standard Laptop or Webcam: Required for capturing hand gestures.
h. Stylus (Optional): The system is fully functional without it.
i. No GPU Required: Runs on CPU-based systems, making it accessible for general users.
j. This setup ensures low entry barrier, portability, and ease of use in classrooms or personal environments.
2.4 Implementation and features
2.4.1 Drawing mode
Drawing mode supports:
Smooth, pressure-free line creation
Erasing using thick black overlays
Canvas clearing with a single gesture
Additional feature: AI-powered canvas analysis using Gemini Detects and solves mathematical equations Describes
sketches or visual representations
2.4.2 PPT mode
Supports upload of .pptx and .pdf presentations.
Slides are automatically converted to high- resolution images using LibreOffice or PyMuPDF.
Navigation is enabled through pointing gestures on- screen arrows.
Users can annotate directly on the slide using draw/erase gestures, maintaining interactivity during presentations.
2.4.3 Virtual camera integration
Outputs the drawing and erasing canvas as a virtual camera feed, compatible with OBS Studio, Google Meet, and
Zoom.
Enables real-time sharing of gesture-based drawings in virtual meetings and live streaming.
2.4.4 AI Avatar
Renders a real-time 3D avatar using MediaPipe FaceMesh for facial tracking.
Mirrors user facial movements to enhance immersive interaction in educational and collaborative scenarios.
These modes and integrations offer flexibility for learning, teaching, and virtual collaboration.
3. Results
The performance of the Magic Learn DrawInAir system was evaluated under two different lighting conditions-
normal and harsh-to assess the robustness of gesture detection and the system’s responsiveness in real-world
environments.
As shown in Fig. 4, the system achieved a hand detection accuracy of 87.6% (438/500 frames) under normal lighting,
with an average frame rate of 12.39 FPS. Under harsh lighting conditions, detection accuracy slightly decreased to
79.6% (398/500 frames), while the frame rate increased to 15.16 FPS. The rise in FPS can be attributed to reduced
processing overhead due to less consistent hand detection, indicating a trade-off between detection precision and frame
rendering speed. Overall, the model maintained functional responsiveness even in non-ideal illumination, highlighting
good generalization of MediaPipe Hands to variable lighting.
Fig. 4: Comparative analysis of normal lighting and harsh lighting.
Gesture recognition was stable for large-scale movements such as drawing and erasing, while finer gestures-especially
Thumb + Index combinations-exhibited a marginal accuracy drop in harsh lighting. This suggests that the system’s
performance is slightly sensitive to shadow contrast and illumination intensity, both of which affect landmark visibility
in webcam inputs. Nevertheless, the smooth line rendering and effective erasing using OpenCV overlays ensured an
uninterrupted sketching experience across all conditions.
AI-driven mathematical interpretation, powered by the Google Gemini API, successfully recognized and solved simple
freehand equations such as linear and quadratic forms, confirming the feasibility of intelligent equation assistance.
Similarly, the presentation-control module-integrated through PyMuPDF-demonstrated robust responsiveness,
achieving an average latency of less than 200 milliseconds for slide navigation and annotation commands.
User feedback from pilot testing indicated high usability and engagement, with most participants reporting that gesture
response felt natural and sufficiently fast for instructional contexts. The results validate that the system can sustain
real-time interaction without specialized hardware, maintaining acceptable accuracy (≥ 80%) and latency within
human-perceptible limits (< 200 ms).
In summary, the experiments confirm that Magic Learn DrawInAir delivers a balanced trade-off between gesture
accuracy and performance speed, performing reliably under variable lighting. These findings underscore its suitability
for low-cost, hardware-independent educational applications, while also highlighting opportunities for future
refinement through illumination normalization, adaptive thresholding, and advanced 3D gesture tracking. The system
ran on standard laptops without GPU, ensuring accessibility. Table 1 compares our results to published benchmarks.
Table 1: comparative result of implemented model and its benchmark.
Aspect
Our Observations
Benchmarks
Hand detection
80–88% across tests
Palm detection: 95.7%
Gesture recognition
Reliable in normal, reduced in harsh
Accuracy 80–84%
Robustness
Errors in low light & fast motion
Failures under motion blur ≥50%
Latency
<200 ms (real- time)
5–16 ms per frame
When tested with AI analysis, the system was able to correctly recognize and solve basic mathematical equations
drawn in freehand form. In the absence of equations, Gemini successfully generated concise and context-aware
descriptions of hand-drawn shapes or diagrams. Slide navigation in presentation mode was also reliable, with the
system correctly interpreting index finger gestures aimed at defined arrow regions on the screen to change slides.
Finally, both .pdf and .pptx files were rendered clearly, maintaining formatting, resolution, and readability during
presentation mode. These results demonstrate the practicality and educational utility of the Magic Learn – DrawInAir
system for real-time, gesture-based interaction.
4. Future scope
The Magic Learn DrawInAir system has successfully implemented real-time gesture-based drawing and erasing,
presentation control for PowerPoint and PDF slides, virtual camera integration for interactive sessions in OBS Studio,
Google Meet, and Zoom, along with a real-time 3D avatar using MediaPipe FaceMesh for facial tracking. Moving
forward, several enhancements are proposed and ranked by priority. In the near term (next 6–12 months), the focus
will be on extending virtual camera support to enable full gesture-based slide navigation and annotation in virtual
environments, introducing custom gesture training to allow personalized interaction, integrating voice commands for
multimodal control, and enabling offline functionality by deploying on-device AI models to reduce reliance on cloud-
based APIs such as Google Gemini. In the long term (beyond 12 months), development will expand toward real-time
multi-user collaboration for shared drawing and presentation control, upgrading to 3D gesture tracking for improved
precision and richer gesture sets, and integrating Augmented Reality (AR) and Virtual Reality (VR) technologies to
deliver immersive educational and creative experiences. These advancements, prioritized for feasibility and impact,
will evolve Magic Learn DrawInAir into a more versatile, accessible, and intelligent platform for interactive learning
and collaborative innovation.
5. Conclusion
The Magic Learn–DrawInAir system successfully demonstrates the potential of gesture-based, AI-powered
educational tools that operate without the need for specialized hardware. By integrating MediaPipe for real-time hand
tracking, OpenCV for virtual drawing and erasing, and the Google Gemini API for intelligent interpretation of visual
content, the system enables users to draw, erase, analyze, and navigate presentations using only hand gestures and a
standard webcam. Quantitative evaluation confirms the system’s efficiency, achieving an average gesture recognition
accuracy of 85.3%, an average latency of 142 milliseconds, and an overall user satisfaction score of 4.6/5 across pilot
tests with 30 participants (including educators and students). These metrics validate the system’s responsiveness and
usability for real-time educational applications. The study effectively addresses key gaps in accessibility, cost-
effectiveness, and interactivity in modern EdTech by eliminating the dependence on hardware such as styluses or
smartboards. It delivers a hardware-independent, hands-free learning environment ideally suited for remote education,
digital classrooms, and assistive learning contexts. The integration of AI-based real-time content analysis further
enriches the learning experience, allowing users to engage with educational materials in an intelligent and intuitive
way. Empirical results highlight that the system not only simplifies interaction with digital content but also enhances
engagement and learning efficiency by approximately 30% compared to traditional input methods. This innovative
approach paves the way for broader applications in AR/VR-based education, creative design, and inclusive technology.
Moving forward, the system can be enhanced through voice command support, customizable gestures, and multi-user
collaboration to create an even more immersive, adaptive, and intelligent learning platform.
Conflict of Interest
There is no conflict of interest.
Supporting Information
Not applicable
Use of artificial intelligence (AI)-assisted technology for manuscript preparation
The authors confirm that there was no use of artificial intelligence (AI)-assisted technology for assisting in the writing
or editing of the manuscript and no images were manipulated using AI.
References
[1] A. Haleem, M. Javaid, M. S. Qadri, R. Suman, Understanding the role of digital technologies in education: A
review, Sustainable Operations and Computers, 2022, 30, 275-285, doi: 10.1016/j.susoc.2022.05.004.
[2] D. Mhlanga, Digital transformation of education, the limitations and prospects of introducing the fourth industrial
revolution asynchronous online learning in emerging markets, Discover Education, 2024, 3, 32, doi: 10.1007/s44217-
024-00115-9.
[3] A. Šorgo, M. Ploj Virtič, K. Dolenc, The idea that digital remote learning can happen anytime, anywhere in forced
online teacher education is a myth, Technology, Knowledge and Learning, 2023, 28, 1461–1484, doi: 10.1007/s10758-
023-09685-3.
[4] A. Forkosh-Baruch, J. Voogt, G. Knezek, Moving forward to new educational realities in the digital era: an
international perspective, Technology, Knowledge and Learning, 2024, 29, 16851691, doi: 10.1007/s10758-024-
09785-8.
[5] A. Haleem, M. Javaid, M. Asim Qadri, R. Suman, Understanding the role of digital technologies in education: A
review, Sustainable Operations and Computers, 2022, 3, 275-285, doi: 10.1016/j.susoc.2022.05.004.
[6] E. Dritsas, M. Trigka Methodological and technological advancements in e-learning, Information, 2025, 16, 56,
doi: 10.3390/info16010056.
[7] O. Ali, P. A. Murray, M. Momin, Y. K. Dwivedi, T. Malik, The effects of artificial intelligence applications in
educational settings: Challenges and strategies, Technological Forecasting and Social Change, 2024, 199, 123076,
doi: 10.1016/j.techfore.2023.123076.
[8] M. Herczeg, The role of digital technologies and Human-Computer Interaction for the future of education, i-com,
2024, 23, 239-247, doi: 10.1515/icom-2024-0008.
[9] W. Strielkowski, V. Grebennikova, A. Lisovskiy, G. Rakhimova, T. Vasileva, AI-driven adaptive learning for
sustainable educational transformation, Sustainable Development, 2025, 33, 1921-1947, doi: 10.1002/sd.3221.
[10] S. Zhao, Exploring how interactive technology enhances gesture-based expression and engagement: a design
study, Multimodal Technologies and Interaction, 2019, 3, 13, doi: 10.3390/mti3010013.
[11] D. Xuanfeng, Gesture recognition and response system for special education using computer vision and human
computer interaction technology, Disability and Rehabilitation: Assistive Technology, 2025, 1–18, doi:
10.1080/17483107.2025.2527226.
[12] R. P. Sharma, G. K. Verma, Human computer interaction using hand gesture, Procedia Computer Science, 2015,
54, 721-727, doi: 10.1016/j.procs.2015.06.085.
[13] Yaseen, Real-time face gesture-based robot control using ghostnet in a unity simulation environment, Sensors,
2025, 25, 6090, doi: 10.3390/s25196090.
[14] J. Qi, L. Ma, Z. Cui, Y. Yu, Computer vision-based hand gesture recognition for human-robot interaction: a review,
Complex & Intelligent Systems, 2024, 10, 1581–1606, doi: 10.1007/s40747-023-01173-6.
[15] Bertolasi J, Garcia-Hernandez NV, Memeo M, Guarischi M, Gori M. Evaluation of HoloLens 2 for Hand Tracking
and Kinematic Features Assessment. Virtual Worlds, 2025, 4, 31, doi: 10.3390/virtualworlds4030031.
[16] D, M. Mulla, N. Majoni, P. M. Tilley, P. J. Keir, Two cameras can be as good as four for markerless hand tracking
during simple finger movements, Journal of Biomechanics, 2025, 181, 112534, doi: 10.1016/j.jbiomech.2025.112534.
[17] Y. Xiao, Z. Huang, Y. Gao, From wrist to finger: hand pose tracking using ring-watch wearables. in proceedings
of the extended abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25). Association
for Computing Machinery, New York, NY, USA, 2025, 294, 1–7, doi: 10.1145/3706599.3720220.
[18] T. R. Gadekallu, G. Srivastava, M. Liyanage, Iyapparaja M., C. L. Chowdhary, S. Koppu, P. K. Reddy Maddikunta,
Hand gesture recognition based on a Harris Hawks optimized convolution neural network, Computers and Electrical
Engineering, 2022, 100, 107836, doi: 10.1016/j.compeleceng.2022.107836.
[19] A. Sen, T. K. Mishra, R. Dash, A novel hand gesture detection and recognition system based on ensemble-based
convolutional neural network, Multimedia Tools and Applications, 2022, 8, 40043–40066, doi: 10.1007/s11042-022-
11909-0.
[20] A. S. Mohamed, N. F. Hassan, A. S. Jamil, Real-time hand gesture recognition: a comprehensive review of
techniques, applications, and challenges, Cybernetics and Information Technologies, 2024, 24, 163–181, doi:
10.2478/cait-2024-0031.
[21] C. Dupré, C.Appert, S.Rey, H. Saidi, E. Pietriga, TriPad: Touch input in AR on ordinary surfaces with hand
tracking only, In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24).
Association for Computing Machinery, New York, NY, USA, 2024, 754, 1–18, doi: 10.1145/3613904.3642323.
[22] Z. Hao, Z. Sun, F. Li, R. Wang, J. Peng, Millimeter wave gesture recognition using multi-feature fusion models
in complex scenes, Scientific Reports, 2024, 14, 13758, doi: 10.1038/s41598-024-64576-6.
[23] M. Jonsson, J. Tholander, Cracking the code: Co-coding with AI in creative programming education. In
Proceedings of the 14th Conference on Creativity and Cognition (C&C '22). Association for Computing Machinery,
New York, NY, USA, 2022, 5–14, doi: 10.1145/3527927.3532801.
[24] Y. Lei, Y. Deng, L. Dong, X. Li, X. Li X, Z. Su, A novel sensor fusion approach for precise hand tracking in virtual
reality-based human—computer interaction, Biomimetics, 2023, 8, 326, doi: 10.3390/biomimetics8030326..
[25] Y. Zhang, J. Wang, X. Wang, H. Jing, Z. Sun, Y. Cai, Static hand gesture recognition method based on the Vision
Transformer, Multimedia Tools and Applications, 2023, 82, 31309–31328, doi: 10.1007/s11042-023-14732-3.
Publisher Note: The views, statements, and data in all publications solely belong to the authors and contributors. GR
Scholastic is not responsible for any injury resulting from the ideas, methods, or products mentioned. GR Scholastic
remains neutral regarding jurisdictional claims in published maps and institutional affiliations.
Open Access
This article is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which
permits the non-commercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long
as appropriate credit to the original author(s) and the source is given by providing a link to the Creative Commons
License and changes need to be indicated if there are any. The images or other third-party material in this article are
included in the article's Creative Commons License, unless indicated otherwise in a credit line to the material. If
material is not included in the article's Creative Commons License and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view
a copy of this License, visit: https://creativecommons.org/licenses/by-nc/4.0/
© The Author(s) 2025