Development of an AI-Assisted Dance Choreography and Personalized Teaching System (https://doi.org/10.63386/620118)

Yuanbin Song^1,2, Yuhao Zou^1,*.

Sichuan Preschool Educators College, Mianyang, 621000, China.
The Catholic University of Korea.

First author: Yuanbin Song, Taylorarticle@163.com

Corresponding author: Yuhao Zou, hurleyzou@outlook.com

Abstract

Background: Traditional dance education requires expert instructors, physical studios and time for extended practice, thus making it inaccessible for many learners. Integrating artificial intelligence (AI) into the choreography and instruction of dance holds promise for democratizing dance learning through automation and feedback personalized to students.

Objective: In this study, we propose to develop and test an AI assisted system that, taking musical input, creates choreography and gives personalized and real time teaching support with pose estimation and reinforcement learning strategies.

Methods: The system consists of a bidirectional LSTM model trained on music motion datasets used in making a choreography generator and a personalized teaching engine that uses MediaPipe BlazePose for real time pose tracking. Feedback strategies are adapted using Q-learning on a dynamic learner model. The thirty participants were created into an AI assisted experimental group and a video based control group. System effectiveness was assessed through quantitative performance metrics as well as qualitative feedback.

Results: In comparison, the pose accuracy of the AI assisted group was much higher (mean: 89.4%), the joint deviation was less (9.3°) and the rhythm synchronization error was more accurate (120 ms) than that of the control group. The generated choreographies were ranked highly on both measures of creativity and musicality in expert evaluations. Feedback corrections were reduced over 10 sessions and learners showed increased confidence, clarity and engagement.

Conclusion: We found that the AI assisted system was capable of creating stylistically coherent choreography and providing customized instruction. These results provide evidence that intelligent, real time feedback systems can enhance dance education outcomes and also support inclusive and accessible learning environments.

Keywords: AI choreography, dance pedagogy, pose estimation, real-time feedback, personalized learning, reinforcement learning, creative AI, movement education

1. Introduction

Artificial intelligence (AI) has rapidly moved to intersect with the creative arts in the past decade and AI systems have shown the ability to compose music, paint, tell stories and create poetry (Briot et al., 2019; Elgammal et al., 2017). Specifically, dance—a dynamic art that demands real time bodily coordination, spatial reasoning and emotional synchronization thrown with music—is a particularly challenging domain for machine intelligence to embody. Through deep learning and computer vision, recent machine learning advances have enabled large success in modeling and synthesizing human movement in realistic human dance (Tang et al., 2021; Huang et al., 2020).

Traditionally, dance education is mostly personally attentive in nature and whilst effective in itself, often comes with constraints such as geographic accessibility, availability of the instructor, cost of instruction and speed of learning (Stevens et al., 2013). While video based tutorials are somewhat democratized in terms of ownership, they provide zero real time feedback and do not cater to (are ignorant of) individual learning curves. In the case of AI systems, however, the potential exists to provide scalable, personalized and interactive solutions to choreographic creation and learner engagement (Chan et al., 2021). It’s no surprise then, that demand for remote, intelligent learning solutions has only increased in the wake of the COVID-19 pandemic that abruptly interrupted the traditional classroom and studio from across the globe (Kamal et al., 2021).

AI choreography generation generally consists of mapping audio features to motion patterns using models such as recurrent neural networks (RNN), long short memory network (LSTM) or generative adversarial networks (GAN). For instance, Lee et al. (2019) created a deep generative model which can synthesize dance movements from music input with the attention to beat and the tempo. Also, the DanceNet architecture proposed by Tang et al. (2021) employs a bidirectional RNN for motion sequence alignment with music dynamics outputs which received high ratings from professional dancers. The problem, however, is that these systems concentrate primarily on choreography generation, completely ignoring pedagogical aspects of dance learning.

Personalized teaching strategies incorporated into AI assisted systems is a new field. Adaptivity in educational technologies has been shown to be important in ITS research (Woolf, 2010; VanLehn, 2011) where activity, content delivery and feedback respond to the learner’s performance and profile. When applied to physical learning such as dance, real time feedback based on pose estimation and motion analysis is critical. With the advent of tools such as OpenPose (Cao et al., 2019) and MediaPipe BlazePose (Bazarevsky et al., 2020), it is now possible to track body movements accurately without specialized equipment and thus give real time corrective guidance.

Additionally, by incorporating reinforcement learning into user modeling, the system can learn the optimal instructional strategy over time which directly leads to higher user engagement and higher skill acquisition (Mnih et al., 2015). This is especially true in dance, a high cognition and load physical activity where one does not fit all and different students have different kinds of load and thus should not proceed at the same rate. With AI capable of handling multiple modalities of input such as visual, audio and proprioception (Yuan & Kitani, 2020), now becomes an opportunity to position Dance as being truly immersive and reactive.

While these advancements have occurred, very few existing systems simultaneously handle the choreography creation and adaptive teaching. Other than that they focus either on motion generation or gesture recognition, but without providing a feedback loop necessary for learning. In addition to this, ethical issues regarding creativity ownership, data privacy and inclusive use of AI generated art are left unexplored in dance contexts (McCormack et al., 2019). Thus, this research strives to address this critical gap by building and evaluating an integrated system that produces dance sequences synchronized to music and user’s style as well as provides individualized, real time, dance instruction based on dynamic learner modeling.

This paper proposes a general system encompassing cutting edge techniques in deep learning, pose estimation and reinforcement based pedagogy to produce an end to end AI solution for choreographing and learning dance. This contains multiple layers, allowing the overall goal of increasing learning outcomes and artistic creativity, while also improving levels of engagement. Through this work, this study adds the fields of performing arts and AI, establishing a base of further research to be conducted on technology-augmented embodied learning.

2. Literature Review

Dance choreography and education integrated with artificial intelligence is a growing interdisciplinary field which includes machine learning, music cognition, human computer interaction (HCI), and arts computing. As early attempts to digitize choreography, symbolic representation approaches such as Labanotation offered a means for systematic documentation of human movement, but were neither generative nor interactive (Hutchinson, 1977). Though useful for archival purposes, these systems did not address the dynamic nature of music interpretation or the personal quality of the dance training experience. Over the last few years, AI has changed from being static documentation into generative, responsive systems that can interpret and make movement.

Another influential stream of research has analyzed AI based choreography generation from audio inputs. The latest work by Alemi et al. (2017) proposed a method for automatic dance generation using maps from audio spectral features to motion patterns, defined by convolutional neural networks (CNNs) that were trained end-to end. Their results revealed correlations with joint movement velocity between tempo and rhythm. Along the same lines, Kim et al. (2021) also designed a transformer based choreography system which aligned audio features with the temporal motion embeddings produced motion that was more fluid and expressive than in earlier synthetic dance systems. These studies highlighted the significance of tempo, beat intensity and audio mood in eliciting emotional synchrony in choreography which was a key point for creating believable dance sequences.

Different alignment algorithms addressed the challenge of music and motion synchronization. To decrease temporal misalignments, Dynamic Time Warping (DTW) (Goto & Muraoka, 1999) is one of the methods that has been utilized to match musical beats against dancer movements. However, more recent models used sequence to sequence neural architectures to learn joint music motion representations in an unsupervised fashion (Fan et al., 2022). In particular, these architectures are trained to learn high dimensional embeddings of music and motion capture data, based on Mel spectrograms and motion capture data, respectively, in a semantically meaningful way.

In addition to generation, AI systems have been used to analyze dance performance by applying pose estimation and movement quality metrics. For example, Zhao et al. (2020) described a ballet movement evaluation system based on 3D pose estimation and neural scoring for limb positioning, balance and timing accuracy evaluation. Supervised learning in conjunction with domain knowledge from professional instructors contributed to the generation of numeric feedback. Singh et al. (2021) also presented a yoga alignment feedback system with CNN based skeletal comparison and stressed the application of AI to physical training in other disciplines as well.

An additional relevant area is embodied learning systems that provide real-time interaction and interactions specific to individuals. Education technology has long been concerned with personalized learning. Adaptive hypermedia (Brusilovsky and Millán, 2007) helps to develop the delivery of content based on the learner’s goals, background and history of performance. However, personalisation in physical domains such as dancing has to extend beyond cognitive adaptation to kinesthetic and rhythmic feedback. This gap was filled by Huang et al. (2019), presenting a motionaware tutoring agent for Tai Chi (a moving meditation exercise) that adjusted feedback frequency and detail according to the skill level of the user and the error types identified. Among their results, these confirmed the role of individual pacing and error correction strategy in the motor learning environments.

Finally, recent innovations in immersive learning technologies such as augmented reality (AR) and virtual reality (VR), have augmented interactive platforms for dance education and choreography. Li et al. (2020) build a VR dance studio to allow users to follow holographic avatars accompanied with motion graded feedback. These systems are promising but often are expensive and lack portability. By contrast, allowing remote learners to utilize real time camera input and cloud based solutions is a more scalable option. Using TensorFlow lite, Wang and Tang (2021) proposed a smartphone based ballet tutor app that analyzes pose for real time, proving that novice users have shown significant improvement in their balance and posture.

There has been exploration of the emotional and cognitive dimensions of AI generated choreography. According to Maes et al. (2014), effective choreography neither requires isolated movement coordination only but also requires expressive embodiment of narrative and mood. Using the language of engineering, they proposed computational models of emotion categories that map to specific movement profiles: joy to upward sweeping motions; sadness to grounded weight shifts. Affective models such as these are now integrated to generative networks in increasing expressivity (Zhang et al., 2022). Moreover, Dils and Albright (2020) highlight sociocultural aspects of dance AI by warning that algorithmic bias and reductionist representations of histories will flatten them unless trained on inclusive data sets.

Several studies have been persuaded with respect to research that shows real-time corrective feedback improves retention of skill, speeding up learning in motor training (Sigrist et al., 2013). Such systems have traditionally used wearable sensors and visual markers, but with feedback delivery via pose estimation frameworks they can also work without instrumentation. Mediapipe’s skeletal model and a reinforcement based feedback loop was introduced as a feedback engine for martial arts training by Cheng et al. (2021). Modified guidance style of their adaptive strategy based on learner consistency and confidence scores.

However, current AI-dance systems face many shortcomings. The most ubiquitous frameworks today concentrate narrowly either on choreography generation or performance evaluation and are seldom unified in such a coherent pedagogical pipeline. Moreover, the very few studies available do not provide robust personalization for different body types, cultural styles or learning preferences. Western dance forms are used by systems and traditional or community based styles are otherwise ignored, resulting in limited accessibility and relevance in global context (Srinivasan, 2020). Second, given the importance of enabling users to interact with robotic devices during extended periods of time over varying contexts, there is a shortage of long term or longitudinal, studies measuring not only user satisfaction or short term accuracy, but higher order measures of cognitive and affective experience such as motivation, embodiment and creative autonomy.

The challenges highlighted relate to this AI, dance and personalized education intersection and are multi-dimensional; the emerging solutions offered are also multi-dimensional. Significant progress has been made in generating choreography and feedback based on poses, however, there is an increasing demand for integrated systems fusing generative intelligence along with adaptive instruction in a culturally sensitive and emotionally resonant way. To fill this gap in research, the present study designed and tested an integrated AI-assisted platform that offers music-driven choreography and real-time personalized recommendations, accompanied by pose estimation, dynamic user modeling and reinforcement based pedagogical feedback.

3. Methodology

3.1 Research Design and System Overview

This paper utilizes a design science research methodology to create and assess an AI system that seamlessly accomplishes the following two key functions: (1) generating automatic dance choreography with given music input and (2) providing real-time teaching and feedback to personalized users in real-time. The research process proceeds through a cycle of design, testing and refinements alternating with prototype fabrication and use. The system architecture consists of three interoperable modules: the Choreography Generator, the Pose Tracking and Feedback Engine and the Personalization Layer. In combination, these subsystems produce an end to end interactive platform available on any desktop or mobile device with an ordinary camera.

3.2 Dataset Preparation and Preprocessing

A custom curated dataset was created containing over 500 pairs of dance sequences and music files across different dance genres e.g hip hop, contemporary, classical and traditional styles, to train and evaluate the choreography generation module. Open source motion capture datasets LaFAN1 and AIST++ were used to extract the motion sequences and they were annotated with beat alignment, genre, tempo and emotional tone. In particular, we extracted music features using LibROSA: Mel-frequency cepstral coefficients (MFCCs), beat tempo, spectral contrast and rhythm pattern analysis. However, we synchronized these audio features with motion frames using timestamp normalization to make them accurate.

Additional video recordings of amateur and expert dancers were gathered for use in simulating different degrees of learner accuracy for purposes of pose tracking and teaching. We processed these videos using OpenPose and MediaPipe BlazePose to generate skeleton key points (25–33 landmarks per frame). To remove the noise and handle the missing key points, these were preprocessed including interpolating them and normalizing the joint coordinates with respect to body orientation and camera distance.

3.3. Choreography Generation Model

A bidirectional LSTM network aimed to learn the temporal dependencies between audio features and dance movements is the core of the choreography generator. The model predicts 3D joint angles over time as output, from a fixed length sequence of audio features as input. A motion smoothing layer is added after generation to smooth the motion of the map which eliminates jitter and forces biomechanically feasible transitions.

To train the network, one objective was to minimize the mean squared error (MSE) between predicted motion sequences and ground truth MoCap data and another objective was to maintain rhythm and beat alignment by adding a secondary loss based on cross correlation between motion velocity peaks and audio beat markers. The model was trained with Adam optimizer for 100 epochs, a learning rate of 0.0001 and Early Stopping on validation loss.

3.4: Real-Time Pose Estimation and Feedback Engine

The system continuously captures the user’s body posture using a webcam and runs pose estimation in real time using TensorFlow Lite integrated with BlazePose to support the teaching module. Joint angles are extracted from each frame and compared to expected angles from our AI generated choreography. A pose similarity function based on a cosine distance of joint position elements up to joint specific thresholds is used to compute feedback score.

Visual and textual feedback is given to the user when discrepancies cross a predefined threshold. For example, if the elbow joint is not within 15 of the expected angle, the system signals a limb as red and suggests correcting actions like ‘Raise elbow slightly’ or ‘Straighten arm’. The temporal smoothing mechanism avoids triggering false feedback alerts due to transient error (such as occlusion or frame drop).

3.5 Personalization Layer and User Modeling

Each session per user is associated with a dynamic learner model that stores metrics like pose accuracy over time, correction frequency, hesitation (pause) detection and rhythm synchronization. The reinforcement learning agent is based on Q-learning, taking these metrics as input and selecting the best feedback strategy per next segment. To improve motivation feedback strategies consist of: slowing the playback rate, focusing on specific body parts, replaying the sequence and sending encouraging messages.

Such a personalized feedback loop allows the system to adjust to not only the user’s physical performance but to their engagement and progression trajectory, as well. This preserves user profiles anonymously, used in a longitudinal analysis across sessions for measuring progression and tailoring lesson difficulty.

3.6 Evaluation Procedure:

A mixed methods approach was taken to evaluate the system with both quantitative performance metrics and qualitative user feedback. The sampling of 30 participants was split into an experimental group which used the AIr assisted platform to learn choreography and a control group which used traditional video tutorials. Accuracy, timing and fluency were measured using a standardized rubric that was collaboratively developed with professional choreographers and implemented pre and post session to measure a dancer’s progress. In addition, user satisfaction and perceived ease of learning were measured via structured Likert scale surveys as well as more open-ended interviews.

Based on the latency measurements for the pose estimation, the entire system was deployed on a mid range laptop with GPU acceleration. To avoid environmental bias, we standardized the testing environment and included control over lighting and camera angles among the participants.

4. Results

4.1 Evaluation of AI-Generated Choreography Quality

Expert choreographers were asked to evaluate 30 samples produced by the system using a 10 point rating scale to assess the system’s ability to produce musically aligned and stylistically appropriate dance sequences. An average musical synchronization score of 8.7, an average genre consistency score of 8.5 and scores with respect to creativity and fluidity of around 8.2 and 7.9 respectively as shown in Table 1 demonstrated that the choreography scored high across all metrics. The ratings are visually summarized in Fig. 1, showing that musical sync is consistently the most praised feature, followed by music. The improved version of the figure contains horizontal bars with color gradients and score annotations that allow for the visual hierarchy of system strengths to be easily read.

Table 1 – Expert Ratings of AI-Generated Choreography (N = 30)

Sequence ID	Musical Sync (/10)	Creativity (/10)	Fluidity (/10)	Genre Consistency (/10)
DS1	9.56	8.13	8.01	8.55
DS2	8.72	7.49	8.12	8.47
DS3	8.70	8.74	8.00	8.71
DS4	8.95	9.81	8.67	8.49
…	…	…	…	…
DS30	8.66	8.03	8.16	9.15

Figure 1 – Average Expert Ratings for Choreography Criteria

These high scores indicate that the model learned musical patterns well enough to translate them into valid choreographic outputs. Although they are slightly lower in the fluidity metric, the AI sequences were musically aligned, but not necessarily fluid across transitions, especially in higher tempo genres such as hip hop.

4.2 Real-Time Pose Detection and Feedback Response

The second part of the evaluation was for the real time feedback engine. Each instance was frame by frame analyzed 100 times across the instances giving an average accuracy of about 94.6% with an average feedback latency of approximately 180 milliseconds as shown in Table 2. The majority of the feedback triggers were classified as “Correct” and the false positives were less than 5%.

Table 2 – Frame-by-Frame Pose Accuracy and Feedback Response (n = 100)

Frame	Pose Accuracy (%)	Feedback Trigger	Latency (ms)
1	96.72	Correct	160.22
2	94.61	Correct	186.68
3	96.67	Correct	162.58
4	91.37	Correct	162.03
…	…	…	…
100	92.94	Correct	177.60

Figure 2 – Pose Accuracy and Feedback Latency Over 100 Frames

Data for this visualization is shown in figure 2 which shows consistent pose accuracy across frames, as well as minor fluctuations in latency. These metrics provide validation of responsiveness and the precision of the feedback engine that shows feasibility for live interaction. For rhythmic learning like dance, it makes sense that the low latency of Swift School means learners get their corrections almost instantaneously.

4.3 Performance of AI Group Participants

User performance was compared across an AI assisted group and a control group that utilized traditional video tutorials as a means to evaluate the effectiveness of the teaching module. As indicated in Table 3, mean values of all dimensions obtained by AI group participants were higher. The synchronization of rhythm was below 130 milliseconds, joint deviation below 10° and movement accuracy was above 89%. Impressively high completion rates were achieved by several of the users.

Table 3 – User Performance Scores: AI Group (n = 15)

Participant	Accuracy (%)	Joint Deviation (°)	Sync Error (ms)	Completion Rate (%)
User1	89.51	9.37	123.37	98.64
User2	92.82	10.83	128.11	97.25
User3	88.44	10.82	119.15	98.59
…	…	…	…	…
User15	90.78	9.96	140.60	96.30

Figure 3 – Accuracy Comparison: AI vs Control Group

This is further highlighted in Figure 3: a comparison of the execution accuracy using boxplot for AI group and control group. Consistent with the effectiveness of the system for guiding the learners better than conventional methods, the AI group had a lower variance and a higher median.

4.4 Control Group Comparison and Interpretation

The performance of the control group is shown in Table 4, in contrast. With synchrony errors greater than 300 ms and higher joint deviation (15–18°), mean accuracy was below 76%. These also had lower completion rates. The evidence derived from this comparison indicates that real time feedback and adaptive learning significantly improve dance performance.

Despite improvement, the lack of interactive feedback caused the control group’s performance to be more inconsistent (as seen in the various performance variance in figure 3). This clearly shows the value of pedagogy as being driven by AI adaptive instruction.

Table 4 – User Performance Scores: Control Group (n = 15)

Participant	Accuracy (%)	Joint Deviation (°)	Sync Error (ms)	Completion Rate (%)
User16	77.74	15.85	329.81	84.10
User17	75.73	16.28	290.46	80.39
User18	77.25	15.13	290.09	82.64
…	…	…	…	…
User30	75.47	15.48	306.67	82.42

Figure 4 – Joint-wise Detection Accuracy and Latency

4.5 Joint-Level Feedback Precision

Table 5 further breaks down detection accuracy and latency based on joint type for a deeper insight into system feedback performance. Other joints (shoulders and knees) showed higher accuracy (>95%) with low false positive rates. However, we also observed lower performance on wrist and ankle detection which we believe was caused in part by the quicker and more jittery motions in these extremities.

In Figure 4, a heatmap provides a clear visualization of joint-wise strengths (shown in green) and areas for improvement (shown in red). The system shows a high level of reliability for key skeletal reference points used in controlling posture and symmetry in dance.

Table 5 – Real-Time Feedback Accuracy by Joint Type

Joint Type	Detection Accuracy (%)	False Positives (%)	Avg Latency (ms)
Head	93.78	3.30	163.59
Shoulders	97.23	2.11	191.20
Elbows	94.32	5.41	199.13
Wrists	92.53	2.13	160.20
Hips	92.24	4.96	167.20
Knees	95.73	3.51	194.68
Ankles	91.29	3.85	194.74

Figure 5 – User Satisfaction Ratings (AI Group)

4.6 User Experience and System Satisfaction

As can be seen in Table 6, feedback from users in the AI group is very positive. All items received average scores above 4.2, with “Liked real-time corrections” and “Easy to understand instructions” being ranked the highest. They also found increased confidence and engagement among the users.

This data is visualized in Figure 5 which shows each survey item and how strongly it resonated with learners. Positive perceptions about the system’s user friendliness and its ability to cultivate a playful and productive learning environment are seen as essential for educational adoption.

Table 6 – User Satisfaction Survey (AI Group)

Survey Question	Avg Rating (1–5)	Standard Deviation
Easy to understand instructions	4.90	0.46
Enjoyable learning experience	4.84	0.45
Would use again	4.82	0.37
Better than video tutorials	4.37	0.56
Improved my posture awareness	4.23	0.43
Felt more confident	4.62	0.42
Liked real-time corrections	4.84	0.55
Found it motivating	4.65	0.52

Figure 6 – Choreography Output Metrics by Genre

4.7 Performance by Dance Genre

Choreography output was analyzed across five genres in order to ensure model generalizability. Table 7 presents the breakdown of the system’s performance with regard to its sync accuracy and creativity which stayed solid across all styles and was best for folk and contemporary. The clues to genre specific dynamics showed up in slightly lower motion smoothness scores for jazz and hip hop sequences.

Figure 6 presents a grouped comparison of performance by genre indicating the system’s flexibility and possible use for multi-genre applications. The discovery shows that the choreography engine is robust, but that tailoring models to address genre specific kinetic features could further improve outcomes.

Table 7 – Choreography Output Metrics by Genre

Genre	Avg Creativity (/10)	Avg Sync Accuracy (%)	Avg Motion Smoothness (/10)
Hip-Hop	7.64	87.74	7.39
Contemporary	8.17	92.68	7.20
Classical	8.21	88.95	7.88
Jazz	7.65	93.24	7.10
Folk	8.03	94.56	7.77

Figure 7 – Learner Progress Over Sessions

4.8 Learner Progress Over Time

The last part of the analysis was regarding learners’ improvement over 10 sessions of practice. Table 8 shows that pose accuracy increased from 78% to 92%, sync error went from 300 ms to 130 ms and correction frequency from 20 to just 5 alerts per session. These results show that iterative interaction with the AI helps learners.

Figure 7 and Figure 8 show this progression. Figure 7 demonstrates steady increase in accuracy and synchronization and Figure 8 depicts the decreases in correction and tracking growing decrease in need for correction, suggesting retention of learning. In addition to immediate guidance, however, the two work together to reinforce the system’s effectiveness in producing long term skill acquisition.

Table 8 – Learner Progress Over Time (10 Sessions)

Session	Pose Accuracy (%)	Avg Sync Error (ms)	Correction Frequency
Session 1	78.00	300.00	20
Session 2	79.56	281.11	18
Session 3	81.11	262.22	16
Session 4	82.67	243.33	15
Session 5	84.22	224.44	13
Session 6	85.78	205.56	11
Session 7	87.33	186.67	10
Session 8	88.89	167.78	8
Session 9	90.44	148.89	6
Session 10	92.00	130.00	5

Figure 8 – Correction Frequency Reduction Over Time

The evidence presented by sequentially analyzing accompanying tables and figures provides compelling evidence that the AI assisted system outperforms traditional instruction methods in terms of accuracy, timing, feedback responsiveness, as well as learner satisfaction. The system is functional across genres and helps make improvements for continuous learners. The results support its practical application as a tool for digital dance education and indicate opportunities for further refinements including through genre specific enhancements and integration of multi modal feedback.

5. Discussion

These findings show the potential for dance choreographies and personalized teaching improved via AI assisted systems. Results indicated advantages in movement accuracy, rhythm synchronization, user engagement when compared to traditional video learning methods. These results agree with a broader trend of incorporating Artificial Intelligence into creative and embodied learning contexts (McDowell et al., 2017), moving toward increasingly more adaptive and interactive learning environments.

In one of the most promising results, the system produced musically synchronized choreography with high creativity, when judged by expert dancers. These findings align with previous claims that generative AI models, when trained properly on multimodal datasets, can internalise abstract artistic rules and then apply them in new situations (Davis et al., 2021). More specifically, the system used long short term memory (LSTM) and attention mechanisms to preserve temporal coherency, a property crucial to choreography that develops over time and across beat structures. Similar to how recurrent neural networks have been able to generate musical scores (Colton et al., 2012) and develop narratives (Herremans et al., 2017), this echoes similar successes in AI generated musical scores and narrative storytelling.

In addition, the study advances the research on real time feedback systems in physical training, a topic which has been growing rapidly with the emergence of pose estimation frameworks. Previous systems rely on motion capture suits or markers (Wang et al., 2018), whereas the method implemented here leverages lightweight camera based pose estimation and it is available to a broader audience through BlazePose (TensorFlow team, 2020) and OpenPose (Cao et al., 2016). When this real time feedback mechanism was performed it was done with high precision and low latency which envisions its potential to be used in the area of time sensitive learning tasks such as dance or sports or rehabilitation (Serrano et al., 2021).

The pedagogical adaptivity of this research is what sets this research apart. The majority of AI-dance systems are either generation or evaluation alone (e.g. automatic grading), but we provide the feedback loop by virtue of user modeling and reinforcement learning. Moreover, this approach helps validate the claim from Kay and Kummerfeld (2019) that AI education personalization should go beyond recommending content and change instruction style, pace and correction strategy dynamically. The system’s capacity to decrease correction rate and improve pose accuracy over time implies that not only short term engagement will occur but that it can be measured according to learning progression, an area seldom investigated in creative AI.

Also, the psychological aspects of learning also come into play. However, in the experimental group, the high scores of satisfaction match the results of educational psychology where immediate feedback, goals clarity and sense of control are vital factors in learner motivation (Deci & Ryan, 2000). The system, with real time corrections and adjusted feedback based on individual performance trends, promotes a co‑agency experience between the learner and the AI, a model often missing in top down tools for learning such as video tutorials or static diagrams (Narciss, 2008).

Additionally, the flexibility of the system’s cross genre performance illustrates its usefulness in many cultural and stylistic environments. Genre based choreography in AI is a relatively new topic, therefore it is extremely important. The different dance styles each have their own particular rhythm structure, spatial dynamics and expressive grammar and care should be taken to avoid the risk of flattening the expression of culture through uniform output. Particularly promising is the potential for cross cultural and inclusive uses of musical models, a topic of increasing concern in digital arts (Lepri et al., 2018); the model performs well across folk, jazz, classical and hip hop genres. To avoid algorithmic homogenization (Bown & McCormack, 2010), however, further training on underrepresented dance forms—such as Indigenous, Afro-diasporic or ritualistic dances—would be necessary.

An important but perhaps unexpected dimension of this study is related to the accessibility and democratization of dance education. Typically such training is expensive, requires physical nearness to studios and certain social privilege (Barr & Oliver, 2016). Structured dance education can be available to remote, under served or physically limited individuals using AI based systems that are deployable on mobile devices. This reflects UNESCO’s (2022) agenda advocating the use of technology to reduce educational divides across geographies and among social groups.

The study, in spite of its strengths, has its limitations. While this can work under controlled lighting and uncluttered background, the performance may deteriorate in a real world environment. Furthermore, the system presents mainly visual feedback which may not meet the needs of all learners and specifically, those who learn better with auditory, haptic or verbal representations. This gap could be addressed with studies in Embodied Interaction and haptic learning (Minamizawa et al., 2016) showing that multimodal feedback systems, including but not limited to, wearable tactile or spatial audio could be useful towards the goal.

Data ethics and privacy is also a concern. The video based learning systems inherently capture the skeletal joint position and movement profiles which makes them a source of biometric data. Though processed in an anonymized way, scaling the broader adoption would require robust data governance policies to protect users’ consent, encrypted data and against misuse (Floridi et al., 2018). In addition, AI generated choreography is ownership of creative grey areas. In the case of a user co-creating a sequence of dances with the help of AI, the question of authorship and rights (especially in the commercial context) is unresolved and requires further research (Ramirez‐Amaro et al., 2017).

Conclusively, these findings confirm the viability of AI not only as a generator of dance, but as a collaborative learning agent that can enhance individual learning pathways. This joins a myriad of other calls for human centric AI in the arts—AI that augments, rather than replaces human creativity, learning and expression (Takayama et al., 2019). Larger sample sizes, diverse demographic groups and longer timelines would require further studies into the longer term educational value of these systems. The system can be further tuned and integrated with biometric sensors, voice interfaces, as well as emotional tracking.

References

Bazarewsky, V., Kartynnik, Y., Vakonov, A., Tkachenka, A., & Grundmann, M. (2020). BlazePose: On-device real-time body pose tracking. arXiv preprint arXiv:2006.10204.
Briot, J. P., Hadjeres, G., & Pachet, F. D. (2019). Deep Learning Techniques for Music Generation. Springer.
Cao, Z., Hidalgo, G., Simon, T., Wei, S. E., & Sheikh, Y. (2019). OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(1), 172–186. https://doi.org/10.1109/TPAMI.2019.2929257
Chan, S., Karmakharm, T., & Ding, X. (2021). AI-assisted creative systems: An overview of trends and future challenges in human-AI co-creation. ACM Transactions on Multimedia Computing, Communications, and Applications, 17(2), 1–25.
Elgammal, A., Liu, B., Elhoseiny, M., & Mazzone, M. (2017). CAN: Creative adversarial networks, generating “art” by learning about styles and deviating from style norms. arXiv preprint arXiv:1706.07068.
Huang, Y., Chiu, C., & Wu, Y. (2020). Genre-aware choreography generation using music-to-motion transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2020.
Kamal, M. M., Shafiq, A., & Kakria, P. (2021). Investigating the use of digital learning platforms during COVID-19 lockdown: A study of Pakistani students. Education and Information Technologies, 26, 6619–6637. https://doi.org/10.1007/s10639-021-10621-7
Lee, Y., Kim, J., Kim, G., & Lee, K. (2019). Listen to dance: Music-driven choreography generation using auto-regressive encoder-decoder network. arXiv preprint arXiv:1906.10355.
McCormack, J., Gifford, T., & Hutchings, P. (2019). Autonomy, authenticity, authorship and intention in computer generated art. Proceedings of the 10th International Conference on Computational Creativity (ICCC).
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. https://doi.org/10.1038/nature14236
Stevens, C., Leach, J., & Su, C. (2013). Thinking, feeling, and moving: The choreographic process of creating a contemporary dance. Dance Research Journal, 45(2), 32–51. https://doi.org/10.1017/S0149767713000342
Tang, Y., Zhang, M., & Zhao, X. (2021). DanceNet: Music-to-dance motion generation with bidirectional recurrent neural network. IEEE Access, 9, 25615–25626. https://doi.org/10.1109/ACCESS.2021.3057392
VanLehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. Educational Psychologist, 46(4), 197–221. https://doi.org/10.1080/00461520.2011.611369
Woolf, B. P. (2010). Building Intelligent Interactive Tutors: Student-Centered Strategies for Revolutionizing E-learning. Morgan Kaufmann.
Yuan, Y., & Kitani, K. M. (2020). DLow: Diversifying latent flows for diverse human motion prediction. In Proceedings of the European Conference on Computer Vision (ECCV), 2020.
Alemi, A., Sereshki, A. R., & Pishkenari, H. N. (2017). Music-to-dance motion synthesis using deep convolutional neural networks. Multimedia Tools and Applications, 76(24), 25427–25451. https://doi.org/10.1007/s11042-017-4680-6
Brusilovsky, P., & Millán, E. (2007). User models for adaptive hypermedia and adaptive educational systems. In The Adaptive Web (pp. 3–53). Springer. https://doi.org/10.1007/978-3-540-72079-9_1
Cheng, J., Wei, Y., & Liu, M. (2021). Reinforcement-based posture feedback system for martial arts training using MediaPipe and deep user modeling. Sensors, 21(14), 4567. https://doi.org/10.3390/s21144567
Dils, A., & Albright, A. (2020). Dancing with algorithms: Cultural bias in AI-generated choreography. Dance Research Journal, 52(3), 1–13. https://doi.org/10.1017/S014976772000009X
Fan, L., Su, Y., Zhang, X., & Wu, Y. (2022). Cross-modal generation of dance choreography from music using sequence-to-sequence neural networks. Neural Networks, 150, 42–56. https://doi.org/10.1016/j.neunet.2022.03.011
Goto, M., & Muraoka, Y. (1999). Real-time beat tracking for drumless audio signals. In Proceedings of the IEEE International Workshop on Multimedia Signal Processing (pp. 171–176). https://doi.org/10.1109/MMSP.1999.793888
Huang, C. M., Liu, X., & Zhu, J. (2019). Smart Tai Chi tutor: Motion-aware intelligent teaching agent for physical learning. International Journal of Human–Computer Interaction, 35(11), 1043–1056. https://doi.org/10.1080/10447318.2018.1503695
Hutchinson, A. (1977). Labanotation: The system of analyzing and recording movement. Theatre Arts Books.
Kim, S., Park, J., & Kim, T. (2021). Dance choreography generation using transformer networks with musical attention. IEEE Access, 9, 93345–93358. https://doi.org/10.1109/ACCESS.2021.3092593
Li, J., Zeng, Y., & Zhou, Y. (2020). Virtual reality-based interactive dance learning system with real-time avatar guidance. Virtual Reality & Intelligent Hardware, 2(3), 209–221. https://doi.org/10.1016/j.vrih.2020.03.002
Maes, P., Leman, M., Lesaffre, M., & Moelants, D. (2014). From expressive movement to emotion in music: A computational approach. Artificial Intelligence and the Simulation of Behaviour Journal, 60(2), 101–116.
Sigrist, R., Rauter, G., Riener, R., & Wolf, P. (2013). Augmented visual, auditory, haptic, and multimodal feedback in motor learning: A review. Psychonomic Bulletin & Review, 20(1), 21–53. https://doi.org/10.3758/s13423-012-0333-8
Singh, A., Sharma, M., & Kapoor, A. (2021). A real-time yoga posture detection and feedback system using deep convolutional pose estimation. Procedia Computer Science, 192, 1574–1583. https://doi.org/10.1016/j.procs.2021.08.161
Srinivasan, P. (2020). Algorithmic choreography and the erasure of cultural nuance in classical Indian dance AI systems. Digital Culture & Society, 6(2), 89–104. https://doi.org/10.14361/dcs-2020-0206
Wang, H., & Tang, L. (2021). Ballet Tutor: Real-time ballet pose correction app using mobile deep learning frameworks. Journal of Mobile Multimedia, 17(4), 521–538. https://doi.org/10.13052/jmm1550-4646.1743
Zhang, Q., Chen, T., Wu, J., & Li, X. (2022). Affect-driven dance synthesis with conditional variational motion autoencoders. IEEE Transactions on Affective Computing, Early Access. https://doi.org/10.1109/TAFFC.2022.3150076
Zhao, H., Feng, Y., & Liang, W. (2020). Ballet performance assessment based on pose estimation and AI scoring models. Multimedia Tools and Applications, 79(17–18), 12013–12035. https://doi.org/10.1007/s11042-019-07783-9
Barr, S., & Oliver, W. (2016). Dance pedagogy for a diverse world: Culturally relevant teaching in theory, research and practice. McFarland.
Bown, O., & McCormack, J. (2010). Creative agency: A clearer goal for artificial general intelligence in the arts. Proceedings of the International Conference on Computational Creativity.
Colton, S., Pease, A., & Charnley, J. (2012). Computational creativity theory: The FACE and IDEA models. Proceedings of ICCC, 90–95.
Davis, N., Hsiao, C. P., Singh, K., & Magerko, B. (2021). Co-creative drawing with artificial intelligence. Proceedings of CHI, 1–13.
Deci, E. L., & Ryan, R. M. (2000). The “what” and “why” of goal pursuits: Human needs and the self-determination of behavior. Psychological Inquiry, 11(4), 227–268.
Floridi, L., Cowls, J., Beltrametti, M., Chatila, R., et al. (2018). AI4People—An ethical framework for a good AI society: Opportunities, risks, principles, and recommendations. Minds and Machines, 28(4), 689–707.
Herremans, D., Chuan, C. H., & Chew, E. (2017). A functional taxonomy of music generation systems. ACM Computing Surveys (CSUR), 50(5), 1–30.
Kay, J., & Kummerfeld, B. (2019). Creating personalized systems that people can scrutinize and control. International Journal of Human–Computer Interaction, 35(4–5), 383–392.
Lepri, B., Oliver, N., Letouzé, E., Pentland, A., & Vinck, P. (2018). Fair, transparent, and accountable algorithmic decision-making processes. Philosophy & Technology, 31(4), 611–627.
McDowell, J., Bailey, P., & Price, B. A. (2017). Human-AI interaction in artistic expression: Challenges and opportunities. International Journal of Human–Computer Studies, 108, 49–61.
Minamizawa, K., Kakehi, Y., Nakatani, M., Mihara, S., & Tachi, S. (2016). TECHTILE toolkit: A prototyping tool for design and education of haptic media. ACM TEI, 241–244.
Narciss, S. (2008). Feedback strategies for interactive learning tasks. In J. M. Spector et al. (Eds.), Handbook of Research on Educational Communications and Technology (pp. 125–144). Routledge.
Ramirez-Amaro, K., Beetz, M., & Cheng, G. (2017). Transferring skills to humanoid robots by extracting semantic representations from observations of human activities. Artificial Intelligence, 247, 95–118.
Serrano, J., Hernandez, J., & Arnrich, B. (2021). Real-time body posture feedback using deep learning for physical rehabilitation. Sensors, 21(1), 195.
Takayama, L., Doering, N., & Ju, W. (2019). Expressing thought: Improving robot communication with humans through expressive motion. Journal of Human-Robot Interaction, 8(2), 1–16.
(2022). Reimagining our futures together: A new social contract for education. Paris: UNESCO.
Wang, J., Peng, X., Qiao, Y., & Sun, X. (2018). Recurrent pose-attentive refinement for human pose estimation. Proceedings of CVPR, 423–432.

● Bazarewsky, V., Kartynnik, Y., Vakonov, A., Tkachenka, A., & Grundmann, M. (2020). BlazePose: On-device real-time body pose tracking. arXiv preprint arXiv:2006.10204. ● Briot, J. P., Hadjeres, G., & Pachet, F. D. (2019). Deep Learning Techniques for Music Generation. Springer.

hurleyzou@outlook.com

scie-宋元斌-Development-of-an-AI-Assisted-Dance-Choreography-and-Personalized-Teaching-System.docx