
Understanding Artificial Intelligence

What the Artificial Intelligence Looks At:

The artificial intelligence that Pops uses employs 13 machine learning models to analyze the following criteria in four different categories:

Speech Delivery


Main Points


Vocal Style



Signaling Ending

Vocal Quality



Restatement of Thesis

Vocal Fillers

Central Idea/Thesis


Summary of Main Points

Eye Contact



Memorable Ending

Body Language




Audience Adaptation


Recommendations Provided by the Artificial Intelligence:

The AI provides feedback on several categories, including the Overall speech, Delivery, and each organizational element: Introduction, Body, and Conclusion. To give the presenter the most valuable feedback, and to enable an actionable response in the next practice session, recommendations are based not only on which specific evaluation criteria need to be improved, but also on the combination of those criteria. For example, if a presenter needs to improve both their Eye Contact and Vocal Fillers, it may be a sign they lack confidence. The AI also determines which areas need the most attention, and those areas are prioritized in the recommended actions and changes for improvement. As a result, the user is given enough information to know what to improve in the next practice session, but not so much as to be overloaded.

From there, three types of feedback are provided for each category, depending on how many evaluation criteria need improvement within that category: 1) General Rating, 2) Visual Representation, and 3) Text Recommendation.
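
The combination and prioritization logic described above can be illustrated with a minimal sketch. It is not Pops's actual implementation: the 0 to 1 score scale, the `THRESHOLD` cutoff, the `MAX_ITEMS` cap, the rule table, and the recommendation wording are all assumptions made for illustration.

```python
from itertools import combinations

# All values below are hypothetical and not taken from Pops.
THRESHOLD = 0.6   # assumed cutoff: scores below this "need improvement"
MAX_ITEMS = 3     # assumed cap so the presenter is not overloaded

# Hypothetical combination rules: a pair of weak criteria maps to one message.
COMBINATION_RULES = {
    frozenset({"Eye Contact", "Vocal Fillers"}):
        "Possible lack of confidence: rehearse until the content feels familiar.",
}


def recommend(scores):
    """Return a short, prioritized list of recommendations.

    scores maps a criterion name to a score in [0, 1], higher is better.
    """
    # Criteria that need improvement, weakest first.
    weak = sorted((name for name, s in scores.items() if s < THRESHOLD),
                  key=lambda name: scores[name])

    recommendations = []
    covered = set()

    # Recommendations triggered by a combination of weak criteria come first.
    for pair in combinations(weak, 2):
        message = COMBINATION_RULES.get(frozenset(pair))
        if message and not covered & set(pair):
            recommendations.append(message)
            covered.update(pair)

    # Then the remaining weak criteria, one recommendation each.
    for name in weak:
        if name not in covered:
            recommendations.append(f"Work on {name}.")

    return recommendations[:MAX_ITEMS]
```

Calling `recommend({"Eye Contact": 0.3, "Vocal Fillers": 0.4, "Body Language": 0.5})` under these assumptions yields the combined confidence message followed by a single Body Language item, rather than three separate notes.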


Pose Estimation

A heatmap of the human body is provided, showing the 20+ key points that represent the joints of the human pose and the magnitude of movement of each.

Nose, Neck, Right Shoulder, Right Elbow, Right Wrist, Left Shoulder, Left Elbow, Left Wrist, Mid Hip, Right Hip, Right Knee, Right Ankle, Left Hip, Left Knee, Left Ankle, Right Eye, Right Ear, Left Ear, Left Small Toe, Left Heel, Right Big Toe, Right Small Toe, Right Heel
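
One plausible way to obtain a per-joint movement magnitude for such a heatmap is to sum the distance each key point travels between consecutive video frames. The sketch below assumes that definition and assumes the pose estimator returns one `(x, y)` coordinate per detected joint per frame; the page does not state how Pops actually computes the magnitude.

```python
import math


def movement_magnitude(frames):
    """Total distance traveled by each joint across consecutive frames.

    frames is a list with one dict per video frame, mapping a joint name
    (for example "Right Wrist") to its (x, y) position in that frame.
    """
    totals = {}
    for prev, curr in zip(frames, frames[1:]):
        for joint, (x1, y1) in curr.items():
            if joint not in prev:
                continue  # joint was not detected in the previous frame
            x0, y0 = prev[joint]
            step = math.hypot(x1 - x0, y1 - y0)  # Euclidean distance moved
            totals[joint] = totals.get(joint, 0.0) + step
    return totals
```

A joint with a large total (for instance a wrist during heavy gesturing) would be drawn as a hotter region than a joint that stays nearly still.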


Speech Metrics

Speech metrics provide quantitative descriptors of several aspects of the presentation, including the number of words, total presentation time, speaking rate, and vocal filler percentage (total fillers divided by total words). Presentation length is measured against the minimum and maximum time parameters entered by the user during the session creation process. The word count and speaking rate are determined from speech-to-text conversion: the transcript gives an exact count of words, and the word count divided by the time in minutes gives words per minute (WPM).
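
The arithmetic behind these metrics is simple enough to sketch. The function below is illustrative only: its name, its parameters, and the "too short" / "too long" labels are assumptions, and the filler count is taken as an input rather than detected.

```python
def speech_metrics(transcript, duration_seconds, filler_count,
                   min_seconds, max_seconds):
    """Compute word count, WPM, filler percentage, and a length check."""
    word_count = len(transcript.split())
    minutes = duration_seconds / 60

    # Speaking rate: words divided by time in minutes.
    wpm = word_count / minutes if minutes > 0 else 0.0

    # Vocal filler percentage: total fillers divided by total words.
    filler_pct = 100 * filler_count / word_count if word_count else 0.0

    # Compare length against the limits entered at session creation.
    if duration_seconds < min_seconds:
        length = "too short"
    elif duration_seconds > max_seconds:
        length = "too long"
    else:
        length = "within range"

    return {
        "word_count": word_count,
        "wpm": wpm,
        "filler_pct": filler_pct,
        "length": length,
    }
```

For a 300-word transcript delivered in 120 seconds with 15 fillers, this gives 150 WPM and a filler percentage of 5%.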

The vocal filler ("uhm", "so", "ya' know", etc.) feedback is one of the most difficult measures to assess. Vocal fillers are also identified by converting speech to text, then analyzing the frequency and placement of words within sentences. While the algorithms are capable of returning the exact words they determine to be fillers and the frequency of each, it was decided that the most useful feedback is the overall percentage of vocal fillers, so that is what is returned.
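
To make the idea concrete, here is a deliberately simplified sketch that counts fillers by matching a fixed word list against the transcript. This is a stand-in technique, not the approach described above: it ignores word placement, so it cannot tell when a context-dependent word such as "so" is a filler, and the `FILLERS` list is an assumption.

```python
import re
from collections import Counter

# Assumed filler list; context-dependent words such as "so" are left out
# because a fixed list cannot tell when they are used as fillers.
FILLERS = ("you know", "ya know", "uhm", "um", "uh")


def count_fillers(transcript):
    """Return (per-filler counts, filler percentage of total words)."""
    text = transcript.lower()
    text = re.sub(r"['’]", "", text)       # "ya' know" -> "ya know"
    text = re.sub(r"[^a-z\s]", " ", text)  # other punctuation becomes a space
    words = text.split()

    counts = Counter()
    i = 0
    while i < len(words):
        for filler in FILLERS:
            parts = filler.split()
            if words[i:i + len(parts)] == parts:
                counts[filler] += 1
                i += len(parts)  # skip past the matched filler
                break
        else:
            i += 1  # no filler starts at this word

    total_fillers = sum(counts.values())
    pct = 100 * total_fillers / len(words) if words else 0.0
    return counts, pct
```

Note that a two-word filler such as "ya know" counts as one filler but contributes two words to the denominator; whether Pops counts it that way is not stated.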

