Projecting New Points onto a Mediapipe Face Mesh

 


Alright, so let me start by saying: I won't be sharing any of the actual code from this project. While that might be disappointing, I'm not entirely sure to what extent I'm allowed to write and share code on this topic, due to non-compete clauses I may have signed while working with my previous employer, and I'd rather not risk crossing any lines. I will, however, talk about a really cool problem that I was able to solve, lay out the high-level approach I took, and include a few generic sketches of the standard, publicly documented techniques involved.


If you're familiar with Google's Mediapipe, you may know that it provides a powerful framework for detecting vertices on the surface of a subject's face. One limitation, however, is that the mesh Mediapipe produces stops about halfway up the forehead. For the project I was working on at the time, that wasn't enough; we needed vertices extending to the top of the forehead as well. After some amount of problem solving and engineering, I was able to develop an approach that produced the following results (Mediapipe's default output in blue, new points in red).



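If you haven't used it before, pulling the default vertices out of Mediapipe takes only a few lines with its standard Python solution API. This sketch assumes the mediapipe and opencv-python packages, and "face.jpg" is just a placeholder path:

    import cv2
    import mediapipe as mp

    image = cv2.imread("face.jpg")  # placeholder input image
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True) as face_mesh:
        # Mediapipe expects RGB input; OpenCV loads images as BGR
        results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

    # 468 landmarks, each with normalized x/y and a relative depth z
    mesh_points = [(lm.x, lm.y, lm.z)
                   for lm in results.multi_face_landmarks[0].landmark]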
Now, to my knowledge, Mediapipe uses pretty cutting-edge machine learning techniques to produce its output. We really didn't want to deal with labeling new data and retraining a model just to get the additional points we needed, so I was given the task of coming up with a quicker solution. Let's talk about what I did.

It started with downloading a stock 3D model of a human head.



What I did first was rotate, scale, and translate this model so that it was aligned along the same axes as a mesh produced by Mediapipe, and sat in roughly the same location. I then opened it in Blender so that I could see the indices of its vertices, and mapped a set of vertices in the lower-forehead region to the set of points on the Mediapipe mesh representing the same area. The idea was that although the overall shape of a head differs wildly between individuals, the patch of skin covering the lower forehead in particular probably doesn't.
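As a rough illustration of that correspondence (the model-side indices below are placeholders -- the real ones depend entirely on which stock model you use, and I picked mine out by hand in Blender):

    import numpy as np

    # Hypothetical correspondence between lower-forehead vertices on the
    # stock head model and the Mediapipe landmarks covering the same patch.
    HEAD_MODEL_IDS = [1021, 1043, 1057, 1062, 1078]  # placeholder indices
    MEDIAPIPE_IDS = [10, 109, 338, 67, 297]          # forehead-area landmarks

    # head_vertices: a (V, 3) array of the stock model's vertices, loaded
    # from the model file and already roughly aligned as described above;
    # mesh_points: the Mediapipe landmarks from earlier, as a numpy array.
    source = np.asarray(head_vertices)[HEAD_MODEL_IDS]
    target = np.asarray(mesh_points)[MEDIAPIPE_IDS]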

Having two sets of points that map to one another, I was able to use Procrustes analysis to further adjust the model's points to match the Mediapipe mesh of the individual in the frame. While doing this, I made sure to handle the X and Y scaling separately.
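Here's one way to sketch that fit with numpy: an SVD-based (Kabsch) rotation, followed by an independent least-squares scale per axis. This is my generic reconstruction of the idea, not code from the project:

    import numpy as np

    def fit_procrustes(source, target):
        """Fit source onto target: rotation via SVD, then a separate
        least-squares scale factor per axis."""
        mu_s, mu_t = source.mean(axis=0), target.mean(axis=0)
        S, T = source - mu_s, target - mu_t

        # Optimal rotation (Kabsch algorithm), guarding against reflections
        U, _, Vt = np.linalg.svd(S.T @ T)
        if np.linalg.det(U @ Vt) < 0:
            U[:, -1] *= -1
        R = (U @ Vt).T

        # A separate scale per axis instead of one uniform Procrustes scale
        rotated = S @ R.T
        scale = (rotated * T).sum(axis=0) / (rotated ** 2).sum(axis=0)
        return mu_s, R, scale, mu_t

    def apply_transform(points, mu_s, R, scale, mu_t):
        return (points - mu_s) @ R.T * scale + mu_t

Strictly speaking, solving for the rotation first and the per-axis scales after isn't a joint optimum, but for a small patch like the lower forehead it gets close enough.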

Here's where the cool part happens: performing this analysis yields a transformation that takes one set of points to the other. We then apply that same transformation to a second, handpicked set of points on the surface of the 3D model, representative of the upper forehead.
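Continuing the sketch above, with another set of placeholder indices standing in for the handpicked upper-forehead vertices:

    # Hypothetical handpicked vertices on the model's upper forehead
    UPPER_FOREHEAD_IDS = [1101, 1105, 1110, 1118]

    mu_s, R, scale, mu_t = fit_procrustes(source, target)
    new_points = apply_transform(
        np.asarray(head_vertices)[UPPER_FOREHEAD_IDS], mu_s, R, scale, mu_t)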

In other words, what I did was take the 3D model, force its shape to fit the Mediapipe mesh, and then treat the points that fall off of the Mediapipe mesh as though they didn't.

It's important to note that there is no geometric or anatomical grounding to how these points are projected -- the upper forehead is never actually detected, so in the case of a very uncommonly shaped skull, the process kind of falls apart. However... it worked well for hundreds of subjects. It did what we needed it to do, and I feel it was kind of clever and worth sharing.
