Motion capture and activity recognition for multiple people using a Bayesian approach

Marcos, A.; Pizarro, D.; Marrón, M.; Mazo, M.

doi:10.1016/j.riai.2013.03.007

Summary

We present a general method for the non-invasive detection of the body posture of several people from information captured by multiple cameras. The approach first trains an articulated model and then tracks it. The main contribution is the ability to detect several people simultaneously. An articulated model defines the postures a person can adopt, and a set of predefined classes, or activities, is selected from motion capture databases. Training reduces the complexity of the articulated model using non-linear dimensionality reduction techniques, so that a person's different activities are described compactly by a set of low-dimensional values. A mixed particle filter (discrete and continuous states) is then used to detect the posture and the type of motion simultaneously. The resulting hypotheses, selected automatically from the particle distribution, are refined with a non-linear optimizer that uses priors on the trained type of motion. The proposal was evaluated with a simple but standard method based on comparing articulated cylindrical volumes with volumes of the human body extracted automatically from the images. It achieves an accuracy close to that of state-of-the-art works that do not consider more than one person, and it offers a flexible framework for future research.

Abstract

This work presents a general framework for simultaneously tracking the body postures of multiple people from non-intrusive visual sensors. The method follows a training-then-tracking philosophy, with the main addition of being able to handle more than one person. We train the body posture model from labelled motion capture datasets. The training process is based on popular non-linear dimensionality reduction techniques. Then, a mixed-state particle filter, with discrete and continuous states, is used to simultaneously detect the posture and the kind of motion performed by each of the human bodies. The resulting hypotheses, automatically selected from the particle distribution, are then refined using non-linear optimization methods with statistical priors. The whole framework is tested using a simple but standard method based on comparing articulated cylindrical models with shape-from-silhouette (SfS) volumes obtained from several cameras. Our accuracy on publicly available datasets is close to that of state-of-the-art works that do not take multiple people into account.

Keywords:

Computer vision

Tracking applications

Multidimensional systems

Stereo vision
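
The abstract describes a mixed-state particle filter in which every hypothesis carries a discrete activity label together with a continuous low-dimensional state. The Python below is a minimal sketch of that idea on a toy one-dimensional problem, not the authors' implementation. The two activities, their drifts, the transition matrix, the Gaussian likelihood and all function names (`pf_step`, `estimate`, `run_demo`) are assumptions made for illustration; the paper's pose space, trained dynamics and volume-based likelihood are not reproduced here.

```python
import math
import random


def pf_step(particles, weights, observation, transition, drift, rng,
            process_std=0.1, obs_std=0.2):
    """One resample/predict/update step of a toy mixed-state particle filter.

    particles:  list of (label, state) pairs; label is a discrete activity
                index, state is a single continuous value (assumed 1-D).
    weights:    normalised particle weights from the previous step.
    transition: row-stochastic matrix of activity switching probabilities.
    drift:      per-activity drift of the continuous state (toy dynamics).
    """
    n = len(particles)
    num_labels = len(transition)

    # Resample: draw particles in proportion to their previous weights.
    resampled = rng.choices(particles, weights=weights, k=n)

    predicted, log_w = [], []
    for label, state in resampled:
        # Discrete prediction: sample the next activity from the transition row.
        label = rng.choices(range(num_labels), weights=transition[label], k=1)[0]
        # Continuous prediction, conditioned on the newly sampled activity.
        state = state + drift[label] + rng.gauss(0.0, process_std)
        predicted.append((label, state))
        # Update: Gaussian log-likelihood of the observation given the state.
        log_w.append(-0.5 * ((observation - state) / obs_std) ** 2)

    # Normalise in a numerically safe way (subtract the largest log-weight).
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return predicted, [v / total for v in w]


def estimate(particles, weights, num_labels):
    """Return the most probable activity and the weighted mean state for it."""
    posterior = [0.0] * num_labels
    for (label, _), w in zip(particles, weights):
        posterior[label] += w
    best = max(range(num_labels), key=posterior.__getitem__)
    state = sum(w * s for (l, s), w in zip(particles, weights) if l == best)
    return best, state / posterior[best], posterior


def run_demo(steps=20, n=500, seed=0):
    """Track a target that always performs activity 1 (drift +0.5 per step)."""
    rng = random.Random(seed)
    transition = [[0.9, 0.1], [0.1, 0.9]]  # assumed "sticky" activities
    drift = [-0.5, 0.5]                    # assumed per-activity motion
    particles = [(rng.randrange(2), rng.gauss(0.0, 0.1)) for _ in range(n)]
    weights = [1.0 / n] * n
    truth = 0.0
    for _ in range(steps):
        truth += drift[1]
        observation = truth + rng.gauss(0.0, 0.05)
        particles, weights = pf_step(particles, weights, observation,
                                     transition, drift, rng)
    best, state, _ = estimate(particles, weights, num_labels=2)
    return best, state, truth
```

Calling `run_demo()` returns the estimated activity, the estimated continuous state and the simulated ground truth. Summing the weights per label, as `estimate` does, is one simple way to select a hypothesis from the particle distribution; the refinement step with a non-linear optimizer mentioned in the abstract is left out of this sketch.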

Full text

Uncited references

Bartoli et al., 2010, Bernardo et al., 1992, Bottino and Laurentini, 2001, Chu et al., 2003, Doucet, 1997, Gall et al., 2010a, Gall et al., 2009, Gall et al., 2010b, Grochow et al., 2004, Isard and Blake, 1998, Lawrence, 2005, Liu et al., 2011, Marquardt, 1963, Shotton et al., 2011, Sigal et al., 2010, Urtasun, 2006, Wang et al., 2006 and Williams, 1998.

References

[Bartoli et al., 2010]

A. Bartoli, D. Pizarro, M. Loog.

Stratified Generalized Procrustes Analysis.

British Machine Vision Conference, (2010)

[Bernardo et al., 1992]

J. Bernardo, J. Berger, A. Dawid, A. Smith.

Some Bayesian Numerical Analysis.

Bayesian Statistics, 4 (1992)

[Bottino and Laurentini, 2001]

A. Bottino, A. Laurentini.

A silhouette based technique for the reconstruction of human movement.

Computer Vision and Image Understanding, 83 (2001), pp. 79-95

[Chu et al., 2003]

C. Chu, O. Jenkins, M. Mataric.

Markerless kinematic model and motion capture from volume sequences, (2003)

[Doucet, 1997]

A. Doucet.

Monte-Carlo methods for Bayesian estimation of Hidden-Markov models. Application to Radiation Signal.

Ph.D. thesis, University of Paris-Sud, France, (1997)

[Gall et al., 2010a]

J. Gall, B. Rosenhahn, T. Brox, H. Seidel.

Optimization and filtering for human motion capture.

International Journal of Computer Vision, 87 (2010), pp. 75-92

[Gall et al., 2009]

J. Gall, C. Stoll, E. De Aguiar, C. Theobalt, B. Rosenhahn, H. Seidel.

Motion capture using joint skeleton tracking and surface estimation, (2009)

[Gall et al., 2010b]

J. Gall, A. Yao, L. Van Gool.

2d action recognition serves 3d human pose estimation.

Computer Vision – ECCV 2010, (2010), pp. 425-438

[Grochow et al., 2004]

K. Grochow, S. Martin, A. Hertzmann, Z. Popović.

Style-based inverse kinematics.

ACM Transactions on Graphics (TOG), 23 (2004), pp. 522-531

[Isard and Blake, 1998]

M. Isard, A. Blake.

A mixed-state condensation tracker with automatic model-switching.

Sixth International Conference on Computer Vision, IEEE, (1998), pp. 107-112

[Lawrence, 2005]

N. Lawrence.

Probabilistic non-linear principal component analysis with Gaussian process latent variable models.

The Journal of Machine Learning Research, 6 (2005), pp. 1783-1816

[Liu et al., 2011]

Y. Liu, C. Stoll, J. Gall, H. Seidel, C. Theobalt.

Markerless motion capture of interacting characters using multi-view image segmentation.

Computer Vision and Pattern Recognition, (2011)

[Marquardt, 1963]

D. Marquardt.

An algorithm for least-squares estimation of nonlinear parameters.

Journal of the Society for Industrial and Applied Mathematics, 11 (1963), pp. 431-441

[Shotton et al., 2011]

J. Shotton, A. Fitzgibbon, M. Cook, T. Sharp, M. Finocchio, R. Moore, A. Kipman, A. Blake.

Real-time human pose recognition in parts from single depth images.

CVPR, (2011)

[Sigal et al., 2010]

L. Sigal, A. Balan, M. Black.

Humaneva: Synchronized video and motion capture dataset and baseline algorithm for evaluation of articulated human motion.

International Journal of Computer Vision, 87 (2010), pp. 4-27

[Urtasun, 2006]

R. Urtasun.

Motion Models for Robust 3D Human Body Tracking.

Ph.D. thesis, École Polytechnique Fédérale de Lausanne, (2006)

[Wang et al., 2006]

J. Wang, D. Fleet, A. Hertzmann.

Gaussian process dynamical models.

Advances in neural information processing systems, 18 (2006), pp. 1441

[Williams, 1998]

C. Williams.