Daniel Diaz, Ph.D.

Contact Information

Research Assistant Professor, Department of Public Health Sciences, Division of Biostatistics
Clinical Research Building, R-669
1120 NW 14th Street, Room 1054
Miami, FL 33136

From the very first lecture of my first probability course, it was crystal clear to me that I wanted to be a probabilist. From that point on, I focused my undergraduate studies in Statistics on the mathematical foundations of probability. However, I ended up writing my undergraduate thesis on statistics because of the influence of Francisco J.P. Zimmermann, a Brazilian visiting professor at the Universidad Nacional de Colombia (UN), my undergraduate institution. (So, yes: in 2004, undergraduates at UN were required to write a thesis containing some novel results… sadly, that is no longer the case.) Specifically, I worked on experimental designs (more on this in my research interests).

As a result of my undergraduate thesis, I was invited to give a talk on my research at the 2004 Latin American Congress on Probability and Statistics in Punta del Este, Uruguay. While there, I took the opportunity to ask several professors and researchers where to pursue my doctoral studies, and I decided to go to the Institute of Mathematics and Statistics at the University of São Paulo (USP). At USP I worked with Serguei Popov in probability, specifically on percolation and large deviations for the stable marriage of Poisson and Lebesgue with random appetites (more on this in my research interests).

I now hold a position in the Division of Biostatistics at the University of Miami. Here, besides continuing my doctoral research on disordered systems, I’ve been working with Sunil Rao, using probability tools (e.g., point processes, martingales, large deviations) to study how fast certain estimators converge to the true parameter values in models with a large number of variables. This combination of probability and statistics is a promising research avenue, and I have learned a lot of interesting things along the way.

My research interests are somewhat varied:

1. Disordered systems

I’ve worked on a model called the stable marriage of Poisson and Lebesgue (some very nice drawings of it can be found on Alexander Holroyd’s website). Basically, what I’ve done with this model is randomize one of its parameters, called the “appetite”, and study the properties of the resulting model. Building on this randomization, I now plan to do something similar with other related models induced by allocation rules (some of them can be found on Dan Romik’s website). When studying these models, my interest is usually in percolation properties, large deviations and high dimensions.
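To give a rough feel for this kind of allocation rule, here is a discretized toy sketch in Python. It is not the continuum model and rests on assumptions of my own for illustration: Lebesgue measure is replaced by the cells of a 10-by-10 grid, a finite list of random points stands in for the Poisson points, and each center i may claim at most appetites[i] cells, with the appetites drawn at random. Both cells and centers prefer closer partners; because both sides rank partners by the same distance, processing (center, cell) pairs in order of increasing distance yields a stable allocation. The function name stable_allocation is hypothetical.

```python
import random


def stable_allocation(centers, appetites, cells):
    """Discrete stable allocation of grid cells to centers (illustrative sketch).

    centers: list of (x, y) points standing in for Poisson points.
    appetites: appetites[i] is the maximum number of cells center i may claim.
    cells: list of (x, y) cell midpoints standing in for Lebesgue measure.
    Returns owner, where owner[k] is the index of the center claiming cell k,
    or None if the cell is left unclaimed.
    """
    # Squared distance ranks pairs exactly as distance does, so both sides
    # share one preference order; sort every (distance, center, cell) triple.
    pairs = sorted(
        ((cx - x) ** 2 + (cy - y) ** 2, i, k)
        for i, (cx, cy) in enumerate(centers)
        for k, (x, y) in enumerate(cells)
    )
    owner = [None] * len(cells)
    load = [0] * len(centers)
    # Greedy pass: match the closest remaining pair whenever the cell is still
    # free and the center has not yet exhausted its appetite.
    for _, i, k in pairs:
        if owner[k] is None and load[i] < appetites[i]:
            owner[k] = i
            load[i] += 1
    return owner


# Random appetites: each of 8 random centers gets an integer appetite in [5, 20].
rng = random.Random(1)
cells = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
centers = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(8)]
appetites = [rng.randint(5, 20) for _ in range(8)]
owner = stable_allocation(centers, appetites, cells)
```

When the total appetite is smaller than the number of cells, some cells stay unclaimed; otherwise every cell is claimed. Stability here means no cell and center both strictly prefer each other to their current situation.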

Several applications of these disordered systems are of biological interest. For instance, given a disease and a certain probability of transmission, what percentage of a population will contract it? How fast does the disease spread through the population? These questions are relevant whether the population consists of animals, people or cells, to mention just a few. These models are also useful in theoretical particle physics and in ferromagnetic models. They have been used in the search for extraterrestrial intelligent life. And finally, they have also been used to study the spread of orchard fires and of rumors.
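A minimal sketch of the first question, under the simplest setting I could assume (these modeling choices are for illustration only): individuals sit on the sites of an n-by-n grid, each contact between neighbouring sites transmits the disease independently with probability p, and the outbreak starts at the centre. The infected set is then the cluster of the origin in bond percolation, explored here by breadth-first search. The function name outbreak_fraction is hypothetical.

```python
import random
from collections import deque


def outbreak_fraction(n, p, seed=None):
    """Fraction of an n-by-n grid infected by an outbreak started at the centre.

    Each edge between neighbouring sites transmits independently with
    probability p. Edges are revealed lazily during a breadth-first search,
    so the infected set has the law of the origin's cluster in bond percolation.
    """
    rng = random.Random(seed)
    start = (n // 2, n // 2)
    infected = {start}
    queue = deque([start])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            # Each edge is tested at most once: only from an infected site
            # towards a not-yet-infected neighbour.
            if 0 <= nx < n and 0 <= ny < n and (nx, ny) not in infected:
                if rng.random() < p:
                    infected.add((nx, ny))
                    queue.append((nx, ny))
    return len(infected) / (n * n)


# Average infected fraction over independent outbreaks, below and above p = 1/2.
for p in (0.3, 0.7):
    runs = [outbreak_fraction(41, p, seed=s) for s in range(100)]
    print(p, sum(runs) / len(runs))
```

On the infinite square lattice the bond percolation threshold is p = 1/2, so one expects only small outbreaks for p well below 1/2 and, with positive probability, outbreaks reaching a positive fraction of a large grid for p well above it.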

2. High-dimensional data

Together with Jean Eudes Dazard and Sunil Rao, I am currently working on a problem they have been studying. The problem originated with the PRIM algorithm, well known for its usefulness in bump hunting (finding modes). In a recent paper, Dazard and Rao improved this bump hunting using a mix of techniques, mainly the PRIM algorithm and sparse principal component analysis. Also recently, Wolfgang Polonik and Zailong Wang published an article formalizing some aspects of the PRIM algorithm, studying the rates of convergence of bump estimators to their parametric values. So part of what I am doing now is to combine Polonik and Wang’s formalization of the algorithm with Rao and Dazard’s method, in order to determine how fast the PRIM estimators converge (for continuous random variables and survival processes) once principal component analysis and sparse principal component analysis have been applied to the data.
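For readers unfamiliar with PRIM (Friedman and Fisher’s Patient Rule Induction Method), the sketch below shows a simplified version of its top-down peeling phase only. Pasting, cross-validation and the PCA or sparse-PCA rotations mentioned above are omitted, and the function name prim_peel and its parameters alpha (fraction peeled per step) and min_support are illustrative assumptions. At each step it tries removing a small fraction of the remaining points from one face of the current box and keeps the peel that leaves the largest mean response. It assumes continuous covariates without ties.

```python
import math


def prim_peel(X, y, alpha=0.05, min_support=0.1):
    """Simplified top-down peeling phase of PRIM (illustrative sketch).

    X: list of d-dimensional points (continuous, no ties assumed).
    y: list of responses. alpha: fraction of remaining points peeled per step.
    min_support: stop before the box holds fewer than min_support * n points.
    Returns the box as a list of (low, high) bounds, one per variable.
    """
    n, d = len(X), len(X[0])
    inside = list(range(n))
    box = [(-math.inf, math.inf)] * d
    while True:
        k = max(1, int(alpha * len(inside)))  # points to peel this step
        if len(inside) - k < max(1, min_support * n):
            break
        best = None
        for j in range(d):
            order = sorted(inside, key=lambda i, j=j: X[i][j])
            # Candidate peels: drop the k lowest or the k highest values of x_j.
            for side, keep in (("low", order[k:]), ("high", order[:-k])):
                mean = sum(y[i] for i in keep) / len(keep)
                if best is None or mean > best[0]:
                    best = (mean, j, side, keep)
        _, j, side, keep = best
        lo, hi = box[j]
        if side == "low":
            lo = min(X[i][j] for i in keep)
        else:
            hi = max(X[i][j] for i in keep)
        box[j] = (lo, hi)
        inside = keep
    return box


# Toy data: the response is 1 only where the first coordinate exceeds 0.8.
X = [[i / 100, (37 * i % 100) / 100] for i in range(100)]
y = [1.0 if x[0] > 0.8 else 0.0 for x in X]
box = prim_peel(X, y)
print(box)
```

On this toy data the peeling pushes the lower bound of the first coordinate above 0.8 and leaves the second coordinate unconstrained, i.e., the box closes in on the bump.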

Although my part is highly theoretical, this research is motivated by the need to find subclusters within clusters in cancer genomic data, in a way that helps improve treatment assignment. It also has many applications in online marketing, for instance on websites like Amazon, which make suggestions based on a customer’s previous searches.

3. Target search and the No Free Lunch theorems

Recently, Robert Marks and his collaborators have been working on target search. This work is motivated by the famous “No Free Lunch” theorems, which state that, given a (small) target on a big wall, all target searches behave, on average, like a blind search (i.e., a search induced by a uniform distribution). Therefore, when a search succeeds in few or relatively few steps, Marks and collaborators argue that exogenous information has been input into the algorithm governing the search. So far, this research has been limited to fixed targets. However, an interesting open question arises when the targets are random: how does the search behave in these cases? For instance, if the target is a set of particles moving according to a random walk on a finite lattice (the search space), is there a difference, with respect to a fixed target, in the amount of exogenous information needed to reach the target? Of course, the answer will depend on the dynamics and on the random conditions we impose. These results have implications for, and applications to, evolutionary algorithms.
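One way to start exploring the random-target question numerically is a small Monte Carlo experiment. The sketch below makes simplifying assumptions that are not part of Marks and collaborators’ framework: the search space is a cycle of n sites, the target is a single particle, “moving” means a lazy random walk (after each miss the particle steps -1, 0 or +1 with equal probability), and the two strategies compared are a blind search (uniform queries with replacement) and a deterministic sweep. The function names steps_to_hit and mean_steps are hypothetical.

```python
import random


def steps_to_hit(n, strategy, moving, rng, max_steps=100_000):
    """Queries until a search on the cycle {0, ..., n-1} hits a single target.

    strategy: "blind" queries a uniformly random site (with replacement);
    "sweep" queries sites 0, 1, 2, ... in order, wrapping around.
    moving: if True, after each miss the target takes a lazy random-walk
    step of -1, 0 or +1 (mod n), each with probability 1/3.
    Returns max_steps if the target is not hit within the budget.
    """
    target = rng.randrange(n)  # target starts uniformly at random
    for t in range(1, max_steps + 1):
        query = rng.randrange(n) if strategy == "blind" else (t - 1) % n
        if query == target:
            return t
        if moving:
            target = (target + rng.choice((-1, 0, 1))) % n
    return max_steps


def mean_steps(n, strategy, moving, trials=10_000, seed=0):
    """Monte Carlo estimate of the expected number of queries."""
    rng = random.Random(seed)
    return sum(steps_to_hit(n, strategy, moving, rng) for _ in range(trials)) / trials


for strategy in ("blind", "sweep"):
    for moving in (False, True):
        print(strategy, "moving" if moving else "fixed", mean_steps(20, strategy, moving))
```

For the blind search the target’s motion cannot matter: each query is independent of the target and hits with probability 1/n, so the mean number of queries is n in both cases. The sweep, which hits a fixed uniform target after (n + 1)/2 queries on average, is the kind of strategy where the dynamics can make a difference. The walk is taken lazy because, with a non-lazy walk on an even cycle, the sweep and the target keep a fixed parity difference and the sweep can miss forever.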

4. Experimental designs and applications to medical biotechnology

I’ve worked on a model known as the strip-split-plot design, a variation of the famous split-plot designs. Some of those results will soon be published in two articles, and I have many ideas for continuing that research. This project started as an idea of Francisco J.P. Zimmermann, my undergraduate thesis advisor. Zimmermann was a senior researcher at Embrapa, Brazil’s official agricultural research institution, where there was interest in developing three-way models for experimentation with rice. Recently, Sunil Rao suggested applying this design perspective to medical biotechnology, which is almost unexplored territory. Certainly, there are some statistical developments applied to general aspects of nanotechnology, but very few related to medical nanotechnology. It is therefore promising in several respects, and I’m certainly interested.
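For readers unfamiliar with the design, one common description of a strip-split-plot layout is: within each block, the levels of factor A are assigned to horizontal strips, the levels of factor B to vertical strips, and each A-by-B intersection plot is further split into subplots that receive the levels of a third factor C, which suits a three-factor setting like the one described above. The sketch below only generates such a randomized field layout, assuming independent randomization of rows, columns and subplots in every block; it does not perform the analysis of variance. The function name and factor labels are hypothetical.

```python
import random


def strip_split_plot_layout(a_levels, b_levels, c_levels, blocks, seed=None):
    """Randomized field layout for a strip-split-plot design (sketch).

    Returns tuples (block, row, col, subplot, a, b, c): factor A is
    randomized to the rows (horizontal strips) of each block, factor B to
    the columns (vertical strips), and factor C to the subplots of every
    row-column intersection.
    """
    rng = random.Random(seed)
    layout = []
    for block in range(blocks):
        rows = rng.sample(a_levels, len(a_levels))  # random order of A strips
        cols = rng.sample(b_levels, len(b_levels))  # random order of B strips
        for row, a in enumerate(rows):
            for col, b in enumerate(cols):
                subs = rng.sample(c_levels, len(c_levels))  # C within the plot
                for subplot, c in enumerate(subs):
                    layout.append((block, row, col, subplot, a, b, c))
    return layout


# Hypothetical example: 3 varieties (A), 2 nitrogen rates (B), 2 spacings (C).
plan = strip_split_plot_layout(["V1", "V2", "V3"], ["N0", "N1"], ["S1", "S2"], blocks=2, seed=7)
for plot in plan[:6]:
    print(plot)
```

Every block contains each A-B-C combination exactly once, while whole rows share one level of A and whole columns share one level of B, which is what distinguishes this design from an ordinary split-plot.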