Most of what is labeled AI today, particularly in the public sphere, is actually machine learning (ML), a term in use for the past several decades. And I continue to find much inspiration in tree-based architectures, particularly for problems in three big areas where trees arise organically---evolutionary biology, document modeling and natural language processing.

Of course, the "statistics community" was also never that well defined, and while ideas such as Kalman filters, HMMs and factor analysis originated outside of the "statistics community" narrowly defined, they were absorbed within statistics because they're clearly about inference. (Another example of an ML field that benefited from such cross-disciplinary exchange is hybrid MCMC, which is grounded in dynamical systems theory.) Lastly, Percy Liang, Dan Klein and I have worked on a major project in natural-language semantics, where the basic model is a tree (allowing syntax and semantics to interact easily), but where nodes can be set-valued, so that classical constraint satisfaction (aka, sum-product) can handle some of the "first-order" aspects of semantics.

What is the next frontier for applied nonparametrics?

With all due respect to neuroscience, one of the major scientific areas for the next several hundred years, I don't think that we're at the point where we understand very much at all about how thought arises in networks of neurons, and I still don't see neuroscience as a major generator of ideas on how to build inference and decision-making systems in detail. I don't think that the "ML community" has developed many new inferential principles---or many new optimization principles---but I do think that the community has been exceedingly creative at taking existing ideas from many fields and mixing and matching them to solve problems in emerging problem domains, and I think that the community has excelled at making creative use of new computing architectures.

Michael I. Jordan
Pehong Chen Distinguished Professor
Department of EECS, Department of Statistics
AMP Lab, Berkeley AI Research Lab
University of California, Berkeley

I had this romantic idea about AI before actually doing AI. The "statistics community" has also been very applied; it's just that for historical reasons its collaborations have tended to focus on science, medicine and policy rather than engineering. I'd do so in the context of a full merger of "data" and "knowledge", where the representations used by humans can be connected to data and the representations used by the learning systems are directly tied to linguistic structure.

Do you mind explaining the history behind how you learned about variational inference as a graduate student?

In the topic-modeling domain, I've been very interested in multi-resolution topic trees, which to me are one of the most promising ways to move beyond latent Dirichlet allocation. We don't need MapReduce "enforcer" learners. My first and main reaction is that I'm totally happy that any area of machine learning (aka, statistical inference and decision-making; see my other post :-) is beginning to make an impact on real-world problems. Notions like "parallel is good" and "layering is good" could well have been (and have been) developed entirely independently of thinking about brains. Indeed, I've spent much of my career trying out existing ideas from various mathematical fields in new contexts, and I continue to find that to be a very fruitful endeavor.
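To make the sum-product remark above a bit more concrete, here is a minimal sketch of sum-product message passing on a tiny tree-structured model. This is not the Liang/Klein/Jordan system; the variables, potentials and sizes are made up purely for illustration, and a brute-force computation is included only to check the marginal.

```python
# Minimal sketch: sum-product message passing on a tiny tree (made-up potentials).
import numpy as np

# Tree: root x0 with children x1 and x2; each variable takes values {0, 1}.
# Joint is proportional to phi0(x0) * psi01(x0, x1) * psi02(x0, x2).
phi0 = np.array([0.6, 0.4])                      # unary potential at the root
psi01 = np.array([[0.9, 0.1], [0.2, 0.8]])       # pairwise potential (x0, x1)
psi02 = np.array([[0.7, 0.3], [0.4, 0.6]])       # pairwise potential (x0, x2)

# Leaves send messages to the root: m_i(x0) = sum over x_i of psi(x0, x_i).
m1 = psi01.sum(axis=1)
m2 = psi02.sum(axis=1)

# Root marginal: combine incoming messages with the unary potential and normalize.
p_x0 = phi0 * m1 * m2
p_x0 /= p_x0.sum()

# Brute-force check over all joint configurations.
joint = phi0[:, None, None] * psi01[:, :, None] * psi02[:, None, :]
joint /= joint.sum()
assert np.allclose(p_x0, joint.sum(axis=(1, 2)))
print("marginal of root:", p_x0)
```

On a tree, each message is computed once, so exact marginals cost time linear in the number of edges; that is the property being exploited when the nodes are allowed to carry richer (e.g., set-valued) states.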
Indeed, with all due respect to bridge builders (and rocket builders, etc.), I think that we have a domain here that is more complex than any ever confronted in human society.

On a more philosophical level, what's the difference between "reasoning/understanding" and function approximation/mimicking? My understanding is that many if not most of the "deep learning success stories" involve supervised learning (i.e., backpropagation) and massive amounts of data.

On September 10th Michael Jordan, a renowned statistician from Berkeley, did an Ask Me Anything on Reddit. Artificial Intelligence (AI) is the mantra of the current era.

There's still lots to explore there. Eventually we will find ways to do these things for more general problems. You need to know what algorithms are available for a given problem, how they work, and how to get the most out of them.

Below is an excerpt from Artificial Intelligence—The Revolution Hasn’t Happened Yet: Professor Michael Jordan gives insights into the future of AI and machine learning, specifically which fields of work could scale into billion-dollar …

It has begun to break down some barriers between engineering thinking (e.g., computer systems thinking) and inferential thinking.

Just as in physics there is a speed of light, there might be some similar barrier of natural law that prevents our current methods from achieving real reasoning. This is by Arthur C. Clarke, a science-fiction author who people believe was much more scientific than he actually was. For example, he "predicted" in 1976 that people would communicate using screens with keyboards attached, as CNet breathlessly observes, just seven years after you could buy them from the national phone company in the Netherlands under the brand name Viditel, and let's not mention that Star Trek had put that on everyone's TV set in the '60s, right?

One characteristic of your "extended family" of researchers has always been a knack for implementing complex models using real-world, non-trivial data sets such as Wikipedia or the New York Times archive. My colleague Yee Whye Teh and I are nearly done with writing just such an introduction; we hope to be able to distribute it this fall.

Following Prof. Jordan's talk, Ion Stoica, Professor at UC Berkeley and Director of RISELab, will present "The Future of Computing is Distributed". The demands of modern workloads, such as machine learning, are growing much faster than the capabilities of a single-node computer.

Hence the focus on foundational ideas. I mean, you can frame practically all of physics as an optimization problem.

Prof. Jordan is a member of the National Academy of Sciences, a member of the National Academy of Engineering and a member of the American Academy of Arts and Sciences. He is a Fellow of the American Association for the Advancement of Science. He has been cited over 170,000 times and has mentored many of the world-class researchers defining the field of AI today, including Andrew Ng, Zoubin Ghahramani, Ben Taskar, and Yoshua Bengio.

That particular version of the list seems to be one from a few years ago; I now tend to add some books that dig still further into foundational topics. These are a few examples of what I think is the major meta-trend, which is the merger of statistical thinking and computational thinking.
Let's not impose artificial constraints based on cartoon models of topics in science that we don't yet understand. Overall an appealing mix.

All the attempts at reasoning prior to the AI winter turned out to be dead ends. (And in 2003, when we introduced LDA, I can remember people in the UAI community who had been-there-and-done-that for years with trees saying: "but it's just a tree; how can that be worthy of more study?")

Different collections of people (your "communities") often tend to have different application domains in mind, and that makes some of the details of their current work look superficially different, but there's no actual underlying intellectual distinction, and many of the seeming distinctions are historical accidents.

Michael I. Jordan is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley.

The nonparametric version of LDA is called the HDP (hierarchical Dirichlet process), and in some very practical sense it's just a small step from LDA to the HDP (in particular, just a few more lines of code are needed to implement the HDP). But what else would you expect?

That list was aimed at entering PhD students at Berkeley, who I assume are going to devote many decades of their lives to the field, and who want to get to the research frontier fairly quickly.

He is saying that statistical ML systems can somewhat solve a class of problems that are a small subset of what "AI" really is.

Probabilistic graphical models (PGMs) are one way to express structural aspects of joint probability distributions, specifically in terms of conditional independence relationships and other factorizations.

Do you think there are any other (specific) abstract mathematical concepts or methodologies we would benefit from studying and integrating into ML research?

Moreover, not only do I think that you should eventually read all of these books (or some similar list that reflects your own view of foundations), but I think that you should read all of them three times---the first time you barely understand, the second time you start to get it, and the third time it all seems obvious.

When my colleagues and I developed latent Dirichlet allocation, were we being statisticians or machine learners?

This last point is worth elaborating---there's no reason that one can't allow the nodes in graphical models to represent random sets, or general random combinatorial structures, or general stochastic processes; factorizations can be just as useful in such settings as they are in the classical settings of random vectors.

Outside of quant finance and big tech, very few companies/industries can use machine learning properly. It seems short-sighted.

I do think that Bayesian nonparametrics has just as bright a future in statistics/ML as classical nonparametrics has had and continues to have.

Section 3.1 is also a very readable discussion of linear basis function models.
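To give a rough sense of what "just a few more lines of code" means, here is a hedged toy sketch (not a production implementation) contrasting the LDA generative model, where the number of topics K is fixed in advance, with a truncated stick-breaking approximation to the HDP. All hyperparameters, corpus sizes and the truncation level are invented for illustration.

```python
# Toy generative sketch contrasting LDA (K fixed) with a truncated
# stick-breaking approximation to the HDP. Hyperparameters are made up.
import numpy as np

rng = np.random.default_rng(0)
V = 50            # vocabulary size
D, N = 5, 20      # documents, words per document

def sample_lda(K=10, alpha=0.5, eta=0.1):
    """LDA: the number of topics K is assumed known in advance."""
    topics = rng.dirichlet(eta * np.ones(V), size=K)       # topic-word distributions
    docs = []
    for _ in range(D):
        theta = rng.dirichlet(alpha * np.ones(K))           # per-document topic weights
        z = rng.choice(K, size=N, p=theta)                  # topic assignment per word
        docs.append([rng.choice(V, p=topics[k]) for k in z])
    return docs

def sample_hdp(gamma=1.0, alpha=0.5, eta=0.1, T=50):
    """HDP (truncated): the 'few more lines' replace the fixed K with a
    shared stick-breaking prior over a large pool of candidate topics."""
    sticks = rng.beta(1.0, gamma, size=T)
    beta = sticks * np.concatenate(([1.0], np.cumprod(1 - sticks)[:-1]))
    beta /= beta.sum()                                       # global topic weights
    topics = rng.dirichlet(eta * np.ones(V), size=T)
    docs = []
    for _ in range(D):
        theta = rng.dirichlet(alpha * beta)                  # document weights centered on beta
        z = rng.choice(T, size=N, p=theta)
        docs.append([rng.choice(V, p=topics[k]) for k in z])
    return docs

print(sample_lda()[0][:10])
print(sample_hdp()[0][:10])
```

The only structural change is the prior over topic weights: a symmetric Dirichlet of fixed dimension K in LDA versus a shared stick-breaking distribution in the HDP, which lets the effective number of topics be inferred from data.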
Decision trees, nearest neighbor, logistic regression, kernels, PCA, canonical correlation, graphical models, K-means and discriminant analysis come to mind, as do many general methodological principles (e.g., the method of moments, which is having a mini-renaissance, Bayesian inference methods of all kinds, M-estimation, the bootstrap, cross-validation, EM, ROC, and of course stochastic gradient descent, whose pre-history goes back to the 50s and beyond), and many, many theoretical tools (large deviations, concentration, empirical processes, Bernstein-von Mises, U-statistics, etc.).

His research interests bridge the computational, statistical, cognitive and biological sciences, and have focused in recent years on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines and applications to problems in distributed computing systems, natural language processing, signal processing and statistical genetics. He was a professor at MIT from 1988 to 1998.

Thank you for taking the time out to do this AMA.

I also recommend A. van der Vaart's "Asymptotic Statistics", a book that we often teach from at Berkeley, as a book that shows how many ideas in inference (M-estimation---which includes maximum likelihood and empirical risk minimization---the bootstrap, semiparametrics, etc.) repose on top of empirical process theory.

This has long been done in the neural network literature (but also far beyond). What if it's "if"?

Do you expect more custom, problem-specific graphical models to outperform the ubiquitous, deep, layered, boringly similar neural networks in the future?

Although current deep learning research tends to claim to encompass NLP, I'm (1) much less convinced about the strength of the results, compared to the results in, say, vision; (2) much less convinced that, in the case of NLP as opposed to, say, vision, the way to go is to couple huge amounts of data with black-box learning architectures.

It is one of today's most rapidly growing technical fields, lying at the intersection of computer science and statistics, and at the core of artificial intelligence and data science. I found this article, published recently in Harvard Data Science Review by Michael Jordan (the academic), a joyful read.

Lastly, and on a less philosophical level, while I do think of neural networks as one important tool in the toolbox, I find myself surprisingly rarely going to that tool when I'm consulting out in industry. On the other hand, despite having limitations (a good thing!) ...

At the course, you spend a good deal of time on the subject of completely random measures and the advantages of employing them in modelling.

Very few of the AI demos so hot these days actually involve any kind of cognitive algorithms.

Dataconomy credits Michael with helping to popularize Bayesian networks. Michael was gracious enough to connect with us all the way from Italy after being named IEEE's 2020 John von Neumann Medal recipient.

Yeah, they also used to talk this way about a lot of other things before it was clear whether they were actually possible, before they found out they weren't. Remember back when people asserted that it was a "when" that antibiotics were going to cure all disease (even though they don't even apply to all diseases)?

I view them as basic components that will continue to grow in value as people start to build more complex, pipeline-oriented architectures.
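Since the bootstrap appears both in the list of methodological principles above and in the van der Vaart recommendation, here is a minimal, hedged sketch of the nonparametric bootstrap. The data and the statistic (the median) are made up purely to show the resampling idea, not to reproduce any analysis from the discussion.

```python
# Minimal sketch of the nonparametric bootstrap: resample the data with
# replacement and use the spread of the recomputed statistic as a rough
# confidence interval. Data and settings are invented for illustration.
import numpy as np

rng = np.random.default_rng(42)
data = rng.lognormal(mean=0.0, sigma=1.0, size=200)   # a skewed toy sample

def bootstrap_ci(sample, stat=np.median, n_boot=5000, alpha=0.05):
    n = len(sample)
    stats = np.array([
        stat(rng.choice(sample, size=n, replace=True))  # one bootstrap replicate
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return stat(sample), (lo, hi)

point, (lo, hi) = bootstrap_ci(data)
print(f"median = {point:.3f}, 95% bootstrap CI = ({lo:.3f}, {hi:.3f})")
```

The same resampling loop works for essentially any plug-in statistic, which is why the bootstrap sits so naturally alongside M-estimation and cross-validation in that list.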
I personally don't make the distinction between statistics and machine learning that your question seems predicated on. Note that many of the most widely-used graphical models are chains---the HMM is an example, as is the CRF.

It's really the process of IA, which is intelligence augmentation: augmenting existing data to make it more efficient to work with and to gain insights from.

Why do you believe nonparametric models haven't taken off as well as other work you and others have done in graphical models?

https://www2.eecs.berkeley.edu/Faculty/Homepages/jordan.html

This will be hard and it's an ongoing problem to approximate. There's an incredible amount of misunderstanding of what Michael Jordan is saying in this video on this post.

I've personally been doing exactly that at Berkeley, in the context of the "RAD Lab" from 2006 to 2011 and in the current context of the "AMP Lab".

The methods, roughly sorted from largest to smallest expected speed-up, are: consider using a different learning rate schedule (a sketch of this appears below).

What current techniques do you think students should be learning now to prepare for future advancements in approximate inference?

It also covers the LMS algorithm and touches on regularised least squares.

I find that industry people are often looking to solve a range of other problems, often not involving "pattern recognition" problems of the kind I associate with neural networks.

He is a Fellow of the AAAI, ACM, ASA, CSS, IEEE, IMS, ISBA and SIAM.

I'm not sure that I'd view them as "less data-hungry methods", though; essentially they provide a scalability knob that allows systems to take in more data while still retaining control over time and accuracy.

A high-level explanation of linear regression and some extensions, from the University of Edinburgh.

Do you still think this is the best set of books, and would you add any new ones?

Indeed, it's unsupervised learning that has always been viewed as the Holy Grail; it's presumably what the brain excels at and what's really going to be needed to build real "brain-inspired computers". Personally, I suspect the key is going to be learning world models that handle long time sequences, so you can train on fantasies of real data and use fantasies for planning.

Although I could possibly investigate such issues in the context of deep learning ideas, I generally find it a whole lot more transparent to investigate them in the context of simpler building blocks. It took decades (centuries really) for all of this to develop. That's the old-style neural network reasoning, where it was assumed that just because it was "neural" it embodied some kind of special sauce.

(4) How do I visualize data, and in general how do I reduce my data and present my inferences so that humans can understand what's going on?

That said, I've had way more failures than successes, and I hesitate to make concrete suggestions here because they're more likely to be fool's gold than the real thing.
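As a concrete illustration of the "different learning rate schedule" item mentioned above, here is a minimal, hedged PyTorch sketch using the built-in one-cycle schedule. The model, data and hyperparameters are invented for illustration and are not recommendations.

```python
# Minimal sketch (made-up model and data) of swapping in a one-cycle
# learning-rate schedule, often cited as a cheap way to speed up training.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

epochs, steps_per_epoch = 5, 100
sched = torch.optim.lr_scheduler.OneCycleLR(
    opt, max_lr=0.1, epochs=epochs, steps_per_epoch=steps_per_epoch)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        x = torch.randn(32, 20)                 # fake batch of features
        y = torch.randint(0, 2, (32,))          # fake labels
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        sched.step()                            # update the learning rate every batch
    print(f"epoch {epoch}: lr = {sched.get_last_lr()[0]:.4f}")
```

The point of the sketch is only that the schedule is a drop-in change: the optimizer and training loop stay the same, and the scheduler adjusts the learning rate each step.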
Let's not fool ourselves, though, by saying that deep learning, or machine learning, is some sort of super-smart, sentient AI bot; it's far from that and really doesn't have any true intelligence behind it.

He received his Masters in Mathematics from Arizona State University, and earned his PhD in Cognitive Science in 1985 from the University of California, San Diego.

Sometimes I am a bit disillusioned by the current trend in ML of just throwing universal models and lots of computing power at every problem.

Note that latent Dirichlet allocation is a parametric Bayesian model in which the number of topics K is assumed known.

Over the past 3 years we've seen some notable advancements in efficient approximate posterior inference for topic models and Bayesian nonparametrics, e.g.

I'll resist the temptation to turn this thread into a LeBron vs. MJ debate. I dunno, though ... is it really "when"?

I suspect that there are few people involved in this chain who don't make use of "theoretical concepts" and "engineering know-how".

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. New York: Springer.

I've been collecting methods to accelerate training in PyTorch; here's what I've found so far. Wonder how someone like Hinton would respond to this.

In general, "statistics" refers in part to an analysis style---a statistician is happy to analyze the performance of any system, e.g., a logic-based system, if it takes in data that can be considered random and outputs decisions that can be considered uncertain.

I don't know what to call the overall field that I have in mind here (it's fine to use "data science" as a placeholder), but the main point is that most people I know who were trained in statistics or in machine learning implicitly understood themselves as working in this overall field; they don't say "I'm not interested in principles having to do with randomization in data collection, or with how to merge data, or with uncertainty in my predictions, or with evaluating models, or with visualization".

Also, note that the adjective "completely" refers to a useful independence property, one that suggests yet-to-be-invented divide-and-conquer algorithms.

He's not saying "AI can't do reasoning".

What are the most important high-level trends in machine learning research and industry applications these days?

I think that mainly they simply haven't been tried.

He is one of the leading figures in machine learning, and in 2016 Science reported him as the world's most influential computer scientist. In our conversation with Michael, we explore his career path, and how his influence …

One way to approach unsupervised learning is to write down various formal characterizations of what good "features" or "representations" should look like and tie them to various assumptions that seem to be of real-world relevance.

Think of the engineering problem of building a bridge. We need people who can frame processes for ML.
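To illustrate, very loosely, the independence property that "completely" refers to, here is a hedged toy simulation of a gamma completely random measure on [0, 1]: the masses it assigns to disjoint intervals are independent gamma random variables, so the measure can be simulated piece by piece and the pieces simply recombined, which is the kind of structure that invites divide-and-conquer algorithms. The base measure, concentration and interval split are made up.

```python
# Toy illustration (made-up parameters) of the "completely random" property:
# for a gamma CRM with base measure alpha * Lebesgue on [0, 1], the masses
# assigned to disjoint intervals are independent Gamma variables, so the
# measure can be simulated piecewise and the pieces added back together.
import numpy as np

rng = np.random.default_rng(1)
alpha = 3.0                       # concentration of the base measure
n_sims = 100_000

# Split [0, 1] into two disjoint pieces and draw their masses independently.
len_A, len_B = 0.4, 0.6
G_A = rng.gamma(shape=alpha * len_A, scale=1.0, size=n_sims)
G_B = rng.gamma(shape=alpha * len_B, scale=1.0, size=n_sims)

# The total mass G([0,1]) = G(A) + G(B) should match a single direct draw.
G_total = G_A + G_B
G_direct = rng.gamma(shape=alpha * (len_A + len_B), scale=1.0, size=n_sims)

print("mean/var of piecewise sum :", G_total.mean(), G_total.var())
print("mean/var of direct draw   :", G_direct.mean(), G_direct.var())
# Empirical check that the disjoint pieces are uncorrelated (independence).
print("corr(G_A, G_B) =", np.corrcoef(G_A, G_B)[0, 1])
```

The piecewise and direct simulations agree in distribution because independent gamma variables with a common scale add to a gamma with the summed shape; that additivity across disjoint sets is exactly what makes the measure "completely" random.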
There is not one general tool that is dominant; each tool has its domain in which it is appropriate.

Once the neurally-plausible constraint was dropped, the systems suddenly became much more powerful.

I'm totally happy that the work of my long-time friend Yann LeCun is being recognized, promoted and built upon.

The overall design doesn't feel singularly "neural" (particularly the need for large amounts of labeled data).

Most applications of Bayesian nonparametrics (GPs aside) currently fall into clustering/mixture models, topic modelling, and graph modelling.

(3) How do I get meaningful error bars or other measures of performance on all of the queries to my database?

Recent work has focused on learning a function to extract a subset of features that are most informative for each given example, as a methodology for model interpretation.

He received the David E. Rumelhart Prize in 2015 and the ACM/AAAI Allen Newell Award in 2009. He is a Fellow of the Institute of Mathematical Statistics.

As a result, Data Scientist & ML Engineer has become the sexiest and most sought-after job of the 21st century.

https://news.ycombinator.com/item?id=1055042
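The remark above about extracting the features that are most informative for each given example can be illustrated with a deliberately crude stand-in (this is not the published instancewise feature-selection method): fit a linear classifier on synthetic data and, for each example, report the k features with the largest per-example contribution to the logit. The dataset, model and choice of k are all invented for illustration.

```python
# Crude toy sketch (not the published method): for a linear model, score each
# feature's per-example contribution and keep the top-k as an "explanation".
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

def top_k_features(x, k=3):
    # Contribution of feature j to the logit for this particular example.
    contrib = clf.coef_[0] * x
    return np.argsort(-np.abs(contrib))[:k]

for i in range(3):
    print(f"example {i}: most informative features -> {top_k_features(X[i])}")
```

The point is only that the selected subset varies from example to example, which is what distinguishes instancewise interpretation from a single global feature ranking.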