PAC learning, Theoretical Comp. Sci and Mathematics

AI Thread Summary
The discussion centers around a decision to pursue a Master's degree with a thesis in mathematics, focusing on complexity theory and its connections to machine learning and big data. The individual expresses interest in PAC learning, a concept in machine learning that involves finding models that learn specific targets and form good hypotheses. They have a background in Theory of Computation and are preparing for graduate courses in Algebra and mathematical logic. Key points include the importance of understanding the computational complexity of machine learning and the challenges associated with sample size bounds and overfitting. The conversation highlights the need to align academic interests with current research trends and the value of engaging with professors to explore relevant topics. Suggestions include gaining practical experience with machine learning concepts, such as Naive Bayes, and pursuing courses in statistics and algorithm design to build a solid foundation. The individual is also exploring online resources to enhance their skills in data science. Overall, the focus is on developing a well-informed research direction while navigating the complexities of interdisciplinary studies.
dkotschessaa
Messages
1,063
Reaction score
763
Per another thread I've decided not to pursue a PhD in mathematics, but rather do a Masters with a Thesis option. So it's time to start finding an area and zooming in.

I am interested in complexity theory, and have been watching the whole "Big Data" scene with interest (and ignorance). I googled "Complexity and Big Data" and the result seems to be something called PAC learning.

I have some undergraduate background in Theory of Computation (including Automata and complexity) and a little bit of Combinatorics and Graph Theory. I have a career background in I.T. and some programming skills, and generally a good intuition and draw towards computer science, though I've never studied it formally, save from a purely mathematical perspective.

This coming semester I will be taking (graduate courses) Algebra II, Algebraic Automata Theory, and mathematical logic and foundations. (Which all nicely tie together)

I'm a bit away from a thesis but I don't want to wait to get started researching. I'd like it to be something mathematically satisfying, yet have some connection to something that is happening right now. Machine learning is a fascinating area that is very "now."

We have some people doing theoretical comp sci in the math department, so I have my eye on a couple of people for potential advisors and are in contact with them. So I will keep that up. In the meantime, just poking around on my own. I want to come to them with an informed idea.

So, can somebody tell me about PAC learning, what the 'questions' are in this field, or any other connections between machine learning, big data, complexity, etc.? Is this a good starting point for a topic?

Apologies if my question is uninformed or vague, but I'm at the necessarily naive first step...

Regards,

Dave K
 
  • Like
Likes Medicol
Physics news on Phys.org
I wouldn't say PAC learning is "Complexity and Big Data", I would simply say that it is an aspect of machine learning. Specifically dealing with computational learning. Thus, it's natural for you to come across this field when you combine the two words you used. I can go into this but the basic idea is that you want to find a model that learns a specific target and also forms a good hypothesis. Naturally, there exist a more mathematical way to state this, but the definition is kind of long winded and not entirely insightful. The key idea is that you need an algorithm that converges in a reasonable amount a time using using a reasonable amount of random examples that converges to a good probability for a certain hypothesis using a class of target concepts. As you go on you'll find that the hypothesis class can generate the concept class.

A common problem faced by people who use this has to do with bounds. What is a good lower bound for a sample size? Also the computational complexity of finding a decision rule. Then there are issues with overfitting and how to handle non i.i.d models.
 
Thanks Marnemath.

If you can help me sort out the timeline and where everything fits in here...

According to the ever infallible wikipedia, "An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning." and PAC was proposed by Les Valiant in 1984. Now, I came across a paper by Michael Kearns - actually, his PhD thesis under Les Valiant, which is called "The computational complexity of machine learning."

So I perhaps what I'm interested in is what Kearns is doing, which is a a follow up on Valiant's work?

Confirming what you said, there is also a chapter of "Advances of Intelligent Data Analysis" called "Machine Learning with Cellular Automata," which is also interesting to me, though it doesn't involve complexity. I find automata interesting as well.

Overall I'm pretty overwhelmed, despite thinking I am zooming in on something particular!

-Dave K
 
I wish I could be more help to you when it came to timelines and the most modern and to the edge research out there. The fact of the matter is that I have no intention of ever working in the academic field, can only inform you on what type of problems a certain large corporation considers important and is working on. That's about it. Nevertheless, I think instead of trying to find a field based on buzz words and what is cool sounding, you need to take a step back an ensure this field is for you.

I'm a Statistician by training. I just happened to do a masters thesis on Hidden Markov Models and other junk, which led me down the rabbit hole into machine learning which naturally pushed me towards big data. Instead of focusing on what is out there, I would focus on what the professors at your university are doing. I would then ask them how to apply your interest to their work. Big data is just a broad field with many broad fields that make it up. I wouldn't get stuck on an idea for to much. If you really want to focus on data science, then try to take classes that deal with machine learning, statistics, and algorithm designs. Once you start intertwining these fields, you'll get a better feel for what kind of work you want to do.

As for PAC, I know it because I'm well read in my field. However, I'm by no means an expert on it. However, I would encourage you to get your feet wet with something simple like learning about Naive Bayes. I wouldn't just learn the statistical theory, I would learn to implement and then test it. This exercise will teach you more about how the field works than trying to find your thesis right now.
 
MarneMath said:
I wish I could be more help to you when it came to timelines and the most modern and to the edge research out there. The fact of the matter is that I have no intention of ever working in the academic field, can only inform you on what type of problems a certain large corporation considers important and is working on. That's about it. Nevertheless, I think instead of trying to find a field based on buzz words and what is cool sounding, you need to take a step back an ensure this field is for you.

Well, I've been trying to narrow down the field for quite some time. I have broad experience in I.T. and went back for a math degree later. So what I'm trying to do now is wind my academic studies back into something I can leverage when I eventually look for work again. So, the "complexity" part is academic interest, but getting in touch with something in Data Science is more strategy.
I'm a Statistician by training. I just happened to do a masters thesis on Hidden Markov Models and other junk, which led me down the rabbit hole into machine learning which naturally pushed me towards big data. Instead of focusing on what is out there, I would focus on what the professors at your university are doing. I would then ask them how to apply your interest to their work. Big data is just a broad field with many broad fields that make it up. I wouldn't get stuck on an idea for to much. If you really want to focus on data science, then try to take classes that deal with machine learning, statistics, and algorithm designs. Once you start intertwining these fields, you'll get a better feel for what kind of work you want to do.

Unfortunately I don't have much stats background, and our math and stats departments are bitterly and politically divided. So I will have to fill in those gaps somehow. Hopefully I will be able to take some of the machine learning and stats classes later on.

I'm not necessarily stuck on PAC.. but I am trying to make sure that when I approach a professor with an idea of what I want to research, I am going in informed.

As for PAC, I know it because I'm well read in my field. However, I'm by no means an expert on it. However, I would encourage you to get your feet wet with something simple like learning about Naive Bayes. I wouldn't just learn the statistical theory, I would learn to implement and then test it. This exercise will teach you more about how the field works than trying to find your thesis right now.

In my "spare time" I am working on the Data Science track at Coursera, which involves learning the more practical side, (learning R, and stats and such), but I haven't gotten very far (because I haven't had much spare time!) Hopefully now that I have switched to a masters and won't be studying for quals I can get back to that.

Anyway, I am just shopping for now. Thanks for your input
 
After a year of thought, I decided to adjust my ratio for applying the US/EU(+UK) schools. I mostly focused on the US schools before, but things are getting complex and I found out that Europe is also a good place to study. I found some institutes that have professors with similar interests. But gaining the information is much harder than US schools (like you have to contact professors in advance etc). For your information, I have B.S. in engineering (low GPA: 3.2/4.0) in Asia - one SCI...
I graduated with a BSc in Physics in 2020. Since there were limited opportunities in my country (mostly teaching), I decided to improve my programming skills and began working in IT, first as a software engineer and later as a quality assurance engineer, where I’ve now spent about 3 years. While this career path has provided financial stability, I’ve realized that my excitement and passion aren’t really there, unlike what I felt when studying or doing research in physics. Working in IT...
Hello, I’m an undergraduate student pursuing degrees in both computer science and physics. I was wondering if anyone here has graduated with these degrees and applied to a physics graduate program. I’m curious about how graduate programs evaluated your applications. In addition, if I’m interested in doing research in quantum fields related to materials or computational physics, what kinds of undergraduate research experiences would be most valuable?
Back
Top