Machine Learning on tabular data using LLMs

Erwan Bigan, co-founder at TinyPred

By datacraft

Date and time

Location

3 Rue Rossini, 75009 Paris, France

About this event

  • Event lasts 2 hours

This event is reserved for our members, but we still have a few places available for those who would like to discover the club. Don't hesitate to sign up - you'll be put on a waiting list and we'll confirm your place a few days before the event.

Erwan Bigan, co-founder at TinyPred

Machine Learning (ML) on tabular data mostly relies upon conventional algorithms like Logistic Regression, Random Forest, or gradient-boosted decision trees.

Although there is no golden rule to determine how much data is actually required to train robust ML models, it is generally believed that at least several hundreds or thousands of samples are needed for most practical applications, thus restricting possible use cases.

TinyPred has come up with a different approach: using Large Language Models (LLMs) for ML on tabular data. This method can require 5-10x less training data than traditional ML for the same predictive strength.

The reason for the better performance is that, beyond inferring statistical patterns as conventional ML does, LLMs draw on their general knowledge, which allows them to interpret the data rather than treat it as anonymous numbers.
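To make this idea concrete, here is a minimal illustrative sketch of how tabular rows can be serialized into a few-shot natural-language prompt for an LLM. This is an assumption about the general technique, not TinyPred's actual method; the feature names and data below are invented for illustration.

```python
# Illustrative sketch (not TinyPred's actual method): serialize tabular rows
# into a few-shot text prompt, so an LLM can combine statistical patterns from
# the labeled examples with its general knowledge about the feature values.

def serialize_row(row: dict) -> str:
    """Turn one tabular row into a readable sentence fragment."""
    return ", ".join(f"{k} is {v}" for k, v in row.items())

def build_prompt(train_rows, train_labels, query_row, target_name="outcome"):
    """Few-shot prompt: labeled examples followed by the row to predict."""
    lines = [f"Predict the {target_name} from the features."]
    for row, label in zip(train_rows, train_labels):
        lines.append(f"{serialize_row(row)} -> {target_name}: {label}")
    # The query row is left unlabeled; the LLM completes the label.
    lines.append(f"{serialize_row(query_row)} -> {target_name}:")
    return "\n".join(lines)

# Hypothetical toy data; in practice the prompt would be sent to an LLM API.
train = [{"age": 64, "smoker": "yes"}, {"age": 25, "smoker": "no"}]
labels = ["high risk", "low risk"]
prompt = build_prompt(train, labels, {"age": 58, "smoker": "yes"}, "risk")
print(prompt)
```

Because the features are expressed in words (e.g., "smoker is yes"), the model can bring prior knowledge to bear even with very few labeled examples, which is where the data-frugality advantage comes from.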

In this workshop, we will:

- Discuss new industrial use cases enabled by this LLM-based ML approach;

- Present results obtained on published datasets, which confirm the performance advantage for open source as well as proprietary LLMs;

- Position this LLM-based ML approach in the broader context of foundational models for tabular ML: new pre-trained numerical deep learning models (e.g., TabICL, TabPFN) have also been shown to deliver superior performance, but for train sizes of several thousands of samples.

Organized by

Free · Sep 30, 6:30 PM GMT+2