Vitaliy Liptchinsky

Mountain View, California, United States
1K followers · 500+ connections


Experience & Education

  • General Motors


Publications

  • Self-supervised pretraining of visual features in the wild

    Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, that is, the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to its expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl
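    The contrastive objective behind methods like MoCo and SimCLR mentioned in the abstract can be sketched as an NT-Xent (InfoNCE) loss over two augmented views of each image. This is a generic NumPy illustration, not the SwAV/SEER training code; the function and variable names are mine.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.1):
    """NT-Xent / InfoNCE loss between two batches of view embeddings.

    z1, z2: (N, D) arrays, one row per image, two augmented views each.
    Each row's positive is its other view; all other rows are negatives.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarity
    sim = z @ z.T / temperature                        # (2N, 2N) logits
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = z1.shape[0]
    # The positive for row i is row i+n (and vice versa).
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

    Minimizing this pulls the two views of the same image together and pushes all other images apart, which is the shared core of the contrastive family the abstract names.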

  • Beyond English-centric multilingual machine translation

    The Journal of Machine Learning Research

    Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-centric, training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true many-to-many multilingual translation model that can translate directly between any pair of 100 languages. We build and open-source a training data set that covers thousands of language directions with parallel data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high-quality models. Our focus on non-English-centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively to the best single systems from the Workshop on Machine Translation (WMT). We open-source our scripts so that others may reproduce the data, evaluation, and final m2m100 model: https://github.com/pytorch/fairseq/tree/master/examples/m2m_100.
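    A "true many-to-many" setup means training examples for every ordered language pair, typically by tagging each example with its target language so one shared model covers all directions. The sketch below is illustrative only; the tag format is my assumption, not necessarily the exact fairseq convention.

```python
from itertools import permutations

LANGS = ["en", "fr", "de", "zh", "sw"]  # toy subset; the paper uses 100

def encode_example(src_text: str, tgt_lang: str) -> str:
    # Hypothetical target-language tag prepended to the source sentence.
    return f"__{tgt_lang}__ {src_text}"

# Every ordered pair of distinct languages is a translation direction.
directions = list(permutations(LANGS, 2))
print(len(directions))  # n * (n - 1) = 20 here
# With 100 languages: 100 * 99 = 9900 directions, versus only
# 2 * 99 = 198 for an English-centric model (to/from English only).
```

    Counting directions this way shows why mining non-English parallel data matters: the English-centric setting covers under 2% of the 9900 possible directions.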

  • End-to-end ASR: from supervised to semi-supervised learning with modern architectures

    We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions. We perform experiments on the standard LibriSpeech dataset, and leverage additional unlabeled data from LibriVox through pseudo-labeling. We show that while Transformer-based acoustic models have superior performance with the supervised dataset alone, semi-supervision improves all models across architectures and loss functions and bridges much of the performance gaps between them. In doing so, we reach a new state-of-the-art for end-to-end acoustic models decoded with an external language model in the standard supervised learning setting, and a new absolute state-of-the-art with semi-supervised training. Finally, we study the effect of leveraging different amounts of unlabeled audio, propose several ways of evaluating the characteristics of unlabeled audio which improve acoustic modeling, and show that acoustic models trained with more audio rely less on external language models.
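    Stripped to its core, the pseudo-labeling recipe studied here is: train on labeled data, label the unlabeled pool with that model, then retrain on the union. The toy below substitutes a nearest-centroid classifier on 1-D features for the acoustic model; all names and data are mine, purely to show the loop's shape.

```python
import numpy as np

def fit_centroids(X, y):
    # Stand-in "model": one centroid per class.
    return {c: X[y == c].mean() for c in np.unique(y)}

def predict(model, X):
    classes = sorted(model)
    dist = np.abs(X[:, None] - np.array([model[c] for c in classes]))
    return np.array(classes)[dist.argmin(axis=1)]

# Small labeled set (the LibriSpeech stand-in) ...
X_lab = np.array([0.0, 0.2, 1.0, 1.2])
y_lab = np.array([0, 0, 1, 1])
# ... and a larger unlabeled pool (the LibriVox stand-in).
rng = np.random.default_rng(0)
X_unlab = rng.normal(loc=np.repeat([0.1, 1.1], 50), scale=0.1)

model = fit_centroids(X_lab, y_lab)      # 1) supervised training
pseudo = predict(model, X_unlab)         # 2) pseudo-label the pool
model = fit_centroids(                   # 3) retrain on the union
    np.concatenate([X_lab, X_unlab]),
    np.concatenate([y_lab, pseudo]),
)
```

    In the paper the same loop runs with full acoustic models and a beam-search decoder producing the pseudo-transcripts, but the train/label/retrain structure is the same.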

  • Libri-Light: a benchmark for ASR with limited or no supervision

    ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state-of-the-art.
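    Setting (3) is scored with word error rate (WER): word-level Levenshtein distance divided by reference length. A minimal reference implementation (my own helper, not the benchmark's scoring code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions) / ref words."""
    r, h = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat", "the cat sat down"))  # 1 insertion / 3 words
```

    PER and CER in settings (1)–(2) follow the same edit-distance idea at the phone and character level.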

  • Wav2letter++: A fast open-source speech recognition system

    ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

    This paper introduces wav2letter++, a fast open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. We explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2× faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++ training times scale linearly to 64 GPUs, the most we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.

  • Letter-based speech recognition with gated convnets

    In the recent literature, "end-to-end" speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC). In contrast to traditional phone (or senone)-based approaches, these "end-to-end'' approaches alleviate the need of word pronunciation modeling, and do not require a "forced alignment" step at training time. Phone-based approaches remain however state of the art on classical benchmarks. In this paper, we propose a letter-based speech recognition system, leveraging a ConvNet acoustic model. Key ingredients of the ConvNet are Gated Linear Units and high dropout. The ConvNet is trained to map audio sequences to their corresponding letter transcriptions, either via a classical CTC approach, or via a recent variant called ASG. Coupled with a simple decoder at inference time, our system matches the best existing letter-based systems on WSJ (in word error rate), and shows near state of the art performance on LibriSpeech.
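    The Gated Linear Unit named as a key ingredient splits the channel dimension in half and gates one half with the sigmoid of the other. A minimal NumPy sketch of the activation itself:

```python
import numpy as np

def glu(x, axis=-1):
    """Gated Linear Unit (Dauphin et al.): a * sigmoid(b),
    where a and b are the two halves of x along `axis`."""
    a, b = np.split(x, 2, axis=axis)
    return a * (1.0 / (1.0 + np.exp(-b)))

x = np.array([[2.0, -1.0, 0.0, 0.0]])  # halves: a = [2, -1], b = [0, 0]
print(glu(x))  # [[ 1.  -0.5]] since sigmoid(0) = 0.5
```

    The sigmoid gate lets the network modulate which activations pass forward; in the paper this gating, combined with high dropout, is what makes the letter-based ConvNet competitive.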

  • Expressive Languages for Selecting Groups from Graph-Structured Data

    ACM

    Many query languages for graph-structured data are based on regular path expressions, which describe relations among pairs of nodes. We propose an extension that allows retrieving groups of nodes based on group structural characteristics and relations to other nodes or groups. It allows expressing group selection queries in a concise and natural style, and can be integrated into any query language based on regular path queries. We present an efficient algorithm for evaluating group queries in polynomial time from an input data graph. Evaluations using real-world social networks demonstrate the practical feasibility of our approach.
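    As a loose illustration of group selection (a structural predicate over a candidate set), one can filter a candidate group by each member's connectivity inside the group. This toy sketch is my own; it shows the flavor of such queries, not the paper's actual syntax or evaluation algorithm.

```python
# Toy social network as adjacency sets.
G = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob"},
    "dan": {"bob"},
}

def select_group(graph, anchor, min_internal_degree=1):
    """Select the group of anchor's neighbors whose members each have at
    least `min_internal_degree` edges to other members of the group."""
    candidates = set(graph[anchor])
    return {v for v in candidates
            if len(graph[v] & candidates) >= min_internal_degree}

print(select_group(G, "ann"))  # {'bob', 'cat'}: each linked to the other
```

    Real group queries in the paper combine such structural conditions with regular path expressions relating the group to other nodes, evaluated in polynomial time.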

  • Statelets: Coordination of Social Collaboration Processes

    Springer Verlag

    Coordination language


Languages

  • English

    Native or bilingual proficiency

  • German

    Limited working proficiency

  • Russian

    Native or bilingual proficiency

  • Ukrainian

    Native or bilingual proficiency
