“Vitaliy can understand, improve, and design within complex systems and make life easier for the developers working with him. He has great technical knowledge and combines it with a good feel for his co-workers. Great person to work with and learn from!”
Vitaliy Liptchinsky
Mountain View, California, United States
1K followers
500+ connections
Activity
-
We trained a model and we wrote a paper about it. Have fun y’all! https://lnkd.in/g-8fqmVh https://lnkd.in/gahr7Cfn
Liked by Vitaliy Liptchinsky
-
Nemotron-4-340B is released today! * Base, Instruct, Reward models * Permissive license * Great for Synthetic Data Generation * Designed to help…
Liked by Vitaliy Liptchinsky
-
Amazing event this year with Cisco + Splunk at .conf24. It was great to lead so many customer sessions on how to achieve resilience with AI…
Liked by Vitaliy Liptchinsky
Experience & Education
Publications
-
Self-supervised pretraining of visual features in the wild
Recently, self-supervised learning methods like MoCo, SimCLR, BYOL and SwAV have reduced the gap with supervised methods. These results have been achieved in a controlled environment, that is, the highly curated ImageNet dataset. However, the premise of self-supervised learning is that it can learn from any random image and from any unbounded dataset. In this work, we explore whether self-supervision lives up to its expectation by training large models on random, uncurated images with no supervision. Our final SElf-supERvised (SEER) model, a RegNetY with 1.3B parameters trained on 1B random images with 512 GPUs, achieves 84.2% top-1 accuracy, surpassing the best self-supervised pretrained model by 1% and confirming that self-supervised learning works in a real-world setting. Interestingly, we also observe that self-supervised models are good few-shot learners, achieving 77.9% top-1 with access to only 10% of ImageNet. Code: https://github.com/facebookresearch/vissl
Beyond english-centric multilingual machine translation
The Journal of Machine Learning Research
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages. However, much of this work is English-Centric, training only on data which was translated from or to English. While this is supported by large sources of training data, it does not reflect translation needs worldwide. In this work, we create a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages. We build and open-source a training data set that covers thousands of language directions with parallel data, created through large-scale mining. Then, we explore how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high-quality models. Our focus on non-English-Centric models brings gains of more than 10 BLEU when directly translating between non-English directions while performing competitively with the best single systems from the Workshop on Machine Translation (WMT). We open-source our scripts so that others may reproduce the data, evaluation, and final m2m100 model: https://github.com/pytorch/fairseq/tree/master/examples/m2m_100.
End-to-end asr: from supervised to semi-supervised learning with modern architectures
We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions. We perform experiments on the standard LibriSpeech dataset, and leverage additional unlabeled data from LibriVox through pseudo-labeling. We show that while Transformer-based acoustic models have superior performance with the supervised dataset alone, semi-supervision improves all models across architectures and loss functions and bridges much of the performance gaps between them. In doing so, we reach a new state-of-the-art for end-to-end acoustic models decoded with an external language model in the standard supervised learning setting, and a new absolute state-of-the-art with semi-supervised training. Finally, we study the effect of leveraging different amounts of unlabeled audio, propose several ways of evaluating the characteristics of unlabeled audio which improve acoustic modeling, and show that acoustic models trained with more audio rely less on external language models.
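The pseudo-labeling loop described in this abstract can be illustrated with a toy self-training sketch (my own minimal NumPy illustration, not the paper's code; the nearest-centroid "model", the confidence score, and the 20% filtering threshold are all placeholder choices standing in for the acoustic models and filtering used in the paper):

```python
import numpy as np

def train_centroids(X, y, n_classes):
    """Toy 'model': nearest-centroid classifier (one centroid per class)."""
    return np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])

def predict(centroids, X):
    """Return predicted class and a confidence score (negative distance)."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(1), -d.min(1)

rng = np.random.default_rng(0)
# small labeled set and a larger unlabeled set (two Gaussian clusters)
X_lab = np.concatenate([rng.normal(-2, 1, (10, 2)), rng.normal(2, 1, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.concatenate([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])

# 1) train on labeled data, 2) pseudo-label the unlabeled data,
# 3) keep only confident pseudo-labels, 4) retrain on the union
model = train_centroids(X_lab, y_lab, 2)
pseudo, conf = predict(model, X_unlab)
keep = conf >= np.quantile(conf, 0.2)  # drop the least confident 20%
model2 = train_centroids(
    np.concatenate([X_lab, X_unlab[keep]]),
    np.concatenate([y_lab, pseudo[keep]]), 2)
```

In the paper this same loop operates on LibriSpeech (labeled) and LibriVox (unlabeled) audio with full acoustic models in place of the centroid classifier.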
Libri-Light: A benchmark for ASR with limited or no supervision
ICASSP 2020 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state of the art.
Wav2letter++: A fast open-source speech recognition system
ICASSP 2019 - IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
This paper introduces wav2letter++, a fast open-source deep learning speech recognition framework. wav2letter++ is written entirely in C++, and uses the ArrayFire tensor library for maximum efficiency. We explain the architecture and design of the wav2letter++ system and compare it to other major open-source speech recognition systems. In some cases wav2letter++ is more than 2× faster than other optimized frameworks for training end-to-end neural networks for speech recognition. We also show that wav2letter++ training times scale linearly to 64 GPUs, the most we tested, for models with 100 million parameters. High-performance frameworks enable fast iteration, which is often a crucial factor in successful research and model tuning on new datasets and tasks.
Letter-based speech recognition with gated convnets
In the recent literature, "end-to-end" speech systems often refer to letter-based acoustic models trained in a sequence-to-sequence manner, either via a recurrent model or via a structured output learning approach (such as CTC). In contrast to traditional phone (or senone)-based approaches, these "end-to-end'' approaches alleviate the need of word pronunciation modeling, and do not require a "forced alignment" step at training time. Phone-based approaches remain however state of the art on classical benchmarks. In this paper, we propose a letter-based speech recognition system, leveraging a ConvNet acoustic model. Key ingredients of the ConvNet are Gated Linear Units and high dropout. The ConvNet is trained to map audio sequences to their corresponding letter transcriptions, either via a classical CTC approach, or via a recent variant called ASG. Coupled with a simple decoder at inference time, our system matches the best existing letter-based systems on WSJ (in word error rate), and shows near state of the art performance on LibriSpeech.
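The Gated Linear Unit named in this abstract as a key ingredient has a compact closed form, GLU(x) = (xW + b) ⊙ σ(xV + c): one linear projection modulated elementwise by a sigmoid gate over a second projection. A minimal NumPy sketch for illustration (my own re-implementation, not the paper's code; `glu` and the weight names are placeholders):

```python
import numpy as np

def glu(x, W, b, V, c):
    """Gated Linear Unit: linear projection times a sigmoid gate,
    GLU(x) = (xW + b) * sigmoid(xV + c)."""
    gate = 1.0 / (1.0 + np.exp(-(x @ V + c)))  # sigmoid, values in (0, 1)
    return (x @ W + b) * gate

# toy example: project 4-dim inputs to 3-dim gated outputs
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))
W, V = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
b, c = np.zeros(3), np.zeros(3)
y = glu(x, W, b, V, c)
print(y.shape)  # (2, 3)
```

Because the gate lies in (0, 1), each output is a damped copy of the linear projection, which gives the gating network a multiplicative way to pass or suppress features.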
Expressive Languages for Selecting Groups from Graph-Structured Data
ACM
Many query languages for graph-structured data are based on regular path expressions, which describe relations among pairs of nodes. We propose an extension that allows retrieving groups of nodes based on group structural characteristics and relations to other nodes or groups. It allows expressing group selection queries in a concise and natural style, and can be integrated into any query language based on regular path queries. We present an efficient algorithm for evaluating group queries in polynomial time from an input data graph. Evaluations using real-world social networks demonstrate the practical feasibility of our approach.
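For readers unfamiliar with the regular path queries this paper extends, here is a toy evaluator for the simplest case, a fixed sequence of edge labels followed from a start node (my own illustration, not the paper's group-selection algorithm; the graph, labels, and function name are made up):

```python
def eval_path_query(edges, start, labels):
    """Nodes reachable from `start` by following the given edge-label
    sequence; `edges` maps node -> list of (label, target) pairs."""
    frontier = {start}
    for lab in labels:
        # expand every node in the frontier along edges with this label
        frontier = {v for u in frontier
                    for (l, v) in edges.get(u, []) if l == lab}
    return frontier

# toy social graph with "follows" and "likes" edges
edges = {
    "ann": [("follows", "bob"), ("likes", "post1")],
    "bob": [("follows", "cai")],
}
print(eval_path_query(edges, "ann", ["follows", "follows"]))  # {'cai'}
```

A full regular path query allows a regular expression over labels rather than a fixed sequence; the paper's extension goes further and selects whole groups of nodes by structural criteria rather than single endpoints.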
Statelets: Coordination of Social Collaboration Processes
Springer Verlag
Languages
-
English
Native or bilingual proficiency
-
German
Limited working proficiency
-
Russian
Native or bilingual proficiency
-
Ukrainian
Native or bilingual proficiency
Recommendations received
14 people have recommended Vitaliy
More activity by Vitaliy
-
🚨BREAKING NEWS🚨 Paxos International announces Lift Dollar ($USDL), a new stablecoin with daily yield and access to the dollar. #Paxos #USDL…
Liked by Vitaliy Liptchinsky
-
The Infrastructure organization at Meta is hiring software engineers that specialize in GPU performance. We are looking for GPU compiler engineers…
Liked by Vitaliy Liptchinsky
-
Ilya Sutskever gave John Carmack this reading list of approximately 30 research papers and said, ‘If you really learn all of these, you’ll know 90%…
Liked by Vitaliy Liptchinsky
-
The Developer Infrastructure organization at Meta is seeking software engineers for multiple roles focusing on compilers, low-level libraries, and…
Liked by Vitaliy Liptchinsky