
All HF Hub posts

Undi95 posted an update 2 days ago
Hello there,

New model released! My goal was to fine-tune the latest Llama-3.1-8B-Instruct, and not with just a small training run; I wanted to do something useful.
One of the rare models I didn't make for RP, or with the goal of uncensoring it (but I did anyway, kek).

The model was trained ONLY on 9M Claude conversations, giving it a different writing style.

Undi95/Meta-Llama-3.1-8B-Claude > OG release in fp32; this is epoch 2
Undi95/Meta-Llama-3.1-8B-Claude-bf16 > Base model resharded in bf16, waiting for issue-free quants

Since it's frustrating to be censored when using a local model, orthogonal activation steering (OAS) was used to try to force the model to never refuse a prompt.

Undi95/Meta-Llama-3.1-8B-Claude-68fail-3000total > Uncensored model; refuses 68 times out of 3,000 toxic prompts
Undi95/Meta-Llama-3.1-8B-Claude-39fail-3000total > Uncensored model; refuses 39 times out of 3,000 toxic prompts

It still refuses some prompts, but the majority are uncensored. OAS can make a model dumber or raise its base perplexity, so I didn't aim for 0 refusals.
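
For the curious, the general OAS idea can be sketched in a few lines: estimate a "refusal direction" from activations collected on harmful vs. harmless prompts, then project it out of the residual stream. This is only an illustrative sketch of the technique, not my exact pipeline; all names and shapes are hypothetical.

```python
import torch

def refusal_direction(harmful_acts: torch.Tensor, harmless_acts: torch.Tensor) -> torch.Tensor:
    """Difference of mean activations, normalized to a unit vector."""
    d = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return d / d.norm()

def make_ablation_hook(direction: torch.Tensor):
    """Forward hook that removes the refusal component from hidden states."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        # Subtract the projection of each hidden state onto the refusal direction.
        proj = (hidden @ direction).unsqueeze(-1) * direction
        hidden = hidden - proj
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden
    return hook

# Usage with a transformers Llama model, after collecting activations:
# direction = refusal_direction(harmful_acts, harmless_acts)
# for layer in model.model.layers:
#     layer.register_forward_hook(make_ablation_hook(direction))
```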

I don't make non-RP models often, so any feedback is welcome. I would like to reuse this base for some other future projects if needed.
fdaudens posted an update 1 day ago
🚀 Introducing the Model Drops Tracker! 🕵️‍♂️

Feeling overwhelmed by the AI model release frenzy? 🤯 You're not alone!

I built this simple tool to help us all keep up:
- Filter recent models from the 🤗 Hub
- Set a minimum likes threshold
- Choose how far back to look

Try it out and let me know what you think: fdaudens/Model-Drops-Tracker
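
Under the hood, this kind of filtering only needs the huggingface_hub client. Here's a minimal sketch of the idea; the thresholds and sort order are illustrative, not necessarily what the Space does:

```python
from datetime import datetime, timedelta, timezone

from huggingface_hub import HfApi

MIN_LIKES = 20     # minimum likes threshold (illustrative)
MAX_AGE_DAYS = 3   # how far back to look (illustrative)

api = HfApi()
cutoff = datetime.now(timezone.utc) - timedelta(days=MAX_AGE_DAYS)

# Sort by likes so popular drops surface first, then keep only recent ones.
# Note: model.created_at needs a reasonably recent huggingface_hub version.
for model in api.list_models(sort="likes", direction=-1, limit=500):
    if model.created_at and model.created_at >= cutoff and (model.likes or 0) >= MIN_LIKES:
        print(f"{model.id}: {model.likes} likes, created {model.created_at:%Y-%m-%d}")
```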

Any features you'd like to see added?
#AIModels
  • 2 replies
singhsidhukuldeep posted an update 3 days ago
Meta Researchers: How many compute hours should we use to train Llama 3.1?
Mr. Zuck: Yes! 🤖💪

The good folks at @AIatMeta didn't just release the models; they also published a detailed 92-page paper 📄 on their findings and the technical aspects of the models and their training process!

Generally, we just gobble up these weights and forget the compute infrastructure used to train these models. 🖥️🚀


Here are some interesting findings about the computing infrastructure of Llamas:

- The Llama 1 and 2 models were trained on @Meta 's AI Research SuperCluster; Llama 3 was migrated to Meta's production clusters! 📊

- That's 16,000 H100 GPUs, each with a 700W TDP and 80GB of HBM3, arranged in Meta's Grand Teton AI server platform. 🖥️🔋

- What about storing checkpoints? They used Tectonic, Meta's distributed file system, with capacity reaching 240 PB and peak throughput of 7 TB/s. 💾📈

- Meta's mad lads saved each GPU's model state, ranging from 1 MB to 4 GB per GPU, for recovery and debugging. 🛠️🔍


If this sounds big, well, they also document the humongous challenges that come with it:

- Over the 54-day training period, there were 466 job interruptions. 🕒🔄

- About 78% of unexpected interruptions were attributed to confirmed or suspected hardware issues, mostly GPUs! 💥🖥️

- Saving all checkpoints is cool until you do it for a 300B+ parameter model. The bursty nature of checkpoint writes, essential for state-saving during training, periodically saturated the storage fabric, impacting performance. 📉💾

- Despite all this, effective training time (time spent on useful training over elapsed time) was higher than 90%. ⏱️📊

I think this is the stuff movies are made of! 🎬🌟

Paper: https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
  • 2 replies
singhsidhukuldeep posted an update 1 day ago
Yet another post hailing how good Meta Llama 3.1 is? 🤔 I guess not!

While Llama 3.1 is truly impressive, especially 405B (which gives GPT-4o a run for its money! 💪), I was surprised to see that on the Open LLM Leaderboard, Llama 3.1 70B was not able to dethrone the current king, Qwen2-72B! 👑

Not only that: on a few benchmarks, like MATH Lvl 5, it lags completely behind Qwen2-72B! 📉

Also, the leaderboard numbers are completely off compared to the official numbers from Meta! 🤯

Based on the responses, I still believe Llama 3.1 will perform better than Qwen2 on the LMSYS Chatbot Arena. 🤖 But it still lags behind on too many benchmarks! 🏃‍♂️

Open LLM Leaderboard: open-llm-leaderboard/open_llm_leaderboard 🌐

Hopefully, this is just an Open LLM Leaderboard error! @open-llm-leaderboard SOS! 🚨
ehristoforu posted an update about 15 hours ago
😏 Hello from the Project Fluently Team!

✨ Finally, we can give you some details about Supple Diffusion. We worked on it for a long time and have little left to do; we apologize for having to extend the timeline.

🛠️ Some technical information. The first version will be the Small version (there will also be Medium, Large, and Huge, and possibly Tiny). It will be based on the SD1 architecture: one text encoder, a U-Net, and a VAE. About each component: the text encoder is a CLIP model (perhaps not CLIP-L-patch14) that we specially retrained so the model can understand completely different styles and keep prompts as simple as possible. The U-Net was built in a rather involved way: first we trained separate U-Nets on different types of data, then merged them using different methods, then trained with DPO and SPO, and finally reviewed the remaining shortcomings and trained the model further; details will come later. The VAE we left the same as in the SD1 architecture.

🙌 Compatibility. Another goal of the Supple model series is full compatibility with Auto1111 and ComfyUI from the release stage: the model is fully supported by these interfaces and by the diffusers library and requires no adaptation. Your usual sampling methods, such as DPM++ 2M Karras, DPM++ SDE, and others, are also compatible.
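
Since it follows the SD1 architecture, loading it in diffusers should look like any standard Stable Diffusion pipeline. Here's a sketch of what that might look like; the repo id below is a placeholder, since the model isn't released yet:

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder repo id: the real model id will be announced at release.
pipe = StableDiffusionPipeline.from_pretrained(
    "fluently/supple-diffusion-small",  # hypothetical
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a cozy cabin in the woods, watercolor style").images[0]
image.save("cabin.png")
```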

🧐 No demo images today (there wasn't much time); final work is underway on the model, and we are already preparing to develop the Medium version. The release of the Small version will most likely be in mid-August or earlier.

😻 Feel free to ask your questions in the comments below the post; we will be happy to answer them. Have a nice day!
merve posted an update about 24 hours ago
We have recently merged Video-LLaVA into transformers! 🤗🎞️
What makes this model different?

Demo: llava-hf/video-llava
Model: LanguageBind/Video-LLaVA-7B-hf

Compared to other models that take image and video input and either project them separately or downsample the video and project selected frames, Video-LLaVA converts images and videos to a unified representation and projects them through a shared projection layer.

It uses Vicuna 1.5 as the language model, plus LanguageBind's own encoders, which are based on OpenCLIP; these encoders project the modalities into a unified representation before it is passed to the projection layer.


I feel like one of the coolest features of this model is its joint understanding across modalities, which many models have also introduced recently.

It's a relatively older model, but it was ahead of its time and works very well! This means you can, for example, pass the model an image of a cat and a video of a cat and ask whether the cat in the image appears in the video 🤩
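
Using it through transformers looks roughly like this; a minimal sketch with dummy frames standing in for a real decoded video (the prompt format follows the model card):

```python
import numpy as np
import torch
from transformers import VideoLlavaForConditionalGeneration, VideoLlavaProcessor

model_id = "LanguageBind/Video-LLaVA-7B-hf"
model = VideoLlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
processor = VideoLlavaProcessor.from_pretrained(model_id)

# The model samples 8 frames per clip; `video` is (num_frames, H, W, 3) uint8.
# Dummy frames here; decode real frames from a file with e.g. PyAV or decord.
video = np.random.randint(0, 255, size=(8, 336, 336, 3), dtype=np.uint8)

prompt = "USER: <video>\nWhat is happening in this video? ASSISTANT:"
inputs = processor(text=prompt, videos=video, return_tensors="pt").to(
    model.device, torch.float16
)

output = model.generate(**inputs, max_new_tokens=60)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```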
davidberenstein1957 posted an update 1 day ago
The Meta Llama-3.1 model series can be used for distillation and fine-tuning, but that requires annotated preference data, so I created a Gradio-based Human Feedback Collector that logs data directly to the Hugging Face Hub.

- Model: meta-llama/Meta-Llama-3.1-8B-Instruct
- Data: SFT, KTO and DPO formats
- Runs on free Zero GPUs in Hugging Face Spaces
- Might need some human curation in Argilla
- Or add some AI feedback with distilabel

https://huggingface.co/collections/davidberenstein1957/chatinterface-llm-human-feedback-collectors-66a22859c9e703d2af7500c1
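
The general pattern is simple: wrap the model in a chat UI and persist every exchange to a Hub dataset. A rough sketch of that pattern (the dataset repo id is a placeholder, and this is not the exact Space code):

```python
import json
from pathlib import Path

import gradio as gr
from huggingface_hub import CommitScheduler, InferenceClient

client = InferenceClient("meta-llama/Meta-Llama-3.1-8B-Instruct")

# Periodically commit the local log folder to a Hub dataset repo.
log_file = Path("feedback/log.jsonl")
log_file.parent.mkdir(exist_ok=True)
scheduler = CommitScheduler(
    repo_id="your-username/llama-feedback",  # hypothetical repo id
    repo_type="dataset",
    folder_path="feedback",
)

def chat(message, history):
    messages = []
    for user_msg, bot_msg in history:
        messages += [{"role": "user", "content": user_msg},
                     {"role": "assistant", "content": bot_msg}]
    messages.append({"role": "user", "content": message})
    reply = client.chat_completion(messages, max_tokens=512).choices[0].message.content
    # Log each exchange; scheduler.lock keeps writes safe while commits run.
    with scheduler.lock, log_file.open("a") as f:
        f.write(json.dumps({"messages": messages, "completion": reply}) + "\n")
    return reply

gr.ChatInterface(chat).launch()
```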
fdaudens posted an update 3 days ago
I just had a masterclass in open-source collaboration with the release of Llama 3.1 🦙🤗

Meta dropped Llama 3.1, and seeing firsthand the Hugging Face team working to integrate it is nothing short of impressive. Their swift integration, comprehensive documentation, and innovative tools showcase the power of open-source teamwork.

For the curious minds:

📊 Check out independent evaluations: open-llm-leaderboard/open_llm_leaderboard

🧠 Deep dive into the tech: https://huggingface.co/blog/llama31

👨‍🍳 Try different recipes (including running 8B on free Colab, sketched below!): https://github.com/huggingface/huggingface-llama-recipes

📈 Visualize open vs. closed LLM progress: andrewrreed/closed-vs-open-arena-elo

🤖 Generate synthetic data with distilabel, thanks to the new license allowing the use of outputs to train other LLMs: https://huggingface.co/blog/llama31#synthetic-data-generation-with-distilabel

💡 Pro tip: Experience the 405B version for free on HuggingChat, now with tool-calling capabilities! https://huggingface.co/chat/
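
On the Colab point: the recipes repo has the authoritative notebook, but the gist is a 4-bit load, roughly like this (illustrative settings, not the exact recipe):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"

# 4-bit quantization keeps the 8B model within a free Colab GPU's memory.
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

messages = [{"role": "user", "content": "Why does open-source AI move so fast?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=120)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```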

#OpenSourceAI #AIInnovation
  • 1 reply
Jaward posted an update 1 day ago
Super Exciting New Paper By Meta 🤖🧠🚀

Discrete Flow Matching:
Introduces a new framework/algorithm for generating text/code without having to predict autoregressively, one "word" at a time, as traditional GPT models do. It generates all parts of the text/code at once.

The algorithm does this by gradually transforming random noise (the source) into meaningful text (the data). It learns to move samples along a path between source and target using a "probability velocity" that describes how probabilities change over time. During generation, DFM starts with a random sample and iteratively updates it using this learned velocity, gradually transforming it into a sample from the target distribution. This allows for non-autoregressive generation.
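
To make the loop concrete, here is a deliberately simplified schematic of that kind of iterative, non-autoregressive sampler. It is NOT the paper's algorithm: the update rule below is a crude stand-in for integrating the learned probability velocity, and `model` is any network that predicts per-position token logits:

```python
import torch

@torch.no_grad()
def generate(model, seq_len, vocab_size, steps=32, device="cpu"):
    # Source distribution: uniformly random tokens at every position.
    x = torch.randint(vocab_size, (1, seq_len), device=device)
    for i in range(steps):
        t = i / steps                     # time runs from noise to data
        logits = model(x, t)              # (1, seq_len, vocab_size)
        proposal = logits.argmax(dim=-1)  # current best guess, all positions at once
        # Resample a growing fraction of positions toward the data distribution,
        # so every position is finalized by the last step.
        move = torch.rand(1, seq_len, device=device) < 1.0 / (steps - i)
        x = torch.where(move, proposal, x)
    return x
```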

They were able to scale models up to 1.7B parameters, achieving impressive scores on HumanEval and MBPP for coding and significantly closing the gap between autoregressive models and discrete flow models.

Though in its infancy, it holds a promising future, as leading research scientists argue that non-autoregressive methods could yield better reasoning.
as-cle-bert posted an update 2 days ago
Hi HF community! 🤗
Hope y'all are as excited as I am about the release of Llama 3.1! 🦙
Following the release, I built a Space that uses the HF Inference API, thanks to a recipe you can find in this awesome GitHub repo (https://github.com/huggingface/huggingface-llama-recipes/): you can now run Llama-3.1-405B, customizing its system instructions and other parameters, for free! 😇
Follow this link: as-cle-bert/Llama-3.1-405B-FP8 and let the fun begin! 🍕
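
If you'd rather script it than click around, the recipe boils down to a few lines with InferenceClient. A sketch of the pattern; adjust the model id to whichever endpoint the recipe points you at:

```python
from huggingface_hub import InferenceClient

# Chat with the 405B FP8 checkpoint through the Inference API, with custom
# system instructions and sampling parameters, much like the Space does.
client = InferenceClient("meta-llama/Meta-Llama-3.1-405B-Instruct-FP8")

messages = [
    {"role": "system", "content": "You are a concise, friendly assistant."},
    {"role": "user", "content": "Give me three fun facts about llamas."},
]
response = client.chat_completion(messages, max_tokens=256, temperature=0.7)
print(response.choices[0].message.content)
```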
  • 1 reply