AI & ML interests

None defined yet.

espejelomar 
posted an update about 1 month ago
view post
Post
4728
Sharing WorldForge with @abdelstark

It's an open-source Python project for evaluating and replaying robotics and world-model workflows.

The useful part is not only calling a model. WorldForge records the run, validates action shapes, translates outputs into actions, and keeps replay artifacts you can inspect later.

The current demo uses LeRobot + LeWorldModel on PushT through the official loader:

stable_worldmodel.policy.AutoCostModel("pusht/lewm")

The harness also has replay-only paths for Cosmos-Policy and GR00T-style outputs, so you can inspect the provider contract from saved artifacts without keeping a GPU server online.

Try it:

pip install worldforge-ai
uv run --extra harness worldforge-harness --flow robotics-compare

Repo: https://github.com/AbdelStark/worldforge
Docs: https://abdelstark.github.io/worldforge/

Pre-1.0, MIT, and actively looking for contributors. Good areas:
- robotics provider adapters
- replay artifacts
- eval flows
- docs & first-run demos

Good first issues: https://github.com/AbdelStark/worldforge/contribute

If you're building robot policy evals or model adapters, would love a PR — or an issue describing what's missing.
mrm8488 
posted an update almost 2 years ago
view post
Post
8935
🚨Exciting news for the Multilingual Synthetic Data Community!🚨

I’ve taken inspiration from the MAGPIE paper on Llama-3-8B-instruct and extended its capabilities. Here’s what’s new!

🗞 The MAGPIE paper showcased that if you use the instruction-tuned version (Llama-3-8B-instruct) to generate synthetic instructions and then fine-tune the base version (Llama-3-8B) on this dataset, you can improve even the it-tuned version

🤔 While reading a script by Sebastian Raschka, PhD, I wondered: Could these advancements be replicated in other languages? Specifically, could they benefit non-English datasets?

🎉 And the answer is YES! At least for Spanish. I've successfully adapted the techniques for Spanish, proving the model's flexibility and multilingual capabilities.

👩‍💻 To make this accessible, I created a basic script (heavily inspired by the Sebastian Raschka one) that allows you to generate similar datasets using ollama models (initially phi and llama3) automatically and upload it to the Hugging Face Hub!
[Script](https://gist.github.com/mrm8488/4650a5e3cc45523798a527a3446eb312)


🔍 Explore the datasets 📚 generated using our new script!

- [Llama-3-8B](https://huggingface.co/datasets/mrm8488/dataset_llama3_5000_samples_es_4231_filtered)
- [Phi-3-medium](https://huggingface.co/datasets/mrm8488/dataset_phi3-medium_5000_samples_es_3906_filtered)
- [Phi-3-mini](https://huggingface.co/datasets/mrm8488/dataset_phi3_5000_samples_es_3282_filtered)


Note: These datasets have basic filtering. Apply additional quality filters before using them to fine-tune large language models.

Inspiration and base script:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/05_dataset-generation/llama3-ollama.ipynb
https://www.linkedin.com/feed/update/urn:li:activity:7210982019751661568/
  • 7 replies
·
mrm8488 
posted an update about 2 years ago
view post
Post
9695
Working on a concept GPT-2 (small) that uses KANs instead of MLPs.
The ckpt and training code will be soon on the hub.
  • 6 replies
·
mrm8488 
posted an update over 2 years ago
view post
Post
Hello world! 🔥
mariagrandury 
posted an update over 2 years ago
view post
Post
✅ Ever wondered how to measure transparency in model development?

My last open-source contribution for 2023 is s Space that allows you to self-assess the transparency of your model based on the 100 indicators of the Foundation Model Transparency Index (FMTI).

The original study evaluated the developers of 10 top LLMs. Curious about how yours measures up? 👀

mariagrandury/fmti-transparency-self-assessment

Let's commit to a 2024 with greater transparency in the AI ecosystem! 🚀
  • 7 replies
·
mariagrandury 
posted an update over 2 years ago
view post
Post
Holiday talk about AI taking over? Let's shift the narrative!

🌟 There is no reason to believe that just because AI systems are intelligent they will want to dominate us. Yann LeCun reminds us that AI systems won't have the same motivations as humans, we'll design them not to.

🌍 Instead of getting distracted by future existential risks, we must address AI’s more pressing risks — like emitting carbon, infringing copyrights and spreading bias. Sasha Luccioni urges us to create tools and legislation that promote transparency and diversity.

💡 Dive deeper into these perspectives:
- Yann's (@ylecun ) WIRED interview (12'): https://www.wired.com/story/artificial-intelligence-meta-yann-lecun-interview/
- Sasha's (@sasha ) TED Talk (10'): https://www.ted.com/talks/sasha_luccioni_ai_is_dangerous_but_not_for_the_reasons_you_think

P.S.: Love this new "Posts" feature, big thanks to 🤗 for letting me try it!

What are your go-to citations for AI risks? 👇
  • 2 replies
·