AI & ML interests

None defined yet.

posted an update about 1 month ago

Post

4728

Sharing WorldForge with @abdelstark

It's an open-source Python project for evaluating and replaying robotics and world-model workflows.

The useful part is not only calling a model. WorldForge records the run, validates action shapes, translates outputs into actions, and keeps replay artifacts you can inspect later.

The current demo uses LeRobot + LeWorldModel on PushT through the official loader:

stable_worldmodel.policy.AutoCostModel("pusht/lewm")

The harness also has replay-only paths for Cosmos-Policy and GR00T-style outputs, so you can inspect the provider contract from saved artifacts without keeping a GPU server online.

Try it:

pip install worldforge-ai
uv run --extra harness worldforge-harness --flow robotics-compare

Repo: https://github.com/AbdelStark/worldforge
Docs: https://abdelstark.github.io/worldforge/

Pre-1.0, MIT, and actively looking for contributors. Good areas:
- robotics provider adapters
- replay artifacts
- eval flows
- docs & first-run demos

Good first issues: https://github.com/AbdelStark/worldforge/contribute

If you're building robot policy evals or model adapters, would love a PR — or an issue describing what's missing.

mariagrandury

authored 2 papers 4 months ago

BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data

Paper • 2510.10159 • Published Oct 11, 2025 • 3

Measuring what Matters: Construct Validity in Large Language Model Benchmarks

Paper • 2511.04703 • Published Nov 3, 2025 • 8

mariagrandury

authored 4 papers 9 months ago

Adding LLMs to the psycholinguistic norming toolbox: A practical guide to getting the most out of human ratings

Paper • 2509.14405 • Published Sep 17, 2025 • 2

Psycholinguistic Word Features: a New Approach for the Evaluation of LLMs Alignment with Humans

Paper • 2506.22439 • Published May 29, 2025 • 3

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments

Paper • 2509.14233 • Published Sep 17, 2025 • 20

La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America

Paper • 2507.00999 • Published Jul 1, 2025 • 1

mariagrandury

authored 2 papers about 1 year ago

Kaleidoscope: In-language Exams for Massively Multilingual Vision Evaluation

Paper • 2504.07072 • Published Apr 9, 2025 • 9

It's the same but not the same: Do LLMs distinguish Spanish varieties?

Paper • 2504.20049 • Published Apr 8, 2025

mariagrandury

authored a paper over 1 year ago

Evaluating Large Language Models with Tests of Spanish as a Foreign Language: Pass or Fail?

Paper • 2409.15334 • Published Sep 8, 2024 • 1

mariagrandury

authored a paper almost 2 years ago

The #Somos600M Project: Generating NLP resources that represent the diversity of the languages from LATAM, the Caribbean, and Spain

Paper • 2407.17479 • Published Jul 1, 2024 • 1

mrm8488

posted an update almost 2 years ago

Post

8935

🚨Exciting news for the Multilingual Synthetic Data Community!🚨

I’ve taken inspiration from the MAGPIE paper on Llama-3-8B-instruct and extended its capabilities. Here’s what’s new!

🗞 The MAGPIE paper showcased that if you use the instruction-tuned version (Llama-3-8B-instruct) to generate synthetic instructions and then fine-tune the base version (Llama-3-8B) on this dataset, you can improve even the it-tuned version

🤔 While reading a script by Sebastian Raschka, PhD, I wondered: Could these advancements be replicated in other languages? Specifically, could they benefit non-English datasets?

🎉 And the answer is YES! At least for Spanish. I've successfully adapted the techniques for Spanish, proving the model's flexibility and multilingual capabilities.

👩‍💻 To make this accessible, I created a basic script (heavily inspired by the Sebastian Raschka one) that allows you to generate similar datasets using ollama models (initially phi and llama3) automatically and upload it to the Hugging Face Hub!
[Script](https://gist.github.com/mrm8488/4650a5e3cc45523798a527a3446eb312)

🔍 Explore the datasets 📚 generated using our new script!

- [Llama-3-8B](https://huggingface.co/datasets/mrm8488/dataset_llama3_5000_samples_es_4231_filtered)
- [Phi-3-medium](https://huggingface.co/datasets/mrm8488/dataset_phi3-medium_5000_samples_es_3906_filtered)
- [Phi-3-mini](https://huggingface.co/datasets/mrm8488/dataset_phi3_5000_samples_es_3282_filtered)

Note: These datasets have basic filtering. Apply additional quality filters before using them to fine-tune large language models.

Inspiration and base script:
https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/05_dataset-generation/llama3-ollama.ipynb
https://www.linkedin.com/feed/update/urn:li:activity:7210982019751661568/

7 replies

mariagrandury

authored a paper almost 2 years ago

Spanish and LLM Benchmarks: is MMLU Lost in Translation?

Paper • 2406.17789 • Published May 28, 2024 • 2

mrm8488

posted an update about 2 years ago

Post

9695

Working on a concept GPT-2 (small) that uses KANs instead of MLPs.
The ckpt and training code will be soon on the hub.

6 replies

mrm8488

posted an update over 2 years ago

Post

Hello world! 🔥

mariagrandury

posted an update over 2 years ago

Post

✅ Ever wondered how to measure transparency in model development?

My last open-source contribution for 2023 is s Space that allows you to self-assess the transparency of your model based on the 100 indicators of the Foundation Model Transparency Index (FMTI).

The original study evaluated the developers of 10 top LLMs. Curious about how yours measures up? 👀

mariagrandury/fmti-transparency-self-assessment

Let's commit to a 2024 with greater transparency in the AI ecosystem! 🚀

7 replies

mariagrandury

posted an update over 2 years ago

Post

Holiday talk about AI taking over? Let's shift the narrative!

🌟 There is no reason to believe that just because AI systems are intelligent they will want to dominate us. Yann LeCun reminds us that AI systems won't have the same motivations as humans, we'll design them not to.

🌍 Instead of getting distracted by future existential risks, we must address AI’s more pressing risks — like emitting carbon, infringing copyrights and spreading bias. Sasha Luccioni urges us to create tools and legislation that promote transparency and diversity.

💡 Dive deeper into these perspectives:
- Yann's (@ylecun ) WIRED interview (12'): https://www.wired.com/story/artificial-intelligence-meta-yann-lecun-interview/
- Sasha's (@sasha ) TED Talk (10'): https://www.ted.com/talks/sasha_luccioni_ai_is_dangerous_but_not_for_the_reasons_you_think

P.S.: Love this new "Posts" feature, big thanks to 🤗 for letting me try it!

What are your go-to citations for AI risks? 👇