Datasets
ExactStep provides synthetic long-range datasets:
- generate_long_copy_dataset: copy task where targets equal inputs.
- generate_long_bracket_depth_dataset: bracket nesting depth with distractors.
- generate_long_reverse_dataset: reverse the token sequence along time.
- generate_running_sum_dataset: running sum of digit tokens (optionally modulo m).
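To make the four tasks concrete, here is a minimal sketch of the input-to-target mapping each one describes, applied to a single unpadded sequence. It is an illustration and not ExactStep's implementation: the helper names (`copy_target`, `reverse_target`, `running_sum_target`, `bracket_depth_target`) are hypothetical, and the bracket task assumes that distractors are non-bracket tokens that leave the depth unchanged and that depth is recorded after each token is read.

```python
from itertools import accumulate


def copy_target(tokens):
    # Copy task: the target is the input sequence itself.
    return list(tokens)


def reverse_target(tokens):
    # Reverse task: the target is the input read backwards in time.
    return list(reversed(tokens))


def running_sum_target(digits, modulo=None):
    # Running-sum task: prefix sums of the digit tokens,
    # reduced modulo `modulo` when one is given.
    sums = list(accumulate(digits))
    if modulo is None:
        return sums
    return [s % modulo for s in sums]


def bracket_depth_target(tokens, open_tok="(", close_tok=")"):
    # Bracket-depth task (assumed semantics): depth after reading each token.
    # Any token that is not a bracket is treated as a distractor.
    depth = 0
    out = []
    for tok in tokens:
        if tok == open_tok:
            depth += 1
        elif tok == close_tok:
            depth -= 1
        out.append(depth)
    return out


print(running_sum_target([3, 4, 5], modulo=7))   # [3, 0, 5]
print(bracket_depth_target(list("(a(b)c)")))     # [1, 1, 2, 2, 1, 1, 0]
```

The generators listed above may encode tokens and targets differently (for example as integer ids); the sketch only pins down the relationship between inputs and targets that each task name implies.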
All four return (X, Y, meta), with sequences padded to a common length.
Copy dataset example:
from transformer_instant import generate_long_copy_dataset
X, Y, meta = generate_long_copy_dataset(num_sequences=256, max_len=1024, vocab_size=64, seed=0)
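The returned sequences are padded to a common length. As a rough picture of what that means, the sketch below pads variable-length token lists on the right. The helper `pad_to_common_length` and the pad id `0` are assumptions made for illustration; the source does not state which pad value ExactStep uses or what `meta` contains.

```python
def pad_to_common_length(sequences, pad_id=0):
    # Right-pad every sequence with `pad_id` up to the longest length.
    # Also return the original lengths so padding can be masked out later.
    lengths = [len(seq) for seq in sequences]
    width = max(lengths, default=0)
    padded = [list(seq) + [pad_id] * (width - len(seq)) for seq in sequences]
    return padded, lengths


padded, lengths = pad_to_common_length([[5, 6, 7], [8]])
print(padded)   # [[5, 6, 7], [8, 0, 0]]
print(lengths)  # [3, 1]
```

Keeping the original lengths next to the padded rows is one common way to exclude pad positions from a loss or metric; whether `meta` carries this information is not confirmed by the source.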
Reverse dataset example:
from transformer_instant import generate_long_reverse_dataset
X, Y, meta = generate_long_reverse_dataset(num_sequences=128, max_len=256, vocab_size=32, seed=2)
Running-sum dataset example (with modulo):
from transformer_instant import generate_running_sum_dataset
X, Y, meta = generate_running_sum_dataset(num_sequences=128, max_len=256, vocab_size=10, modulo=7, seed=3)