pip install "unsloth[cu118] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121] @ git+https://github.com/unslothai/unsloth.git"
We also have Google Colab notebooks for the Alpaca and SlimOrca datasets, and a Kaggle notebook for the LAION dataset.

import triton
import triton.language as tl

@triton.jit
def _rope_embedding(
    Q, Q_row_stride,
    cos, cos_row_stride,
    sin, sin_row_stride,
    seqlen, head_dim,
    BACKWARD_PASS: tl.constexpr,
    BLOCK_SIZE: tl.constexpr,
):
    # One program instance rotates one (row, head) slice of Q in place.
    row_position  = tl.program_id(0)
    head_position = tl.program_id(1)
    col_offsets   = tl.arange(0, BLOCK_SIZE)
    half_head_dim = head_dim // 2
    mask = col_offsets < half_head_dim

    # Rows are flattened as batch * seqlen, so the rotation angle
    # depends only on the position within the sequence.
    rot_position = row_position % seqlen
    Q   += row_position * Q_row_stride + head_position * head_dim
    cos += rot_position * cos_row_stride
    sin += rot_position * sin_row_stride

    Q1   = tl.load(Q   + half_head_dim*0 + col_offsets, mask = mask, other = 0)
    Q2   = tl.load(Q   + half_head_dim*1 + col_offsets, mask = mask, other = 0)
    sin1 = tl.load(sin + half_head_dim*0 + col_offsets, mask = mask, other = 0)
    cos1 = tl.load(cos + half_head_dim*0 + col_offsets, mask = mask, other = 0)

    if BACKWARD_PASS:
        # The backward pass of a rotation by theta is a rotation by
        # -theta, which only flips the sign of sin.
        sin1 = -sin1

    # Rotate each (Q1, Q2) pair: out1 = Q1*cos - Q2*sin, out2 = Q2*cos + Q1*sin.
    tl.store(Q + half_head_dim*0 + col_offsets, Q1*cos1 - Q2*sin1, mask = mask)
    tl.store(Q + half_head_dim*1 + col_offsets, Q2*cos1 + Q1*sin1, mask = mask)
pass
We also rewrote all of the kernels in OpenAI's Triton language. For example, the code above is our implementation of the RoPE embedding kernel, including its backward pass: since BACKWARD_PASS is a compile-time constant (tl.constexpr), Triton specializes separate forward and backward kernels with no branch at runtime. We tried our best to make it super clean, readable and fast. You can explore more of our hand-written kernels in our GitHub repo.
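To see how such a kernel is driven from PyTorch, here is a minimal host-side sketch. The launcher and the layout it assumes (Q as a contiguous (batch*seqlen, n_heads, head_dim) tensor, cos/sin as (seqlen, head_dim//2)) are our illustration, not Unsloth's public API; the pure-PyTorch reference lets you sanity-check the kernel's output.

import torch

def rope_embedding_inplace(Q, cos, sin, backward_pass = False):
    # Hypothetical launcher: one program per (row, head) pair, with
    # BLOCK_SIZE covering half the head dimension. Uses _rope_embedding
    # and the triton import from above.
    n_rows, n_heads, head_dim = Q.shape
    seqlen = cos.shape[0]
    BLOCK_SIZE = triton.next_power_of_2(head_dim // 2)
    _rope_embedding[(n_rows, n_heads)](
        Q,   Q.stride(0),
        cos, cos.stride(0),
        sin, sin.stride(0),
        seqlen, head_dim,
        BACKWARD_PASS = backward_pass,
        BLOCK_SIZE = BLOCK_SIZE,
    )
    return Q

def rope_reference(Q, cos, sin):
    # Pure-PyTorch reference: rotate each (first half, second half) pair
    # of every head by its per-position angle.
    half = Q.shape[-1] // 2
    Q1, Q2 = Q[..., :half], Q[..., half:]
    c, s = cos[:, None, :], sin[:, None, :]  # broadcast over heads (batch = 1)
    return torch.cat([Q1*c - Q2*s, Q2*c + Q1*s], dim = -1)

With batch = 1, rope_embedding_inplace(Q.clone(), cos, sin) should match rope_reference(Q, cos, sin) up to floating-point tolerance, and passing backward_pass = True applies the inverse rotation (rotation by -theta).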