LVM, TLC NAND and the power of 2 problem

I recently bought a Samsung 840 EVO 1TB SSD. These drives perform pretty well for consumer drives, and despite the lack of official endurance figures, some user tests show that they last just fine; 80TB written isn’t in the enterprise range but it’s sufficient for consumer-grade use, e.g. desktops and laptops. Also, the larger the drive, the longer it can last, provided you aren’t filling it up completely.

With a drive this big, LVM is very practical; I tend to rearrange storage allocation quite a bit over time, and virt-manager does a good job of managing LVM as a disk backend for VMs. Generally using raw disks or logical volumes is also a bit faster for your VMs, though you lose some nice features of the qcow2 and other file container formats.

As we know by now, aligning your SSD properly is important if you care about performance – but equally so for its lifetime. Proper alignment on erase block boundaries keeps the number of P/E cycles to a minimum, which will make your drive last longer. But there is a problem with TLC drives that doesn’t affect SLC or MLC: the erase block size is not a power of 2. It’s 1536KB for TLC, whereas SLC and MLC are 512KB and 1024KB or multiples of that.

Now when you just create a few partitions and ensure they are aligned, this is not going to be an issue. However, if you use LVM and intend to create and delete a lot of logical volumes over time, you’d want them to be aligned on the erase block. With current versions of LVM, you can’t do that for TLC NAND drives. LVM requires a physical extent size that is a power of 2, and no power of 2 is a whole multiple of 1536KB: 1536 is 3 × 512, and a power of 2 is never divisible by 3.
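The arithmetic is easy to check in the shell. The sketch below (the function name is only for illustration) tests every power-of-2 extent size from 1KB up to 1GB against the 1536KB erase block and reports whether any of them is a whole multiple of it:

```bash
#!/bin/bash
ERASE_KB=1536

# Print every power-of-2 size (in KB, up to 1GB) that is a whole
# multiple of the erase block size
aligned_pe_sizes() {
    local size
    for (( size = 1; size <= 1024 * 1024; size *= 2 )); do
        if (( size % ERASE_KB == 0 )); then
            echo "${size}"
        fi
    done
}

matches=$(aligned_pe_sizes)
if [ -z "${matches}" ]; then
    echo "no power-of-2 extent size is a multiple of ${ERASE_KB}KB"
else
    echo "usable extent sizes (KB): ${matches}"
fi
```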

There are two workarounds that I have found:

  • Create a VG with 512KB physical extents and always make sure that you use lvcreate with an extent count that is a multiple of 3
  • Create a VG with 1MB physical extents and always make sure that your newly created volumes are multiples of 3MB
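Either workaround comes down to rounding every requested size up to a whole number of erase blocks before calling lvcreate. Here is a minimal sketch for the first workaround, assuming a VG that was created with 512KB extents. The function name, the volume group name vg_ssd, the device /dev/sdX3 and the volume name vm_disk are all placeholders, and the lvcreate command is only printed, not executed:

```bash
#!/bin/bash
# Assumes the VG was created with 512KB extents, for example:
#   vgcreate -s 512K vg_ssd /dev/sdX3
PE_KB=512
ERASE_KB=1536
PE_PER_BLOCK=$(( ERASE_KB / PE_KB ))   # 3 extents per erase block

# Round a size in MB up to an extent count that is a multiple of 3
aligned_extents() {
    local size_mb=$1
    local extents=$(( size_mb * 1024 / PE_KB ))
    echo $(( (extents + PE_PER_BLOCK - 1) / PE_PER_BLOCK * PE_PER_BLOCK ))
}

# Print (but do not run) the command for a volume of roughly 10GB
echo "lvcreate -l $(aligned_extents 10240) -n vm_disk vg_ssd"
```

This only helps if every volume in the VG is sized this way and the first PE is itself aligned; a single volume with an odd extent count shifts everything allocated after it.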

However, neither of these workarounds allows for the “set it and forget it” benefits of LVM, because they cannot be enforced: if you forget to keep this in mind or make a math error, your alignment will be off. You also can’t rely on other tools to create volumes for you, because they may not allow for such granularity in choosing sizes.

This is ultimately a limitation of LVM that it should overcome, but I have not found any evidence that it can work without its power-of-2 requirement. I need to take this up with the mailing list, but until then, here’s a quick script that will verify the alignment of partitions and of the logical volumes in a volume group:

#!/bin/bash

# We assume 512 byte sector sizes
SECTSIZE=512

# SLC has 512KB erase block sizes
let SLCALIGN=(512*1024)/${SECTSIZE}
# MLC has 1024KB erase block sizes
let MLCALIGN=(1024*1024)/${SECTSIZE}
# TLC has 1536KB erase block sizes
let TLCALIGN=(1536*1024)/${SECTSIZE}

# Check all disks for partitions and verify their alignment
for disk in /sys/class/block/sd[a-z]; do
 for partition in ${disk}[0-9]; do
  # Skip the unexpanded glob when a disk has no partitions
  [ -e "${partition}/start" ] || continue
  START=`cat ${partition}/start`
  if (( ${START} % ${SLCALIGN} == 0 )); then SLC=YES; else SLC=" NO"; fi
  if (( ${START} % ${MLCALIGN} == 0 )); then MLC=YES; else MLC=" NO"; fi
  if (( ${START} % ${TLCALIGN} == 0 )); then TLC=YES; else TLC=" NO"; fi
  echo "Partition `basename ${partition}` alignment: SLC ${SLC} - MLC ${MLC} - TLC ${TLC} - sector ${START}"
 done
done

# Check any LVM volumes
PVS=`pvs -o pv_name --noheadings`
for pv in ${PVS}; do
 # Check 1st PE alignment for all PVs
 PESTART=`pvs $pv -o pe_start --units s --noheadings | sed 's/S$//' | sed 's/ //g'`
 if (( ${PESTART} % ${SLCALIGN} == 0 )); then SLC=YES; else SLC=" NO"; fi
 if (( ${PESTART} % ${MLCALIGN} == 0 )); then MLC=YES; else MLC=" NO"; fi
 if (( ${PESTART} % ${TLCALIGN} == 0 )); then TLC=YES; else TLC=" NO"; fi
 # Report the first PE sector (PESTART), not the leftover partition START
 echo "LVM PV `basename ${pv}` first PE: SLC ${SLC} - MLC ${MLC} - TLC ${TLC} - sector ${PESTART}"

 # Get PE size in kb
 PESIZEKB=`pvdisplay -c ${pv} | cut -d':' -f8`
 # Convert to sectors
 let PESECTSIZE=(${PESIZEKB}*1024)/${SECTSIZE}

 # Then check all PV segments for alignment (offsets are relative to the first PE)
 for segment in `pvs --segments ${pv} -o pvseg_start --noheadings | sed 's/ //g' | grep -v '^0$'`; do
  let SEGSTART=${segment}*${PESECTSIZE}
  if (( ${SEGSTART} % ${SLCALIGN} == 0 )); then SLC=YES; else SLC=" NO"; fi
  if (( ${SEGSTART} % ${MLCALIGN} == 0 )); then MLC=YES; else MLC=" NO"; fi
  if (( ${SEGSTART} % ${TLCALIGN} == 0 )); then TLC=YES; else TLC=" NO"; fi
  echo "LVM extent check: SLC ${SLC} - MLC ${MLC} - TLC ${TLC} - extent ${segment} sector ${SEGSTART}"
 done
done

The output will look something like this:

Partition sdb1 alignment: SLC YES - MLC YES - TLC YES - sector 6144
Partition sdb2 alignment: SLC YES - MLC YES - TLC YES - sector 620544
Partition sdb3 alignment: SLC YES - MLC YES - TLC YES - sector 944338944
LVM PV sdb3 first PE:     SLC YES - MLC YES - TLC YES - sector 3670018048
LVM extent check:         SLC YES - MLC YES - TLC YES - extent 92160 sector 188743680
LVM extent check:         SLC YES - MLC YES - TLC YES - extent 92259 sector 188946432
LVM extent check:         SLC YES - MLC YES - TLC YES - extent 92274 sector 188977152
LVM extent check:         SLC YES - MLC YES - TLC YES - extent 92280 sector 188989440
LVM extent check:         SLC YES - MLC YES - TLC YES - extent 92283 sector 188995584
LVM extent check:         SLC YES - MLC YES - TLC YES - extent 92286 sector 189001728
LVM extent check:         SLC YES - MLC YES - TLC YES - extent 92289 sector 189007872
LVM extent check:         SLC YES - MLC YES - TLC YES - extent 92292 sector 189014016
LVM extent check:         SLC YES - MLC YES - TLC  NO - extent 92339 sector 189110272
LVM extent check:         SLC YES - MLC YES - TLC  NO - extent 92341 sector 189114368
LVM extent check:         SLC YES - MLC YES - TLC YES - extent 92343 sector 189118464
LVM extent check:         SLC YES - MLC YES - TLC  NO - extent 92345 sector 189122560

In the example above, a number of segments were deliberately misaligned: they start on extent numbers that are not divisible by 3, so they show up as TLC NO.

To find out which logical volume the misaligned extents belong to, you can use lvdisplay -m and search the output for the extent number.
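As an alternative to searching the lvdisplay -m output by hand, pvs can print the logical volume name next to each segment start. The sketch below is only an illustration: it assumes 1MB extents, so that a segment is aligned when its starting extent is divisible by 3, and it assumes the first PE is itself aligned. The function name is made up, and the PV path is whatever you pass as the first argument:

```bash
#!/bin/bash
# Read "pvseg_start lv_name" pairs on stdin and print the misaligned ones.
# Assumes 1MB extents, so a segment is TLC-aligned when its starting
# extent is divisible by 3. Free segments have no LV name and are skipped.
report_misaligned() {
    local start lv
    while read -r start lv; do
        [ -z "${lv}" ] && continue
        if (( start % 3 != 0 )); then
            echo "${lv} starts at misaligned extent ${start}"
        fi
    done
}

# Needs root. Pass the PV as the first argument, e.g. /dev/sdb3
if [ -n "${1:-}" ]; then
    pvs --segments -o pvseg_start,lv_name --noheadings "$1" | report_misaligned
fi
```

Keeping the check in a function that reads stdin means it can be tried on saved pvs output before running it against a live system.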
