Perl Automated Linux From Scratch

Andrew Benton andy at
Tue May 29 03:12:12 PDT 2012

Hello all,
In and effort to learn some perl I decided to write a script to extract the commands from the raw LFS xml and it turned out quite well. The XML::LibXML perl module uses libxml to process the xml and it does all the hard work.

This script takes one argument, the path to the index.xml file in the
top folder of the LFS xml and it produces 2 folders of bash scripts, 1
script for each page of the book. In each folder there is an
which runs all the other scripts in the order they are in the book. To
use the generated scripts the lfs-tools/ should be run by the
user lfs and does all of chapter 5 (except the last page,
chowning /tools has to be done by root).

root-chroot/ should be run by root after chrooting and should
run all the way to the end of chapter 6 (you'll need to set the root
password when it installs shadow).

These scripts do not attempt to do anything after chapter 6. Setting up
the bootscripts, running grub, compiling the kernel and so on, all
those bits should be done by hand.

Obviously, this is a lot more restricted than jhalfs and (although it
works) I don't expect people to start using it seriously. I just wanted
a something I could do with perl (it was either this or do project
Euler again) but having said that, I thought I'd post it somewhere
public in case anyone is interested.



use strict;
use warnings;
use File::Path;
use File::Basename;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Change this to the locale you want glibc to install in chapter 6:
my $locale = "Europe/London";

# Change this to the number of jobs you want to run make with:
my $make_jobs = "-j4";

# Change this to yes if you want to run the tests:
my $run_tests = "no";

# Change this to letter if you're an uncouth colonial ;)
my $paper_size = "A4";

my $index_xml = shift @ARGV;
$index_xml =~ ".*index.xml" or die "Please tell me where the LFS index.xml is.";

my $xml_dir = dirname $index_xml;

# Create a packages hash with the package versions:
my %packages;
# start a new XML::LibXML parser instance:
my $parser = XML::LibXML->new();
# This feels like a hack. There must be a way to get LibXML to query
# packages.ent to find out the package versions but for now we'll just grep
# through packages.xml looking for terms enclosed by brackets :/
my $content = $parser->parse_file("$xml_dir/chapter03/packages.xml");
# The entities with the package versions are in <varlistentry><term> elements
foreach my $element ($content->findnodes('//varlistentry/term')) {
  # split the line into an array:
  my @tmp_line = split /\s+/, $element->to_literal;
  my $version;
  # step through the parts of the line:
  foreach my $parts (@tmp_line) {
    # We're looking for the bit in brackets:
    $version = $parts if $parts =~ s#[()]##g
  # If the second word is "Configuration" it's the udev config tarball which
  # will remove the udev version we don't stop it:
  unless ("$tmp_line[1]" eq "Configuration") {
    $packages{"\L$tmp_line[0]"} = $version

# This subroutine finds all the xml files xincluded in an xml file
sub included {
  # XML::LibXML does all the hard work. Open a new parser:
  my $parser = XML::LibXML->new();
  # Parse the subroutine's first argument into $doc:
  my $doc = $parser->parse_file($_[0]);
  # look through $doc for any XPaths
  my $xml_path_context = XML::LibXML::XPathContext->new($doc);
  # Files are included using this namespace:
  $xml_path_context->registerNs('xi', '');
  # initialise an array to store the results:
  my @results;
  # findnodes finds the elements using the namespace
  for ($xml_path_context->findnodes('//xi:include/@href')) {
    # catch the results in the variable $link so we can do a bit of processing:
    my $link = $_->getValue();
    # Ignore some chapters and avoid trouble:
    if (($link !~ "aboutdebug") &&
        ($link !~ "changingowner") &&
        ($link !~ "chroot") &&
        ($link !~ "generalinstructions") &&
        ($link !~ "introduction") &&
        ($link !~ "kernfs") &&
        ($link !~ "pkgmgt") &&
        ($link !~ "revisedchroot") &&
        ($link !~ "stripping") &&
        ($link !~ "toolchaintechnotes")) {
      push @results, $link;
  return @results

# Make an array that contains all the xml files xincluded in index.xml:
my @chapter = &included($index_xml);

# remove and then recreate the aLFS-scripts folder to ensure it's all new:
mkdir "aLFS-scripts", 0755 or die "Can't make the scripts directory: $!";
# step through the chapters in index.xml:
foreach (@chapter) {
  my $chapter_root = $_;
  # We're only going to use chapters 5 and 6:
  if ($chapter_root =~ ".*chapter0[56].xml") {
    # Make an array listing the xml files xincluded in the chapter: 
    my @chapter_page = &included("$xml_dir/$chapter_root");
    $chapter_root = dirname $chapter_root;
    my $chapter_name;
    if ($chapter_root =~ "chapter05") {
      $chapter_name = "lfs-tools";
    } else {
      $chapter_name = "root-chroot";
    mkdir "aLFS-scripts/$chapter_name", 0755 or
    die "Can't make aLFS-scripts/$chapter_name: $!";
    # Start a new
    open INDEX, ">", "aLFS-scripts/$chapter_name/" or
    die "Can't make aLFS-scripts/$chapter_name/ $!";
    # Give a hashbang, stop on errors and turn off hashing:
    print INDEX "#!/bin/bash\nset -e\nset +h\ncd \$(dirname \${0})\n\n";
    foreach (@chapter_page) {
      my $page = "$xml_dir/$chapter_root/$_";
      my $package = basename $page;
      $package =~ s#\.xml##;
      print "Creating aLFS-scripts/$chapter_name/$", "\n";
      # Add an entry to for each subscript:
      print INDEX "bash $\n";
      open SCRIPT, ">", "aLFS-scripts/$chapter_name/$" or
      die "Can't make aLFS-scripts/$chapter_name/$ $!";
      # Now we've opened the output script we can fix the $package name.
      # For binutils and gcc, remove -pass1 and -pass2 from the name:
      $package =~ s#-pass[1-2]##;
      # For the kernel, remove -headers from the name:
      $package =~ s#-headers##;
      if ($chapter_name eq "lfs-tools") {
        print SCRIPT "#!/bin/bash\nset -e\nset +h\ncd \${LFS}/sources\n\n";
      } else {
        print SCRIPT "#!/bin/bash\nset -e\nset +h\ncd /sources\n\n";
      my $texinfo_without_the_letter;
      my $vim_without_the_dot;
      # If $package has a version in the %packages hash we need to untar the
      # source and cd into it. Put a rm -rf first to be sure it's all clean:
      unless ($packages{$package} ~~ undef) {
        # Not all packages untar to a folder with the same name :/
        if ($package eq "tcl") {
          print SCRIPT "rm -rf $package$packages{$package}\n";
          print SCRIPT "tar -xf $package$packages{$package}-src.tar.*\n";
          print SCRIPT "cd $package$packages{$package}\n";
        } elsif ($package eq "expect") {
          print SCRIPT "rm -rf $package$packages{$package}\n";
          print SCRIPT "tar -xf $package$packages{$package}.tar.*\n";
          print SCRIPT "cd $package$packages{$package}\n";
        } elsif ($package eq "vim") {
          # Vim's source dir doesn't have a dot in the version :/
          ($vim_without_the_dot = $packages{$package}) =~ s#\.##g;
          print SCRIPT "rm -rf $package$vim_without_the_dot\n";
          print SCRIPT "tar -xf $package-$packages{$package}.tar.*\n";
          print SCRIPT "cd $package$vim_without_the_dot\n";
        } elsif ($package eq "texinfo") {
          # Texinfo's source dir doesn't have a letter in the version :/
          ($texinfo_without_the_letter = $packages{$package}) =~ s#[a-z]##g;
          print SCRIPT "rm -rf $package-$texinfo_without_the_letter\n";
          print SCRIPT "tar -xf $package-$packages{$package}.tar.*\n";
          print SCRIPT "cd $package-$texinfo_without_the_letter\n";
        } elsif (($package eq "gcc") ||
                 ($package eq "glibc") ||
                 ($package eq "binutils")) {
          print SCRIPT "rm -rf $package-{$packages{$package},build}\n";
          print SCRIPT "tar -xf $package-$packages{$package}.tar.*\n";
          print SCRIPT "cd $package-$packages{$package}\n";
        } else {
          print SCRIPT "rm -rf $package-$packages{$package}\n";
          print SCRIPT "tar -xf $package-$packages{$package}.tar.*\n";
          print SCRIPT "cd $package-$packages{$package}\n";
      } # end unless ($packages{$package} ~~ undef)
      # Now we get XML::LibXML to do some more magic:
      my $parser = XML::LibXML->new();
      # Parse the current page:
      my $content = $parser->parse_file($page);
      # Find what's inside the <screen><userinput> tags on the page. There may
      # be several userinput tags so we'll use a foreach:
      foreach my $element ($content->findnodes('//screen/userinput')) {
        # Put the content in a variable so we can manipulate it:
        my $page_content = $element->to_literal;
        # Parallel builds save time:
        $page_content =~ s#^make$#make $make_jobs#;
        if ($package eq "glibc") {
          # Glibc needs to copy a real locale:
          $page_content =~ s#<xxx>#$locale#;
          # Don't run tzselect:
          $page_content =~ s/^tzselect$/# $&/;
        # Groff needs a real paper size:
        $page_content =~ s#<paper_size>#$paper_size# if ($package eq "groff");
        # TODO: handle this ABI=32 issue:
        $page_content =~ s/^ABI=32/# $&/ if ($package eq "gmp");
        if (($package eq "bash") ||
            ($package eq "createfiles")) {
          # Logging in again will break the scripts, each script runs a new
          # /bin/bash in a sub shell so maybe we don't need to login again?:
          $page_content =~ s/.* --login/# $&/;
        # Don't run vim:
        $page_content =~ s/.*:options/# $&/ if ($package eq "vim");
        # Shall we run the tests?
        if ($run_tests eq "no") {
          $page_content =~ s/.*[ -_]check/# $&/g;
          $page_content =~ s/.* test/# $&/g;
          $page_content =~ s/.* dummy.log/# $&/;
          $page_content =~ s/.* a\.out/# $&/;
        # Ok, now print the book's commands to our current script:
        print SCRIPT $page_content, "\n";
      } # end foreach my $element ($content->findnodes('//screen/userinput'))
      # If $packages{$package} is defined the page untared some source code
      # which we now need to clean up:
      unless ($packages{$package} ~~ undef) {
        if ($chapter_name eq "lfs-tools") {
          print SCRIPT "cd \${LFS}/sources\n";
        } else {
          print SCRIPT "cd /sources\n";
        # Different packages have differently named source folders:
        if (($package eq "tcl") || ($package eq "expect")) {
          print SCRIPT "rm -rf $package$packages{$package}\n";
        } elsif ($package eq "vim") {
          print SCRIPT "rm -rf $package$vim_without_the_dot\n";
        } elsif ($package eq "texinfo") {
          print SCRIPT "rm -rf $package-$texinfo_without_the_letter\n";
        } elsif (($package eq "gcc") ||
                 ($package eq "glibc") ||
                 ($package eq "binutils")) {
          print SCRIPT "rm -rf $package-{$packages{$package},build}\n";
        } else {
          print SCRIPT "rm -rf $package-$packages{$package}\n";
      } # end unless ($packages{$package} ~~ undef)
      close SCRIPT;
    } # end foreach (@chapter_page)
    close INDEX;
  } # end if ($chapter_root =~ ".*chapter0[56].xml")
} # end foreach (@chapter)

# TODO: Add logging?

More information about the alfs-discuss mailing list