Saturday, December 25, 2010

Scala script to find duplicate files

Here is a simple Scala script that finds duplicate files and moves them to another directory. It searches album-with-duplicates for duplicates of files in main-album. Every duplicate found is moved to the copies directory in the user's home directory.

If you have ideas on how to improve it, I'd appreciate it if you shared them in the comments.

MD5 algorithm taken from here.
package com.blogspot.pawelstawicki.remove.duplicates

import java.security.MessageDigest
import java.io.{FileInputStream, File}
import org.apache.commons.io.{FilenameUtils, FileUtils, IOUtils}

/**
 * @author ${user.name}
 */
object App {
  
  def main(args : Array[String]) {
    val dir1 = new File("/photos/main-album");
    val dir2 = new File("/photos/album-with-duplicates");

    val dir1Content = getAllFiles(dir1)
    val dir2Content = getAllFiles(dir2)

    // Map MD5 hash -> file for the main album
    var dir1Map = Map[String, File]()
    dir1Content.foreach(f => {
      // readFileToByteArray closes the file itself, so no stream is left open
      val md5 = md5SumString(FileUtils.readFileToByteArray(f))
      println("md5 for " + f.getPath + ": " + md5)
      dir1Map = dir1Map + (md5 -> f)
    })

    // Map MD5 hash -> file for the album that may contain duplicates
    var dir2Map = Map[String, File]()
    dir2Content.foreach(f => {
      val md5 = md5SumString(FileUtils.readFileToByteArray(f))
      println("md5 for " + f.getPath + ": " + md5)
      dir2Map = dir2Map + (md5 -> f)
    })

    // Look up each hash from dir2 directly in dir1Map instead of comparing every pair of keys
    for ((md5, suspectedDuplicate) <- dir2Map; original <- dir1Map.get(md5)) {
      // The hashes match; compare the contents byte by byte to rule out a collision
      if (checkDuplicate(original, suspectedDuplicate)) {
        println(suspectedDuplicate.getPath + " is duplicate of " + original.getPath)
        val copiesDir = new File(FileUtils.getUserDirectoryPath + "/copies/" + FilenameUtils.getPathNoEndSeparator(original.getAbsolutePath()))
        println("Moving to " + copiesDir.getPath)
        FileUtils.moveFileToDirectory(suspectedDuplicate, copiesDir, true)
      }
    }
  }

  def checkDuplicate(f1: File, f2: File): Boolean = {
    val bytes1 = new Array[Byte](1024*1024)
    val bytes2 = new Array[Byte](1024*1024)

    val input1 = new FileInputStream(f1)
    val input2 = new FileInputStream(f2)

    try {
      var bytesRead1 = input1.read(bytes1)
      while(bytesRead1 > 0) {
        val bytesRead2 = input2.read(bytes2)

        if (bytesRead1 != bytesRead2) {
          return false
        }

        //Same number of bytes read; compare only the bytes read in this pass
        if (!bytes1.take(bytesRead1).sameElements(bytes2.take(bytesRead2))) {
          return false
        }

        bytesRead1 = input1.read(bytes1)
      }

      //input1 is exhausted; the files are equal only if input2 is exhausted too
      input2.read(bytes2) == -1
    } finally {
      //Close both streams on every exit path so file handles are not leaked
      input1.close()
      input2.close()
    }
  }

  def md5SumString(bytes : Array[Byte]) : String = {
    val md5 = MessageDigest.getInstance("MD5")
    md5.reset()
    md5.update(bytes)

    md5.digest().map(0xFF & _).map { "%02x".format(_) }.foldLeft(""){_ + _}
  }

  def getAllFiles(dir : File) : List[File] = {
    var l = List[File]()
    dir.listFiles.foreach(f => {
      if (f.isFile) {
        l = f :: l
      } else {
        l = l ::: getAllFiles(f)
      }
    })

    l
  }

}
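
One possible improvement is to hash each file in chunks rather than reading it whole into a byte array before hashing, which matters for large photos or videos. The sketch below is written in Java and uses only JDK classes, so it can be called from the Scala script unchanged. The class name StreamingMd5 and the 8 KB buffer size are assumptions made for illustration and are not part of the original script.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StreamingMd5 {

    // Feeds the stream to MD5 chunk by chunk; the caller owns and closes the stream.
    static String md5Hex(InputStream in) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] buffer = new byte[8192]; // assumed chunk size
            int read;
            while ((read = in.read(buffer)) != -1) {
                md5.update(buffer, 0, read);
            }
            // Render the 16 digest bytes as 32 lowercase hex characters
            StringBuilder hex = new StringBuilder();
            for (byte b : md5.digest()) {
                hex.append(String.format("%02x", b & 0xFF));
            }
            return hex.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // Convenience overload that opens the file and always closes it again.
    static String md5Hex(File file) {
        try (InputStream in = new FileInputStream(file)) {
            return md5Hex(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

From the script, StreamingMd5.md5Hex(f) could then stand in for the md5SumString call on the file's bytes, so memory use stays constant regardless of file size.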

Tuesday, December 21, 2010

JSF2.0 component for cross-field validation


Have you ever had problems with cross-field validation in JSF? Me too, so I created this component. It lets you validate several UIInput components together and receive their values as a List in the validator. The component is in the softwaremill-faces library. To use it in a Maven project, add the repository:

<repository>
  <url>http://tools.softwaremill.pl/nexus/content/groups/smlcommon-repos/</url>
  <layout>default</layout>
  <releases>
    <enabled>true</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>
and the dependency:
<dependency>
  <groupId>pl.softwaremill.common</groupId>
  <artifactId>softwaremill-faces</artifactId>
  <version>43-SNAPSHOT</version>
</dependency>
Now you can use the multiValidator component. First add the namespace to your .xhtml page:
xmlns:v="http://pl.softwaremill.common.faces/components"

Then just wrap the components you want to cross-validate in <v:multiValidator>. If you attach a validator to this component, the value parameter passed to the validation method is a List of the values of the UIInput components inside the <v:multiValidator> tag.

For example, suppose you want to validate two checkboxes: each can be checked or unchecked, but at least one has to be checked.
<v:multiValidator id="multi" validator="#{bean.validationMethod}">
  <h:selectBooleanCheckbox value="#{bean.check1}" />
  <h:selectBooleanCheckbox value="#{bean.check2}" />
</v:multiValidator>
<h:message for="multi" />
The validation method in the bean:
public void validationMethod(FacesContext context, UIComponent component, Object value) {
  List<Object> values = (List<Object>) value;
  //value is list of values of both selectBooleanCheckboxes
  Boolean firstChecked = (Boolean) values.get(0);
  Boolean secondChecked = (Boolean) values.get(1);

  if (! (firstChecked || secondChecked)) {
    FacesMessage message = new FacesMessage(FacesMessage.SEVERITY_ERROR, "Check at least one checkbox", null);
    throw new ValidatorException(message);
  }
}
If neither checkbox is checked, the error message is displayed in the <h:message for="multi"> tag.
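
The "at least one checked" rule from validationMethod can also be pulled out into a plain helper that works for any number of checkboxes and can be unit-tested without a JSF container. This is only a sketch: the class and method names are hypothetical, and it assumes that a null entry in the list (no value submitted) should count as unchecked.

```java
import java.util.List;

final class CheckboxRules {

    // Returns true if any value in the list is Boolean.TRUE.
    // Assumption: null or non-Boolean entries are treated as unchecked.
    static boolean atLeastOneChecked(List<?> values) {
        for (Object value : values) {
            if (Boolean.TRUE.equals(value)) {
                return true;
            }
        }
        return false;
    }
}
```

Inside validationMethod, the condition would then read if (!CheckboxRules.atLeastOneChecked(values)), and adding a third checkbox inside the tag would need no change to the validation code.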

The source code of this component is on GitHub.

Any suggestions, opinions or questions regarding this component are welcome. Have a good time using it :)