jdk源码阅读---ArrayList

ArrayList是Java集合类中运用很广泛的一个结构，这篇文章是阅读ArrayList类做的记录来加深对ArrayList的理解。

ArrayList类图

下图是ArrayList类图，其中蓝色线条意味着继承，绿色线条意味着接口实现。

ArrayList的描述

ArrayList位于java.util包下面，下面是官方对ArrayList类的描述：

/**
 * Resizable-array implementation of the <tt>List</tt> interface.  Implements
 * all optional list operations, and permits all elements, including
 * <tt>null</tt>.  In addition to implementing the <tt>List</tt> interface,
 * this class provides methods to manipulate the size of the array that is
 * used internally to store the list.  (This class is roughly equivalent to
 * <tt>Vector</tt>, except that it is unsynchronized.)
 *
 * <p>The <tt>size</tt>, <tt>isEmpty</tt>, <tt>get</tt>, <tt>set</tt>,
 * <tt>iterator</tt>, and <tt>listIterator</tt> operations run in constant
 * time.  The <tt>add</tt> operation runs in <i>amortized constant time</i>,
 * that is, adding n elements requires O(n) time.  All of the other operations
 * run in linear time (roughly speaking).  The constant factor is low compared
 * to that for the <tt>LinkedList</tt> implementation.
 *
 * <p>Each <tt>ArrayList</tt> instance has a <i>capacity</i>.  The capacity is
 * the size of the array used to store the elements in the list.  It is always
 * at least as large as the list size.  As elements are added to an ArrayList,
 * its capacity grows automatically.  The details of the growth policy are not
 * specified beyond the fact that adding an element has constant amortized
 * time cost.
 *
 * <p>An application can increase the capacity of an <tt>ArrayList</tt> instance
 * before adding a large number of elements using the <tt>ensureCapacity</tt>
 * operation.  This may reduce the amount of incremental reallocation.
 *
 * <p><strong>Note that this implementation is not synchronized.</strong>
 * If multiple threads access an <tt>ArrayList</tt> instance concurrently,
 * and at least one of the threads modifies the list structurally, it
 * <i>must</i> be synchronized externally.  (A structural modification is
 * any operation that adds or deletes one or more elements, or explicitly
 * resizes the backing array; merely setting the value of an element is not
 * a structural modification.)  This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the list.
 *
 * If no such object exists, the list should be "wrapped" using the
 * {@link Collections#synchronizedList Collections.synchronizedList}
 * method.  This is best done at creation time, to prevent accidental
 * unsynchronized access to the list:<pre>
 *   List list = Collections.synchronizedList(new ArrayList(...));</pre>
 *
 * <p><a name="fail-fast">
 * The iterators returned by this class's {@link #iterator() iterator} and
 * {@link #listIterator(int) listIterator} methods are <em>fail-fast</em>:</a>
 * if the list is structurally modified at any time after the iterator is
 * created, in any way except through the iterator's own
 * {@link ListIterator#remove() remove} or
 * {@link ListIterator#add(Object) add} methods, the iterator will throw a
 * {@link ConcurrentModificationException}.  Thus, in the face of
 * concurrent modification, the iterator fails quickly and cleanly, rather
 * than risking arbitrary, non-deterministic behavior at an undetermined
 * time in the future.
 *
 * <p>Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification.  Fail-fast iterators
 * throw {@code ConcurrentModificationException} on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness:  <i>the fail-fast behavior of iterators
 * should be used only to detect bugs.</i>
 *

主要意思就是以下几点：

ArrayList是实现了List接口的、大小可变的数组队列。能够实现所有List可选操作，并允许包括null在内的所有元素。除了实现List接口外，此类还提供一些方法来操作内部用来存储列表的数组的大小。（此类大致上等同于 Vector 类，除了此类是不同步的。）
ArrayList的size、isEmpty、get、set、iterator和listIterator操作都以固定时间运行。add操作以分摊的固定时间运行，也就是说，添加n个元素需要O(n)时间。其他所有操作都以线性时间运行。与用于LinkedList实现的常数因子相比，此实现的常数因子较低。
每个ArrayList实例都有一个容量。该容量是指用来存储列表元素的数组的大小。它总是至少等于列表的大小。随着向ArrayList中不断添加元素，其容量也自动增长。并未指定增长策略的细节，因为这不只是添加元素会带来分摊固定时间开销那样简单。在添加大量元素前，应用程序可以使用ensureCapacity操作来增加ArrayList实例的容量。这可以减少递增式再分配的数量。
注意，此实现不是同步的。如果多个线程同时访问一个ArrayList实例，而其中至少一个线程从结构上修改了列表，那么它必须保持外部同步。（结构上的修改是指任何添加或删除一个或多个元素的操作，或者显式调整底层数组的大小；仅仅设置元素的值不是结构上的修改。）这一般通过对自然封装该列表的对象进行同步操作来完成。如果不存在这样的对象，则应该使用Collections.synchronizedList方法将该列表“包装”起来。这最好在创建时完成，以防止意外对列表进行不同步的访问。尽量避免在并发访问中使用ArrayList。可以使用同步的Vector或者CopyOnWriteList集合代替ArrayList

ArrayList定义

public class ArrayList<E> extends AbstractList<E> implements List<E>, RandomAccess, Cloneable, java.io.Serializable
可以看到，ArrayList是AbstractList的子类，同时实现了List接口。除此之外，它还实现了三个标识型接口，这几个接口都没有任何方法，仅作为标识表示实现类具备某项功能。RandomAccess表示实现类支持快速随机访问，Cloneable表示实现类支持克隆，具体表现为重写了clone方法，java.io.Serializable则表示支持序列化，如果需要对此过程自定义，可以重写writeObject与readObject方法。

ArrayList属性

private static final long serialVersionUID = 8683452581122892189L;

因为String实现了Serializable接口，所以支持序列化和反序列化支持。
Java的序列化机制是通过在运行时判断类的serialVersionUID来验证版本一致性的。

private static final int DEFAULT_CAPACITY = 10;

ArrayList的默认容量为10；

private static final Object[] EMPTY_ELEMENTDATA = {};

空实例的共享空数组实例

transient Object[] elementData;

elementData 是”Object[] 类型的数组”，它保存了添加到ArrayList中的元素。实际上，elementData是个动态数组，我们能通过构造函数 ArrayList(intinitialCapacity)来执行它的初始容量为initialCapacity；如果通过不含参数的构造函数ArrayList()来创建ArrayList，则elementData的容量默认是10。elementData数组的大小会根据ArrayList容量的增长而动态的增长，具体的增长方式下面在讨论
至于为什么要用transient修饰，下面关于序列化的时候再讨论。
private int size;
size是动态数组的实际大小。

ArrayList构造方法

//ArrayList带容量的构造函数
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
    //新建一个容量为initialCapacity的数组
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    }
}
//ArrayList默认构造函数，默认为空
public ArrayList() {

    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}
// 构造一个包含指定元素的list
public ArrayList(Collection<? extends E> c) {
    //直接利用Collection.toArray()方法得到一个对象数组，并赋值给elementData 
    elementData = c.toArray();
    //因为size代表的是集合元素数量，所以通过别的集合来构造ArrayList时，要给size赋值
    if ((size = elementData.length) != 0) {
        if (elementData.getClass() != Object[].class)//这里是当c.toArray出错，没有返回Object[]时，利用Arrays.copyOf 来复制集合c中的元素到elementData数组中
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        //如果集合c元素数量为0，则将空数组EMPTY_ELEMENTDATA赋值给elementData
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

第一个构造方法使用提供的initialCapacity来初始化elementData数组的大小。

第二个构造方法默认数组为0。

第三个构造方法则将提供的集合转成数组返回给elementData（返回若不是Object[]将调用Arrays.copyOf方法将其转为Object[]）。

ArrayList常用方法

add增加

public boolean add(E e) {
    //扩容判断
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

public void add(int index, E element) {
    //判断index是否越界，错误产生IndexOutOfBoundsException
    rangeCheckForAdd(index);
    //进行扩容检查
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    //对数组进行复制，将空出的Index位置出入element，并将index后的所有数据后移一个位置。
    System.arraycopy(elementData, index, elementData, index + 1,size - index);
    //将index上的数据设置为element
    elementData[index] = element;
    //容量+1
    size++;
}

扩容

以上两种添加数据的方式都调用到了ensureCapacityInternal这个方法，我们看看它是如何完成工作的：

private void ensureCapacityInternal(int minCapacity) {
    //插入第一个数据就直接将其扩充至10
    if (elementData == EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    //这里把工作又交了出去
    ensureExplicitCapacity(minCapacity);
}
//如果elementData的长度不能满足需求，就需要扩充了
private void ensureExplicitCapacity(int minCapacity) {
    modCount++;

    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

//扩充
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
     //可以看到这里是1.5倍扩充的
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    //扩充完之后，还是没满足，这时候就直接扩充到minCapacity
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    //防止溢出
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}

ArrayList的扩容机制：

首先创建一个空数组elementData，第一次插入数据时直接扩充至10，然后如果elementData的长度不足，就扩充1.5倍，如果扩充完还不够，就使用需要的长度作为elementData的长度。
扩容的的具体做法：
得到当前的ArrayList的容量(oldCapacity)。
计算除扩容后的新容量(newCapacity)，其值(oldCapacity + (oldCapacity >>1))约是oldCapacity 的1.5倍。
这里采用的是移位运算。为什么采用这种方法呢？应该是出于效率的考虑。
当newCapacity小于所需最小容量，那么将所需最小容量赋值给newCapacity。
newCapacity大于ArrayList的所允许的最大容量,处理。进行数据的复制，完成向ArrayList实例添加元素操作。

remove删除

public E remove(int index) {
    rangeCheck(index);//数组越界检查

    modCount++;
    E oldValue = elementData(index);

    int numMoved = size - index - 1;//计算数组需要复制的数量
    if (numMoved > 0)//将index后的数据都向前移一位
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work

    return oldValue;
}
// 删除指定内容的元素（只删除第一个匹配成功的）
public boolean remove(Object o) {
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}
//找到对应的元素后，删除。删除元素后的元素都向前移动一位
private void fastRemove(int index) {
    modCount++;
    int numMoved = size - index - 1;
    if (numMoved > 0)     // 将index后面的元素整体向前移动一位
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // help GC
}
//清空ArrayList，将全部的元素设为null
public void clear() {
    modCount++;

    for (int i = 0; i < size; i++)    // help  GC
        elementData[i] = null;

    size = 0;
}

//删除ArrayList中从fromIndex到toIndex（区间--左闭右开）之间所有的元素
protected void removeRange(int fromIndex, int toIndex) {
    modCount++;
    int numMoved = size - toIndex;  //需向前移动的元素的个数
    System.arraycopy(elementData, toIndex, elementData, fromIndex,
                     numMoved);

    // help GC
    int newSize = size - (toIndex-fromIndex);
    for (int i = newSize; i < size; i++) {
        elementData[i] = null;
    }
    size = newSize;
}

删除操作一定会修改modCount，且可能涉及到数组的复制，相对低效。

批量删除这里没分析，感兴趣的可以看源码。

set修改

不会修改modCount，相对增删是高效的操作。

public E set(int index, E element) {
    rangeCheck(index);//越界检查
    E oldValue = elementData(index); //取出元素 
    elementData[index] = element;//覆盖元素
    return oldValue;//返回元素
}

get查询

不会修改modCount，相对增删是高效的操作。

public E get(int index) {
    rangeCheck(index);//越界检查
    return elementData(index); //下标取数据
}
E elementData(int index) {
    return (E) elementData[index];
}

包含 contain

//普通的for循环寻找值，只不过会根据目标对象是否为null分别循环查找。
public boolean contains(Object o) {
    return indexOf(o) >= 0;
}
//普通的for循环寻找值，只不过会根据目标对象是否为null分别循环查找。
public int indexOf(Object o) {
    if (o == null) {
        for (int i = 0; i < size; i++)
            if (elementData[i]==null)
                return i;
    } else {
        for (int i = 0; i < size; i++)
            if (o.equals(elementData[i]))
                return i;
    }
    return -1;
}

迭代器 Iterator

//返回普通迭代器
public Iterator<E> iterator() {
    return new Itr();
}

//通用的迭代器实现
private class Itr implements Iterator<E> {
    int cursor;       //游标，下一个元素的索引，默认初始化为0
    int lastRet = -1; //上次访问的元素的位置,如果没有，为-1；
    int expectedModCount = modCount;//迭代过程不允许修改数组，否则就抛出异常
    //是否还有下一个
    public boolean hasNext() {
        return cursor != size;
    }
    //下一个元素
    @SuppressWarnings("unchecked")
    public E next() {
        checkForComodification();//检查数组是否被修改
        int i = cursor;
        if (i >= size)
            throw new NoSuchElementException();
        Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length)
            throw new ConcurrentModificationException();
        cursor = i + 1; //向后移动游标
        return (E) elementData[lastRet = i];    //设置访问的位置并返回这个值
    }
    //删除元素
    public void remove() {
        if (lastRet < 0)
            throw new IllegalStateException();
        checkForComodification();//检查数组是否被修改
        try {
            ArrayList.this.remove(lastRet);
            cursor = lastRet;
            lastRet = -1;
            expectedModCount = modCount;
        } catch (IndexOutOfBoundsException ex) {
            throw new ConcurrentModificationException();
        }
    }
    @Override
    @SuppressWarnings("unchecked")
    public void forEachRemaining(Consumer<? super E> consumer) {
        Objects.requireNonNull(consumer);
        final int size = ArrayList.this.size;
        int i = cursor;
        if (i >= size) {
            return;
        }
        final Object[] elementData = ArrayList.this.elementData;
        if (i >= elementData.length) {
            throw new ConcurrentModificationException();
        }
        while (i != size && modCount == expectedModCount) {
            consumer.accept((E) elementData[i++]);
        }
        cursor = i;
        lastRet = i - 1;
        checkForComodification();
    }
    //检查数组是否被修改
    final void checkForComodification() {
        if (modCount != expectedModCount)
            throw new ConcurrentModificationException();
    }
}

iterator方法中新建一个Itr对象，Itr是一个成员内部类，实现了Iterator接口，它有三个实例成员变量，cursor表示下一个要返回的元素位置，lastRet表示最后一个返回的索引位置，expectedModCount表示期望的修改次数，初始化为外部类当前的修改次数modCount。每次发生结构性变化的时候modCount都会增加，而每次迭代操作的时候都会检查expectedModCount是否与modCount相同，这样就能检测出结构性变化。

注意：调用迭代器的remove方法前必须先调用next，比如，通过迭代器删除所有元素，直觉上，可以这样写

public static void removeAll(ArrayList<Integer> list){
    Iterator<Integer> it = list.iterator();
    while(it.hasNext()){
        it.remove();
    }
}

实际上运行会抛出异常java.lang.IllegalStateException，正确的写法是：

public static void removeAll(ArrayList<Integer> list){
    Iterator<Integer> it = list.iterator();
    while(it.hasNext()){
        it.next();
        it.remove();
    }
}

listIterator()的实现使用了另一个内部类ListItr，它继承自Itr，基本思路一样的。

总结

ArrayList是日常开发中最常用的类之一，我们分析了ArrayList的构造方法、扩容机制、常用的方法的源码解读。需要说明的是ArrayList不是线程安全的，线程安全的有Vector类，它是线程安全的，当要求线程安全时可以调用它，它与ArrayList的基本原理基本相同。

ArrayList的插入和删除的性能比较低，而LinkedList特点与ArrayList正好相反，我们会继续阅读LinkedList的源码来深入理解这个容器类。