An improved augmented reality system based on AndAR
Peng Chen*, Zhang Peng, Dalong Li, Lijuan Yang
College of Computer and Information Technology, China Three Gorges University, Yichang, China
* Corresponding author. E-mail address: [email protected] (P. Chen).
This paper has been recommended for acceptance by Luming Zhang.
Article info
Article history: Received 1 January 2015; Accepted 29 June 2015; Available online xxxx
Keywords: AndAR; Mobile augmented reality; Android platform; Registration; Inliers tracking; Feature extraction and matching; Low-level visual features; Architecture
Abstract
AndAR is a project for developing Mobile Augmented Reality (MAR) applications on the Android platform. The existing registration technology of AndAR is still based on markers, which assumes that the target markers appear in the video frames. For practical applications, registration based on natural features is more attractive, although a limitation of such methods is that many of them rely on low-level visual features. This paper improves AndAR by introducing planar natural features. The key to registration based on planar natural features is obtaining the homography matrix, which can be calculated from at least 4 pairs of matching feature points, so a 3D registration method based on ORB and optical flow is proposed in this paper. ORB is used for feature point matching and RANSAC is used to choose good matches, called inliers, from all the matches. When the ratio of inliers exceeds 50% in some video frame, inlier tracking based on optical flow is used to calculate the homography matrix in the subsequent frames, and when the number of successfully tracked inliers falls below 4, the method switches back to ORB feature point matching. The results show that the improved AndAR can augment reality based not only on markers but also on planar natural features in near real time, and that the hybrid approach both improves speed and extends the usable tracking range.
© 2015 Elsevier Inc. All rights reserved.
1. Introduction
Augmented Reality (AR) is an important branch of Virtual Reality (VR). It integrates virtual digital information into the 3D real environment in real time. With the substantial increase in the performance and penetration of smartphones, researchers have concentrated on Mobile Augmented Reality (MAR). The 3D registration technology is the main difficulty for AR or MAR, and its performance directly affects the performance of an AR system. For MAR, 3D registration means tracking the position and pose of the smartphone in the real scene in real time, so that the virtual scene can be inserted seamlessly into the real world using this position and pose information. The 3D registration of MAR is mainly based on computer vision. Similar to AR, the 3D registration technology of MAR has undergone the change from marker-based [1] to natural-feature-based [2–4]. This change is driven by the improvement of hardware and the demand for using AR outdoors. Currently, 3D registration based on natural features is a hot topic and the direction of future development.
The natural-features-based 3D registration methods of MAR are derived from AR, but some optimizations must be made to suit the weak computing power of smartphones. In addition, since smartphones have GPS and various sensors that can provide information about the position and pose of the camera, one can not only achieve 3D registration based on GPS and sensors to build MAR applications for navigation [5], but also combine computer vision with sensors to improve the speed and accuracy of 3D registration [6]. The latest trend is to combine MAR with cloud computing [7,8]; as this matures, MAR applications will become freer and more practical. Of course, the quickest way to develop a MAR application is to use an AR development kit. Currently, the common AR development kits on the Android platform include metaio AR, Vuforia, Dfusion and AndAR. This paper focuses on AndAR, which is transplanted from ARToolKit [9]. Although the 3D registration method of AndAR is still based on markers, the biggest advantage of AndAR is that it is open source, unlike the others. AndAR is therefore suitable for theoretical research on registration algorithms, for improvement, and for the development of deeply customized MAR applications. The remainder of the paper is organized as follows. Section 2 analyzes the architecture and workflow of AndAR. The standard development process is shown in Section 3 and the improvement to AndAR is shown in Section 4. Section 5 presents the results of the improved AndAR. Section 6 describes future work.
2. Architecture and workflow
AndAR is a project that enables AR on the Android platform; it is an AR framework for Android. It not only offers a pure Java API but is also object-oriented, so developers can conveniently build MAR applications with AndAR.
2.1. Architecture
AndAR consists of three modules, as shown in Fig. 1.
Camera Java API: the image acquisition module. Class Preview extends SurfaceView and implements SurfaceHolder.Callback; class CameraPreviewHandler implements the interface PreviewCallback. The input parameter data in public void onPreviewFrame(byte[] data, Camera camera) {} is the real-time frame data needed by the target detection and tracking module below. This module mainly handles the acquisition of real-time image frames.
ARToolKit Java API: the target detection and tracking module. The Java Native Interface (JNI) is used to expose the relevant kernel functions for target detection and tracking from the C-language version of ARToolKit as Java functions. These functions are encapsulated in the Java class ARToolkit. This module mainly detects and recognizes markers and calculates the transformation matrix needed for 3D registration.
OpenGL Java API: the rendering and display module. This module is based on the OpenGL ES 1.0 graphics library and mainly renders the virtual scene into the real-time frame according to the transformation matrix.
AndAR is open source software, and developers can revise it to satisfy their needs. We will exploit this advantage to improve AndAR later.
Fig. 1. Architecture of AndAR.
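To make the image acquisition module concrete, the following is a minimal sketch of a preview callback in the spirit of CameraPreviewHandler. Only the onPreviewFrame signature comes from the description above; the FrameSink consumer and the class layout are illustrative assumptions, not AndAR's actual code.

```java
import android.hardware.Camera;

// Minimal sketch of AndAR-style frame acquisition (illustrative, not AndAR's actual code).
public class PreviewHandlerSketch implements Camera.PreviewCallback {

    // Hypothetical consumer standing in for the detection and tracking module.
    public interface FrameSink {
        void onFrame(byte[] yuvFrame, int width, int height);
    }

    private final FrameSink sink;

    public PreviewHandlerSketch(FrameSink sink) {
        this.sink = sink;
    }

    @Override
    public void onPreviewFrame(byte[] data, Camera camera) {
        // 'data' holds the raw preview frame (NV21 by default on Android);
        // it is handed to the target detection and tracking module.
        Camera.Size size = camera.getParameters().getPreviewSize();
        sink.onFrame(data, size.width, size.height);
    }
}
```

Such a callback is registered on the Camera instance with setPreviewCallback, which is how the real-time frames reach the detection and tracking module.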
2.2. The workflow
One of the difficulties in developing AR or MAR applications is tracking the user's viewpoint. AndAR uses computer vision algorithms to calculate the camera's position and pose relative to physical markers, namely the camera extrinsic matrix, in real time. According to the camera imaging model, the imaging process can be described as the transformation among the world coordinate system, the camera coordinate system and the image coordinate system, as shown in Eq. (1):

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = K \left[\, R \;\; t \,\right] \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = K\, T_{cw} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = P \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{1}$$

where K, known as the camera intrinsic matrix, depends only on the internal structure of the camera, can be determined by camera calibration, and is considered constant; R and t of the matrix T_cw encode the position and pose of the camera and are known as the camera extrinsic matrix.
AndAR calculates the camera extrinsic matrix based on markers to complete the overlay of virtual imagery on the real world (3D registration). The advantage of markers is that their shape and gray level contrast sharply with the surrounding environment, so they are easy to detect. The workflow of AndAR is designed around this property of the markers. Firstly, a color frame image is converted to a binary image according to a fixed gray threshold. Then connected components in the binary image are labeled and filtered as candidate areas according to heuristic rules. Next, each candidate area is matched against the standard templates in memory, which identifies the marker present in the image. The final step is to calculate the camera extrinsic matrix from the deformation of the marker's border and to add the virtual scene to the real-time image frame according to the camera extrinsic matrix.
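To illustrate the workflow just described, here is a rough OpenCV-based analogue of the pipeline in Java. It is not AndAR's native C implementation; the threshold value, the heuristic filter, and the helper names are placeholders, and the template matching and pose computation steps are only indicated by comments.

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.List;

// Illustrative analogue of the marker workflow; not AndAR's actual native code.
public class MarkerPipelineSketch {

    public void processFrame(Mat grayFrame) {
        // 1. Binarize with a fixed gray threshold.
        Mat binary = new Mat();
        Imgproc.threshold(grayFrame, binary, 100, 255, Imgproc.THRESH_BINARY);

        // 2. Extract connected regions and filter candidate areas with heuristic rules.
        List<MatOfPoint> contours = new ArrayList<>();
        Imgproc.findContours(binary, contours, new Mat(),
                Imgproc.RETR_LIST, Imgproc.CHAIN_APPROX_SIMPLE);

        for (MatOfPoint candidate : contours) {
            if (!looksLikeMarker(candidate)) {
                continue;
            }
            // 3. Match the candidate area against the standard templates in memory
            //    to identify the marker.
            // 4. Compute the camera extrinsic matrix from the deformation of the
            //    marker's border and overlay the virtual scene with it.
        }
    }

    // Placeholder heuristic: a real implementation checks size, shape and border.
    private boolean looksLikeMarker(MatOfPoint contour) {
        return Imgproc.contourArea(contour) > 1000;
    }
}
```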
3. Standard development process
Whether the 3D registration is based on markers or natural features, and whether the application runs on a PC or a mobile terminal, the standard development process of a complete AR system includes the following steps. (1) Initialize the camera and read in the relevant configuration files and the standard templates. (2) Grab a frame from the real-time video stream. (3) Detect and recognize the target in the video frame image, and then calculate the camera extrinsic matrix. (4) Render the virtual scene and align it with the markers or natural features using the camera extrinsic matrix. (5) Close the video stream and disconnect the camera. Steps 2–4 are repeated in a loop.
As mentioned previously, AndAR is an AR framework for Android, and this framework implements most of the development process above. In order to create a simple AR application based on AndAR, as shown in Fig. 2, three steps are needed following object-oriented programming; a minimal code sketch of these steps is given at the end of this section. (1) Create a class extending the abstract class AndARActivity, for example CustomActivity. AndARActivity extends Activity, which is one of the basic modules of an Android application and generally represents one screen of the phone. (2) Create a class extending the abstract class ARObject, for example CustomObject, and override the draw method. ARObject stands for the standard template; its constructor takes information about the standard template such as name and size, and its member method draw defines the virtual scene. Everything drawn in the method draw will be drawn directly onto the marker. (3) Create an instance of CustomObject in CustomActivity, and call the method registerARObject(ARObject) to add the standard template and define the virtual scene.
Fig. 2. The development process.
In addition, the standard template files and the camera intrinsic matrix file need to be put into the folder assets. When the application is running, the standard template files are copied to the private folder of the application on the smartphone so that the files can be read by the native C/C++ code. In this sample, the virtual model is just a blue cube. The AndAR project also provides a way to use obj models. This format can be exported by most 3D modeling software, for example 3ds Max. The data of a 3D model in an obj file must first be loaded into the application's memory, and this can take a while, so it should be completed in a background thread. What is more, the camera cannot be opened before the model has been loaded completely. In the class AndARActivity, the boolean variable startPreviewRightAway controls when the camera is opened; we can set it to false to prevent the camera from opening immediately. The loading thread should be started in surfaceCreated, the callback of GLSurfaceView, considering that the data from the camera is displayed on that surface. So the sequence is: the surface is created, then the model is loaded, and finally the camera is opened.
Some non-AR information can also be displayed at a fixed position on the screen, for example the FPS or the camera extrinsic matrix at the upper left corner. To do this, a class is defined, for example CustomRenderer, extending OpenGLRenderer and overriding the function draw(). You can then display anything you want based on OpenGL.
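A minimal sketch of the three steps above follows. The class names CustomActivity and CustomObject and the method registerARObject come from the text; the package names, the ARObject constructor arguments (pattern name, pattern file, marker size, marker center) and the getArtoolkit() accessor are assumptions about the AndAR API and may differ in the actual library.

```java
import android.os.Bundle;
import javax.microedition.khronos.opengles.GL10;
import edu.dhbw.andar.ARObject;          // assumed AndAR package path
import edu.dhbw.andar.AndARActivity;     // assumed AndAR package path

// Step 1: an Activity derived from AndARActivity.
public class CustomActivity extends AndARActivity {

    @Override
    public void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        try {
            // Step 3: register the standard template together with its virtual scene.
            CustomObject cube = new CustomObject("test", "patt.hiro", 80.0, new double[] {0, 0});
            getArtoolkit().registerARObject(cube);   // assumed accessor for the ARToolkit wrapper
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Step 2: the standard template plus the virtual scene drawn onto it.
    static class CustomObject extends ARObject {
        CustomObject(String name, String patternFile, double markerWidth, double[] markerCenter) {
            super(name, patternFile, markerWidth, markerCenter);   // assumed constructor signature
        }

        @Override
        public void draw(GL10 gl) {
            super.draw(gl);
            // Everything drawn here with OpenGL ES 1.0 calls appears on the marker,
            // e.g. the blue cube used in the sample (drawing code omitted for brevity).
        }
    }
}
```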
4. Registration based on natural features
As mentioned previously, AndAR is based on markers, while 3D registration based on natural features is more practical. In this section, AndAR is improved by introducing 3D registration based on natural features, in line with the architecture of AndAR and with natural-feature 3D registration technology.
4.1. How to get the camera extrinsic matrix
As shown in Fig. 3, consider a 3D coordinate system x1x2x3 and two arbitrary planes. The first plane can be defined by a point b0 and two linearly independent vectors b1, b2 contained in the plane. A point X2 in the plane can be written as formula (2).
$$X_2 = q_1 b_1 + q_2 b_2 + b_0 = \begin{pmatrix} b_1 & b_2 & b_0 \end{pmatrix} \begin{pmatrix} q_1 \\ q_2 \\ 1 \end{pmatrix} = Bq \tag{2}$$

where B = (b1 b2 b0) ∈ R^{3×3} defines the plane "B" and q = (q1, q2, 1)^T gives the coordinates of the point X2 in the basis (b1, b2). We can write a similar identity for the second plane:
$$X_1 = p_1 a_1 + p_2 a_2 + a_0 = \begin{pmatrix} a_1 & a_2 & a_0 \end{pmatrix} \begin{pmatrix} p_1 \\ p_2 \\ 1 \end{pmatrix} = Ap \tag{3}$$

where A = (a1 a2 a0) ∈ R^{3×3} defines the plane "A" and p = (p1, p2, 1)^T gives the coordinates of the point X1 in the basis (a1, a2). If the point X1 is constrained to be the perspective projection, centered at the origin, of the point X2, then
$$X_1 = \alpha(q)\, X_2 \tag{4}$$
where α(q) is a scale factor that depends on X2, and consequently on q. By combining formulas (2)–(4) with the constraint that each of the two points must lie in its corresponding plane, the relationship between the 2D coordinates of these points is obtained:
$$p = \alpha(q)\, A^{-1} B\, q \tag{5}$$
Since the matrix A is invertible and the two vectors p and q have a unit third coordinate, we can get rid of this nonlinearity by moving to homogeneous coordinates:
$$p_h = H q_h \tag{6}$$

Fig. 3. The 2D homography.
where ph and qh are homogeneous 3-vectors. H ∈ R^{3×3} is called the homography matrix and has 8 degrees of freedom. The mapping defined by (6) is called the 2D homography. The natural features in this paper are planar textured targets such as book covers, shop signs, or advertisements. If we regard a planar textured target as the plane "B" in Fig. 3, and the imaging plane of the camera as the plane "A", then the mapping from the planar textured target to the imaging plane is the 2D homography.
What is more, if the target plane coincides with the Zw = 0 plane of the world coordinate system, then Zw in formula (1) is zero, so formula (1) can be written as formula (7).
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = k\, K \left[\, r_1 \;\; r_2 \;\; t \,\right] \begin{bmatrix} X_w \\ Y_w \\ 1 \end{bmatrix} = H \begin{bmatrix} X_w \\ Y_w \\ 1 \end{bmatrix} \tag{7}$$
where k = 1/Zc and r1, r2 are the first two columns of R. Therefore, 3D registration based on planar natural features is usually built on the homography matrix, which can be obtained from the homography relationship between the two planes. The homography matrix is the product of the camera intrinsic matrix and the (reduced) camera extrinsic matrix. While the camera intrinsic matrix is obtained by camera calibration, the camera extrinsic matrix can be obtained by matrix decomposition. To compute the homography matrix, at least four pairs of correctly matched feature points must be known. Feature point matching can be divided into feature detection and descriptor matching.
4.2. ORB feature detection and descriptor matching
Currently, common feature point matching algorithms include SIFT [10], SURF [11], BRIEF [12] and ORB [13]. The key factor affecting the speed and accuracy of feature point matching is the descriptor. There are two kinds of descriptors: descriptors based on absolute values and descriptors based on comparisons. SIFT and SURF, which represent descriptors based on absolute values, generally quantize gray levels or gradients into a histogram and then construct the descriptor from the histogram. This kind of descriptor is highly discriminative, but its complex computation and low efficiency are unfavorable for MAR. Compared with SIFT and SURF, the comparison-based BRIEF and ORB have a great advantage: they construct the descriptor by comparing characteristic values that are pre-trained or belong to random point pairs, and are designed for speed. As reported in [12] and [13], SIFT, SURF, BRIEF-32 and ORB take 5228.7 ms, 217.3 ms, 8.87 ms and 15.3 ms, respectively, to process one frame. Compared with ORB, BRIEF is faster, but it is not rotation invariant. When people use a smartphone, changes of rotation and position are relatively frequent, so BRIEF is not suitable for feature point matching in MAR. ORB builds on BRIEF and achieves rotation invariance by adding an orientation to each descriptor. This paper therefore chooses the ORB descriptor.
Descriptor matching needs a matching cost function or distance function to evaluate the degree of similarity between matching features. The major cost functions are the Sum of Absolute Differences (SAD), the Sum of Squared Differences (SSD) and Normalized Cross Correlation (NCC). The major distance functions include the Euclidean, Manhattan, Hamming, Correlation and Hausdorff distances. A good matching evaluation function can not only improve the matching accuracy, but also reduce the sensitivity to image distortion and noise. Which evaluation function to choose is largely determined by the feature space. SIFT and SURF descriptors are represented by feature vectors, so the Euclidean distance can be used to measure the difference between two descriptors. BRIEF and ORB descriptors are represented by binary bit strings, so the Hamming distance can be used to measure the similarity between two ORB descriptors K1 and K2:
$$D(K_1, K_2) = \sum_{i=0}^{N-1} x_i \oplus y_i \tag{8}$$
where N stands for the number of bits of the ORB descriptor, xi and yi are the values of the i-th bit of K1 and K2 respectively, and ⊕ denotes the XOR operation. The smaller D(K1, K2) is, the higher the similarity; conversely, a large value means low similarity. The XOR operation is efficient, so using a binary descriptor greatly improves the matching efficiency.
Bad matches still exist after feature matching. In order to calculate the homography matrix, a suitable number of good matches, called inliers, need to be chosen from all the matches. To do this, the EPnP [14] technique is used inside a RANSAC scheme. The maximum number of RANSAC iterations is determined empirically. The outlier ratio can climb to 50%, so the maximum number of iterations is calculated for a 50% outlier ratio. In this paper, the desired probability of selecting four good inliers is set to 99%, so the number of iterations is set to 72 using Table A1 in Appendix A.
In order to verify the effect of ORB with RANSAC in calculating the homography matrix, three experiments based on OpenCV were performed. Only the matching points with smaller distances are used to calculate the homography matrix with RANSAC, and the object's position in the scene image is drawn in blue based on reprojection with the homography matrix. The results are shown in Fig. 4 and Table 1.
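For reference, the iteration counts in Table A1 (including the 72 used above for a 50% outlier ratio, sample size 4, and 99% confidence) are consistent with the standard RANSAC trial-count formula; the formula itself is not stated in the paper, but the table values follow it:

$$N = \left\lceil \frac{\log(1-p)}{\log\!\bigl(1-(1-\varepsilon)^{s}\bigr)} \right\rceil, \qquad p = 0.99,\; \varepsilon = 0.5,\; s = 4 \;\Rightarrow\; N = \left\lceil \frac{\log 0.01}{\log(1-0.5^{4})} \right\rceil = 72,$$

where p is the desired probability of drawing at least one all-inlier sample, ε is the outlier ratio, and s is the sample size.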
Fig. 4. The results of feature point matching based on the ORB descriptor compared with SIFT and SURF: (a) SIFT descriptor, matching with L2 distance; (b) SURF descriptor, matching with L2 distance; (c) ORB descriptor, matching with Hamming distance.
Table 1. The number of matches in each stage based on ORB compared with SIFT and SURF.

                          SIFT    SURF    ORB
No. of initial matches     604     997    454
No. of good matches         36      61     25
Matches with RANSAC         32      39     23
As can be seen from Fig. 4, the book's position is drawn as accurately with ORB feature matching as with SURF and SIFT, which means the ratio of inliers is high enough to calculate the correct homography matrix. This is confirmed by the results in Table 1.
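The matching-plus-RANSAC pipeline used in this experiment can be sketched as follows with OpenCV's Java bindings. This is a simplified illustration: the paper itself calls the C++ interface through JNI (and used OpenCV 2.4.6, whose Java API differs slightly), and the distance threshold used to keep the "good" matches is a common heuristic rather than the exact value used by the authors.

```java
import java.util.ArrayList;
import java.util.List;

import org.opencv.calib3d.Calib3d;
import org.opencv.core.DMatch;
import org.opencv.core.KeyPoint;
import org.opencv.core.Mat;
import org.opencv.core.MatOfDMatch;
import org.opencv.core.MatOfKeyPoint;
import org.opencv.core.MatOfPoint2f;
import org.opencv.core.Point;
import org.opencv.features2d.DescriptorMatcher;
import org.opencv.features2d.ORB;

// Sketch of homography estimation from a template image to a scene frame with ORB + RANSAC.
public class OrbHomographySketch {

    public static Mat estimateHomography(Mat templateGray, Mat frameGray) {
        // 1. Detect ORB keypoints and compute binary descriptors for both images.
        ORB orb = ORB.create();
        MatOfKeyPoint kpTemplate = new MatOfKeyPoint();
        MatOfKeyPoint kpFrame = new MatOfKeyPoint();
        Mat descTemplate = new Mat();
        Mat descFrame = new Mat();
        orb.detectAndCompute(templateGray, new Mat(), kpTemplate, descTemplate);
        orb.detectAndCompute(frameGray, new Mat(), kpFrame, descFrame);

        // 2. Brute-force matching with the Hamming distance of Eq. (8).
        DescriptorMatcher matcher = DescriptorMatcher.create(DescriptorMatcher.BRUTEFORCE_HAMMING);
        MatOfDMatch matches = new MatOfDMatch();
        matcher.match(descTemplate, descFrame, matches);

        // 3. Keep only matches with a small Hamming distance ("good matches").
        List<DMatch> all = matches.toList();
        double minDist = Double.MAX_VALUE;
        for (DMatch m : all) {
            minDist = Math.min(minDist, m.distance);
        }
        List<Point> templatePts = new ArrayList<>();
        List<Point> framePts = new ArrayList<>();
        List<KeyPoint> kt = kpTemplate.toList();
        List<KeyPoint> kf = kpFrame.toList();
        double threshold = Math.max(2.0 * minDist, 30.0);   // heuristic cut-off (assumption)
        for (DMatch m : all) {
            if (m.distance <= threshold) {
                templatePts.add(kt.get(m.queryIdx).pt);
                framePts.add(kf.get(m.trainIdx).pt);
            }
        }
        if (templatePts.size() < 4) {
            return new Mat();   // at least four pairs are required for a homography
        }

        // 4. RANSAC selects the inliers and computes the homography matrix.
        MatOfPoint2f src = new MatOfPoint2f(templatePts.toArray(new Point[0]));
        MatOfPoint2f dst = new MatOfPoint2f(framePts.toArray(new Point[0]));
        return Calib3d.findHomography(src, dst, Calib3d.RANSAC, 3.0);
    }
}
```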
4.3. Inliers tracking based on optical flow
There is a main shortcoming in position and pose estimation using feature point matching: different feature points are detected in each frame, which means the set of inliers is always different. As a result, the displayed augmentation jitters all the time, which reduces the realism that AR hopes to bring. To combat this, we use an inlier tracking method based on optical flow [14]. Optical flow is the pattern of apparent motion of objects, surfaces, and edges in a visual scene caused by the relative motion between an observer (an eye or a camera) and the scene. Tracking based on optical flow mainly studies the relationship between the change of image gray levels and the object's motion. The Lucas–Kanade (LK) optical flow method was proposed by Lucas and Kanade in 1981. It has three assumptions: (1) brightness constancy; (2) temporal persistence, or small movements; (3) spatial coherence. Lucas–Kanade optical flow assumes small and coherent motion, but in practice large and non-coherent motion is common. To combat this problem, an image pyramid is often combined with Lucas–Kanade optical flow, known as pyramid Lucas–Kanade optical flow. In order to verify the performance of pyramid Lucas–Kanade optical flow, an experiment was done based on OpenCV. When the function detectFeaturePoint is used to detect corners, we get the results shown in Fig. 5.
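The tracking step can be sketched with OpenCV's Java binding for pyramidal Lucas–Kanade as follows; again, the actual implementation in this paper goes through the C++ interface, so the class layout here is illustrative.

```java
import org.opencv.core.Mat;
import org.opencv.core.MatOfByte;
import org.opencv.core.MatOfFloat;
import org.opencv.core.MatOfPoint2f;
import org.opencv.video.Video;

// Sketch of inlier tracking between consecutive frames with pyramidal Lucas–Kanade optical flow.
public class InlierTrackerSketch {

    // Returns the tracked positions; the caller should drop points whose status entry is 0
    // and fall back to ORB matching once fewer than 4 inliers survive.
    public static MatOfPoint2f track(Mat prevGray, Mat currGray, MatOfPoint2f prevPts, MatOfByte status) {
        MatOfPoint2f currPts = new MatOfPoint2f();
        MatOfFloat err = new MatOfFloat();
        Video.calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);
        return currPts;
    }
}
```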
Fig. 5. Optical flow from pyramid Lucas–Kanade: the center image is a video frame following the top image; the bottom image shows the computed motion.
4.4. The complete workflow of registration
The complete workflow of registration based on the ORB descriptor and Lucas–Kanade optical flow is shown in Fig. 6. Firstly, ORB descriptors are extracted from the template image and kept in a matrix. When a new video frame arrives, ORB descriptors are extracted from it. For each feature point in the template image, the matching feature point in the video frame is found by computing the Hamming distance between their descriptors. Then the homography matrix is calculated from the set of matching feature points with RANSAC. Since the camera intrinsic matrix is known, the camera extrinsic matrix can be obtained through matrix decomposition. If the ratio of inliers exceeds 50% in some video frame, tracking based on Lucas–Kanade optical flow is adopted in the subsequent frames. As time goes on, the number of successfully tracked feature points gradually decreases; when it falls below 4, the system goes back to feature point matching based on the ORB descriptor.
Fig. 6. The workflow to calculate the camera extrinsic matrix (video frame → track? → if not tracking: extract ORB descriptors and match with Hamming distance against the template's ORB descriptors, then calculate the homography matrix with RANSAC; if tracking: inlier tracking based on optical flow; finally the camera extrinsic matrix is recovered using the camera intrinsic matrix).
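Section 4.4 states that, once H is known and the camera intrinsic matrix K has been calibrated, the camera extrinsic matrix is recovered by matrix decomposition. One standard way of doing this, consistent with formula (7) (the paper does not spell out which variant it uses), is:

$$[\,h_1 \;\; h_2 \;\; h_3\,] = K^{-1} H, \qquad \lambda = \frac{1}{\lVert h_1 \rVert}, \qquad r_1 = \lambda h_1, \quad r_2 = \lambda h_2, \quad r_3 = r_1 \times r_2, \quad t = \lambda h_3,$$

after which R = [r1 r2 r3] is re-orthonormalized (for example through an SVD) to obtain a valid rotation matrix.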
4.5. Implementation in AndAR
Feature point matching is the key to 3D registration based on planar natural features. In order to improve development efficiency, we implement it with OpenCV. OpenCV is an open source computer vision library that implements many common algorithms of image processing and computer vision. OpenCV provides two kinds of interfaces, C++ and Java; this paper chooses the C++ interface in order to stay consistent with the architecture of AndAR, and because the Java interface of the OpenCV version used here supports feature point matching imperfectly. Some modifications are needed to use the OpenCV library in AndAR. AndAR takes advantage of JNI to let Java code call the native ARToolKit library written in C; in AndAR, the file arToolKit.c handles the data type exchange between Java and C. Since OpenCV provides a C++ interface, we modify the arToolKit.c file as follows:
(1) Change arToolKit.c into arToolKit.cpp. This modification does not affect its calls to the C functions in the ARToolKit library, and we can now call the C++ functions in the OpenCV library directly. (2) Change the way the JNIEnv pointer calls functions. For JNIEnv *env, in C one writes (*env)->NewStringUTF(env, "Hello from JNI!"), but in C++ one writes env->NewStringUTF("Hello from JNI!").
Now the most important problem, how to calculate the camera extrinsic matrix, is solved, but not everything. Some details remain, including how to read in the standard template images and how to decide, at run time, whether detection and tracking should be based on markers or on natural features. To do this, we mainly need to do the three things below; a Java-style sketch of the resulting dispatch logic is given at the end of this subsection. (1) Function addObject. For marker-based templates, the ID of the standard template is stored in the text-based standard template file; the ID is a non-negative integer and the application obtains it by reading the file. If the standard template is a scene image, it does not include any ID. To solve this problem, we use the filename suffix to determine whether the standard template is an image or not; if it is an image, it is assigned a negative integer as its ID, according to the order of adding. (2) Add a member variable of type character array to the structure Object. It is used to record the storage path of the standard template. It is initialized in the function addObject so that it can be accessed in the function detectmarkers. (3) Add a judgment in the while loop of the function detectmarkers: if the ID is non-negative, execute detection and tracking based on markers; if the ID is negative, execute detection and tracking based on natural features.
The resolution of the image has a great impact on the accuracy and computational complexity of feature point matching. This paper sets the resolution of the preview frame to 640 × 480, which balances accuracy and computational complexity.
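The dispatch logic described in items (1)–(3) can be summarized by the following Java-style sketch. The actual changes live in the native arToolKit.cpp code; the class, field and helper names here are illustrative, and the recognized image suffixes are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the template-ID convention: marker templates keep their
// non-negative IDs, image templates get negative IDs in the order they are added.
public class TemplateRegistrySketch {

    private final List<String> imagePaths = new ArrayList<>();

    // Returns the ID under which the standard template is registered (cf. item (1)).
    public int addObject(String templateFile, int markerIdFromFile) {
        if (isImage(templateFile)) {
            imagePaths.add(templateFile);    // remember the storage path (cf. item (2))
            return -imagePaths.size();       // -1, -2, ... according to the order of adding
        }
        return markerIdFromFile;             // non-negative ID read from the marker file
    }

    // Dispatch used in the detection loop (cf. item (3)).
    public void detect(int id) {
        if (id >= 0) {
            // detection and tracking based on markers
        } else {
            // detection and tracking based on natural features,
            // using the scene image at imagePaths.get(-id - 1)
        }
    }

    private boolean isImage(String file) {
        String f = file.toLowerCase();
        return f.endsWith(".jpg") || f.endsWith(".png") || f.endsWith(".bmp");
    }
}
```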
5. Application and results
Our application runs on the HUAWEI Honor 3C (operating system: Android 4.2.2, CPU: MTK6582M, CPU frequency: 1.3 GHz, GPU: Mali-400MP, RAM capacity: 2 GB). The software environment is as follows: (1) adt-bundle-windows-x86-20130729; (2) android-ndk-r9; (3) AndAR; (4) OpenCV-2.4.6-android-sdk.
The results are shown in Fig. 7 and Table 2. The template image is a photo of the package of an Intel Core i3. As can be seen, in Fig. 7(a) there is only the real scene in the video, without any augmented reality algorithm. In Fig. 7(b), the native AndAR can only augment reality based on markers, so a three-dimensional virtual flowerpot is placed on the "Hiro" marker. In Fig. 7(c), the improved AndAR can augment reality based not only on markers but also on natural features, so a three-dimensional virtual flowerpot is placed on the "Hiro" marker, and a three-dimensional virtual superman stands on the package.
Fig. 7. The result of the improved AndAR compared with no augmentation and with native AndAR.
Table 2. The real-time performance of the improved AndAR.

Algorithm                  Time to process 300 frames (s)    Frame rate (fps)
None                       15.715                            19.1
ORB (RANSAC)               136.103                           2.2
ORB (RANSAC) with LK       23.580                            12.7
Table 2 shows the result of the real-time test. When no algorithm runs, the frame rate of the system is 19.1 fps; when only ORB is used, the frame rate is only 2.2 fps. Combined with LK, the frame rate increases to 12.7 fps, which basically meets the real-time requirement.
6. Conclusion and future work
This paper analyzes the architecture of AndAR and introduces how to develop an AR application based on AndAR. Considering that 3D registration based on natural features is a hot topic, this paper improves AndAR by adding 3D registration based on natural features. Our registration method is mainly based on the ORB descriptor: the camera extrinsic matrix is calculated through ORB feature point matching, and Lucas–Kanade optical flow is used to track the inliers, which keeps the augmentation from jittering. The results show that the improved AndAR supports registration based not only on markers but also on natural features. Future work will concentrate on reducing the time complexity of the algorithm and on interaction design. In future work, we will also use the methods proposed in recent works [15–20] to further improve the performance of the current system.
Acknowledgment
The research work was supported by the National Natural Science Foundation of China under Grant No. 61272236.
Appendix A. RANSAC iterations
See Table A1.

Table A1. The number of iterations given sample size s and outlier ratio ε.

Sample size    5%    10%    20%    25%    30%    40%    50%
2              2     3      5      6      7      11     17
3              3     4      7      9      11     19     35
4              3     5      9      13     17     34     72

References
[1] H. Kato, M. Billinghurst, Marker tracking and hmd calibration for a video-based augmented reality conferencing system, in: 2nd IEEE and ACM International Workshop on Augmented Reality, 1999, pp. 85–94. [2] A. Ufkes, M. Fiala, A markerless augmented reality system for mobile devices, IEEE Int. Conf. Comput. Robot Vis. 2013 (2013) 226–233. [3] T. Guan, Y. He, J. Gao, J. Yang, J. Yu, On-device mobile visual location recognition by integrating vision and inertial sensors, IEEE Trans. Multimedia 15 (7) (2013) 1688–1699. [4] R. Ji, L.Y. Duan, J. Chen, H. Yao, J. Yuan, Y. Rui, W. Gao, Location discriminative vocabulary coding for mobile landmark search, Int. J. Comput. Vis. 96 (3) (2012) 290–314. [5] T. Guan, Y.F. He, L.Y. Duan, J.Q. Yu, Efficient BOF generation and compression for on-device mobile visual location recognition, IEEE Multimedia 21 (2) (2014) 32–41. [6] Y.W. Luo, B.C. Wei, T. Guan, K. Yan, Y. Yan, Research on fast feature match method for mobile devices, Appl. Res. Comput. 30 (2) (2013) 591–594. [7] B. Huang, C.H. Lin, C. Lee, Mobile augmented reality based on cloud computing, in: Anti-Counterfeiting, 2012 International Conference on Security and Identification, 2012, pp. 1–5. [8] Rongrong Ji, Ling-Yu Duan, Hongxun Yao, Lexing Xie, Learning to distribute vocabulary indexing for scalable visual search, IEEE Trans. Multimedia (2013) 153–166. [9] http://www.hitl.washington.edu/artoolkit/, 2003. [10] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2004) 91–110. [11] B. Herbert, T. Tuytelaars, L.V. Gool, Surf: speeded up robust features, Computer Vision – ECCV, vol. 3951, 2006, pp. 404–417. [12] M. Calonder, V. Lepetit, C. Strecha, P. Fua, Brief: binary robust independent elementary features, Computer Vision – ECCV 2010, vol. 6314, 2010, pp. 778– 792. [13] E. Rublee, V. Rabaud, K. Konolige, G. Bradsk, ORB: an efficient alternative to SIFT or SURF, in: International Conference on Computer Vision (ICCV), 2011, pp. 2564–2571. [14] L. Vincent, F. Moreno-Noguer, P. Fua, Epnp: an accurate o (n) solution to the pnp problem, Int. J. Comput. Vis. 81 (2) (2009) 155–166. [15] Benchang Wei, Tao Guan, Junqing Yu, Projected residual vector quantization for ANN search, IEEE Multimedia 21 (3) (2014) 41–51. [16] R. Ji, H. Yao, W. Liu, X. Sun, Q. Tian, Task-dependent visual-codebook compression, IEEE Trans. Image Process. 21 (4) (2012) 2282–2293. [17] R. Ji, Y. Gao, B. Zhong, H. Yao, Q. Tian, Mining flickr landmarks by modeling reconstruction sparsity, ACM Trans. Multimedia Comput. Commun. Appl. (2011) 1–22. [18] Luming Zhang, Yue Gao, Yingjie Xia, Ke Lu, Jialie Shen, Rongrong Ji, Representative discovery of structure cues for weakly-supervised image segmentation, IEEE Trans. Multimedia 16 (2) (2014) 470–479. [19] Luming Zhang, Mingli Song, Xiao Liu, Li Sun, Chun Chen, Jiajun Bu, Recognizing architecture styles by hierarchical sparse coding of blocklets, Inf. Sci. 254 (2014) 141–154. [20] Luming Zhang, Yi Yang, Yue Gao, Yi Yu, Changbo Wang, Xuelong Li, A probabilistic associative model for segmenting weakly supervised images, IEEE Trans. Image Process. 23 (9) (2014) 4150–4159.