[root@master bin]# ./spark-shell --master spark://master:7077
15/10/24 04:39:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/24 04:39:42 INFO spark.SecurityManager: Changing view acls to: root
15/10/24 04:39:42 INFO spark.SecurityManager: Changing modify acls to: root
15/10/24 04:39:42 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/10/24 04:39:42 INFO spark.HttpServer: Starting HTTP Server
15/10/24 04:39:42 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/10/24 04:39:42 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:50349
15/10/24 04:39:42 INFO util.Utils: Successfully started service 'HTTP class server' on port 50349.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.4.1
      /_/

Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_80)
Type in expressions to have them evaluated.
Type :help for more information.
15/10/24 04:39:45 INFO spark.SparkContext: Running Spark version 1.4.1
15/10/24 04:39:45 INFO spark.SecurityManager: Changing view acls to: root
15/10/24 04:39:45 INFO spark.SecurityManager: Changing modify acls to: root
15/10/24 04:39:45 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
15/10/24 04:39:46 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/10/24 04:39:46 INFO Remoting: Starting remoting
15/10/24 04:39:46 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@172.17.0.9:38355]
15/10/24 04:39:46 INFO util.Utils: Successfully started service 'sparkDriver' on port 38355.
15/10/24 04:39:46 INFO spark.SparkEnv: Registering MapOutputTracker
15/10/24 04:39:46 INFO spark.SparkEnv: Registering BlockManagerMaster
15/10/24 04:39:46 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-80c40236-8018-47f1-805d-67ca798d85c2/blockmgr-4b14bf30-a562-44f7-9819-13703d71d567
15/10/24 04:39:46 INFO storage.MemoryStore: MemoryStore started with capacity 265.4 MB
15/10/24 04:39:46 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-80c40236-8018-47f1-805d-67ca798d85c2/httpd-9e31fee6-b8ad-4d4d-9fc7-8beb8cb4ae68
15/10/24 04:39:46 INFO spark.HttpServer: Starting HTTP Server
15/10/24 04:39:46 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/10/24 04:39:46 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:41112
15/10/24 04:39:46 INFO util.Utils: Successfully started service 'HTTP file server' on port 41112.
15/10/24 04:39:46 INFO spark.SparkEnv: Registering OutputCommitCoordinator
15/10/24 04:39:46 INFO server.Server: jetty-8.y.z-SNAPSHOT
15/10/24 04:39:46 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/10/24 04:39:46 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
15/10/24 04:39:46 INFO ui.SparkUI: Started SparkUI at http://172.17.0.9:4040
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@master:7077/user/Master...
15/10/24 04:39:46 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20151024043946-0001
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor added: app-20151024043946-0001/0 on worker-20151024043600-172.17.0.11-7078 (172.17.0.11:7078) with 32 cores
15/10/24 04:39:46 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20151024043946-0001/0 on hostPort 172.17.0.11:7078 with 32 cores, 512.0 MB RAM
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor added: app-20151024043946-0001/1 on worker-20151024043559-172.17.0.9-7078 (172.17.0.9:7078) with 32 cores
15/10/24 04:39:46 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20151024043946-0001/1 on hostPort 172.17.0.9:7078 with 32 cores, 512.0 MB RAM
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor added: app-20151024043946-0001/2 on worker-20151024043600-172.17.0.10-7078 (172.17.0.10:7078) with 32 cores
15/10/24 04:39:46 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20151024043946-0001/2 on hostPort 172.17.0.10:7078 with 32 cores, 512.0 MB RAM
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor added: app-20151024043946-0001/3 on worker-20151024043600-172.17.0.12-7078 (172.17.0.12:7078) with 32 cores
15/10/24 04:39:46 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20151024043946-0001/3 on hostPort 172.17.0.12:7078 with 32 cores, 512.0 MB RAM
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor updated: app-20151024043946-0001/1 is now LOADING
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor updated: app-20151024043946-0001/0 is now LOADING
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor updated: app-20151024043946-0001/3 is now LOADING
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor updated: app-20151024043946-0001/2 is now LOADING
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor updated: app-20151024043946-0001/0 is now RUNNING
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor updated: app-20151024043946-0001/1 is now RUNNING
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor updated: app-20151024043946-0001/2 is now RUNNING
15/10/24 04:39:46 INFO client.AppClient$ClientActor: Executor updated: app-20151024043946-0001/3 is now RUNNING
15/10/24 04:39:47 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34576.
15/10/24 04:39:47 INFO netty.NettyBlockTransferService: Server created on 34576
15/10/24 04:39:47 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/10/24 04:39:47 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.9:34576 with 265.4 MB RAM, BlockManagerId(driver, 172.17.0.9, 34576)
15/10/24 04:39:47 INFO storage.BlockManagerMaster: Registered BlockManager
15/10/24 04:39:47 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/10/24 04:39:47 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.
15/10/24 04:39:47 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
15/10/24 04:39:47 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/10/24 04:39:47 INFO metastore.ObjectStore: ObjectStore, initialize called
15/10/24 04:39:48 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/usr/local/spark-1.4.1/lib/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/usr/local/spark/lib/datanucleus-core-3.2.10.jar."
15/10/24 04:39:48 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/usr/local/spark-1.4.1/lib/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/usr/local/spark/lib/datanucleus-api-jdo-3.2.6.jar."
15/10/24 04:39:48 WARN DataNucleus.General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/usr/local/spark-1.4.1/lib/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/usr/local/spark/lib/datanucleus-rdbms-3.2.9.jar."
15/10/24 04:39:48 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/10/24 04:39:48 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/10/24 04:39:48 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/10/24 04:39:48 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/10/24 04:39:49 INFO cluster.SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.17.0.12:58541/user/Executor#-446719170]) with ID 3
15/10/24 04:39:49 INFO cluster.SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.17.0.10:51948/user/Executor#-1365413710]) with ID 2
15/10/24 04:39:49 INFO cluster.SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.17.0.9:39912/user/Executor#-479102452]) with ID 1
15/10/24 04:39:49 INFO cluster.SparkDeploySchedulerBackend: Registered executor: AkkaRpcEndpointRef(Actor[akka.tcp://sparkExecutor@172.17.0.11:42322/user/Executor#-1407719536]) with ID 0
15/10/24 04:39:49 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.10:53506 with 265.4 MB RAM, BlockManagerId(2, 172.17.0.10, 53506)
15/10/24 04:39:49 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.12:36521 with 265.4 MB RAM, BlockManagerId(3, 172.17.0.12, 36521)
15/10/24 04:39:49 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.9:38735 with 265.4 MB RAM, BlockManagerId(1, 172.17.0.9, 38735)
15/10/24 04:39:49 INFO storage.BlockManagerMasterEndpoint: Registering block manager 172.17.0.11:47951 with 265.4 MB RAM, BlockManagerId(0, 172.17.0.11, 47951)
15/10/24 04:39:57 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
15/10/24 04:39:57 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5. Encountered: "@" (64), after : "".
15/10/24 04:39:58 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/10/24 04:39:58 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/10/24 04:40:06 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
15/10/24 04:40:06 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
15/10/24 04:40:08 INFO metastore.ObjectStore: Initialized ObjectStore
15/10/24 04:40:08 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
15/10/24 04:40:09 INFO metastore.HiveMetaStore: Added admin role in metastore
15/10/24 04:40:09 INFO metastore.HiveMetaStore: Added public role in metastore
15/10/24 04:40:10 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
15/10/24 04:40:10 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
15/10/24 04:40:10 INFO repl.SparkILoop: Created sql context (with Hive support)..
SQL context available as sqlContext.

scala> val file = sc.textFile("/sourcedata/somefile")
15/10/24 04:40:16 INFO storage.MemoryStore: ensureFreeSpace(80576) called with curMem=0, maxMem=278302556
15/10/24 04:40:16 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 78.7 KB, free 265.3 MB)
15/10/24 04:40:16 INFO storage.MemoryStore: ensureFreeSpace(17558) called with curMem=80576, maxMem=278302556
15/10/24 04:40:16 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 17.1 KB, free 265.3 MB)
15/10/24 04:40:16 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.9:34576 (size: 17.1 KB, free: 265.4 MB)
15/10/24 04:40:16 INFO spark.SparkContext: Created broadcast 0 from textFile at <console>:21
file: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[1] at textFile at <console>:21

scala> file.take(3)
15/10/24 04:40:20 INFO mapred.FileInputFormat: Total input paths to process : 1
15/10/24 04:40:20 INFO spark.SparkContext: Starting job: take at <console>:24
15/10/24 04:40:20 INFO scheduler.DAGScheduler: Got job 0 (take at <console>:24) with 1 output partitions (allowLocal=true)
15/10/24 04:40:20 INFO scheduler.DAGScheduler: Final stage: ResultStage 0(take at <console>:24)
15/10/24 04:40:20 INFO scheduler.DAGScheduler: Parents of final stage: List()
15/10/24 04:40:20 INFO scheduler.DAGScheduler: Missing parents: List()
15/10/24 04:40:20 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at textFile at <console>:21), which has no missing parents
15/10/24 04:40:20 INFO storage.MemoryStore: ensureFreeSpace(3136) called with curMem=98134, maxMem=278302556
15/10/24 04:40:20 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 265.3 MB)
15/10/24 04:40:20 INFO storage.MemoryStore: ensureFreeSpace(1822) called with curMem=101270, maxMem=278302556
15/10/24 04:40:20 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1822.0 B, free 265.3 MB)
15/10/24 04:40:20 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.9:34576 (size: 1822.0 B, free: 265.4 MB)
15/10/24 04:40:20 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/10/24 04:40:20 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at textFile at <console>:21)
15/10/24 04:40:20 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/10/24 04:40:20 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 172.17.0.9, PROCESS_LOCAL, 1400 bytes)
15/10/24 04:40:20 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.9:38735 (size: 1822.0 B, free: 265.4 MB)
15/10/24 04:40:20 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.9:38735 (size: 17.1 KB, free: 265.4 MB)
15/10/24 04:40:21 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.17.0.9): java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LustreFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2379)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:427)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:400)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:980)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:980)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LustreFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
    ... 30 more

15/10/24 04:40:21 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 0.0 (TID 1, 172.17.0.12, PROCESS_LOCAL, 1400 bytes)
15/10/24 04:40:21 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 172.17.0.12:36521 (size: 1822.0 B, free: 265.4 MB)
15/10/24 04:40:21 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 172.17.0.12:36521 (size: 17.1 KB, free: 265.4 MB)
15/10/24 04:40:21 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 0.0 (TID 1) on executor 172.17.0.12: java.lang.RuntimeException (java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LustreFileSystem not found) [duplicate 1]
15/10/24 04:40:21 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 0.0 (TID 2, 172.17.0.9, PROCESS_LOCAL, 1400 bytes)
15/10/24 04:40:22 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 0.0 (TID 2) on executor 172.17.0.9: java.lang.RuntimeException (java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LustreFileSystem not found) [duplicate 2]
15/10/24 04:40:22 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 0.0 (TID 3, 172.17.0.12, PROCESS_LOCAL, 1400 bytes)
15/10/24 04:40:22 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3) on executor 172.17.0.12: java.lang.RuntimeException (java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LustreFileSystem not found) [duplicate 3]
15/10/24 04:40:22 ERROR scheduler.TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
15/10/24 04:40:22 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/10/24 04:40:22 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
15/10/24 04:40:22 INFO scheduler.DAGScheduler: ResultStage 0 (take at <console>:24) failed in 1.548 s
15/10/24 04:40:22 INFO scheduler.DAGScheduler: Job 0 failed: take at <console>:24, took 1.611185 s
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 172.17.0.12): java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LustreFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2379)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:427)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:400)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:980)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:980)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:176)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:176)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63)
    at org.apache.spark.scheduler.Task.run(Task.scala:70)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.LustreFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
    ... 30 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1273)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1264)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1263)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1263)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

scala>
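The job aborts because the executors resolve the scheme-less path /sourcedata/somefile against a default filesystem whose implementation class, org.apache.hadoop.fs.LustreFileSystem, is not on their classpath. As a minimal sketch of one way this kind of failure is commonly addressed (not a confirmed fix for this cluster), the jar that actually provides that class could be put on both the driver and the executor classpaths when launching the shell. The jar path below is a placeholder, not a real artifact name:

  # Sketch only: /path/to/hadoop-lustre-adapter.jar stands in for whatever jar
  # provides org.apache.hadoop.fs.LustreFileSystem on this cluster.
  ./spark-shell --master spark://master:7077 \
    --driver-class-path /path/to/hadoop-lustre-adapter.jar \
    --conf spark.executor.extraClassPath=/path/to/hadoop-lustre-adapter.jar

With spark.executor.extraClassPath the jar must already exist at that path on every worker node; alternatively, --jars can ship it from the driver to the executors at application start.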