
PySpark: question from a beginner about SparkContext

Hello, I'm trying to understand how Spark works and I'm learning PySpark.

I already know Python and the Pandas library.

I understand that if I want to read a big CSV file into a Pandas DataFrame, it may not work (or it may take a long time to read).

In that case, PySpark is an alternative.
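From what I've read, a CSV read in PySpark usually goes through a SparkSession; here is a minimal sketch of what I mean (big.csv is just a made-up file name):

from pyspark.sql import SparkSession

# Build (or reuse) a local SparkSession
spark = SparkSession.builder.master('local').appName('csv-demo').getOrCreate()

# Read the CSV into a Spark DataFrame, similar to pandas.read_csv
df = spark.read.csv('big.csv', header=True, inferSchema=True)
df.show(5)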

I read some articles and I understood that the first thing to do is to create a SparkContext.

I understand the SparkContext will manage the cluster, which will read the CSV file and transform the data.

So I had this code in a Jupyter notebook:

# Import SparkContext from the pyspark module
from pyspark import SparkContext

sc = SparkContext('local')
sc
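As I understand it, once sc exists I could read and transform the file with it; a rough sketch (again, big.csv is a placeholder name):

# Read the file as an RDD of text lines
rdd = sc.textFile('big.csv')

# Drop the header line, then count the remaining rows
header = rdd.first()
rows = rdd.filter(lambda line: line != header)
print(rows.count())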

If I execute this code twice, the second time I get an error because I can't have two SparkContexts. Why can't I have two SparkContexts?
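One pattern I came across for notebooks is SparkContext.getOrCreate, which (if I understand it correctly) returns the already-running context instead of raising an error, so the cell can be re-run:

from pyspark import SparkContext, SparkConf

# Reuses the active SparkContext if one exists, otherwise creates it
sc = SparkContext.getOrCreate(SparkConf().setMaster('local'))
sc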

I also wanted to try this:

# Import SparkContext from the pyspark module
from pyspark import SparkContext

sc1 = SparkContext('local')

sc2 = SparkContext('local')

I have two different names: sc1 and sc2. Even if I execute it only one time, I get an error. Why can't I have two SparkContexts, sc1 and sc2?
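If I understand correctly, stopping the first context before creating the next avoids the error, since only one can be active per process; a sketch:

from pyspark import SparkContext

sc1 = SparkContext('local')
sc1.stop()                    # release the active context

sc2 = SparkContext('local')   # works now that sc1 is stopped
sc2.stop()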

Thank you

This is a support forum for PythonAnywhere. If you want PySpark help, you will have better luck on a PySpark forum.