Parse DataFrame and store output in a single file [duplicate]











This question already has an answer here:

• Spark split a column value into multiple rows (1 answer)



I have a DataFrame in Spark SQL (Scala) with columns A and B and the following values:



A | B
1 a|b|c
2 b|d
3 d|e|f


I need to store the output in a single text file in the following format:



1 a
1 b
1 c
2 b
2 d
3 d
3 e
3 f


How can I do that?
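For reference, the sample frame can be reconstructed like this (the local SparkSession setup and the implicits import are illustrative assumptions, not part of the question):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Rebuild the sample data shown above
val df = Seq(
  (1, "a|b|c"),
  (2, "b|d"),
  (3, "d|e|f")
).toDF("A", "B")
```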










marked as duplicate by user6910411 17 hours ago

This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
      scala apache-spark apache-spark-sql






edited 19 hours ago by SCouto
asked 19 hours ago by Nick
2 Answers

















Accepted answer (score 2), answered 19 hours ago by SCouto:
You can get the desired DataFrame with an explode and a split. Note that split takes a regex, so the pipe must be escaped as "\\|":

val resultDF = df.withColumn("B", explode(split($"B", "\\|")))


          Result



          +---+---+
          | A| B|
          +---+---+
          | 1| a|
          | 1| b|
          | 1| c|
          | 2| b|
          | 2| d|
          | 3| d|
          | 3| e|
          | 3| f|
          +---+---+


Then you can save it to a single file with coalesce(1):



            resultDF.coalesce(1).rdd.saveAsTextFile("desiredPath")
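One caveat with saveAsTextFile on a DataFrame's underlying RDD: each Row is written via its toString, so the lines come out as "[1,a]" rather than the requested "1 a". A sketch of one way to get the space-separated format (reusing the answer's path name):

```scala
// Row.toString would print "[1,a]"; join the fields with a space instead
resultDF.coalesce(1)
  .rdd
  .map(_.mkString(" "))
  .saveAsTextFile("desiredPath")
```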





• explode function is not recognized in my code. What dependency do I need to add? – Nick, 18 hours ago

• this should be enough: import org.apache.spark.sql.functions._ – SCouto, 18 hours ago


















Second answer (score 0), answered 19 hours ago by Chitral Verma:
You can do something like this (again escaping the pipe, since split takes a regex):

val df = ???
val resDF = df.withColumn("B", explode(split(col("B"), "\\|")))

resDF.coalesce(1).write.option("delimiter", " ").csv("path/to/file")
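Worth noting: even with coalesce(1), Spark's CSV writer creates a directory at the given path containing a single part file, not a bare file named path/to/file. A sketch with an explicit save mode (the overwrite choice is an assumption):

```scala
// Still writes a directory with one part file inside it
resDF.coalesce(1)
  .write
  .option("delimiter", " ") // "sep" is the canonical CSV option; "delimiter" is an accepted alias
  .mode("overwrite")        // assumption: replacing previous output is acceptable
  .csv("path/to/file")
```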





• explode(split(col : this part of your code is not recognized – Nick, 18 hours ago

• col comes from org.apache.spark.sql.functions – Chitral Verma, 17 hours ago

















